ryujinx-final

Archived

Author	SHA1	Message	Date
gdkchan	fb65f392d1	Improve accuracy of reciprocal step instructions (#2305 ) * Improve accuracy of reciprocal step instructions * Fix small mistake on RECPE rounding, nits, PTC version bump	2021-05-24 20:20:07 +10:00
FICTURE7	65ac00833a	Use branch instead of tailcall for recursive calls (#2282 ) * Use branch instead of tailcall for recursive calls Use a branch instead of doing a tailcall for recursive calls. This avoids having to store the dispatch address, setting up the epilogue and keeps guest registers in host registers for longer. The rejit check is moved down into the entry block so that the rejit behaviour remains the same as before. * Set PTC version Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2021-05-20 09:31:45 -03:00
FICTURE7	0181068016	Add `BIC/ORR Vd.T, #imm` fast path (#2279 ) * Add fast path for BIC Vd.T, #imm * Add fast path for ORR Vd.T, #imm * Set PTC version * Fixup Exception to InvalidOperationException	2021-05-20 09:09:17 -03:00
FICTURE7	c805542b29	Allow `LocalVariable` to be assigned more than once (#2288 ) * Allow `LocalVariable` to be assigned more than once This allows us to write flow controls like loops and if-elses with LocalVariables participating in phi nodes. * Add `GetLocalNumber` to operand	2021-05-17 01:54:53 +02:00
gdkchan	e318022b89	Fold constant offsets and group constant addresses (#2285 ) * Fold constant offsets and group constant addresses * PPTC version bump	2021-05-13 21:26:57 +02:00
LDj3SNuD	57ea3f93a3	PPTC meets ExeFS Patching. (#1865 ) * PPTC meets ExeFS Patching. * InternalVersion = 1865 * Ready! * Optimized the PtcProfiler Load/Save methods.	2021-05-13 20:05:15 +02:00
FICTURE7	89791ba68d	Add inlined on translation call counting (#2190 ) * Add EntryTable<TEntry> * Add on translation call counting * Add Counter * Add PPTC support * Make Counter a generic & use a 32-bit counter instead * Return false on overflow * Set PPTC version * Print more information about the rejit queue * Make Counter<T> disposable * Remove Block.TailCall since it is not used anymore * Apply suggestions from code review Address gdkchan's feedback Co-authored-by: gdkchan <gab.dark.100@gmail.com> * Fix more stale docs * Remove rejit requests queue logging * Make Counter<T> finalizable Most certainly quite an odd use case. * Make EntryTable<T>.TryAllocate set entry to default * Re-trigger CI * Dispose Counters before they hit the finalizer queue * Re-trigger CI Just for good measure... * Make EntryTable<T> expandable * EntryTable is now expandable instead of being a fixed slab. * Remove EntryTable<T>.TryAllocate * Remove Counter<T>.TryCreate Address LDj3SNuD's feedback * Apply suggestions from code review Address LDj3SNuD's feedback Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Remove useless return * POH approach, but the sequel * Revert "POH approach, but the sequel" This reverts commit 5f5abaa24735726ff2db367dc74f98055d4f4cba. The sequel got shelved * Add extra documentation Co-authored-by: gdkchan <gab.dark.100@gmail.com> Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2021-04-18 23:43:53 +02:00
LDj3SNuD	90163087a0	PPTC vs. giant ExeFS. (#2168 ) * PPTC vs. giant ExeFS. * InternalVersion = 2168 * Add new heuristic algorithm for calculating the number of threads for parallel translations that also takes into account the user's free physical memory and not just the number of CPU cores. * Nit. * Add an outer Header structure and add the hashes for both this new structure and the existing "inner" Header structure. * InternalVersion = 2169	2021-04-13 03:24:36 +02:00
gdkchan	d43a56726c	(CPU) Fix CRC32 instruction when constant values are used as input (#2183 )	2021-04-07 23:43:08 +02:00
FICTURE7	98ed81e4cd	Improve `StoreToContext` emission (#2155 ) * Improve StoreToContext emission Hoist StoreToContext in dynamic branch fast & slow paths out into their predecessor. Reduces register pressure, code size and compile time because we're throwing less stuff down the pipeline. * Set PTC internal version * Turn EmitDynamicTableCall private * Re-trigger CI	2021-04-02 19:54:23 +02:00
FICTURE7	8b3eba7e13	Reduce allocation during SSA construction (#2162 ) * Reduce allocation during SSA construction * Re-trigger CI	2021-04-02 19:26:16 +02:00
LDj3SNuD	4bd1ad16f9	Add Sqdmulh_Ve & Sqrdmulh_Ve Inst.s with Tests. (#2139 )	2021-03-25 23:33:32 +01:00
mageven	69f8722e79	Fix inconsistencies in progress reporting (#2129 ) Signal and setup events correctly Eliminate possible races Use a single event Mark volatiles and reduce scope of waithandles Common handler 100ms -> 50ms	2021-03-22 19:40:07 +01:00
mageven	ca5d8e58dd	Add progress reporting to PTC and Shader Cache (#2057 ) * UI changes * Add progress reporting to PTC & ShaderCache * Account for null events and expand docs Co-authored-by: Joshi234 <46032261+Joshi234@users.noreply.github.com>	2021-03-03 01:39:36 +01:00
LDj3SNuD	bcbf240d2e	PPTC: Fix unwanted propagation of a relocatable constant in a specific case. (#1990 ) * Fix unwanted propagation of a relocatable constant in a specific case. * Ptc.InternalVersion = 1990 * Nit to retrigger the Checks.	2021-02-23 13:15:45 +01:00
mageven	9bda7b4699	Implement VCNT instruction (#1963 ) * Implement VCNT based on AArch64 CNT Add tests * Update PTC version * Address LDj's comments * Explicit size in encoding * Tighter tests * Replace SoftFallback with IR helper Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Reduce one BitwiseAnd from IR fallback Based on popcount64b from https://en.wikipedia.org/wiki/Hamming_weight#Efficient_implementation * Rename parameter and add assert Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2021-02-22 16:26:13 +01:00
LDj3SNuD	dc0adb533d	PPTC & Pool Enhancements. (#1968 ) * PPTC & Pool Enhancements. * Avoid buffer allocations in CodeGenContext.GetCode(). Avoid stream allocations in PTC.PtcInfo. Refactoring/nits. * Use XXHash128, for Ptc.Load & Ptc.Save, x10 faster than Md5. * Why not a nice Span. * Added a simple PtcFormatter library for deserialization/serialization, which does not require reflection, in use at PtcJumpTable and PtcProfiler; improves maintainability and simplicity/readability of affected code. * Nits. * Revert #1987. * Revert "Revert #1987." This reverts commit 998be765cf7f7da5ff0c1c08de704c9012b0f49c.	2021-02-22 03:23:48 +01:00
FICTURE7	1586880114	Turn Copy into Fill in HybridAllocator (#2010 ) * Turn Copy into Fill in HybridAllocator * Set PTC internal verison	2021-02-21 18:33:59 +01:00
gdkchan	9d82d27df2	Fix memory tracking performance regression (#2026 ) * Fix memory tracking performance regression * Set PTC version	2021-02-17 09:16:20 +11:00
gdkchan	715b605e95	Validate CPU virtual addresses on access (#1987 ) * Enable PTE null checks again * Do address validation on EmitPtPointerLoad, and make it branchless * PTC version increment * Mask of pointer tag for exclusive access * Move mask to the correct place Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2021-02-16 19:04:19 +01:00
sharmander	40797a1283	Optimization \| Modify Add (Integer) Instruction to use LEA instead. (#1971 ) * Optimization \| Modify Add Instruction to use LEA instead. Currently, the add instruction requires 4 registers to take place. By using LEA, we can effectively perform the same working using 3 registers, reducing memory spills and improving translation efficiency. * Fix IsSameOperandDestSrc1 Check for Add * Use LEA if Dest != SRC1 * Update IsSameOperandDestSrc1 to account for Cases where Dest and Src1 can be same for add * Fix error in logic * Typo * Add paranthesis for clarity * Compare registers as requested. * Cleanup if statement, use same comparison method as generateCopy * Make change as recommended by gdk * Perform check only when Add calls are made * use ensure sametype for lea, fix else * Update comment * Update version #	2021-02-08 10:49:46 +11:00
gdkchan	ee28ccebf4	Disable partial JIT invalidation on unmap (#1991 )	2021-02-08 10:25:14 +11:00
gdkchan	dcce407071	Lower precision of estimate instruction results to match Arm behavior (#1943 ) * Lower precision of estimate instruction results to match Arm behavior * PTC version update * Nits	2021-01-28 10:23:00 +11:00
mageven	c19cfca183	Implement PRFM (register variant) as NOP (#1956 ) * Implement PRFM (register variant) as NOP Fix typo pfrm -> prfm Add comments to distinguish variants * Increment PTC version	2021-01-26 16:09:27 +11:00
FICTURE7	ddf1105bcb	Add VCLZ.* fast path (#1917 ) * Add VCLZ fast path * Add VCLZ.8B/16B SSSE3 fast path * Add VCLZ.4H/8H SSSE3 fast path * Add VCLZ.2S/4S SSE2 fast path * Improve CLZ.4H/8H fast path * Improve CLZ.2S/4S fast path * Set PPTC version	2021-01-25 10:01:25 +11:00
LDj3SNuD	c3e0c41da3	CPU (A64): Add Fmaxnmp & Fminnmp Scalar Inst.s, Fast & Slow Paths; with Tests. (#1894 )	2021-01-20 09:12:33 +11:00
LDj3SNuD	68f6b79fd3	Add a simple Pools Limiter. (#1830 ) * Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs. Added invalidation of .cache files in the event of reuse on a different user operating system. Added .info and .cache files invalidation in case of a failed stream decompression. Nits. * InternalVersion = 1712; * Nits. * Address comment. * Get rid of BinaryFormatter. Nits. * Move Ptc.LoadTranslations(). Nits. * Nits. * Fixed corner cases (in case backup copies have to be used). Added save logs. * Not core fixes. * Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers. * Add LoadTranslations log. * Nits. * Removed the search and management of LowCq overlapping functions. * Final increment of .info and .cache flags. * Nit. * Free up memory allocated by Pools during any PPTC translations at boot time. * Nit due to rebase. * Add a simple Pools Limiter. * Nits. * Fix missing JumpTable.RegisterFunction() due to rebase. Clear MemoryStreams as soon as possible, when they are no longer needed. * Code cleaning. * Nit for retrigger Checks. * Update Ptc.cs * Contextual refactoring of Translator. Ignore resetting of pools for DirectCallStubs. * Nit for retrigger Checks.	2021-01-12 19:04:02 +01:00
LDj3SNuD	430ba6da65	CPU (A64): Add Pmull_V Inst. with Clmul fast path for the "1/2D -> 1Q" variant & Sse fast path and slow path for both the "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test. (#1817 ) * Add Pmull_V Sse fast path only, both "8/16B -> 8H" and "1/2D -> 1Q" variants; with Test. * Add Clmul fast path for the 128 bits variant. * Small optimisation (save 60 instructions) for the Sse fast path about the 128 bits variant. * Add slow path, both variants. Fix V128 Shl/Shr when shift = 0. * A32: Add Vmull_I P64 variant (slow path); not tested. * A32: Add Vmull_I_P8_P64 Test and fix P64 variant.	2021-01-04 23:45:54 +01:00
Ac_K	5b9c876155	Hotfix for #1814	2020-12-24 04:44:39 +01:00
LDj3SNuD	2502f1f07f	Free up memory allocated by Pools during any PPTC translations at boot time. (#1814 ) * Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs. Added invalidation of .cache files in the event of reuse on a different user operating system. Added .info and .cache files invalidation in case of a failed stream decompression. Nits. * InternalVersion = 1712; * Nits. * Address comment. * Get rid of BinaryFormatter. Nits. * Move Ptc.LoadTranslations(). Nits. * Nits. * Fixed corner cases (in case backup copies have to be used). Added save logs. * Not core fixes. * Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers. * Add LoadTranslations log. * Nits. * Removed the search and management of LowCq overlapping functions. * Final increment of .info and .cache flags. * Nit. * Free up memory allocated by Pools during any PPTC translations at boot time. * Nit due to rebase.	2020-12-24 03:58:36 +01:00
LDj3SNuD	8a33e884f8	Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s). Fix Vfma_V slow path not using StandardFPSCRValue(). (#1775 ) * Fix Vnmls_S fast path (F64: losing input d value). Fix Vnmla_S & Vnmls_S slow paths (using fused inst.s). Add Vfma_S & Vfms_S Fma fast paths. Add Vfnma_S inst. with Fma/Sse fast paths and slow path. Add Vfnms_S Sse fast path. Add Tests for affected inst.s. Nits. * InternalVersion = 1775 * Nits. * Fix Vfma_V slow path not using StandardFPSCRValue(). * Nit: Fix Vfma_V order. * Add Vfms_V Sse fast path and slow path. * Add Vfma_V and Vfms_V Test.	2020-12-17 20:43:41 +01:00
LDj3SNuD	b5c215111d	PPTC Follow-up. (#1712 ) * Added support for offline invalidation, via PPTC, of low cq translations replaced by high cq translations; both on a single run and between runs. Added invalidation of .cache files in the event of reuse on a different user operating system. Added .info and .cache files invalidation in case of a failed stream decompression. Nits. * InternalVersion = 1712; * Nits. * Address comment. * Get rid of BinaryFormatter. Nits. * Move Ptc.LoadTranslations(). Nits. * Nits. * Fixed corner cases (in case backup copies have to be used). Added save logs. * Not core fixes. * Complement to the previous commit. Added load logs. Removed BinaryFormatter leftovers. * Add LoadTranslations log. * Nits. * Removed the search and management of LowCq overlapping functions. * Final increment of .info and .cache flags. * Nit. * GetIndirectFunctionAddress(): Validate that writing actually takes place in dynamic table memory range (and not elsewhere). * Fix Ptc.UpdateInfo() due to rebase. * Nit for retrigger Checks. * Nit for retrigger Checks.	2020-12-17 20:32:09 +01:00
sharmander	e901b7850c	CPU: Implement VRINTX.F32 \| VRINTX.F64 (#1776 ) * Start implementation * Draft * Updated opcode. Needs verification. * Clean up code. * Update implementation and tests. * Update implemenation + tests * Get RM from FPSCR + Do not use emit/addintrinsic * Remove "fast" path, as recommended by gdk. * Variable DELETED. * Update ARMeilleure/Decoders/OpCodeTable.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Update ARMeilleure/Instructions/InstEmitSimdCvt32.cs Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> * Move method * stringing things together :) Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2020-12-16 20:27:15 -03:00
gdkchan	61634dd415	Clear JIT cache on exit (#1518 ) * Initial cache memory allocator implementation * Get rid of CallFlag * Perform cache cleanup on exit * Basic cache invalidation * Thats not how conditionals works in C# it seems * Set PTC version to PR number * Address PR feedback * Update InstEmitFlowHelper.cs * Flag clear on address is no longer needed * Do not include exit block in function size calculation * Dispose jump table * For future use * InternalVersion = 1519 (force retest). Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2020-12-16 17:07:42 -03:00
sharmander	3332b29f01	CPU: Implement VFMA (Vector) (#1762 ) * Implement VFMA.F64 * Simplify switch * Simplify FMA Instructions into their own IntrinsicType. * Remove whitespace * Fix indentation * Change tests for Vfnms -- disable inf / nan * Move args up, not description ;) * Implementation Complete. All Tests Pass (Slow / Fast Path) * Move location of function in assembler + test updates. * Shift params upwards * Remove unused function * Update PTC version. * Add comments / re-oreder opcode table. * Remove whitespace * Fix nit * Fix nit. * Fix whitespace * Wrong opcode was used by a bad merge. * Addressed rip's comments.	2020-12-15 00:01:52 -03:00
gdkchan	47ba81c661	Fix pre-allocator shift instruction copy on a specific case (#1752 ) * Fix pre-allocator shift instruction copy on a specific case * Fix to make shift use int rather than long for constants	2020-12-14 17:56:07 -03:00
gdkchan	c8bb3cc50e	Fix register read after write on STREX implementation (#1801 ) * Fix register read after write on STREX implementation * PTC version update	2020-12-13 12:19:38 -03:00
sharmander	36f6bbf5b9	CPU: Implement VFNMA.F32 \| F.64 (#1783 ) * Implement VFNMA.F<32/64> * Update PTC Version * Update Implementation & Renames & Correct Order * Fix alignment * Update implementation to not trigger assert * Actually use the intrinsic that makes sense :)	2020-12-07 21:04:01 -03:00
LDj3SNuD	567ea726e1	Add support for guest Fz (Fpcr) mode through host Ftz and Daz (Mxcsr) modes (fast paths). (#1630 ) * Add support for guest Fz (Fpcr) mode through host Ftz and Daz (Mxcsr) modes (fast paths). * Ptc.InternalVersion = 1630 * Nits. * Address comments. * Update Ptc.cs * Address comment.	2020-12-07 10:37:07 +01:00
sharmander	b479a43939	CPU: Implement VFNMS.F32/64 (#1758 ) * Add necessary methods / op-code * Enable Support for FMA Instruction Set * Add Intrinsics / Assembly Opcodes for VFMSUB231XX. * Add X86 Instructions for VFMSUB231XX * Implement VFNMS * Implement VFNMS Tests * Add special cases for FMA instructions. * Update PPTC Version * Remove unused Op * Move Check into Assert / Cleanup * Rename and cleanup * Whitespace * Whitespace / Rename * Re-sort * Address final requests * Implement VFMA.F64 * Simplify switch * Simplify FMA Instructions into their own IntrinsicType. * Remove whitespace * Fix indentation * Change tests for Vfnms -- disable inf / nan * Move args up, not description ;) * Undo vfma * Completely remove vfms code., * Fix order of instruction in assembler	2020-12-03 20:20:02 +01:00
gdkchan	cf6cd71488	IPC refactor part 2: Use ReplyAndReceive on HLE services and remove special handling from kernel (#1458 ) * IPC refactor part 2: Use ReplyAndReceive on HLE services and remove special handling from kernel * Fix for applet transfer memory + some nits * Keep handles if possible to avoid server handle table exhaustion * Fix IPC ZeroFill bug * am: Correctly implement CreateManagedDisplayLayer and implement CreateManagedDisplaySeparableLayer CreateManagedDisplaySeparableLayer is requires since 10.x+ when appletResourceUserId != 0 * Make it exit properly * Make ServiceNotImplementedException show the full message again * Allow yielding execution to avoid starving other threads * Only wait if active * Merge IVirtualMemoryManager and IAddressSpaceManager * Fix Ro loading data from the wrong process Co-authored-by: Thog <me@thog.eu>	2020-12-02 00:23:43 +01:00
riperiperi	9852cb9c9e	Use backup when PTC compression is corrupt (#1704 ) * Use backup when PTC compression is corrupt * Apply suggestions from code review Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com> Co-authored-by: LDj3SNuD <35856442+LDj3SNuD@users.noreply.github.com>	2020-11-20 02:51:59 +01:00
LDj3SNuD	0679084f11	CPU (A64): Add FP16/FP32 fast paths (F16C Intrinsics) for Fcvt_S, Fcvtl_V & Fcvtn_V Instructions. Now HardwareCapabilities uses CpuId. (#1650 ) * net5.0 * CPU (A64): Add FP16/FP32 fast paths (F16C Intrinsics) for Fcvt_S, Fcvtl_V & Fcvtn_V Instructions. Switch to .NET 5.0. Nits. Tests performed successfully in both debug and release mode (for all instructions involved). * Address comment. * Update appveyor.yml * Revert "Update appveyor.yml" This reverts commit 27cdd59e8b90e227e6924d9c162af26c00a89013. * Remove Assembler CpuId. * Update appveyor.yml * Address comment.	2020-11-18 19:35:54 +01:00
Mary	863edae328	shader cache: Fix Linux boot issues (#1709 ) * shader cache: Fix Linux boot issues This rollback the init logic back to previous state, and replicate the way PTC handle initialization. * shader cache: set default state of ready for translation event to false * Fix cpu unit tests	2020-11-17 22:40:19 +01:00
Mary	aa129fdbdf	infra: Migrate to .NET 5 (#1694 ) * infra: Migrate to .NET 5 This migrate projects and CI to .NET 5 * Remove language version restrictions (now on 9.0 by default) * infra: pin .NET 5 to avoid later issues * infra: Cleanup csproj files * infra: update dependencies * infra: Add temporary workaround for a bug in Vector128.Create see https://github.com/dotnet/runtime/issues/44704 for more informations	2020-11-15 19:27:15 +01:00
FICTURE7	64088f04e3	Fix LiveInterval.Split (#1660 ) Before when splitting intervals, the end of the range would be included in the split check, this can produce empty ranges in the child split. This in turn can affect spilling decisions since the child split will have a different start position and this empty range will get a register and move to the active set for a brief moment. For example: A = [153, 172[; [1899, 1916[; [1991, 2010[; [2397, 2414[; ... Split(A, 1916) A0 = [153, 172[; [1899, 1916[ A1 = [1916, 1916[; [1991, 2010[; [2397, 2414[; ...	2020-11-04 23:09:45 -03:00
gdkchan	2f16491712	Get rid of Reflection.Emit dependency on CPU and Shader projects (#1626 ) * Get rid of Reflection.Emit dependency on CPU and Shader projects * Remove useless private sets * Missed those due to the alignment	2020-10-21 09:13:44 -03:00
riperiperi	b4d8d893a4	Memory Read/Write Tracking using Region Handles (#1272 ) * WIP Range Tracking - Texture invalidation seems to have large problems - Buffer/Pool invalidation may have problems - Mirror memory tracking puts an additional `add` in compiled code, we likely just want to make HLE access slower if this is the final solution. - Native project is in the messiest possible location. - [HACK] JIT memory access always uses native "fast" path - [HACK] Trying some things with texture invalidation and views. It works :) Still a few hacks, messy things, slow things More work in progress stuff (also move to memory project) Quite a bit faster now. - Unmapping GPU VA and CPU VA will now correctly update write tracking regions, and invalidate textures for the former. - The Virtual range list is now non-overlapping like the physical one. - Fixed some bugs where regions could leak. - Introduced a weird bug that I still need to track down (consistent invalid buffer in MK8 ribbon road) Move some stuff. I think we'll eventually just put the dll and so for this in a nuget package. Fix rebase. [WIP] MultiRegionHandle variable size ranges - Avoid reprotecting regions that change often (needs some tweaking) - There's still a bug in buffers, somehow. - Might want different api for minimum granularity Fix rebase issue Commit everything needed for software only tracking. Remove native components. Remove more native stuff. Cleanup Use a separate window for the background context, update opentk. (fixes linux) Some experimental changes Should get things working up to scratch - still need to try some things with flush/modification and res scale. Include address with the region action. Initial work to make range tracking work Still a ton of bugs Fix some issues with the new stuff. * Fix texture flush instability There's still some weird behaviour, but it's much improved without this. (textures with cpu modified data were flushing over it) * Find the destination texture for Buffer->Texture full copy Greatly improves performance for nvdec videos (with range tracking) * Further improve texture tracking * Disable Memory Tracking for view parents This is a temporary approach to better match behaviour on master (where invalidations would be soaked up by views, rather than trigger twice) The assumption is that when views are created to a texture, they will cover all of its data anyways. Of course, this can easily be improved in future. * Introduce some tracking tests. WIP * Complete base tests. * Add more tests for multiregion, fix existing test. * Cleanup Part 1 * Remove unnecessary code from memory tracking * Fix some inconsistencies with 3D texture rule. * Add dispose tests. * Use a background thread for the background context. Rather than setting and unsetting a context as current, doing the work on a dedicated thread with signals seems to be a bit faster. Also nerf the multithreading test a bit. * Copy to texture with matching alignment This extends the copy to work for some videos with unusual size, such as tutorial videos in SMO. It will only occur if the destination texture already exists at XCount size. * Track reads for buffer copies. Synchronize new buffers before copying overlaps. * Remove old texture flushing mechanisms. Range tracking all the way, baby. * Wake the background thread when disposing. Avoids a deadlock when games are closed. * Address Feedback 1 * Separate TextureCopy instance for background thread Also `BackgroundContextWorker.InBackground` for a more sensible idenfifier for if we're in a background thread. * Add missing XML docs. * Address Feedback * Maybe I should start drinking coffee. * Some more feedback. * Remove flush warning, Refocus window after making background context	2020-10-16 17:18:35 -03:00
LDj3SNuD	04e330cc00	Add Umaal & Vabd_I, Vabdl_I, Vaddl_I, Vhadd, Vqshrn, Vshll inst.s (slow paths). (#1577 ) * Add Umaal & Vabd_I, Vabdl_I, Vaddl_I, Vhadd, Vqshrn, Vshll inst.s (slow paths). No test provided (i.e. draft). * Ptc InternalVersion = 1577	2020-10-13 22:41:33 +02:00
gdkchan	f2b12c9749	Remove old, unused CPU optimization (#1586 )	2020-09-30 16:16:34 -03:00

1 2 3

142 commits