ryujinx-final

Archived

Author	SHA1	Message	Date
gdkchan	5e0f8e8738	Implement JIT Arm64 backend (#4114) * Implement JIT Arm64 backend * PPTC version bump * Address some feedback from Arm64 JIT PR * Address even more PR feedback * Remove unused IsPageAligned function * Sync Qc flag before calls * Fix comment and remove unused enum * Address riperiperi PR feedback * Delete Breakpoint IR instruction that was only implemented for Arm64	2023-01-10 19:16:59 -03:00
gdkchan	fc4b7cba2c	Make PPTC state non-static (#4157) * Make PPTC state non-static * DiskCacheLoadState can be null	2023-01-05 00:01:44 +01:00
Berkan Diler	0d3b82477e	Use new ArgumentNullException and ObjectDisposedException throw-helper API (#4163)	2022-12-27 20:27:11 +01:00
Berkan Diler	37d27c4c99	Some minor cleanups and optimizations (#4174) * Replace Array.Clear(x, 0, x.Length) with Array.Clear(x) * Use DateTime.UnixEpoch field * Replace SHA256.ComputeHash calls with static SHA256.HashData call. More performant and avoids the need to initialize a SHA256 instance.	2022-12-24 14:30:39 -03:00
gdkchan	219f63ff4e	Fix CPU FCVTN instruction implementation (slow path) (#4159) * Fix CPU FCVTN instruction implementation (slow path) * PPTC version bump	2022-12-21 23:05:58 +00:00
gdkchan	ee0f9b03a4	Eliminate zero-extension moves in more cases on 32-bit games (#4140) * Eliminate zero-extension moves in more cases on 32-bit games * PPTC version bump * Revert X86Optimizer changes	2022-12-19 14:45:58 -03:00
gdkchan	f93c5f006a	Revert "ARMeilleure: Add initial support for AVX512(EVEX encoding) (#3663)" (#4145) This reverts commit `295fbd0542`.	2022-12-18 20:21:10 -03:00
Wunk	295fbd0542	ARMeilleure: Add initial support for AVX512(EVEX encoding) (#3663) * ARMeilleure: Add AVX512{F,VL,DQ,BW} detection Add `UseAvx512Ortho` and `UseAvx512OrthoFloat` optimization flags as shorthands for `F+VL` and `F+VL+DQ`. * ARMeilleure: Add initial support for EVEX instruction encoding Does not implement rounding or exception controls. * ARMeilleure: Add `X86Vpternlogd` Accelerates the vector-`Not` instruction. * ARMeilleure: Add check for `OSXSAVE` for AVX{2,512} * ARMeilleure: Add check for `XCR0` flags Add XCR0 register checks for AVX and AVX512F, following the guidelines from sections 14.3 and 15.2 of the Intel Architecture Software Developer's Manual. * ARMeilleure: Increment InternalVersion * ARMeilleure: Remove redundant `ReProtect` and `Dispose`, formatting * ARMeilleure: Move XCR0 procedure to GetXcr0Eax * ARMeilleure: Add `XCR0` to `FeatureInfo` structure * ARMeilleure: Utilize `ReadOnlySpan` for Xcr0 assembly Avoids an additional allocation * ARMeilleure: Formatting fixes	2022-12-18 16:46:13 -03:00
Isaac Marovitz	0fbcd630bc	Replace `DllImport` usage with `LibraryImport` (#4084) * Replace usage of `DllImport` with `LibraryImport` * Mark methods as `partial` * Marshalling * More `partial` & marshalling * More `partial` and marshalling * More partial and marshalling * Update GdiPlusHelper to LibraryImport * Unicorn * More Partial * Marshal * Specify EntryPoint * Specify EntryPoint * Change GlobalMemoryStatusEx to LibraryImport * Change RegisterClassEx to LibraryImport * Define EntryPoints * Update Ryujinx.Ava/Ui/Controls/Win32NativeInterop.cs Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com> * Update Ryujinx.Graphics.Nvdec.FFmpeg/Native/FFmpegApi.cs Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com> * Move return marshal * Remove calling convention specification * Remove calling conventions * Update Ryujinx.Common/SystemInfo/WindowsSystemInfo.cs Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com> * Update Ryujinx/Modules/Updater/Updater.cs Co-authored-by: Mary-nyan <thog@protonmail.com> * Update Ryujinx.Ava/Modules/Updater/Updater.cs Co-authored-by: Mary-nyan <thog@protonmail.com> Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com> Co-authored-by: Mary-nyan <thog@protonmail.com>	2022-12-15 18:07:31 +01:00
Andrey Sukharev	4da44e09cb	Make structs readonly when applicable (#4002) * Make all structs readonly when applicable. It should reduce the number of needless defensive copies * Make structs with trivial boilerplate equality code record structs * Remove unnecessary readonly modifiers from TextureCreateInfo * Make BitMap structs readonly too	2022-12-05 14:47:39 +01:00
LDj3SNuD	62585755fd	Do not clear the rejit queue when overlaps count is equal to 0. (#3721) * Do not clear the rejit queue when overlaps count is equal to 0. * Ptc and PtcProfiler must be invalidated. * Revert "Ptc and PtcProfiler must be invalidated." This reverts commit f5b0ad9d7dc3c0b3a0da184de4d04d7234939c81. * Fix #3710 slow path due to #3701.	2022-10-19 02:08:34 +00:00
LDj3SNuD	5af8ce7c38	A64: Add fast path for Fcvtas_Gp/S/V, Fcvtau_Gp/S/V and Frinta_S/V in… (#3712) * A64: Add fast path for Fcvtas_Gp/S/V, Fcvtau_Gp/S/V and Frinta_S/V instructions; they use the "Round to Nearest with Ties to Away" rounding mode, which is not supported on x86. All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. The titles Mario Strikers and Super Smash Bros. U. use these instructions intensively. * Update Ptc.cs * A32: Add fast path for Vcvta_RM, Vrinta_RM and Vrinta_V instructions as well.	2022-10-19 00:21:33 +00:00
Wunk	45ce540b9b	ARMeilleure: Add `gfni` acceleration (#3669) * ARMeilleure: Add `GFNI` detection This is intended for utilizing the `gf2p8affineqb` instruction * ARMeilleure: Add `gf2p8affineqb` Not using the VEX or EVEX-form of this instruction is intentional. There are `GFNI`-chips that do not support AVX (so no VEX encoding) such as Tremont (Lakefield) chips as well as Jasper Lake. `13df339fe7/GenuineIntel/GenuineIntel00806A1_Lakefield_LC_InstLatX64.txt (L1297-L1299)` `13df339fe7/GenuineIntel/GenuineIntel00906C0_JasperLake_InstLatX64.txt (L1252-L1254)` * ARMeilleure: Add `gfni` acceleration of `Rbit_V` Passes all `Rbit_V` unit tests on my `i9-11900k` ARMeilleure: Add `gfni` acceleration of `S{l,r}i_V` Also added a fast-path for when the shift amount is greater than the size of the element. * ARMeilleure: Add `gfni` acceleration of `Shl_V` and `Sshr_V` * ARMeilleure: Increment InternalVersion * ARMeilleure: Fix Intrinsic and Assembler Table alignment `gf2p8affineqb` is the longest instruction name I know of. It shouldn't get any wider than this. * ARMeilleure: Remove SSE2+SHA requirement for GFNI * ARMeilleure: Add `X86GetGf2p8LogicalShiftLeft` Used to generate GF(2^8) 8x8 bit-matrices for bit-shifting for the `gf2p8affineqb` instruction. * ARMeilleure: Append `FeatureInfo7Ecx` to `FeatureInfo`	2022-10-02 11:17:19 +02:00
LDj3SNuD	814f75142e	Fpsr and Fpcr freed. (#3701) * Implemented in IR the managed methods of the Saturating region ... ... of the SoftFallback class (the SatQ ones). The need to natively manage the Fpcr and Fpsr system registers is still a fact. Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Ptc.InternalVersion = 3665 * Addressed PR feedback. * Implemented in IR the managed methods of the ShlReg region of the SoftFallback class. It also includes the last two SatQ ones (following up on https://github.com/Ryujinx/Ryujinx/pull/3665). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Fpsr and Fpcr freed. Handling/isolation of Fpsr and Fpcr via register for IR and via memory for Tests and Threads, with synchronization to context exchanges (explicit for SoftFloat); without having to call managed methods. Thanks to the inlining work of the previous two PRs and others in this. Tests performed locally in both release and debug modes, in both lowcq and highcq, with FastFP to true and false (explicit FP tests included). Tested with the title Tony Hawk's PS. Depends on shlreg. * Update InstEmitSimdHelper.cs * De-magic Masks. Remove the Stride and Len flags; Fpsr.NZCV are A32 only, then moved to Fpscr: this leads to emitting less IR in reference to Get/Set Fpsr/Fpcr/Fpscr methods in reference to Mrs/Msr (A64) and Vmrs/Vmsr (A32) instructions. * Addressed PR feedback.	2022-09-20 18:55:13 -03:00
LDj3SNuD	b9f1ff3c77	Implemented in IR the managed methods of the ShlReg region of the SoftFallback class. (#3700) * Implemented in IR the managed methods of the Saturating region ... ... of the SoftFallback class (the SatQ ones). The need to natively manage the Fpcr and Fpsr system registers is still a fact. Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Ptc.InternalVersion = 3665 * Addressed PR feedback. * Implemented in IR the managed methods of the ShlReg region of the SoftFallback class. It also includes the last two SatQ ones (following up on https://github.com/Ryujinx/Ryujinx/pull/3665). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Update InstEmitSimdHelper.cs	2022-09-19 14:49:10 -03:00
gdkchan	729ff5337c	Fix increment on Arm32 NEON VLDn/VSTn instructions with regs > 1 (#3695) * Fix increment on Arm32 NEON VLDn/VSTn instructions with regs > 1 * PPTC version bump * PR feedback	2022-09-13 08:24:09 +02:00
gdkchan	c64524a240	Add ADD (zx imm12), NOP, MOV (rs), LDA, TBB, TBH, MOV (zx imm16) and CLZ thumb instructions (#3683) * Add ADD (zx imm12), NOP, MOV (register shifted), LDA, TBB, TBH, MOV (zx imm16) and CLZ thumb instructions, fix LDRD, STRD, CBZ, CBNZ and BLX (reg) * Bump PPTC version	2022-09-09 22:09:11 -03:00
gdkchan	db45688aa8	Implement VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON instructions (#3677) * Implement VRSRA, VRSHRN, VQSHRUN, VQMOVN, VQMOVUN, VQADD, VQSUB, VRHADD, VPADDL, VSUBL, VQDMULH and VMLAL Arm32 NEON instructions * PPTC version * Fix VQADD/VQSUB * Improve MRC/MCR handling and exception messages In case data is being recompiled as code, we don't want to throw at the emit stage; instead, we should only throw if it actually tries to execute	2022-09-09 21:47:38 -03:00
FICTURE7	ee1825219b	Clean up rejit queue (#2751)	2022-09-08 20:14:08 -03:00
LDj3SNuD	7baa08dcb4	Implemented in IR the managed methods of the Saturating region ... (#3665) * Implemented in IR the managed methods of the Saturating region ... ... of the SoftFallback class (the SatQ ones). The need to natively manage the Fpcr and Fpsr system registers is still a fact. Contributes to https://github.com/Ryujinx/Ryujinx/issues/2917 ; I will open another PR to implement in Intrinsics-branchless the methods of the Saturation region as well (the SatXXXToXXX ones). All instructions involved have been tested locally in both release and debug modes, in both lowcq and highcq. * Ptc.InternalVersion = 3665 * Addressed PR feedback.	2022-09-08 19:40:41 -03:00
gdkchan	6922862db8	Optimize kernel memory block lookup and consolidate RBTree implementations (#3410) * Implement intrusive red-black tree, use it for HLE kernel block manager * Implement TreeDictionary using IntrusiveRedBlackTree * Implement IntervalTree using IntrusiveRedBlackTree * Implement IntervalTree (on Ryujinx.Memory) using IntrusiveRedBlackTree * Make PredecessorOf and SuccessorOf internal, expose Predecessor and Successor properties on the node itself * Allocation-free tree node lookup	2022-08-26 18:21:48 +00:00
merry	f5235fff29	ARMeilleure: Hardware accelerate SHA256 (#3585) * ARMeilleure/HardwareCapabilities: Add Sha * ARMeilleure/Intrinsic: Add X86Sha256Rnds2 * ARMeilleure: Hardware accelerate SHA256H/SHA256H2 * ARMeilleure/Intrinsic: Add X86Sha256Msg1, X86Sha256Msg2 * ARMeilleure/Intrinsic: Add X86Palignr * ARMeilleure: Hardware accelerate SHA256SU0, SHA256SU1 * PTC: Bump InternalVersion	2022-08-25 10:12:13 +00:00
gdkchan	eba682b767	Implement some 32-bit Thumb instructions (#3614) * Implement some 32-bit Thumb instructions * Optimize OpCode32MemMult using PopCount	2022-08-25 09:59:34 +00:00
Nicholas Rodine	7defc59b9d	A few minor documentation fixes. (#3599) * A few minor documentation fixes. * Removed more invalid inheritdoc instances.	2022-08-19 18:21:06 -03:00
gdkchan	f7ef6364b7	Implement CPU FCVT Half <-> Double conversion variants (#3439) * Half <-> Double conversion support * Add tests, fast path and deduplicate SoftFloat code * PPTC version	2022-07-06 13:40:31 +02:00
gdkchan	0c87bf9ea4	Refactor CPU interface to allow the implementation of other CPU emulators (#3362) * Refactor CPU interface * Use IExecutionContext interface on SVC handler, change how CPU interrupts invoke the handlers * Make CpuEngine take an ITickSource rather than returning one The previous implementation was designed with the scenario in mind where the CPU engine had to implement the tick source, for example, when we have a hypervisor and the game can read CNTPCT on the host directly. However, given that we need to do conversion due to different frequencies anyway, it's not worth it. It's better to just let the user pass the tick source and redirect any reads of CNTPCT to the user tick source * XML docs for the public interfaces * PPTC invalidation due to NativeInterface function name changes * Fix build of the CPU tests * PR feedback	2022-05-31 16:29:35 -03:00
gdkchan	95017b8c66	Support memory aliasing (#2954) * Back to the origins: Make memory manager take guest PA rather than host address once again * Direct mapping with alias support on Windows * Fixes and remove more of the emulated shared memory * Linux support * Make shared and transfer memory not depend on SharedMemoryStorage * More efficient view mapping on Windows (no longer restricted to 4KB pages at a time) * Handle potential access violations caused by partial unmap * Implement host mapping using shared memory on Linux * Add new GetPhysicalAddressChecked method, used to ensure the virtual address is mapped before address translation Also align GetRef behaviour with software memory manager * We don't need a mirrorable memory block for software memory manager mode * Disable memory aliasing tests while we don't have shared memory support on Mac * Shared memory & SIGBUS handler for macOS * Fix typo + nits + re-enable memory tests * Set MAP_JIT_DARWIN on x86 Mac too * Add back the address space mirror * Only set MAP_JIT_DARWIN if we are mapping as executable * Disable aliasing tests again (still fails on Mac) * Fix UnmapView4KB (by not casting size to int) * Use ref counting on memory blocks to delay closing the shared memory handle until all blocks using it are disposed * Address PR feedback * Make RO hold a reference to the guest process memory manager to avoid early disposal Co-authored-by: nastys <nastys@users.noreply.github.com>	2022-05-02 20:30:02 -03:00
gdkchan	26a881176e	Fix tail merge from block with conditional jump to multiple returns (#3267) * Fix tail merge from block with conditional jump to multiple returns * PPTC version bump	2022-04-09 16:56:50 +02:00
merry	df70442c46	InstEmitMemoryEx: Barrier after write on ordered store (#3193) * InstEmitMemoryEx: Barrier after write on ordered store * increment ptc version * 32	2022-03-19 10:32:35 -03:00
merry	b97ff4da5e	A32: Fix ALU immediate instructions (#3179) * Tests: Add A32 tests for immediate ADC/ADCS/RSC/RSCS/SBC/SBCS * A32: Fix bug in ADC/ADCS/RSC/RSCS/SBC/SBCS * CpuTestAluImm32: Add more opcodes * Increment PTC version	2022-03-05 15:23:10 -03:00
merry	dc063eac83	ARMeilleure: Implement single stepping (#3133) * Decoder: Implement SingleInstruction decoder mode * Translator: Implement Step * DecoderMode: Rename Normal to MultipleBlocks	2022-02-22 11:11:42 -03:00
Berkan Diler	644b497df1	Collapse AsSpan().Slice(..) calls into AsSpan(..) (#3145) * Collapse AsSpan().Slice(..) calls into AsSpan(..) Less code and a bit faster * Collapse an Array.Clear(array, 0, array.Length) call to Array.Clear(array)	2022-02-22 10:32:10 -03:00
gdkchan	f2087ca29e	PPTC version increment (#3139)	2022-02-17 23:52:42 -03:00
gdkchan	92d166ecb7	Enable CPU JIT cache invalidation (#2965) * Enable CPU JIT cache invalidation * Invalidate cache on IC IVAU	2022-02-18 02:53:18 +01:00
merry	98e05ee4b7	ARMeilleure: Thumb support (All T16 instructions) (#3105) * Decoders: Add InITBlock argument * OpCodeTable: Minor cleanup * OpCodeTable: Remove existing thumb instruction implementations * OpCodeTable: Prepare for thumb instructions * OpCodeTables: Improve thumb fast lookup * Tests: Prepare for thumb tests * T16: Implement BX * T16: Implement LSL/LSR/ASR (imm) * T16: Implement ADDS, SUBS (reg) * T16: Implement ADDS, SUBS (3-bit immediate) * T16: Implement MOVS, CMP, ADDS, SUBS (8-bit immediate) * T16: Implement ANDS, EORS, LSLS, LSRS, ASRS, ADCS, SBCS, RORS, TST, NEGS, CMP, CMN, ORRS, MULS, BICS, MVNS (low registers) * T16: Implement ADD, CMP, MOV (high reg) * T16: Implement BLX (reg) * T16: Implement LDR (literal) * T16: Implement {LDR,STR}{,H,B,SB,SH} (register) * T16: Implement {LDR,STR}{,B,H} (immediate) * T16: Implement LDR/STR (SP) * T16: Implement ADR * T16: Implement Add to SP (immediate) * T16: Implement ADD/SUB (SP) * T16: Implement SXTH, SXTB, UXTH, UXTB * T16: Implement CBZ, CBNZ * T16: Implement PUSH, POP * T16: Implement REV, REV16, REVSH * T16: Implement NOP * T16: Implement LDM, STM * T16: Implement SVC * T16: Implement B (conditional) * T16: Implement B (unconditional) * T16: Implement IT * fixup! T16: Implement ADD/SUB (SP) * fixup! T16: Implement Add to SP (immediate) * fixup! T16: Implement IT * CpuTestThumb: Add randomized tests * Remove inITBlock argument * Address nits * Use index to handle IfThenBlockState * Reduce line noise * fixup * nit	2022-02-17 19:39:45 -03:00
gdkchan	bd412afb9f	Fix small precision error on CPU reciprocal estimate instructions (#3061) * Fix small precision error on CPU reciprocal estimate instructions * PPTC version bump	2022-01-29 23:59:34 +01:00
gdkchan	f3bfd799e1	Fix calls passing V128 values on Linux (#3034) * Fix calls passing V128 values on Linux * PPTC version bump	2022-01-24 11:23:24 +01:00
gdkchan	f0824fde9f	Add host CPU memory barriers for DMB/DSB and ordered load/store (#3015) * Add host CPU memory barriers for DMB/DSB and ordered load/store * PPTC version bump * Revert to old barrier order	2022-01-21 12:47:34 -03:00
sharmander	60f7cba30a	Implement FCVTNS (Scalar GP) (#2953) * Implement FCVTNS (Scalar GP) * Update Ptc Version	2022-01-19 22:21:44 -03:00
sharmander	e5f7ff1eee	CPU - Implement FCVTMS (Vector) (#2937) * Add FCVTMS_V Implementation to ARMeilleure * Fix opcode designation * Add tests * Amend Ptc version * Fix OpCode / Tests * Create Math.Floor helper method + Update implementation * Address gdk comments * Re-address gdk comments * Update ARMeilleure/Decoders/OpCodeTable.cs Co-authored-by: gdkchan <gab.dark.100@gmail.com> * Update Tests to use 2S (4S) and 2D Co-authored-by: gdkchan <gab.dark.100@gmail.com>	2022-01-04 16:45:28 -03:00
Piyachet Kanda	3e2f89b4fd	Implement UHADD8 instruction (#2908) * Implement UHADD8 instruction along with a test unit * Update PTC revision number	2021-12-08 17:05:59 -03:00
Mary	f39fce8f54	misc: Migrate usage of RuntimeInformation to OperatingSystem (#2901) Very basic migration across the codebase.	2021-12-04 20:02:30 -03:00
FICTURE7	fbf40424f4	Add an early `TailMerge` pass (#2721) * Add an early `TailMerge` pass Some translations can have a lot of guest calls, and each guest call has a call guard which may return. This can produce a lot of epilogue code for returns. This pass merges the epilogue into a single block.
```
Using filter 'hcq'.
Using metric 'code size'.
Total diff: -1648111 (-7.19 %) (bytes):
Base: 22913847
Diff: 21265736
Improved: 4567, regressed: 14, unchanged: 144
```
* Set PTC version * Address feedback * Handle `void` returning functions * Actually handle `void` returning functions * Fix `RegisterToLocal` logging	2021-10-18 19:51:22 -03:00
FICTURE7	ecc64c934d	Add `Operand.Label` support to `Assembler` (#2680) * Add `Operand.Label` support to `Assembler` This adds label support to `Assembler` and enables branch tightening when compiling with relocatables. Jump management and patching have been moved to the `Assembler`. * Move instruction table to `Assembler.Table` * Set PTC internal version * Rename `Assembler.Table` to `AssemblerTable`	2021-10-05 14:04:55 -03:00
riperiperi	1ae690ba2f	Use normal memory store path for DC ZVA (#2693) Seems like this is used as an optimized way to clear memory in homebrew applications. Unfortunately, calling the software fallback method every 8 bytes was not very optimal. The existing EmitStore is used by passing in ZR as the register to get a 0 write.	2021-09-29 01:21:30 +02:00
FICTURE7	0d23504e30	Fix PTC count table relocation patching (#2666) Fix an issue introduced in #2190 whereby two different count table entry addresses were used for LCQ functions. E.g.:
```asm
.L1:
    mov rbp,COUNT_TABLE_0    ;; This gets an address.
    mov ebp,[rbp]
    lea esi,[rbp+1]
    mov rdi,COUNT_TABLE_1    ;; This gets another address.
    mov [rdi],esi
    cmp ebp,64h
    je near .L34
```
This caused LCQ functions to not tier up when they're loaded from the PTC cache. This does not happen when they're freshly compiled. This PR fixes the issue by ensuring only a single counter is created per translation.	2021-09-29 00:28:34 +02:00
FICTURE7	a9343c9364	Refactor `PtcInfo` (#2625) * Refactor `PtcInfo` This change reduces the coupling of `PtcInfo` by moving relocation tracking to the backend. `RelocEntry`s remain as `RelocEntry`s throughout the pipeline until they actually need to be written to the PTC streams. Keeping this representation makes inspecting and manipulating relocations after compilation less painful. This is something I needed to do to patch relocations to 0 to diff dumps. Contributes to #1125. * Turn `Symbol` & `RelocInfo` into readonly structs * Add documentation to `CompiledFunction` * Remove `Compiler.Compile<T>` Remove `Compiler.Compile<T>` and replace it with `Map<T>` of the `CompiledFunction` returned.	2021-09-14 01:23:37 +02:00
FICTURE7	f2a7b300c4	Fix type mismatch in `BitwiseAnd` simplification (#2571) * Fix type mismatch in `BitwiseAnd` simplification `TryEliminateBitwiseAnd` would turn the `BitwiseAnd` operation into a copy of the wrong type. E.g.: Before `Simplification`:
```llvm
i64 %0 = BitwiseAnd i64 0x0, %1
```
After `Simplification`:
```llvm
i64 %0 = Copy i32 0x0
```
With the changes in #2515, we iterate in reverse order, and `Simplification` and `ConstantFolding` do not indicate whether they modified the CFG, so the second pass to "retype" the copy into the proper destination type does not happen. This also blocked copy propagation, since its destination type did not match its source type. But in the cases I've seen, the `PreAllocator` would insert a copy for the propagated constant, which results in no diffs. Since the copy remained as is, asserts are fired when generating it. * Set PPTC version	2021-08-20 14:42:00 -03:00
FICTURE7	22b2cb39af	Reduce JIT GC allocations (#2515) * Turn `MemoryOperand` into a struct * Remove `IntrinsicOperation` * Remove `PhiNode` * Remove `Node` * Turn `Operand` into a struct * Turn `Operation` into a struct * Clean up pool management methods * Add `Arena` allocator * Move `OperationHelper` to `Operation.Factory` * Move `OperandHelper` to `Operand.Factory` * Optimize `Operation` a bit * Fix `Arena` initialization * Rename `NativeList<T>` to `ArenaList<T>` * Reduce `Operand` size from 88 to 56 bytes * Reduce `Operation` size from 56 to 40 bytes * Add optimistic interning of Register & Constant operands * Optimize `RegisterUsage` pass a bit * Optimize `RemoveUnusedNodes` pass a bit Iterating in reverse order allows killing dependency chains in a single pass. * Fix PPTC symbols * Optimize `BasicBlock` a bit Reduce allocations from `_successor` & `DominanceFrontiers` * Fix `Operation` resize * Make `Arena` expandable Change the arena allocator to be expandable by allocating in pages, with some of them being pooled. Currently 32 pages are pooled. An LRU removal mechanism should probably be added to it. Apparently MHR can allocate bitmaps large enough to exceed the 16MB limit for the type. * Move `Arena` & `ArenaList` to `Common` * Remove `ThreadStaticPool` & co * Add `PhiOperation` * Reduce `Operand` size from 56 to 48 bytes * Add linear-probing to `Operand` intern table * Optimize `HybridAllocator` a bit * Add `Allocators` class * Tune `ArenaAllocator` sizes * Add page removal mechanism to `ArenaAllocator` Remove pages which have not been used for more than 5s after each reset. I am on the fence about whether this would be better using a Gen2 callback object like the one in System.Buffers.ArrayPool<T>, to trim the pool. Because right now if a large translation happens, the pages will be freed only after a reset. This reset may not happen for a while because no new translation is hit, but the arena base sizes are rather small.
* Fix `OOM` when allocating larger than page size in `ArenaAllocator` Tweak resizing mechanism for Operand.Uses and Assignments. * Optimize `Optimizer` a bit * Optimize `Operand.Add<T>/Remove<T>` a bit * Clean up `PreAllocator` * Fix phi insertion order Reduce codegen diffs. * Fix code alignment * Use new heuristics for degree of parallelism * Suppress warnings * Address gdkchan's feedback Renamed `GetValue()` to `GetValueUnsafe()` to make it clearer that `Operand.Value` should usually not be modified directly. * Add fast path to `ArenaAllocator` * Assembly for `ArenaAllocator.Allocate(ulong)`:
    .L0:
        mov rax, [rcx+0x18]
        lea r8, [rax+rdx]
        cmp r8, [rcx+0x10]
        ja short .L2
    .L1:
        mov rdx, [rcx+8]
        add rax, [rdx+8]
        mov [rcx+0x18], r8
        ret
    .L2:
        jmp ArenaAllocator.AllocateSlow(UInt64)
A few variables/fields had to be changed to ulong so that RyuJIT avoids emitting zero-extends. * Implement a new heuristic to free pooled pages. If an arena is used often, it is more likely that its pages will be needed, so the pages are kept for longer (e.g. during PPTC rebuild or bursts of compilations). If it is not used often, then it is more likely that its pages will not be needed (e.g. after PPTC rebuild or bursts of compilations). * Address riperiperi's feedback * Use `EqualityComparer<T>` in `IntrusiveList<T>` Avoids a potential GC hole in `Equals(T, T)`.	2021-08-17 15:08:34 -03:00
FICTURE7	9d7627af64	Add multi-level function table (#2228) * Add AddressTable<T> * Use AddressTable<T> for dispatch * Remove JumpTable & co. * Add fallback for out-of-range addresses * Add PPTC support * Add documentation to `AddressTable<T>` * Make AddressTable<T> configurable * Fix table walk * Fix IsMapped check * Remove CountTableCapacity * Add PPTC support for fast path * Rename IsMapped to IsValid * Remove stale comment * Change format of address in exception message * Add TranslatorStubs * Split DispatchStub Avoids recompilation of stubs during tests. * Add hint for 64bit or 32bit * Add documentation to `Symbol` * Add documentation to `TranslatorStubs` Make `TranslatorStubs` disposable as well. * Add documentation to `SymbolType` * Add `AddressTableEventSource` to monitor function table size Add an EventSource which measures the amount of unmanaged bytes allocated by AddressTable<T> instances. `dotnet-counters monitor -n Ryujinx --counters ARMeilleure` * Add `AllowLcqInFunctionTable` optimization toggle This is to reduce the impact this change has on the test duration. Before, every time a test was run, the FunctionTable would be initialized and populated so that the newly compiled test would get registered to it. * Implement unmanaged dispatcher Uses the DispatchStub to dispatch into the next translation, which allows execution to stay in unmanaged code for longer and skips a ConcurrentDictionary lookup when the target translation has been registered to the FunctionTable. * Remove redundant null check * Tune levels of FunctionTable Uses 5 levels instead of 4 and changes the unit of AddressTableEventSource from KB to MB.
* Use 64-bit function table Improves codegen for direct branches:
     mov qword [rax+0x408],0x10603560
    -mov rcx,sub_10603560_OFFSET
    -mov ecx,[rcx]
    -mov ecx,ecx
    -mov rdx,JIT_CACHE_BASE
    -add rdx,rcx
    +mov rcx,sub_10603560
    +mov rdx,[rcx]
     mov rcx,rax
Improves codegen for dispatch stub:
     and rax,byte +0x1f
    -mov eax,[rcx+rax*4]
    -mov eax,eax
    -mov rcx,JIT_CACHE_BASE
    -lea rax,[rcx+rax]
    +mov rax,[rcx+rax*8]
     mov rcx,rbx
* Remove `JitCacheSymbol` & `JitCache.Offset` * Turn `Translator.Translate` into an instance method We do not have to add more parameters to this method and related ones as new structures are added & needed for translation. * Add symbol only when PTC is enabled Address LDj3SNuD's feedback * Change `NativeContext.Running` to a 32-bit integer * Fix PageTable symbol for host mapped	2021-05-29 18:06:28 -03:00


137 commits
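The 5af8ce7c38 entry notes that FCVTAS/FCVTAU and FRINTA use "Round to Nearest with Ties to Away", a mode x86 does not offer. It differs from the IEEE default of ties-to-even (which Python's built-in `round` also uses) only on exact halfway values. The following minimal sketch models the rounding rule in Python for finite inputs. `round_ties_away` is a hypothetical helper that illustrates the semantics; it is not the fast path ARMeilleure emits.

```python
import math


def round_ties_away(x: float) -> float:
    """Round a finite float to an integral value, sending exact halfway
    cases away from zero (the FRINTA rounding rule). inf/NaN are not handled."""
    a = abs(x)
    f = math.floor(a)  # integral part, as a Python int
    # a - f is computed exactly for doubles, which avoids the classic
    # floor(a + 0.5) bug on inputs such as 0.49999999999999994.
    if a - f >= 0.5:
        f += 1
    # copysign keeps the sign of the input, so -0.3 rounds to -0.0.
    return math.copysign(float(f), x)


# Python's built-in round() uses ties-to-even, like the x86 default mode.
for value in (0.5, 1.5, 2.5, -2.5):
    print(value, round_ties_away(value), float(round(value)))
```

FCVTAS and FCVTAU additionally convert the rounded value to a signed or unsigned integer, saturating on overflow; that step is left out of this sketch.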
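UHADD8 (3e2f89b4fd) treats each 32-bit operand as four unsigned bytes and stores `(a + b) >> 1` for every lane, so the 9-bit intermediate sum never spills into an adjacent lane. The Python sketch below pairs a per-lane reference model with a branch-free SWAR version built on the identity `a + b == 2 * (a & b) + (a ^ b)`. Both function names are hypothetical, and nothing here claims to mirror how ARMeilleure emits the instruction.

```python
def uhadd8_reference(rn: int, rm: int) -> int:
    """Per-lane model: each result byte is (a + b) >> 1 on unsigned bytes."""
    result = 0
    for lane in range(4):
        a = (rn >> (8 * lane)) & 0xFF
        b = (rm >> (8 * lane)) & 0xFF
        result |= ((a + b) >> 1) << (8 * lane)
    return result


def uhadd8_swar(rn: int, rm: int) -> int:
    """Same result without unpacking lanes, using
    (a + b) >> 1 == (a & b) + ((a ^ b) >> 1) in every byte at once."""
    rn &= 0xFFFFFFFF
    rm &= 0xFFFFFFFF
    # The 0x7F7F7F7F mask drops the bit that the right shift moves down
    # from the next-higher lane, so lanes stay independent.
    return ((rn & rm) + (((rn ^ rm) >> 1) & 0x7F7F7F7F)) & 0xFFFFFFFF


print(hex(uhadd8_swar(0xFF01FF80, 0x01FF0180)))  # prints 0x80808080
```

Each per-lane result is at most 255, so the final addition cannot carry across byte boundaries; that is what lets the whole word be processed in one step.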
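The 45ce540b9b entry relies on `gf2p8affineqb`, which treats each byte as a bit vector over GF(2) and multiplies it by an 8x8 bit matrix packed into a 64-bit qword. Intel defines result bit i as the parity of (matrix byte 7-i AND input byte), XORed with bit i of the immediate. Choosing the matrix selects the operation, which is how a single instruction can serve `Rbit_V` as well as the vector shifts. The Python model below follows that definition for one byte lane. `affine_byte` and `shl_matrix` are hypothetical names, and `shl_matrix` only illustrates the idea behind `X86GetGf2p8LogicalShiftLeft` rather than reproducing it.

```python
def affine_byte(x: int, matrix: int, imm: int = 0) -> int:
    """Model of one byte lane of gf2p8affineqb:
    bit i of the result = parity(matrix.byte[7 - i] & x) ^ imm.bit[i]."""
    result = 0
    for i in range(8):
        row = (matrix >> (8 * (7 - i))) & 0xFF
        bit = bin(row & x).count("1") & 1  # parity of the selected input bits
        bit ^= (imm >> i) & 1
        result |= bit << i
    return result


IDENTITY = 0x0102040810204080     # result bit i <- input bit i
BIT_REVERSE = 0x8040201008040201  # result bit i <- input bit 7 - i (per-byte Rbit)


def shl_matrix(shift: int) -> int:
    """Build a matrix for a per-byte logical left shift by `shift` (0..7)."""
    matrix = 0
    for i in range(shift, 8):
        # The row for result bit i selects input bit i - shift;
        # rows for i < shift stay zero, so those result bits are cleared.
        matrix |= (1 << (i - shift)) << (8 * (7 - i))
    return matrix


print(format(affine_byte(0b11010010, BIT_REVERSE), "08b"))  # prints 01001011
```

On hardware, the instruction applies the matrix held in each 64-bit lane of its second source to the eight bytes of the corresponding 64-bit lane of the first source, so a whole vector is transformed at once.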
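The 9d7627af64 entry replaces the jump table with `AddressTable<T>`, a multi-level table indexed by slices of the guest address, with a fallback for out-of-range addresses and an `IsValid` check. The Python sketch below illustrates that general idea with a sparse, lazily allocated radix table. The class name, the level layout (shifts and bit widths), and the `fallback` value are assumptions for illustration, not the configuration Ryujinx uses (the commit mentions 5 levels; the usage here uses 3 for brevity).

```python
class MultiLevelTable:
    """Sparse radix table: each level is indexed by one bit-slice of the address.
    Hypothetical sketch; the level layout passed in is arbitrary."""

    def __init__(self, levels, fallback):
        # levels: list of (shift, bits), from the most to the least significant slice.
        self.levels = levels
        self.fallback = fallback
        self.root = [None] * (1 << levels[0][1])
        # Any address bit outside the covered slices makes the address invalid.
        self.mask = 0
        for shift, bits in levels:
            self.mask |= ((1 << bits) - 1) << shift

    def _index(self, address, level):
        shift, bits = self.levels[level]
        return (address >> shift) & ((1 << bits) - 1)

    def is_valid(self, address):
        return address & ~self.mask == 0

    def set(self, address, value):
        if not self.is_valid(address):
            raise ValueError(f"address 0x{address:x} is out of range")
        table = self.root
        for level in range(len(self.levels) - 1):
            i = self._index(address, level)
            if table[i] is None:
                # Allocate the next level only when it is first needed.
                table[i] = [None] * (1 << self.levels[level + 1][1])
            table = table[i]
        table[self._index(address, len(self.levels) - 1)] = value

    def get(self, address):
        if not self.is_valid(address):
            return self.fallback
        table = self.root
        for level in range(len(self.levels)):
            table = table[self._index(address, level)]
            if table is None:
                return self.fallback  # unmapped: take the slow path
        return table


# Usage: 32-bit guest addresses, 4-byte aligned, split into 10/10/10-bit slices.
table = MultiLevelTable([(22, 10), (12, 10), (2, 10)], fallback="dispatch_slow_path")
table.set(0x10603560, "sub_10603560")
```

Lazily allocating the lower levels keeps memory proportional to the code that has actually been translated, while a lookup stays a fixed number of indexed loads.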
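The `ArenaAllocator.Allocate(ulong)` assembly quoted in 22b2cb39af is a bump-pointer fast path: add the request size to the current offset, compare the result against the page capacity, and jump to `AllocateSlow` only when the page is full. The Python sketch below reproduces that control flow under simplifying assumptions. `ArenaSketch` is a hypothetical class, pages are plain `bytearray`s, a request larger than the page size gets its own oversized page, and `reset` simply drops every page instead of pooling them or applying the 5-second trimming heuristic the commit describes.

```python
class ArenaSketch:
    """Bump allocator over fixed-size pages (hypothetical sketch)."""

    def __init__(self, page_size: int):
        self.page_size = page_size
        self.pages = []        # each page is a bytearray
        self.page_index = -1
        self.offset = 0
        self.capacity = 0      # 0 forces the first allocation down the slow path

    def allocate(self, size: int):
        """Return (page_index, offset) of a fresh block of `size` bytes."""
        new_offset = self.offset + size
        if new_offset > self.capacity:   # corresponds to 'ja short .L2'
            return self._allocate_slow(size)
        start = self.offset
        self.offset = new_offset         # corresponds to 'mov [rcx+0x18], r8'
        return self.page_index, start

    def _allocate_slow(self, size: int):
        # Start a new page; oversized requests get a page of exactly their size.
        page_size = max(size, self.page_size)
        self.pages.append(bytearray(page_size))
        self.page_index = len(self.pages) - 1
        self.capacity = page_size
        self.offset = size
        return self.page_index, 0

    def reset(self):
        """Release everything at once; a real pool might keep pages for reuse."""
        self.pages.clear()
        self.page_index = -1
        self.offset = 0
        self.capacity = 0


arena = ArenaSketch(page_size=64)
blocks = [arena.allocate(n) for n in (16, 16, 40, 100)]
```

Individual blocks are never freed; the whole arena is discarded in one `reset`, which fits a JIT where all IR for a translation dies together.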