73 commits

Author SHA1 Message Date
jhorv
5131b71437
Reducing memory allocations (#4537)
* add RecyclableMemoryStream dependency and MemoryStreamManager

* organize BinaryReader/BinaryWriter extensions

* add StreamExtensions to reduce need for BinaryWriter

* simple replacements of MemoryStream with RecyclableMemoryStream

* add write ReadOnlySequence<byte> support to IVirtualMemoryManager

* avoid 0-length array creation

* rework IpcMessage and related types to greatly reduce memory allocation by using RecyclableMemoryStream, keeping streams around longer, avoiding their creation when possible, and avoiding creation of BinaryReader and BinaryWriter when possible

* reduce LINQ-induced memory allocations with custom methods to query KPriorityQueue

* use RecyclableMemoryStream in StreamUtils, and use StreamUtils in EmbeddedResources

* add constants for nanosecond/millisecond conversions

* code formatting

* XML doc adjustments

* fix: StreamExtension.WriteByte not writing non-zero values for lengths <= 16

* XML Doc improvements. Implement StreamExtensions.WriteByte() block writes for large-enough count values.

* add copyless path for StreamExtension.Write(ReadOnlySpan<int>)

* add default implementation of IVirtualMemoryManager.Write(ulong, ReadOnlySequence<byte>); remove previous explicit implementations

* code style fixes

* remove LINQ completely from KScheduler/KPriorityQueue by implementing a custom struct-based enumerator
2023-03-17 13:14:50 +01:00
riperiperi
1fc90e57d2
Update range for remapped sparse textures instead of recreating them (#4442)
* Update sparsely mapped texture ranges without recreating

Important TODO in TexturePool. Smaller TODO: should I look into making textures with views also do this? It needs to be able to detect if the views can be instantly deleted without issue if they're now remapped.

* Actually do partial updates

* Signal group dirty after mappings changed

* Fix various issues (should work now)

* Further optimisation

Should load a lot less data (16x) when partial updating 3d textures.

* Improve stability

* Allow granular uploads on large textures, improve rules

* Actually avoid updating slices that aren't modified.

* Address some feedback, minor optimisation

* Small tweak

* Refactor DereferenceRequest

More specific initialization methods.

* Improve code for resetting handles

* Explain data loading a bit more

* Add some safety for setting null from different threads.

All texture sets come from the one thread, but null sets can come from multiple. Only decrement ref count if we succeeded the null set first.

* Address feedback 1

* Make a bit safer
2023-03-14 17:08:44 -03:00
riperiperi
fc43aecbbd
Memory: Faster Split for NonOverlappingRangeList (#4451)
I noticed that in Xenoblade 2, the game can end up spending a lot of time adding and removing tracking handles. One of the main causes of this is actually splitting existing handles, which does the following:

- Remove existing handle from list
- Update existing handle to end at split address, create new handle starting at split address
- Add updated handle (left) to list
- Add new handle (right) to list

This costs 1 deletion and 2 insertions. When there are more handles, this gets a lot more expensive, as insertions are done by copying all values to the right, and deletions by copying values to the left.

This PR simply allows it to look up the handle being split, and replace its entry with the new end address without insertion or deletion. This makes a split only cost one insertion and a binary search lookup (very cheap). This isn't all of the cost on Xenoblade 2, but it does significantly reduce it.

There might be more to gain here. We could find a way to reduce the handle count for the game (merging on deletion? buffer deletion?), or use a different structure for virtual regions: the current one is optimal for buffer lookups, which nearly always read, whereas memory tracking has more of a balance between reads and writes. That's for a later date though; this was an easy improvement.
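
To make the cost difference concrete, here is a minimal Python sketch of the two split strategies described above. It is not the Ryujinx C# code: the `RangeList` class, its parallel `starts`/`ends` lists, and the method names are assumptions made for illustration only.

```python
from bisect import bisect_right


class RangeList:
    """Sorted, non-overlapping [start, end) ranges kept in two parallel lists."""

    def __init__(self):
        self.starts = []  # sorted start addresses, used for binary search
        self.ends = []    # ends[i] is the exclusive end of the range at starts[i]

    def add(self, start, end):
        i = bisect_right(self.starts, start)
        self.starts.insert(i, start)  # O(n): shifts later entries to the right
        self.ends.insert(i, end)

    def _find(self, address):
        # Binary search for the range that strictly contains the address.
        i = bisect_right(self.starts, address) - 1
        if i < 0 or not (self.starts[i] < address < self.ends[i]):
            raise ValueError("address is not strictly inside any range")
        return i

    def split_slow(self, address):
        """Old approach: remove the range, then re-add both halves."""
        i = self._find(address)
        start, end = self.starts.pop(i), self.ends.pop(i)  # 1 deletion
        self.add(start, address)                            # insertion 1 (left)
        self.add(address, end)                              # insertion 2 (right)

    def split_fast(self, address):
        """New approach: shrink the entry in place, insert only the right half."""
        i = self._find(address)
        end = self.ends[i]
        self.ends[i] = address              # in-place update, nothing shifts
        self.starts.insert(i + 1, address)  # the single insertion
        self.ends.insert(i + 1, end)
```

Both methods leave the list in the same state; the fast path simply avoids one deletion and one insertion, each of which shifts every later entry.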
2023-02-21 10:53:38 +01:00
gdkchan
efb135b74c
Clear CPU side data on GPU buffer clears (#4125)
* Clear CPU side data on GPU buffer clears

* Implement tracked fill operation that can signal other resource types except buffer

* Fix tests, add missing XML doc

* PR feedback
2023-02-16 18:28:49 -03:00
gdkchan
86fd0643c2
Implement support for page sizes > 4KB (#4252)
* Implement support for page sizes > 4KB

* Check and work around more alignment issues

* Was not meant to change this

* Use MemoryBlock.GetPageSize() value for signal handler code

* Do not take the path for private allocations if host supports 4KB pages

* Add Flags attribute on MemoryMapFlags

* Fix dirty region size with 16kb pages

Would accidentally report a size that was too high (generally 16k instead of 4k, uploading 4x as much data)

Co-authored-by: riperiperi <rhy3756547@hotmail.com>
2023-01-17 05:13:24 +01:00
gnisman
b402b4e7f6
Change GetPageSize to use Environment.SystemPageSize (#4291)
* Change GetPageSize to use Environment.SystemPageSize

* Fix PR comment
2023-01-14 15:37:04 -03:00
gdkchan
5e0f8e8738
Implement JIT Arm64 backend (#4114)
* Implement JIT Arm64 backend

* PPTC version bump

* Address some feedback from Arm64 JIT PR

* Address even more PR feedback

* Remove unused IsPageAligned function

* Sync Qc flag before calls

* Fix comment and remove unused enum

* Address riperiperi PR feedback

* Delete Breakpoint IR instruction that was only implemented for Arm64
2023-01-10 19:16:59 -03:00
Mary-nyan
b6614c6ad5
chore: Update tests dependencies (#3978)
* chore: Update tests dependencies

* Apply TSR Berry suggestion to add a GC.SuppressFinalize in MemoryBlock.cs

* Ensure we wait for the test thread to be dead on PartialUnmap

* Use platform attribute for os specific tests

* Make P/Invoke methods private

* Downgrade NUnit3TestAdapter to 4.1.0

* test: Disable warning about platform compat for ThreadLocalMap()

Co-authored-by: TSR Berry <20988865+TSRBerry@users.noreply.github.com>
2023-01-01 17:35:29 +01:00
Berkan Diler
0d3b82477e
Use new ArgumentNullException and ObjectDisposedException throw-helper API (#4163) 2022-12-27 20:27:11 +01:00
Isaac Marovitz
0fbcd630bc
Replace DllImport usage with LibraryImport (#4084)
* Replace usage of `DllImport` with `LibraryImport`

* Mark methods as `partial`

* Marshalling

* More `partial` & marshalling

* More `partial` and marshalling

* More partial and marshalling

* Update GdiPlusHelper to LibraryImport

* Unicorn

* More Partial

* Marshal

* Specify EntryPoint

* Specify EntryPoint

* Change GlobalMemoryStatusEx to LibraryImport

* Change RegisterClassEx to LibraryImport

* Define EntryPoints

* Update Ryujinx.Ava/Ui/Controls/Win32NativeInterop.cs

Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>

* Update Ryujinx.Graphics.Nvdec.FFmpeg/Native/FFmpegApi.cs

Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>

* Move return marshal

* Remove calling convention specification

* Remove calling conventions

* Update Ryujinx.Common/SystemInfo/WindowsSystemInfo.cs

Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>

* Update Ryujinx/Modules/Updater/Updater.cs

Co-authored-by: Mary-nyan <thog@protonmail.com>

* Update Ryujinx.Ava/Modules/Updater/Updater.cs

Co-authored-by: Mary-nyan <thog@protonmail.com>

Co-authored-by: TSRBerry <20988865+TSRBerry@users.noreply.github.com>
Co-authored-by: Mary-nyan <thog@protonmail.com>
2022-12-15 18:07:31 +01:00
riperiperi
e211c3f00a
UI: Add Metal surface creation for MoltenVK (#3980)
* Initial implementation of metal surface across UIs

* Fix SDL2 on windows

* Update Ryujinx/Ryujinx.csproj

Co-authored-by: Mary-nyan <thog@protonmail.com>

* Address Feedback

Co-authored-by: Mary-nyan <thog@protonmail.com>
2022-12-06 19:00:25 -03:00
Andrey Sukharev
4da44e09cb
Make structs readonly when applicable (#4002)
* Make all structs readonly when applicable. It should reduce amount of needless defensive copies

* Make structs with trivial boilerplate equality code record structs

* Remove unnecessary readonly modifiers from TextureCreateInfo

* Make BitMap structs readonly too
2022-12-05 14:47:39 +01:00
merry
a5c2aead67
ConcurrentBitmap: Use Interlocked Or/And (#3937) 2022-11-29 13:47:57 +00:00
riperiperi
65778a6b78
GPU: Don't trigger uploads for redundant buffer updates (#3828)
* Initial implementation

* Actually do The Thing

* Add remark about performance to IVirtualMemoryManager
2022-11-24 15:50:15 +01:00
riperiperi
a16682cfd3
Allow _volatile to be set from MultiRegionHandle checks again (#3830)
* Allow _volatile to be set from MultiRegionHandle checks again

Tracking handles have a `_volatile` flag which indicates that the resource being tracked is modified every time it is used under a new sequence number. This is used to reduce the time spent reprotecting memory for tracking writes to commonly modified buffers, like constant buffers.

This optimisation works by detecting if a buffer is modified every time a check happens. If a buffer is checked but it is not dirty, then that data is likely not modified every sequence number, and should use memory protection for write tracking. If the opposite is the case all the time, it is faster to just assume it's dirty as we'd just be wasting time protecting the memory.

The new MultiRegionBitmap could not notify handles that they had been checked as part of the fast bitmap lookup, so bindings larger than 4096 bytes wouldn't trigger it at all. This meant that they would be subject to a ton of reprotection if they were modified often.

This does mean there are two separate sources for a _volatile set: VolatileOrDirty + _checkCount, and the bitmap check. These shouldn't interfere with each other, though.

This fixes performance regressions from #3775 in Pokemon Sword, and hopefully Yu-Gi-Oh! RUSH DUEL: Dawn of the Battle Royale. May affect other games.
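
The volatile heuristic described above can be modelled roughly as follows. This is a simplified Python sketch, not the actual handle implementation: the threshold value, the `reprotect_count` counter, and the class itself are hypothetical, and the sketch never leaves the volatile state once it is entered.

```python
class TrackedHandle:
    """Toy model of a tracking handle with a volatile heuristic."""

    VOLATILE_THRESHOLD = 5  # hypothetical: consecutive dirty checks before going volatile

    def __init__(self):
        self.dirty = True          # not yet protected, so treat as modified
        self.volatile = False
        self.reprotect_count = 0   # counts simulated (expensive) reprotections
        self._dirty_streak = 0

    def signal_write(self):
        """Called when a write to the tracked memory is detected."""
        self.dirty = True

    def check(self):
        """Return True if the resource must be treated as modified."""
        if self.volatile:
            return True  # assumed dirty; memory protection is skipped entirely
        if not self.dirty:
            self._dirty_streak = 0  # a clean check: data is not rewritten every time
            return False
        self._dirty_streak += 1
        if self._dirty_streak >= self.VOLATILE_THRESHOLD:
            self.volatile = True    # dirty on every check: stop protecting
        else:
            self.dirty = False
            self.reprotect_count += 1  # re-arm write tracking for the next check
        return True
```

A buffer that is written before every check stops paying for reprotection after a few checks, while one that is ever found clean keeps using memory protection.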

* Fix stupid mistake
2022-11-18 02:54:20 +00:00
Mary-nyan
c6d05301aa
infra: Migrate to .NET 7 (#3795)
* Update readme to mention .NET 7

* infra: Migrate to .NET 7

.NET 7 is still in preview, but this prepares for the release coming up
next month.

* Use Random.Shared in CreateRandom

* Move UInt128Utils.cs to Ryujinx.Common project

* Fix inverted parameters in System.UInt128 constructor

* Fix Visual Studio complaints about Ryujinx.Graphics.Vic

* time: Fix missing alignment enforcement in SystemClockContext

Fixes at least Smash

* time: Fix missing alignment enforcement in SteadyClockContext

Fixes games (like recent versions of Smash) using time shared memory

* Switch to .NET 7.0.100 release

* Enable Tiered PGO

* Ensure CreateId validity requirements are met when doing random generation

Also enforce correct packing layout for other Mii structures.

This fixes Mario Kart 8 crashes related to the default Miis.
2022-11-09 20:22:43 +01:00
riperiperi
3d98e1361b
GPU: Use a bitmap to track buffer modified flags. (#3775)
* Initial implementation

* Some improvements.

* Fix incorrect cast

* Performance improvement and improved correctness

* Add very fast path when all handles are checked.

* Slightly faster

* Add comment

* De-virtualize region handle

All region handles are now bitmap backed.

* Remove non-bitmap tracking

* Remove unused methods

* Add docs, remove unused methods

* Address Feedback

* Rename file
2022-10-29 22:07:37 +00:00
gdkchan
7d26e4ac7b
Fix mapping leaks caused by UnmapView not working on Linux (#3650)
* Add test for UnmapView mapping leaks

* Throw when UnmapView fails on Linux

* Fix UnmapView

* Remove throw
2022-10-19 01:02:45 +00:00
gdkchan
356e480bf5
Fix partial unmap reprotection on Windows (#3702) 2022-09-14 17:46:37 +02:00
gdk
7dd69f2d0e Allocation free tree lookup 2022-09-10 16:23:49 +02:00
gdk
c646638680 Update several methods to use GetNode directly and avoid array allocations 2022-09-10 16:23:49 +02:00
gdk
65f2a82b97 Optimize PlaceholderManager.UnreserveRange 2022-09-10 16:23:49 +02:00
gdk
93dd6d525a Fix potential issue with partial unmap
We must also do the unmap operation with the RWLock, otherwise faults on the unmapped region will cause crashes and the whole thing becomes pointless
2022-09-10 16:23:49 +02:00
gdk
96d4ad952c Fix reprotection regression 2022-09-10 16:23:49 +02:00
gdk
45e520a27c Rewrite PlaceholderManager4KB to use intrusive RBTree, and to coalesce free placeholders
Also make the other placeholder manager use intrusive RBTree, allows the IntervalTree that was added just for this to be deleted
2022-09-10 16:23:49 +02:00
gdkchan
6922862db8
Optimize kernel memory block lookup and consolidate RBTree implementations (#3410)
* Implement intrusive red-black tree, use it for HLE kernel block manager

* Implement TreeDictionary using IntrusiveRedBlackTree

* Implement IntervalTree using IntrusiveRedBlackTree

* Implement IntervalTree (on Ryujinx.Memory) using IntrusiveRedBlackTree

* Make PredecessorOf and SuccessorOf internal, expose Predecessor and Successor properties on the node itself

* Allocation free tree node lookup
2022-08-26 18:21:48 +00:00
Nicholas Rodine
951700fdd8
Removed unused usings. (#3593)
* Removed unused usings.

* Added back using, now that it's used.

* Removed extra whitespace.
2022-08-18 18:04:54 +02:00
riperiperi
14ce9e1567
Move partial unmap handler to the native signal handler (#3437)
* Initial commit with a lot of testing stuff.

* Partial Unmap Cleanup Part 1

* Fix some minor issues, hopefully windows tests.

* Disable partial unmap tests on macos for now

Weird issue.

* Goodbye magic number

* Add COMPlus_EnableAlternateStackCheck for tests

`COMPlus_EnableAlternateStackCheck` is needed for NullReferenceException handling to work on linux after registering the signal handler, due to how dotnet registers its own signal handler.

* Address some feedback

* Force retry when memory is mapped in memory tracking

This case existed before, but returning `false` no longer retries, so it would crash immediately after unprotecting the memory... Now, we return `true` to deliberately retry.

This case existed before (was just broken by this change) and I don't really want to look into fixing the issue right now. Technically, this means that on guest code partial unmaps will retry _due to this_ rather than hitting the handler. I don't expect this to cause any issues.

This should fix random crashes in Xenoblade Chronicles 2.

* Use IsRangeMapped

* Suppress MockMemoryManager.UnmapEvent warning

This event is not signalled by the mock memory manager.

* Remove 4kb mapping
2022-07-29 19:16:29 -03:00
gdkchan
232b1012b0
Fix ThreadingLock deadlock on invalid access and TerminateProcess (#3407) 2022-06-24 02:53:16 +02:00
gdkchan
dd8f97ab9e
Remove freed memory range from tree on memory block disposal (#3347)
* Remove freed memory range from tree on memory block disposal

* PR feedback
2022-06-05 15:12:42 -03:00
gdkchan
54deded929
Fix shared memory leak on Windows (#3319)
* Fix shared memory leak on Windows

* Fix memory leak caused by RO session disposal not decrementing the memory manager ref count

* Fix UnmapViewInternal deadlock

* Was not supposed to add those back
2022-05-05 14:58:59 -03:00
gdkchan
074190e03c
Remove AddProtection count > 0 assert (#3315) 2022-05-04 14:07:10 -03:00
gdkchan
95017b8c66
Support memory aliasing (#2954)
* Back to the origins: Make memory manager take guest PA rather than host address once again

* Direct mapping with alias support on Windows

* Fixes and remove more of the emulated shared memory

* Linux support

* Make shared and transfer memory not depend on SharedMemoryStorage

* More efficient view mapping on Windows (no longer restricted to 4KB pages at a time)

* Handle potential access violations caused by partial unmap

* Implement host mapping using shared memory on Linux

* Add new GetPhysicalAddressChecked method, used to ensure the virtual address is mapped before address translation

Also align GetRef behaviour with software memory manager

* We don't need a mirrorable memory block for software memory manager mode

* Disable memory aliasing tests while we don't have shared memory support on Mac

* Shared memory & SIGBUS handler for macOS

* Fix typo + nits + re-enable memory tests

* Set MAP_JIT_DARWIN on x86 Mac too

* Add back the address space mirror

* Only set MAP_JIT_DARWIN if we are mapping as executable

* Disable aliasing tests again (still fails on Mac)

* Fix UnmapView4KB (by not casting size to int)

* Use ref counting on memory blocks to delay closing the shared memory handle until all blocks using it are disposed

* Address PR feedback

* Make RO hold a reference to the guest process memory manager to avoid early disposal

Co-authored-by: nastys <nastys@users.noreply.github.com>
2022-05-02 20:30:02 -03:00
riperiperi
4a892fbdc9
Fix flush action from multiple threads regression (#3311)
If two or more threads encounter a region of memory where a read action has been registered, then they must _both_ wait on the data.

Clearing the action before it completed was causing the null check above to fail, so the action would only be run on the first thread, and the second would end up continuing without waiting. Depending on what the game does, this could be disastrous.

This fixes a regression introduced by #3302 with Pokemon Legends Arceus, and possibly Catherine. There are likely other affected games. What is fixed in that PR should still be fixed.
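
One way to guarantee that every reader waits is to run the action and clear it under the same lock. The Python sketch below illustrates that idea only; it is not the actual region handle code, and `ReadTrackedRegion`, `register_pre_action`, and `signal_read` are hypothetical names.

```python
import threading


class ReadTrackedRegion:
    """Toy region whose data only becomes valid once a pending action has run."""

    def __init__(self):
        self._lock = threading.Lock()
        self._pre_action = None  # callback that makes the data valid (e.g. a flush)

    def register_pre_action(self, action):
        with self._lock:
            self._pre_action = action

    def signal_read(self):
        # Run and clear under one lock: a second thread blocks here until the
        # first has finished the action, then sees None and proceeds safely.
        with self._lock:
            action = self._pre_action
            if action is not None:
                action()
                self._pre_action = None  # cleared only after completion
```

With this shape, whichever thread arrives second cannot return from `signal_read` before the first thread's action has completed.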
2022-05-02 12:31:53 +02:00
riperiperi
d64594ec74
Fix various issues with texture sync (#3302)
* Fix various issues with texture sync

A variable called _actionRegistered is used to keep track of whether a tracking action has been registered for a given texture group handle. This variable is set when the action is registered, and should be unset when it is consumed. This is used to skip registering the tracking action if it's already registered, saving some time for render targets that are modified very often.

There were two issues with this. The worst issue was that the tracking action handler exits early if the handle's modified flag is false... which means that it never reset _actionRegistered, as that was done within the Sync() method called later. The second issue was that this variable was set true after the sync action was registered, so it was technically possible for the action to run immediately, set the flag to false, then set it to true.

Both situations would lead to the action never being registered again, as the texture group handle would be sure the action is already registered. This breaks the texture for the remaining runtime, or until it is disposed.

It was also possible for a texture to register sync once, then on future frames the last modified sync number did not update. This may have caused some more minor issues.
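
The two ordering problems above can be illustrated with a small Python sketch. It is a model under stated assumptions rather than the real texture group code: `AtomicFlag` stands in for an atomic exchange on a boolean, and `GroupHandle`, `register`, and `sync_count` are hypothetical.

```python
import threading


class AtomicFlag:
    """Stand-in for an atomic exchange on a boolean, built on a lock."""

    def __init__(self):
        self._value = False
        self._lock = threading.Lock()

    def exchange(self, new_value):
        with self._lock:
            old, self._value = self._value, new_value
            return old


class GroupHandle:
    def __init__(self, register):
        self._register = register  # hypothetical: schedules a callback to run later
        self._action_registered = AtomicFlag()
        self.modified = False
        self.sync_count = 0

    def signal_modified(self):
        self.modified = True
        # Set the flag *before* registering, so an action that runs immediately
        # cannot have its reset overwritten by a late "set to True".
        if not self._action_registered.exchange(True):
            self._register(self._on_action)

    def _on_action(self):
        # Reset first, so the early return below cannot leave the flag stuck.
        self._action_registered.exchange(False)
        if not self.modified:
            return
        self.modified = False
        self.sync_count += 1
```

Passing `lambda cb: cb()` as `register` simulates an action that runs immediately; passing `some_list.append` simulates one that runs later. In both cases the handle can register again after the action has been consumed.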

Seems to fix the Xenoblade flashing bug. Obviously this needs a lot of testing, since it was random chance. I typically had the most luck getting it to happen by switching time of day on the event theatre screen for a while, then entering the equipment screen by pressing X on an event.

May also fix weird things like random chance air swimming in BOTW, maybe a few texture streaming bugs.

* Exchange rather than CompareExchange
2022-04-29 18:34:11 -03:00
gdkchan
0a24aa6af2
Allow textures to have their data partially mapped (#2629)
* Allow textures to have their data partially mapped

* Explicitly check for invalid memory ranges on the MultiRangeList

* Update GetWritableRegion to also support unmapped ranges
2022-02-22 13:34:16 -03:00
riperiperi
cda659955c
Texture Sync, incompatible overlap handling, data flush improvements. (#2971)
* Initial test for texture sync

* WIP new texture flushing setup

* Improve rules for incompatible overlaps

Fixes a lot of issues with Unreal Engine games. Still a few minor issues (some caused by dma fast path?) Needs docs and cleanup.

* Cleanup, improvements

Improve rules for fast DMA

* Small tweak to group together flushes of overlapping handles.

* Fixes, flush overlapping texture data for ASTC and BC4/5 compressed textures.

Fixes the new Life is Strange game.

* Flush overlaps before init data, fix 3d texture size/overlap stuff

* Fix 3D Textures, faster single layer flush

Note: nosy people can no longer merge this with Vulkan. (unless they are nosy enough to implement the new backend methods)

* Remove unused method

* Minor cleanup

* More cleanup

* Use the More Fun and Hopefully No Driver Bugs method for getting compressed tex too

This one's for metro

* Address feedback, ASTC+ETC to FormatClass

* Change offset to use Span slice rather than IntPtr Add

* Fix this too
2022-01-09 13:28:48 -03:00
Mary
00c69f2098
Remove usage of Mono.Posix.NETStandard across all projects (#2906)
* Remove usage of Mono.Posix.NETStandard in Ryujinx project

* Remove usage of Mono.Posix.NETStandard in ARMeilleure project

* Remove usage of Mono.Posix.NETStandard in Ryujinx.Memory project

* Address gdkchan's comments
2021-12-08 18:24:26 -03:00
Mary
f39fce8f54
misc: Migrate usage of RuntimeInformation to OperatingSystem (#2901)
Very basic migration across the codebase.
2021-12-04 20:02:30 -03:00
Mary
57d3296ba4
infra: Migrate to .NET 6 (#2829)
* infra: Migrate to .NET 6

* Rollback version naming change

* Workaround .NET 6 ZipArchive API issues

* ci: Switch to VS 2022 for AppVeyor

CI is now ready for .NET 6

* Suppress WebClient warning in DoUpdateWithMultipleThreads

* Attempt to workaround System.Drawing.Common changes on 6.0.0

* Change keyboard rendering from System.Drawing to ImageSharp

* Make the software keyboard renderer multithreaded

* Bump ImageSharp version to 1.0.4 to fix a bug in Image.Load

* Add fallback fonts to the keyboard renderer

* Fix warnings

* Address caian's comment

* Clean up Linux workaround as it's unneeded now

* Update readme

Co-authored-by: Caian Benedicto <caianbene@gmail.com>
2021-11-28 21:24:17 +01:00
Mary
b4dc33efc2
kernel: Clear pages allocated with SetHeapSize (#2776)
* kernel: Clear pages allocated with SetHeapSize

Before this commit, all new pages allocated by SetHeapSize were not
cleared by the kernel.

This would cause undefined data to be passed to userland, possibly
resulting in weird memory corruption.

This commit also adds support for custom heap and IPC fill values (also
supported by the official kernel)

* Remove dots at the end of KPageTableBase.MapPages new documentation

* Remove unused _stackFillValue
2021-10-24 18:52:59 -03:00
Mary
85d8d1d7ca
misc: Fix IVirtualMemoryManager.Fill ignoring value (#2775)
This fixes IVirtualMemoryManager.Fill to actually use the provided fill
value instead of 0.

This has no implications at the moment, as everything that uses it passes 0,
but it is needed for some upcoming kernel fixes.
2021-10-24 18:16:59 -03:00
riperiperi
fff48bb45a
Smaller initial size for ModifiedRangeList & directly inherit range list (#2663)
This fixes a potential regression with the new range list changes, where the cost for creating new ones would be rather large due to creating a 1024 size array. Also reduces cost for range list inheritance by using the first existing range list as a base, rather than creating a new one then adding both lists to it.

The growth size for the RangeList is now identical to its initial size. Every 32 elements was probably a little too common - now it is 1024 for most things and 8 for the buffer modified range list.

The Unmapped and SyncMethod methods have been changed to ensure that they behave properly if the range list is set null. Cleaned up a few calls to use the null-conditional operator.
2021-10-04 15:38:59 -03:00
riperiperi
d92fff541b
Replace CacheResourceWrite with more general "precise" write (#2684)
* Replace CacheResourceWrite with more general "precise" write

The goal of CacheResourceWrite was to notify GPU resources when they were modified directly, by looking up the modified address/size in a structure and calling a method on each resource. The downside of this is that each resource cache has to be queried individually, they all have to implement their own way to do this, and it can only signal to resources using the same PhysicalMemory instance.

This PR adds the ability to signal a write as "precise" on the tracking, which signals a special handler (if present) which can be used to avoid unnecessary flush actions, or maybe even more. For buffers, precise writes specifically do not flush, and instead punch a hole in the modified range list to indicate that the data on GPU has been replaced.

The downside is that precise actions must ignore the page protection bits and always signal - as they need to notify the target resource to ignore the sequence number optimization.
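
As a rough illustration of "punching a hole" in a modified range list, the Python sketch below removes an address interval from a list of ranges. The function name and the representation as `(start, end)` tuples with exclusive ends are assumptions; the real buffer modified range list is a more involved structure.

```python
def punch_hole(ranges, start, end):
    """Remove [start, end) from a list of (start, end) ranges with exclusive ends."""
    result = []
    for r_start, r_end in ranges:
        if r_end <= start or r_start >= end:
            result.append((r_start, r_end))  # no overlap: keep the range as-is
            continue
        if r_start < start:
            result.append((r_start, start))  # keep the part left of the hole
        if r_end > end:
            result.append((end, r_end))      # keep the part right of the hole
    return result
```

For example, `punch_hole([(0, 100)], 20, 30)` leaves `[(0, 20), (30, 100)]`: the region that was precisely written is no longer marked as holding newer data on the GPU, so nothing needs to be flushed for it.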

I had to reintroduce the sequence number increment after I2M, as removing it was causing issues in rabbids kingdom battle. However - all resources modified by I2M are notified directly to lower their sequence number, so the problem is likely that another unrelated resource is not being properly updated. Thankfully, doing this does not affect performance in the games I tested.

This should fix regressions from #2624. Test any games that were broken by that. (RF4, rabbids kingdom battle)

I've also added a sequence number increment to ThreedClass.IncrementSyncpoint, as it seems to fix buffer corruption in OpenGL homebrew. (this was a regression from removing sequence number increment from constant buffer update - another unrelated resource thing)

* Add tests.

* Add XML docs for GpuRegionHandle

* Skip UpdateProtection if only precise actions were called

This allows precise actions to skip reprotection costs.
2021-09-29 02:27:03 +02:00
riperiperi
db97b1d7d2
Implement and use an Interval Tree for the MultiRangeList (#2641)
* Implement and use an Interval Tree for the MultiRangeList

* Feedback

* Address Feedback

* Missed this somehow
2021-09-19 14:55:07 +02:00
riperiperi
7379bc2f39
Array based RangeList that caches Address/EndAddress (#2642)
* Array based RangeList that caches Address/EndAddress

In isolation, this was more than 2x faster than the RangeList that checks using the interface. In practice I'm seeing much better results than I expected. The array is used because checking it is slightly faster than using a list, which loses time to struct copies, but I still want that data locality.

A method has been added to the list to update the cached end address, as some users of the RangeList currently modify it dynamically.

Greatly improves performance in Super Mario Odyssey, Xenoblade and any other GPU limited games.

* Address Feedback
2021-09-19 14:22:26 +02:00
riperiperi
54adc5f9fb
Ensure that all threads wait for a read tracking action to complete. (#2597)
* Lock around tracking action consume + execute. Not particularly fast.

* Lock around preaction registration and use

* Create a lock object

* Nit
2021-08-29 16:03:41 -03:00
riperiperi
ec3e848d79
Add a Multithreading layer for the GAL, multi-thread shader compilation at runtime (#2501)
* Initial Implementation

About as fast as nvidia GL multithreading, can be improved with faster command queuing.

* Struct based command list

Speeds up a bit. Still a lot of time lost to resource copy.

* Do shader init while the render thread is active.

* Introduce circular span pool V1

Ideally should be able to use structs instead of references for storing these spans on commands. Will try that next.

* Refactor SpanRef some more

Use a struct to represent SpanRef, rather than a reference.

* Flush buffers on background thread

* Use a span for UpdateRenderScale.

Much faster than copying the array.

* Calculate command size using reflection

* WIP parallel shaders

* Some minor optimisation

* Only 2 max refs per command now.

The command with 3 refs is gone. 😌

* Don't cast on the GPU side

* Remove redundant casts, force sync on window present

* Fix Shader Cache

* Fix host shader save.

* Fixup to work with new renderer stuff

* Make command Run static, use array of delegates as lookup

Profile says this takes less time than the previous way.

* Bring up to date

* Add settings toggle. Fix Multithreading Off mode.

* Fix warning.

* Release tracking lock for flushes

* Fix Conditional Render fast path with threaded gal

* Make handle iteration safe when releasing the lock

This is mostly temporary.

* Attempt to set backend threading on driver

Only really works on nvidia before launching a game.

* Fix race condition with BufferModifiedRangeList, exceptions in tracking actions

* Update buffer set commands

* Some cleanup

* Only use stutter workaround when using opengl renderer non-threaded

* Add host-conditional reservation of counter events

There has always been the possibility that conditional rendering could use a query object just as it is disposed by the counter queue. This change makes it so that when the host decides to use host conditional rendering, the query object is reserved so that it cannot be deleted. Counter events can optionally start reserved, as the threaded implementation can reserve them before the backend creates them, and there would otherwise be a short amount of time where the counter queue could dispose the event before a call to reserve it could be made.

* Address Feedback

* Make counter flush tracked again.

Hopefully does not cause any issues this time.

* Wait for FlushTo on the main queue thread.

Currently assumes only one thread will want to FlushTo (in this case, the GPU thread)

* Add SDL2 headless integration

* Add HLE macro commands.

Co-authored-by: Mary <mary@mary.zone>
2021-08-27 00:31:29 +02:00
gdkchan
bb8a920b63
Do not dirty memory tracking region handles if they are partially unmapped (#2536) 2021-08-11 21:50:33 +02:00
riperiperi
4b60371e64
Return mapped buffer pointer directly for flush, WriteableRegion for textures (#2494)
* Return mapped buffer pointer directly for flush, WriteableRegion for textures

A few changes here to generally improve performance, even for platforms not using the persistent buffer flush.

- Texture and buffer flush now return a ReadOnlySpan<byte>. It's guaranteed that this span is pinned in memory, but it will be overwritten on the next flush from that thread, so it is expected that the data is used before calling again.
- As a result, persistent mappings no longer copy to a new array - rather the persistent map is returned directly as a Span<>. A similar host array is used for the glGet flushes instead of allocating new arrays each time.
- Texture flushes now do their layout conversion into a WriteableRegion when the texture is not MultiRange, which allows the flush to happen directly into guest memory rather than into a temporary span, then copied over. This avoids another copy when doing layout conversion.

Overall, this saves 1 data copy for buffer flush, 1 copy for linear textures with matching source/target stride, and 2 copies for block textures or linear textures with mismatching strides.
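
The contract described above (the returned span is valid only until the next flush from the same thread) can be sketched in Python with a reusable buffer and a read-only `memoryview`. This models only the "reuse one host array instead of allocating per flush" idea; `FlushScratch` is a hypothetical name, and the sketch assumes one instance per flushing thread.

```python
class FlushScratch:
    """Reusable host-side buffer; one instance per flushing thread is assumed."""

    def __init__(self):
        self._buffer = bytearray()

    def flush(self, source):
        size = len(source)
        if len(self._buffer) < size:
            # Grow by replacing the buffer; views handed out by earlier
            # flushes keep the old buffer alive but are considered stale.
            self._buffer = bytearray(size)
        self._buffer[:size] = source  # copy into the reused buffer, no new allocation
        # Read-only view over the reused buffer: overwritten by the next flush.
        return memoryview(self._buffer)[:size].toreadonly()
```

A caller must consume the returned view before flushing again, which mirrors the "use the data before calling again" expectation stated in the list above.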

* Fix tests

* Fix array pointer for Mesa/Intel path

* Address some feedback

* Update method for getting array pointer.
2021-07-19 19:10:54 -03:00