terakan: DXVK support (R8xx UAV counters, etc.)
This issue is a tracker of DXVK support roadblocks.
Random Access Target count limitations
R9xx with virtual memory should likely be able to support DXVK on Vulkan 1.3 directly if buffer device address is exposed and large numbers of storage buffers are implemented internally on top of buffer device address:
- Buffer device address can be provided through one global 4 GB R32 RAT that exposes all the memory in the 32-bit-addressed memory types. This works because the kernel driver doesn't validate fetch constants with virtual memory, so all BOs supporting buffer device address only need to be listed in every submission.
- Storage buffers are then accessed through buffer device address by passing their addresses and sizes alongside push constants, so Direct3D 11 can still get 15 constant cache bindings.
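A minimal sketch of how a storage buffer access could be translated into an element index of the single global R32 RAT. It assumes the RAT is based at virtual address 0, that each buffer's 32-bit address and byte size are the values passed alongside push constants, and that every buffer lies entirely below 4 GB. `storage_buffer_desc` and `global_rat_index` are hypothetical names, not Terakan code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-binding record passed alongside push constants. */
typedef struct {
    uint32_t address; /* GPU virtual address, assumed to be in the 32-bit range */
    uint32_t size;    /* buffer size in bytes */
} storage_buffer_desc;

/* Converts a byte offset into a storage buffer into an element index of a
 * global R32 RAT assumed to start at virtual address 0 and cover 4 GB.
 * Assumes address + size does not exceed 2^32, so the sum cannot wrap.
 * Returns false for misaligned or out-of-bounds accesses. */
bool global_rat_index(const storage_buffer_desc *desc, uint32_t byte_offset,
                      uint32_t *out_index)
{
    if (byte_offset % 4u != 0u)
        return false; /* R32 elements are 4 bytes wide */
    if (desc->size < 4u || byte_offset > desc->size - 4u)
        return false; /* bounds check against the size from push constants */
    *out_index = (desc->address + byte_offset) / 4u;
    return true;
}
```

The bounds check is what makes passing the size worthwhile: with one global RAT, the hardware no longer clamps accesses to the individual buffer.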
R8xx has a very tight UAV (RAT) count limit of 12 minus RTV count, however.
Terakan's actual device limits would be sufficient if the following changes were made to DXVK:
- Structured buffers are implemented as texel buffers rather than storage buffers. Internally, storage buffers on TeraScale would be `R32_UINT` vertex/RAT buffers anyway, so texel buffers are effectively supersets of storage buffers. The only exception is the maximum element count, which is limited by 128-bit views for them, but `256*2^20` is enough, as Direct3D 11 only requires `128*2^20` for byte address buffers, which are R32 internally. This makes it possible to split the 12 RATs into 8 texel buffers and 4 storage buffers, and 8 texel buffers would be directly sufficient for Direct3D 11.0 UAVs.
- All UAV counters are bound as a single storage buffer using one of those 4 slots, as opposed to being in separate storage buffers (which would require up to 8 additional storage buffer bindings). A naïve implementation would be a single buffer with `2^20` counters, one for each possible UAV (given the Nvidia and Intel total view count limitation due to 20-bit texture indices), but that would require 4 MB of memory, even though games probably use only a handful of UAV counters. A smarter, but much more complex, implementation would cache recently used counters in a small buffer.
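The numbers in the two points above can be checked with a little arithmetic. This sketch assumes the texel buffer element limit comes from a 4 GB maximum view divided by the 16-byte element of the widest (128-bit) format, which reproduces the `256*2^20` figure, and that the naïve counter buffer holds one 32-bit counter per 20-bit view index. All names are hypothetical.

```c
#include <stdint.h>

#define ADDRESS_SPACE_BYTES     (UINT64_C(1) << 32)   /* 4 GB view */
#define WIDEST_TEXEL_BYTES      16u                   /* 128-bit format */
#define REQUIRED_D3D11_ELEMENTS (UINT64_C(128) << 20) /* byte address buffers */
#define VIEW_INDEX_BITS         20u                   /* 20-bit view indices */

/* Element count of a texel buffer covering 4 GB with 16-byte elements. */
uint64_t max_texel_elements(void)
{
    return ADDRESS_SPACE_BYTES / WIDEST_TEXEL_BYTES; /* 2^28 = 256 * 2^20 */
}

/* Size of a flat buffer with one 32-bit counter per possible view. */
uint64_t naive_counter_buffer_bytes(void)
{
    return (UINT64_C(1) << VIEW_INDEX_BITS) * sizeof(uint32_t); /* 4 MB */
}

/* Byte offset of a view's counter in the flat buffer.
 * The caller is assumed to pass view_index < 2^20. */
uint32_t counter_byte_offset(uint32_t view_index)
{
    return view_index * (uint32_t)sizeof(uint32_t);
}
```

The 4 MB figure is what motivates the cache-based alternative: most of that buffer would never be touched.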
These changes, however, are very unlikely to end up in upstream DXVK. A fork can of course be created, but maintaining it will likely be complicated.
Alternatively, some horrible hacks could be implemented on the Terakan side that would possibly make DXVK work by partially lying about its capabilities.
Terakan treats texel and storage buffer bindings the same way, so the 12 RATs can be split between the two dynamically. Exposing 8 storage and 8 texel buffers in fragment shaders would be conformant anyway, as fragment shaders are also bound by the total fragment output resource limit (`maxFragmentCombinedOutputResources`). Compute shaders, however, have no such combined limit, which is why one of the two has to be limited to something as low as Vulkan's minimum requirement of 4. But Direct3D 11 games never need more than 8 UAVs combined regardless of the shader stage, so they will stay within the total limit of 12 in compute shaders too. Thus it should be okay to lie specifically to DXVK that Terakan supports 8 of both.
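A minimal sketch of the dynamic split: RAT slots are handed out from one shared pool in binding order, and the binding type plays no role, so only the combined count matters. It assumes render targets occupy the lowest slots in fragment shaders (the hardware slot order is an assumption here), and `uav_binding` and `assign_rats` are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

#define TERASCALE_RAT_COUNT 12u

typedef enum { BINDING_STORAGE_BUFFER, BINDING_TEXEL_BUFFER } binding_kind;

typedef struct {
    binding_kind kind; /* intentionally ignored by the allocator */
    uint32_t rat;      /* output: assigned RAT slot */
} uav_binding;

/* Assigns consecutive RAT slots to storage and texel buffer bindings from a
 * single pool. Assumes RTVs take the first rtv_count slots (0 in compute).
 * Returns false when the combined count exceeds the remaining slots. */
bool assign_rats(uav_binding *bindings, uint32_t count, uint32_t rtv_count)
{
    if (rtv_count > TERASCALE_RAT_COUNT)
        return false;
    uint32_t next = rtv_count;
    for (uint32_t i = 0; i < count; i++) {
        if (next >= TERASCALE_RAT_COUNT)
            return false; /* pool exhausted */
        bindings[i].rat = next++;
    }
    return true;
}
```

With no render targets, any mix of 8 storage and texel bindings uses slots 0 to 7, which is why advertising 8 of each is safe for Direct3D 11 workloads even though 16 could never be bound at once.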
However, while the data views themselves can't overflow the RAT limit, counters are implemented as separate storage buffers, so if a game uses more than 4 UAVs with counters plus 4 UAVs without counters, more than 6 UAVs with counters alone, or some combination in between, there won't be enough RAT slots for both the data and the counters.
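The overflow condition above is simple counting: each UAV needs one RAT for its data, and each UAV with a counter needs a second one for the counter's storage buffer. A sketch with hypothetical function names, reusing the cases from the text:

```c
#include <stdbool.h>
#include <stdint.h>

#define TERASCALE_RAT_COUNT 12u

/* One RAT per UAV for data, plus one more per UAV with a counter. */
uint32_t rats_needed(uint32_t uavs_without_counter, uint32_t uavs_with_counter)
{
    return uavs_without_counter + 2u * uavs_with_counter;
}

/* True if the UAVs and their counters fit into 12 RATs minus the RTVs. */
bool fits_in_rats(uint32_t uavs_without_counter, uint32_t uavs_with_counter,
                  uint32_t rtv_count)
{
    if (rtv_count > TERASCALE_RAT_COUNT)
        return false;
    return rats_needed(uavs_without_counter, uavs_with_counter) <=
           TERASCALE_RAT_COUNT - rtv_count;
}
```

For example, 4 UAVs with counters and 4 without need exactly 12 RATs, and 6 with counters alone also need 12, while adding one more UAV with a counter to either case needs 14.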
Specifically for UAV counters, Terakan could do something extremely dirty: detect whether the storage buffer is used purely as a Direct3D 11.0 UAV counter (this needs some investigation, but possibly if it's not indexed dynamically and the only way it's accessed is an atomic increment of the constant location 0), and if it is and doesn't fit in the 12 RAT slots, demote it to a GDS counter using `DS_INST_INC`. With this approach, however, DXVK needs to be lied to that Terakan supports 16 storage buffers per stage.
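A minimal sketch of the demotion decision, assuming a hypothetical shader analysis pass has already summarized how each storage buffer binding is used. The detection criteria are the tentative ones from the paragraph above, and `ssbo_usage`, `is_d3d11_uav_counter` and `place_storage_buffer` are illustrative names, not Terakan or Mesa code. The actual lowering to `DS_INST_INC` is not shown.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical summary produced by an analysis pass over the shader. */
typedef struct {
    bool dynamically_indexed;   /* binding index is not a compile-time constant */
    bool only_atomic_increment; /* every access is an atomic increment */
    bool only_offset_zero;      /* every access targets constant location 0 */
} ssbo_usage;

typedef enum {
    PLACE_IN_RAT,         /* regular storage buffer in a free RAT slot */
    PLACE_IN_GDS_COUNTER, /* demoted to a GDS counter */
    PLACE_DOES_NOT_FIT    /* no RAT left and not a pure UAV counter */
} placement;

/* Tentative test for a buffer used purely as a Direct3D 11.0 UAV counter. */
bool is_d3d11_uav_counter(const ssbo_usage *u)
{
    return !u->dynamically_indexed && u->only_atomic_increment &&
           u->only_offset_zero;
}

/* RATs are preferred; GDS is only a fallback for pure counters. */
placement place_storage_buffer(const ssbo_usage *u, uint32_t rats_used,
                               uint32_t rats_available)
{
    if (rats_used < rats_available)
        return PLACE_IN_RAT;
    if (is_d3d11_uav_counter(u))
        return PLACE_IN_GDS_COUNTER;
    return PLACE_DOES_NOT_FIT;
}
```

Keeping GDS as a fallback rather than the default limits the hack to the cases where the RAT budget is actually exceeded.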
Most likely, though, DXVK will be forked for R8xx instead. I can't imagine all that hacky mess above being accepted into the Mesa repository.