terakan: Samplers with unnormalized coordinates via shader patching on R8xx
R9xx introduces the FORCE_UNNORMALIZED bit in samplers that directly corresponds to the unnormalizedCoordinates option in Vulkan samplers, similarly to how it's implemented on GCN. However, on R8xx, unnormalized coordinates are toggled only in fetch instructions. This maps nicely to sampler2DRect in OpenGL, but requires additional work to support Vulkan's per-sampler setting for samplers not bound as immutable in the pipeline layout.
For dynamically indexed samplers, an if/else needs to be generated, likely in the NIR pipeline layout lowering that converts Vulkan bindings to texture_index/sampler_index and offsets, invoking the NIR texture instruction with either normalized or unnormalized coordinates. The bits indicating which samplers need unnormalized coordinates can be passed alongside push constants, one uint32 per shader stage.
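As a rough illustration of the per-stage mask, the sketch below builds the uint32 on the CPU and evaluates the condition that the generated if/else would test for a dynamic sampler index. It is plain C rather than NIR, and every name in it (the example_ functions, the 32-sampler limit) is a hypothetical chosen for the example, not something taken from Terakan.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical limit: one bit per sampler slot visible to a shader stage. */
#define EXAMPLE_MAX_STAGE_SAMPLERS 32u

/* Build the per-stage mask from the unnormalizedCoordinates flags of the
 * currently bound samplers. Bit i is set if sampler slot i needs
 * unnormalized coordinates. */
static uint32_t
example_build_unnormalized_mask(const bool *sampler_unnormalized,
                                unsigned count)
{
   assert(count <= EXAMPLE_MAX_STAGE_SAMPLERS);
   uint32_t mask = 0;
   for (unsigned i = 0; i < count; i++) {
      if (sampler_unnormalized[i])
         mask |= UINT32_C(1) << i;
   }
   return mask;
}

/* The condition the lowered shader would branch on: true selects the
 * texture instruction with unnormalized coordinates forced, false the
 * normalized one. */
static bool
example_sampler_is_unnormalized(uint32_t mask, uint32_t sampler_index)
{
   assert(sampler_index < EXAMPLE_MAX_STAGE_SAMPLERS);
   return ((mask >> sampler_index) & 1u) != 0;
}
```

A single uint32 limits this scheme to 32 tracked samplers per stage; whether that is enough depends on limits the text above does not state.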
Whether a NIR texture instruction should have unnormalized coordinates forced can be specified in its backend_flags. This can be done for both the dynamic indexing conditional and immutable samplers with unnormalized coordinates.
However, many 2D texture sampling instructions (including ImageSampleExplicitLod with explicit gradients, thus possible even in a 3D world scenario) can potentially be used with unnormalized coordinates, even though that happens very rarely in reality. Emitting a GPU-side conditional for every such instruction would penalize the common case, so that option may not be very viable for constant-indexed samplers.
Instead, sample instructions in shaders should be patched depending on whether the currently bound samplers need unnormalized coordinates. Fully recompiling shaders is not an option because that's conceptually prohibited by the performance implications and the validation of VK_EXT_shader_object, and would be excessive anyway, but merely creating a copy of the shader with instructions at specified addresses having a few bits changed likely should be quick enough.
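The patching step could look roughly like the sketch below: copy the microcode, then clear the normalized-coordinate bits in the dwords recorded for the affected instructions. The bit mask here is a placeholder and does not claim to be the real R8xx fetch instruction encoding, which would have to come from the ISA documentation. The function name and the use of dword indices as "addresses" are likewise assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Placeholder bit layout, NOT the real R8xx encoding: the bits that mark
 * coordinates as normalized in the patched dword. */
#define EXAMPLE_NORMALIZED_BITS UINT32_C(0xF0000000)

/* Returns a malloc'ed copy of the microcode in which the normalized bits
 * are cleared in every dword listed in patch_dwords. The original code is
 * left untouched. Returns NULL if the allocation fails. */
static uint32_t *
example_patch_shader_copy(const uint32_t *code, size_t dword_count,
                          const uint32_t *patch_dwords, size_t patch_count)
{
   uint32_t *copy = malloc(dword_count * sizeof(*copy));
   if (!copy)
      return NULL;
   memcpy(copy, code, dword_count * sizeof(*copy));

   for (size_t i = 0; i < patch_count; i++) {
      /* Patch locations come from parsing the compiled shader, so they
       * must lie inside it. */
      assert(patch_dwords[i] < dword_count);
      copy[patch_dwords[i]] &= ~EXAMPLE_NORMALIZED_BITS;
   }
   return copy;
}
```

The cost is one memcpy plus a handful of word writes, which is the reason the text expects this to be much cheaper than recompilation.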
Shaders should initially be compiled and uploaded with unnormalized coordinates disabled in all non-Rect sampling instructions. The resulting microcode needs to be placed in CPU-cached memory where it can be read back. Reads should happen rarely, so just making shader BOs in host-visible device-local memory host-cached on R8xx should be fine to avoid duplication, at least unless Terakan starts using DMA to upload shaders to non-host-visible memory. After compiling, the microcode needs to be parsed to locate all sampling instructions referencing constant-indexed non-immutable samplers, and the addresses of those instructions need to be stored in arrays for each of the samplers.
Each shader needs to maintain a hash map of its variations, keyed by the mask of relevant currently bound samplers with unnormalized coordinates and protected by a reader/writer lock. If the shader or the bindings with unnormalized coordinates are changed, and any required sampler needs unnormalized coordinates, an RL-check/WL-recheck/WL-update needs to be done to ensure there's a copy of the shader with the needed unnormalized coordinate flags: look the mask up under the read lock, and on a miss take the write lock, look it up again, and only then create and insert the patched copy.