intel/brw: q2rtx RT pipeline fails EU validation
Running fossilize-replay --compute-pipeline-range 1 2 fossils/q2rtx/q2rtx-rt-pipeline.976f4ab1c0fee975.1.foz
results in:
ASSERT: Scalar CS validation failed!
load_payload(8) v39+0.0:HF, v40+0.0:HF, v40+1.0:HF, v40+2.0:HF, (null):UD
../../SOURCE/master/src/intel/compiler/brw_fs_validate.cpp:189: A <= B failed
A = inst->dst.offset / REG_SIZE + regs_written(inst) = 3
B = s.alloc.sizes[inst->dst.nr] = 2
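For reference, the arithmetic behind the failed check can be reproduced in isolation. The snippet below is a minimal standalone model, not Mesa code: `dst_write_fits` is a hypothetical helper, and `REG_SIZE = 32` is an assumption about the register size in bytes. The inputs come from the log above: the destination is v39+0.0, so the offset is 0 and `regs_written(inst)` must be 3 to give A = 3, while the allocation for v39 is 2 registers.

```python
# Standalone model of the bounds check that fired; not Mesa code.
REG_SIZE = 32  # bytes per register; assumed value, not taken from the log


def dst_write_fits(dst_offset_bytes, regs_written, alloc_size_regs):
    """Return (A, B, ok) for the check A <= B reported by the assertion."""
    a = dst_offset_bytes // REG_SIZE + regs_written
    b = alloc_size_regs
    return a, b, a <= b


# Values from the log: dst offset 0, three registers written, two allocated.
a, b, ok = dst_write_fits(dst_offset_bytes=0, regs_written=3, alloc_size_regs=2)
print(f"A = {a}, B = {b}, fits = {ok}")
```

With these inputs the instruction writes one register past the end of the destination's allocation, which is what the validator rejects. Whether the allocation is too small or the written size is too large is not something this model can tell.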
Bisected to:
$ git bisect good
0116430d394c2509fedff9f3accce6445349a091 is the first bad commit
commit 0116430d394c2509fedff9f3accce6445349a091 (main)
Author: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Date: Tue Jul 30 23:04:34 2024 -0700
intel/brw: Handle 16-bit sampler return payloads
API requires samplers to return 32-bit even though hardware can handle
16-bit floating point, so we detect that case and make more efficient
use of memory BW. This is helping improve performance of encode and
decode tokens during LLM by at least 5% across multiple platforms.
Thank you Kenneth Graunke for suggesting and guiding me throughout
this implementation.
Signed-off-by: Sushma Venkatesh Reddy <sushma.venkatesh.reddy@intel.com>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Part-of: <https://gitlab.freedesktop.org/mesa/mesa/-/merge_requests/30447>
src/intel/compiler/brw_fs_nir.cpp | 21 +++++++++++++++------
src/intel/compiler/brw_lower_logical_sends.cpp | 9 ++++++---
src/intel/compiler/brw_nir.c | 6 ++++++
3 files changed, 27 insertions(+), 9 deletions(-)
Since the bisected commit passed every other test suite, a Crucible test covering this case should be added once the root cause is found.
CC: @kwg