agx: Insert jmp_exec_none instructions
With the exception of the backwards branch for loops, all the control flow we
insert during instruction selection just predicates instructions rather than
actually jumping around. That means, for example, we execute both sides of the
if even for a uniform condition! That's inefficient. The solution is insert
jmp_exec_none instructions after control flow in order to skip unexecuted
regions, which is much faster than predicating them out. However, jmp_exec_none
is costly in itself, so we need to use a heuristic to determine when it's
actually beneficial.
This uses a very simple heuristic for this purpose. However, it is a massive
performance speed-up for Dolphin uber shaders: 39fps -> 67fps at 2x resolution.
Nearly a doubling of performance!