draw-prim-rate: lots of changes to measure primitive throughput

- measure 2 non-reused vertices per primitive
- measure only 2, 4, 6, 8 varyings
- measure only 2K, 8K, 32K, 128K, 512K vertices per draw
- use a multiplication in the shader to prevent the new linker from moving the FS code to VS
- measure only fully cached vertices for now
- print one fewer decimal place in results
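
A minimal sketch of the multiplication trick, not the code from this commit. It assumes the linker only moves fragment shader expressions that are linear in the interpolated inputs, so a product of varyings has to stay in the FS. The helper name build_fs_source, the varying names v0..vN and the GLSL version are hypothetical:

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes a GLSL fragment shader that multiplies all
 * of its input varyings together. Returns the string length, or -1 if
 * the buffer is too small. */
static int
build_fs_source(char *buf, size_t size, unsigned num_varyings)
{
	size_t len = 0;
	unsigned i;
	int n;

/* Append formatted text and bail out on truncation. */
#define APPEND(...) do { \
		n = snprintf(buf + len, size - len, __VA_ARGS__); \
		if (n < 0 || (size_t)n >= size - len) \
			return -1; \
		len += (size_t)n; \
	} while (0)

	APPEND("#version 130\n");
	for (i = 0; i < num_varyings; i++)
		APPEND("in vec4 v%u;\n", i);

	/* A product of interpolated inputs is not linear in them, so
	 * (by assumption) it cannot be evaluated per vertex instead. */
	APPEND("void main()\n{\n\tgl_FragColor = vec4(1.0)");
	for (i = 0; i < num_varyings; i++)
		APPEND(" * v%u", i);
	APPEND(";\n}\n");

#undef APPEND
	return (int)len;
}
```

Calling build_fs_source(buf, sizeof(buf), 4) would produce a shader that reads v0..v3 and writes their product, so all four varyings stay live across the VS/FS interface.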