Add a couple of very large temporary array tests

These tickle a corner case on Intel hardware because they force a
non-zero "Per Thread Scratch Space" (PTSS) value and access scratch
outside the first 1K. (A PTSS value of zero indicates 1K of scratch.)
I pity the fool who tries to run them without a large-array-to-scratch
optimization.
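
For illustration only, here is a rough sketch of how a scratch size might
map to the PTSS field. It assumes the field is a power-of-two exponent
starting at 1K (0 = 1K, 1 = 2K, 2 = 4K, ...); only the "zero means 1K"
part comes from the text above. The helper name ptss_encode is made up
for this example and is not taken from any driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper, not from any driver.
 *
 * Assumption: the PTSS field encodes a power-of-two size starting at
 * 1K, so the field value is log2(size) - 10 after rounding the request
 * up to the next power of two (minimum 1K).
 */
static uint32_t ptss_encode(uint32_t scratch_bytes)
{
    uint32_t size = 1024;   /* smallest allocation, field value 0 */
    uint32_t field = 0;

    /* Guard against overflow of the doubling loop below. */
    assert(scratch_bytes <= (1u << 31));

    while (size < scratch_bytes) {
        size <<= 1;
        field++;
    }
    return field;
}
```

Under that assumption, anything that fits in 1K leaves the field at zero,
while a temporary array needing 4K of scratch would give a field value
of 2, which is the non-zero case these tests are meant to exercise.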