neon: Fix discrepancy when using loadupdb in 32-bit
Hello,
This MR fixes the discrepancy/regression in the gst-plugins-base libs/video test `test_video_pack_unpack2`.
The discrepancy occurs when the output array is 8-byte aligned but not 16-byte aligned and the loop shift is 1. In this case, ORC first runs a single iteration at loop shift 0 in order to obtain a 16-byte aligned destination array. This iteration increments the output offset (i) by 1. However, since the loadupdb operation reads array[i>>1], the input position stays the same (1>>1 is 0) while the output has advanced, so the output ends up shifted.
We correct the problem by shifting the output of loadupdb by 1 when the output offset is odd.
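
To illustrate, here is a minimal scalar sketch of the problem and the correction. It is not the NEON code generated by ORC: `loadupdb_ref`, `loadupdb_model`, and the `peel` and `fix` parameters are hypothetical names used only for this illustration, and the "vector" body handles just two outputs per iteration to stand in for loop shift 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Reference semantics of loadupdb: every source byte is used twice. */
static void loadupdb_ref(uint8_t *d, const uint8_t *s, size_t n)
{
    for (size_t i = 0; i < n; i++)
        d[i] = s[i >> 1];
}

/*
 * Simplified model (assumption, not ORC's real code path):
 *   peel - number of single-element iterations run first for alignment
 *   fix  - when non-zero, shift the duplicated output by 1 for odd offsets
 */
static void loadupdb_model(uint8_t *d, const uint8_t *s, size_t n,
                           size_t peel, int fix)
{
    size_t i = 0;

    /* Alignment iterations at loop shift 0. */
    for (; i < peel && i < n; i++)
        d[i] = s[i >> 1];

    /* Main body: two outputs per iteration. */
    for (; i + 2 <= n; i += 2) {
        const uint8_t *p = s + (i >> 1);
        d[i] = p[0];
        /* Without the fix, p[0] is duplicated even when i is odd, which is
         * wrong: the second output then belongs to the next source byte. */
        d[i + 1] = (fix && (i & 1)) ? p[1] : p[0];
    }

    /* Remaining elements. */
    for (; i < n; i++)
        d[i] = s[i >> 1];
}
```

With a source of `{10, 20, 30, 40}`, eight outputs, and one alignment iteration, the unfixed model writes `10` into `d[2]` where the reference has `20`; with `fix` set, the output matches the reference.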
The output of the libs/video test is now the same on 32-bit and 64-bit ARM. This should solve #32.
Thank you!