encode: Refine the encoder's sink caps.
The old way of building the encoder's sink caps is not correct. For example, the H.264 encoder gets:
video/x-raw(memory:VASurface), format=(string){ ENCODED, NV12, I420, YV12, YUY2, UYVY, Y210, P010_10LE, AYUV, Y410, Y444 }, width=(int)[ 32, 4096 ], height=(int)[ 32, 4096 ], framerate=(fraction)[ 0/1, 2147483647/1 ]; video/x-raw(memory:DMABuf), format=(string){ I420, YV12, RGBA }, width=(int)[ 32, 4096 ], height=(int)[ 32, 4096 ], framerate=(fraction)[ 0/1, 2147483647/1 ]; video/x-raw, format=(string){ NV12 }, width=(int)[ 32, 4096 ], height=(int)[ 32, 4096 ], framerate=(fraction)[ 0/1, 2147483647/1 ]
where the format lists for memory:VASurface and memory:DMABuf are superfluous. Formats such as I420, YV12, YUY2, UYVY and Y210 cannot actually be used as input formats for the encoder.
We should get:
video/x-raw(memory:VASurface), format=(string)NV12, width=(int)[ 32, 4096 ], height=(int)[ 32, 4096 ], framerate=(fraction)[ 0/1, 2147483647/1 ]; video/x-raw, format=(string){ NV12 }, width=(int)[ 32, 4096 ], height=(int)[ 32, 4096 ], framerate=(fraction)[ 0/1, 2147483647/1 ]
as the correct result.
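
The refinement can be pictured as intersecting the formats a memory type advertises with the formats the encoder really accepts as input. The following is only a minimal illustrative sketch in plain Python, not the code in this change: the helper names refine_sink_formats and format_field are hypothetical, and the H.264 input format list is an assumption taken from the expected caps above.

```python
def refine_sink_formats(advertised_formats, encoder_input_formats):
    """Keep only the advertised formats the encoder accepts, preserving order."""
    accepted = set(encoder_input_formats)
    return [fmt for fmt in advertised_formats if fmt in accepted]


def format_field(formats):
    """Render a caps-style format field: a plain value for one entry, a list otherwise."""
    if len(formats) == 1:
        return "format=(string)" + formats[0]
    return "format=(string){ " + ", ".join(formats) + " }"


# Formats the old memory:VASurface caps advertised (copied from the caps above).
va_surface_formats = [
    "ENCODED", "NV12", "I420", "YV12", "YUY2", "UYVY",
    "Y210", "P010_10LE", "AYUV", "Y410", "Y444",
]

# Assumption: the H.264 encoder accepts only NV12 as input in this example.
h264_input_formats = ["NV12"]

refined = refine_sink_formats(va_surface_formats, h264_input_formats)
print(format_field(refined))  # prints: format=(string)NV12
```

A memory type whose refined list comes out empty would contribute no caps structure at all in this sketch.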