msdk: Kernel Out-of-memory (OOM) triggered often
While encoding with msdk plugin, occasionally it fails for one reason or another. When encoding fails, gst msdk goes into an infinite loop which results in rapid incremental usage of system memory until all memory is depleted and it triggers system OOM error. I started observing this issue regularly on our internal CI around the second week of December (i.e. causing our test machines to hit OOM at least 3 or more times a day). But it's possible the problem has always been there and only less likely to encounter before.
The following output is an example of one instance where encoding failed and causes the aforementioned behavior:
$ LIBVA_DRIVER_NAME=iHD gst-launch-1.0 -vf filesrc \
location=/opt/media/src/assets/otc-media/yuv/MOBILE_352x288_30_orig_01_352x288_I420.yuv num-buffers=300 \
! rawvideoparse format=i420 width=352 height=288 framerate=25 \
! msdkh265enc rate-control=cbr bitrate=500 gop-size=30 num-slices=4 b-frames=2 hardware=true \
! video/x-h265,profile=main ! h265parse ! filesink location=output.h265
Setting pipeline to PAUSED ...
libva info: VA-API version 1.4.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/media/install/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
Pipeline is PREROLLING ...
Got context from element 'msdkh265enc0': gst.msdk.Context=context, gst.msdk.Context=(GstMsdkContext)"\(GstMsdkContext\)\ msdkcontext0";
/GstPipeline:pipeline0/GstRawVideoParse:rawvideoparse0.GstPad:src: caps = video/x-raw, format=(string)I420, width=(int)352, height=(int)288, interlace-mode=(string)progressive, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)jpeg, colorimetry=(string)bt601, framerate=(fraction)25/1
Redistribute latency...
/GstPipeline:pipeline0/GstMsdkH265Enc:msdkh265enc0.GstPad:sink: caps = video/x-raw, format=(string)I420, width=(int)352, height=(int)288, interlace-mode=(string)progressive, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)jpeg, colorimetry=(string)bt601, framerate=(fraction)25/1
/GstPipeline:pipeline0/GstMsdkH265Enc:msdkh265enc0.GstPad:src: caps = video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main, level=(string)2, width=(int)352, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)25/1, interlace-mode=(string)progressive, colorimetry=(string)bt601, chroma-site=(string)jpeg
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:src: caps = video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main, level=(string)2, width=(int)352, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)25/1, interlace-mode=(string)progressive, colorimetry=(string)bt601, chroma-site=(string)jpeg
/GstPipeline:pipeline0/GstH265Parse:h265parse0.GstPad:sink: caps = video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main, level=(string)2, width=(int)352, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)25/1, interlace-mode=(string)progressive, colorimetry=(string)bt601, chroma-site=(string)jpeg
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:sink: caps = video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main, level=(string)2, width=(int)352, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)25/1, interlace-mode=(string)progressive, colorimetry=(string)bt601, chroma-site=(string)jpeg
(gst-launch-1.0:29670): GStreamer-CRITICAL **: 16:41:29.475: gst_buffer_resize_range: assertion 'bufmax >= bufoffs + offset + size' failed
(gst-launch-1.0:29670): GStreamer-CRITICAL **: 16:41:29.478: gst_buffer_resize_range: assertion 'bufmax >= bufoffs + offset + size' failed
/GstPipeline:pipeline0/GstH265Parse:h265parse0.GstPad:src: caps = video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main, level=(string)2, width=(int)352, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)25/1, interlace-mode=(string)progressive, colorimetry=(string)bt601, chroma-site=(string)jpeg, chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8, parsed=(boolean)true, tier=(string)main
[2019-01-03 16:41:29] /GstPipeline:pipeline0/GstFileSink:filesink0.GstPad:sink: caps = video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main, level=(string)2, width=(int)352, height=(int)288, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)25/1, interlace-mode=(string)progressive, colorimetry=(string)bt601, chroma-site=(string)jpeg, chroma-format=(string)4:2:0, bit-depth-luma=(uint)8, bit-depth-chroma=(uint)8, parsed=(boolean)true, tier=(string)main
(gst-launch-1.0:29670): GStreamer-CRITICAL **: 16:41:29.481: gst_buffer_resize_range: assertion 'bufmax >= bufoffs + offset + size' failed
Pipeline is PREROLLED ...
Setting pipeline to PLAYING ...
New clock: GstSystemClock
(gst-launch-1.0:29670): GStreamer-CRITICAL **: 16:41:29.483: gst_buffer_resize_range: assertion 'bufmax >= bufoffs + offset + size' failed
(gst-launch-1.0:29670): GStreamer-CRITICAL **: 16:41:29.487: gst_buffer_resize_range: assertion 'bufmax >= bufoffs + offset + size' failed
<snip> previous message repeats many times </snip>
(gst-launch-1.0:29670): GStreamer-CRITICAL **: 16:41:38.167: gst_buffer_resize_range: assertion 'bufmax >= bufoffs + offset + size' failed
ERROR: from element /GstPipeline:pipeline0/GstMsdkH265Enc:msdkh265enc0: Encode frame failed.
Additional debug info:
gstmsdkenc.c(760): gst_msdkenc_encode_frame (): /GstPipeline:pipeline0/GstMsdkH265Enc:msdkh265enc0:
MSDK encode error (undefiend error)
Execution ended after 0:00:08.685737328
Setting pipeline to PAUSED ...
Setting pipeline to READY ...
Unfortunately, the above encode pipeline does not always fail consistently. Sometimes other encode pipelines fail intermittently, too. So it can be difficult to reproduce the behavior with a legal pipeline. In general, I've observed that the failure seems to coincide with a kernel GPU reset/hang.
Fortunately, to demonstrate the core issue around the gst msdk error flow logic that triggers OOM, the following hevc 10 bit encode example can trigger the problem 100% when executing on Skylake (SKL) platform which does not support this codec:
$ LIBVA_DRIVER_NAME=iHD GST_DEBUG=msdkenc:255 gst-launch-1.0 -vf \
filesrc location=/opt/media/assets/otc-media/otc-media/yuv/Allegro_HEVC_Main10_MT51_INTER_02_192x200@60Hz_1.5_P010.yuv \
num-buffers=300 ! rawvideoparse format=p010-10le width=192 height=200 framerate=25 \
! msdkh265enc rate-control=cbr bitrate=500 gop-size=30 num-slices=1 b-frames=0 hardware=true \
! video/x-h265,profile=main-10 ! h265parse ! filesink location=output.h265
Setting pipeline to PAUSED ...
libva info: VA-API version 1.4.0
libva info: va_getDriverName() returns 0
libva info: User requested driver 'iHD'
libva info: Trying to open /opt/media/install/lib/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_4
libva info: va_openDriver() returns 0
0:00:00.017432694 2789 0x55d224afb270 INFO msdkenc gstmsdkenc.c:1401:gst_msdkenc_start:<msdkh265enc0> Creating new context <msdkcontext0>
Pipeline is PREROLLING ...
Got context from element 'msdkh265enc0': gst.msdk.Context=context, gst.msdk.Context=(GstMsdkContext)"\(GstMsdkContext\)\ msdkcontext0";
/GstPipeline:pipeline0/GstRawVideoParse:rawvideoparse0.GstPad:src: caps = video/x-raw, format=(string)P010_10LE, width=(int)192, height=(int)200, interlace-mode=(string)progressive, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)jpeg, colorimetry=(string)bt601, framerate=(fraction)25/1
0:00:00.018652197 2789 0x55d224ac6770 INFO msdkenc gstmsdkenc.c:1031:gst_msdkenc_set_format:<msdkh265enc0> This MSDK encoder uses video memory
0:00:00.018844247 2789 0x55d224ac6770 WARN msdkenc gstmsdkenc.c:424:gst_msdkenc_init_encoder:<msdkh265enc0> Video Encode Query returned: incompatible video parameters
0:00:00.018899187 2789 0x55d224ac6770 DEBUG msdkenc gstmsdkenc.c:461:gst_msdkenc_init_encoder:<msdkh265enc0> Required 5 surfaces (5 suggested), allocated 5
0:00:00.020921741 2789 0x55d224ac6770 DEBUG msdkenc gstmsdkenc.c:865:gst_msdkenc_set_src_caps:<msdkh265enc0> output caps: video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main-10, level=(string)2
0:00:00.021032050 2789 0x55d224ac6770 INFO msdkenc gstmsdkenc.c:805:gst_msdkenc_set_latency:<msdkh265enc0> Updating latency to 0:00:00.160000000 (4 frames)
Redistribute latency...
/GstPipeline:pipeline0/GstMsdkH265Enc:msdkh265enc0.GstPad:sink: caps = video/x-raw, format=(string)P010_10LE, width=(int)192, height=(int)200, interlace-mode=(string)progressive, pixel-aspect-ratio=(fraction)1/1, chroma-site=(string)jpeg, colorimetry=(string)bt601, framerate=(fraction)25/1
/GstPipeline:pipeline0/GstMsdkH265Enc:msdkh265enc0.GstPad:src: caps = video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main-10, level=(string)2, width=(int)192, height=(int)200, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)25/1, interlace-mode=(string)progressive, colorimetry=(string)bt601, chroma-site=(string)jpeg
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:src: caps = video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main-10, level=(string)2, width=(int)192, height=(int)200, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)25/1, interlace-mode=(string)progressive, colorimetry=(string)bt601, chroma-site=(string)jpeg
/GstPipeline:pipeline0/GstH265Parse:h265parse0.GstPad:sink: caps = video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main-10, level=(string)2, width=(int)192, height=(int)200, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)25/1, interlace-mode=(string)progressive, colorimetry=(string)bt601, chroma-site=(string)jpeg
/GstPipeline:pipeline0/GstCapsFilter:capsfilter0.GstPad:sink: caps = video/x-h265, stream-format=(string)byte-stream, alignment=(string)au, profile=(string)main-10, level=(string)2, width=(int)192, height=(int)200, pixel-aspect-ratio=(fraction)1/1, framerate=(fraction)25/1, interlace-mode=(string)progressive, colorimetry=(string)bt601, chroma-site=(string)jpeg
0:00:00.046626052 2789 0x55d224ac6770 ERROR default gstmsdkvideomemory.c:77:gst_msdk_video_allocator_get_surface: failed to get surface available
0:00:00.046646913 2789 0x55d224ac6770 ERROR msdkenc gstmsdkenc.c:1103:gst_msdkenc_get_surface_from_pool:<msdkbufferpool0> failed to acquire a buffer from pool
0:00:00.046654233 2789 0x55d224ac6770 ERROR msdkenc gstmsdkenc.c:1357:gst_msdkenc_handle_frame:<msdkh265enc0> Surface pool is full
0:00:00.046778609 2789 0x55d224ac6770 WARN msdkenc gstmsdkenc.c:832:gst_msdkenc_flush_frames:<msdkh265enc0> error: Encode frame failed.
0:00:00.046787030 2789 0x55d224ac6770 WARN msdkenc gstmsdkenc.c:832:gst_msdkenc_flush_frames:<msdkh265enc0> error: MSDK encode error (the HW acceleration device is busy)
ERROR: from element /GstPipeline:pipeline0/GstRawVideoParse:rawvideoparse0: Internal data stream error.
0:00:00.047086777 2789 0x55d224ac6770 WARN msdkenc gstmsdkenc.c:832:gst_msdkenc_flush_frames:<msdkh265enc0> error: Encode frame failed.
Additional debug info:
gstbaseparse.c(3634): gst_base_parse_loop (): /GstPipeline:pipeline0/GstRawVideoParse:rawvideoparse0:
streaming stopped, reason error (-5)
0:00:00.047098930 2789 0x55d224ac6770 WARN msdkenc gstmsdkenc.c:832:gst_msdkenc_flush_frames:<msdkh265enc0> error: MSDK encode error (the HW acceleration device is busy)
ERROR: pipeline doesn't want to preroll.
Setting pipeline to NULL ...
0:00:00.047140554 2789 0x55d224ac6770 WARN msdkenc gstmsdkenc.c:832:gst_msdkenc_flush_frames:<msdkh265enc0> error: Encode frame failed.
0:00:00.047153897 2789 0x55d224ac6770 WARN msdkenc gstmsdkenc.c:832:gst_msdkenc_flush_frames:<msdkh265enc0> error: MSDK encode error (the HW acceleration device is busy)
0:00:00.047171944 2789 0x55d224ac6770 WARN msdkenc gstmsdkenc.c:832:gst_msdkenc_flush_frames:<msdkh265enc0> error: Encode frame failed.
0:00:00.047180420 2789 0x55d224ac6770 WARN msdkenc gstmsdkenc.c:832:gst_msdkenc_flush_frames:<msdkh265enc0> error: MSDK encode error (the HW acceleration device is busy)
0:00:00.047195658 2789 0x55d224ac6770 WARN msdkenc gstmsdkenc.c:832:gst_msdkenc_flush_frames:<msdkh265enc0> error: Encode frame failed.
<snip> The previous message repeats indefinitely </snip>
Software Stack:
libva (master) heads/master-0-g382acf1 https://github.com/01org/libva
gmmlib (master) intel-gmmlib-18.4.1-0-g413896e https://github.com/intel/gmmlib
intel-vaapi-driver (master) heads/master-0-g72f10f1 https://github.com/01org/intel-vaapi-driver
intel-media-driver (master) heads/master-0-g8968471 https://github.com/intel/media-driver
msdk (master) heads/master-0-g750cfb4 https://github.com/Intel-Media-SDK/MediaSDK
gstreamer (master) heads/master-0-g74b9e72 http://anongit.freedesktop.org/git/gstreamer/gstreamer
gst-plugins-base (master) heads/master-0-ga95ab79 https://anongit.freedesktop.org/git/gstreamer/gst-plugins-base
gst-plugins-good (master) heads/master-0-gec93160 https://anongit.freedesktop.org/git/gstreamer/gst-plugins-good
gst-plugins-bad (master) heads/master-0-g2f05ee9 https://anongit.freedesktop.org/git/gstreamer/gst-plugins-bad
gstreamer-vaapi (master) heads/master-0-ge4ae7d9 https://anongit.freedesktop.org/git/gstreamer/gstreamer-vaapi