matroskademux: subrip subtitles can be rendered with XML tags
About the issue:
We have a file, SampleVideo_640x480_1mb.mkv, that was muxed with ffmpeg and contains SubRip subtitles (taken from an srt file).
VLC can play this file without issues
But if we play it with gst-play-1.0, we can see the XML tags rendered together with the subtitle text
Subtitles are correctly muxed and have codec id of the SubRip format: S_TEXT/UTF8
When the matroskademux element opens this file, it exposes pango-markup caps for the subtitle stream:
...
  if (!strcmp (codec_id, GST_MATROSKA_CODEC_ID_SUBTITLE_UTF8)) {
    /* well, plain text simply does not have a lot of markup ... */
    caps = gst_caps_new_simple ("text/x-raw", "format", G_TYPE_STRING,
        "pango-markup", NULL);
    context->postprocess_frame = gst_matroska_demux_check_subtitle_buffer;
    subtitlecontext->check_markup = TRUE;
...
Declaring SubRip to be pango markup is wrong: SubRip has its own markup, which is not always compatible with pango, and the attached file is one such case.
About a fix:
If we open an srt file with the same subtitles in GStreamer, the issue does not occur, because the input is handled by the subparse element, which converts different subtitle formats to pango-markup and also throws away unknown markup.
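To make "converts to pango-markup and throws away unknown markup" concrete, here is a minimal sketch of that kind of filtering in plain C. It is not the subparse implementation: the function names are hypothetical, only `<b>`, `<i>` and `<u>` are passed through (lowercased, since Pango markup is case-sensitive), every other tag such as `<font ...>` is dropped, and `&`, stray `<` and `>` are escaped. It does not try to balance unclosed tags, which a real converter would also have to deal with.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not GStreamer code): returns 1 if the tag body
 * (the text between '<' and '>') is b, i or u, optionally with '/'. */
static int
is_passthrough_tag (const char *tag, size_t len)
{
  char c;

  if (len > 0 && tag[0] == '/') {
    tag++;
    len--;
  }
  if (len != 1)
    return 0;
  c = (char) tolower ((unsigned char) tag[0]);
  return c == 'b' || c == 'i' || c == 'u';
}

/* Appends n bytes of src to dst, keeping dst NUL-terminated.
 * Returns 0 if the result would not fit. */
static int
append (char *dst, size_t dst_size, size_t *pos, const char *src, size_t n)
{
  if (*pos + n >= dst_size)
    return 0;
  memcpy (dst + *pos, src, n);
  *pos += n;
  dst[*pos] = '\0';
  return 1;
}

/* Hypothetical sketch: turns SubRip-style text into a Pango-safe string.
 * Returns 1 on success, 0 if dst is too small. */
int
subrip_to_pango_sketch (const char *in, char *dst, size_t dst_size)
{
  size_t pos = 0;

  assert (in != NULL && dst != NULL && dst_size > 0);
  dst[0] = '\0';

  while (*in != '\0') {
    /* Only treat '<' as a tag start if a letter or '/' follows it. */
    if (*in == '<' && (isalpha ((unsigned char) in[1]) || in[1] == '/')) {
      const char *end = strchr (in, '>');

      if (end != NULL) {
        size_t len = (size_t) (end - in - 1);

        if (is_passthrough_tag (in + 1, len)) {
          char tag[4];
          size_t t = 0;

          /* Re-emit the tag in lowercase: "<x>" or "</x>". */
          tag[t++] = '<';
          if (in[1] == '/')
            tag[t++] = '/';
          tag[t++] = (char) tolower ((unsigned char) end[-1]);
          tag[t++] = '>';
          if (!append (dst, dst_size, &pos, tag, t))
            return 0;
        }
        /* Any other tag is silently dropped. */
        in = end + 1;
        continue;
      }
    }

    /* Plain text: escape the characters that are special in markup. */
    if (*in == '&') {
      if (!append (dst, dst_size, &pos, "&amp;", 5))
        return 0;
    } else if (*in == '<') {
      if (!append (dst, dst_size, &pos, "&lt;", 4))
        return 0;
    } else if (*in == '>') {
      if (!append (dst, dst_size, &pos, "&gt;", 4))
        return 0;
    } else if (!append (dst, dst_size, &pos, in, 1)) {
      return 0;
    }
    in++;
  }
  return 1;
}
```

With this sketch, an input like `<font color="#ff0000">Hi</font> <i>there</i>` comes out as `Hi <i>there</i>`, which is the behaviour we are missing when matroskademux labels the raw SubRip text as pango-markup.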
The fix proposed in the MRs is to make the subparse element autoplug after matroskademux and perform the SubRip --> pango-markup conversion. This takes two changes:
- Instead of "pango-markup", expose a new format from matroskademux; let's call it "text/x-subrip-muxed" (does anyone know whether an existing format already covers this?). NOTE: We can't reuse "application/x-subtitle", which is used for srt files, because it is slightly different: data in that format is expected to carry the cue number and timestamp inside the text. MR to the matroskademux element (gst-plugins-good).
- Make the subparse element handle "text/x-subrip-muxed". MR to the subparse element (gst-plugins-base).
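To illustrate the NOTE about "application/x-subtitle": a cue in an srt file starts with a counter line and a `-->` timing line, whereas the muxed data only carries the text that follows them, with timing coming from the container. The sketch below shows that difference in plain C. The function name is hypothetical and the parsing is deliberately naive; it is only meant to show why the two payloads cannot share one caps type.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not GStreamer code): given one cue as it appears in
 * an srt file, returns a pointer to the text payload that follows the
 * counter line and the "-->" timing line. Returns NULL if the input does
 * not start with those two lines, e.g. for already-demuxed text. */
const char *
srt_cue_payload_sketch (const char *cue)
{
  const char *first_nl;
  const char *timing_end;
  const char *arrow;

  assert (cue != NULL);

  /* Line 1: the cue counter. */
  first_nl = strchr (cue, '\n');
  if (first_nl == NULL)
    return NULL;

  /* Line 2: must contain the "-->" separator before its newline. */
  timing_end = strchr (first_nl + 1, '\n');
  arrow = strstr (first_nl + 1, "-->");
  if (timing_end == NULL || arrow == NULL || arrow > timing_end)
    return NULL;

  return timing_end + 1;
}
```

For a cue such as `"1\n00:00:01,000 --> 00:00:04,000\n<i>Hello</i>\n"`, the returned pointer refers to `"<i>Hello</i>\n"`, which corresponds to what would travel as "text/x-subrip-muxed" in this proposal.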