audiobasesink: prerolling late with fixed latency and rtpbin -> unsynchronized output
I am having some trouble with synchronized audio output. On the receiving end, this is roughly my setup (a rough code sketch follows the list):
- Pipeline with `gst_pipeline_set_latency`, let's say 2000 ms
- `gst_pipeline_use_clock` on sender and receiver, both using a (synchronized) network clock
- rtpbin/rtpjitterbuffer with the `latency` property set to 780 ms, `ntp-sync` TRUE and `ntp-time-source` set to `clock-time`
- alsasink
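For reference, this is more or less how the receiver is configured (a sketch only; the clock address/port and the element variables are placeholders for my actual setup, element creation and linking are omitted):

```c
#include <gst/gst.h>
#include <gst/net/gstnet.h>

/* Sketch of the receiver configuration described above. */
static void
configure_receiver (GstPipeline *pipeline, GstElement *rtpbin)
{
  /* Slave both sender and receiver pipelines to the same network clock. */
  GstClock *net_clock =
      gst_net_client_clock_new ("net-clock", "192.168.1.10", 8554, 0);
  gst_pipeline_use_clock (pipeline, net_clock);

  /* Fixed overall pipeline latency of 2000 ms. */
  gst_pipeline_set_latency (pipeline, 2000 * GST_MSECOND);

  /* Jitterbuffer latency of 780 ms, NTP-synced, using clock-time. */
  g_object_set (rtpbin,
      "latency", 780,          /* milliseconds */
      "ntp-sync", TRUE,
      "ntp-time-source", 3,    /* clock-time */
      NULL);

  gst_object_unref (net_clock);
}
```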
Let's say I have two different devices (their ring buffer geometry is also expressed as alsasink properties below):
- Device A has 10 ALSA periods of 100 ms each (1000 ms total)
- Device B has 10 ALSA periods of 20 ms each (200 ms total)
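In alsasink terms the two geometries correspond roughly to the following `buffer-time`/`latency-time` values (in my case they come from the hardware; the explicit values here are only illustrative):

```c
/* Ring buffer geometry of the two devices expressed via audiobasesink's
 * buffer-time / latency-time properties (microseconds). */
GstElement *sink_a = gst_element_factory_make ("alsasink", NULL);
GstElement *sink_b = gst_element_factory_make ("alsasink", NULL);

g_object_set (sink_a,
    "buffer-time",  (gint64) 1000000,  /* 10 x 100 ms */
    "latency-time", (gint64) 100000,
    NULL);
g_object_set (sink_b,
    "buffer-time",  (gint64) 200000,   /* 10 x 20 ms */
    "latency-time", (gint64) 20000,
    NULL);
```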
Let's say the sending side is a little delayed when it sends out the first buffer (though it doesn't matter whether the delay is massive or almost zero). rtpjitterbuffer waits at least 780 ms (its own `latency`) before pushing out the first buffer. Say it took 5 seconds for the first buffer to arrive; its timestamp is now e.g. 00:00:05.940 (rtpjitterbuffer added its latency of 780 ms). The deadline timer now triggers it to be pushed downstream to alsasink. Since it is the first buffer, alsasink calls `gst_audio_base_sink_sync_latency`, where it waits for the upstream latency (which happens to be rtpjitterbuffer's latency, plus `processing-deadline` starting with 1.16), i.e. 780 ms. But because the pipeline was started 5+ seconds ago, this will always be a late preroll.
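This is my understanding of the wait in question, sketched in plain terms (not the actual `gst_audio_base_sink_sync_latency` code, just what I believe it amounts to):

```c
#include <gst/gst.h>

/* Simplified sketch of what I understand the sync_latency wait to be. */
static void
sketch_sync_latency_wait (GstElement *sink, GstClockTime upstream_latency)
{
  GstClock *clock = gst_element_get_clock (sink);
  GstClockTime base_time = gst_element_get_base_time (sink);

  /* Wait until base_time + upstream latency (780 ms here). If the first
   * buffer only arrived ~5 s after the pipeline started, this point on the
   * clock has long passed, so the wait returns immediately with
   * GST_CLOCK_EARLY: a late preroll. */
  GstClockID id =
      gst_clock_new_single_shot_id (clock, base_time + upstream_latency);
  gst_clock_id_wait (id, NULL);
  gst_clock_id_unref (id);
  gst_object_unref (clock);
}
```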
Shouldn't the preroll not only take the buffer's timestamp into account, but also wait until "total latency - upstream latency", so that it doesn't start writing way too early? Right now it waits the same amount of time on both devices, and since it's most likely a late preroll, effectively not at all. It also seems that once `gst_audio_base_sink_sync_latency` has returned, device B does not wait the extra ~800 ms and writes into the ringbuffer at the wrong locations.
The end result is that both devices initially play totally out of sync; eventually clock-skew adjustments bring them back in sync, but with such a big delta the quality is pretty awful and there are countless clock-skew corrections along the way.
It seems that `gst_audio_base_sink_sync_latency` should not merely wait until the upstream latency has expired, but rather for "fixed latency - upstream latency", and on top of that also account for the timestamp of the first packet (see the sketch below for what I mean)? Or does `gst_audio_base_sink_render` lack a wait on the clock when the timestamp is too far out (which would only be hit right after prerolling or after a resync)?
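A sketch of the behaviour I would have expected, in hypothetical code (the function and its parameters are mine, not existing GStreamer API):

```c
#include <gst/gst.h>

/* Hypothetical illustration of the proposed wait before the ring buffer
 * starts pulling periods (names and structure are mine, not GStreamer API). */
static void
wait_until_first_buffer_is_due (GstElement *sink, GstClockTime first_pts,
    GstClockTime total_latency, GstClockTime upstream_latency)
{
  GstClock *clock = gst_element_get_clock (sink);
  GstClockTime base_time = gst_element_get_base_time (sink);

  /* first_pts (e.g. 00:00:05.940) already carries the jitterbuffer latency,
   * so adding the remaining (total - upstream) latency gives the running
   * time at which the first sample should actually hit the device. */
  GstClockTime render_time = first_pts + (total_latency - upstream_latency);

  GstClockID id =
      gst_clock_new_single_shot_id (clock, base_time + render_time);
  gst_clock_id_wait (id, NULL);
  gst_clock_id_unref (id);
  gst_object_unref (clock);

  /* Only now start writing into the ring buffer, so that device A (1 s of
   * ALSA buffer) and device B (200 ms) begin output at the same running time. */
}
```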
I think part of the problem is that the first buffer tends to be tiny (e.g. 1/10 of the total ALSA buffer size, i.e. of all periods), and because the ringbuffer thread is started at preroll it "sucks in" as many periods as possible, while only that first small buffer was written in time. Then, by the time the next buffer comes in, the ringbuffer thread has already inserted a bunch of silence, and this all leads to a stretch of choppy audio with the clocks way off.
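Back-of-the-envelope for device A (hypothetical numbers, just to illustrate what I think happens right after a late preroll):

```c
#include <gst/gst.h>

int
main (void)
{
  GstClockTime segment   = 100 * GST_MSECOND;  /* one ALSA period            */
  guint        segments  = 10;                 /* ring buffer: 1000 ms total */
  GstClockTime first_buf = 100 * GST_MSECOND;  /* only ~1 period written     */

  /* Everything the ring buffer thread pulls beyond the first buffer is
   * silence until the next buffer arrives. */
  GstClockTime silence = segments * segment - first_buf;

  g_print ("audio written in time: %" GST_TIME_FORMAT "\n",
      GST_TIME_ARGS (first_buf));
  g_print ("silence pulled in by the ring buffer thread: %" GST_TIME_FORMAT "\n",
      GST_TIME_ARGS (silence));
  return 0;
}
```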
I haven't had much success getting this to work properly. It feels like audiobasesink doesn't handle this correctly, but I don't know for sure. I'd appreciate any suggestions/ideas/hints.