Jitter Buffer Specification
Moved to https://docs.pjsip.org/en/latest/specific-guides/audio/jitter_buffer.html
This short article describes the specification for the jitter buffer.
Specifications
Continuous flow
The main function of a jitter buffer is to ensure that its user receives as continuous a flow of incoming frames as possible, regardless of the jitter in packet arrival times. The jitter buffer MUST provide this function.
Adaptive to jitter change
The jitter buffer MUST be able to adapt to changes in network jitter, increasing or decreasing the prefetch value and the buffering latency as necessary. Jitter may also be introduced by the sound device. See the Low latency section below for more information about adaptivity.
Low latency
The jitter buffer MUST attempt to keep the buffering of incoming packets to the minimum possible (without sacrificing the continuous flow requirement above), to minimize latency.
Burst level
During jitter buffer operation, packets are continuously added to and removed from the jitter buffer. Ideally, add and remove operations alternate one after another; in that case the burst level is 1.
However, in our experience this is rarely the case. Network jitter happens and may change over time. We have also seen jitter in the sound device clock, either permanent or temporary (e.g. due to a CPU load spike). The combined jitter, from the network (packet producer) and from the sound device (packet consumer), causes bursts of add/remove operations. The burst level is the number of consecutive operations of the same kind; for example, burst level 3 means the jitter buffer tends to see three add operations in a row before seeing a remove operation, and vice versa. We consider the burst level very important for calculating the optimum latency: the latency should not be shorter than the burst level, otherwise the sound device (consumer) may suffer from starvation, i.e. no packet is available in the jitter buffer when one is needed.
As network conditions and the sound device clock may change continuously, the burst level needs to be monitored continuously. Because the burst level drives the latency calculation, it must be changed carefully, since every change eventually causes the jitter buffer to adjust its latency. To avoid starvation, an increase in the burst level should take effect immediately, while a decrease should be applied slowly.
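The burst-level tracking described above can be sketched as follows. This is a minimal illustration, not the PJMEDIA implementation: the struct, the function names, and the SKETCH_SHRINK_WINDOW constant are hypothetical. A run of same-type operations is measured when the operation type changes; a longer run raises the level immediately, while the level drops by at most one step per window of observed bursts, and only when every burst in that window stayed below the current level.

```c
/* Hypothetical tuning constant: number of completed bursts observed
 * before the level may shrink by one step. Not a PJMEDIA setting. */
#define SKETCH_SHRINK_WINDOW 50

typedef enum { OP_NONE, OP_PUT, OP_GET } jb_op_t;

typedef struct {
    jb_op_t  last_op;    /* type of the previous operation             */
    unsigned run;        /* length of the current run of that type     */
    unsigned level;      /* current burst level estimate               */
    unsigned window_max; /* largest burst seen in the current window   */
    unsigned window_cnt; /* number of bursts seen in the current window */
} burst_tracker_t;

void burst_init(burst_tracker_t *bt)
{
    bt->last_op = OP_NONE;
    bt->run = 0;
    bt->level = 1;
    bt->window_max = 0;
    bt->window_cnt = 0;
}

/* Feed the length of one completed burst into the estimator. */
static void burst_observe(burst_tracker_t *bt, unsigned burst)
{
    if (burst > bt->level) {
        /* Increase immediately to avoid starvation. */
        bt->level = burst;
        bt->window_max = 0;
        bt->window_cnt = 0;
        return;
    }
    if (burst > bt->window_max)
        bt->window_max = burst;
    if (++bt->window_cnt >= SKETCH_SHRINK_WINDOW) {
        /* Decrease slowly: at most one step per window, and only if
         * every burst in the window stayed below the current level. */
        if (bt->window_max < bt->level && bt->level > 1)
            bt->level--;
        bt->window_max = 0;
        bt->window_cnt = 0;
    }
}

/* Record one put (add) or get (remove) operation. */
void burst_record(burst_tracker_t *bt, jb_op_t op)
{
    if (op == bt->last_op) {
        bt->run++;
        return;
    }
    if (bt->last_op != OP_NONE)
        burst_observe(bt, bt->run); /* the previous run just ended */
    bt->last_op = op;
    bt->run = 1;
}
```

Three puts followed by a get raise the level from 1 to 3 at once, whereas bringing it back down to 2 takes a full window of 50 bursts of length 1. The asymmetric window is one simple way to express "increase instantly, decrease slowly"; other smoothing schemes would serve equally well.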
Progressive discard
We consider the optimum latency at any time to be about the burst level; for example, when the burst level is 3, the latency should be about 60 ms (for a 20 ms audio frame length). So when the latency (i.e. the number of frames buffered in the jitter buffer) is longer than the burst level, some frames should be discarded to reach the optimum latency. Progressive discard drops frames at various rates depending on the difference between the actual latency and the optimum/target latency. The discard rate is affected by these configurable macros: PJMEDIA_JBUF_PRO_DISC_MIN_BURST & PJMEDIA_JBUF_PRO_DISC_MAX_BURST, PJMEDIA_JBUF_PRO_DISC_T1 & PJMEDIA_JBUF_PRO_DISC_T2, and PJMEDIA_JBUF_DISC_MIN_GAP.
For example, when the optimum/target latency is 3 frames (60 ms) and the current latency is 10 frames, the jitter buffer schedules a frame discard as follows:
- The overflow, i.e. the difference between the actual and target latencies, is 10 - 3 = 7 frames.
- Calculate the target time T for adjusting the latency (i.e. for discarding the 7-frame overflow above) using the following formula:
  T = PJMEDIA_JBUF_PRO_DISC_T1 + (PJMEDIA_JBUF_PRO_DISC_T2 - PJMEDIA_JBUF_PRO_DISC_T1) * (burst_level - PJMEDIA_JBUF_PRO_DISC_MIN_BURST) / (PJMEDIA_JBUF_PRO_DISC_MAX_BURST - PJMEDIA_JBUF_PRO_DISC_MIN_BURST);
  Default settings:
  PJMEDIA_JBUF_PRO_DISC_T1 = 2000 ms
  PJMEDIA_JBUF_PRO_DISC_T2 = 10000 ms
  PJMEDIA_JBUF_PRO_DISC_MIN_BURST = 1
  PJMEDIA_JBUF_PRO_DISC_MAX_BURST = 100
With the default settings and a burst level of 3, the target time is T = 2000 + 8000 * (3 - 1) / 99 = 2161 ms (using integer division), so the discard interval is T / overflow = 2161 / 7 = 308 ms per frame. The jitter buffer therefore schedules the discard of the frame with timestamp 308 ms (i.e. the frame to be played 308 ms later). There are also a few notes:
- If the frame with that timestamp is not yet available in the jitter buffer, the calculation is done again later. If the burst level has changed when the calculation is redone, the frame to discard may change too (it may no longer be the frame with timestamp 308 ms).
- If the scheduled frame timestamp is lower than PJMEDIA_JBUF_DISC_MIN_GAP (200 ms by default), the jitter buffer uses PJMEDIA_JBUF_DISC_MIN_GAP instead, so frames are never discarded more often than once every PJMEDIA_JBUF_DISC_MIN_GAP.
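The progressive discard calculation can be written as a small C sketch. The macro values are the defaults quoted in this article; the function names are hypothetical, and clamping the burst level to the [MIN_BURST, MAX_BURST] range is an assumption of this sketch (it keeps the unsigned arithmetic from underflowing). Integer division is used throughout.

```c
/* Defaults quoted in this article. */
#define PJMEDIA_JBUF_PRO_DISC_MIN_BURST 1
#define PJMEDIA_JBUF_PRO_DISC_MAX_BURST 100
#define PJMEDIA_JBUF_PRO_DISC_T1        2000  /* ms */
#define PJMEDIA_JBUF_PRO_DISC_T2        10000 /* ms */
#define PJMEDIA_JBUF_DISC_MIN_GAP       200   /* ms */

/* Target time (ms) for removing the whole overflow, interpolated
 * linearly between T1 and T2 according to the burst level.
 * Clamping outside [MIN_BURST, MAX_BURST] is an assumption here. */
unsigned pro_disc_target_time(unsigned burst_level)
{
    if (burst_level <= PJMEDIA_JBUF_PRO_DISC_MIN_BURST)
        return PJMEDIA_JBUF_PRO_DISC_T1;
    if (burst_level >= PJMEDIA_JBUF_PRO_DISC_MAX_BURST)
        return PJMEDIA_JBUF_PRO_DISC_T2;
    return PJMEDIA_JBUF_PRO_DISC_T1 +
           (PJMEDIA_JBUF_PRO_DISC_T2 - PJMEDIA_JBUF_PRO_DISC_T1) *
           (burst_level - PJMEDIA_JBUF_PRO_DISC_MIN_BURST) /
           (PJMEDIA_JBUF_PRO_DISC_MAX_BURST - PJMEDIA_JBUF_PRO_DISC_MIN_BURST);
}

/* Interval (ms) to the next frame to discard, or 0 when the buffered
 * latency does not exceed the burst level (nothing to discard). */
unsigned pro_disc_interval(unsigned cur_frames, unsigned burst_level)
{
    unsigned overflow, interval;

    if (cur_frames <= burst_level)
        return 0;
    overflow = cur_frames - burst_level;
    interval = pro_disc_target_time(burst_level) / overflow;

    /* Never discard more often than once every MIN_GAP ms. */
    return interval < PJMEDIA_JBUF_DISC_MIN_GAP ? PJMEDIA_JBUF_DISC_MIN_GAP
                                                : interval;
}
```

For 10 buffered frames and a burst level of 3, pro_disc_interval(10, 3) evaluates to 308: the integer target time is 2161 ms and 2161 / 7 = 308. With a large overflow, e.g. pro_disc_interval(40, 1), the raw interval of 2000 / 39 = 51 ms is raised to the 200 ms minimum gap.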
Static discard
The jitter buffer also provides a more conservative discard algorithm. With this algorithm, the optimum latency at any time is twice the burst level (by comparison, the progressive discard algorithm above assumes the optimum latency is about equal to the burst level). Its discard rate is fixed by PJMEDIA_JBUF_DISC_MIN_GAP, so it discards one frame every 200 ms (the default value) until the target/optimum latency is reached.
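A minimal sketch of the static discard check, assuming the caller passes a millisecond clock (for example, the number of frames played multiplied by the frame duration). The struct and function names are hypothetical; only the target of twice the burst level and the PJMEDIA_JBUF_DISC_MIN_GAP spacing come from the description above.

```c
#include <stdbool.h>

#define PJMEDIA_JBUF_DISC_MIN_GAP 200 /* ms, default quoted above */

typedef struct {
    unsigned last_discard_ms; /* time of the previous discard     */
    bool     has_discarded;   /* false until the first discard    */
} static_disc_t;

/* Returns true if a frame should be discarded at time now_ms. */
bool static_disc_check(static_disc_t *sd, unsigned now_ms,
                       unsigned cur_frames, unsigned burst_level)
{
    /* Conservative target: twice the burst level. */
    unsigned target = 2 * burst_level;

    if (cur_frames <= target)
        return false; /* latency already at or below the target */
    if (sd->has_discarded &&
        now_ms - sd->last_discard_ms < PJMEDIA_JBUF_DISC_MIN_GAP)
        return false; /* too soon after the previous discard */

    sd->last_discard_ms = now_ms;
    sd->has_discarded = true;
    return true;
}
```

With a burst level of 3 the target is 6 frames: at 10 buffered frames a discard happens immediately, the next check 100 ms later is refused, and a check 200 ms after the first discard is allowed again.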
Duplicate/old frame
The jitter buffer MUST be able to detect the arrival of duplicate or old frames.
A duplicate frame is a frame that has the same frame number as an existing frame in its buffer. In this case, the handling depends on the value of the discarded argument of the pjmedia_jbuf_put_frame2() function:
- If non-zero (TRUE), the jitter buffer MUST ignore the duplicate frame and set the discarded argument of pjmedia_jbuf_put_frame2() to TRUE.
- If FALSE, the jitter buffer overwrites the old frame with the newer one and sets the discarded argument of pjmedia_jbuf_put_frame2() to FALSE.
- If NULL, FALSE is assumed.
An old frame is a frame whose sequence number is older than the one currently "played" (returned by the jitter buffer to its user). An old frame is always discarded, and the discarded argument of pjmedia_jbuf_put_frame2() is set.
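The duplicate/old frame rules can be summarized as a decision function. This is a hypothetical sketch: it uses C99 bool in place of pj_bool_t, takes the buffer state as plain arguments instead of calling pjmedia_jbuf_put_frame2(), and ignores sequence number wrap-around. Clearing the flag when a new frame is stored normally is also an assumption of the sketch.

```c
#include <stdbool.h>
#include <stddef.h>

typedef enum { PUT_STORED, PUT_OVERWRITTEN, PUT_DISCARDED } put_result_t;

/* Decide what to do with an incoming frame.
 *   seq         : sequence number of the incoming frame
 *   last_played : sequence number most recently returned to the user
 *   slot_filled : whether a frame with `seq` is already buffered
 *   discarded   : in/out flag as described above; may be NULL */
put_result_t classify_put(int seq, int last_played, bool slot_filled,
                          bool *discarded)
{
    /* NULL is treated as FALSE: duplicates overwrite the old frame. */
    bool ignore_dup = (discarded != NULL) && *discarded;

    if (seq <= last_played) {
        /* Old frame: always discarded. */
        if (discarded)
            *discarded = true;
        return PUT_DISCARDED;
    }
    if (slot_filled) {
        /* Duplicate frame: keep the old one or overwrite it. */
        if (ignore_dup) {
            *discarded = true;
            return PUT_DISCARDED;
        }
        if (discarded)
            *discarded = false;
        return PUT_OVERWRITTEN;
    }
    if (discarded)
        *discarded = false;
    return PUT_STORED;
}
```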
Non octet-aligned
The jitter buffer MUST be able to store frames that are not octet/byte aligned.
Sequence number jump/restart
The jitter buffer MUST be able to detect a large jump in the sequence number and restart its state.
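A minimal sketch of jump detection, assuming frames are numbered with a plain int and that a jump larger than a hypothetical SKETCH_MAX_SEQ_JUMP frames (in either direction) signals a restart. The threshold and names are illustrative only, and this sketch does not attempt to tell a DTX gap apart from a real restart, which the DTX requirement below demands of a real implementation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical threshold in frames; not a PJMEDIA setting. */
#define SKETCH_MAX_SEQ_JUMP 100

typedef struct {
    bool started;  /* false until the first frame arrives     */
    int  last_seq; /* highest sequence number accepted so far */
} seq_state_t;

/* Returns true if the jitter buffer should reset its state before
 * storing the frame with sequence number `seq`. */
bool seq_needs_restart(seq_state_t *st, int seq)
{
    bool restart = st->started &&
                   abs(seq - st->last_seq) > SKETCH_MAX_SEQ_JUMP;

    /* Small backward steps (reordering) keep the current maximum. */
    if (!st->started || restart || seq > st->last_seq)
        st->last_seq = seq;
    st->started = true;
    return restart;
}
```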
DTX
The jitter buffer MUST be able to handle discontinuous transmission (DTX) without triggering a restart. Note that the user MAY use either the RTP timestamp or the sequence number as the frame sequence number of the jitter buffer frames, hence DTX MAY or MAY NOT be reflected as a jump in the frame sequence number.
Minimum prefetching
The jitter buffer MUST obey the minimum prefetch value set by its user.
Fixed mode operation
The jitter buffer MUST be able to operate in fixed (non-adaptive) mode. This is done by calling the pjmedia_jbuf_set_fixed() function.
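How the minimum prefetch and fixed mode might combine into one effective prefetch value is sketched below. Using the burst level as the adaptive prefetch target is an assumption made for illustration, and the struct and function are hypothetical; in PJMEDIA, fixed mode is selected through pjmedia_jbuf_set_fixed().

```c
#include <stdbool.h>

typedef struct {
    bool     fixed;          /* fixed (non-adaptive) mode enabled    */
    unsigned fixed_prefetch; /* prefetch used in fixed mode (frames) */
    unsigned min_prefetch;   /* lower bound set by the user (frames) */
} prefetch_cfg_t;

/* Effective prefetch in frames for the given burst level. */
unsigned effective_prefetch(const prefetch_cfg_t *cfg, unsigned burst_level)
{
    if (cfg->fixed)
        return cfg->fixed_prefetch; /* no adaptation at all */

    /* Adaptive mode: follow the burst level, never below the minimum. */
    return burst_level > cfg->min_prefetch ? burst_level
                                           : cfg->min_prefetch;
}
```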
Return correct frame types
The jitter buffer MUST return the correct frame type in pjmedia_jbuf_get_frame():
- PJMEDIA_JB_NORMAL_FRAME: a normal frame has been returned.
- PJMEDIA_JB_ZERO_PREFETCH_FRAME: no frame is returned because the jitter buffer is currently prefetching.
- PJMEDIA_JB_ZERO_EMPTY_FRAME: no frame is returned because the jitter buffer is currently empty.
- PJMEDIA_JB_MISSING_FRAME: no frame is returned because the frame is lost/missing.
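The mapping from buffer state to returned frame type can be sketched as below. The SK_* enum is a hypothetical local stand-in for the PJMEDIA_JB_* constants (the real ones are defined by pjmedia), and checking the prefetching state before the empty state is an assumption of this sketch.

```c
#include <stdbool.h>

/* Local stand-ins for the PJMEDIA_JB_* frame types listed above. */
typedef enum {
    SK_NORMAL_FRAME,
    SK_ZERO_PREFETCH_FRAME,
    SK_ZERO_EMPTY_FRAME,
    SK_MISSING_FRAME
} sk_frame_type_t;

/* Decide which frame type a get operation reports.
 *   prefetching : buffer is still filling up to its prefetch level
 *   buffered    : number of frame slots currently held (filled or not)
 *   head_filled : whether the slot at the play position holds a frame */
sk_frame_type_t classify_get(bool prefetching, unsigned buffered,
                             bool head_filled)
{
    if (prefetching)
        return SK_ZERO_PREFETCH_FRAME;
    if (buffered == 0)
        return SK_ZERO_EMPTY_FRAME;
    return head_filled ? SK_NORMAL_FRAME : SK_MISSING_FRAME;
}
```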