Live Streaming Time Lag

An introduction to live streaming latency: where delay comes from, and how much each stage of a typical streaming pipeline contributes.

A Few Terms You Should Know
Below are a few streaming terms that you might not be familiar with. We’ve defined them here so you can refer back to them as you read on:

  • Latency: the more accurate term for “delay”; the amount of time between something that happens in the “real world” and the display of that event on the viewer’s screen.
  • Video Distribution Service (VDS): though a VDS can take many forms, it is essentially responsible for taking one or more incoming streams of video and audio (from a broadcaster) and presenting them to viewers. This includes what is commonly referred to as a Content Delivery Network.
  • Content Delivery Network (CDN): a means of efficiently distributing content around the globe.
  • Transcoding: the process of decoding an incoming media stream, changing one or more of its parameters and re-encoding it with the new parameter settings.
  • Transrating: a similar process to transcoding, whereby the media stream’s compressed bitrate is changed, typically to a lower value.
  • Adaptive Bitrate Streaming (ABR): a technique in which a stream is offered at multiple quality levels, ensuring that viewers on many kinds of devices with different capabilities and varying internet access can play it smoothly.
What Causes Latency?

Let’s look at how a typical live streaming system works and examine how latency is introduced at each step:

Image Capture
Whether the Content Providers are using a single camera or a sophisticated video mixing system, taking a live image and turning it into digital signals takes some time. At minimum, it will take at least the duration of a single captured video frame: 1/30th of a second at a 30 fps frame rate.

More advanced systems such as video mixers will introduce additional latency for decoding, processing, re-encoding, and re-transmitting.

Minimum: about 33 milliseconds

Maximum: hundreds of milliseconds
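
To make the frame-rate arithmetic concrete, here is a small Python sketch (the function name is our own) that computes the one-frame minimum for a few common frame rates:

```python
def frame_duration_ms(fps: float) -> float:
    """Duration of a single video frame, in milliseconds."""
    return 1000.0 / fps

for fps in (24, 30, 60):
    print(f"{fps} fps -> minimum capture latency ~{frame_duration_ms(fps):.1f} ms")
# 30 fps -> ~33.3 ms, matching the ~33 ms minimum above
```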

Encoding (Converting Video Content into Data)
It takes time to convert the “raw” image signal into a compressed format suitable for transmission across the Internet. This latency can range from extremely low (thousandths of a second) to values closer to the duration of a video frame.

Minimum: about 1 millisecond

Maximum: about 40-50 milliseconds
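
Much of the spread between those figures comes from how many frames an encoder buffers (for lookahead) before compressing. The Python sketch below is a simplified model of that relationship, not any particular codec; the parameter values are illustrative:

```python
def encoder_latency_ms(fps: float, lookahead_frames: int, encode_ms: float = 1.0) -> float:
    """Buffered lookahead frames plus per-frame compression time (illustrative model)."""
    frame_ms = 1000.0 / fps
    return lookahead_frames * frame_ms + encode_ms

print(encoder_latency_ms(30, lookahead_frames=0))  # ~1 ms: low-latency tuning, no lookahead
print(encoder_latency_ms(30, lookahead_frames=1))  # ~34 ms: one buffered frame
```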

Transmission
The encoded video takes time to transmit over the Internet to a VDS. Latency here depends on the many layers of equipment and circuit paths along the way. Equipment can range from video segmenters and edge caching servers to IP multicast receivers; circuit paths can span the country, and data often traverses multiple network paths before reaching its destination. This route variability is what makes the jitter buffer (covered next) so important for a fluid, seamless viewing experience.

Minimum: about 5-10 milliseconds

Maximum: hundreds of milliseconds
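
For intuition on the minimum figure: light travels through optical fiber at roughly 200 km per millisecond, so distance alone sets a floor on transmission latency. Here is a rough Python sketch of that floor, ignoring the routing hops, queuing, and equipment delays that supply the rest of the range:

```python
FIBER_KM_PER_MS = 200.0  # light covers roughly 200 km per millisecond in fiber

def propagation_delay_ms(distance_km: float) -> float:
    """One-way propagation delay from distance alone (no hops or queuing)."""
    return distance_km / FIBER_KM_PER_MS

print(propagation_delay_ms(1_000))  # ~5 ms: a regional hop
print(propagation_delay_ms(4_000))  # ~20 ms: roughly coast to coast
```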

Jitter Buffer
Since the internet is a massively connected series of digital communication routes, the encoded video data may take one of many different routes to the VDS, and this route may change over time. Because these routes take different amounts of time to traverse, it may arrive at the VDS out of order. A special software component called a jitter buffer re-organizes the arriving data so that it can be properly decoded.

When configuring the jitter buffer, you choose a time window, and that window sets the jitter buffer’s latency. Lowering it reduces delay but increases the risk of losing “late” data; raising it ensures that more “late” data is recovered, at the cost of added latency.

Minimum: typically no less than 100 milliseconds

Maximum: several seconds
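
To illustrate the re-ordering role described above, here is a minimal Python sketch of a jitter buffer. It shows only the reordering and late-drop logic; a real implementation also timestamps arrivals and releases packets on a clock after the configured latency window:

```python
import heapq

class JitterBuffer:
    """Minimal sketch of a jitter buffer's reordering logic (illustrative only)."""

    def __init__(self):
        self._heap = []      # min-heap of (sequence_number, payload)
        self._next_seq = 0   # next packet owed to the decoder

    def push(self, seq, payload):
        if seq < self._next_seq:
            return False     # arrived too late: playback already moved past it
        heapq.heappush(self._heap, (seq, payload))
        return True

    def pop_in_order(self):
        """Release every packet that is now in sequence."""
        out = []
        while self._heap and self._heap[0][0] == self._next_seq:
            _, payload = heapq.heappop(self._heap)
            out.append(payload)
            self._next_seq += 1
        return out

buf = JitterBuffer()
buf.push(1, b"packet-1")         # arrives out of order
buf.push(0, b"packet-0")
print(buf.pop_in_order())        # [b'packet-0', b'packet-1']
```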

Transcoding and Transrating
Typical viewers will be watching from many kinds of devices (PCs, Macs, tablets, phones, TVs, and set-top boxes) over many types of networks (LAN/wifi, 4G LTE, 3G, etc.). In order to provide a quality viewing experience across this range of devices, a good streaming provider should offer ABR (adaptive bitrate streaming).

There are two general ways to accomplish this: either the encoder streams multiple quality levels to the VDS (which are directly relayed to viewers), or the encoder sends a single high-quality stream to the VDS, which then transcodes and transrates it to multiple levels. Typically, transcoding and transrating take about as long as a “segment” of encoded video (more about segments later), but they can be faster at smaller resolutions and lower bitrates.

Minimum: about 1 second

Maximum: about 10 seconds
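
As a sketch of what an ABR “ladder” looks like in practice, the Python below picks the highest quality level that fits a viewer’s measured bandwidth. The rendition names, bitrates, and headroom factor are illustrative, not any provider’s actual settings:

```python
# Illustrative ABR ladder: (rendition name, bitrate in bits per second)
LADDER = [
    ("1080p", 5_000_000),
    ("720p",  2_800_000),
    ("480p",  1_400_000),
    ("360p",    800_000),
]

def pick_rendition(measured_bps: float, headroom: float = 0.8) -> str:
    """Choose the highest rendition that fits within the measured bandwidth."""
    for name, bps in LADDER:
        if bps <= measured_bps * headroom:
            return name
    return LADDER[-1][0]    # fall back to the lowest rung

print(pick_rendition(4_000_000))  # '720p'
print(pick_rendition(  500_000))  # '360p'
```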

Transmission to Viewers
There are two categories of protocols for viewing live video content: non-HTTP-based and HTTP-based. The two differ in latency and scalability. For now we will concentrate on HTTP-based protocols, since this is what United Streaming TV deploys.

HTTP-based protocols (such as HLS, HDS, MSS, and MPEG-DASH) are designed to take advantage of standard web servers and content distribution networks that scale to many simultaneous users. They also have built-in support for adaptive playback and broader native support on mobile devices.

These HTTP-based protocols work by breaking the continuous media stream into “segments” that are typically 2-10 seconds long. These segments can then be served to viewers by a standard web server or content distribution network.

HTTP-based protocols are generally better suited to most live streaming scenarios due to better feature support and scalability. The disadvantage is that the latency is at least as long as the segment length, and can be as high as 3-4 times the segment length (for example, iOS devices buffer 3-4 segments before even beginning to play the video). But the payoff is seamless playback.

Minimum (for HTTP-based protocols): about 2 seconds

Maximum (for HTTP-based protocols): about 30-40 seconds
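
Those bounds follow directly from segment arithmetic: at least one segment length in the best case, and several segment lengths once the player’s buffer is counted. A small Python sketch:

```python
def segment_latency_s(segment_s: float, buffered_segments: int = 3):
    """Best-case latency and latency with a typical player buffer."""
    return segment_s, segment_s * buffered_segments

for seg in (2, 6, 10):
    best, buffered = segment_latency_s(seg)
    print(f"{seg}s segments: at least {best}s; ~{buffered}s with a 3-segment buffer")
```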

Decoding and Display
Whether viewing on a phone, a computer, or a TV, it takes time to decompress the media data and render it on the screen. In the best case, this can be as low as a single frame duration (1/30th of a second at 30fps), but typical values are 2-5 times the duration of a video frame. This latency is determined by the capabilities of the viewing device.

Minimum: about 33 milliseconds

Maximum: hundreds of milliseconds

Putting It Together
Adding up the stages above, a streaming solution that uses HTTP-based adaptive bitrate mechanisms has a total latency range of about 3.2–56 seconds; realistically, it will typically fall in the 10 to 45 second range. Since this approach uses HTTP-based mechanisms that can leverage off-the-shelf CDNs, it can theoretically support a very large number of simultaneous viewers without difficulty, providing a more robust and enjoyable viewing platform for the customer.
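
To show where that 3.2–56 second range comes from, the sketch below sums the per-stage figures quoted in this article. Where a stage’s maximum was stated qualitatively (“hundreds of milliseconds”, “several seconds”), we substitute rough values of 0.3 and 5 seconds:

```python
# Per-stage latency ranges quoted in this article, in seconds.
STAGES = {
    "image capture":           (0.033, 0.3),
    "encoding":                (0.001, 0.05),
    "transmission to VDS":     (0.005, 0.3),
    "jitter buffer":           (0.1,   5.0),
    "transcoding/transrating": (1.0,   10.0),
    "transmission to viewers": (2.0,   40.0),
    "decoding and display":    (0.033, 0.3),
}

total_min = sum(lo for lo, _ in STAGES.values())
total_max = sum(hi for _, hi in STAGES.values())
print(f"end-to-end: ~{total_min:.1f} s to ~{total_max:.0f} s")  # ~3.2 s to ~56 s
```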
