HTTP Live Streaming, or HLS as I'll refer to it for the rest of this post, was first released by Apple in 2009. In its simplest form, it can be described as an HTTP-based adaptive bitrate (ABR) streaming protocol that relies on text-based playlist files referencing the actual media content. Adaptive bitrate streaming refers to a video playback experience that "adapts" to the user's network and environment constraints to provide the best possible viewing experience. Since HLS is both HTTP and text-based, simple web servers and CDNs can be leveraged heavily to distribute content.
The first step of the pipeline is capturing the audio and video data. If the content has already been recorded, this step may simply use existing media files.
The server is the next step in the pipeline. It encodes the content from the previous step and segments it into short files. Common video codecs are HEVC and AVC, and common audio codecs are AC-3 and AAC. The output from this step is packaged as fragmented MP4 or MPEG-2 transport stream segments.
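To give a feel for the segmenting part, here's a minimal sketch in Python that only plans where the cuts would fall on a timeline. The function name and the 6-second default are my own assumptions for illustration, and a real segmenter also has to cut on keyframes, which this sketch ignores.

```python
def plan_segments(total_seconds, segment_seconds=6.0):
    """Split a timeline into (start, duration) pairs no longer than segment_seconds.

    Hypothetical helper: it only does the arithmetic and never touches media data.
    """
    if total_seconds <= 0 or segment_seconds <= 0:
        raise ValueError("durations must be positive")
    segments = []
    start = 0.0
    while start < total_seconds:
        # The last segment is usually shorter than the rest.
        duration = min(segment_seconds, total_seconds - start)
        segments.append((start, duration))
        start += duration
    return segments


if __name__ == "__main__":
    for start, duration in plan_segments(20.0):
        print(f"segment starts at {start:.1f}s and lasts {duration:.1f}s")
```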
Once the media has been encoded and formatted for delivery, the distribution step of the pipeline comes into play. The distributor may be a web server, a caching system (like a CDN), or a combination of the two. Since HLS is delivered over plain HTTP and its playlists are plain text, there’s typically little setup required for this component.
The client software is the last step of the pipeline and is responsible for requesting the media, downloading the response, and displaying it to the user. This description is slightly simplified, as the client will also need to decode the media before it can play it back.
Note: The key to this architecture is that the server encodes multiple renditions of the same video that can be played back effectively at varying bandwidths. For example, the server may generate a 720p and a 1080p rendition of the same video so that the client can dynamically switch to the rendition that best fits its environmental constraints. The files encoded by the server are short enough that the client doesn't have to spend a long time downloading and decoding each one.
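Here's a rough sketch in Python of how a client might pick a rendition. It is not how any particular player does it: the Rendition class, the bitrates, and the 0.8 safety factor are all assumptions I made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    bandwidth_bps: int  # bitrate the rendition needs, in bits per second


def choose_rendition(renditions, measured_bps, safety_factor=0.8):
    """Pick the highest-bitrate rendition that fits the measured bandwidth."""
    if not renditions:
        raise ValueError("at least one rendition is required")
    # Leave some headroom so small throughput dips do not stall playback.
    budget = measured_bps * safety_factor
    affordable = [r for r in renditions if r.bandwidth_bps <= budget]
    if affordable:
        return max(affordable, key=lambda r: r.bandwidth_bps)
    # Nothing fits, so fall back to the cheapest rendition.
    return min(renditions, key=lambda r: r.bandwidth_bps)


if __name__ == "__main__":
    ladder = [Rendition("720p", 3_000_000), Rendition("1080p", 6_000_000)]
    print(choose_rendition(ladder, measured_bps=10_000_000).name)
    print(choose_rendition(ladder, measured_bps=5_000_000).name)
```

Because the segments are short, a client can rerun a decision like this before each download and switch renditions mid-stream.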
HLS Playlist Format
As I mentioned previously, HLS is a text-based protocol, and more specifically the text that is transferred from server to client is in the form of "playlists". A playlist is a simple text file with a file suffix of .m3u8 (e.g. "playlist.m3u8"). The playlist contains references to the media files for playback. The media file references may be absolute paths or paths relative to the location of the playlist.
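Resolving those references works the same way as resolving links in a web page. Here's a small sketch using Python's standard urljoin. The example.com URLs and the resolve_media_uri name are placeholders, not anything from a real service.

```python
from urllib.parse import urljoin


def resolve_media_uri(playlist_url, reference):
    """Resolve a media file reference against the URL the playlist came from."""
    # urljoin leaves absolute URLs untouched and resolves relative ones
    # against the directory of the playlist.
    return urljoin(playlist_url, reference)


if __name__ == "__main__":
    playlist_url = "https://example.com/video/playlist.m3u8"  # placeholder URL
    print(resolve_media_uri(playlist_url, "segment0.ts"))
    print(resolve_media_uri(playlist_url, "https://cdn.example.com/a/segment0.ts"))
```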
As with any file format, playlist files need to follow a standardized structure. An HLS playlist is nothing more than a text file containing a set of standardized tags that each provide information about the playlist or the media files contained in the playlist. Tags are prefixed with a # and media file references are not. Many possible tags can be used in playlists, but here are a few to give a general understanding.
- EXTM3U: Indicates that the playlist is an extended M3U file. This type of file is distinguished from a basic M3U file by changing the tag on the first line to EXTM3U. All HLS playlists must start with this tag.
- EXT-X-VERSION: Indicates the compatibility version of the playlist file. The playlist media and its server must comply with all provisions of the most recent version of the IETF Internet Draft of the HLS specification that defines that protocol version.
- EXTINF: A record marker that describes the media file identified by the URL that follows it. Each media file URL must be preceded by an EXTINF tag. This tag contains a duration attribute that’s an integer or floating-point number in decimal positional notation that specifies the duration of the media segment in seconds. This value, when rounded to the nearest integer, must be less than or equal to the target duration (set by the EXT-X-TARGETDURATION tag).
- For a full list, check out the RFC (RFC 8216).
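To show how little is needed to read this format, here's a minimal parser sketch in Python that only understands the three tags above and skips everything else. The function name and the sample playlist are made up for illustration. A real client handles many more tags and edge cases.

```python
def parse_media_playlist(text):
    """Return (version, [(uri, duration_seconds), ...]) for a simple playlist."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or lines[0] != "#EXTM3U":
        raise ValueError("playlist must start with #EXTM3U")
    version = 1  # a playlist without EXT-X-VERSION is treated as version 1
    segments = []
    pending_duration = None
    for line in lines[1:]:
        if line.startswith("#EXT-X-VERSION:"):
            version = int(line.split(":", 1)[1])
        elif line.startswith("#EXTINF:"):
            # Format is "#EXTINF:<duration>,[<title>]".
            pending_duration = float(line.split(":", 1)[1].split(",", 1)[0])
        elif line.startswith("#"):
            continue  # other tags and comments are ignored in this sketch
        else:
            # Anything without a leading # is a media file reference.
            if pending_duration is None:
                raise ValueError(f"media file {line!r} has no EXTINF tag")
            segments.append((line, pending_duration))
            pending_duration = None
    return version, segments


if __name__ == "__main__":
    sample = "#EXTM3U\n#EXT-X-VERSION:3\n#EXTINF:9.009,\nsegment0.ts\n"
    print(parse_media_playlist(sample))
```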
There are two main types of playlists used in HLS.
- Variant: A playlist file that references actual media files (the specification calls this a Media Playlist).
- Multivariant: A playlist that references other playlists (typically the same content in a different rendition).
One layer below the variant and multivariant distinction are the VOD and Event playlist types, which are declared with the EXT-X-PLAYLIST-TYPE tag. VOD playlists typically contain pre-recorded content, while Event playlists can describe a live recording. The declared playlist type changes the way that servers are allowed to modify the playlist. For example, Event playlists can be appended to, while VOD playlists are expected to be static.
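As a sketch of what that rule means on the server side, the hypothetical helper below appends a segment to a playlist's text and refuses to touch a VOD playlist. It also refuses once it sees EXT-X-ENDLIST, the tag that marks a playlist as finished. The helper name and file names are assumptions for illustration.

```python
def append_segment(playlist_text, uri, duration):
    """Return new playlist text with one more segment added at the end."""
    lines = playlist_text.strip().splitlines()
    if "#EXT-X-PLAYLIST-TYPE:VOD" in lines:
        raise ValueError("VOD playlists are static and cannot be appended to")
    if "#EXT-X-ENDLIST" in lines:
        # EXT-X-ENDLIST signals that no more segments will be added.
        raise ValueError("playlist has already ended")
    lines.append(f"#EXTINF:{duration:.3f},")
    lines.append(uri)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    event = "#EXTM3U\n#EXT-X-PLAYLIST-TYPE:EVENT\n#EXTINF:6.000,\nsegment0.ts\n"
    print(append_segment(event, "segment1.ts", 6.0))
```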
Below are a couple of example playlists, which show how the tags and media file references are formatted.
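The two playlists here are illustrative stand-ins that I've wrapped in Python strings so they can be inspected with a few lines of code. The file names, bitrates, and durations are made up, and playlist_kind is a hypothetical helper that tells the two types apart by the tags they contain.

```python
# A multivariant playlist: each EXT-X-STREAM-INF tag describes the playlist
# on the line that follows it.
MULTIVARIANT_PLAYLIST = """\
#EXTM3U
#EXT-X-STREAM-INF:BANDWIDTH=3000000,RESOLUTION=1280x720
720p/playlist.m3u8
#EXT-X-STREAM-INF:BANDWIDTH=6000000,RESOLUTION=1920x1080
1080p/playlist.m3u8
"""

# A variant playlist of type VOD: each EXTINF tag describes the media file
# on the line that follows it.
VOD_PLAYLIST = """\
#EXTM3U
#EXT-X-VERSION:3
#EXT-X-TARGETDURATION:6
#EXT-X-PLAYLIST-TYPE:VOD
#EXTINF:6.000,
segment0.ts
#EXTINF:6.000,
segment1.ts
#EXTINF:4.500,
segment2.ts
#EXT-X-ENDLIST
"""


def playlist_kind(text):
    """Classify a playlist as 'multivariant', 'variant', or 'unknown'."""
    tags = [line.strip() for line in text.splitlines() if line.startswith("#EXT")]
    if any(tag.startswith("#EXT-X-STREAM-INF:") for tag in tags):
        return "multivariant"
    if any(tag.startswith("#EXTINF:") for tag in tags):
        return "variant"
    return "unknown"


if __name__ == "__main__":
    print(playlist_kind(MULTIVARIANT_PLAYLIST))
    print(playlist_kind(VOD_PLAYLIST))
```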
HTTP Live Streaming is a relatively simple protocol that is used behind the scenes to power some of the most popular streaming experiences that we all use. Next time you watch a video, check out your browser's network tab to see whether you can spot this protocol in action!