SRT (Secure Reliable Transport) is a next-generation, low-latency video transport protocol: an open-source, royalty-free, and flexible specification. It performs as well as proprietary protocols while working across products from different manufacturers. This article, drawing primarily on Haivision's SRT whitepaper, outlines SRT's key features, compares SRT with common streaming formats and with QUIC, another new-generation transport protocol, and closes with a brief look at the current state of SRT's development.
Key Features
Direct Connection Establishment
SRT allows for direct connection establishment between the signal source and the destination, contrasting sharply with many existing video transmission systems that require a centralized server to collect signals from remote locations and redirect them to one or more destinations. The centralized server architecture has a single point of failure and can become a bottleneck during high traffic periods. Transmitting signals through a hub also increases end-to-end signal transmission time and can potentially double bandwidth costs as two links must be implemented: one from the source to the central hub and another from the hub to the destination. By using a direct connection from source to destination, SRT can reduce latency, eliminate the central bottleneck, and lower network costs.
Packet Delivery Using ARQ Mechanism
Three packet delivery mechanisms can be compared. The first is an uncorrected data stream, in which every lost packet causes an error in the output signal. The second is Forward Error Correction (FEC), which adds a fixed amount of redundant data to the stream that can be used to recreate lost packets. The third is Automatic Repeat reQuest (ARQ), in which the sender retransmits lost packets at the receiver's request, avoiding the constant bandwidth overhead of FEC.
ARQ works by establishing a bidirectional connection between the video source and the destination. Each outgoing data packet is assigned a unique sequence number, and the receiver uses these sequence numbers to determine whether all incoming packets have arrived correctly and in the right order. If packets are lost in the network, the receiver compiles a list of the missing sequence numbers and automatically asks the sender to retransmit them. On networks with high error rates (at certain times or during faults), this process may be repeated several times. ARQ requires a buffer at the sender, to hold packets temporarily in case they must be retransmitted, and a buffer at the receiver, to put packets back into the correct sequence before they are passed to the video decoder or other receiving device.
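The sequence-number bookkeeping described above can be sketched in Python. This is a hypothetical, much-simplified model (the `Sender` and `Receiver` classes are our own illustration, not libsrt's internals): the receiver derives a loss list from gaps in the sequence numbers, the sender answers from its send buffer, and the receiver's reorder buffer releases packets only in sequence.

```python
# Hypothetical, simplified model of ARQ as described above; not libsrt's code.


class Sender:
    def __init__(self):
        self.next_seq = 0
        self.send_buffer = {}  # seq -> payload, kept until acknowledged

    def send(self, payload):
        """Assign the next sequence number and keep a copy for retransmission."""
        seq = self.next_seq
        self.next_seq += 1
        self.send_buffer[seq] = payload
        return seq, payload

    def retransmit(self, lost_seqs):
        """Resend the packets named in the receiver's loss list."""
        return [(s, self.send_buffer[s]) for s in lost_seqs if s in self.send_buffer]

    def acknowledge(self, next_expected):
        """Free buffered packets the receiver has confirmed (all seq < next_expected)."""
        for s in [s for s in self.send_buffer if s < next_expected]:
            del self.send_buffer[s]


class Receiver:
    def __init__(self):
        self.next_expected = 0    # next sequence number to hand to the decoder
        self.highest_seen = -1
        self.reorder_buffer = {}  # seq -> payload, waiting for gaps to be filled

    def receive(self, seq, payload):
        if seq >= self.next_expected:  # ignore duplicates of already delivered packets
            self.reorder_buffer[seq] = payload
            self.highest_seen = max(self.highest_seen, seq)

    def loss_list(self):
        """Sequence numbers missing between the delivery point and the newest packet."""
        return [s for s in range(self.next_expected, self.highest_seen + 1)
                if s not in self.reorder_buffer]

    def deliver(self):
        """Release the longest in-order run starting at next_expected."""
        out = []
        while self.next_expected in self.reorder_buffer:
            out.append(self.reorder_buffer.pop(self.next_expected))
            self.next_expected += 1
        return out


sender, receiver = Sender(), Receiver()
for seq, payload in [sender.send(f"pkt{i}") for i in range(6)]:
    if seq not in (2, 4):  # simulate packets 2 and 4 being lost in the network
        receiver.receive(seq, payload)

print(receiver.deliver())    # ['pkt0', 'pkt1']
lost = receiver.loss_list()  # [2, 4]
for seq, payload in sender.retransmit(lost):
    receiver.receive(seq, payload)
print(receiver.deliver())    # ['pkt2', 'pkt3', 'pkt4', 'pkt5']
sender.acknowledge(receiver.next_expected)
```

Real SRT also limits how long it waits for a retransmission; that timing aspect is not modeled here.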
SRT uses ARQ mainly because it handles the most common type of error on the Internet: random, sporadic packet loss. Such losses are easily fixed by simply retransmitting any packet that does not reach the receiver. Packets that arrive with bit errors are treated as lost, and the sender is asked to retransmit them. In addition, SRT attaches a high-resolution timestamp to every packet so that the timing of the media stream can be reproduced accurately at the output, which helps downstream devices decode the video and audio signals correctly.
FEC is suitable only for systems that can afford the extra bandwidth consumed by FEC data and that can tolerate signal interruptions whenever the network error rate exceeds what the FEC data is able to repair.
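For contrast, here is a generic XOR-parity FEC sketch (a textbook row-parity scheme chosen for illustration; the function names are hypothetical). It shows the two limitations mentioned above: the parity packet costs bandwidth whether or not anything is lost, and once the losses in a group exceed what the parity can repair, the data cannot be recovered.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    # Assumes equal-length packets; practical FEC schemes pad shorter packets.
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(group):
    """One parity packet protecting a whole group of data packets."""
    return reduce(xor_bytes, group)


def recover(received, parity):
    """Rebuild at most one missing packet (marked None) from the parity packet."""
    missing = [i for i, p in enumerate(received) if p is None]
    if not missing:
        return list(received)
    if len(missing) > 1:
        raise ValueError("XOR parity can repair only one loss per group")
    restored = reduce(xor_bytes, (p for p in received if p is not None), parity)
    repaired = list(received)
    repaired[missing[0]] = restored
    return repaired


group = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]
parity = make_parity(group)  # always sent: 1 extra packet per 4 = 25% overhead
print(recover([b"AAAA", None, b"CCCC", b"DDDD"], parity))
# [b'AAAA', b'BBBB', b'CCCC', b'DDDD']
```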
Using the UDP Packet Format
Every packet sent during an SRT session uses the UDP (User Datagram Protocol) packet format, which offers low overhead and low-latency delivery. Most real-time media transport networks designed for professional applications use UDP because it provides a lightweight, predictable packet delivery mechanism with consistent throughput.
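To show what rides inside each UDP datagram, the sketch below packs and unpacks a 16-byte SRT data packet header: a 31-bit sequence number, flag bits, a 26-bit message number, a microsecond timestamp, and a destination socket ID. The field layout follows our reading of the IETF SRT Internet-Draft (draft-sharabayko-srt) and should be checked against it; the helper names are hypothetical and are not libsrt functions.

```python
import struct

# Field layout as we read it from the IETF SRT Internet-Draft; helper names are hypothetical.
PP_SINGLE = 0b11  # packet position: the packet carries a complete message


def pack_data_header(seq, msg_no, timestamp_us, dest_socket_id,
                     position=PP_SINGLE, in_order=False, key_flags=0,
                     retransmitted=False):
    """Build the 16-byte header that precedes the payload inside a UDP datagram."""
    if not 0 <= seq < 2**31:
        raise ValueError("sequence number is 31 bits")
    if not 0 <= msg_no < 2**26:
        raise ValueError("message number is 26 bits")
    if not 0 <= key_flags <= 0b11:
        raise ValueError("key flags are 2 bits")
    word0 = seq  # top bit left at 0: a data packet, not a control packet
    word1 = ((position << 30) | (int(in_order) << 29) | (key_flags << 27)
             | (int(retransmitted) << 26) | msg_no)
    return struct.pack("!IIII", word0, word1, timestamp_us & 0xFFFFFFFF, dest_socket_id)


def unpack_data_header(datagram: bytes) -> dict:
    word0, word1, timestamp_us, dest_socket_id = struct.unpack_from("!IIII", datagram)
    if word0 >> 31:
        raise ValueError("control packet, not a data packet")
    return {
        "seq": word0 & 0x7FFFFFFF,
        "position": word1 >> 30,
        "in_order": bool((word1 >> 29) & 1),
        "key_flags": (word1 >> 27) & 0b11,
        "retransmitted": bool((word1 >> 26) & 1),
        "msg_no": word1 & 0x03FFFFFF,
        "timestamp_us": timestamp_us,
        "dest_socket_id": dest_socket_id,
    }


datagram = pack_data_header(seq=1000, msg_no=1, timestamp_us=250_000,
                            dest_socket_id=0x1234) + b"payload"
print(len(datagram))                        # 23: 16-byte header + 7-byte payload
print(unpack_data_header(datagram)["seq"])  # 1000
```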
SRT does not use TCP (Transmission Control Protocol) because TCP requires every byte of a stream to be delivered exactly in its original order. Although this may sound like a good approach for sending video, experience shows otherwise. In video, some lost bytes can be corrected or, in the worst case, ignored. TCP cannot skip bad bytes; instead, it keeps retrying the lost data for as long as it takes, which in congested networks leads to frozen frames and "rebuffering" indicators that significantly degrade the viewing experience. A third problem is subtle but important for video: when congestion occurs, TCP automatically reduces its transmission rate. This helps relieve congestion across the network as a whole, but it does not suit video signals, which cannot be sent more slowly than their nominal bit rate.
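The freezing effect described above is head-of-line blocking. This toy calculation (illustrative numbers; it optimistically assumes the lost packet's retransmission arrives exactly one round-trip time later) computes when each packet can be delivered if every packet must wait for all earlier ones, as in a TCP byte stream. `in_order_delivery` is a hypothetical helper.

```python
def in_order_delivery(arrivals_ms):
    """Delivery times when every packet must wait for all earlier ones (TCP-like)."""
    delivered, latest = [], 0
    for t in arrivals_ms:
        latest = max(latest, t)
        delivered.append(latest)
    return delivered


INTERVAL_MS, RTT_MS = 10, 200  # illustrative values
nominal = [i * INTERVAL_MS for i in range(8)]
arrivals = list(nominal)
arrivals[3] += RTT_MS  # packet 3 is lost; assume its retransmission arrives one RTT later

delivered = in_order_delivery(arrivals)
print([d - n for d, n in zip(delivered, nominal)])
# [0, 0, 0, 200, 190, 180, 170, 160]
```

Packets 4 to 7 arrived on time but are held back behind packet 3. A datagram-based protocol can instead pass them on when their own playout time comes, and skip packet 3 altogether if its retransmission arrives too late.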
Handshake and Capability Exchange
SRT provides three handshake modes that let devices connect to each other and exchange the information needed to send and receive packets, such as IP addresses. In caller mode, an SRT endpoint initiates a connection to a remote device with a known address and UDP port number. In listener mode, an SRT device listens on a defined address and port number and waits for an incoming connection from a caller. In the third mode, called "rendezvous," both endpoints act as callers and listeners at the same time, which makes it easier to establish a connection through certain types of firewalls.
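The pairing rules can be summarized in a few lines. This is a simplified sketch with a hypothetical `can_connect` helper: a caller needs a listener on the other end, and rendezvous mode pairs only with another rendezvous endpoint.

```python
from enum import Enum


class Mode(Enum):
    CALLER = "caller"
    LISTENER = "listener"
    RENDEZVOUS = "rendezvous"


def can_connect(a: Mode, b: Mode) -> bool:
    """Whether two endpoints in these modes can complete a handshake (simplified)."""
    if Mode.RENDEZVOUS in (a, b):
        return a is b  # rendezvous pairs only with rendezvous
    return {a, b} == {Mode.CALLER, Mode.LISTENER}


for a in Mode:
    for b in Mode:
        print(f"{a.value:>10} -> {b.value:<10} {can_connect(a, b)}")
```

Tools built on SRT, such as FFmpeg and srt-live-transmit, typically expose the same three choices as a `mode` parameter on an `srt://` URL.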
Each handshake is confirmed by both sides using secure cookies, which verify the endpoints' identity and credentials before the connection proceeds. Once the handshake is complete, caller and listener exchange their capabilities and configurations. Both ends need to know the total end-to-end latency between them in order to size their buffers correctly for packet retransmission delays. Connection bandwidth can also be estimated and communicated so that video can be compressed to fit the capacity of the network. Optionally, encryption keys can be exchanged between sender and receiver so that the video and audio content inside the IP packets is encrypted with 128-, 192-, or 256-bit AES, making transmission more secure.
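Latency agreement during the handshake can be sketched as follows. libsrt documents that the effective latency is the larger of the values configured by the two peers, and 120 ms is its default live-mode latency; the `multiplier × RTT` helper is an assumed rule of thumb, not part of the protocol, and both function names are hypothetical.

```python
LIBSRT_DEFAULT_LATENCY_MS = 120  # libsrt's default live-mode latency


def negotiated_latency_ms(local_ms: int, peer_ms: int) -> int:
    """Both peers end up using the larger of the two configured latencies."""
    return max(local_ms, peer_ms)


def suggested_latency_ms(rtt_ms: float, multiplier: float = 4.0) -> int:
    """ASSUMPTION: rule of thumb 'latency = multiplier x RTT', never below the default."""
    return max(LIBSRT_DEFAULT_LATENCY_MS, round(rtt_ms * multiplier))


print(negotiated_latency_ms(120, 500))  # 500
print(suggested_latency_ms(80))         # 320
print(suggested_latency_ms(10))         # 120
```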
Comparison with Common Transmission Formats
Compared with most other video streaming formats on the market, such as RTMP, HLS, and MPEG-DASH, SRT offers several distinguishing characteristics:
Non-Proprietary
SRT is an open-source solution that has been integrated into many platforms and architectures, from portable hardware devices to cloud-based software. Because these implementations build on the same underlying codebase, interoperability is simplified.
Handles Long Network Delays
Due to its flexible, adaptive buffer management system, SRT works well over connections with delays ranging from a few milliseconds up to several seconds, making it capable of handling anything that may be found on private networks or the global Internet.
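A back-of-the-envelope sketch of why long delays call for larger buffers: the buffer must hold roughly `bitrate × latency` worth of data. The sketch assumes 1316-byte payloads (seven 188-byte MPEG-TS packets, libsrt's default live-mode payload size) and ignores header and retransmission overhead; `buffer_needed` is a hypothetical helper.

```python
import math

TS_PACKET_BYTES = 188
PAYLOAD_BYTES = 7 * TS_PACKET_BYTES  # 1316 bytes per SRT payload (assumed)


def buffer_needed(bitrate_bps: float, latency_ms: float, payload_bytes: int = PAYLOAD_BYTES):
    """Bytes and packets that must be held to cover latency_ms of stream at bitrate_bps."""
    buffered_bytes = bitrate_bps / 8 * latency_ms / 1000
    return buffered_bytes, math.ceil(buffered_bytes / payload_bytes)


print(buffer_needed(10_000_000, 20))     # (25000.0, 19)
print(buffer_needed(10_000_000, 2_000))  # (2500000.0, 1900)
```

A 10 Mbit/s stream over a 2-second path needs about a hundred times the buffering of the same stream over a 20 ms path, which is why the buffer management has to adapt to the measured delay.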
Supports Multiple Stream Types
Unlike some solutions that support only specific video and audio formats, SRT is payload agnostic. Any type of video or audio media, or indeed any other data that can be sent over UDP, is compatible with SRT.
Supports Multiple Concurrent Streams
Multiple media streams, such as different camera angles or optional audio tracks, can be sent as parallel SRT streams that share the same UDP port and address on a point-to-point link. Each signal keeps its own media format and timing, so MP4 video signals can share a link with JPEG2000 streams. This simplifies network configuration and firewall traversal.
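Sharing one UDP port works because each SRT data packet carries a destination socket ID. The sketch below demultiplexes datagrams by that field, assuming the 16-byte data header layout of the IETF SRT Internet-Draft (socket ID in the fourth 32-bit word); `make_datagram`, `demux`, and the socket ID values are hypothetical.

```python
import struct
from collections import defaultdict


def make_datagram(seq: int, dest_socket_id: int, payload: bytes) -> bytes:
    """Hypothetical helper: 16-byte header (seq, flags/msg no, timestamp, socket ID) + payload."""
    return struct.pack("!IIII", seq, 0, 0, dest_socket_id) + payload


def demux(datagrams):
    """Split traffic arriving on one UDP port into streams by destination socket ID."""
    streams = defaultdict(list)
    for dgram in datagrams:
        (socket_id,) = struct.unpack_from("!I", dgram, 12)  # fourth header word
        streams[socket_id].append(dgram[16:])
    return dict(streams)


CAMERA_A, CAMERA_B = 0x0A, 0x0B  # illustrative socket IDs
arriving = [make_datagram(0, CAMERA_A, b"A0"), make_datagram(0, CAMERA_B, b"B0"),
            make_datagram(1, CAMERA_A, b"A1")]
print(demux(arriving))  # {10: [b'A0', b'A1'], 11: [b'B0']}
```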
Enhanced Firewall Traversal
No modern organization, whether media-based or otherwise, permits unrestricted access to the public Internet from corporate systems. Firewalls protect private network devices such as PCs and servers from unwanted external connections and attacks. SRT's handshake process supports outbound connections without requiring permanent, potentially dangerous inbound ports to be opened in firewalls, so corporate security policies remain intact.
Accurate Signal Timing
Many compressed video formats are highly sensitive to disruptions caused by timing variations between different elements of a signal. With SRT, the sender assigns each packet a high-resolution timestamp, which the receiver uses to reconstruct the signal's timing relationships accurately, regardless of changes in network delay. In addition, during the handshake SRT endpoints establish a stable end-to-end latency, so downstream devices do not need their own buffers to cope with constantly varying signal delays.
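Timestamp-based playout can be sketched like this: each packet is released at a receiver-side time equal to a base time plus the sender's timestamp plus the agreed latency, so output spacing matches the sender's spacing regardless of network jitter. This is a simplified model under stated assumptions (a known base time and no clock drift, which real implementations must correct); `schedule_playout` and the constants are hypothetical.

```python
def schedule_playout(packets, base_time_us: int, latency_us: int):
    """For (sender_timestamp_us, arrival_time_us) pairs, return (playout_time_us, on_time)."""
    schedule = []
    for timestamp_us, arrival_us in packets:
        playout_us = base_time_us + timestamp_us + latency_us
        schedule.append((playout_us, arrival_us <= playout_us))
    return schedule


BASE_US = 1_000_000   # ASSUMPTION: receiver-clock time corresponding to sender timestamp 0
LATENCY_US = 120_000  # fixed latency agreed during the handshake
jittery = [(0, 1_030_000), (10_000, 1_095_000), (20_000, 1_052_000), (30_000, 1_180_000)]
for playout_us, on_time in schedule_playout(jittery, BASE_US, LATENCY_US):
    print(playout_us, on_time)
# 1120000 True
# 1130000 True
# 1140000 True
# 1150000 False
```

The first three packets arrive out of order and with uneven delays, yet play out exactly 10 ms apart; the fourth arrived after its deadline and exceeded the latency budget.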
No Need for a Central Server
Some proprietary media transmission systems require a centralized server between the sender and receiver, adding cost and latency. SRT connections are made directly between devices, thus eliminating the need for a central server. Additionally, if needed, SRT can be deployed with centralized servers and relay points to serve applications such as cloud-based content aggregation systems and clip distribution networks where a centralized model is preferred.
Low Cost
SRT systems are implemented using free open-source libraries, which helps reduce costs for all parties. SRT deployments require no royalties, long-term contracts, or monthly subscription fees.
API-Based
The SRT technology suite is API-based, allowing vendors to establish tight, reproducible integration with platforms and endpoints.
Open-Source Community
SRT has been adopted by leading open-source projects, including VideoLAN's VLC, the free, open-source, cross-platform multimedia player and framework; GStreamer, the multimedia framework that serves as the streaming engine on many small and mobile devices; Wireshark, a leading network traffic analyzer; and FFmpeg, the world's most popular open-source video compression toolkit.
Comparison with QUIC
Both SRT and QUIC are designed to overcome UDP's lack of loss recovery and packet ordering while avoiding the buffering delays typical of TCP (Transmission Control Protocol). Both also secure their traffic: QUIC integrates TLS 1.3, the latest version of the Transport Layer Security protocol, while SRT encrypts its payload with AES, as described above.
QUIC employs several techniques to minimize blocking, such as estimating the available bandwidth in each direction, pacing packet transmission according to that estimate, and proactively retransmitting the most important packets, such as those that support error correction or the initial encryption negotiation.
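Pacing from a bandwidth estimate reduces to simple arithmetic: spacing packets `packet_bits / bandwidth` apart keeps the sending rate at the estimate instead of emitting bursts. The 1200-byte packet size and 12 Mbit/s estimate are illustrative assumptions, and the helpers are hypothetical rather than any QUIC stack's API.

```python
def pacing_interval_us(packet_bytes: int, bandwidth_bps: float) -> float:
    """Gap between packets so that sending never exceeds the estimated bandwidth."""
    return packet_bytes * 8 * 1_000_000 / bandwidth_bps


def send_times_us(n_packets: int, packet_bytes: int, bandwidth_bps: float):
    gap = pacing_interval_us(packet_bytes, bandwidth_bps)
    return [i * gap for i in range(n_packets)]


print(pacing_interval_us(1200, 12_000_000))  # 800.0
print(send_times_us(4, 1200, 12_000_000))    # [0.0, 800.0, 1600.0, 2400.0]
```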
QUIC also reduces latency by minimizing the number of round trips needed to establish a connection and by avoiding separate connections for secondary resources on a web page once the main connection is established. The handshake, encryption setup, and initial data requests are combined into a single initial exchange, and HTTP/2-style compression and multiplexing are used so that sub-resources on a page do not each need their own setup.
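The round-trip savings can be quantified with commonly cited handshake costs. This hedged sketch counts round trips before the client can send its first request for full handshakes only (TCP Fast Open, session resumption, and other options change the numbers); `SETUP_RTTS` and `setup_delay_ms` are hypothetical names.

```python
# Round trips spent on setup before the client can send its first request
# (commonly cited figures for full handshakes; resumption and other options change them).
SETUP_RTTS = {
    "TCP + TLS 1.2": 3,           # 1 RTT TCP handshake + 2 RTT TLS 1.2
    "TCP + TLS 1.3": 2,           # 1 RTT TCP handshake + 1 RTT TLS 1.3
    "QUIC, new connection": 1,    # transport and TLS 1.3 handshakes combined
    "QUIC, 0-RTT resumption": 0,  # request sent in the first flight
}


def setup_delay_ms(rtt_ms: float) -> dict:
    return {name: rtts * rtt_ms for name, rtts in SETUP_RTTS.items()}


for name, delay in setup_delay_ms(80).items():
    print(f"{name:<24}{delay:>5} ms")
```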
SRT uses variations of several of these techniques, including fast session setup, bandwidth estimation, and loss recovery through low-latency retransmission, and it mitigates congestion by dropping packets that are too late to be useful when congestion is high. However, SRT does not rely on HTTP and Adaptive Bitrate (ABR) streaming to change the bitrate according to available bandwidth; instead, it analyzes network conditions in real time and filters out the effects of jitter, noise, and congestion.
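Dropping late packets under congestion can be sketched on the sender side: packets that can no longer reach the receiver before their playout deadline are discarded instead of retransmitted. The threshold here is simply the configured latency, which is an assumption (libsrt's actual too-late packet drop threshold may include extra margin), and `handle_loss_report` is a hypothetical name.

```python
def handle_loss_report(lost_seqs, send_buffer, now_us, latency_us):
    """Sender side: drop packets too old to arrive before their playout deadline,
    then return the lost packets that are still worth retransmitting.

    send_buffer maps sequence number -> original send time (us) and is modified in place.
    ASSUMPTION: the drop threshold is simply the configured latency.
    """
    too_late = [seq for seq, sent_us in send_buffer.items() if now_us - sent_us > latency_us]
    for seq in too_late:
        del send_buffer[seq]  # give up on it; the receiver will skip this packet
    return [seq for seq in lost_seqs if seq in send_buffer]


buffer = {10: 0, 11: 40_000, 12: 80_000, 13: 150_000}
print(handle_loss_report([10, 12], buffer, now_us=200_000, latency_us=120_000))  # [12]
print(sorted(buffer))  # [12, 13]
```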
Because SRT differs greatly from HTTP-based ABR streaming standards such as HTTP Live Streaming (HLS) and MPEG-DASH, it faces an uphill battle in mid-stream distribution and end-user delivery applications.
Development Status
At the SRT plugfest held this May, 15 SRT Alliance members successfully completed more than 50 interoperability tests of SRT streams among cameras, encoders, decoders, gateways, multiple audiences, and players.
Haivision and Wowza co-created the SRT Alliance, and since SRT became open-source in 2017, over 130 companies have supported the open-source project through the SRT Alliance. Its vendors and end-users are working together to increase industry awareness of SRT and to establish it as a universal standard for low-latency video transmission over the Internet. Prominent SRT Alliance members include Ateme, Blonder Tongue, Brightcove, Ericsson, Eurovision, Haivision, Harmonic, Limelight, Matrox, Sencore, and Wowza.
Currently, over 50 types of SRT-supporting products are on the market, including IP cameras, encoders, decoders, gateways, OTT platforms, and CDNs. The SRT protocol is employed by thousands of organizations worldwide in numerous applications and markets. End-users include Comcast, ESPN, Fox News, Microsoft, NBC Sports, NFL, and Tencent.