Comprehensive Insights from the Audio and Video Technology Conference: Exploring the SRT Protocol and Wireshark Analysis

Abstract: This article starts with the working process of the SRT protocol, then introduces and analyzes the SRT packet structure, and finally gives examples of using the Wireshark packet capture software to analyze link faults and solve problems that arise in practical work.

Introduction

SRT (Secure Reliable Transport) is an emerging audio-video transmission protocol that enables high-quality, low-latency, real-time audio-video transmission over the public Internet.

SRT Protocol Analysis for Public Network Transmission (Part 1) discussed how to measure the reliability of the SRT protocol and how to configure SRT link parameters for different application scenarios. This article, the second part, starts with the working process of the SRT protocol, analyzes the SRT packet structure, and then uses examples to show how to capture and analyze packets with Wireshark in order to troubleshoot link faults or obtain link information.

1

SRT Protocol Working Process

The most common working mode of the SRT protocol is the “Caller-Listener” mode. The listener continuously listens on a fixed UDP port, and the caller establishes the SRT connection by contacting the listener’s public IP address and that port. The caller and listener roles matter mainly during the handshake phase, and either the encoding end or the decoding end can act as caller or listener.

Figure 1 shows the working process of the SRT protocol, which includes the handshake, parameter exchange, data transfer, and connection shutdown. While payload data is being transmitted, both parties also send control packets to handle functions such as packet loss recovery and connection maintenance.


Figure 1 SRT Protocol Working Process

2

SRT Packet Structure

The SRT protocol is an improvement on UDT (UDP-based Data Transfer Protocol). An RFC draft for SRT was submitted to the IETF on March 10, 2020, indicating that the protocol has entered a relatively stable phase of development.

The traditional strength of SRT lies in point-to-point real-time audio-video transmission. In the past two years, SRT has also developed rapidly in upstream streaming, with many mainstream platforms and companies supporting SRT as a replacement for RTMP. The key enabler is the StreamID feature of SRT, which is carried in the configuration extension module of the SRT handshake packet.

Overall, the SRT protocol contains two types of packets, Data Packets and Control Packets, distinguished by the highest bit (flag bit) of the SRT header. A value of 0 represents a Data Packet, and 1 represents a Control Packet. Control Packets come in several types: Handshake, Acknowledgement (ACK), Negative Acknowledgement (NAK), Acknowledgement of ACK (ACKACK), Keepalive, and Shutdown.
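The flag-bit rule above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference parser: the function name classify_packet and the table CONTROL_TYPES are hypothetical, the sketch assumes the control type occupies the 15 bits that follow the flag bit, and the numeric values for Keepalive, Shutdown, and ACKACK are taken from the SRT Internet-Draft rather than from this article.

```python
import struct

# Control type values. Handshake (0), ACK (2) and NAK (3) are stated in the
# article; the remaining values are assumed from the SRT Internet-Draft.
CONTROL_TYPES = {
    0x0000: "Handshake",
    0x0001: "Keepalive",
    0x0002: "ACK",
    0x0003: "NAK",
    0x0005: "Shutdown",
    0x0006: "ACKACK",
}


def classify_packet(payload: bytes) -> str:
    """Return a short label for an SRT packet, given its UDP payload."""
    if len(payload) < 16:
        raise ValueError("an SRT header is 16 bytes; payload is too short")
    # The first 32-bit word is in network byte order.
    (first_word,) = struct.unpack_from("!I", payload, 0)
    if first_word >> 31 == 0:
        return "Data"  # flag bit 0: data packet
    # Flag bit 1: control packet; the next 15 bits hold the control type.
    control_type = (first_word >> 16) & 0x7FFF
    return CONTROL_TYPES.get(control_type, f"Control(0x{control_type:04x})")
```

For example, a payload whose first word is 0x80020000 would be labeled "ACK", while any payload whose first bit is 0 is labeled "Data".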

2.1

Data Packet Structure

Figure 2 shows the structure of an SRT Data Packet, which carries the data to be transmitted. The SRT header is 16 bytes long, and its highest bit is the flag bit. The Data Packet header contains four fields: Packet Sequence Number, Message Number, Timestamp, and Destination Socket ID.

  • Packet Sequence Number: SRT uses a sequence number-based packet sending mechanism, incrementing the packet sequence number each time a packet is sent from the sender.
  • Message Number: Counted independently of the packet sequence number. Four flag fields precede it in the same word (see Figure 2).
  • Timestamp: A relative timestamp based on the connection establishment time (StartTime), in microseconds.
  • Destination Socket ID: Used to distinguish different SRT streams in the case of multiplexing.
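As a rough sketch of the header layout just described, the following Python snippet unpacks the four header fields. The names DataHeader and parse_data_header are hypothetical. The bit widths (a 31-bit sequence number, 6 bits of flags in front of a 26-bit message number) are assumptions taken from the SRT Internet-Draft and should be checked against Figure 2.

```python
import struct
from typing import NamedTuple


class DataHeader(NamedTuple):
    sequence_number: int  # 31 bits, follows the flag bit
    flags: int            # bits in front of the message number (assumed 6 bits)
    message_number: int   # assumed 26 bits
    timestamp_us: int     # microseconds since connection establishment
    dest_socket_id: int


def parse_data_header(payload: bytes) -> DataHeader:
    """Split the 16-byte SRT data packet header into its fields."""
    if len(payload) < 16:
        raise ValueError("an SRT header is 16 bytes; payload is too short")
    word0, word1, timestamp, socket_id = struct.unpack_from("!IIII", payload, 0)
    if word0 >> 31:
        raise ValueError("flag bit is 1: this is a control packet")
    return DataHeader(
        sequence_number=word0 & 0x7FFFFFFF,
        flags=word1 >> 26,
        message_number=word1 & 0x03FFFFFF,
        timestamp_us=timestamp,
        dest_socket_id=socket_id,
    )
```

Dividing timestamp_us by 1,000,000 gives the number of seconds since the connection was established, which can be compared with the capture time shown by Wireshark.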

Figure 2 SRT Data Packet

2.2

Handshake Packet Structure

Handshake packets come in two versions: HSv4 (SRT version < 1.3) and HSv5 (SRT version >= 1.3). An HSv5 handshake packet mainly includes five areas: SRT Header, Handshake Control Info (cif.hsv5), Handshake Request/Response Extension Module (hsreq/hsrsp), Encryption Extension Module (kmreq/kmrsp), and Configuration Extension Module (config). The discussion here focuses on the first three areas. The structure of the handshake packet is shown in Figure 3:

Figure 3 HSv5 Handshake Packet

1. The headers of all SRT control packets are basically the same and contain four areas: Control Type and Reserved Area, Additional Information, Timestamp, and Destination Socket ID. For handshake packets, the Control Type field equals 0.

2. In the Handshake Control Info Area (cif.hsv5), the following fields are important:

  • ISN: Randomly generated Initial Sequence Number for packets. All subsequent data packets are counted based on this.
  • Handshake Type: The first purpose of this field is to indicate the handshake phase of the packet (in the “Caller-Listener” mode, it is divided into Induction and Conclusion). The second and more important purpose for the user is to display an error code when the handshake fails, as seen in Table 1 below.

Error Code | Error Type                | Error Code | Error Type
1000       | Unknown Reason            | 1008       | Peer Version Too Old
1001       | System Function Error     | 1009       | Socket Conflict in Ensemble Mode
1002       | Peer Rejection            | 1010       | Password Error
1003       | Resource Allocation Issue | 1011       | Password Requirement
1004       | Error Data in Handshake   | 1012       | Stream Flag Conflict
1005       | Listener Backlog Overflow | 1013       | Congestion Control Type Conflict
1006       | Internal Program Error    | 1014       | Packet Filter Conflict
1007       | This Socket Is Closed     | 1015       | Group Conflict

Table 1 Error Codes and Corresponding Error Types

  • SRT Socket ID: This field needs to be distinguished from the Destination Socket ID in the SRT header, as it only applies to the handshake phase, while the Destination Socket ID applies throughout data transfers.
  • Sync Cookie: In “Caller-Listener” mode, to prevent DoS attacks, only the listener generates the sync cookie, derived from the listener’s host, port, and current time, accurate to one minute.
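The two uses of the Handshake Type field can be illustrated with a small lookup. This is a sketch under stated assumptions: the function name describe_handshake_type is hypothetical, the rejection codes are copied from Table 1, and the numeric values for the Induction and Conclusion phases are taken from the SRT Internet-Draft, not from this article.

```python
# Handshake phase values, assumed from the SRT Internet-Draft
# (the article names the phases but does not give their values).
HANDSHAKE_PHASES = {
    0x00000001: "Induction",
    0xFFFFFFFF: "Conclusion",
}

# Rejection codes copied from Table 1.
REJECT_REASONS = {
    1000: "Unknown Reason",
    1001: "System Function Error",
    1002: "Peer Rejection",
    1003: "Resource Allocation Issue",
    1004: "Error Data in Handshake",
    1005: "Listener Backlog Overflow",
    1006: "Internal Program Error",
    1007: "This Socket Is Closed",
    1008: "Peer Version Too Old",
    1009: "Socket Conflict in Ensemble Mode",
    1010: "Password Error",
    1011: "Password Requirement",
    1012: "Stream Flag Conflict",
    1013: "Congestion Control Type Conflict",
    1014: "Packet Filter Conflict",
    1015: "Group Conflict",
}


def describe_handshake_type(value: int) -> str:
    """Translate a Handshake Type value into a phase name or a reject reason."""
    if value in HANDSHAKE_PHASES:
        return HANDSHAKE_PHASES[value]
    if value in REJECT_REASONS:
        return f"Rejected ({value}): {REJECT_REASONS[value]}"
    return f"Unrecognized handshake type 0x{value:08x}"
```

A value read from a capture can be passed in directly, for example describe_handshake_type(1002) for the rejection discussed later in Scenario 1.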

3. Key fields in the Handshake Request Extension Module (HSREQ) include:

  • SRT Version: If either party’s SRT version is below 1.3, the connection is established with the HSv4 handshake, which requires three or four round trips, whereas the newer HSv5 handshake requires only two. For compatibility reasons, the initial handshake request is sent in HSv4 format even if both parties run version 1.3 or later.
  • SRT Flags: Seven flag bits are used to enable the various modes and functions of SRT.
  • Send and Receive Delays: Starting with version 1.3, the SRT protocol supports bidirectional transmission, so a different delay can be set for each direction. In conventional unidirectional transmission (e.g., A sending data to B), the delay (Latency) is the larger of A’s send delay (PeerLatency) and B’s receive delay (RecLatency) and is negotiated during the handshake phase. Some codecs use the same value for PeerLatency and RecLatency for simplicity, which does not affect unidirectional transmission.

4. Encryption Extension Module KMREQ and Configuration Extension Module CONFIG

  • These last two extension modules are optional and are not discussed in detail here due to space limitations. The Encryption Extension Module (KMREQ) implements SRT’s AES-128/AES-192/AES-256 encryption. The Configuration Extension Module (CONFIG) includes four types: SRT_CMD_SID, SRT_CMD_CONGESTION, SRT_CMD_FILTER, and SRT_CMD_GROUP. The SRT_CMD_SID extension is what carries the StreamID used in upstream streaming. Interested readers can capture packets and inspect these modules in detail.
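To see which extension modules a handshake packet actually carries, the extension area can be walked block by block. The sketch below is illustrative only: the function name list_extensions is hypothetical, and it assumes the layout described in the SRT Internet-Draft, namely that each block starts with a 16-bit extension type and a 16-bit length counted in 4-byte words, and that the numeric type values are those used by the open-source SRT library. The caller is expected to pass only the bytes that follow the Handshake Control Info area, and the block contents are returned undecoded.

```python
import struct
from typing import List, Tuple

# Extension type values, assumed from the SRT Internet-Draft.
EXTENSION_TYPES = {
    1: "SRT_CMD_HSREQ",
    2: "SRT_CMD_HSRSP",
    3: "SRT_CMD_KMREQ",
    4: "SRT_CMD_KMRSP",
    5: "SRT_CMD_SID",
    6: "SRT_CMD_CONGESTION",
    7: "SRT_CMD_FILTER",
    8: "SRT_CMD_GROUP",
}


def list_extensions(ext_bytes: bytes) -> List[Tuple[str, bytes]]:
    """Return (name, raw contents) for each extension block, in order."""
    found = []
    offset = 0
    while offset + 4 <= len(ext_bytes):
        ext_type, length_words = struct.unpack_from("!HH", ext_bytes, offset)
        start = offset + 4
        end = start + 4 * length_words  # length is counted in 32-bit words
        if end > len(ext_bytes):
            raise ValueError("extension block runs past the end of the packet")
        name = EXTENSION_TYPES.get(ext_type, f"unknown({ext_type})")
        found.append((name, ext_bytes[start:end]))
        offset = end
    return found
```

Checking whether "SRT_CMD_KMREQ" or "SRT_CMD_SID" appears in the returned names is a quick way to confirm whether a peer requested encryption or sent a StreamID.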

2.3

ACK Packet Structure

An ACK packet is a positive acknowledgment sent by the SRT receiver to the sender. Upon receiving an ACK, the sender assumes the corresponding data packet has been successfully delivered. The ACK packet also contains estimated link data from the receiver, which can assist with congestion control for the sender. Figure 4 shows the ACK packet structure, highlighting several key fields:

Figure 4 ACK Control Packet

  • Control Type: This field equals 2, indicating an ACK packet.
  • Additional Information: Includes the independently counted ACK sequence number, primarily used to match ACK packets with ACKACK packets.
  • Recently Received Data Packet Sequence Number +1: Equals the sequence number of the most recently received data packet plus 1. For instance, if this field in the ACK packet shows 6, all data packets up to sequence number 5 have been received, so the sender can purge them from its buffer. Note that this field refers to the Packet Sequence Number and is unrelated to the ACK sequence number.
  • RTT Estimate: An RTT estimate calculated using ACK and ACKACK packets, providing the round-trip time for the link.
  • RTT Jitter Value: Measures the RTT’s variability, where a higher value indicates greater link instability.
  • Receiver’s Available Buffer Data: Shows how much buffered data the receiver currently holds and has available for decoding. A higher value is better, and it is capped by the Latency parameter.
  • Link Bandwidth Estimate: Provides a bandwidth estimate for the current link.
  • Reception Rate Estimate: Estimates the receiver’s downstream network bandwidth.
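The fields listed above can be pulled out of a captured ACK packet with a short parser. This is a minimal sketch, not a definitive implementation: parse_ack and the field names in ACK_FIELDS are hypothetical, and the field order and units (microseconds for RTT values, packets per second for the rate and capacity estimates) are assumptions taken from the SRT Internet-Draft that should be compared with Figure 4 or a real capture. Shorter ACK variants are handled by decoding only the fields that are present.

```python
import struct
from typing import Dict

# Assumed order of the 32-bit fields that follow the 16-byte control header.
ACK_FIELDS = (
    "last_ack_seq_plus_one",    # most recently received sequence number + 1
    "rtt_us",                   # RTT estimate, microseconds
    "rtt_variance_us",          # RTT jitter, microseconds
    "available_buffer",         # receiver buffer figure
    "packets_receiving_rate",   # packets per second
    "estimated_link_capacity",  # packets per second
    "receiving_rate",           # bytes per second
)


def parse_ack(payload: bytes) -> Dict[str, int]:
    """Decode an SRT ACK control packet from its UDP payload."""
    if len(payload) < 16:
        raise ValueError("an SRT header is 16 bytes; payload is too short")
    word0, ack_number, timestamp, socket_id = struct.unpack_from("!IIII", payload, 0)
    if word0 >> 31 != 1 or (word0 >> 16) & 0x7FFF != 2:
        raise ValueError("not an ACK control packet")
    body = payload[16:]
    count = min(len(body) // 4, len(ACK_FIELDS))
    values = struct.unpack_from(f"!{count}I", body, 0)
    result = {
        "ack_number": ack_number,  # the Additional Information field
        "timestamp_us": timestamp,
        "dest_socket_id": socket_id,
    }
    result.update(zip(ACK_FIELDS, values))
    return result
```

Dividing rtt_us and rtt_variance_us by 1000 gives values in milliseconds, the unit Wireshark uses when it displays these fields.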

2.4

NAK Packet Structure

When the SRT receiver detects a gap in the packet sequence numbers, it concludes that packets have been lost and immediately sends a Negative Acknowledgement (NAK) packet to the sender. In addition, the receiver periodically sends a NAK report listing the sequence numbers of all packets still missing during the interval. This redundancy means that a NAK packet lost on the return path does not cause a problem. Figure 5 illustrates the NAK packet structure: the Control Type equals 3, and the packet contains a list of lost packet sequence numbers.
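The list of lost sequence numbers can be decoded with a few lines of Python. The sketch below assumes the compact encoding described in the SRT Internet-Draft (inherited from UDT), which is not spelled out in this article: a single lost packet is one 32-bit word with the top bit clear, and a run of lost packets is a pair of words in which the first has the top bit set. The function name decode_loss_list is hypothetical, and the input is the NAK packet body after the 16-byte header.

```python
import struct
from typing import List, Tuple


def decode_loss_list(body: bytes) -> List[Tuple[int, int]]:
    """Decode a NAK loss list into inclusive (first, last) sequence ranges."""
    if len(body) % 4:
        raise ValueError("loss list must be a whole number of 32-bit words")
    words = struct.unpack(f"!{len(body) // 4}I", body)
    ranges = []
    i = 0
    while i < len(words):
        word = words[i]
        if word & 0x80000000:
            # Top bit set: this word starts a range, the next word ends it.
            if i + 1 >= len(words):
                raise ValueError("range start without a matching end")
            ranges.append((word & 0x7FFFFFFF, words[i + 1] & 0x7FFFFFFF))
            i += 2
        else:
            # Top bit clear: a single lost packet.
            ranges.append((word, word))
            i += 1
    return ranges
```

Summing last - first + 1 over the returned ranges gives the number of packets reported lost in one NAK, which is a convenient figure to track when judging link quality.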

Figure 5 NAK Control Packet

2.5

ACKACK Packet Structure

The main role of the ACKACK packet is to measure the link’s Round Trip Time (RTT), one of the key link statistics carried in the ACK packet. Figure 6 shows the ACKACK packet structure. Both ACK and ACKACK packets carry precise timestamps and an ACK sequence number. When one side sends an ACK packet, the peer promptly returns an ACKACK packet with the same ACK sequence number. The side that sent the ACK can therefore match each ACK with its corresponding ACKACK and calculate the RTT from the difference between the two times.
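The matching of ACK and ACKACK packets can be sketched as a small bookkeeping class. RttEstimator and its method names are hypothetical. The smoothing weights (7/8 and 1/8 for the RTT, 3/4 and 1/4 for its variance) follow the description in the SRT Internet-Draft and are an assumption here, and seeding the estimate with the first sample is a simplification made for this example.

```python
from typing import Dict, Optional


class RttEstimator:
    """Match ACKACK packets to ACKs by ACK sequence number and track RTT."""

    def __init__(self) -> None:
        self._ack_sent_at: Dict[int, float] = {}
        self.rtt: Optional[float] = None
        self.rtt_var: float = 0.0

    def on_ack_sent(self, ack_number: int, now: float) -> None:
        # Remember when each ACK left, keyed by its ACK sequence number.
        self._ack_sent_at[ack_number] = now

    def on_ackack_received(self, ack_number: int, now: float) -> Optional[float]:
        sent = self._ack_sent_at.pop(ack_number, None)
        if sent is None:
            return None  # unknown or duplicate ACKACK: ignore it
        sample = now - sent
        if self.rtt is None:
            # Simplification: seed the estimate with the first sample.
            self.rtt = sample
            self.rtt_var = sample / 2
        else:
            # Assumed smoothing: update the variance first, then the RTT.
            self.rtt_var = (3 * self.rtt_var + abs(self.rtt - sample)) / 4
            self.rtt = (7 * self.rtt + sample) / 8
        return sample
```

Times can be given in any consistent unit. With microseconds, a first sample of 20000 followed by a sample of 28000 yields a smoothed RTT of 21000 and a variance of 9500.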

Figure 6 ACKACK Packet Structure

2.6

Keepalive and Shutdown Packet Structure

The last two packet types in SRT are the Keepalive and Shutdown packets. Their structures are shown in Figures 7 and 8.

Figure 7 Keepalive Packet Structure

Figure 8 Shutdown Packet Structure

3

Wireshark Packet Capture Analysis

Wireshark is a widely used open-source packet analyzer that can capture many kinds of network packets and display their contents in detail. As the broadcast industry continues its move to IP, Wireshark is becoming as important as waveform monitors are for SDI signals and stream analyzers are for TS streams.

The following are two examples of using Wireshark for link analysis:

3.1

Scenario 1: Connection Failure

In the process of setting up SRT links, connection failures may occur due to various reasons. Here, we can leverage Wireshark’s packet capture analysis to determine the error type.

Figure 9 shows the data captured after a connection failure. The two parties keep exchanging handshake packets, which indicates that the handshake never completes. It also confirms that the IP address and port settings are correct, since the two parties can reach each other.

Because both parties run SRT version 1.3 or later, the handshake takes two round trips and therefore four handshake packets. The first handshake packet always uses the HSv4 format, which makes it easy to identify. The “Handshake Type” of the fourth handshake packet is 1002-Reject, meaning “Peer Rejection”, which suggests that a parameter mismatch caused the handshake to fail.

Next, we examine the second handshake packet, the listener’s response to the caller. Its “Encryption Field” is set to AES-128, signaling that the listener expects an AES-128 encrypted response from the other party. In the third handshake packet, sent by the caller to the listener, the “Extension Field” shows the KMREQ module as not present, which means the caller did not include an encryption response.

From this analysis, we conclude that the connection failed because the listener requires AES-128 encryption and the caller did not provide it. To connect successfully, the caller must enable AES-128 encryption and use the listener’s password.
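The reasoning in this scenario can be condensed into a small check. This is a hypothetical helper written for illustration, not part of Wireshark or of any SRT library: diagnose_rejection takes values that have been read from the capture by hand, and the mapping from Encryption Field values to cipher names is an assumption taken from the SRT Internet-Draft.

```python
from typing import Set

# Encryption Field values, assumed from the SRT Internet-Draft.
CIPHER_NAMES = {2: "AES-128", 3: "AES-192", 4: "AES-256"}


def diagnose_rejection(final_handshake_type: int,
                       listener_encryption_field: int,
                       caller_extensions: Set[str]) -> str:
    """Suggest a likely cause for a rejected Caller-Listener handshake."""
    # Rejection codes occupy the range shown in Table 1.
    if not 1000 <= final_handshake_type <= 1015:
        return "handshake was not rejected"
    cipher = CIPHER_NAMES.get(listener_encryption_field)
    if cipher and "SRT_CMD_KMREQ" not in caller_extensions:
        return (f"listener advertises {cipher} but the caller sent no KMREQ; "
                "enable encryption on the caller and use the listener's password")
    return (f"rejected with code {final_handshake_type}; "
            "compare the remaining handshake parameters")
```

For the capture in Figure 9, the inputs would be a final handshake type of 1002, an Encryption Field of 2 in the listener's response, and a caller extension set that lacks "SRT_CMD_KMREQ".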

Figure 9 Scenario 1: Diagnosing Faults via Packet Capture Analysis

3.2

Scenario 2: Obtaining Link Information

The Round Trip Time (RTT) of an Internet link is the time data takes to travel from the sender to the receiver and back, and it directly affects how the latency of an SRT link should be set. RTT is often difficult to measure directly because of firewall restrictions, but it can be read from the ACK packets captured with Wireshark.

Figure 10 shows an RTT of 20.61 milliseconds with an RTT variation of 9.786 milliseconds, which indicates that the RTT is unstable. Variations in RTT change the time needed to retransmit a packet and therefore affect SRT’s error control, so the link parameters need to be adjusted to suit the link’s characteristics.

Figure 10 Scenario 2: RTT Estimate and Variability

Conclusion

The SRT protocol, known for its excellent performance, modest hardware and software requirements, and open-source nature, is widely used across many fields and has recently made advances in upstream streaming. Understanding the SRT packet structure makes it possible to use packet capture software effectively for fault analysis, so that problems in practical work can be located quickly and accurately. We hope this article proves helpful and welcome discussion and exchange.