Understanding Peer Data Transmission in BitTorrent: Protocols, Extensions, and Challenges

0. Review

Previous articles:

– A Brief Analysis of the Bittorrent Protocol (1) Metadata File https://cloud.tencent.com/developer/article/2332701

– A Brief Analysis of the Bittorrent Protocol (2) Tracker and Peers https://cloud.tencent.com/developer/article/2333043

– A Brief Analysis of the Bittorrent Protocol (3) Example of Peer Data Transmission https://cloud.tencent.com/developer/article/2333677

– A Brief Analysis of the Bittorrent Protocol (4) Distributed Hashing https://cloud.tencent.com/developer/article/2334440

– A Brief Analysis of the Bittorrent Protocol (5) Extended Protocol and Metadata Transmission Extension https://cloud.tencent.com/developer/article/2334776

Content review from previous articles:

BitTorrent is a protocol for distributing files. Its metadata files are bencoded; the content is split into pieces, and each piece is verified against a SHA-1 hash stored in the metadata. Nodes obtain node information from the Tracker through HTTP requests and then communicate with each other directly.

The metadata transfer extension of the extended protocol lets nodes transfer metadata among themselves. The PEX extension lets nodes exchange node information, and DHT looks up nodes by info hash using KRPC. Local service discovery is based on multicast. In private torrents, these features must be disabled.

So far, the discussion has been based on BitTorrent over TCP. In some network environments TCP has limitations: a large number of TCP connections can take an unfair share of network resources. The UDP-based uTP and Holepunch extensions address these issues and also give downloaders behind NAT or firewalls a chance to connect.

uTorrent Transport Protocol (uTP)

For the best reading experience, read this section together with A Brief Analysis of the Bittorrent Protocol (8) uTP Packet Analysis, Super Seeders.

The uTorrent transport protocol (uTP) is a transport protocol built on UDP. uTP adjusts its packet size dynamically: typically, the faster the transmission rate, the larger the packets, and on slow links the size can drop to as little as 150 bytes per packet. Small packets do not block slow uplinks, but their headers add proportionally more network overhead.

Data Format

The uTP packet header has the following format:

0       4       8               16              24              32
+-------+-------+---------------+---------------+---------------+
| type  | ver   | extension     | connection_id                 |
+-------+-------+---------------+---------------+---------------+
| timestamp_microseconds                                        |
+---------------+---------------+---------------+---------------+
| timestamp_difference_microseconds                             |
+---------------+---------------+---------------+---------------+
| wnd_size                                                      |
+---------------+---------------+---------------+---------------+
| seq_nr                        | ack_nr                        |
+---------------+---------------+---------------+---------------+

The fields are as follows:

Type (type):

  • ST_DATA (0): Regular data packet. An ST_DATA packet always carries a data payload;
  • ST_FIN (1): Finalizes the connection. This is the last packet and closes the connection, similar to TCP’s FIN flag. The receiver records its sequence number as eof_pkt so that it can keep waiting for earlier packets that were lost or arrive out of order;
  • ST_STATE (2): State packet. Used to transmit an acknowledgment without any data. It does not increment seq_nr;
  • ST_RESET (3): Forcibly terminates the connection, similar to TCP’s RST flag;
  • ST_SYN (4): Initiates a connection, similar to TCP’s SYN flag. The sequence number is initialized to 1, and the connection ID in this packet also initializes the connection ID. All subsequent packets from the initiator (except retransmitted ST_SYN packets) are sent with connection ID + 1.

Version (version):

The protocol version number. Currently, it is 1.

Connection ID (connection_id):

A random number that identifies all packets belonging to the same connection. Each endpoint uses one connection ID for sending and another for receiving. The connection initiator chooses the ID, and the IDs of the two directions differ by 1.

Timestamp (timestamp_microseconds):

The timestamp when this packet is sent. The higher the resolution, the better.

Timestamp Difference (timestamp_difference_microseconds):

The difference between the local time at which the last packet was received and the timestamp carried in that packet.

Window Size (wnd_size):

The advertised receive window, in bytes. The window size is the number of bytes currently in flight, that is, sent but not yet acknowledged; the advertised value lets the receiver cap the sender’s window.

Extension Field (extension):

The type of the first extension in the extension chain. 0 indicates no extension. (Currently, there is only one extension, which is selective acknowledgment.)

Selective Acknowledgment (Selective ACK):

An extension that selectively acknowledges packets received out of order. Its payload is a bitmask of at least 32 bits whose length is always a multiple of 32 bits. Each bit represents one packet in the send window; bits beyond the send window are ignored. A set bit means the packet has been received, and a cleared bit means it has not.
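
The bitmask can be decoded with a few lines of Python. This is only a sketch: the article does not state the bit order, so the layout used here (the least significant bit of the first byte stands for ack_nr + 2, because ack_nr + 1 is the packet presumed missing) is taken from BEP 29 and should be checked against it. The function name sacked_seq_nrs is hypothetical.

```python
def sacked_seq_nrs(ack_nr: int, bitmask: bytes) -> list[int]:
    """Return the sequence numbers marked as received in a Selective ACK mask."""
    if len(bitmask) < 4 or len(bitmask) % 4 != 0:
        raise ValueError("mask length must be a non-zero multiple of 4 bytes")
    received = []
    for byte_index, byte in enumerate(bitmask):
        for bit in range(8):
            if byte & (1 << bit):
                # Assumed layout: bit 0 of byte 0 stands for ack_nr + 2.
                # seq_nr is a 16-bit field, so the value wraps at 0x10000.
                received.append((ack_nr + 2 + byte_index * 8 + bit) & 0xFFFF)
    return received
```

For example, with ack_nr = 10 and a first mask byte of 0b00000101, packets 12 and 14 are reported as received while 11 and 13 are still missing.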

Sequence Number (seq_nr):

This is the sequence number of this packet. Unlike TCP, uTP sequence numbers refer to packets, not bytes.

Acknowledgment Number (ack_nr):

This is the sequence number of the last packet that the sender of this packet received from the other side of the connection.
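
As an illustration of the 20-byte header described above, the following Python sketch packs and unpacks it with the struct module. The names UtpHeader, pack_header, and unpack_header are hypothetical, and the code assumes that type occupies the high 4 bits of the first byte and ver the low 4 bits, with all fields in big-endian order.

```python
import struct
from typing import NamedTuple

# Packet types from the list above.
ST_DATA, ST_FIN, ST_STATE, ST_RESET, ST_SYN = range(5)

# type/ver share one byte, followed by extension, connection_id, the two
# timestamps, wnd_size, seq_nr and ack_nr: 20 bytes in total, big-endian.
UTP_HEADER = struct.Struct(">BBHIIIHH")


class UtpHeader(NamedTuple):
    pkt_type: int
    ver: int
    extension: int
    connection_id: int
    timestamp_us: int
    timestamp_diff_us: int
    wnd_size: int
    seq_nr: int
    ack_nr: int


def pack_header(h: UtpHeader) -> bytes:
    # Assumption: type is the high nibble and ver the low nibble.
    return UTP_HEADER.pack((h.pkt_type << 4) | h.ver, h.extension,
                           h.connection_id, h.timestamp_us,
                           h.timestamp_diff_us, h.wnd_size,
                           h.seq_nr, h.ack_nr)


def unpack_header(data: bytes) -> UtpHeader:
    type_ver, *rest = UTP_HEADER.unpack_from(data)
    return UtpHeader(type_ver >> 4, type_ver & 0x0F, *rest)
```

Under these assumptions, an ST_SYN packet of version 1 starts with the byte 0x41.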

Connection Process

BEP 29 describes the connection process with a state diagram written in C-like notation, where c.* is the connection state and pkt.* is a field of the received packet. The process is similar to TCP’s handshake:

Initiator                                               Receiver

          | c.state = CS_SYN_SENT                         |
          | c.seq_nr = 1                                  |
          | c.conn_id_recv = rand()                       |
          | c.conn_id_send = c.conn_id_recv + 1           |
          |                                               |
          |                                               |
          | ST_SYN                                        |
          |   seq_nr=c.seq_nr++                           |
          |   ack_nr=*                                    |
          |   conn_id=c.conn_id_recv                      |
          | >-------------------------------------------> |
          |             c.receive_conn_id = pkt.conn_id+1 |
          |             c.send_conn_id = pkt.conn_id      |
          |             c.seq_nr = rand()                 |
          |             c.ack_nr = pkt.seq_nr             |
          |             c.state = CS_SYN_RECV             |
          |                                               |
          |                                               |
          |                                               |
          |                                               |
          |                     ST_STATE                  |
          |                       seq_nr=c.seq_nr++       |
          |                       ack_nr=c.ack_nr         |
          |                       conn_id=c.send_conn_id  |
          | <------------------------------------------<  |
          | c.state = CS_CONNECTED                        |
          | c.ack_nr = pkt.seq_nr                         |
          |                                               |
          |                                               |
          |                                               |
          | ST_DATA                                       |
          |   seq_nr=c.seq_nr++                           |
          |   ack_nr=c.ack_nr                             |
          |   conn_id=c.conn_id_send                      |
          | >-------------------------------------------> |
          |                        c.ack_nr = pkt.seq_nr  |
          |                        c.state = CS_CONNECTED |
          |                                               | Connection established
     .. ..|.. .. .. .. .. .. .. .. .. .. .. .. .. .. .. ..|.. ..
          |                                               |
          |                     ST_DATA                   |
          |                       seq_nr=c.seq_nr++       |
          |                       ack_nr=c.ack_nr         |
          |                       conn_id=c.send_conn_id  |
          | <------------------------------------------<  |
          | c.ack_nr = pkt.seq_nr                         |
          |                                               |
          |                                               |
          V                                               V

In the analysis below, the connection initiator is called A and the receiver B. The connection is discussed in stages:

Initialization

A enters the CS_SYN_SENT state and prepares to send a connection request.

A initializes seq_nr to 1, sets conn_id_recv to a random value, and sets conn_id_send to conn_id_recv + 1.

Handshake

  • A sends an ST_SYN packet to B to request a connection.
  • B takes the connection ID from the packet, sets receive_conn_id to that ID plus one and send_conn_id to the ID itself, picks a random seq_nr for the packets it will send, records the packet’s seq_nr as its ack_nr, and enters the CS_SYN_RECV state, accepting the request.
  • B replies to A with an ST_STATE packet carrying its seq_nr, ack_nr, and connection ID. On receiving it, A enters the CS_CONNECTED state. B enters CS_CONNECTED when A’s first ST_DATA packet arrives, at which point the connection is established.
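
The ID bookkeeping in these steps can be traced with a small Python sketch. It models packets as plain dictionaries and sends nothing over the network; Conn, initiate, accept, and on_state are hypothetical names, and the receiver's fields are named conn_id_recv/conn_id_send here for symmetry with the initiator.

```python
import random
from dataclasses import dataclass


@dataclass
class Conn:
    state: str
    seq_nr: int
    ack_nr: int
    conn_id_recv: int
    conn_id_send: int


def initiate() -> tuple[Conn, dict]:
    """A's side: pick the IDs and build the ST_SYN fields."""
    recv_id = random.randrange(0xFFFF)  # leaves room for recv_id + 1 in 16 bits
    a = Conn("CS_SYN_SENT", 1, 0, recv_id, recv_id + 1)
    syn = {"type": "ST_SYN", "seq_nr": a.seq_nr, "conn_id": a.conn_id_recv}
    a.seq_nr += 1
    return a, syn


def accept(syn: dict) -> tuple[Conn, dict]:
    """B's side: derive the IDs from the SYN and build the ST_STATE reply."""
    b = Conn("CS_SYN_RECV", random.randrange(0x10000), syn["seq_nr"],
             conn_id_recv=syn["conn_id"] + 1, conn_id_send=syn["conn_id"])
    state = {"type": "ST_STATE", "seq_nr": b.seq_nr, "ack_nr": b.ack_nr,
             "conn_id": b.conn_id_send}
    b.seq_nr += 1
    return b, state


def on_state(a: Conn, state: dict) -> None:
    """A receives the ST_STATE reply and treats the connection as open."""
    a.state = "CS_CONNECTED"
    a.ack_nr = state["seq_nr"]
```

After the exchange, B sends with the ID on which A receives, and A sends with the ID on which B receives, which is how each side matches incoming packets to the connection.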

Data Transmission

Both parties can send and receive data over the connection.

Timeout and Packet Loss

Timeouts

The initial timeout is 1000 milliseconds and is updated afterwards. Each consecutive timeout of a sent packet doubles the timeout. The update logic is:

When a packet that was sent only once is acknowledged, the connection’s round-trip time (RTT) and RTT variance (rtt_var) are updated. Each time they change, the connection’s default packet timeout is recomputed from RTT and RTT variance, but it is never set below 500 milliseconds, for example:

timeout = max(rtt + rtt_var * 4, 500)
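
A minimal Python sketch of this bookkeeping follows. The article does not give the smoothing formulas for rtt and rtt_var, so the gains of 1/8 and 1/4 used here are an assumption taken from BEP 29; RttEstimator is a hypothetical name, and all times are in milliseconds.

```python
class RttEstimator:
    def __init__(self) -> None:
        self.rtt = 0.0          # smoothed round-trip time, ms
        self.rtt_var = 0.0      # smoothed RTT variance, ms
        self.timeout = 1000.0   # initial timeout, ms

    def on_ack(self, packet_rtt: float, sent_once: bool) -> None:
        """Feed one RTT sample taken from an acknowledged packet."""
        if not sent_once:
            return  # retransmitted packets give ambiguous samples
        delta = self.rtt - packet_rtt
        # Assumed smoothing gains (1/4 and 1/8), following BEP 29.
        self.rtt_var += (abs(delta) - self.rtt_var) / 4
        self.rtt += (packet_rtt - self.rtt) / 8
        self.timeout = max(self.rtt + self.rtt_var * 4, 500.0)

    def on_timeout(self) -> None:
        """Each consecutive timeout doubles the timeout."""
        self.timeout *= 2
```

On a fast link the 500-millisecond floor dominates: a first sample of 80 ms gives rtt = 10 and rtt_var = 20, so the computed 90 ms is raised to 500 ms.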

Packet Loss

  • Determining Packet Loss: If the packet with sequence number (seq_nr - cur_window) is still unacknowledged while three or more later packets have been acknowledged, it is considered lost.
  • Fast Retransmit: After three duplicate acknowledgments, the packet with sequence number (ack_nr + 1) is considered lost, provided it has actually been sent.

When a packet is lost, the maximum window size (max_window) is halved, similar to TCP’s congestion control.
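
The two loss rules and the window reaction can be sketched in Python as follows. The function names are hypothetical, and the wraparound comparison is an assumption that follows from seq_nr being a 16-bit field.

```python
def seq_after(a: int, b: int) -> bool:
    """True if 16-bit sequence number a comes after b, allowing wraparound."""
    return a != b and ((a - b) & 0xFFFF) < 0x8000


def oldest_packet_lost(oldest_unacked: int, acked: list[int]) -> bool:
    """Three or more acknowledged packets past the oldest one imply loss."""
    return sum(seq_after(s, oldest_unacked) for s in acked) >= 3


def fast_retransmit_due(duplicate_acks: int) -> bool:
    """Three duplicate acknowledgments imply that ack_nr + 1 was lost."""
    return duplicate_acks >= 3


def on_packet_lost(max_window: int) -> int:
    """Halve the maximum window, as TCP does on loss."""
    return max_window // 2
```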

Congestion Control

TCP has window-based congestion control built in, whereas UDP has none, so uTP must implement its own. BEP 29 specifies a delay-based congestion control scheme with a target delay of 100 milliseconds: the goal is to keep the connection’s packets from waiting more than 100 milliseconds in send buffers. When the delay exceeds the target, the sender slows down, which effectively yields to TCP traffic.

Every uTP packet carries a high-resolution timestamp. The receiver computes the difference between that timestamp and its local time on receipt and feeds the difference back to the sender. The sender takes the minimum of these differences over the last two minutes as the baseline (an approximation of the minimum delay) and measures the current delay relative to it when computing the delay factor.

The calculation is determined by:

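off_target = CCONTROL_TARGET - our_delay;   // our_delay: current delay above the baseline
delay_factor = off_target / CCONTROL_TARGET;
window_factor = outstanding_packet / max_window;
scaled_gain = MAX_CWND_INCREASE_PACKETS_PER_RTT * delay_factor * window_factor;
max_window += scaled_gain;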

delay_factor expresses how far the measured delay is from the target delay, window_factor relates the number of unacknowledged bytes in flight to the window size, and scaled_gain is the gain added to the window size (it is negative when the delay exceeds the target). A window size of zero means the socket cannot send packets; in that state a timeout fires and resets the window to one packet, so that the connection can resume without causing excessive congestion.
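
The calculation can be tried out with a short Python sketch. The value chosen for MAX_CWND_INCREASE_PACKETS_PER_RTT is an assumption for illustration only, delays are expressed in microseconds, and update_window is a hypothetical name.

```python
CCONTROL_TARGET = 100_000                  # target delay: 100 ms in microseconds
MAX_CWND_INCREASE_PACKETS_PER_RTT = 3000   # assumed value, for illustration only


def update_window(max_window: float, our_delay: float,
                  outstanding_bytes: float) -> float:
    """Return the new max_window after one delay sample.

    our_delay is the current delay above the two-minute baseline and
    outstanding_bytes is the amount of data sent but not yet acknowledged.
    """
    if max_window <= 0:
        return 0.0  # a zero window is recovered by the timeout path instead
    off_target = CCONTROL_TARGET - our_delay
    delay_factor = off_target / CCONTROL_TARGET
    window_factor = outstanding_bytes / max_window
    scaled_gain = (MAX_CWND_INCREASE_PACKETS_PER_RTT
                   * delay_factor * window_factor)
    return max(max_window + scaled_gain, 0.0)
```

With a full window of 10000 bytes and a measured delay of 50 ms, the window grows by half of the maximum gain; with a delay of 200 ms and a half-full window, it shrinks by the same amount.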

Holepunch Extension

The Holepunch extension is built on the extension protocol and provides a way to establish uTP connections with the help of relay nodes. For details of the extension protocol, refer to A Brief Analysis of the Bittorrent Protocol (5) Extended Protocol and Metadata Transmission Extension. The extension identifier is ut_holepunch. Below is an example of an extension handshake that declares only the Holepunch extension, with 4 chosen as the message ID; different clients may choose different IDs in practice:

{
  m: {
    ut_holepunch: 4,
  }
}

Besides the length prefix, message type, and extended message ID of the extension protocol, a Holepunch message contains:

  • 1-byte message type
  • 1-byte address type: 0x00 for IPv4, 0x01 for IPv6
  • 4-byte or 16-byte IP address (IPv4 or IPv6)
  • 2-byte port
  • 4-byte error code, 0 if no error
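
The field list above maps directly onto a byte layout, sketched here in Python. The sketch covers only the Holepunch payload, not the surrounding extension-protocol framing, and it assumes big-endian integers; pack_holepunch and unpack_holepunch are hypothetical names.

```python
import ipaddress
import struct

RENDEZVOUS, CONNECT, ERROR = 0x00, 0x01, 0x02


def pack_holepunch(msg_type: int, host: str, port: int,
                   err_code: int = 0) -> bytes:
    addr = ipaddress.ip_address(host)
    addr_type = 0x00 if addr.version == 4 else 0x01
    return (struct.pack(">BB", msg_type, addr_type) + addr.packed
            + struct.pack(">HI", port, err_code))


def unpack_holepunch(payload: bytes) -> tuple[int, str, int, int]:
    msg_type, addr_type = struct.unpack_from(">BB", payload, 0)
    addr_len = 4 if addr_type == 0x00 else 16
    addr = ipaddress.ip_address(payload[2:2 + addr_len])
    port, err_code = struct.unpack_from(">HI", payload, 2 + addr_len)
    return msg_type, str(addr), port, err_code
```

An IPv4 message is therefore 12 bytes long and an IPv6 message 24 bytes.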

Supported message types include:

Type Code  Type        Description
0x00       rendezvous  Send connect messages to the initiating node and the target node
0x01       connect     Open a uTP connection to the designated node
0x02       error       The request could not be completed

A node writes the endpoint of the target node it wants to reach into a rendezvous message and sends it to another node (hereafter called the relay node). If the relay node is connected to the target node and the target node supports the extension, the relay node sends a connect message to both the initiating node and the target node, each containing the other’s address and port. On receiving the connect message, each node initiates a uTP connection to the other. Both uTP connection attempts may succeed at the same time, so implementations need to handle duplicate connections. If the relay node cannot relay the request, it replies to the initiating node with an error message.

In implementation, a target node that does not wish to connect should simply ignore the connect message, without sending an error message to the relay node or the requesting node. If the requesting node did not declare support for ut_holepunch in the extension handshake, the relay node should ignore its ut_holepunch messages. If a connection already exists between the two nodes, the connect message should also be ignored.

Common error messages include:

Code  Error Message  Description
0x01  NoSuchPeer     The target endpoint is invalid
0x02  NotConnected   The relay node is not connected to the target node
0x03  NoSupport      The target node does not support the holepunch extension
0x04  NoSelf         The target endpoint belongs to the relay node itself

Where NoSuchPeer applies, an implementation may choose to return the NotConnected error code instead.
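
How a relay node might pick an error code for a rendezvous request is sketched below in Python. The order of the checks and the validity test for the endpoint are assumptions made for illustration; rendezvous_error and the shape of the connected table are hypothetical.

```python
from typing import Optional

NO_SUCH_PEER, NOT_CONNECTED, NO_SUPPORT, NO_SELF = 0x01, 0x02, 0x03, 0x04

Endpoint = tuple[str, int]


def rendezvous_error(target: Endpoint, relay_endpoint: Endpoint,
                     connected: dict[Endpoint, bool]) -> Optional[int]:
    """Return an error code for a rendezvous request, or None to relay it.

    connected maps each endpoint the relay is connected to onto whether that
    node declared ut_holepunch in its extension handshake.
    """
    host, port = target
    if not host or not 0 < port < 65536:
        return NO_SUCH_PEER       # assumed meaning of an invalid endpoint
    if target == relay_endpoint:
        return NO_SELF
    if target not in connected:
        return NOT_CONNECTED
    if not connected[target]:
        return NO_SUPPORT
    return None
```

When the function returns None, the relay node would go on to send connect messages to both sides.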

The Holepunch extension gives downloaders that cannot accept incoming connections more ways to reach external nodes, but it still has many limitations. Whether it works depends on the network environment and on the downloader’s implementation.

UDP-based Tracker

In the standard BitTorrent protocol, nodes contact the Tracker over HTTP to obtain the node list. Although requests and responses are short, each one requires a TCP connection to be opened and closed, which adds network overhead. Sending Tracker requests over UDP reduces traffic and simplifies the Tracker implementation. The saving matters most to a Tracker that handles a high volume of requests; for an individual node it makes little difference.

UDP is an “unreliable” protocol, so a downloader that receives no response must resend its request after 15 * 2 ^ n seconds, where n is the number of failed attempts, up to a maximum of 8. A request must also be resent when the Connection ID expires. The Connection ID mitigates UDP source address spoofing: the Tracker issues a Connection ID in response to a connect request, and the downloader must echo it in later requests so that the Tracker can verify the source address. An ID can typically be reused for multiple requests and is valid for one minute.
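
The retransmission schedule and the validity check can be written out in Python as a quick illustration; both function names are hypothetical.

```python
def retransmit_delays(max_n: int = 8) -> list[int]:
    """Seconds to wait before each resend: 15 * 2 ** n for n = 0..max_n."""
    return [15 * 2 ** n for n in range(max_n + 1)]


def connection_id_usable(received_at: float, now: float,
                         lifetime: float = 60.0) -> bool:
    """A Connection ID may be reused until its one-minute validity runs out."""
    return now - received_at < lifetime
```

The waits therefore start at 15 seconds and double up to 3840 seconds for n = 8.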

Connection

All fields in Tracker requests are sent in big-endian order. The first step is to obtain a Connection ID: choose a random transaction ID, then build and send the following packet:

0       4       8               16
+-------+-------+---------------+
| 0x41727101980 |   0   | tra_id|
+---------------+---------------+

Offset (bytes)  Size, Type      Description     Value
0               64-bit Integer  Protocol ID     0x41727101980
8               32-bit Integer  Action          0
12              32-bit Integer  Transaction ID

The response is at least 16 bytes long:

0       4       8               16
+-------+-------+---------------+
|   0   |tra_id |     con_id    |
+---------------+---------------+

Offset (bytes)  Size, Type      Description     Value
0               32-bit Integer  Action          0
4               32-bit Integer  Transaction ID
8               64-bit Integer  Connection ID

After receiving the response, store the Connection ID and use it for subsequent requests until it expires (one minute).
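
The connect exchange can be sketched in Python with the struct module. The sketch only builds and parses the packets and leaves the UDP socket handling out; build_connect_request and parse_connect_response are hypothetical names.

```python
import os
import struct

PROTOCOL_ID = 0x41727101980
ACTION_CONNECT = 0


def build_connect_request() -> tuple[bytes, int]:
    """Return the 16-byte connect request and its random transaction ID."""
    transaction_id = int.from_bytes(os.urandom(4), "big")
    packet = struct.pack(">QII", PROTOCOL_ID, ACTION_CONNECT, transaction_id)
    return packet, transaction_id


def parse_connect_response(data: bytes, transaction_id: int) -> int:
    """Validate the response and return the Connection ID."""
    if len(data) < 16:
        raise ValueError("connect response shorter than 16 bytes")
    action, tid, connection_id = struct.unpack_from(">IIQ", data)
    if action != ACTION_CONNECT or tid != transaction_id:
        raise ValueError("unexpected action or transaction ID")
    return connection_id
```

Checking the echoed transaction ID lets the downloader discard responses that do not belong to its own request.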

Request

A scrape request (action 2) queries torrents by info hash; data for up to about 74 torrents can be requested at once. The request is as follows:

0       4       8               16
+-------+-------+---------------+
|     con_id    |   2   |tra_id |
+---------------+---------------+
|     info_hash ...             |
+---------------+---------------+

Offset (bytes)  Size, Type      Description     Value
0               64-bit Integer  Connection ID
8               32-bit Integer  Action          2
12              32-bit Integer  Transaction ID
16 + 20 * n     20-byte String  Info Hash
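
A request in this layout can be assembled as follows. This is a sketch in Python; build_scrape_request is a hypothetical name, and the info hashes are passed as raw 20-byte values.

```python
import struct

ACTION_SCRAPE = 2


def build_scrape_request(connection_id: int, transaction_id: int,
                         info_hashes: list[bytes]) -> bytes:
    """16-byte header followed by one 20-byte info hash per torrent."""
    if any(len(h) != 20 for h in info_hashes):
        raise ValueError("each info hash must be exactly 20 bytes")
    header = struct.pack(">QII", connection_id, ACTION_SCRAPE, transaction_id)
    return header + b"".join(info_hashes)
```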

 

Client Announcement

In an IPv4 environment, the announce request is:

0       4       8               16              24              32
+-------+-------+---------------+---------------+---------------+
|   con_id      |   1   |tra_id |     info_hash...              |
+-------+-------+---------------+---------------+---------------+
| ...   |                   peer_id             |    download   |
+---------------+---------------+---------------+---------------+
|      left     |     upload    | event |   ip address  |  key  |
+---------------+---------------+---------------+---------------+
|  want | port  |
+---------------+---------------+---------------+---------------+

Offset (bytes)  Size, Type      Description            Value
0               64-bit Integer  Connection ID
8               32-bit Integer  Action                 1
12              32-bit Integer  Transaction ID
16              20-byte String  Info Hash
36              20-byte String  Peer ID
56              64-bit Integer  Downloaded
64              64-bit Integer  Left
72              64-bit Integer  Uploaded
80              32-bit Integer  Event                  0: None; 1: Completed; 2: Started; 3: Stopped
84              32-bit Integer  IP Address             Default 0
88              32-bit Integer  Key
92              32-bit Integer  Expected Return Count  Default -1
96              16-bit Integer  Port

Although the request carries an IP address field, most Trackers honor it only in limited circumstances and otherwise rely on the source address of the packet.
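
The 98-byte announce request can be packed with a single struct format, as in this Python sketch. build_announce_request is a hypothetical name, and the defaults mirror the table (IP address 0, expected return count -1).

```python
import struct

ACTION_ANNOUNCE = 1

# connection_id, action, transaction_id, info_hash, peer_id, downloaded,
# left, uploaded, event, ip, key, num_want (signed), port: 98 bytes.
ANNOUNCE_REQUEST = struct.Struct(">QII20s20sQQQIIIiH")


def build_announce_request(connection_id: int, transaction_id: int,
                           info_hash: bytes, peer_id: bytes,
                           downloaded: int, left: int, uploaded: int,
                           port: int, event: int = 0, ip: int = 0,
                           key: int = 0, num_want: int = -1) -> bytes:
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    return ANNOUNCE_REQUEST.pack(connection_id, ACTION_ANNOUNCE,
                                 transaction_id, info_hash, peer_id,
                                 downloaded, left, uploaded,
                                 event, ip, key, num_want, port)
```

Because the expected return count is packed as a signed integer, the default of -1 appears on the wire as 0xFFFFFFFF.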

Response:

0       4       8               16              24              32
+-------+-------+---------------+---------------+---------------+
|   1   |tra_id |inter_ |leech_ |seeders|   IP  |prt|IP...
+-------+-------+---------------+---------------+---------------+

Offset (bytes)  Size, Type      Description     Value
0               32-bit Integer  Action          1
4               32-bit Integer  Transaction ID
8               32-bit Integer  Interval
12              32-bit Integer  Leechers
16              32-bit Integer  Seeders
20 + 6 * n      32-bit Integer  IP Address
24 + 6 * n      16-bit Integer  Port

For IPv6, each address-and-port entry in the response grows from 6 bytes to 18 bytes; everything else is unchanged. The 32-bit IP Address field in the request cannot hold an IPv6 address and is set to 0.
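
Parsing the announce response comes down to reading the 20-byte header and then stepping through the node entries, 6 bytes each for IPv4 or 18 bytes each for IPv6. The Python sketch below assumes the caller knows which address family applies; parse_announce_response is a hypothetical name.

```python
import ipaddress
import struct


def parse_announce_response(data: bytes, ipv6: bool = False) -> dict:
    if len(data) < 20:
        raise ValueError("announce response shorter than 20 bytes")
    action, transaction_id, interval, leechers, seeders = struct.unpack_from(
        ">IIIII", data)
    addr_len = 16 if ipv6 else 4
    stride = addr_len + 2          # address followed by a 16-bit port
    peers = []
    # Only complete entries are read; trailing partial bytes are skipped.
    for off in range(20, len(data) - stride + 1, stride):
        host = str(ipaddress.ip_address(data[off:off + addr_len]))
        (port,) = struct.unpack_from(">H", data, off + addr_len)
        peers.append((host, port))
    return {"action": action, "transaction_id": transaction_id,
            "interval": interval, "leechers": leechers,
            "seeders": seeders, "peers": peers}
```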

Errors

An error message is formatted as follows:

Offset (bytes)  Size, Type      Description     Value
0               32-bit Integer  Action          3
4               32-bit Integer  Transaction ID
8               String          Error Message
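
An error response can be recognized by its action value and decoded as in this Python sketch. The article does not state the message encoding, so decoding it as UTF-8 with replacement of invalid bytes is an assumption; parse_error_response is a hypothetical name.

```python
import struct

ACTION_ERROR = 3


def parse_error_response(data: bytes) -> tuple[int, str]:
    """Return (transaction_id, message) from an error response."""
    if len(data) < 8:
        raise ValueError("error response shorter than 8 bytes")
    action, transaction_id = struct.unpack_from(">II", data)
    if action != ACTION_ERROR:
        raise ValueError("not an error response")
    # Assumption: the message is text; invalid bytes are replaced.
    return transaction_id, data[8:].decode("utf-8", errors="replace")
```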

 

This Section Ends

At this point, all BitTorrent proposals with Final or Accepted status have been analyzed, except for the Fast Extension and WebSeed. Analyses of those two will follow, along with articles on more BitTorrent drafts. Please stay tuned; links will be added here when they are available: