0. Review
Previous article:
- Brief Analysis of Bittorrent Protocol (Part 1) Metadata File https://cloud.tencent.com/developer/article/2332701
Review of previous content:
- BitTorrent is a protocol used for distributing files; it breaks down the files to be distributed into fragments and passes them between nodes;
- BitTorrent uses metadata files to describe the files to be distributed, and the metadata files use bencode encoding;
- The data structure of metadata files (torrent files)
- Data is verified by computing the SHA-1 hash of each fragment and comparing it with the hash stored in the metadata file.
1. Tracker
Tracker GET Request
First, it is important to know that a Tracker request is based on an HTTP request, typically using the GET method. A Tracker GET request should include the following information:
- info_hash (Hash): The 20-byte SHA-1 hash of the bencoded info field from the metadata file. The encoding must follow the bencode rules exactly, including the sorted order of dictionary keys; otherwise the hash will not match.
- peer_id (Peer Identifier): A 20-character string that identifies the downloader, usually generated according to the client's own rules when a new download task is created.
- ip (IP Address), optional: The IP address (or DNS name), typically only used when the Tracker and downloader are on the same device.
- port (Port Number): The listening port number, usually described in BEP3 as follows: The downloader tries to listen on port 6881, and if that port is busy, it tries ports 6882, then 6883, and so on, up to 6889. If all these are busy, it gives up. Nowadays, many downloaders have their own default or will choose a random port number.
- uploaded (Uploaded Amount): The total uploaded amount, encoded in decimal ASCII.
- downloaded (Downloaded Amount): The total downloaded amount, encoded in decimal ASCII.
- left (Remaining Amount to be Downloaded): The number of bytes still to be downloaded, encoded in decimal ASCII. Note that this can't be calculated from the downloaded amount and the file length, because the transfer might have been resumed, and some of the already downloaded data may have failed integrity checks and had to be downloaded again.
- event (Event), optional: One of started, completed, or stopped. If the value is empty or the key is absent, the request is one of the periodic announcements made at the regular interval. started indicates that the download has just begun; completed is sent when the download finishes, but not if the file was already complete when the client started; stopped should be sent when the downloader stops downloading.
The essence of a Tracker request is an HTTP GET request. Using an example torrent from the metadata file section, we deploy a tracker server within a local network (process omitted) for the request example.
First, calculate the hash value of the info section, as follows:
Info content:
{ "length": 1373744, "name": "ChromeSetup.exe", "piece length": 524288, "pieces": b"L\xb2k\xd9\x83\xa4\x84\x84\x00g\xeb\xf7\x1d\xfe3\xa2\xd9\x95\x0f\\\xa6\xb2E\xcd!^\xe3\xed\x8a\x85\xe7>(\x99\x9dU\x06g%b\x08@\xc9\x9fG\xb8S\x8f\x067K#3\xa7\xbf\xb8`N\xac3"}
Use the previously mentioned encode_bencode function to compute encoding, then calculate SHA1 and URL encode the result:
%E7%D6%A1%A7%88-%E0%11%0E%3C%BB%FBP%91%FB%DE%EBg%1E%C1
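The encode, hash, and URL-encode steps can be sketched as follows. The `bencode` function here is only a minimal stand-in for the `encode_bencode` function from the previous article (which is not reproduced here), and the example `info` dictionary is a made-up placeholder rather than the article's torrent.

```python
import hashlib
from urllib.parse import quote


def bencode(value):
    """Minimal bencode encoder (illustrative stand-in for encode_bencode)."""
    if isinstance(value, int):
        return b"i%de" % value
    if isinstance(value, str):
        value = value.encode("utf-8")
    if isinstance(value, bytes):
        return b"%d:%s" % (len(value), value)
    if isinstance(value, list):
        return b"l" + b"".join(bencode(v) for v in value) + b"e"
    if isinstance(value, dict):
        # Dictionary keys are byte strings and must be emitted in sorted order.
        items = sorted(
            ((k.encode("utf-8") if isinstance(k, str) else k, v)
             for k, v in value.items()),
            key=lambda kv: kv[0],
        )
        return b"d" + b"".join(bencode(k) + bencode(v) for k, v in items) + b"e"
    raise TypeError(f"cannot bencode {type(value).__name__}")


def compute_info_hash(info):
    """Return the raw 20-byte SHA-1 digest of the bencoded info dictionary."""
    return hashlib.sha1(bencode(info)).digest()


def url_encode_hash(digest):
    """Percent-encode the raw digest for use in a Tracker query string."""
    return quote(digest, safe="")


# Hypothetical info dictionary, for illustration only.
example_info = {"length": 3, "name": "a", "piece length": 16384,
                "pieces": b"\x00" * 20}
digest = compute_info_hash(example_info)
print(len(digest), url_encode_hash(digest))
```

Bytes that are unreserved URL characters (letters, digits, `-`, `.`, `_`, `~`) are left as they are, which is why the encoded hash above contains a literal `-`, `P`, and `g`.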
Construct a "start download" request from the Tracker request parameters, as follows:
- info_hash=%e7%d6%a1%a7%88-%e0%11%0e%3c%bb%fbP%91%fb%de%ebg%1e%c1
- peer_id=-None-ARandomString-, making sure it's 20 characters long
- ip=127.0.0.1
- port=6881
- uploaded=0
- downloaded=0
- left=1373744, nothing downloaded yet, left equals the file size
- event=started, start downloading
The final structured request is as follows:
{TrackerURL}?info_hash=%e7%d6%a1%a7%88-%e0%11%0e%3c%bb%fbP%91%fb%de%ebg%1e%c1&peer_id=-None-ARandomString-&ip=127.0.0.1&port=6881&uploaded=0&downloaded=0&left=1373744&event=started
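The parameters listed above can be assembled into a request URL along these lines. This is a minimal sketch: `build_announce_url` and the tracker address `http://tracker.example/announce` are hypothetical names chosen for illustration, and the info hash is the value from the example above.

```python
from urllib.parse import quote, urlencode


def build_announce_url(tracker_url, info_hash, peer_id, port,
                       uploaded, downloaded, left, event=None, ip=None):
    """Build a Tracker GET URL; info_hash is the raw 20-byte digest."""
    params = {"info_hash": info_hash, "peer_id": peer_id}
    if ip is not None:
        params["ip"] = ip  # optional key
    params.update(port=port, uploaded=uploaded,
                  downloaded=downloaded, left=left)
    if event:
        params["event"] = event  # omitted for periodic announcements
    # quote_via=quote percent-encodes raw bytes instead of using '+' for spaces.
    query = urlencode(params, quote_via=quote, safe="")
    separator = "&" if "?" in tracker_url else "?"
    return tracker_url + separator + query


url = build_announce_url(
    "http://tracker.example/announce",  # hypothetical tracker address
    info_hash=bytes.fromhex("e7d6a1a7882de0110e3cbbfb5091fbdeeb671ec1"),
    peer_id="-None-ARandomString-",
    port=6881, uploaded=0, downloaded=0, left=1373744, event="started",
)
print(url)
```

`urlencode` emits uppercase hex digits; percent-encoded hex digits are case-insensitive, so this is equivalent to the lowercase form.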
Tracker GET Response
Sending the request above yields the following Tracker response:
[Figure: Tracker response content]
Decode it to get the return dictionary:
{ "complete": 0, "downloaded": 0, "incomplete": 1, "interval": 1863, "mininterval": 931, "peers": b'\n\x00\xb29\x1a\xe1'}
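Decoding the response body is the reverse of bencode encoding. The sketch below is one minimal way to do it; `bdecode` is an illustrative helper, not a function from the article, and `raw` is a reconstruction of what the body might look like on the wire, built from the dictionary above.

```python
def _decode(data, pos):
    """Decode one bencoded value starting at pos; return (value, next_pos)."""
    lead = data[pos:pos + 1]
    if lead == b"i":  # integer: i<digits>e
        end = data.index(b"e", pos)
        return int(data[pos + 1:end]), end + 1
    if lead == b"l":  # list: l<items>e
        pos += 1
        items = []
        while data[pos:pos + 1] != b"e":
            item, pos = _decode(data, pos)
            items.append(item)
        return items, pos + 1
    if lead == b"d":  # dictionary: d<key><value>...e
        pos += 1
        result = {}
        while data[pos:pos + 1] != b"e":
            key, pos = _decode(data, pos)
            result[key], pos = _decode(data, pos)
        return result, pos + 1
    if lead.isdigit():  # byte string: <length>:<bytes>
        colon = data.index(b":", pos)
        length = int(data[pos:colon])
        start = colon + 1
        return data[start:start + length], start + length
    raise ValueError(f"invalid bencode data at offset {pos}")


def bdecode(data):
    """Decode a complete bencoded document; keys and strings stay as bytes."""
    value, pos = _decode(data, 0)
    if pos != len(data):
        raise ValueError("trailing data after bencoded value")
    return value


# Reconstructed wire form of the response above (an assumption for illustration).
raw = (b"d8:completei0e10:downloadedi0e10:incompletei1e"
       b"8:intervali1863e11:minintervali931e"
       b"5:peers6:\n\x00\xb29\x1a\xe1e")
response = bdecode(raw)
print(response[b"interval"], response[b"peers"])
```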
This is what a successful response looks like; next, let's look at the Tracker's response content in detail.
If an error occurs, the response only needs to contain a failure reason key with a human-readable string explaining the failure; no other keys are required.
If it's a successful response, the response content should include:
- interval: Interval in seconds after which the downloader should make the next request, under normal circumstances.
- peers: A list of peers, where each entry is a dictionary containing:
  - peer id: Peer ID, string
  - ip: IP address or DNS name, string
  - port: Port number, integer
As the example above shows, the peers value returned by the test Tracker is not in this standard list format. BEP 23 explains why: it defines a compact format for returning the peers list, in which peers is a single byte string and each peer takes 6 bytes, a 4-byte IPv4 address followed by a 2-byte port number, both in network byte order. The Peer ID is no longer included.
Note that because the compact format is recommended, many Trackers only respond in this mode. A downloader, however, must support both formats.
Analyzing the response above, we can conclude that the Tracker expects the next request in 1863 seconds and that the peers list is:
[{'ip': '10.0.178.57', 'port': 6881}]
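Parsing the compact format amounts to slicing the byte string into 6-byte records. A minimal sketch, with `parse_compact_peers` and `parse_peers` as illustrative names; the branch for the standard format assumes the dictionary keys are byte strings, as a typical bencode decoder would yield:

```python
import ipaddress
import struct


def parse_compact_peers(blob):
    """Split a compact peers string into {'ip', 'port'} dictionaries."""
    if len(blob) % 6 != 0:
        raise ValueError("compact peers length must be a multiple of 6")
    peers = []
    for offset in range(0, len(blob), 6):
        # 4 bytes of IPv4 address, then a 2-byte big-endian port.
        ip_raw, port = struct.unpack_from("!4sH", blob, offset)
        peers.append({"ip": str(ipaddress.IPv4Address(ip_raw)), "port": port})
    return peers


def parse_peers(peers):
    """Accept either the compact byte string or the standard list format."""
    if isinstance(peers, bytes):
        return parse_compact_peers(peers)
    return [
        {"peer id": p.get(b"peer id"),
         "ip": p[b"ip"].decode("ascii"),
         "port": p[b"port"]}
        for p in peers
    ]


print(parse_peers(b"\n\x00\xb29\x1a\xe1"))
# [{'ip': '10.0.178.57', 'port': 6881}]
```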
2. Peers Handshake
The BitTorrent protocol is peer-to-peer, with no concept of server and client. Every node (Peer) is the same, and the way they transmit data to each other is consistent.
Using TCP connections as an example, nodes first establish a TCP connection and then begin a handshake. The handshake data is as follows:
- 1 byte for the protocol name length, fixed at 19 (0x13);
- 19 bytes for the protocol name, fixed at BitTorrent protocol;
- 8 reserved bytes following the protocol name, used to flag extension protocols; if extension protocols are not considered, they should all be 0;
- 20 bytes for the info hash, the SHA-1 result mentioned previously. Both sides of the handshake should send identical content here; otherwise the connection is severed. The one exception is a downloader serving multiple downloads on a single port: it may wait for the incoming connection to send its hash first and respond with the same one if that hash is in its list;
- 20 bytes for the Peer ID. If the Tracker returned the node list in the standard format, the Peer ID must be verified, and the connection is dropped if it does not match the expected one.
Both parties send the data above and verify each other's values, which completes the handshake. Afterwards, a zero-length keep-alive message is typically sent every two minutes, although timeouts can be much shorter when data is expected.
Note: after the handshake, all integers in the peer protocol are encoded as 4-byte big-endian values.
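The handshake layout described above adds up to 68 bytes. It can be sketched as follows; the function names are illustrative, and the info hash in the usage line is a placeholder:

```python
PSTR = b"BitTorrent protocol"
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20  # 68 bytes in total


def build_handshake(info_hash, peer_id, reserved=bytes(8)):
    """Serialize a handshake: length byte, name, reserved, info hash, peer ID."""
    if len(info_hash) != 20 or len(peer_id) != 20 or len(reserved) != 8:
        raise ValueError("info_hash and peer_id must be 20 bytes, reserved 8")
    return bytes([len(PSTR)]) + PSTR + reserved + info_hash + peer_id


def parse_handshake(data):
    """Split a 68-byte handshake into (reserved, info_hash, peer_id)."""
    if len(data) != HANDSHAKE_LEN:
        raise ValueError("handshake must be exactly 68 bytes")
    if data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("unexpected protocol name")
    return data[20:28], data[28:48], data[48:68]


def verify_handshake(data, expected_info_hash, expected_peer_id=None):
    """Return True if the remote handshake matches what we expect."""
    _, info_hash, peer_id = parse_handshake(data)
    if info_hash != expected_info_hash:
        return False
    # The Peer ID is only known in advance when the Tracker used the
    # standard (non-compact) peers format.
    return expected_peer_id is None or peer_id == expected_peer_id


handshake = build_handshake(b"\x01" * 20, b"-None-ARandomString-")
print(len(handshake), verify_handshake(handshake, b"\x01" * 20))
```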
3. Peers Data Transmission
It is recommended to read this in conjunction with Brief Analysis of Bittorrent Protocol (Part Three) Peer Data Transfer Example.
After completing the handshake, both parties can begin exchanging messages. Each message is preceded by a length prefix; a message of length zero is a keep-alive. Every non-keep-alive message begins with a single byte that identifies its type:
Indicator | Description |
---|---|
0 | choke |
1 | unchoke |
2 | interested |
3 | not interested |
4 | have |
5 | bitfield |
6 | request |
7 | piece |
8 | cancel |
The first four message types, choke, unchoke, interested, and not interested, carry no payload and have the following meanings:
- Choked or Unchoked: This indicates whether a peer allows data to be transmitted to the other side. When a peer is choked, no data is sent to it until the choke is lifted.
- Interested or Not Interested: This indicates whether a peer wants the other side to transmit data. If a peer is interested in another peer's data, it requests data blocks once it is unchoked.
When the connection is established, the default state is choked and not interested.
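One way to track these flags is a small per-connection state object. This is a sketch only; the class and field names (`am_choking`, `peer_choking`, and so on) are illustrative choices, not names defined by the article:

```python
from dataclasses import dataclass


@dataclass
class PeerConnectionState:
    """Choke/interest flags for one connection, with the protocol defaults."""
    am_choking: bool = True        # we are choking the remote peer
    am_interested: bool = False    # we want data from the remote peer
    peer_choking: bool = True      # the remote peer is choking us
    peer_interested: bool = False  # the remote peer wants data from us

    def on_peer_message(self, msg_id):
        """Update state for an incoming choke/unchoke/(not) interested message."""
        if msg_id == 0:
            self.peer_choking = True
        elif msg_id == 1:
            self.peer_choking = False
        elif msg_id == 2:
            self.peer_interested = True
        elif msg_id == 3:
            self.peer_interested = False

    def can_download(self):
        """Data flows to us when we are interested and the peer is not choking."""
        return self.am_interested and not self.peer_choking

    def can_upload(self):
        """Data flows from us when the peer is interested and we are not choking."""
        return self.peer_interested and not self.am_choking


state = PeerConnectionState()
state.am_interested = True
state.on_peer_message(1)  # remote peer sends unchoke
print(state.can_download())
```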
- have: After the downloader completes the download and hash verification of a fragment, it informs other nodes via have. The payload is the integer index of that fragment.
- bitfield: bitfield is sent only once, as the first message after the handshake; it tells the other node, in bitfield form, which fragments the sender already has. If the sender has no fragments when the connection is established, it may skip the bitfield message; it is not mandatory.
- request and piece: A node asks other nodes for data with request and supplies data to others with piece. A request carries the integer index of the fragment, the starting byte offset within it, and the length of the requested data. A piece message carries the fragment index, the starting offset, and the data itself, after the usual length prefix and type byte (7).
- cancel: cancel shares the same payload as the request message. For efficient downloading, a downloader may request the same fragment from multiple nodes simultaneously. Once a fragment is acquired and verified, it informs other nodes to stop sending via cancel.
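The message layouts above can be sketched with `struct`. The helper names are illustrative, and the 16384-byte request length in the usage line is a commonly used block size rather than a value taken from this article:

```python
import struct

(CHOKE, UNCHOKE, INTERESTED, NOT_INTERESTED,
 HAVE, BITFIELD, REQUEST, PIECE, CANCEL) = range(9)


def pack_message(msg_id=None, payload=b""):
    """Frame a message: 4-byte big-endian length, type byte, payload."""
    if msg_id is None:
        return struct.pack("!I", 0)  # zero-length keep-alive
    return struct.pack("!IB", 1 + len(payload), msg_id) + payload


def pack_have(index):
    return pack_message(HAVE, struct.pack("!I", index))


def pack_request(index, begin, length):
    return pack_message(REQUEST, struct.pack("!III", index, begin, length))


def pack_cancel(index, begin, length):
    # cancel carries the same payload as request
    return pack_message(CANCEL, struct.pack("!III", index, begin, length))


def pack_piece(index, begin, block):
    return pack_message(PIECE, struct.pack("!II", index, begin) + block)


def unpack_message(data):
    """Decode one message from the start of data.

    Returns (msg_id, payload, bytes_consumed), with msg_id None for a
    keep-alive, or None if data does not yet hold a complete message.
    """
    if len(data) < 4:
        return None
    (length,) = struct.unpack_from("!I", data)
    if len(data) < 4 + length:
        return None
    if length == 0:
        return None, b"", 4
    return data[4], data[5:4 + length], 4 + length


def bitfield_to_indices(bitfield, num_pieces):
    """List fragment indices whose bit is set (high bit of byte 0 is index 0)."""
    return [i for i in range(num_pieces)
            if bitfield[i // 8] & (0x80 >> (i % 8))]


print(pack_request(0, 0, 16384).hex())
print(bitfield_to_indices(b"\xa0", 3))
```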
Tracker and Peer Node Section Finished
This concludes Part 2, on Trackers and peer nodes. Links to the practical analysis and extension protocol articles will be added here later.