Mastering Linux TCP Stack: Insights from Software Engineer Liam on Packet Drill and Network Protocols

Linux Code Reading Arena Original Article

The Linux kernel TCP stack is an extremely complex implementation, the product of more than 20 years of design and development, and it is still evolving: related RFCs and optimizations continue to appear. Studying such a challenging component is a significant undertaking.

The most fundamental requirement is, of course, to read the related RFCs and the kernel implementation. But browsing the code and analyzing it statically is far from enough to master something as large as the kernel TCP stack. Part of the reason lies in the design of TCP itself, which is full of boundary conditions and exception handling. Because TCP is stateful, many of these boundary conditions can only be triggered by a specific sequence of packets, often combined with particular delays or other conditions.

Fortunately, Google addressed this problem in 2013 by releasing packetdrill, a testing tool for the kernel TCP stack. It greatly lowers the difficulty of learning and testing the stack and lets you explore its details freely. packetdrill GitHub link:

https://github.com/google/packetdrill

With packetdrill, users can construct arbitrary packet sequences, specify every field of each packet (in a tcpdump-like syntax), and exchange them with the target system’s kernel TCP stack through a TUN interface. The tool then checks the packets sent back by the kernel against the expected ones to decide whether the test passes. Combining packetdrill with Wireshark makes the result even more concrete, since every detail of every packet can be inspected.

Basic Principles of Packet Drill

TUN Network Device

TUN is a virtual network device on Linux that operates at the network layer: a user-space application can read and write raw IP packets through it directly.
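As a rough illustration of what reading and writing IP packets through TUN looks like, the sketch below opens a TUN device from Python. It assumes Linux, root (or CAP_NET_ADMIN) privileges, and a hypothetical interface name tun0; packetdrill itself is written in C, and its own setup code may differ.

```python
import fcntl
import os
import struct

# Constants from <linux/if_tun.h>.
TUNSETIFF = 0x400454CA
IFF_TUN = 0x0001     # layer-3 device: each read/write is one raw IP packet
IFF_NO_PI = 0x1000   # do not prepend the extra packet-information header


def build_ifreq(name: str) -> bytes:
    """Pack interface name and flags following the struct ifreq layout."""
    return struct.pack("16sH22x", name.encode(), IFF_TUN | IFF_NO_PI)


def open_tun(name: str = "tun0") -> int:
    """Attach to the TUN device and return its file descriptor (needs privileges)."""
    fd = os.open("/dev/net/tun", os.O_RDWR)
    fcntl.ioctl(fd, TUNSETIFF, build_ifreq(name))
    return fd

# After open_tun(): os.read(fd, 2048) returns one IP packet that the kernel
# routed to the device, and os.write(fd, packet) injects one IP packet into
# the kernel's network stack.
```

The interface still has to be brought up and given an address before traffic flows through it; that step is omitted here.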


Packet Drill Script Parsing/Execution Engine

  • First, the Packet Drill script must be parsed and decomposed into parts that send and receive packets through traditional socket interfaces and parts that do so via the TUN interface.
  • Perform the corresponding actions on the traditional socket interface.
  • Execute the corresponding actions on the TUN interface and compare the received data.
  • In this article, the socket interface mainly plays the role of the server side. The TUN interface acts as the client side. This allows us to fully control the IP packets we are about to send through the TUN interface and receive feedback from the TCP protocol stack, comparing it with preset data.

Packet Drill Syntax Introduction

Relative Timing

Each event (sending a packet, receiving a packet, or making a system call) in packetdrill has a time offset relative to the previous event, written as + followed by a number of seconds. For example, +0 means the event happens immediately after the previous one, and +.1 means it happens 0.1 seconds after the previous one.
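To make the relative offsets concrete, here is a minimal sketch that turns a list of offsets into absolute times. The function name absolute_times is hypothetical, and only the two forms used in this article (a plain number and +number) are handled.

```python
def absolute_times(offsets):
    """Convert offsets such as "0", "+0" or "+.1" into absolute seconds."""
    now = 0.0
    times = []
    for text in offsets:
        if text.startswith("+"):
            now += float(text[1:])   # relative to the previous event
        else:
            now = float(text)        # plain number: absolute time
        times.append(round(now, 6))
    return times


print(absolute_times(["0", "+0", "+.1", "+0", "+.005"]))
```

Each entry in the result is the time at which the corresponding event is due, measured from the start of the script.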

System Calls

packetdrill integrates system calls, so operations such as socket, bind, read, write, and getsockopt can be performed from a script. Anyone familiar with socket programming will find this easy to understand and use.
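For readers who think in terms of ordinary socket code, the sketch below shows the Python equivalent of the four setup calls used by the scripts later in this article. The bind address (loopback, any free port) is an assumption made for illustration; packetdrill chooses its own addresses.

```python
import socket

# socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM, socket.IPPROTO_TCP)

# setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

# bind(3, ..., ...) = 0   (assumed address: loopback, kernel-chosen port)
server.bind(("127.0.0.1", 0))

# listen(3, 1) = 0
server.listen(1)

port = server.getsockname()[1]
print("listening on port", port)
```

In a packetdrill script the value after = is the expected return value, so each line is both an action and an assertion.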

Sending and Receiving Packets

  • Via the kernel stack side. Packet sending and receiving can be completed by invoking read/write system calls. However, since TCP is a stateful protocol stack, the kernel stack itself will also send packets depending on its state (e.g., ACK/SACK).
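  • Via the TUN device side. packetdrill uses < for a packet that the script injects into the kernel stack through the TUN device (the client sending) and > for a packet that the kernel stack is expected to send out (the client receiving).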
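The split between system-call events and packet events can be sketched as a small line classifier. This is an illustrative sketch only, not packetdrill's real parser; the function name classify and the labels "inject", "expect" and "syscall" are made up for this example.

```python
import re

# <time> <rest>, where time is a plain number or "+<number>".
EVENT = re.compile(r"^(\+?\d*\.?\d+)\s+(.*)$")


def classify(line):
    """Return (time_text, kind, body), or None for comments and blank lines."""
    stripped = line.strip()
    if not stripped or stripped.startswith("//"):
        return None
    match = EVENT.match(stripped)
    if match is None:
        raise ValueError(f"unrecognized line: {line!r}")
    time_text, rest = match.groups()
    if rest.startswith("<"):
        return time_text, "inject", rest[1:].strip()   # script -> kernel stack
    if rest.startswith(">"):
        return time_text, "expect", rest[1:].strip()   # kernel stack -> script
    return time_text, "syscall", rest


for line in ["+0  listen(3, 1) = 0",
             "+0  < S 0:0(0) win 1000",
             "+0  > S. 0:0(0) ack 1 <...>"]:
    print(classify(line))
```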

Packet Format Description

Packets are described in a tcpdump-like format. For instance, S 0:0(0) win 1000 <mss 1000> denotes a SYN packet with a window size of 1000 whose TCP option MSS (maximum segment size) is 1000. If you are unfamiliar with packet formats, it is advisable to review “TCP/IP Illustrated”, Volume 1.
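The fields of such a packet description can be pulled apart with a regular expression. The sketch below is a simplified illustration that covers only the forms appearing in this article (flags, start:end(length), optional ack, optional win, optional option list); parse_packet is a hypothetical helper and is not part of packetdrill.

```python
import re

PACKET = re.compile(
    r"^(?P<flags>[A-Z.]+)\s+"                          # S, S., ., P., F.
    r"(?P<start>\d+):(?P<end>\d+)\((?P<length>\d+)\)"  # seq range and payload size
    r"(?:\s+ack\s+(?P<ack>\d+))?"
    r"(?:\s+win\s+(?P<win>\d+))?"
    r"(?:\s+<(?P<options>[^>]*)>)?\s*$"
)


def parse_packet(spec):
    """Split a packet description into a dict of fields."""
    match = PACKET.match(spec.strip())
    if match is None:
        raise ValueError(f"cannot parse packet: {spec!r}")
    fields = match.groupdict()
    for key in ("start", "end", "length", "ack", "win"):
        if fields[key] is not None:
            fields[key] = int(fields[key])
    # The range start:end must span exactly `length` payload bytes.
    if fields["end"] - fields["start"] != fields["length"]:
        raise ValueError("payload length does not match the sequence range")
    return fields


print(parse_packet("S 0:0(0) win 1000 <mss 1000>"))
```

In the flags field, S is SYN, F is FIN, P is PSH, and a dot stands for ACK.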

For further information, please refer to Drilling Network Stacks with packetdrill:

https://storage.googleapis.com/pub-tools-public-publication-data/pdf/41848.pdf

Practical Examples

Let’s explore further through two examples.

Handshake and Teardown

We will review this classic process using packet drill scripts.

First, let’s revisit the standard TCP handshake and teardown process.


Next, let’s reproduce the entire process using a packetdrill script.

// Create the server-side socket, which will communicate through the kernel protocol stack
// Note that this uses traditional system calls
0   socket(..., SOCK_STREAM, IPPROTO_TCP) = 3

// Set the corresponding socket options
// Note that this uses traditional system calls
+0  setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0

// Bind socket
// Note that this uses traditional system calls
+0  bind(3, ..., ...) = 0

// Listen on the socket
// Note that this uses traditional system calls
+0  listen(3, 1) = 0

// Client side (TUN) sends the first SYN handshake packet
// Note that the syntax and SYN sequence are relative, starting from 0.
+0  < S 0:0(0) win 1000 <mss 1000>

// Client side (TUN) expects to receive a SYN+ACK packet with ack = ISN(c)+1
// Refer to the standard flowchart; the final <...> denotes any TCP option
// This is the second step of the handshake
+0  > S. 0:0(0) ack 1 <...>

// Client side (TUN) sends the ACK packet: seq = ISN(c)+1, ack = ISN(s)+1
// This is the third step of the handshake
+.1 < . 1:1(0) ack 1 win 1000

// Handshake successful, server-side socket returns established socket
// Fetch this stream socket using the accept system call
+0  accept(3, ..., ...) = 4

// Server side writes 10 bytes to the stream
// Completed by making a system call
+0 write(4, ..., 10)=10

// Client side expects to receive 10 bytes
+0 > P. 1:11(10) ack 1

// Client side ACKs the 10 bytes it received
+.0 < . 1:1(0) ack 11 win 1000

// Client closes the connection, sending FIN packet
+0 < F. 1:1(0) ack 11 win 4000

// Client side expects the kernel protocol stack to ACK the FIN.
// ack = client FIN seq + 1 = 2; seq = the server's next sequence number (11)
// Refer to the standard flowchart
+.005 > . 11:11(0) ack 2

// Server closes the connection via a system call
+0 close(4) = 0

// Client expects the FIN packet from the server
+0 > F. 11:11(0) ack 2

// Client sends ACK packet acknowledging server-side FIN packet
+0 < . 2:2(0) ack 12 win 4000

At this point, we have manually completed the entire process of opening and closing a connection. Let’s then use Wireshark to verify.

By combining packetdrill and Wireshark, every step is under our control.
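The sequence and ACK numbers in the script follow from one rule: SYN and FIN each consume one sequence number, and payload consumes one per byte. The sketch below replays the script's packets with a small hypothetical Endpoint class (not part of packetdrill) to show where the values 1, 11, 2 and 12 come from.

```python
class Endpoint:
    """Tracks the next relative sequence number one side will send."""

    def __init__(self):
        self.next_seq = 0

    def send(self, payload_len=0, syn=False, fin=False):
        """Return (start, end) as written in the script and advance next_seq.

        SYN and FIN each consume one sequence number on top of the payload.
        """
        start = self.next_seq
        self.next_seq += payload_len + int(syn) + int(fin)
        return start, start + payload_len


client, server = Endpoint(), Endpoint()

client.send(syn=True)                    # < S 0:0(0)
server.send(syn=True)                    # > S. 0:0(0) ack 1
data = server.send(payload_len=10)       # > P. 1:11(10) ack 1
ack_for_data = server.next_seq           # < . 1:1(0) ack 11
client_fin = client.send(fin=True)       # < F. 1:1(0) ack 11
ack_for_client_fin = client.next_seq     # > . 11:11(0) ack 2
server_fin = server.send(fin=True)       # > F. 11:11(0) ack 2
ack_for_server_fin = server.next_seq     # < . 2:2(0) ack 12

print(data, ack_for_data, client_fin, ack_for_client_fin,
      server_fin, ack_for_server_fin)
```

The ACK number a side sends is always the peer's next expected sequence number, which is why the final ACK from the client is 12 rather than 11.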

SACK

We will use packet drill to explore some more complex cases, such as how the kernel protocol stack responds to various combinations of SACK.

SACK is a key option (generally found in the options section of the header) in the TCP protocol’s optimized retransmission mechanism.

In the most primitive scheme, the sender waits for the ACK of each packet before sending the next, which is extremely inefficient. With a sliding window, the sender can have multiple packets in flight at once. However, a plain cumulative ACK only reports the last in-order byte received, so if a packet in the middle is lost, the sender cannot tell which later packets arrived and may resend everything from the lost packet onward, which is very wasteful.

SACK is an optimization designed to avoid these unnecessary retransmissions: the receiver tells the sender which ranges have already arrived and therefore need not be resent. The TCP option space holds at most four SACK blocks (three when the timestamp option is also in use), each describing one contiguous range of received data. Here’s a more specific example to clarify these abstract concepts.
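The limit on the number of SACK blocks comes from simple option-space arithmetic described in RFC 2018: TCP options may occupy at most 40 bytes, and a SACK option with n blocks takes 8*n + 2 bytes. A minimal sketch of that calculation (max_sack_blocks is a hypothetical helper name):

```python
MAX_TCP_OPTION_BYTES = 40   # TCP header options are limited to 40 bytes
SACK_HEADER_BYTES = 2       # option kind + option length
SACK_BLOCK_BYTES = 8        # two 32-bit sequence numbers per block


def max_sack_blocks(other_option_bytes=0):
    """How many SACK blocks fit next to the other options in one segment."""
    available = MAX_TCP_OPTION_BYTES - other_option_bytes - SACK_HEADER_BYTES
    return max(0, available // SACK_BLOCK_BYTES)


print(max_sack_blocks())     # no other options in the segment
print(max_sack_blocks(12))   # timestamp option (10 bytes) padded with two NOPs
```

With no other options four blocks fit; once the timestamp option is present only three do, which is the commonly quoted limit.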

Example Explanation

In this example, we will send packets in the order 1, 3, 5, 6, 8, 4, 7, 2, to test whether the kernel TCP protocol stack’s SACK logic behaves as described in the RFC.

// Initialization process establishing the server-side socket, not repeated here
   +0 socket(..., SOCK_STREAM, IPPROTO_TCP) = 3
   +0 setsockopt(3, SOL_SOCKET, SO_REUSEADDR, [1], 4) = 0
   +0 bind(3, ..., ...) = 0
   +0 listen(3, 1) = 0

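   // Client sends handshake packets and receives the server response, as before.
   // The SYN must carry sackOK for SACK to be negotiated; <mss 1000,sackOK> is an
   // assumed minimal option list, and <...> accepts whatever options the server returns.
   +.1 < S  0:0(0) win 50000 <mss 1000,sackOK>
   +0 > S. 0:0(0) ack 1 win 32000 <...>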
   +0 < .  1:1(0) ack 1 win 50000

   // Server ready
   +.1 accept(3, ..., ...) = 4

   // Send packet 1
   +0 < .  1:1001(1000) ack 1  win 50000
   // Send packet 3, packet 2 is sent last
   +0 < .  2001:3001(1000) ack 1 win 50000
   // Send packet 5, packet 4 is sent out of order
   +0 < .  4001:5001(1000) ack 1 win 50000
   // Send packet 6
   +0 < .  5001:6001(1000) ack 1 win 50000        
   // Send packet 8, packet 7 is sent out of order
   +0 < P.  7001:8001(1000) ack 1 win 50000
   // Send packet 4
   +0 < .  3001:4001(1000) ack 1 win 50000
   // Send packet 7
   +0 < .  6001:7001(1000) ack 1 win 50000
   // First packet's ACK received
   +0 > . 1:1(0) ack 1001

   // SACK received, reporting out-of-order receipt of packet 3, but not packet 2.
   +0 > . 1:1(0) ack 1001 win 31000 <nop,nop,sack 2001:3001>
   // SACK received, reporting receipt of out-of-order packets 3 and 5, but not packets 2 or 4
   +0 > . 1:1(0) ack 1001 win 31000 <nop,nop,sack 4001:5001 2001:3001>
   // SACK received, reporting receipt of out-of-order packets 3, 5, and 6, but not packets 2 or 4
   +0 > . 1:1(0) ack 1001 win 31000 <nop,nop,sack 4001:6001 2001:3001>
   // SACK received, reporting out-of-order receipt of packets 3, 5, 6, and 8, but not 2, 4, or 7
   +0 > . 1:1(0) ack 1001 win 31000 <nop,nop,sack 7001:8001 4001:6001 2001:3001>
   // SACK received, reporting out-of-order receipt of packets 3, 4, 5, 6, and 8, but not 2 or 7
   +0 > . 1:1(0) ack 1001 win 31000 <nop,nop,sack 2001:6001 7001:8001>
   // SACK received, reporting out-of-order receipt of packets 3, 4, 5, 6, 7, and 8, but not 2
   +0 > . 1:1(0) ack 1001 win 31000 <nop,nop,sack 2001:8001>

   // Send packet 2, all packets complete
   +0 < .  1001:2001(1000) ack 1 win 50000

   +0 > . 1:1(0) ack 8001

We will then use Wireshark to verify.

Everything matches perfectly.
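The SACK blocks in the script can also be reproduced with a small receiver model. The sketch below is a simplified illustration of the RFC 2018 ordering rule (the block containing the most recently received segment comes first), not the kernel's implementation; it ignores sequence-number wraparound and the limit on the number of blocks, and SackReceiver is a hypothetical class name.

```python
class SackReceiver:
    """Tracks the cumulative ACK point and the out-of-order ranges received."""

    def __init__(self, rcv_nxt=1):
        self.rcv_nxt = rcv_nxt   # next in-order sequence number expected
        self.blocks = []         # (start, end) ranges, most recently updated first

    def receive(self, start, end):
        """Process one segment and return (ack, sack_blocks) for the reply."""
        if start <= self.rcv_nxt:
            # In-order data: advance the cumulative ACK point.
            self.rcv_nxt = max(self.rcv_nxt, end)
        else:
            # Out-of-order data: merge with overlapping or adjacent ranges
            # and move the merged range to the front of the list.
            merged = (start, end)
            remaining = []
            for block in self.blocks:
                if block[0] <= merged[1] and block[1] >= merged[0]:
                    merged = (min(merged[0], block[0]), max(merged[1], block[1]))
                else:
                    remaining.append(block)
            self.blocks = [merged] + remaining
        # Ranges reached by the cumulative ACK no longer need to be reported.
        outstanding = []
        for block in self.blocks:
            if block[0] <= self.rcv_nxt:
                self.rcv_nxt = max(self.rcv_nxt, block[1])
            else:
                outstanding.append(block)
        self.blocks = outstanding
        return self.rcv_nxt, list(self.blocks)


# Packets 1, 3, 5, 6, 8, 4, 7, 2 from the script, 1000 bytes each.
order = [1, 3, 5, 6, 8, 4, 7, 2]
receiver = SackReceiver()
replies = [receiver.receive((n - 1) * 1000 + 1, n * 1000 + 1) for n in order]
for n, (ack, sack) in zip(order, replies):
    print(f"packet {n}: ack {ack} sack {sack}")
```

Comparing the printed replies with the > lines of the script shows the same ACK number and the same block order at every step.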

packetdrill supports far more elaborate scenarios than these and can thoroughly test a wide range of edge conditions. There may be opportunities to share more of them in the future.

Reference Information

Links to example scripts:

https://gitee.com/block_chainsaw/linux-kernel-tcp-study.git
