When debugging network quality, we typically focus on two factors: latency and throughput (bandwidth). Latency is relatively easy to verify by simply pinging or performing a traceroute to gain insights. This article shares a method for debugging throughput, with a focus on TCP congestion control.
Scenarios where throughput matters are usually what is known as a Long Fat Network (LFN), for example downloading large files. If throughput hasn't reached the network's limit, it is usually constrained by one of three factors: the sending end isn't producing data fast enough, the receiving end isn't consuming data fast enough, or the network in between can't carry the data fast enough.
Bottlenecks at the sending end typically show up as a full send buffer: to send data, the application calls a syscall that places data in the kernel's send buffer, and the system then transmits it. If the buffer is full, the application blocks (assuming a blocking API is used) until buffer space frees up and it can continue writing, following the producer-consumer model.
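As a minimal illustration of this blocking behaviour (my own sketch, not code from the article), the snippet below shrinks the send buffer of a local socket pair and keeps writing while the other side never reads; the write eventually stalls, which is exactly what a sender-side bottleneck looks like from the application's point of view.

```python
import socket

# Local connected pair; 'b' plays a consumer that never reads.
a, b = socket.socketpair()
a.setsockopt(socket.SOL_SOCKET, socket.SO_SNDBUF, 4096)  # keep the buffer small
a.settimeout(2)  # turn the indefinite block into a timeout so the demo ends

sent = 0
try:
    while True:
        sent += a.send(b"x" * 4096)  # stalls once the kernel buffers are full
except socket.timeout:
    print(f"send blocked after {sent} bytes because nothing is draining the buffer")
```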

Bottlenecks at the sender are generally easier to debug; often you can simply check application logs to see when writes blocked. In most cases it is the second or third situation, which are harder to diagnose: the application's data has already been written to the system's buffer, but the system hasn't sent it out quickly enough.
To optimize transmission efficiency (note that this refers to the efficiency of the network as a whole, not of a single TCP connection), TCP does two things:
To protect the receiver, the two ends agree on the receiver's buffer size (the receive window, rwnd) when the connection is established, and in every subsequent acknowledgment the receiver reports its remaining window size. This way the sender never overruns the receiver's buffer (meaning the sender is responsible for keeping the total amount of unacknowledged data within the receiver's advertised window).

To protect the network, the principle is the same: maintain a window, called the congestion window (cwnd), representing the network's current capacity. The sender won't put more data in flight than this window allows (the total amount of unacknowledged data won't exceed cwnd).
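Putting the two windows together: at any moment the sender keeps the amount of unacknowledged ("in flight") data under both rwnd and cwnd. A toy check of that rule (a sketch with made-up numbers, not kernel code) looks like this:

```python
def can_send(bytes_in_flight: int, cwnd: int, rwnd: int, segment_size: int) -> bool:
    """True if one more segment still fits under min(cwnd, rwnd)."""
    return bytes_in_flight + segment_size <= min(cwnd, rwnd)

# Example: cwnd of 10 * 1460 bytes, rwnd of 65535 bytes, 12 KiB already in flight.
print(can_send(12 * 1024, 10 * 1460, 65535, 1460))  # True: 13748 <= 14600
```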
How do we find this cwnd value?
This is the crucial part. By default the algorithm is CUBIC, but other algorithms, such as Google's BBR, can be used.
The main logic is slow start: send some data to probe; if the receiver's acknowledgments come back correctly, the network can handle this throughput, so cwnd is doubled and probing continues, until one of the following happens:
1. Packet loss is detected;
2. cwnd reaches rwnd.
Point 2 is easy to understand: it means network throughput isn't the bottleneck; the bottleneck is the receiver's insufficient buffer size. cwnd must not exceed rwnd, or the receiver would be overloaded.
For point 1, the sender essentially uses packet loss to probe the network's condition. If no packets are lost, everything is normal; if packets are lost, the network can't handle this sending rate, and the sender halves cwnd.
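As a rough model of the growth and halving described above (a deliberate simplification; real algorithms such as CUBIC are considerably more elaborate):

```python
def next_cwnd(cwnd: int, rwnd: int, loss_detected: bool) -> int:
    """One probing round: double on clean ACKs (capped by rwnd), halve on loss."""
    if loss_detected:               # point 1: a packet was lost
        return max(cwnd // 2, 1)
    return min(cwnd * 2, rwnd)      # point 2: never grow past the receiver's window

cwnd, rwnd = 1, 64                  # in segments, arbitrary numbers
for rtt, loss in enumerate([False, False, False, True, False]):
    cwnd = next_cwnd(cwnd, rwnd, loss)
    print(f"round {rtt}: cwnd = {cwnd}")
```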
But the real cause behind point 1 may not be a network throughput bottleneck; it can also be one of the following situations:
Reasons 2 and 3 can both cause a drop in cwnd, preventing full utilization of network throughput.
That covers the basic principles; now let's look at how to pinpoint such issues.
The receive window size is carried directly in the TCP header, so you only need to capture packets to see this field.

However, the actual window size is this value multiplied by a scaling factor, which is negotiated in the TCP options (the window scale option). Therefore, to analyze a TCP connection's window size, you must also capture the handshake packets; otherwise the negotiated factor cannot be known.
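The arithmetic itself is simple: the 16-bit Window field is shifted left by the scale factor from the handshake's Window Scale option (RFC 7323). A quick sketch with made-up numbers:

```python
window_field = 501   # raw 16-bit Window value seen in a captured segment
window_scale = 7     # shift count negotiated in the SYN / SYN-ACK options
actual_window = window_field << window_scale  # equivalent to 501 * 2**7
print(actual_window)  # 64128 bytes
```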

The congestion window, on the other hand, is a dynamic value computed by the sender's congestion control algorithm, and it is not carried anywhere in the protocol's transmitted data. Therefore, to observe it, you must look at the sending machine.
In Linux, you can use ss -i to print the internal parameters of a TCP connection, including cwnd.

Note that cwnd is reported in units of MSS, so the real size here is 1460 bytes * 10.
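For example, a small helper (my own sketch; it assumes a Linux box with the ss utility installed) can shell out to ss -ti and pull the mss and cwnd fields out of each connection's info line:

```python
import re
import subprocess

out = subprocess.run(["ss", "-ti"], capture_output=True, text=True).stdout
for mss, cwnd in re.findall(r"mss:(\d+).*?cwnd:(\d+)", out):
    window = int(mss) * int(cwnd)
    print(f"mss={mss} cwnd={cwnd} -> congestion window ~= {window} bytes")
```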
Wireshark offers an extremely useful statistics feature that lets you see at a glance where the current bottleneck is. However, when I first opened this graph I had no idea how to read it, and couldn't find any material explaining it. Fortunately I eventually learned, and I document it here to teach you as well.
First, open it as follows: Statistics -> TCP Stream Graphs -> Time Sequence (tcptrace).

Then you will see a graph like this.

It's crucial to know that the tcptrace graph shows data transfer in a single direction; TCP is a duplex protocol, so both ends can send data. The title at the top shows that the current graph covers data sent from 10.0.0.1 to 192.168.0.1. You can switch the direction of the view with the button at the bottom right.
The X-axis represents time, which is easy to grasp.
Understanding the Y-axis, which represents the Sequence Number in the TCP packets, is critical. All data in the diagram is based on this Sequence Number.
So if you see a chart like the one above, you're viewing the wrong direction: the Sequence Number is barely increasing, meaning almost no data is being transmitted in this direction. You need to click Switch Direction.

That's more like it: you can see that the transmitted Sequence Number increases over time.
There are three lines here with the following meanings: the blue marks are the data segments the sender has transmitted; the yellow line is the Sequence Number the receiver has ACKed so far; and the green line is the receive window, i.e. the ACKed Sequence Number plus the receiver's advertised window size.

In addition, there are two other types of lines:

Always remember that the Y-axis represents the Sequence Number. The red line is the SACK line, showing a range of Sequence Numbers that the receiver has received; combined with the yellow line of ACKed Sequence Numbers, the sender knows that the gap between the yellow and red lines is a missing packet, which therefore needs to be retransmitted. The blue line indicates that a retransmission has occurred.
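A worked example with made-up numbers shows how the sender infers the hole from the two lines: the cumulative ACK (yellow) plus a SACK block (red) pins down exactly which bytes are missing.

```python
ack = 1000                    # yellow line: every byte below 1000 has been ACKed
sack_blocks = [(1500, 2000)]  # red line: the receiver also holds bytes 1500-1999
for left, right in sack_blocks:
    print(f"hole: bytes {ack}-{left - 1} were lost and need retransmission")
```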
Understanding how to interpret these graphs allows us to recognize several common patterns:

Many red SACK lines indicate that the receiver is repeatedly saying: "I'm missing a packet in the middle; I didn't receive it."

This graph shows that as soon as the yellow line (the receiver's ACK) rises, the blue follows (the sender starts sending) until it fills the green line (the window size). This indicates the network is not the bottleneck, and suggests increasing the receiver's buffer size.
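One way to raise that limit, shown here only as an illustration (the right value depends on your bandwidth-delay product), is to request a larger kernel receive buffer on the receiving socket before connecting, so the advertised rwnd can grow:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Ask for a 4 MiB receive buffer; set it before the connection is established
# so window scaling can take full advantage of it.
sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, 4 * 1024 * 1024)
print(sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF))  # Linux typically reports double the requested value
```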

This chart shows that the receiver's window size is far from being a bottleneck; there is still plenty of space left.

Zooming in shows many packet losses and retransmissions, with only small amounts of data sent each time. This suggests that cwnd may be too small, constrained by the congestion control algorithm.
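If you suspect the congestion control algorithm itself, one experiment (a sketch; it assumes a Linux kernel with BBR available and may require root or having bbr listed in net.ipv4.tcp_allowed_congestion_control) is to switch the algorithm for a single socket rather than changing the system-wide default:

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, b"bbr")
# Read it back; the kernel returns the algorithm name as a byte string.
print(sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_CONGESTION, 16))
```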