We found that access to some of the company's business services was very unstable in speed and generally slower than competitors' services. Analysis points to an incompatibility between the TCP congestion control algorithm on the SUSE 10 master image and the Windows ACK frequency control strategy. Keyword to focus on: Linux TCP optimization.
So far, the issue has been confirmed on kernel version 2.6.16. Applying a TCP optimization patch or switching to Tlinux resolves it.
Issue Manifestation:
Tests on the experience network environment showed the following. For large file downloads, Baidu's download speed averaged 600 KBPS, while ours averaged below 100 Kbps. In the Intertainment Webgame scenario, the TNT business download speed was approximately 25% of DDT's.
Typical download speed curves (Y-axis unit: packets/s):
[Figure: download curve of our server]
[Figure: download curve of the Baidu server]
Test environment to replicate the issue:
Network: Company experience network, standard Unicom 4M ADSL
Server: Linux 64-bit server, Shenzhen data center.
Server programs: Apache, nws (self-developed webserver)
Client: Windows XP, Windows 7, any browser, or Xunlei (single-thread download)
Testing tools: Wireshark, httpwatch
Test connections: Self-built CDN, Baidu download, Shenzhen DC+Apache
Problem Analysis:
Packet captures on the client showed two problems during the slow segments:
- The server always waits until the previous data packet is acknowledged before sending the next packet.
- Windows always waits about 200ms before sending an ACK acknowledgment.
On the Windows side, to keep excessive ACKs from loading the network, the Microsoft TCP stack starts a 200 ms timer when it receives a data packet and sends an ACK only when a further packet arrives or the timer expires.
Setting the registry option TcpAckFrequency to 1 disables delayed ACK. With that setting, download speed returned to normal in our experiment and the slowdown could no longer be reproduced.
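The delayed-ACK behavior described above can be illustrated with a small Python model. This is a simplified sketch, not the actual Windows implementation: the function name ack_times_ms, the use of millisecond timestamps, and the rule that the timer starts on the first unacknowledged segment are assumptions made for illustration.

```python
def ack_times_ms(arrivals_ms, ack_frequency=2, delay_ms=200):
    """Return the times (ms) at which a delayed-ACK receiver sends ACKs.

    Toy model: an ACK goes out as soon as `ack_frequency` segments are
    pending, or when the timer started by the first pending segment expires.
    """
    acks = []
    pending = 0        # segments received but not yet acknowledged
    deadline = None    # expiry time of the running delayed-ACK timer
    for t in sorted(arrivals_ms):
        # The timer may have fired before this segment arrived.
        if pending and deadline <= t:
            acks.append(deadline)
            pending = 0
        if pending == 0:
            deadline = t + delay_ms  # first pending segment starts the timer
        pending += 1
        if pending >= ack_frequency:
            acks.append(t)           # enough segments pending: ACK at once
            pending = 0
    if pending:
        acks.append(deadline)        # last segment is ACKed when the timer fires
    return acks
```

In this model a lone segment, ack_times_ms([0]), is acknowledged only at 200 ms, while two back-to-back segments, ack_times_ms([0, 10]), are acknowledged at 10 ms. Passing ack_frequency=1 acknowledges every segment on arrival.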
To configure the maximum number of outstanding ACKs in Windows XP/2003/Vista/2008:
[HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces\{Adapter-id}]
TcpAckFrequency = 1 (default = 2; 1 = disable delayed ACK; 2-n = send an ACK once n ACKs are outstanding before the timer expires)
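For repeated test runs, the same registry value could be written from a script. The following is a minimal sketch using Python's standard winreg module. The helper names interface_key and set_tcp_ack_frequency are hypothetical, the adapter ID must be supplied by the caller, and the script is assumed to run with administrator rights on the Windows test client.

```python
import sys

# Parent key of the per-adapter TCP/IP settings under HKEY_LOCAL_MACHINE.
INTERFACES_KEY = r"SYSTEM\CurrentControlSet\Services\Tcpip\Parameters\Interfaces"

def interface_key(adapter_id):
    """Build the subkey path for one adapter, e.g. '{Adapter-id}'."""
    return INTERFACES_KEY + "\\" + adapter_id

def set_tcp_ack_frequency(adapter_id, value=1):
    """Write TcpAckFrequency as a DWORD for one adapter (Windows only)."""
    if sys.platform != "win32":
        raise OSError("the Windows registry is only available on Windows")
    import winreg  # standard library module, present only on Windows
    with winreg.OpenKey(winreg.HKEY_LOCAL_MACHINE, interface_key(adapter_id),
                        0, winreg.KEY_SET_VALUE) as key:
        winreg.SetValueEx(key, "TcpAckFrequency", 0, winreg.REG_DWORD, value)
```

A call such as set_tcp_ack_frequency("{Adapter-id}") would use the real adapter GUID in place of the placeholder. The new value may not take effect until the network stack or the machine is restarted.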
Users cannot be asked to work around the problem by editing the registry, and competitors do not appear to have a similar issue, so the fix has to come from the Linux side.
On the Linux side, the first suspect was the Nagle algorithm. The issue could still be reproduced after setting TCP_NODELAY on the nws server, which rules out the Nagle algorithm. (In fact, nws generally sends large data packets or uses sendfile directly, both of which are unlikely to be affected by the Nagle algorithm.) In addition, since the issue can be reproduced with both Apache and nws, a defect in the operating system itself is strongly suspected.
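For reference, disabling the Nagle algorithm comes down to one socket option. The sketch below shows it with Python's standard socket module. It is only an illustration of the TCP_NODELAY option and not the nws code, which is not shown in this note; the helper name make_nodelay_socket is hypothetical.

```python
import socket

def make_nodelay_socket():
    """Create a TCP socket with the Nagle algorithm disabled."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # TCP_NODELAY = 1 sends small writes immediately instead of coalescing them.
    sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
    return sock

sock = make_nodelay_socket()
# A non-zero value read back means the option is set on this socket.
nodelay_enabled = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY) != 0
sock.close()
```

Reading the option back with getsockopt is a quick way to confirm that the setting was applied before running a test.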
Because Linux sends only one data packet at a time, the congestion window is the likely culprit. The hypothesis is as follows:
Initially, each ACK from the client grows the congestion window, so several packets can be sent at a time and the transfer starts fast. Network latency jitter or packet loss then makes the server's protocol stack judge a packet as timed out, and it resets the congestion window to 1. With only one packet sent at a time, the client's ACK arrives about 200 ms later, which the server still treats as a timeout while it adjusts its RTT estimate. Once the RTT estimate has grown to 200 ms, the delayed ACK no longer counts as a timeout, the congestion window expands again, several packets are sent at a time, and the speed rises. The cycle then repeats.
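The cost of a congestion window of 1 against a delayed-ACK receiver can be estimated with a toy model. The sketch below is not a TCP implementation: it assumes a fixed congestion window, no loss, a 30 ms round trip (an assumed value, not a measurement from this test), and a receiver that delays its ACK by 200 ms whenever a round carries fewer than two segments. The function name transfer_time_ms is hypothetical.

```python
def transfer_time_ms(segments, cwnd, rtt_ms=30, delack_ms=200, ack_frequency=2):
    """Estimate the time (ms) to deliver `segments` with a fixed congestion window.

    Toy model: in each round the sender emits up to `cwnd` segments and then
    waits for the ACK. If the round carries fewer than `ack_frequency`
    segments, the receiver's delayed-ACK timer adds `delack_ms` to the round.
    """
    total = 0
    remaining = segments
    while remaining > 0:
        burst = min(cwnd, remaining)   # segments sent in this round
        total += rtt_ms                # one round trip per round
        if burst < ack_frequency:
            total += delack_ms         # receiver waits for its timer
        remaining -= burst
    return total
```

Under these assumptions, 1000 segments take 230,000 ms with a window of 1, 15,000 ms with a window of 2, and 3,000 ms with a window of 10. Almost all of the time in the first case is spent waiting for the 200 ms timer, which matches the one-packet-per-ACK pattern seen in the slow segments.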
In testing, increasing the initial congestion window to 10 restored normal download speed. The change was implemented by replacing the kernel and loading the static TCP optimization module from a new tech group.
Attached Xunlei test options:
[Figure: Xunlei test options]