Understanding the CLOSE_WAIT State on Linux Servers: Causes and Solutions

In the previous article, we used tcpdump and Wireshark with a CentOS 7 server and a Windows 10 client to reproduce a TCP connection stuck in the CLOSE_WAIT state. This article uses the data captured there to explain why the CLOSE_WAIT state appears on a Linux server.

[Figure: the CLOSE_WAIT state in the TCP client-server communication flow]
1. Cause Analysis: From the Client and Server TCP Communication Process

As seen in the tcpdump and Wireshark captures from the previous article, when the Windows client closes, it actively sends a FIN segment to the Linux server. Following the TCP client-server communication flowchart above: the client first enters the FIN_WAIT_1 state, and after receiving the ACK from the server it enters the FIN_WAIT_2 state (this can be checked by opening a new PowerShell window on Windows and listing the connection states with a command).

At the same time, the server's TCP state changes to CLOSE_WAIT. The connection then stays in CLOSE_WAIT because the Linux server never calls close() on the socket, so no FIN is sent back to the client that actively closed the TCP connection.
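The sequence above can be reproduced on a single machine. The following is a minimal sketch, not the server program from the previous article: it uses a loopback connection, and the function name reproduce_close_wait is hypothetical. It assumes a Linux host for the state check, where the first byte of the TCP_INFO socket option holds the kernel's state number (8 for CLOSE_WAIT); on other systems the check is skipped.

```python
import socket

TCP_CLOSE_WAIT = 8  # state number used by the Linux kernel for CLOSE_WAIT


def reproduce_close_wait():
    """Let the client close first and inspect the server side before it closes."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    conn, _ = listener.accept()

    client.close()          # active close: the client's kernel sends FIN
    eof = conn.recv(1024)   # returns b"" once the client's FIN has arrived

    state = None
    if hasattr(socket, "TCP_INFO"):  # assumption: Linux layout of struct tcp_info
        state = conn.getsockopt(socket.IPPROTO_TCP, socket.TCP_INFO, 8)[0]

    conn.close()            # passive close: only now does the server send its FIN
    listener.close()
    return eof, state


if __name__ == "__main__":
    eof, state = reproduce_close_wait()
    print("recv returned:", eof, "server-side state:", state)
```

Until conn.close() runs, the server side of the connection remains in CLOSE_WAIT, no matter how long the program waits.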

2. Cause Analysis: From the Server Program Perspective

Line 69 of the server program shows that once the client closes its socket, the server also calls close() to close the connection. So why does the CLOSE_WAIT state still occur? The answer is that the server has only one process (PID 5325) handling TCP data exchange with clients, and that process is busy serving the connection from the client (PID 5331) that was opened with the telnet command on Linux.

Therefore, once the Windows client has connected to the Linux server with the telnet command, there is no process available to handle that connection. This is also visible in the screenshot from section 4, where the TCP state is CLOSE_WAIT but the owning process is empty, and it can be verified with a command that lists open files (no process has a file open for the Windows client's connection).

When the Windows client closes the telnet window, the Linux server does receive the client's FIN, but no process is in a position to call close() and tell the kernel to send a FIN back to the client. As a result, the connection shows as CLOSE_WAIT on the Linux server, while on the Windows client it shows as FIN_WAIT_2.
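One way to look for such connections without relying on a particular tool is to read the kernel's connection table directly. The sketch below is an illustration rather than the command used in the article: it assumes a little-endian Linux host and the IPv4 table format of /proc/net/tcp, and the helper names are hypothetical. It returns the local address, remote address, and socket inode of every entry whose state field is 08 (CLOSE_WAIT).

```python
import socket
import struct

CLOSE_WAIT_HEX = "08"  # state column value for CLOSE_WAIT in /proc/net/tcp


def _decode(addr):
    """Turn '0100007F:1F90' into ('127.0.0.1', 8080); assumes a little-endian host."""
    ip_hex, port_hex = addr.split(":")
    ip = socket.inet_ntoa(struct.pack("<I", int(ip_hex, 16)))
    return ip, int(port_hex, 16)


def close_wait_entries(proc_net_tcp_text):
    """Return (local, remote, inode) for each CLOSE_WAIT row of the table text."""
    entries = []
    for line in proc_net_tcp_text.splitlines()[1:]:  # skip the header row
        fields = line.split()
        if len(fields) < 10 or fields[3] != CLOSE_WAIT_HEX:
            continue
        entries.append((_decode(fields[1]), _decode(fields[2]), int(fields[9])))
    return entries
```

On a Linux machine this could be called as close_wait_entries(open("/proc/net/tcp").read()). The inode can then be compared with the socket:[inode] links under /proc/<pid>/fd to see whether any process holds the socket.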

3. Issue Expansion: From the Server Program Perspective

This may raise a question: the Windows client had clearly reached the ESTABLISHED state with the Linux server, which would seem to mean the server process had handled it. Doesn't that contradict the cause analysis in section 2? Actually, the question rests on a misunderstanding of how the server works. Previously, I mistakenly believed that a connection in the ESTABLISHED state must already have been handled by the server process.

In fact, this is not the case. After reviewing the relevant materials, my understanding is that the Linux kernel completes the three-way handshake on its own, independently of the server process.

With that, the issue becomes simple and clear. The Windows client's connection was established by the kernel without the server process ever handling it, which is also confirmed by the fact that the program printed nothing at lines 51 and 60.
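That the handshake does not depend on the server process can be checked with a small experiment. The sketch below is an assumption-laden illustration on the loopback interface, with a hypothetical function name: the listening socket never calls accept(), yet the client's connect succeeds and it can even send data, because the kernel has already completed the handshake and queued the connection.

```python
import socket


def connect_without_accept():
    """Connect to a listening socket whose owner never calls accept()."""
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))  # port 0: let the kernel pick a free port
    listener.listen(5)               # the kernel now answers SYNs for this port

    # No accept() anywhere below: the handshake is completed by the kernel alone.
    client = socket.create_connection(listener.getsockname(), timeout=2)
    connected = client.getpeername() == listener.getsockname()
    sent = client.send(b"hello")     # buffered by the kernel for a future accept()

    client.close()
    listener.close()
    return connected, sent


if __name__ == "__main__":
    print(connect_without_accept())
```

If the client closes while its connection is still waiting in this queue, the server side has received a FIN but no process has yet taken ownership of the socket.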

4. Summary

The fundamental reason a server socket on the passive-close side gets stuck in the CLOSE_WAIT state is that the server never calls close() on the socket, so no FIN is sent back to the client that actively closed the TCP connection.
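Following from this summary, the remedy on the server side is to make sure close() always runs once the peer's FIN has been seen. The following is a minimal sketch of that pattern, not the article's server program: serve_one and demo are hypothetical names, and a simple echo loop stands in for the real request handling. The with block closes the socket on exit even if the handler raises.

```python
import socket


def serve_one(listener):
    """Accept one connection, echo until the peer's FIN, and always close."""
    conn, _ = listener.accept()
    with conn:                       # close() runs on exit, even on an exception
        while True:
            data = conn.recv(1024)
            if not data:             # b"" means the peer has sent its FIN
                break
            conn.sendall(data)
    return conn.fileno()             # -1 once the socket has been closed


def demo():
    listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    listener.bind(("127.0.0.1", 0))
    listener.listen(1)

    client = socket.create_connection(listener.getsockname())
    client.sendall(b"ping")
    client.shutdown(socket.SHUT_WR)  # the client sends its FIN but can still read

    fd_after = serve_one(listener)   # the server echoes, sees EOF, then closes
    echoed = client.recv(1024)

    client.close()
    listener.close()
    return echoed, fd_after


if __name__ == "__main__":
    print(demo())
```

This only helps for connections the process actually accepts. A connection that is never accepted, as in section 2, also needs a server that gets around to calling accept() for it.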