1. Project environment: our web project uses nginx as a front-end proxy, with 3 Tomcat instances running on the same server.
2. Business logic involved: the project includes file upload (potentially large files, e.g. a 100 MB Android game), client interface requests, and website backend management.
3. Problem reproduction:
3.1 After the Tomcat instances were configured, nginx was set up in front of them as an HTTP proxy.
3.2 Issue 1: the administrator could not upload large files from the backend; the uploads timed out. Repeated testing showed that any upload taking longer than about 1 minute timed out, while smaller files were unaffected.
3.3 nginx's default HTTP connection timeout is 75 seconds; connections held longer than that were terminated, which is exactly what happened during large uploads. Raising the timeout to 30 minutes with `keepalive_timeout 1800;` resolved the upload problem.
3.4 After 2 days of operation the server went down. Restarting nginx fixed it temporarily, but the crash recurred 2 hours later. The nginx error log contained the message "socket() failed (24: Too many open files) while connecting to upstream", indicating that the nginx connection limit (default 1024) had been reached.
3.5 To address this, I raised the connection limit with `worker_connections 10240;` (both settings appear in the configuration sketch after this list). That seemed to help in the short term, but the "socket() failed (24: Too many open files) while connecting to upstream" error kept reappearing intermittently.
3.6 Simply raising the connection limit was clearly not a complete fix. Further research tied the problem to the overly long `keepalive_timeout`: client interface calls are short-lived and their HTTP connections should be released as soon as a request completes, but with this configuration the connections were never released, active connections piled up, and nginx eventually fell over.
4. So, how should this problem be solved? Lowering keepalive_timeout could make large uploads fail again; raising it leaves a large number of idle HTTP connections occupying nginx's connection count. It looks like a dilemma!
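For reference, here is a minimal sketch of how the two changes from steps 3.3 and 3.5 would sit in nginx.conf; the surrounding block structure is assumed, and only the two directive values come from the steps above.

```nginx
# nginx.conf fragments
events {
    worker_connections 10240;   # step 3.5: raise the per-worker connection limit
}

http {
    keepalive_timeout 1800;     # step 3.3: keep idle HTTP connections open for up to 30 minutes
}
```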
Here comes the important part:
How to Set Nginx's TCP KeepAlive
At the start, I mentioned a recent issue: a client sends a request to the Nginx server, and the server needs a long computation before it can return a response, longer than the 90-second hold time of the LVS session. Capturing traffic with tcpdump on the server and analyzing it locally in Wireshark produced results like the second picture: a gap of roughly 90 seconds between the timestamps of the 5th and the last packet. Having determined that the problem was the LVS session hold time expiring, I started looking into how to set Nginx's TCP KeepAlive. The first option I found was keepalive_timeout. A colleague told me that setting keepalive_timeout to 0 disables keepalive, while setting it to a positive integer keeps the connection open for that many seconds, so I set keepalive_timeout to 75s; actual testing showed it had no effect. Clearly, keepalive_timeout cannot solve a TCP-layer KeepAlive problem. In fact, Nginx has quite a few options related to keepalive. The usual Nginx usage is as follows:
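Roughly something like this; the upstream name `backend`, the address, and the concrete values are illustrative rather than taken from any particular setup:

```nginx
http {
    # HTTP keep-alive towards the client
    keepalive_timeout  75s;              # how long an idle client connection is kept open
    keepalive_requests 100;              # requests allowed over a single client connection

    upstream backend {
        server 127.0.0.1:8080;
        keepalive 32;                    # idle keep-alive connections cached towards the upstream
    }

    server {
        listen 80;
        location / {
            proxy_pass http://backend;
            proxy_http_version 1.1;          # HTTP/1.1 is required for upstream keep-alive
            proxy_set_header Connection "";  # clear the Connection header sent to the upstream
        }
    }
}
```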
From the TCP layer, Nginx has to care about KeepAlive not only towards the client but also towards the upstream. At the HTTP protocol layer, it likewise has to care about client Keep-Alive and, if the upstream speaks HTTP, about upstream Keep-Alive as well. Overall it is rather complex, but once you understand both TCP and HTTP KeepAlive, you won't misconfigure Nginx's KeepAlive. While solving the problem, I initially wasn't sure whether Nginx even had a configuration option for TCP KeepAlive, so I opened the Nginx source code and searched for TCP_KEEPIDLE to see where it is used.
From the context of that code I could tell that TCP KeepAlive is indeed configurable, so I kept searching for the option that exposes it. Eventually I found that the listen directive's so_keepalive parameter configures KeepAlive for the TCP socket.
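A rough sketch of the accepted forms (the concrete values below are only examples):

```nginx
# so_keepalive=on | off | [keepidle]:[keepintvl]:[keepcnt]
listen 80 so_keepalive=on;        # enable TCP keepalive with the operating system defaults
listen 80 so_keepalive=30m::10;   # first probe after 30 minutes idle, default interval, 10 probes
```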
so_keepalive takes one of three mutually exclusive forms: so_keepalive=on, so_keepalive=off, or so_keepalive=[keepidle]:[keepintvl]:[keepcnt] (for example, so_keepalive=30s:: waits 30 seconds with no data before sending the first probe packet). By setting `listen 80 so_keepalive=60s::;`, I solved the problem of keeping Nginx's long connections alive behind LVS without resorting to other, more costly workarounds. If you run into a similar issue with commercial load-balancing devices, the same approach applies.
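A minimal sketch of the resulting server block, assuming everything else stays at its default; only the 60-second idle value comes from the text above:

```nginx
server {
    # first TCP keepalive probe after 60 seconds of idle, so the LVS session
    # (90-second hold time) is refreshed while the backend is still computing
    listen 80 so_keepalive=60s::;
}
```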