MySQL Ping Issue: Background and Network Setup
Building on our previous discussion of unsolved network mysteries, this article looks at the MySQL Ping issue, a recurring problem whose connection anomalies were never fully explained. The issue surfaced last year and caused connection drops at irregular intervals. Changing the MySQL deployment stabilized operations, but the root cause remains unclear, so the issue is still unresolved.
Since then, I have returned to the problem from time to time and consulted a number of experts. There was some progress, but it remained purely theoretical and was never verified in the actual environment.
MySQL Ping connection exception
Network topology
To briefly describe the topology, data centers A and B are interconnected via a WAN dedicated line, and the intermediate devices include multiple switches and firewalls.
The business connection path is Client 192.168.1.1 -> LVS 10.2.1.1 -> MySQL database 10.1.1.1.
Problem phenomenon
The development team first noticed through monitoring that the client had a connection problem: the MySQL connection was dropped and re-established at irregular intervals, which affected business operations. They then investigated together with the team responsible for the LVS server but found nothing. A network problem was suspected, so the case was reported to the network team.
Routine network checks revealed no issues. Packets were then captured on the client and LVS server to troubleshoot the problem. The following strange issues were discovered through analysis of the data packets.
Client packet capture
The TCP three-way handshake shows an RTT of 0.026511 s, a client MSS of 1460, and a server MSS of 1452. Window scale is supported, with a value of 7.
In Wireshark, port 3334 is set to be decoded as MySQL, which makes the exchanges visible: Request Query and Response OK, as well as Request Ping and Response OK. MySQL Ping is described here:
https://baike.baidu.com/item/mysql_ping/4503428?fr=aladdin https://dev.mysql.com/doc/connectors/en/apis-php-mysqli.ping.html
It is used to check whether the connection to the server is still alive. If the connection is normal, it returns 0; if an error occurs, it returns a non-zero value. A non-zero value does not necessarily mean that the server itself is down; the cause may also be in the network.
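To make the later byte-level comparison easier to follow, here is a minimal sketch of what a Ping looks like on the wire. It assumes the standard MySQL client/server packet framing (a 3-byte little-endian payload length, a 1-byte sequence id, then the payload whose first byte is the command). The helper names `build_command_packet` and `parse_command_packet` are hypothetical and exist only for this illustration.

```python
import struct
from typing import Tuple

# Command bytes from the MySQL client/server protocol.
COM_SLEEP = 0x00
COM_QUIT = 0x01
COM_PING = 0x0E


def build_command_packet(command: int, sequence_id: int = 0, args: bytes = b"") -> bytes:
    """Frame a command: 3-byte little-endian length, 1-byte sequence id, payload."""
    payload = bytes([command]) + args
    header = struct.pack("<I", len(payload))[:3] + bytes([sequence_id])
    return header + payload


def parse_command_packet(packet: bytes) -> Tuple[int, int, int]:
    """Return (payload_length, sequence_id, command) for one command packet."""
    length = int.from_bytes(packet[:3], "little")
    return length, packet[3], packet[4]


ping = build_command_packet(COM_PING)
print(ping.hex())                  # 010000000e
print(parse_command_packet(ping))  # (1, 0, 14)
```

A Ping is therefore only five bytes of TCP payload, and the command byte is the very last one.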
Because the reported symptom is an interrupted MySQL connection, filter on mysql.error_code or mysql.error.message, then follow the matching tcp.stream to view the specific packets.
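The same filtering can be scripted with tshark instead of the Wireshark GUI. The sketch below only builds the argument list; it assumes tshark is installed, that port 3334 has to be decoded as MySQL as in this capture, and that the capture file is called `client.pcap`, which is a placeholder name. The function `build_tshark_args` is hypothetical.

```python
from typing import List


def build_tshark_args(pcap_path: str, mysql_port: int = 3334) -> List[str]:
    """Build a tshark command that lists MySQL error packets with their stream index."""
    return [
        "tshark",
        "-r", pcap_path,                                  # read from a capture file
        "-d", f"tcp.port=={mysql_port},mysql",            # decode the non-standard port as MySQL
        "-Y", "mysql.error_code || mysql.error.message",  # keep only packets carrying an error
        "-T", "fields",
        "-e", "frame.number",
        "-e", "tcp.stream",
        "-e", "mysql.error_code",
        "-e", "mysql.error.message",
    ]


args = build_tshark_args("client.pcap")  # placeholder file name
print(" ".join(args))
```

Passing `args` to `subprocess.run` would print one line per error packet; each `tcp.stream` value can then be used in a follow-up filter such as `tcp.stream == N` to view the whole conversation.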
After the client sent a Ping, the server responded with an ERR message, Unknown command. The client then sent Request Quit and closed the connection with the four-way FIN handshake.
I checked the command in frame 69774, and the MySQL Ping command sent by the client was normal. So the problem must lie either in transit or on the server side.
Analyze the MySQL Ping Issue
LVS server packet capture
Because intermediate routers and switches do not change a packet's ip.id in transit, frame 69774 in the client capture (ip.id == 0x1a5b) can be used to quickly filter for and locate the same packet in the LVS server capture. Doing so revealed the following problem.
The MySQL Ping that the client sent in frame 69774 had unexpectedly become SLEEP by the time the LVS server received it, and the packet was marked as Malformed Packet. A detailed comparison of this frame follows.
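The cross-capture lookup by ip.id can be sketched as follows. This is an illustration under stated assumptions, not the tooling used in the investigation: the `Packet` records are hand-written stand-ins for data exported from the two capture files, only the ip.id 0x1a5b and the 0e/00 command bytes come from the captures, and the other values are made up.

```python
from typing import Dict, List, NamedTuple, Tuple


class Packet(NamedTuple):
    ip_id: int      # IPv4 Identification field
    payload: bytes  # TCP payload, here the raw MySQL packet


def index_by_ip_id(packets: List[Packet]) -> Dict[int, List[Packet]]:
    """Group packets by ip.id (a 16-bit field, so values can repeat in long captures)."""
    index: Dict[int, List[Packet]] = {}
    for pkt in packets:
        index.setdefault(pkt.ip_id, []).append(pkt)
    return index


def diff_payloads(a: bytes, b: bytes) -> List[Tuple[int, int, int]]:
    """Return (offset, byte_in_a, byte_in_b) for every position where the payloads differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


# Illustrative records only; 0x1a5a is an invented neighbouring packet.
client_capture = [
    Packet(0x1A5A, bytes.fromhex("0100000001")),
    Packet(0x1A5B, bytes.fromhex("010000000e")),  # Ping as sent by the client
]
lvs_capture = [
    Packet(0x1A5A, bytes.fromhex("0100000001")),
    Packet(0x1A5B, bytes.fromhex("0100000000")),  # same packet as seen on the LVS server
]

lvs_index = index_by_ip_id(lvs_capture)
for sent in client_capture:
    for received in lvs_index.get(sent.ip_id, []):
        for offset, before, after in diff_payloads(sent.payload, received.payload):
            print(f"ip.id=0x{sent.ip_id:04x} offset={offset} {before:#04x} -> {after:#04x}")
# ip.id=0x1a5b offset=4 0x0e -> 0x00
```

In a real capture the ip.id should be combined with the address pair and the time window, since a 16-bit identifier eventually wraps around.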
Client
LVS Server
It can be clearly seen that the MySQL protocol Command in the same data frame was changed from Ping (hexadecimal value 0e) to SLEEP (hexadecimal value 00). Because of this change, the TCP checksum no longer matched, and the packet was flagged as erroneous. However, since the TCP checksum is an end-to-end checksum, the LVS server still forwarded the frame normally. The MySQL server 10.1.1.1 then received it and treated it as an unknown command, returning an ERR message, which eventually caused the connection to be interrupted.
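Why a single changed byte invalidates the checksum can be shown with the 16-bit one's-complement sum that TCP uses. The sketch below is a simplification: a real TCP checksum also covers the pseudo-header and the TCP header, while this example sums only the five-byte MySQL payload to keep the arithmetic visible. The function name `internet_checksum` is chosen for this illustration.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum (RFC 1071 style) over the given bytes."""
    if len(data) % 2:
        data += b"\x00"  # pad to a whole number of 16-bit words
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
    while total >> 16:   # fold carries back into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


original = bytes.fromhex("010000000e")   # Ping as sent by the client
corrupted = bytes.fromhex("0100000000")  # what arrived at the LVS server

print(hex(internet_checksum(original)))   # 0xf0ff
print(hex(internet_checksum(corrupted)))  # 0xfeff
```

The checksum carried in the segment was computed by the sender over the original bytes, so a receiver that recomputes it over the corrupted bytes gets a different value and can detect the change.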
Follow-up
After a long period of observation, the problem turned out to occur only occasionally. Each time, the MySQL Ping command was corrupted in the same way, from 0e to 00 rather than to a random value, which interrupted MySQL connections from time to time.
A check of the network equipment along the path, together with the fact that all other business traffic was normal, let us preliminarily rule out CRC errors on the physical ports or links of the network equipment, because such errors are unlikely to corrupt the same bits in the same way every time.
What is more regrettable, as mentioned at the beginning, is that segment-by-segment packet capture on the network was never started. The problem disappeared when the environment changed, so it was never actually solved. The final guess is that some intermediate network device has an unusual bit-flip fault.
Summary
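The fixed nature of the corruption is easier to see at the bit level. The short sketch below only restates the observed 0e to 00 change as arithmetic; the helper name `flipped_bits` is hypothetical.

```python
from typing import List


def flipped_bits(before: int, after: int) -> List[int]:
    """Return the bit positions (0 = least significant) that differ between two bytes."""
    diff = before ^ after
    return [bit for bit in range(8) if (diff >> bit) & 1]


print(f"{0x0E:08b} -> {0x00:08b}")  # 00001110 -> 00000000
print(flipped_bits(0x0E, 0x00))     # [1, 2, 3]
```

Every occurrence cleared the same three adjacent bits in the same byte, which is not what random noise on a line would be expected to produce.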
Another unsolved mystery…