Background Introduction
A line of code was introduced in a configuration class of a Spring Boot 2.2.0 project:
InetAddress.getLocalHost().getHostAddress()
This noticeably slowed down project startup and triggered the following warning:
2022-10-03 23:32:01.806 [TID: N/A] WARN [main] o.s.b.StartupInfoLogger – InetAddress.getLocalHost().getHostName() took 5007 milliseconds to respond. Please verify your network configuration (macOS machines may need to add entries to /etc/hosts).
According to the warning, Spring Boot prints this message whenever resolving host information takes longer than the threshold HOST_NAME_RESOLVE_THRESHOLD = 200 ms, and ours took more than 5 seconds. On macOS, the message also helpfully suggests configuring local name resolution in /etc/hosts.
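The check behind this warning can be approximated as follows: time the host name lookup and compare the result with the 200 ms threshold. This is a minimal sketch, not Spring Boot's StartupInfoLogger source. The class name and the isSlow helper are hypothetical, and only the threshold value is taken from the warning above.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

public class HostNameResolveCheck {
    // Threshold value taken from the warning message (200 ms).
    static final long HOST_NAME_RESOLVE_THRESHOLD_MS = 200;

    // True when a lookup took strictly longer than the threshold.
    static boolean isSlow(long elapsedMs) {
        return elapsedMs > HOST_NAME_RESOLVE_THRESHOLD_MS;
    }

    public static void main(String[] args) throws UnknownHostException {
        long start = System.nanoTime();
        String hostName = InetAddress.getLocalHost().getHostName();
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(hostName + " resolved in " + elapsedMs + " ms");
        if (isSlow(elapsedMs)) {
            System.out.println("WARN: host name lookup exceeded "
                    + HOST_NAME_RESOLVE_THRESHOLD_MS + " ms, check /etc/hosts");
        }
    }
}
```

On the machine described here, the first run would be expected to take roughly 5 seconds and print the warning line.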
The file originally contained:
```
127.0.0.1 localhost
255.255.255.255 broadcasthost
::1 localhost
```
Following advice from several online articles, we appended the hostname to the 127.0.0.1 entry, which gives:
```
127.0.0.1 localhost xiaoxi666s-MacBook-Pro.local
255.255.255.255 broadcasthost
::1 localhost
```
Here, xiaoxi666s-MacBook-Pro.local is my hostname.
Note: after editing the hosts file, run the following command to flush the DNS cache without rebooting the computer:
sudo killall -HUP mDNSResponder
After restarting the Spring Boot application, the warning disappeared, meaning the host lookup now took less than 200 ms.
So what mechanism is at work behind the scenes? Let’s dig deeper.
Using Wireshark for Packet Capture
Since we are retrieving information about our own host, we capture traffic on the local loopback interface (Loopback):
First, revert the hosts file change and capture packets in the original state:
Based on the timestamps, the captured packets fall into three groups, each containing queries for both the IPv4 (A) and the IPv6 (AAAA) address.
The protocol is mDNS (multicast DNS), which lets hosts on a LAN discover and resolve each other without a conventional DNS server. It runs over UDP port 5353 and uses the standard DNS message format, as any of the captured requests confirms:
The destination address 224.0.0.251 is the standard IPv4 multicast address for mDNS (RFC 6762), and Apple's mDNSResponder sends its queries to it. The mDNSResponder source is available at https://github.com/apple-oss-distributions/mDNSResponder/tree/mDNSResponder-1096.100.3
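To make the message format concrete, here is a minimal sketch of a one-question query of the kind seen in the capture. It is written from the DNS wire format (RFC 1035) and RFC 6762, not from mDNSResponder's code, and the class and method names are hypothetical. The main method sends one A query to 224.0.0.251:5353 from an ephemeral port (a "one-shot" query in RFC 6762 terms) and waits up to two seconds for a reply.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.DatagramSocket;
import java.net.InetAddress;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class MdnsQuery {
    static final int TYPE_A = 1;     // IPv4 address record
    static final int TYPE_AAAA = 28; // IPv6 address record
    static final int CLASS_IN = 1;   // Internet class

    // Encodes a DNS query with a single question in wire format.
    static byte[] encode(String name, int qtype) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        // Header: ID = 0 (RFC 6762 asks for 0 in multicast queries), flags = 0,
        // QDCOUNT = 1, ANCOUNT = NSCOUNT = ARCOUNT = 0; each field is 16 bits.
        int[] header = {0, 0, 1, 0, 0, 0};
        for (int field : header) {
            out.write(field >> 8);
            out.write(field & 0xFF);
        }
        // QNAME: each label is prefixed by its length, terminated by a zero byte.
        for (String label : name.split("\\.")) {
            byte[] bytes = label.getBytes(StandardCharsets.UTF_8);
            out.write(bytes.length);
            out.write(bytes, 0, bytes.length);
        }
        out.write(0);
        // QTYPE and QCLASS, 16 bits each.
        out.write(qtype >> 8);
        out.write(qtype & 0xFF);
        out.write(CLASS_IN >> 8);
        out.write(CLASS_IN & 0xFF);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "xiaoxi666s-MacBook-Pro.local";
        byte[] query = encode(host, TYPE_A);
        try (DatagramSocket socket = new DatagramSocket()) {
            socket.setSoTimeout(2000);
            socket.send(new DatagramPacket(query, query.length,
                    InetAddress.getByName("224.0.0.251"), 5353));
            byte[] buf = new byte[9000];
            DatagramPacket reply = new DatagramPacket(buf, buf.length);
            try {
                socket.receive(reply);
                System.out.println("reply of " + reply.getLength() + " bytes from " + reply.getAddress());
            } catch (SocketTimeoutException e) {
                System.out.println("no reply within 2 s");
            }
        }
    }
}
```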
Repeated tests show that the host information comes back only after the third group of packets. The call blocks inside InetAddress.getLocalHost(): as the figure below shows, execution stops at line 18 and moves on to line 19 only after 5 seconds. On the capture timeline the result arrives at about 8 s, which is consistent with the 5007 ms reported earlier. A closer look shows three consecutive queries, sent at 3.1 s, 4.1 s and 7.1 s. The retry gaps grow from 1 s to 3 s, which suggests an increasing (possibly exponential) backoff. The result at about 8 s comes 5 s after the first query, so it corresponds to that first query, and the later retries appear to be ignored.
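The timeline fits a simple model: queries are resent after gaps of 1 s and then 3 s, and the whole call gives up after a fixed 5 s deadline. The sketch below computes which sends fit before that deadline. It is a toy model under those assumptions, not mDNSResponder's actual logic. The third gap (9 s) is a hypothetical continuation and never takes effect here, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

public class RetrySchedule {
    // Offsets (ms from the first query) of every send that happens before the deadline.
    static List<Long> sendOffsets(long[] gapsMs, long timeoutMs) {
        List<Long> offsets = new ArrayList<>();
        long t = 0;
        offsets.add(t); // the first query goes out immediately
        for (long gap : gapsMs) {
            t += gap;
            if (t >= timeoutMs) {
                break; // the call has already timed out, no further retries
            }
            offsets.add(t);
        }
        return offsets;
    }

    public static void main(String[] args) {
        long firstSendMs = 3100; // first query in the capture
        long timeoutMs = 5000;   // assumed overall deadline
        long[] gapsMs = {1000, 3000, 9000}; // 1 s and 3 s observed, 9 s hypothetical
        for (long offset : sendOffsets(gapsMs, timeoutMs)) {
            System.out.println("query sent at " + (firstSendMs + offset) + " ms");
        }
        System.out.println("call returns at " + (firstSendMs + timeoutMs) + " ms");
    }
}
```

Shifting the offsets by the first send time in the capture (3.1 s) gives sends at 3.1 s, 4.1 s and 7.1 s, with the deadline at about 8.1 s. This matches the captured timeline.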
Next, capture packets again after adding the hostname back to the hosts file:
This time no related packets were captured at all, which suggests the hostname was resolved directly from the hosts file without any network request.
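This observation implies the lookup is answered from the hosts file before any mDNS query is sent. Below is a simplified model of that step. It assumes the usual hosts format: an address followed by one or more names, with `#` starting a comment. It is not macOS's resolver code, and the class and method names are hypothetical. Running it against the file before and after the edit shows why the hostname only resolves once it is appended to the 127.0.0.1 line.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Optional;

public class HostsFileLookup {
    // Returns the address of the first hosts-file line that lists hostName.
    static Optional<String> lookup(String hostsContent, String hostName) {
        for (String rawLine : hostsContent.split("\n")) {
            String line = rawLine;
            int hash = line.indexOf('#');
            if (hash >= 0) {
                line = line.substring(0, hash); // strip comments
            }
            String[] fields = line.trim().split("\\s+");
            if (fields.length < 2) {
                continue; // blank line or an address without names
            }
            for (int i = 1; i < fields.length; i++) {
                if (fields[i].equalsIgnoreCase(hostName)) {
                    return Optional.of(fields[0]);
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) throws IOException {
        String host = args.length > 0 ? args[0] : "xiaoxi666s-MacBook-Pro.local";
        String hosts = new String(Files.readAllBytes(Paths.get("/etc/hosts")), StandardCharsets.UTF_8);
        System.out.println(lookup(hosts, host).orElse("not found in /etc/hosts"));
    }
}
```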
So how does this host lookup actually work, and where does the time go when the hostname is missing from the hosts file?
Examining the Source Code
The relevant source code is easy to locate:
We remove the hostname from the hosts file again and use the Arthas trace command to measure how long each call along the invocation path takes:
Tip: if trace fails with the error "No class or method is affected", check the Arthas log file for the cause:
Here the log suggests enabling unsafe mode, which allows Arthas to enhance JDK classes loaded by the bootstrap class loader. Run options unsafe true, then try the trace command again.
Interestingly, trace still could not capture the invocation chain. So let’s generate a flame graph with the Arthas profiler command instead:
Much of the graph consists of compilation-related frames, so let’s zoom in on just the part that retrieves host information:
Most of the time turns out to be spent in InetAddress.getAddressesFromNameService:
Going further down, most of that time is spent in nameService.lookupAllHostAddr:
This call eventually ends up in a native method:
So let’s look at the JDK source code (I’m using JDK 8):
Next, we need the implementation of getaddrinfo. We could not locate the source of the exact implementation used on macOS, so we use the glibc (Linux) source as a reference: https://codebrowser.dev/glibc/glibc/sysdeps/posix/getaddrinfo.c.html#getaddrinfo
Its internals are intricate and mostly deal with system interactions, so a quick skim is enough. Moreover, the call chain seen in the flame graph does not appear anywhere in this getaddrinfo source, so we stop digging here.
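Before dropping down to C, the slow path can also be exercised directly from Java. The sketch below times InetAddress.getAllByName for the hostname. In JDK 8, as far as we can tell, this goes through getAddressesFromNameService and nameService.lookupAllHostAddr, the frames seen in the flame graph. The class name, the resolve helper and the default hostname argument are illustrative only.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

public class ResolveTiming {
    // Resolves a name via InetAddress.getAllByName and returns the textual addresses.
    static List<String> resolve(String host) throws UnknownHostException {
        List<String> result = new ArrayList<>();
        for (InetAddress addr : InetAddress.getAllByName(host)) {
            result.add(addr.getHostAddress());
        }
        return result;
    }

    public static void main(String[] args) {
        String host = args.length > 0 ? args[0] : "xiaoxi666s-MacBook-Pro.local";
        long start = System.nanoTime();
        try {
            System.out.println("addresses: " + resolve(host));
        } catch (UnknownHostException e) {
            System.out.println("lookup failed: " + e.getMessage());
        }
        System.out.println("elapsed ms: " + (System.nanoTime() - start) / 1_000_000);
    }
}
```

The JVM caches both successful and failed lookups (see the networkaddress.cache.ttl and networkaddress.cache.negative.ttl security properties). Only the first lookup in a fresh JVM therefore reflects the real resolution cost.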
What we do know is that getaddrinfo gets called, so we write a small C program to reproduce the situation:
```c
#include <stdio.h>
#include <string.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Elapsed time between two timevals, in microseconds (includes whole seconds). */
static long elapsed_us(const struct timeval *start, const struct timeval *end)
{
    return (end->tv_sec - start->tv_sec) * 1000000L + (end->tv_usec - start->tv_usec);
}

int main(void)
{
    const char *hostname = "xiaoxi666s-MacBook-Pro.local";
    struct addrinfo hints, *res;
    struct in_addr addr;
    struct timeval start, end;
    int err;

    gettimeofday(&start, NULL);
    memset(&hints, 0, sizeof(hints));
    hints.ai_socktype = SOCK_STREAM;
    hints.ai_family = AF_INET; /* IPv4 only */

    if ((err = getaddrinfo(hostname, NULL, &hints, &res)) != 0) {
        /* Print elapsed time in microseconds (error case) */
        gettimeofday(&end, NULL);
        printf("times=%ld\n", elapsed_us(&start, &end));
        printf("error %d : %s\n", err, gai_strerror(err));
        return 1;
    }

    /* Print elapsed time in microseconds (success case) */
    gettimeofday(&end, NULL);
    printf("times=%ld\n", elapsed_us(&start, &end));

    addr.s_addr = ((struct sockaddr_in *)res->ai_addr)->sin_addr.s_addr;
    printf("ip address: %s\n", inet_ntoa(addr));
    freeaddrinfo(res);
    return 0;
}
```
The hostname xiaoxi666s-MacBook-Pro.local is the value we observed while debugging the Java project, hardcoded into the program.
Run the program to compare the output when the hosts file does or does not include the hostname:
```
# Without hostname in the hosts file
times=6431
error 8 : nodename nor servname provided, or not known

# With hostname in the hosts file
times=1789
ip address: 127.0.0.1
```
Without the hostname in the hosts file, no address can be found for it (DNS cannot resolve it either). Once it is added, the configured IP 127.0.0.1 is returned.
There are a few points to note:
- The times value is printed in microseconds, so with the hostname in the hosts file, getaddrinfo in the C program took 1789 µs (about 1.8 ms) rather than nearly two seconds. The same lookup in the Java code took tens of milliseconds.
- The Wireshark capture showed retries of the mDNS query, but no such retry logic appears in the glibc getaddrinfo source we consulted.
- The 5-second delay observed earlier is actually a timeout rather than the time needed to obtain a result, and no such timeout control appears in that getaddrinfo source either.
It is therefore reasonable to speculate that macOS's getaddrinfo, which is not based on glibc, adds mechanisms such as local caching, retries and timeouts.
Following up on the third point, we go back to debugging the Java project to see why a result is still returned after the timeout.
If the hostname is not added to the hosts file, all of the machine’s IP addresses are returned:
When the hostname is added to the hosts file, only the configured IP address 127.0.0.1 is returned:
This is because, without the hostname in the hosts file, the getaddrinfo call returns an error code. The JDK then falls back to its lookupIfLocalhost function (a macOS-specific code path), which calls the operating system's getifaddrs to retrieve all IP addresses on the machine:
For reference, the glibc (Linux) implementation of getifaddrs is available at https://codebrowser.dev/glibc/glibc/sysdeps/unix/sysv/linux/ifaddrs.c.html.
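The fallback described above can be mimicked at the Java level. First try a normal lookup, and if it fails with UnknownHostException, return every address of every local interface; NetworkInterface gives a view of the interface addresses comparable to what getifaddrs reports. This is an illustration only. The real JDK fallback lives in native code and, as far as we know, applies only when the name matches the local hostname, whereas this sketch falls back on any failed lookup. The class and method names are hypothetical.

```java
import java.net.InetAddress;
import java.net.NetworkInterface;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;

public class LocalhostFallback {
    // Every address of every local network interface (a getifaddrs-like view).
    static List<String> allLocalAddresses() throws SocketException {
        List<String> result = new ArrayList<>();
        Enumeration<NetworkInterface> nifs = NetworkInterface.getNetworkInterfaces();
        if (nifs == null) {
            return result; // no interfaces found
        }
        for (NetworkInterface nif : Collections.list(nifs)) {
            for (InetAddress addr : Collections.list(nif.getInetAddresses())) {
                result.add(addr.getHostAddress());
            }
        }
        return result;
    }

    // Normal lookup first; on failure, fall back to all interface addresses.
    static List<String> resolveOrLocalAddresses(String host) throws SocketException {
        try {
            List<String> result = new ArrayList<>();
            for (InetAddress addr : InetAddress.getAllByName(host)) {
                result.add(addr.getHostAddress());
            }
            return result;
        } catch (UnknownHostException e) {
            return allLocalAddresses();
        }
    }

    public static void main(String[] args) throws SocketException {
        String host = args.length > 0 ? args[0] : "xiaoxi666s-MacBook-Pro.local";
        System.out.println(resolveOrLocalAddresses(host));
    }
}
```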
Summary
Starting from slow hostname retrieval in Java, this article explored the underlying mechanism with several techniques: capturing packets with Wireshark, locating the performance bottleneck with Arthas, and reading the corresponding native method implementations in the JDK. Because the source for the lowest level of the call chain was not available, we used the analogous glibc (Linux) source as a reference and reproduced the scenario with a small C program.
For the same reason, the local caching, retry and timeout mechanisms speculated above remain unverified. Interested readers are welcome to dig further.