1. Symptoms
A securities company asked for help in tracing the source of recurring errors. The stock market had been booming in recent days, and the company had added many new users. Within one week, however, trading data errors occurred three times, and each time the data had to be recovered. The financial impact was not significant, but the company's records did not match the data provided by the stock exchange. A comparison of historical records with the daily trading records, carried out the previous night, showed that the errors usually affected several users at the same moment. The company suspected a virus or tampering by a malicious user, so it ran multiple virus scans, reinstalled the system, and restored data from backups. Nevertheless, the problem recurred today.
2. Diagnostic Process
The network was expanded and upgraded in February 1999 and moved to an all-NT platform. Around 50 new sites had been added recently. Following common practice, we began with a routine check of the new workstations and their network connections. Because the stock market had already closed, errors could not be observed under live traffic, so we simulated online traffic with a traffic generator. The results were as follows: with normal frames of minimum length, the network was paralyzed only at 99% utilization; with normal frames of the maximum length of 1518 bytes, at 99.5%. With error frames, paralysis occurred at 90% for 50-byte short frames and at 97% for 4000-byte long frames. The collision rate was slightly elevated at 6.4%, but no new error types were found. Tests run from the switches showed only a few delayed packets. Taken together, these results pointed to a robust, well-functioning securities network.
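To put those utilization figures in perspective, the sketch below converts a utilization percentage into a frame rate. It assumes a 10 Mbps shared Ethernet segment (the link speed is not stated in this case) and takes the "minimal" case to mean 64-byte frames; `frames_per_second` and `LINK_BPS` are illustrative names, not part of any test tool.

```python
# Each Ethernet frame occupies extra time on the wire beyond its own length:
# 8 bytes of preamble/start delimiter plus a 12-byte inter-frame gap.
WIRE_OVERHEAD_BYTES = 20

# Assumption: 10 Mbps Ethernet. The case study does not state the link speed.
LINK_BPS = 10_000_000


def frames_per_second(link_bps, frame_bytes, utilization):
    """Frame rate needed to load a link to the given utilization (0.0 to 1.0)."""
    bits_per_frame = (frame_bytes + WIRE_OVERHEAD_BYTES) * 8
    return link_bps * utilization / bits_per_frame


if __name__ == "__main__":
    # Frame sizes and utilization levels taken from the normal-frame tests above.
    for frame_bytes, utilization in [(64, 0.99), (1518, 0.995)]:
        rate = frames_per_second(LINK_BPS, frame_bytes, utilization)
        print(f"{frame_bytes}-byte frames at {utilization:.1%}: {rate:.0f} frames/s")
```

Under these assumptions, a segment that only fails at 99% utilization with minimum-length frames is absorbing well over ten thousand frames per second, which is why the load tests gave no hint of the fault.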
Further investigation narrowed the problem to a specific group of workstations belonging to a single new user, all connected to one group of hubs. This network segment was linked to the servers through a single switch port. We checked the performance of the trading server and the quote server separately, then tested the workstations on this segment, and all performed normally. Simulated network traffic and simulated trades from these workstations also showed no problems, which indicated that the segment could handle the load.
The team then suspected a "malicious user" disturbing the network (note: such users install their own software or hardware on a workstation, or plug their own laptop into the network without authorization). To trace the errors, an F683 network tester was attached to the segment and left to monitor it over an extended period. Nothing happened the next day. On the third day, at 13:10, about 10 minutes into the session, the tester recorded a burst of errors on the same segment: 15% FCS errors and 85% ghosts, lasting about one minute. The errors coincided with data errors for three users in the securities system.
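The long-term monitoring described above amounts to sampling error counters at intervals and flagging the intervals where they spike. The following sketch illustrates the idea with invented per-minute samples; the `Sample` structure, the threshold, and the counts are all hypothetical and do not reflect the tester's actual data format or readings.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """Error counts seen on the segment during one sampling interval."""
    time: str        # interval label, e.g. "13:10"
    fcs_errors: int  # frames with a bad frame check sequence
    ghosts: int      # noise on the wire that resembles a frame but has no valid start


def find_error_bursts(samples, min_errors=100):
    """Return (time, FCS %, ghost %) for each interval with at least min_errors errors."""
    bursts = []
    for s in samples:
        total = s.fcs_errors + s.ghosts
        if total >= min_errors:
            fcs_pct = round(100 * s.fcs_errors / total)
            ghost_pct = round(100 * s.ghosts / total)
            bursts.append((s.time, fcs_pct, ghost_pct))
    return bursts


# Hypothetical counts, chosen only to mirror the 15% / 85% split reported above.
samples = [
    Sample("13:08", 0, 0),
    Sample("13:09", 2, 0),
    Sample("13:10", 300, 1700),
    Sample("13:11", 1, 0),
]

if __name__ == "__main__":
    for time, fcs_pct, ghost_pct in find_error_bursts(samples):
        print(f"{time}: {fcs_pct}% FCS errors, {ghost_pct}% ghosts")
```

Recording a timestamp with every flagged interval is what makes the next step possible: the burst can be matched against other time-stamped records.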
The securities company had a closed-circuit television (CCTV) surveillance system that kept long-term recordings. The footage showed that at 13:10, the exact moment of the errors, a user was operating a handheld device. Closer examination of the video showed that the device was a walkie-talkie.
Walkie-talkies radiate considerably more power than mobile phones and operate at frequencies closer to those of baseband network transmission, so they can cause radiated interference at close range. A sound installation should reject such interference, so the fact that it got through pointed to a problem in the network cabling or the grounding system. The team therefore decided to check the cabling of the 50 workstations on the affected segment.
The cable runs were tested with Fluke's DSP2000 cable tester, and all of them passed. The problem turned out to be in the connectors at the hub and switch ports. They were poorly made: about 15 cm of the cable jacket had been stripped away, which let the wires spread apart and destroyed the twist of the pairs. The switch stood only about 1.5 meters from the users, separated from them by a glass curtain wall, so the interfering signal was very likely radiated from close by. The cabling was redone to TIA568B requirements and the system was reconnected properly.
3. Diagnostic Recommendations
Cabling systems should be designed and installed in accordance with recognized standards, and the cables should be tested whenever the structure of the system changes. A qualified UTP cabling system is highly resistant to radiated interference, but only if it has been tested rigorously. In practice, many cabling systems are tested only for physical connectivity, which leaves significant risks. Many faults originate in connectors that look insignificant. We recommend including cabling inspection in annual maintenance, or at least inspecting the cabling every one or two years. Testing can follow North American standards such as TIA568A/568B, or ISO11801, among others.
It’s best to prohibit the use of high-power walkie-talkies in business offices. Some high-power analog mobile radios should also be included in the restricted list.
Experience suggests that fault detection should focus first on devices that have recently been moved or modified. Interestingly, when users are asked afterwards whether they changed anything, the usual answer is that nothing has been altered.
4. Afterword
At the agreed-upon time, we received a report from the securities company confirming that the system had been running stably for two weeks with no recurrence. The user who caused the interference was a government employee with a legitimate right to use a walkie-talkie (specific details are not disclosed here). He had been using his professional privileges to trade stocks in his spare time and carried out daily "routine inspections" of the stock market. He has since been cautioned about his actions.