SLAC Internal Network Connectivity

Les Cottrell and Gary Buhrmaster

Reported Problem

On 6/27/07 Alexy Lyapin reported a problem using the SLAC visitor subnet:

 Hello,

 Just wanted to report some problems with the visitor wireless network 
 in the Counting House A, which is building 60.
 So far the problem has been observed on 2 systems, mine, which is 
 running windows, and an Apple laptop. The problem is that sometimes 
 the connection gets extremely slow (SLAC homepage takes ~5 mins to download).
 Interestingly, the problem was not observed on both systems at the 
 same time, but one had a good connection while another one could not 
 transfer data. Signal strength stays high and wireless reports at 
 least 12 MBit/s connection speed.
 That looks like switching problems to me, though i am no expert.
 Trying to ping to different sites on my machine gives:
 www.slac.stanford.edu - 2 ms
 www.google.com - 54 ms
 www.ucl.ac.uk - timed out

First Look

The first two RTTs look fine, but the min/avg/max RTTs and losses from many pings would have been more useful. www.ucl.ac.uk blocks pings even though the web site is accessible. Suggestions were made to try other sites in the U.K. that do not block pings, e.g. www.dl.ac.uk. We also needed to know when the problem occurred, since we had a site web cam of the LCLS tunnel breakthrough with over 100 users and wanted to be sure this did not impact anything. There were also a couple of compromised hosts at SSRL that were emitting hundreds of kilobytes, though we doubted this would affect the End Station wireless/visitor network connectivity on site.
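As a rough illustration of the kind of summary that would have helped here, the following Python sketch computes min/avg/max RTT and percentage loss from a series of ping results. The function name summarize_rtts and the sample values are hypothetical and are not taken from the user's report; a probe with no reply is represented as None.

```python
from typing import Optional, Sequence


def summarize_rtts(samples: Sequence[Optional[float]]) -> dict:
    """Summarize ping results given as RTTs in milliseconds.

    A value of None marks a probe that received no reply.
    (Hypothetical helper, for illustration only.)
    """
    if not samples:
        raise ValueError("at least one ping sample is required")

    replies = [rtt for rtt in samples if rtt is not None]
    sent = len(samples)
    lost = sent - len(replies)

    summary = {
        "sent": sent,
        "received": len(replies),
        "loss_pct": 100.0 * lost / sent,
        "min_ms": None,
        "avg_ms": None,
        "max_ms": None,
    }
    # RTT statistics are only defined if at least one reply came back.
    if replies:
        summary["min_ms"] = min(replies)
        summary["avg_ms"] = sum(replies) / len(replies)
        summary["max_ms"] = max(replies)
    return summary


if __name__ == "__main__":
    # Made-up samples: three replies and one timeout.
    print(summarize_rtts([2.0, 4.0, None, 6.0]))
```

A single ping, as in the user's report, cannot distinguish a host that blocks pings from one suffering heavy loss; a summary over many probes makes that difference visible.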

We then heard from the user that it was taking him 10 minutes to download the SLAC home page. Typically this should take a few seconds, so something was dramatically wrong.

We requested more information from the user: what O/S, when did it occur, and what are the host name and address (we needed this, among other things, to see whether the switch he was attached to was working properly). We suggested trying the NDT server at Stanford (http://netspeed.stanford.edu/), since the visitor subnet cannot access the SLAC NDT server at http://nettest5.slac.stanford.edu/ due to security concerns. We also suggested that, when the problem occurs, he run pings from his host to the server with which he is having slow response.

The user reported that on Sunday July 1st 2007 around 14:30 he again had extremely slow connections on two systems in building 60 (End Station A). The systems were:

  1. Mac laptop, DHCP IP: 198.129.218.91 (dhcpvisitor21891)
  2. Windows laptop, DHCP IP: 198.129.218.34

Both were connected to the router 198.129.216.1.

Resolution

Both of these machines were running SKYPE (or some other P2P software), and each had turned into a SKYPE supernode. That means they were routing traffic for a significant portion of the Internet. Such hosts are placed in a "penalty box" which reduces their available network bandwidth. We asked the user to shut down SKYPE completely, not just click close. We believe that means selecting "exit" from the tray icon. Even after the application is shut down, it will take some time for other systems to stop attempting to use the host as their SKYPE supernode.

To avoid becoming a supernode, and having your network impacted, you may find the following suggestions from FNAL helpful (as always, your mileage will vary, and FNAL is responsible for the information): https://netweb.fnal.gov/skype/skype.htm