Loss of Connectivity to BINP in Novosibirsk, Russia

Les Cottrell. Page created: May 23, 2003.

Introduction

This case shows how a router misconfiguration caused loss of connectivity between SLAC and BINP in Novosibirsk.

Problem report

Serge Belov of BINP reported the following by email on May 22nd, 2003.
As we can see in our logs, since May 21st, 02:16 local time
(that is, GMT+7), we've lost ESnet connectivity.
A few traces in both directions:

-- this is traceroute from noric01.slac:
[noric01] ~ > traceroute sky.inp.nsk.su
traceroute to sky.inp.nsk.su (193.124.167.84), 30 hops max, 38 byte packets  
1  rtr-farmcore-farm0 (134.79.87.9)  0.256 ms  0.181 ms  0.158 ms  
2  rtr-dmz1-ger (134.79.135.15)  0.230 ms  0.197 ms  0.200 ms  
3  slac-rt4.es.net (192.68.191.146)  0.291 ms  0.236 ms  0.266 ms  
4  snv-pos-slac.es.net (134.55.209.1)  0.635 ms  0.617 ms  0.673 ms  
5  chicr1-oc192-snvcr1.es.net (134.55.209.54)  48.883 ms  48.826 ms  48.871 ms
6  aoacr1-oc192-chicr1.es.net (134.55.209.58)  68.883 ms  68.833 ms  68.873 ms
7  aoapr1-ge0-aoacr1.es.net (134.55.209.110)  68.981 ms  68.937 ms  68.875 ms
8  * * *
9  * * *

-- this is a traceroute from www.slac
traceroute to CSD-CC.inp.nsk.su (193.124.167.209): 3-30 hops, 38 byte packets  
3  192.68.191.146 (192.68.191.146)  0.610 ms (ttl=252!)  
4  snv-pos-slac.es.net (134.55.209.1)  0.843 ms (ttl=251!)  
5  chicr1-oc192-snvcr1.es.net (134.55.209.54)  49.0 ms (ttl=250!)  
6  aoacr1-oc192-chicr1.es.net (134.55.209.58)  69.1 ms (ttl=249!)  
7  aoapr1-ge0-aoacr1.es.net (134.55.209.110)  69.1 ms (ttl=248!)  
8  *
9  *

-- this is a trace in other direction, from BINP:
mx:belov {111} traceroute ping.slac.stanford.edu
traceroute to ns-ext2.slac.stanford.edu (134.79.18.21), 64 hops max, 40 byte packets  
1  cisco-1 (193.124.167.254)  0.346 ms  0.339 ms  0.394 ms  
2  Rtc-gw (193.124.167.5)  0.587 ms  0.518 ms  0.877 ms  
3  192.153.114.137 (192.153.114.137)  108.356 ms  106.759 ms  106.622 ms  
4  130.87.43.2 (130.87.43.2)  107.282 ms  106.805 ms  106.494 ms  
5  keksw1-ns.kek.jp (130.87.4.34)  106.589 ms  106.791 ms  106.533 ms  
6  kekgw.kek.jp (130.87.4.1)  106.974 ms  106.839 ms  107.217 ms  
7  KEK-P6-0.sinet.ad.jp (150.99.197.125)  385.765 ms  419.214 ms  405.685 ms  
8  JT-tokyo-S1-P10-0.sinet.ad.jp (150.99.197.33)  373.749 ms  394.104 ms  403.642 ms  
9  nii-S1-P4-0.sinet.ad.jp (150.99.197.22)  376.77 ms  379.923 ms  501.592 ms 
10  nii-gate2-P2-0.sinet.ad.jp (150.99.199.174)  392.255 ms  395.85 ms  373.816 ms 
11  nii-gate3-P0-0.sinet.ad.jp (150.99.199.178)  369.830 ms  360.757 ms  397.259 ms 
12  * *^C

At the same time, there is still connectivity between SLAC and KEK:
[noric01] ~ > ping www.kek.jp
PING ccwww.kek.jp (130.87.104.100) from 134.79.86.51 : 56(84) bytes of data.
64 bytes from ccwww.kek.jp (130.87.104.100): icmp_seq=1 ttl=240 time=262 ms 
64 bytes from ccwww.kek.jp (130.87.104.100): icmp_seq=2 ttl=240 time=262 ms

bsunsrv1[52]% traceroute www.slac.stanford.edu
traceroute to www4.slac.stanford.edu (134.79.18.136), 30 hops max, 40 byte packets  
1  130.87.224.201 (130.87.224.201)  5 ms  1 ms  1 ms  
2  ns1ka.kek.jp (130.87.5.10)  2 ms  2 ms  2 ms  
3  keksw1-ns.kek.jp (130.87.4.34)  2 ms  2 ms  2 ms  
4  kekgw.kek.jp (130.87.4.1)  2 ms  2 ms  2 ms  
5  KEK-P6-0.sinet.ad.jp (150.99.197.125)  2 ms  2 ms  2 ms  
6  JT-tokyo-S1-P10-0.sinet.ad.jp (150.99.197.33)  3 ms  3 ms  3 ms  
7  nii-S1-P4-0.sinet.ad.jp (150.99.197.22)  43 ms  169 ms  480 ms  
8  nii-gate2-P2-0.sinet.ad.jp (150.99.199.174)  4 ms  4 ms  4 ms  
9  nii-gate3-P1-0.sinet.ad.jp (150.99.199.182)  195 ms  195 ms  195 ms 
10  aoa-sinet.es.net (198.124.216.121)  195 ms  195 ms  195 ms 
11  aoacr1-ge0-aoapr1.es.net (134.55.209.109)  195 ms  195 ms  195 ms 
12  chicr1-oc192-aoacr1.es.net (134.55.209.57)  215 ms  216 ms  215 ms 
13  snvcr1-oc192-chicr1.es.net (134.55.209.53)  275 ms  276 ms  276 ms 
14  slac-pos-snv.es.net (134.55.209.2)  276 ms  276 ms  276 ms 
15  rtr-dmz1-vlan400.slac.stanford.edu (192.68.191.149)  264 ms  264 ms 264 ms 
16  * * * 
17  www4.slac.stanford.edu (134.79.18.136)  334 ms *  277 ms

Should something be fixed?
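Reading these traces comes down to finding the last hop that answered in each direction: from SLAC the path stops after aoapr1-ge0-aoacr1.es.net, and from BINP it stops after nii-gate3-P0-0.sinet.ad.jp, while the KEK trace reaches SLAC through nii-gate3-P1-0 and aoa-sinet.es.net. A minimal Python sketch of that check follows. The function name `last_responding_hop` is hypothetical, and the sketch assumes the usual traceroute line layout (hop number, then `name (address)` and round-trip times, with `*` for an unanswered probe); it is meant for traceroute output only, since other lines that start with a number, such as ping replies, would be misread as hops.

```python
import re

# A hop line starts with the hop number; the rest holds names, addresses, RTTs or "*".
HOP_RE = re.compile(r"^\s*(\d+)\s+(.*)$")
# An answering hop prints its IPv4 address in parentheses, e.g. "(134.55.209.110)".
ADDR_RE = re.compile(r"\((\d{1,3}(?:\.\d{1,3}){3})\)")


def last_responding_hop(trace_text):
    """Return (hop_number, name) for the last hop that replied, or None if none did."""
    last = None
    for line in trace_text.splitlines():
        match = HOP_RE.match(line)
        if match is None:
            continue  # header line ("traceroute to ...") or blank line
        hop, rest = int(match.group(1)), match.group(2)
        if ADDR_RE.search(rest):
            # At least one probe was answered; the first token is the host name.
            last = (hop, rest.split()[0])
        # Lines containing only "*" (timeouts) leave `last` unchanged.
    return last


# Excerpt of the traceroute from noric01.slac quoted in the report.
SLAC_SAMPLE = """\
traceroute to sky.inp.nsk.su (193.124.167.84), 30 hops max, 38 byte packets
 6  aoacr1-oc192-chicr1.es.net (134.55.209.58)  68.883 ms  68.833 ms  68.873 ms
 7  aoapr1-ge0-aoacr1.es.net (134.55.209.110)  68.981 ms  68.937 ms  68.875 ms
 8  * * *
 9  * * *
"""

if __name__ == "__main__":
    print(last_responding_hop(SLAC_SAMPLE))  # (7, 'aoapr1-ge0-aoacr1.es.net')
```

Because the scan keeps going past silent lines, a single non-answering router in the middle of a working path (hop 16 in the KEK trace) does not hide the fact that hop 17, www4.slac.stanford.edu, was reached.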

Resolution

The following email from Joe Metzger of ESnet on May 22, 2003, identifies the likely cause: A loose unicast RPF filter was put in place around 12:04 PDT Tuesday on the router supporting the tunnel to BINP. The reported problem appeared to be resolved when we removed those filters this morning.

However, I am suspicious that this is the actual cause, for a couple of reasons. First is the difference in reported start times. Second, the filter was configured to log all packets that it rejected, and the log doesn't appear to contain any packets relevant to the connection to sky.inp.nsk.su.
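A loose unicast RPF (uRPF) filter, in the general form described in RFC 3704, accepts a packet only if its source address is covered by some route in the router's forwarding table, regardless of the interface the packet arrived on; a default route is commonly ignored for this check unless explicitly allowed. The Python sketch below illustrates that rule only. The function `loose_urpf_accept` and the forwarding table are hypothetical, chosen for illustration, and say nothing about the real routes on the ESnet router. BINP's address is deliberately left uncovered by a specific route so the example shows what a drop looks like.

```python
import ipaddress


def loose_urpf_accept(src, fib, allow_default=False):
    """Loose-mode unicast RPF check (RFC 3704 style).

    Accept a packet if its source address is covered by any route in the
    forwarding table, whatever interface it arrived on. A default route
    (prefix length 0) is ignored unless allow_default is True.
    """
    addr = ipaddress.ip_address(src)
    for prefix in fib:
        net = ipaddress.ip_network(prefix)
        if net.prefixlen == 0 and not allow_default:
            continue  # a default route alone does not vouch for a source
        if addr in net:
            return True
    return False  # no covering route: the filter drops (and may log) the packet


# Hypothetical forwarding table for illustration only; not ESnet's real routes.
FIB = ["134.79.0.0/16", "130.87.0.0/16", "0.0.0.0/0"]

if __name__ == "__main__":
    print(loose_urpf_accept("134.79.18.21", FIB))    # True: covered by 134.79.0.0/16
    print(loose_urpf_accept("193.124.167.84", FIB))  # False: only the default route covers it
```

If a filter of this kind rejected BINP traffic, the rejected source addresses would be expected to appear in the filter's log, which is why the absence of relevant log entries made the diagnosis less certain.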

Looking at the raw PingER data for SLAC to BINP, it appears that the loss of connectivity began after 18:30 and before 19:00 GMT on May 20, 2003 (i.e., between 11:30 and noon PDT), which is in reasonable agreement with the installation of the loose unicast RPF filter.
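The three reports use three different clocks: BINP local time (GMT+7), PDT (UTC-7 in May) and GMT. A short Python sketch that puts them on one UTC timeline follows. It assumes the "Tuesday" in the ESnet email is May 20, 2003 (that date was a Tuesday), and it treats the 12:04 PDT installation time as approximate, as the email does. The names `EVENTS` and `to_utc` are illustrative only.

```python
from datetime import datetime, timedelta, timezone

NSK = timezone(timedelta(hours=7))   # BINP local time, GMT+7 as stated in the report
PDT = timezone(timedelta(hours=-7))  # US Pacific Daylight Time, UTC-7 in May

# Assumption: "Tuesday" in the ESnet email refers to May 20, 2003.
EVENTS = {
    "BINP logs: connectivity lost": datetime(2003, 5, 21, 2, 16, tzinfo=NSK),
    "ESnet: loose uRPF filter installed (approx.)": datetime(2003, 5, 20, 12, 4, tzinfo=PDT),
    "PingER: earliest possible onset": datetime(2003, 5, 20, 18, 30, tzinfo=timezone.utc),
    "PingER: latest possible onset": datetime(2003, 5, 20, 19, 0, tzinfo=timezone.utc),
}


def to_utc(events):
    """Convert every timezone-aware timestamp to UTC."""
    return {name: t.astimezone(timezone.utc) for name, t in events.items()}


if __name__ == "__main__":
    # Print the events in chronological order on a single UTC clock.
    for name, t in sorted(to_utc(EVENTS).items(), key=lambda kv: kv[1]):
        print(f"{t:%Y-%m-%d %H:%M} UTC  {name}")
```

Sorted in UTC, the PingER onset window (18:30 to 19:00) comes first, followed by the approximate filter installation (19:04) and the BINP log entry (19:16), so all three reports fall within an hour of one another on May 20.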


Page owner: Les Cottrell