Network connectivity problems reporting via BWE
Jiri Navratil. Page created: September 21 2002.
Normally, the average available bandwidth (ABW) between SLAC and Daresbury Laboratory was around 65 Mbps. When KPNQwest ceased operation, routing in GEANT automatically switched to a backup path (the commercial ISP ALTER.NET), but the backup links did not have the capacity of the original path, and at that moment our ABW dropped to about 10 Mbps. Later, during the afternoon and the following night, we saw several further changes that were less dramatic. It was nevertheless interesting to see that our monitoring system can also distinguish very small changes, such as "flipping interfaces" or "level-balancing", which is what the second case looks like.
The original path (the "normal situation") from SLAC to Daresbury went via ESnet and the GEANT node in the UK. The following traceroute was taken just a few minutes before the change happened:
Wed Jul 24 8:25:14 2002 (stamp 1027524314)
traceroute to rtlin1.dl.ac.uk
[1] RTR-CORE1A.SLAC.Stanford.EDU,134.79.143.2,0,0
[2] RTR-DMZ1-GER.SLAC.Stanford.EDU,134.79.135.15,0,0
[3] 192.68.191.146,192.68.191.146,0,0
[4] snv-pos-slac.es.net,134.55.209.1,0,14
[5] chi-s-snv.es.net,134.55.205.102,0,59
[6] nyc-s-chi.es.net,134.55.205.105,0,73
[7] 62.40.126.5,62.40.126.5,0,67
[8] 62.40.126.14,62.40.126.14,0,138
[9] janet-gw.uk1.uk.geant.net,62.40.103.150,0,149
[10] 146.97.37.81,146.97.37.81,0,137
[11] po6-0.read-scr.ja.net,146.97.35.133,0,138
[12] po3-0.warr-scr.ja.net,146.97.33.54,0,142
[13] po0-0.manchester-bar.ja.net,146.97.35.46,0,143
[14] 146.97.40.178,146.97.40.178,0,143
[15] 194.66.25.30,194.66.25.30,0,144
[16] gw-fw.dl.ac.uk,193.63.74.233,0,143
[17] rtlin1.dl.ac.uk,193.62.119.20,0,144
[18] rtlin1.dl.ac.uk,193.62.119.20,0,145

After the change, the path was as follows:
Wed Jul 24 8:45:13 2002 (stamp 1027525513)
traceroute to rtlin1.dl.ac.uk
[1] RTR-CORE1A.SLAC.Stanford.EDU,134.79.143.2,0,0
[2] RTR-DMZ1-GER.SLAC.Stanford.EDU,134.79.135.15,0,0
[3] 192.68.191.146,192.68.191.146,0,0
[4] snv-pos-slac.es.net,134.55.209.1,0,22
[5] orn-s-snv.es.net,134.55.205.121,0,65
[6] dchub-orn.es.net,134.55.209.18,0,85
[7] 198.124.192.21,198.124.192.21,100,85
[8] 0.so-3-1-0.XL1.DCA6.ALTER.NET,152.63.38.118,0,130
[9] 0.so-0-0-0.TL1.DCA6.ALTER.NET,152.63.38.69,0,130
[10] 0.so-7-0-0.IL1.DCA6.ALTER.NET,152.63.9.193,0,106
[11] so-0-0-0.IR1.DCA4.Alter.Net,146.188.13.34,0,202
[12] so-6-1-0.TR2.LND9.Alter.Net,146.188.4.82,0,257
[13] so-6-0-0.XR1.LND9.Alter.Net,146.188.15.42,0,291
[14] pos1-0.gw1.lnd9.alter.net,158.43.150.142,0,257
[15] ukerna-gw.pipex.net,158.43.37.202,0,256
[16] po15-0.lond-scr.ja.net,146.97.35.137,0,256
[17] po4-0.read-scr.ja.net,146.97.33.74,0,244
[18] po3-0.warr-scr.ja.net,146.97.33.54,0,258
[19] po0-0.manchester-bar.ja.net,146.97.35.46,0,243
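The two traceroutes above can be compared mechanically to see where the route left the original path and where it came back. The following is a minimal Python sketch, not the tool used by the monitoring system; the function names first_divergence and first_rejoin are hypothetical, and the address lists are simply the IP fields copied from the two traceroutes.

```python
def first_divergence(a, b):
    """Return the 1-based hop number at which two hop lists first differ."""
    for i, (x, y) in enumerate(zip(a, b), start=1):
        if x != y:
            return i
    # One list is a prefix of the other, or they are identical.
    return None if len(a) == len(b) else min(len(a), len(b)) + 1


def first_rejoin(a, b, start):
    """Return (hop, ip) for the first address in b, at or after the
    1-based hop `start`, that also occurs anywhere in a."""
    seen = set(a)
    for i in range(start - 1, len(b)):
        if b[i] in seen:
            return i + 1, b[i]
    return None


# IP addresses copied from the traceroute taken at 8:25:14 (original path).
before = [
    "134.79.143.2", "134.79.135.15", "192.68.191.146", "134.55.209.1",
    "134.55.205.102", "134.55.205.105", "62.40.126.5", "62.40.126.14",
    "62.40.103.150", "146.97.37.81", "146.97.35.133", "146.97.33.54",
    "146.97.35.46", "146.97.40.178", "194.66.25.30", "193.63.74.233",
    "193.62.119.20", "193.62.119.20",
]

# IP addresses copied from the traceroute taken at 8:45:13 (backup path).
after = [
    "134.79.143.2", "134.79.135.15", "192.68.191.146", "134.55.209.1",
    "134.55.205.121", "134.55.209.18", "198.124.192.21", "152.63.38.118",
    "152.63.38.69", "152.63.9.193", "146.188.13.34", "146.188.4.82",
    "146.188.15.42", "158.43.150.142", "158.43.37.202", "146.97.35.137",
    "146.97.33.74", "146.97.33.54", "146.97.35.46",
]

diverge_hop = first_divergence(before, after)
rejoin = first_rejoin(before, after, diverge_hop)
print("paths diverge at hop", diverge_hop)
print("backup path rejoins the original at", rejoin)
```

With these lists the paths diverge at hop 5 (chi-s-snv.es.net versus orn-s-snv.es.net) and the backup path first reaches an address of the original path again at po3-0.warr-scr.ja.net (146.97.33.54), inside JANET.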
The picture shows the moment when the characteristics of the path to the UK changed dramatically (violet in the picture). The log shows the data in more detail:
timestamp    abw
...
1027524541   62.280
1027524631   63.158
1027524721   62.490
1027524810   109.474
1027524900   61.962
1027524990   6.344
1027525080   10.493
1027525170   10.770
1027525259   10.740
1027525349   10.639
...
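A drop like the one in this log can be flagged automatically by comparing each sample with a baseline built from the samples just before it. The sketch below is only an illustration under assumed parameters, not the detection method of the monitoring system: the function name find_drops, the window of 3 samples and the threshold of half the baseline are all assumptions. The median is used as the baseline so that a single spike, such as the 109.474 sample, does not distort it.

```python
from statistics import median


def find_drops(samples, window=3, ratio=0.5):
    """Return (timestamp, abw, baseline) for every sample whose ABW is
    below `ratio` times the median of the previous `window` samples."""
    drops = []
    for i in range(window, len(samples)):
        baseline = median(v for _, v in samples[i - window:i])
        ts, abw = samples[i]
        if abw < ratio * baseline:
            drops.append((ts, abw, baseline))
    return drops


# (timestamp, abw) pairs copied from the log above.
log = [
    (1027524541, 62.280), (1027524631, 63.158), (1027524721, 62.490),
    (1027524810, 109.474), (1027524900, 61.962), (1027524990, 6.344),
    (1027525080, 10.493), (1027525170, 10.770), (1027525259, 10.740),
    (1027525349, 10.639),
]

drops = find_drops(log)
for ts, abw, baseline in drops:
    print(f"{ts}: abw {abw:.3f} against baseline {baseline:.3f}")
```

On this data the first flagged sample is the one at 1027524990, the first measurement after the route change. Once the window fills with low values the baseline follows the new level, so later samples around 10 Mbps are no longer reported.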
The second case shows the rather unstable situation during the afternoon and the night, probably a consequence of the work of the network engineers who were trying to fix the problem. One segment of the path (router LND9.Alter.Net) probably used two different interfaces to connect to its neighbours (in our traceroutes reported once as 146.188.13.34 and once as so-6-1-0.TR2.LND9.Alter.Net).
path1
....
[8] 152.63.38.118,152.63.38.118,0,86
[9] 0.so-0-0-0.TL1.DCA6.ALTER.NET,152.63.38.69,0,80
[10] 0.so-7-0-0.IL1.DCA6.ALTER.NET,152.63.9.193,0,82
[11] 146.188.13.34,146.188.13.34,0,87
[12] so-6-1-0.TR2.LND9.Alter.Net,146.188.4.82,0,169
[13] so-6-0-0.XR1.LND9.Alter.Net,146.188.15.42,0,165
[14] pos1-0.gw1.lnd9.alter.net,158.43.150.142,0,168
...

or path2
...
[8] 152.63.38.118,152.63.38.118,0,80
[9] 0.so-0-0-0.TL1.DCA6.ALTER.NET,152.63.38.69,0,85
[10] 0.so-7-0-0.IL1.DCA6.ALTER.NET,152.63.9.193,0,81
[11] 146.188.13.34,146.188.13.34,0,90
[12] so-6-1-0.TR2.LND9.Alter.Net,146.188.4.82,0,166
[13] 146.188.15.42,146.188.15.42,0,258
[14] pos1-0.gw1.lnd9.alter.net,158.43.150.142,0,275

In practice, this meant that the ABW alternated between levels of 6, 10 or 15 Mbps for clearly visible periods of time.

timestamp    abw
...
1027582547   10.752
1027582640   10.541
1027582731   10.778
1027582824   10.493
1027583006   6.293
1027583097   6.000
1027583188   6.482
1027583279   6.370
...

The picture shows this situation between 15:00 and 16:00.
The whole situation stabilized on Thu Jul 25 11:00:30 US/Pacific 2002:

...
1027615543   10.690    8   25   10.726   10.991   rtlin1
1027615635   10.558    7   25   10.717   10.984   rtlin1
1027615727   10.535   10   25   10.707   10.977   rtlin1
1027615838   10.592   10   25   10.701   10.971   rtlin1
1027615930   10.561   11   25   10.694   10.964   rtlin1
1027620030   65.561   13   25   27.154   11.859   rtlin1
1027620123   61.307   16   25   37.400   12.670   rtlin1
1027620215   62.866   16   25   45.040   13.493   rtlin1
1027620308   64.923   14   25   51.005   14.336   rtlin1
1027620400   62.293   17   25   54.391   15.122   rtlin1
1027620491   60.719   18   25   56.289   15.869   rtlin1

A new route was set up within the GEANT network:

[1] RTR-CORE1A.SLAC.Stanford.EDU,134.79.143.2,0,0
[2] RTR-DMZ1-GER.SLAC.Stanford.EDU,134.79.135.15,0,0
[3] 192.68.191.146,192.68.191.146,0,0
[4] snv-pos-slac.es.net,134.55.209.1,0,11
[5] chi-s-snv.es.net,134.55.205.102,0,60
[6] nyc-s-chi.es.net,134.55.205.105,0,88
[7] abilene-nyc.es.net,198.124.216.106,0,66
[8] abilene-gtren.de2.de.geant.net,62.40.103.253,0,155
[9] de2-1.de1.de.geant.net,62.40.96.129,0,146
[10] de.fr1.fr.geant.net,62.40.96.50,0,157
[11] fr.uk1.uk.geant.net,62.40.96.90,0,162
[12] janet-gw.uk1.uk.geant.net,62.40.103.150,0,160
[13] 146.97.37.81,146.97.37.81,0,162
[14] po6-0.read-scr.ja.net,146.97.35.133,0,164
[15] po3-0.warr-scr.ja.net,146.97.33.54,0,167
[16] po0-0.manchester-bar.ja.net,146.97.35.46,0,167
[17] 146.97.40.178,146.97.40.178,0,168
[18] 194.66.25.30,194.66.25.30,0,169
[19] gw-fw.dl.ac.uk,193.63.74.233,0,169
[20] rtlin1.dl.ac.uk,193.62.119.20,0,170
[21] rtlin1.dl.ac.uk,193.62.119.20,0,170

The path is much longer (see hops [9], [10] and [11]), but since then the bandwidth has again been stable at about 61-64 Mbps.
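The columns of the stabilization log are not described on this page. As an observation only, the fifth field (10.694, 27.154, 37.400, ...) is numerically consistent with an exponentially weighted moving average of the abw field with a weight of 0.3; this is an inference from the numbers, not something the page states. The Python sketch below checks that reading; the function name ewma and the weight are assumptions.

```python
def ewma(values, start, alpha):
    """Exponentially weighted moving average: s = s + alpha * (v - s)."""
    out = []
    s = start
    for v in values:
        s = s + alpha * (v - s)
        out.append(s)
    return out


# abw values of the six samples after the recovery (second field of the log).
abw = [65.561, 61.307, 62.866, 64.923, 62.293, 60.719]

# Fifth field of the same six log lines, to compare against.
logged = [27.154, 37.400, 45.040, 51.005, 54.391, 56.289]

# Start from the fifth field of the last line before the recovery (10.694).
smoothed = ewma(abw, start=10.694, alpha=0.3)

for got, want in zip(smoothed, logged):
    print(f"computed {got:.3f}   logged {want:.3f}")
```

If that reading is right, it explains why the smoothed value needs several samples to climb from about 10 Mbps back towards 60 Mbps even though the raw ABW recovers within a single measurement.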
The real data from the monitor show the following values:
timestamp    abw (Mbps)
1032302992   211.321
1032303107   379.661
1032303223   342.857
1032303339   23.133
1032303455   24.050
1032303571   22.958
...
1032308205   23.688
1032308321   23.440
1032308437   274.286
1032308553   231.818
1032308668   194.595
1032308784   248.780
...

During the problematic period the traceroute was as follows:

traceroute to wiggum.mcs.anl.gov (140.221.11.99), 30 hops max, 38 byte packets
 1 rtr-core1-pub6 (134.79.27.2) 52.475 ms 159.393 ms 144.713 ms
 2 rtr-dmz1-ger (134.79.135.15) 126.598 ms 128.322 ms 66.988 ms
 3 slac-rt4.es.net (192.68.191.146) 63.108 ms 133.659 ms 149.581 ms
 4 snv-pos-slac.es.net (134.55.209.1) 171.181 ms 140.351 ms 58.375 ms
 5 chi-s-snv.es.net (134.55.205.102) 134.145 ms 167.378 ms 172.571 ms
 6 198.125.140.162 (198.125.140.162) 113.928 ms 215.703 ms 269.653 ms
 7 140.221.20.124 (140.221.20.124) 188.569 ms 80.969 ms 59.247 ms
 8 wiggum.mcs.anl.gov (140.221.11.99) 50.143 ms 54.176 ms 134.031 ms

After the changes had finished and normal traffic was seen again, the traceroute returned to the original:

traceroute to wiggum.mcs.anl.gov (140.221.11.99), 30 hops max, 38 byte packets
 1 rtr-core1-pub6 (134.79.27.2) 79.794 ms 42.669 ms 47.118 ms
 2 rtr-dmz1-ger (134.79.135.15) 35.323 ms 0.421 ms 0.399 ms
 3 slac-rt4.es.net (192.68.191.146) 0.499 ms 0.406 ms 0.398 ms
 4 snv-pos-slac.es.net (134.55.209.1) 0.785 ms 0.762 ms 0.759 ms
 5 chi-s-snv.es.net (134.55.205.102) 48.599 ms 48.587 ms 50.995 ms
 6 anl-chi.es.net (134.55.208.42) 62.907 ms 65.672 ms 200.640 ms
 7 kiwi-esnet.anchor.anl.gov (192.5.170.77) 131.487 ms 168.004 ms 139.712 ms
 8 stardust-guava.anchor.anl.gov (130.202.222.73) 110.705 ms 209.990 ms 146.158 ms
 9 wiggum.mcs.anl.gov (140.221.11.99) 158.669 ms 96.055 ms 127.274 ms

So the changes occurred only between chi-s-snv.es.net and the destination node.
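The conclusion that only the segment after chi-s-snv.es.net changed can be checked directly from the two traceroute outputs. The following Python sketch parses the standard traceroute text with a regular expression and reports the last hop the two paths share; the names parse_hops and last_common_hop are hypothetical, and the two strings are the traceroutes to wiggum.mcs.anl.gov quoted above.

```python
import re

# Matches lines such as " 5 chi-s-snv.es.net (134.55.205.102) 48.599 ms ..."
HOP_RE = re.compile(r"^\s*(\d+)\s+(\S+)\s+\((\d+\.\d+\.\d+\.\d+)\)")


def parse_hops(text):
    """Return a list of (hop_number, name, ip) from traceroute output."""
    hops = []
    for line in text.splitlines():
        m = HOP_RE.match(line)
        if m:  # the "traceroute to ..." header line does not match
            hops.append((int(m.group(1)), m.group(2), m.group(3)))
    return hops


def last_common_hop(a, b):
    """Return the last hop before the two hop lists differ by IP address."""
    last = None
    for ha, hb in zip(a, b):
        if ha[2] != hb[2]:
            break
        last = ha
    return last


DURING = """\
traceroute to wiggum.mcs.anl.gov (140.221.11.99), 30 hops max, 38 byte packets
 1 rtr-core1-pub6 (134.79.27.2) 52.475 ms 159.393 ms 144.713 ms
 2 rtr-dmz1-ger (134.79.135.15) 126.598 ms 128.322 ms 66.988 ms
 3 slac-rt4.es.net (192.68.191.146) 63.108 ms 133.659 ms 149.581 ms
 4 snv-pos-slac.es.net (134.55.209.1) 171.181 ms 140.351 ms 58.375 ms
 5 chi-s-snv.es.net (134.55.205.102) 134.145 ms 167.378 ms 172.571 ms
 6 198.125.140.162 (198.125.140.162) 113.928 ms 215.703 ms 269.653 ms
 7 140.221.20.124 (140.221.20.124) 188.569 ms 80.969 ms 59.247 ms
 8 wiggum.mcs.anl.gov (140.221.11.99) 50.143 ms 54.176 ms 134.031 ms
"""

NORMAL = """\
traceroute to wiggum.mcs.anl.gov (140.221.11.99), 30 hops max, 38 byte packets
 1 rtr-core1-pub6 (134.79.27.2) 79.794 ms 42.669 ms 47.118 ms
 2 rtr-dmz1-ger (134.79.135.15) 35.323 ms 0.421 ms 0.399 ms
 3 slac-rt4.es.net (192.68.191.146) 0.499 ms 0.406 ms 0.398 ms
 4 snv-pos-slac.es.net (134.55.209.1) 0.785 ms 0.762 ms 0.759 ms
 5 chi-s-snv.es.net (134.55.205.102) 48.599 ms 48.587 ms 50.995 ms
 6 anl-chi.es.net (134.55.208.42) 62.907 ms 65.672 ms 200.640 ms
 7 kiwi-esnet.anchor.anl.gov (192.5.170.77) 131.487 ms 168.004 ms 139.712 ms
 8 stardust-guava.anchor.anl.gov (130.202.222.73) 110.705 ms 209.990 ms 146.158 ms
 9 wiggum.mcs.anl.gov (140.221.11.99) 158.669 ms 96.055 ms 127.274 ms
"""

during_hops = parse_hops(DURING)
normal_hops = parse_hops(NORMAL)
common = last_common_hop(during_hops, normal_hops)
print("last common hop:", common)
```

For these two outputs the last shared hop is hop 5, chi-s-snv.es.net, which matches the statement above. Comparing by IP address rather than by name avoids false differences when a reverse DNS lookup fails in one of the runs.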