Comparison of Surveyor and PingER

Authors: Les Cottrell and Warren Matthews. Created: June 5; last updated on July 9, 1999

Page Contents

  • Introduction
  • Comparing Surveyor with Ping
  • Comparing Surveyor with PingER
  • Summary

    Introduction

    Both of these tools make end-to-end active performance measurements of the Internet. As can be seen below, they should be regarded as complementary to one another, Surveyor being more detailed and PingER being more lightweight.

    The Surveyor (and RIPE) monitoring project relies on a dedicated PC running Unix being placed at each monitoring site. Each PC in turn relies on a Global Positioning System (GPS) device to obtain accurate time and to keep the monitors synchronized with one another. The monitors send packets to each other at Poisson randomized time intervals and use these packets to gather one-way end-to-end delay and loss measurements. Surveyor also makes concurrent traceroutes, which provide route history information. Surveyor is more accurate and better for short term measurements, especially for sites which have good connectivity. Surveyor currently provides daily snapshots of performance. The community for Surveyor is Internet 2, though there are monitors at non Internet 2 sites, and in particular at 3 High Energy Physics (HEP) sites (CERN, FNAL and SLAC) that are also PingER monitor sites.
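    To illustrate what "Poisson randomized time intervals" means in practice, here is a minimal sketch of a probe schedule whose gaps are drawn from an exponential distribution. It is not Surveyor's code; the function name, the average rate and the duration are all hypothetical.

```python
import random

def poisson_send_times(rate_per_sec, duration_sec, seed=None):
    """Return probe send times (seconds from start) for a Poisson stream.

    Gaps between consecutive probes are drawn from an exponential
    distribution with mean 1/rate_per_sec, which makes the arrival
    process Poisson.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_sec)
    while t < duration_sec:
        times.append(t)
        t += rng.expovariate(rate_per_sec)
    return times

# Hypothetical rate: 2 probes per second on average, for one minute.
schedule = poisson_send_times(rate_per_sec=2.0, duration_sec=60.0, seed=1)
```

    Randomizing the gaps in this way avoids sampling the network in step with any periodic behavior it may have.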

    PingER uses the ICMP echo facility (ping) and thus only makes round trip measurements. PingER uses an existing host with no special software installed at the monitored site and does not require a GPS system. PingER is a more lightweight solution: it requires less management, uses less bandwidth, requires less storage, and nothing needs to be installed at the remotely monitored sites. PingER is good for remote sites with poor connectivity. PingER, today, has more reports available for showing long term trends. The community of interest for PingER is ESnet, High Energy and Nuclear Physics (HENP) sites and the Cross Industry Working Team (XIWT). More general information comparing Surveyor and PingER can be found in Comparison of some Internet Active End-to-end Performance Measurement projects.

    Comparing ping with Surveyor

    We made some high statistics (~250K samples) long term measurements with ping from SLAC to CERN from May 9 thru May 12, 1999. The pings were made using the standard ping utility with 100 data bytes (including the 8 ICMP bytes but not the IP header), were made at one second intervals and had a timeout of 20 seconds. The host (ping client) issuing the ping echo requests was an IBM RS/6000 250/80 running AIX 4.1.5. It is the same host (minos) that is used for the PingER monitoring at SLAC. The host echoing the pings (ping server) at CERN was the same host that is monitored by PingER (ping.cern.ch).

    The distribution of these pings (see the magenta squares in the chart to the right or above) shows a sharp peak centered around 220 msec., with 95% of the Round Trip Times (RTTs) contained within a 9.5 msec. range. There is both a high and a low RTT tail. The figure also shows the Surveyor delay frequency distributions (green and blue triangles) for the same time period. The Surveyor distributions also show sharp peaks with a high delay tail. The medians of the two delay distributions (113 msec. SLAC to CERN and 105 msec. CERN to SLAC) add up to 218 msec., roughly the RTT seen by ping (221 msec.). Note that they are not expected to be exactly equal since the packet sizes are different. The SLAC to CERN delay distribution (the blue points) also exhibits a low delay tail similar to the low RTT tail seen in the ping distribution. During this period Surveyor observed packet losses of 0.71% from CERN to SLAC and 0.68% from SLAC to CERN, and the pings observed 1.04% for the round trip.
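    A peak summary of this kind (a median and the range containing 95% of the samples) can be sketched as follows. This assumes the 95% range is taken between the 2.5th and 97.5th percentiles, which the text does not state; the function name and the sample values are hypothetical and are not the measured data.

```python
import statistics

def peak_summary(rtts_ms):
    """Return (median, width) for a list of RTT samples in msec.

    width is the distance between the 2.5th and 97.5th percentiles,
    i.e. the range holding the central 95% of the samples.
    """
    # n=40 gives cut points every 2.5%; the first and last are the ones needed.
    cuts = statistics.quantiles(rtts_ms, n=40, method="inclusive")
    low, high = cuts[0], cuts[-1]
    return statistics.median(rtts_ms), high - low

# Hypothetical samples, evenly spread for illustration only.
samples = [float(v) for v in range(170, 271)]
median_ms, width_ms = peak_summary(samples)
```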


    We then investigated the causes of the low and high RTT tails. The time distribution of pings with a high RTT (> 260 msec.) is shown below. For Tuesday May 11th, several clusters of high RTT are apparent. The cluster around 18:00 hours UTC is seen below. A route change occurred (seen both from SLAC to CERN and in the reverse direction) at about 18:10 hours UTC, causing traffic to take a shorter but more congested route (note the increase in lost packets). See Ping high statistics results for more details. As can be seen, this change in RTT performance is also evident in the Surveyor reports for the same period. Comparing the Surveyor graph with the ping graph above, it is also evident that the ping clusters at about 01:00, 07:00 and 14:00 hours also show up in the Surveyor data.

    We also binned the Surveyor and ping data into 1 minute bins, with the contents of each time bin being the average Surveyor one-way delay or ping RTT for that minute. We then added together the Surveyor one-way delays from each direction for each minute to create a Surveyor round trip delay for each minute. This data is shown in the chart below or to the left. The magenta and black dots (the bottom and next to bottom sets of points) show the Surveyor one-way delays, the green dots show the Surveyor round trip delays, and the blue dots (the top set of dots) show the ping RTT. Note that the left hand y axis is for the SLAC to CERN Surveyor delay and the Surveyor round trip delay, and the right hand y axis is for the CERN to SLAC Surveyor delay and the ping RTT. The use of 2 separate y axes enables us to display the points so they do not overlap and hide one another. Careful examination of this chart reveals that the green and blue dots track one another very well, reproducing all the peaks and flat periods.

    Scatter plotting the Surveyor round trip delay for each minute against the ping RTT for the same minute yields the chart below or to the right. It should be noted that the timestamps of the pings were adjusted (see below) to the nearest minute to account for the lack of an accurate record of the time correlation between the clocks of the hosts making the measurements (i.e. between surveyor and minos) at the time the measurements were made. The points roughly follow a straight line with an R² of 0.918, indicating a strong correlation between the two sets of measurements. The slope differing from one may be partly due to the difference in packet sizes used by Surveyor and these pings.
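    The binning and the construction of a per-minute round trip delay described above can be sketched as follows. This is an illustration under assumed input formats (lists of (time in seconds, value in msec.) pairs); the function names and sample values are hypothetical.

```python
from collections import defaultdict

def minute_means(samples):
    """Average (time_sec, value_ms) samples into 1-minute bins.

    Returns {minute_index: mean value}, where minute_index = time_sec // 60.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for time_sec, value in samples:
        minute = int(time_sec // 60)
        sums[minute] += value
        counts[minute] += 1
    return {minute: sums[minute] / counts[minute] for minute in sums}

def round_trip_per_minute(forward, reverse):
    """Add the two one-way per-minute means, for minutes present in both."""
    return {m: forward[m] + reverse[m] for m in forward.keys() & reverse.keys()}

# Hypothetical one-way delay samples, for illustration only.
slac_to_cern = [(0, 112.0), (30, 114.0), (60, 113.0)]
cern_to_slac = [(10, 105.0), (70, 106.0), (130, 104.0)]
surveyor_rtt = round_trip_per_minute(minute_means(slac_to_cern),
                                     minute_means(cern_to_slac))
```

    Minutes for which only one direction has data are dropped rather than filled in, so that every round trip value is the sum of two measured averages.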

    We optimized the adjustment of the ping timestamps mentioned above by varying the adjustment from -60 minutes to +60 minutes and calculating the correlation coefficient R between the timestamp adjusted ping RTTs and the Surveyor round-trip delays. The results are shown to the left or below. There is a sharp peak at an adjustment of +2 minutes with a width (IQR) of about 5.5 minutes. By the time the adjustment is off by 30 minutes or more in either direction, the correlation has disappeared.
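    A scan of this kind can be sketched as below. The sketch assumes both data sets are already averaged per minute and keyed by minute index, and that the adjustment is applied in whole 1-minute steps; the function names are hypothetical and the data is synthetic, constructed so that the ping timestamps are 2 minutes behind.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation coefficient R of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def scan_adjustments(ping_by_minute, surveyor_by_minute, max_shift):
    """Return {shift: R} for ping timestamp shifts of -max_shift..+max_shift."""
    results = {}
    for shift in range(-max_shift, max_shift + 1):
        # Pair each ping minute, moved by `shift`, with the Surveyor minute.
        pairs = [(rtt, surveyor_by_minute[m + shift])
                 for m, rtt in ping_by_minute.items()
                 if m + shift in surveyor_by_minute]
        if len(pairs) >= 3:
            xs, ys = zip(*pairs)
            results[shift] = pearson_r(xs, ys)
    return results

# Synthetic data (not the measurements): ping timestamps are 2 minutes behind.
rng = random.Random(0)
surveyor = {m: 220.0 + rng.uniform(-5.0, 5.0) for m in range(300)}
ping = {m: 1.1 * surveyor[m + 2] + 5.0 for m in range(60, 240)}
scan = scan_adjustments(ping, surveyor, max_shift=60)
best_shift = max(scan, key=scan.get)
```

    The adjustment giving the largest R is taken as the clock offset between the two hosts; plotting the values in scan against the shift gives a curve of the kind described above.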

    Comparing PingER and Surveyor

    To enable us to compare the Surveyor data with the PingER data, Matt Zerkauskas of the Surveyor project kindly made available to us Surveyor data for the six pairs between CERN, FNAL and SLAC from November 1998 thru May 1999. We aggregated the Surveyor data to match the time "ticks" used in PingER (hourly, daily, monthly). Then we reformatted the data into PingER format and made it available via the PingER tools. We then exported the data from PingER to Excel and combined the delays and losses from site a to site b and from b to a to create a round trip delay and loss between a and b (see Tutorial on Internet Monitoring and PingER at SLAC for how to combine the one way results to come up with the round trip results).
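    One-way delays combine into a round trip delay by simple addition. For losses, one common approach is sketched below; it assumes that losses in the two directions are independent, and it is not necessarily the method given in the tutorial cited above. The function name is hypothetical; the input values are the Surveyor loss rates quoted in the ping comparison.

```python
def round_trip_loss(loss_ab, loss_ba):
    """Combine two one-way loss fractions into a round trip loss fraction.

    Assumes independence: a round trip succeeds only if the packet
    survives both directions.
    """
    return 1.0 - (1.0 - loss_ab) * (1.0 - loss_ba)

# One-way loss rates from the ping comparison: 0.68% and 0.71%.
combined = round_trip_loss(0.0068, 0.0071)
combined_percent = 100.0 * combined
```

    With these inputs the combined loss is about 1.39%, which can be compared with the 1.04% observed by ping over the same period.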

    Long term - monthly

    To compare the long term data (i.e. one point per month) we scatter plotted the monthly Surveyor round trip delays (derived as described above) against the monthly PingER round trip delays for the 3 sites to yield the plot below. The line is a linear least squares fit with the parameters shown, and the correlation coefficient R² (see Microsoft Excel User's Guide, Microsoft Corporation, for how the correlation coefficient is defined) indicates that there is a strong correlation between the two sets of data.
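    A straight line fit of this kind, together with its R², can be sketched as follows. The sketch assumes R² is the square of the Pearson correlation coefficient, with PingER on the x axis and Surveyor on the y axis; the function name and the data points are hypothetical and are not the measured values.

```python
def linear_fit(xs, ys):
    """Least squares fit y = slope * x + intercept; also returns R squared."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r_squared = (sxy * sxy) / (sxx * syy)
    return slope, intercept, r_squared

# Hypothetical round trip delays in msec., chosen to lie on a line.
pinger_rtt = [100.0, 150.0, 200.0, 250.0]
surveyor_rtt = [95.0, 140.0, 185.0, 230.0]
slope, intercept, r_squared = linear_fit(pinger_rtt, surveyor_rtt)
```

    An R² close to 1 means that almost all of the variation in one data set is accounted for by a linear dependence on the other.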

    Medium term - daily

    We scatter plotted the daily Surveyor data against the daily PingER data to yield the scatter plot below. The straight line fit probably has a slope of less than 1 because the Surveyor packets are shorter than the PingER packets by about a factor of two. Again the R² indicates a strong correlation.

    Short term - hourly

    Finally we repeated the above for hourly ticks to yield the results below for a PingER monitor at SLAC monitoring 2 hosts at CERN and vice versa.

    Summary

    Surveyor provides more detailed measurements of performance, both in the frequency at which the measurements are made and in the fact that it makes one-way measurements. Surveyor relies on dedicated platforms with strong central management.

    PingER is more parsimonious with resources (bandwidth, disk space and CPU) and does not require a dedicated host and GPS aerial to be installed at every site. This makes it attractive for sites that have limited bandwidth, or are unwilling to install a dedicated host and GPS aerial. It has also turned out to be attractive to groups such as the XIWT that have limited resources to gather and analyze the data. Though PingER is less accurate, especially at short time scales (less than an hour), it is very good for looking at long term trends and groupings of sites, where limited statistics are less of a problem.

    The strong correlation, both visual and statistical, between the Surveyor and PingER RTT data, and between the Surveyor and ping RTT data (ping being what PingER is built on), indicates that the results from both projects can be used together in complementary ways.

