NMWG Working Group

Rough notes by Les Cottrell, SLAC, June 25, 2003

Chaired by Richard Hughes-Jones

There were about 26 attendees. Richard went over the agenda, followed by the working group's milestones. In particular the hierarchy document was posted on the web site in June 2003, and will be submitted to the steering group later this month. Work was started on the schema document in March 2003.

An Implementation of the Profile Document: Les Cottrell

Les described an implementation by Warren Matthews of a web service that provides access to network monitoring data using the concepts of the NMWG schema document. The service provides access to various network measurement repositories including PingER, IEPM-BW, E2Epi OWAMP, and RIPE-tt. Besides the characteristics outlined in the schema document, they have added path.availability.roundTrip, defined in terms of PingER's unavailability. The focus is on the service rather than on defining a schema, so the WSDL definition is very rudimentary. He demonstrated a simple client implemented in Perl. Some of the issues raised by the implementation included: returning multiple values, such as the individual (singleton) measurements comprising a statistical aggregate value; how to address multiple tools reporting the same characteristic; what standard deviation means for path.achievable.bandwidth when one is using multiple streams for a long period with incremental throughputs reported; whether/how to report disk-to-disk throughputs (and memory-to-disk etc.); how to handle predictions (a suggestion was to make this a separate service so as not to overload things); how to define and report hoplists and traceroute information (e.g. one of the most useful things in traceroutes is the ability to discover route changes); how to format multiple values (e.g. a separate document for each value vs. a single header followed by a list of values).
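The "single header followed by a list of values" option can be sketched as below. This is only an illustration of the idea, not the format used by the service Les described: the class name, field names, host names and sample values are all invented here, and only the characteristic name path.achievable.bandwidth comes from the notes.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Dict, List


@dataclass
class MeasurementReport:
    """One header describing the measurement, then the singleton values."""
    characteristic: str          # e.g. "path.achievable.bandwidth"
    source: str                  # hypothetical field: measuring host
    destination: str             # hypothetical field: target host
    tool: str                    # lets several tools report the same characteristic
    unit: str
    singletons: List[float] = field(default_factory=list)

    def aggregate(self) -> Dict[str, float]:
        """Derive the statistical summary from the singletons on demand."""
        return {
            "count": len(self.singletons),
            "min": min(self.singletons),
            "max": max(self.singletons),
            "mean": mean(self.singletons),
            "stddev": pstdev(self.singletons),  # population standard deviation
        }


# Made-up sample data: three throughput samples in Mbits/s.
report = MeasurementReport(
    characteristic="path.achievable.bandwidth",
    source="hosta.example.org",
    destination="hostb.example.org",
    tool="iperf",
    unit="Mbits/s",
    singletons=[90.0, 100.0, 110.0],
)
summary = report.aggregate()
```

Keeping the singletons next to one shared header avoids repeating the host and tool information per value, and a client that only wants the aggregate can ignore the list.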

MonALISA: Yang Xia - Caltech

Adapting Existing Tools to E2E, in particular MonALISA.

Using Java for ubiquity. Acquire GUID, OS and version, processor details, NIC info, memory, TCP stack, interrupts/sec, CPU usage, plus network information (e.g. iperf). Ping and traceroute are run both ways; bandwidth testing is also done both ways to discover duplex problems. Results are saved locally and on a centralized database. Analysis is done on a centralized server. Want wizard knowledge in "applets".

MonALISA is a monitoring framework, in use at several sites today. It acts as a dynamic service to be used by other services that require such information (Jini, UDDI, WSDL/SOAP). It can dynamically discover "farm units". Data is saved in an SQL database. It can integrate other measurement tools such as SNMP, LSF, Ganglia, Hawkeye, IEPM-BW etc.

EU DataGrid network monitoring: Richard Hughes-Jones

Richard reported on work by Yee Ting-Lee of UCL. He found he could not always match exactly what NMWG described, maybe partly because he was looking at an older version. He has created a schema. The hoplist has been embedded into the schema as the path. He strove to make all characteristics uniform, without special treatment. He was unable to put in patterns, e.g. how to specify a distribution. He tried to include the ability to provide a statistical result as well as its components. He returns results as XML documents, so can describe multiple results (e.g. components) as well as aggregate statistics. A question is whether to give all the information back or only the aggregate value. With XPath one can qualify the query with what one really wants, and the result will come back in a reasonably compact format (e.g. without repeating the OS version information for each singleton ping).
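A minimal sketch of qualifying a query with XPath follows. The XML layout, element names and attribute names are invented for illustration and are not taken from the UCL schema or the NMWG documents; it uses the limited XPath subset supported by Python's standard xml.etree.ElementTree module.

```python
import xml.etree.ElementTree as ET

# Hypothetical result document: host details are stated once in the header,
# followed by one aggregate result and the singleton pings behind it.
DOC = """
<measurements characteristic="path.delay.roundTrip">
  <host name="hosta.example.org" os="Linux 2.4"/>
  <result type="aggregate" statistic="mean" unit="ms">12.0</result>
  <result type="singleton" unit="ms">10.0</result>
  <result type="singleton" unit="ms">12.0</result>
  <result type="singleton" unit="ms">14.0</result>
</measurements>
""".strip()

root = ET.fromstring(DOC)

# Query 1: only the aggregate value, giving a compact answer.
aggregates = [float(e.text) for e in root.findall("./result[@type='aggregate']")]

# Query 2: only the components (singletons), without the host header.
singletons = [float(e.text) for e in root.findall("./result[@type='singleton']")]

# The OS version appears once in the document, not once per ping.
os_versions = [e.get("os") for e in root.findall("./host")]
```

The same document serves both kinds of client: the predicate in the path expression decides whether the aggregate or the components come back.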

Lowly-intrusive Network Monitoring (GridLab): Brian Tierney

This is work by Thilo Kielman. It is an EU project to build a Grid Application Toolkit. Thilo is working on performance monitoring, trying to use existing sensors. They want to focus on low intrusiveness (so not using iperf, but rather the NWS sensor with a 64 KByte stream, traceroute and ping). They are also focusing on active monitoring. They do not have web services access.

General discussion

Need to discuss the hoplist. Also go through the schema document to see what needs fixing. Need to have a nearness concept (i.e. you ask a service for performance between 2 hosts, and it gives the closest it can get given where the monitoring hosts are located, and how good a match it is likely to be). Another issue is how to represent disk-to-disk vs. memory. Note that for Grid replication services, disk-to-disk is critical. There are tools to make such measurements (e.g. bbftp, GridFTP). Should it be part of the Network Monitoring WG or some other WG? Could just "Say no to Disk-to-disk" or do it right, or just provide a minor tweak to the schema to incorporate disks.
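One possible reading of the nearness concept is sketched below. No nearness metric was agreed in the discussion; the metric here (number of shared trailing DNS labels) is an assumption chosen only because it is simple, and a real service would more likely use topology or route information. All function names and host names are hypothetical.

```python
from typing import List, Optional, Tuple


def shared_suffix_labels(a: str, b: str) -> int:
    """Count the trailing DNS labels two host names have in common."""
    count = 0
    for x, y in zip(reversed(a.lower().split(".")), reversed(b.lower().split("."))):
        if x != y:
            break
        count += 1
    return count


def nearest_pair(
    src: str, dst: str, monitored: List[Tuple[str, str]]
) -> Tuple[Optional[Tuple[str, str]], int]:
    """Return the monitored (source, destination) pair closest to the request,
    plus a crude match score: the weaker of the two end-point matches."""
    best: Optional[Tuple[str, str]] = None
    best_score = -1
    for m_src, m_dst in monitored:
        score = min(shared_suffix_labels(src, m_src), shared_suffix_labels(dst, m_dst))
        if score > best_score:
            best, best_score = (m_src, m_dst), score
    return best, best_score


# Made-up monitoring hosts at three sites.
monitored_pairs = [
    ("mon1.sitec.example.org", "mon1.siteb.example.org"),
    ("mon1.sitea.example.org", "mon1.siteb.example.org"),
]
pair, score = nearest_pair(
    "node7.sitea.example.org", "node3.siteb.example.org", monitored_pairs
)
```

Returning the score alongside the pair lets the client judge how good a match the substitute measurement is likely to be.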