Schema/Profile for Network Performance Measurements for Grids
version .07 (June 19, 2003)
This document is a first attempt to define names and properties for the most important network measurements for Grid middleware. This document is not yet complete.
This document describes a set of schemas for publishing network measurement data. It is assumed that the reader is familiar with the GGF NM-WG document "A Hierarchy of Network Performance Characteristics for Grid Applications and Services", which defines a classification hierarchy for network measurements that are useful for Grid applications and services. Use of the schemas described in this document should facilitate the development of interoperable Grid services.
As an example of how such network measurements could be used in a Grid environment, we use the case of a Grid file transfer service. Assume that a Grid Scheduler determines that a copy of a given file needs to be copied to site A before a job can be run. Several copies of this file are registered in a Data Grid Replica Catalogue, so there is a choice of where to copy the file from. The Grid Scheduler needs to determine the optimal method to create this new file copy, and to estimate how long this file creation will take. To make this selection the scheduler must determine what is the best source (or sources) to copy the data from. Selecting the best source to copy the data from requires a prediction of future end-to-end path characteristics between the destination and each possible source. Accurate prediction of the performance obtainable from each source requires measurement of available bandwidth (both end-to-end and hop-by-hop), latency, loss, and other characteristics important to file transfer performance.
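The source selection step described above can be sketched roughly as follows. This is only an illustration under simple assumptions: the class SourceEstimate, the function names, and the cost model (one round trip plus file size divided by predicted achievable bandwidth) are hypothetical and are not defined by this document.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SourceEstimate:
    """Predicted path characteristics from one candidate source (hypothetical)."""
    host: str
    achievable_mbps: float  # predicted path.bandwidth.achievable.TCP, in Mbits/sec
    round_trip_ms: float    # predicted path.delay.roundTrip, in milliseconds


def estimated_transfer_seconds(file_bytes: int, src: SourceEstimate) -> float:
    """Very rough estimate: one round trip plus bits divided by bandwidth."""
    if src.achievable_mbps <= 0:
        return float("inf")
    return src.round_trip_ms / 1000.0 + (file_bytes * 8) / (src.achievable_mbps * 1e6)


def best_source(file_bytes: int, candidates: List[SourceEstimate]) -> SourceEstimate:
    """Return the candidate with the smallest estimated transfer time."""
    return min(candidates, key=lambda s: estimated_transfer_seconds(file_bytes, s))
```

A real scheduler would also need loss and hop-by-hop data, as noted above; this sketch ignores them.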
A simple example is the following. Publication of network delay information such as is measured using ping (named path.delay.roundTrip in the NM-WG "Characteristics" document) requires a great deal of additional information before the results can be interpreted. A number of test parameters must also be published, such as the tool name, the number of samples used, the protocol used, the packet size, and so on. Some of this information is mandatory, and other information is optional. The final result for a path.delay.roundTrip test may look like this.
path.delay.roundTrip
Property | Value |
source | 131.243.2.11 |
destination | 137.138.28.230 |
time | 20030521060902.893847 |
toolName | ping |
toolVersion | redhat 7.2 |
packetSize | 64 |
numPackets | 25 |
packetSpacing | periodic |
packetGap | 1.0 |
packetType | ICMP |
minimum | 275.149 |
maximum | 277.674 |
median | 275.121 |
StdDev | 0.375 |
value | 275.727 |
The classes that this result is made up from are all described in detail below.
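As an illustration, the example result can be held as a flat mapping from property name to value and checked against the properties marked M in the requirement-level tables of this document. The flat-dictionary layout, the MANDATORY list (our reading of those tables), and the helper name missing_mandatory are assumptions made for this sketch, not part of the schema.

```python
from typing import Dict, List

# The example path.delay.roundTrip result from the table above.
EXAMPLE_RESULT: Dict[str, object] = {
    "source": "131.243.2.11",
    "destination": "137.138.28.230",
    "time": "20030521060902.893847",
    "toolName": "ping",
    "toolVersion": "redhat 7.2",
    "packetSize": 64,
    "numPackets": 25,
    "packetSpacing": "periodic",
    "packetGap": 1.0,
    "packetType": "ICMP",
    "minimum": 275.149,
    "maximum": 277.674,
    "median": 275.121,
    "StdDev": 0.375,
    "value": 275.727,
}

# Properties marked M for a delay measurement, grouped by defining class.
MANDATORY: List[str] = [
    "toolName",                                                 # NetworkTestTool
    "source", "destination", "time",                            # NetworkTestInfo
    "packetSize", "numPackets", "packetType", "lossThreshold",  # NetworkToolSetting
    "value",                                                    # NetworkPathDelayStatistics
]


def missing_mandatory(result: Dict[str, object]) -> List[str]:
    """Return the mandatory property names absent from a result."""
    return [name for name in MANDATORY if name not in result]
```

Under this reading, missing_mandatory(EXAMPLE_RESULT) reports lossThreshold, which the NetworkToolSetting table lists as mandatory but the example does not carry.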
Terminology:
Target: Defined in GGF DAMED WG naming document.
The tables in this document use the following letters to describe the requirement level: (from CIM)
M – Mandatory
O – Optional
C – Conditional (See CIM documentation for explanation)
Types: (from CIM)
string
uint16
uint16[ ] (array of 16bit ints)
uint32
uint64
real32
boolean
datetime: standard timestamp
All measurements must have an IETF RFC3339 timestamp and a value.
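The time value in the example result earlier in this document is written in a compact form (20030521060902.893847) rather than in RFC3339 form. A minimal sketch of a conversion follows, assuming the compact value is in UTC; that assumption, and the function name to_rfc3339, are ours and are not stated by this document.

```python
from datetime import datetime, timezone


def to_rfc3339(compact: str) -> str:
    """Convert 'YYYYMMDDhhmmss.ffffff' (assumed UTC) to an RFC3339 timestamp."""
    parsed = datetime.strptime(compact, "%Y%m%d%H%M%S.%f")
    # Attach the assumed UTC offset so the output carries an explicit zone.
    return parsed.replace(tzinfo=timezone.utc).isoformat()


print(to_rfc3339("20030521060902.893847"))
```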
We are using the naming conventions defined by the GGF DAMED working group.
Classes for the following network measurement characteristics are defined in this document.
path.delay.roundTrip
path.delay.oneWay
path.delay.jitter
path.loss.oneWay
path.reordering.oneWay
path.bandwidth.achievable.TCP
path.bandwidth.achievable.TCP.multiStream
path.bandwidth.achievable.UDP
path.bandwidth.available
path.bandwidth.utilization
path.bandwidth.capacity
All of the above measurement characteristics except for bandwidth.achievable can be for hops as well as for paths, e.g.:
hop.bandwidth.capacity
hop.bandwidth.utilized
Additional topology characteristics will be included in a future version of this document.
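The path-to-hop naming rule can be sketched as a small helper. The function name to_hop_name is hypothetical; the only rule it encodes is the one stated above, that every characteristic except bandwidth.achievable may apply to hops as well as paths.

```python
def to_hop_name(path_name: str) -> str:
    """Map a 'path.*' characteristic name to its 'hop.*' equivalent."""
    scope, _, rest = path_name.partition(".")
    if scope != "path" or not rest:
        raise ValueError("not a path characteristic: " + path_name)
    if rest.startswith("bandwidth.achievable"):
        # Achievable bandwidth is only defined end-to-end.
        raise ValueError("no hop equivalent for: " + path_name)
    return "hop." + rest


print(to_hop_name("path.bandwidth.capacity"))
```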
All measurements require the classes NetworkTestTool, NetworkTestInfo, NetworkToolSetting, and a specific characteristic test, all described below. An example of a complete measurement result is found here.
NetworkTestTool: (subclass of CIM_SERVICE)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
toolName | string | M | name of tool used | |
toolVersion | string | O? | version of tool used | |
toolAccuracy | real32 | O | some indication of the accuracy of the tool (what are the units? %error? need more discussion on this) | |
NetworkTestInfo (subclass of CIM_StatisticalData?)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
source | string | M | Source IP:[port] | |
destination | string | M | Destination IP:[port] | |
startTime | datetime | O | time test was started | |
time | datetime | M | time test was completed | |
timeResolution | real32[] | O | resolution of the timestamp of source and destination (in seconds) | |
timeAccuracy | real32[] | O | accuracy of the timestamp (src and dest) (in seconds) | |
timeAccuracyMethod | string[] | O | clock sync method (src and dst): e.g.: NTP, GPS, AFS, etc. | |
Note: It is very difficult to accurately determine timeAccuracy without something like a GPS clock on the measurement host. However, it is important to be able to trust the timestamps, so although these properties are optional, their use is strongly encouraged.
Delay, Loss, Jitter, capacity, reordering, available bandwidth, and achievable.UDP measurements all require the following base class:
NetworkToolSetting: (subclass of CIM_StatisticalData ?)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
packetSize | uint16 | M | size of test packet | CIM_StatisticalData |
numPackets | uint16 | M | number of test packets | |
packetSpacing | boolean | O | Poisson or periodic | |
packetGap | real32 | C | time between test packets, in seconds (for periodic tests) | |
packetType | string | M | ICMP or UDP or TCP | |
portNum | uint16 | O | port number used for test | |
priority | uint16 | O | IP precedence bit set, etc. | |
lossThreshold | uint16 | M | the threshold used to distinguish between a large finite delay and loss | |
For more information on the details of these properties, see the following IETF documents:
One way Delay: http://www.ietf.org/rfc/rfc2679.txt
Round Trip Delay: http://www.ietf.org/rfc/rfc2681.txt
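The requirement levels of NetworkToolSetting, including the conditional packetGap, can be checked along the following lines. This is a sketch only: the Python class, the problems method, and the choice to hold packetSpacing as the strings "periodic" or "poisson" (as in the example result, although the table types it as boolean) are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NetworkToolSetting:
    packetSize: int                       # M
    numPackets: int                       # M
    packetType: str                       # M: ICMP, UDP or TCP
    lossThreshold: int                    # M
    packetSpacing: Optional[str] = None   # O: "periodic" or "poisson" (assumed encoding)
    packetGap: Optional[float] = None     # C: needed for periodic tests
    portNum: Optional[int] = None         # O
    priority: Optional[int] = None        # O

    def problems(self) -> List[str]:
        """Return a list of human-readable requirement violations."""
        found = []
        if self.packetType not in ("ICMP", "UDP", "TCP"):
            found.append("packetType must be ICMP, UDP or TCP")
        if self.packetSpacing == "periodic" and self.packetGap is None:
            found.append("packetGap is required for periodic tests")
        for name in ("packetSize", "numPackets", "lossThreshold"):
            if not 0 <= getattr(self, name) <= 0xFFFF:
                found.append(name + " does not fit in uint16")
        return found
```

For example, NetworkToolSetting(64, 25, "ICMP", 10, packetSpacing="periodic").problems() reports the missing packetGap; the lossThreshold value 10 here is an arbitrary placeholder, since this document does not yet fix its units.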
These characteristics use the following class to describe their properties:
path.delay.roundTrip
path.delay.oneWay
NetworkPathDelayStatistics (subclass of NetworkToolSetting)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
percentile | uint16[ ] | O | array of percentiles, eg: 50th percentile is the median (See RFC2679) | |
percentileValue | real32[ ] | C | value for above percentile, in milliseconds | |
median | real32 | O | median of all measurements in test (optional, but strongly encouraged) | |
minimum | real32 | O | minimum of all measurements in test | |
maximum | real32 | O | maximum of all measurements in test | |
StdDev | real32 | O | standard deviation of the results | |
value | real32 | M | average result in milliseconds | |
NOTE: everyone agreed that median is more useful than average. However most tools currently only report average, so average is the only mandatory value.
Note: current idea: for value, median, maximum, etc, use -1 to indicate value > loss threshold value. Need to discuss this more.
For more information on the details of these properties, see the following IETF documents:
One way Delay: http://www.ietf.org/rfc/rfc2679.txt
Round Trip Delay: http://www.ietf.org/rfc/rfc2681.txt
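The NetworkPathDelayStatistics properties could be derived from raw delay samples roughly as follows. This is a sketch under assumptions that this document does not settle: StdDev is computed as the population standard deviation, percentiles use the nearest-rank method, and samples above the loss threshold are assumed to have been removed already. The function names are hypothetical.

```python
import math
import statistics
from typing import Dict, List


def delay_statistics(samples_ms: List[float]) -> Dict[str, float]:
    """Summarise delay samples (milliseconds) using the property names above."""
    if not samples_ms:
        raise ValueError("at least one delay sample is required")
    return {
        "minimum": min(samples_ms),
        "maximum": max(samples_ms),
        "median": statistics.median(samples_ms),
        "StdDev": statistics.pstdev(samples_ms),  # population form (assumption)
        "value": statistics.mean(samples_ms),     # the mandatory average
    }


def percentile_values(samples_ms: List[float], percentiles: List[int]) -> List[float]:
    """Nearest-rank percentileValue entries for the given percentile array."""
    if not samples_ms:
        raise ValueError("at least one delay sample is required")
    ordered = sorted(samples_ms)
    values = []
    for p in percentiles:
        rank = max(1, math.ceil(p / 100 * len(ordered)))
        values.append(ordered[rank - 1])
    return values
```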
These characteristics use the following class to describe their properties:
path.loss.roundTrip
path.loss.oneWay
NetworkPathLossStatistics (subclass of NetworkToolSetting)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
Loss-Distance | uint16 | O | number of packets since the previous loss (See RFC3357) | |
Loss-Period | uint16 | O | number of groups of lost packets (See RFC3357) | |
Noticeable-Rate | uint16 | O | percent of packets lost where the distance between the lost packet and the previously lost packet is no greater than the "loss constraint" (See RFC3357) | |
Period-Total | uint16 | O | total number of loss periods (See RFC3357) | |
Period-Lengths | uint | O | number of packets in a burst of loss (See RFC3357) | |
Inter-Loss-Period-Lengths | uint | O | number of packets between bursts of loss (See RFC3357) | |
NumPacketsLost | uint | O | number of packets lost during the test | |
value | real32 | M | average packet loss (in percent) (See RFC2680) | |
For more information on the details of these properties, see the following IETF documents:
One-way Loss: http://www.ietf.org/rfc/rfc2680.txt
Loss Patterns: http://www.ietf.org/rfc/rfc3357.txt
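Several of the loss properties can be computed from a per-packet record of which test packets were lost. The sketch below is a simplified reading of the loss-pattern terms (a loss period is taken to be a run of consecutive lost packets); the function name and the boolean-list input are assumptions, and RFC3357 remains the authority on the exact definitions.

```python
from typing import Dict, List


def loss_statistics(lost: List[bool]) -> Dict[str, object]:
    """Summarise a send-ordered list where True marks a lost packet."""
    if not lost:
        raise ValueError("at least one packet is required")
    period_lengths: List[int] = []      # size of each burst of loss
    inter_loss_lengths: List[int] = []  # packets received between bursts
    run = 0          # length of the burst currently being counted
    gap = 0          # packets received since the last lost packet
    seen_loss = False
    for is_lost in lost:
        if is_lost:
            if run == 0 and seen_loss:
                inter_loss_lengths.append(gap)
            run += 1
            gap = 0
            seen_loss = True
        else:
            if run > 0:
                period_lengths.append(run)
                run = 0
            gap += 1
    if run > 0:
        period_lengths.append(run)
    num_lost = sum(lost)
    return {
        "NumPacketsLost": num_lost,
        "Period-Total": len(period_lengths),
        "Period-Lengths": period_lengths,
        "Inter-Loss-Period-Lengths": inter_loss_lengths,
        "value": 100.0 * num_lost / len(lost),  # average loss, in percent
    }
```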
This characteristic uses the following class to describe its properties:
path.delay.jitter
Note: someone who understands this better needs to flesh this out
NetworkPathJitterStatistics (subclass of NetworkToolSetting)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
percentile | uint | O | eg: 50th percentile is the median (See RFC3393) | |
peak-to-peak-ipdv | uint | O | (See RFC3393) | |
value | real32 | M | number of ms | |
For more information on the details of these properties, see the following IETF documents:
Delay Variation (Jitter): http://www.ietf.org/rfc/rfc3393.txt
This characteristic uses the following class to describe its properties:
path.reordering.oneWay
NetworkPathReorderingStatistics (subclass of NetworkToolSetting)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
lateTime | uint16 | O | (see IPPM draft) | |
gap | uint16 | O | number of positions out of order | |
value | uint16 | M | result, in percent | |
For more information on the details of these properties, see the following IETF documents:
http://www.ietf.org/internet-drafts/draft-ietf-ippm-reordering-02.txt
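One possible way to obtain the reordering value, in percent, from the sequence numbers of packets in arrival order is sketched below. It assumes a "next expected sequence number" rule: a packet counts as reordered when its sequence number is lower than the next one expected. The function name is hypothetical, and the referenced IPPM draft should be consulted for the precise definition.

```python
from typing import List, Optional


def reordered_percent(arrival_sequence: List[int]) -> float:
    """Percent of arriving packets whose sequence number is below the next expected."""
    if not arrival_sequence:
        raise ValueError("at least one packet is required")
    next_expected: Optional[int] = None
    reordered = 0
    for seq in arrival_sequence:
        if next_expected is None or seq >= next_expected:
            next_expected = seq + 1   # in order: advance the expectation
        else:
            reordered += 1            # arrived after a later packet
    return 100.0 * reordered / len(arrival_sequence)


print(reordered_percent([1, 2, 4, 3, 5]))
```

The table types value as uint16, so a publisher would still need to decide how to round this result.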
path.bandwidth.available
path.bandwidth.utilization
NetworkPathABWStatistics (subclass of NetworkToolSetting)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
Measured | boolean | M | Measured (ie: SNMP) vs estimated | |
measurementMethod | string | O | eg: SNMP, packet pair, packet train, etc. (eg: URI pointer to description) | |
confidence | real32 | O | the tool's reported accuracy of this measurement | |
bottleneck | target | O | which hop is the bottleneck ("tight link") | |
value | real32 | M | result, in Mbits/sec | |
Note: measurementMethod will be hard to get right, as tools often use multiple methods or a combination of methods. One solution is to just put a URL to the tool web page here.
Q: should "samplingMethod" be separate from measurementMethod?
path.bandwidth.capacity
NetworkPathCapacityStatistics (subclass of NetworkToolSetting)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
Measured | boolean | M | Measured (ie: SNMP) vs estimated | |
measurementMethod | string | O | eg: SNMP, packet pair, packet train, etc. | |
confidence | real32 | O | the tool's reported accuracy of this measurement | |
bottleneck | target | O | which hop is the bottleneck ("narrow link") | |
value | real32 | M | result, in Mbits/sec | |
Q: should "samplingMethod" be separate from measurementMethod? Also "bottleneckDetectionMethod"?
These characteristics use the following class to describe their properties:
path.bandwidth.achievable.TCP
path.bandwidth.achievable.TCP.multiStream
Property | Type | Requirement Level | Description | CIM ClassOrigin |
numBytes | uint32 | O | amount of test traffic | |
duration | real32 | O | how many seconds the test ran | |
measurementMethod | string | O | eg: 1 long test, average of shorter tests, etc. | |
TCPBufferSize | uint32 | M | size of TCP buffers used | |
TCPType | string | O | Reno, Vegas, HSTCP, ScalableTCP, etc | |
numStreams | uint16 | O | number of parallel streams | |
includesDisk | boolean | O | memory to memory or disk to disk (need pointer to disk object?) | |
bottleneck | string | O | indication of what is the bottleneck (network, CPU, NIC, memory, disk, etc.) | |
value | real32 | M | result, in Mbits/sec | |
Achievable bandwidth is defined in the NM-WG Characteristics document.
Q: should "includesDisk" be a property, or a characteristic (ie: path.bandwidth.achievable.TCP.disk2disk)?
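When a tool reports only numBytes and duration, the value property can be derived from them. A minimal sketch follows, assuming that 1 Mbit is 10^6 bits (this document does not say whether a decimal or binary prefix is intended); the function name is hypothetical.

```python
def achievable_mbits_per_sec(num_bytes: int, duration_s: float) -> float:
    """Throughput in Mbits/sec from bytes transferred and test duration in seconds."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    # 8 bits per byte; 1 Mbit taken as 10**6 bits (assumption).
    return num_bytes * 8 / duration_s / 1e6


print(achievable_mbits_per_sec(125_000_000, 10.0))
```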
path.bandwidth.achievable.UDP
E2EAchievableUDPStatistics (subclass of ??)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
numBytes | uint32 | O | amount of test traffic | |
duration | real32 | O | how many seconds the test ran | |
measurementMethod | string | O | eg: 1 long test, average of shorter tests, etc. | |
numStreams | uint16 | O | number of parallel streams | |
includesDisk | boolean | O | memory to memory or disk to disk (need pointer to disk object?) | |
bottleneck | string | O | indication of what is the bottleneck (network, CPU, NIC, memory, disk, etc.) | |
value | real32 | M | result, in Mbits/sec | |
Achievable bandwidth tests should all reference end-host information (for both source and destination) that includes the following class:
non-OS specific properties
(Note: all these exist in one of several CIM objects, but it's useful to collect them together for our purposes. We can use CIM "Model correspondence" to point to the source of the data.)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
OSversion | string | M | name and version of OS used | |
NICtype | uint16 | M | 100BT, 1000BT, etc | |
NICchipSet | uint16 | O? | Intel, syskonnect, etc. | |
MTU | string | M | MTU size set by host (number of bytes) | |
disk | string | O | type of disk for disk-to-disk tests | |
MemorySize | uint16 | O | amount of memory | |
MemorySpeed | string | O | type/speed of memory | |
CPU | string | O | type/speed of CPU | |
IOBusSpeed | string | O | type/speed of IO Bus | |
time | datetime | M | time last measured | |
OS specific properties
Linux 2.4
Property | Type | Requirement Level | Description | CIM ClassOrigin |
net.core.rmem_max | uint32 | O | | |
net.core.wmem_max | uint32 | O | | |
net.core.rmem_default | uint32 | O | | |
net.core.wmem_default | uint32 | O | | |
net.ipv4.tcp_rmem | uint32 | O | | |
net.ipv4.tcp_wmem | uint32 | O | | |
net.ipv4.tcp_mem | uint32 | O | | |
net.core.netdev_max_backlog | uint32 | O | | |
txqueuelen | uint32 | O | ?? | |
time | datetime | M | time last measured | |
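On a Linux host, the sysctl-style properties above can usually be read from files under /proc/sys, where each dot in the name becomes a directory separator. The sketch below assumes that layout; the function name and the proc_sys parameter are hypothetical. It returns a list of integers per name, because settings such as net.ipv4.tcp_rmem hold three values. txqueuelen is omitted because it is a per-interface setting rather than a sysctl.

```python
import os
from typing import Dict, List

SYSCTL_NAMES = [
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.core.rmem_default",
    "net.core.wmem_default",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.ipv4.tcp_mem",
    "net.core.netdev_max_backlog",
]


def read_sysctls(names: List[str], proc_sys: str = "/proc/sys") -> Dict[str, List[int]]:
    """Read each named setting; names that cannot be read are skipped."""
    values: Dict[str, List[int]] = {}
    for name in names:
        path = os.path.join(proc_sys, *name.split("."))
        try:
            with open(path) as handle:
                values[name] = [int(field) for field in handle.read().split()]
        except (OSError, ValueError):
            continue  # missing file or non-numeric content
    return values


if __name__ == "__main__":
    for key, value in read_sysctls(SYSCTL_NAMES).items():
        print(key, value)
```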
FreeBSD
Solaris
etc.
A measurement for delay might include all of the following (several are optional)
path.delay.roundTrip
(need to finish this)
Property | Type | Requirement Level | Description | CIM ClassOrigin |
value | uint16 | M | average result, in milliseconds | |
Mapping of NM-WG Terminology to CIM Terminology:
Note: still need to resolve which terminology to use in this document.
Current NM-WG Term | CIM Term |
Characteristic | A subclass of StatisticalData |
Measurement Methodology / Tools | Service |
Observation | An instance of the subclass of StatisticalData |
Nodes | Systems: subclasses are AdminDomain and AutonomousSystem, and ComputerSystem which may be virtual, dedicated to switching or routing, or a user's computer |
Paths | NetworkPipes |