Schema/Profile for Network Performance Measurements for Grids
version .11 (Sept 8, 2003)
This document is a first attempt to define names and properties for the most important network measurements for Grid middleware. This document is not yet complete.
This document describes a set of schemas for publishing network measurement data. It is assumed that the reader is familiar with the GGF NM-WG document "A Hierarchy of Network Performance Characteristics for Grid Applications and Services", which defines a classification hierarchy for network measurements that are useful for Grid applications and services. Use of the schemas described in this document should facilitate the development of interoperable Grid services.
As an example of how such network measurements could be used in a Grid environment, we use the case of a Grid file transfer service. Assume that a Grid Scheduler determines that a copy of a given file needs to be copied to site A before a job can be run. Several copies of this file are registered in a Data Grid Replica Catalogue, so there is a choice of where to copy the file from. The Grid Scheduler needs to determine the optimal method to create this new file copy, and to estimate how long this file creation will take. To make this selection the scheduler must determine what is the best source (or sources) to copy the data from. Selecting the best source to copy the data from requires a prediction of future end-to-end path characteristics between the destination and each possible source. Accurate prediction of the performance obtainable from each source requires measurement of available bandwidth (both end-to-end and hop-by-hop), latency, loss, and other characteristics important to file transfer performance.
A simple example is the following. Publication of network delay information such as is measured using ping (named path.delay.roundTrip in the NMWG "Characteristics" document) requires a great deal of information to be able to interpret the results. A number of test parameters must also be published, such as the tool name, the number of samples used, the protocol used, the packet size, and so on. Some of this information is mandatory, and other information is optional.
Here are some sample results.
Terminology:
Target: Defined in GGF DAMED WG naming document.
The tables in this document use the following letters to describe the requirement level: (from CIM)
M – Mandatory
O – Optional
C – Conditional (See CIM documentation for explanation)
Types: (from CIM)
string
uint16
uint16[ ] (array of 16bit ints)
uint32
uint64
real32
boolean
datetime: standard timestamp
All measurements must have an IETF RFC 3339 timestamp and a value.
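As a minimal sketch of the timestamp-plus-value rule, the following Python pairs a value with an RFC 3339 timestamp in UTC. The function name `make_measurement` and the dictionary keys are illustrative assumptions; the schema itself does not define them.

```python
from datetime import datetime, timezone

def make_measurement(value, when=None):
    """Pair a measured value with an RFC 3339 timestamp (hypothetical helper)."""
    if when is None:
        when = datetime.now(timezone.utc)
    # For a timezone-aware datetime, isoformat() yields a string such as
    # 2003-09-08T12:00:00+00:00, which is a valid RFC 3339 date-time.
    # A naive datetime passed in here would be interpreted as local time.
    stamp = when.astimezone(timezone.utc).isoformat()
    return {"timestamp": stamp, "value": value}

sample = make_measurement(12.5, datetime(2003, 9, 8, 12, 0, 0, tzinfo=timezone.utc))
print(sample)
```

Normalizing to UTC before formatting is a design choice made here so that timestamps from different hosts compare directly; RFC 3339 also permits local offsets.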
We are using the naming conventions defined by the GGF DAMED working group.
Classes for the following network measurement characteristics are defined in this document.
path.delay.roundTrip
path.delay.oneWay
path.delay.jitter.roundTrip
path.delay.jitter.oneWay
path.loss.roundTrip
path.loss.oneWay
path.reordering.oneWay
path.bandwidth.achievable.TCP
path.bandwidth.achievable.TCP.multiStream
path.bandwidth.achievable.UDP
path.bandwidth.available
path.bandwidth.utilization
path.bandwidth.capacity
All of the above measurement characteristics except for bandwidth.achievable can be for hops as well as for paths. E.g.:
hop.bandwidth.capacity
hop.bandwidth.utilized
Additional topology characteristics will be included in a future version of this document.
There are 5 main classes:
NetworkTestTool: describes the measurement tool
NetworkTestCharacteristic: specifies which NMWG characteristic is being measured.
NetworkTestSetting: describes the input parameters and/or built-in testing methodology
NetworkTestInfo: describes the endpoints of the test (hosts, etc.)
NetworkTestResults: describes the output
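One possible way to picture how the five classes fit together is the Python sketch below. It carries only a subset of the properties from the tables in this document, the `NetworkMeasurement` wrapper is a hypothetical container that is not part of the schema, and all addresses and values are made-up illustrations.

```python
from dataclasses import dataclass, asdict
from typing import Optional

@dataclass
class NetworkTestTool:
    toolName: str                         # mandatory
    Measured: bool                        # mandatory: measured vs. estimated
    majorVersion: Optional[str] = None
    minorVersion: Optional[str] = None
    measurementMethod: Optional[str] = None

@dataclass
class NetworkTestCharacteristic:
    Characteristic: str                   # NMWG characteristic name

@dataclass
class NetworkTestInfo:
    source: str
    destination: str

@dataclass
class NetworkTestSetting:
    packetType: str                       # ICMP, UDP or TCP
    packetSize: Optional[int] = None
    numPackets: Optional[int] = None

@dataclass
class NetworkTestResults:
    endTime: str                          # RFC 3339 timestamp
    startTime: Optional[str] = None
    CharacteristicResults: Optional[dict] = None

@dataclass
class NetworkMeasurement:
    """Hypothetical wrapper tying the five classes together."""
    tool: NetworkTestTool
    characteristic: NetworkTestCharacteristic
    info: NetworkTestInfo
    setting: NetworkTestSetting
    results: NetworkTestResults

# Illustrative values only; the addresses come from a documentation range.
record = NetworkMeasurement(
    tool=NetworkTestTool(toolName="ping", Measured=True),
    characteristic=NetworkTestCharacteristic("path.delay.roundTrip"),
    info=NetworkTestInfo(source="192.0.2.1", destination="192.0.2.2"),
    setting=NetworkTestSetting(packetType="ICMP", packetSize=64, numPackets=10),
    results=NetworkTestResults(endTime="2003-09-08T12:00:10Z",
                               CharacteristicResults={"average": 12.5}),
)
print(asdict(record))
```

Keeping the tool, settings, endpoints and results in separate objects mirrors the class split above, so a publisher could reuse one tool or setting description across many results.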
NetworkTestTool: (subclass of CIM_SERVICE)
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| toolName | string | M | name of tool used | SoftwareIdentity |
| majorVersion | string | O? | version of tool used | SoftwareIdentity |
| minorVersion | string | O | version of tool used | SoftwareIdentity |
| Measured | boolean | M | Measured (ie: SNMP) vs estimated | |
| measurementMethod | string | O | eg: SNMP, packet pair, packet train, etc. (eg: URI pointer to description) | |
Note: Should "NetworkTestMethodology" be a separate class? Also, measurementMethod will be hard to get right, as tools often use multiple methods or a combination of methods. One solution is to just put a URL to the tool web page here. Q: should samplingMethod or statisticalMethod be separate from measurementMethod?
NetworkTestCharacteristic:
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| Characteristic | string | M | NMWG Characteristic | |
Note: a given tool may measure several of these.
NetworkTestInfo
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| source | string | M | Source IP[:port] | IPProtocolEndpoint |
| destination | string | M | Destination IP[:port] | IPProtocolEndpoint |
| sourceHostInfo | | | see below | ComputerSystem |
| sourceHostTimerDevice | | | see below | |
| destHostInfo | | | | ComputerSystem |
| destHostTimerDevice | | | | |
NetworkTestSetting: (subclass of CIM_StatisticalData ?)
The first group of properties is for UDP/ICMP based tests, and the 2nd group is for TCP based tests. Some of these only make sense for UDP/ICMP based tests, and some of these only make sense for TCP based tests.
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| packetType | string | M | ICMP or UDP or TCP | |
| packetSize | uint16 | O | size of test packet | |
| numPackets | uint16 | O | number of test packets | |
| packetSpacing | boolean | O | (UDP) Poisson or periodic | |
| packetGap | real32 | C | (UDP) time between test packets, in seconds (for periodic tests) | |
| portNum | uint16 | O | port number used for test | |
| ToS | uint16 | O | Type of Service (IP precedence) | |
| ProtocolID | uint8 | O | IP v4 or v6 | |
| DSCP | uint8 | O | differentiated services code point | |
| FlowLabel | uint8 | O | IP v6 option for QoS | |
| lossThreshold | uint16 | M | (UDP) the threshold used to distinguish between a large finite delay and loss | |
| numBytes | uint32 | O | (TCP) amount of test traffic | |
| duration | real32 | O | how many seconds the test ran | |
| TCPBufferSize | uint32 | M | (TCP) size of TCP buffers used | |
| TCPType | string | O | (TCP) Reno, Vegas, HSTCP, ScalableTCP, etc | |
| numStreams | uint16 | O | number of parallel streams | |
| includesDisk | boolean | O | memory to memory or disk to disk | |
For more information on the details of these properties, see the following IETF documents:
One way Delay: http://www.ietf.org/rfc/rfc2679.txt
Round Trip Delay: http://www.ietf.org/rfc/rfc2681.txt
NetworkTestResults
All results use this base class:
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| startTime | datetime | O | time test was started | |
| endTime | datetime | M | time test was completed | |
| CharacteristicResults | | | specific results for this characteristic (see below) | |
Different characteristics require different results, listed below:
path.delay.roundTrip
path.delay.oneWay
DelayResults (subclass of NetworkToolSetting)
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| percentile | uint16[ ] | O | array of percentiles, eg: 50th percentile is the median (See RFC2679) | |
| percentileValue | real32[ ] | C | value for above percentile, in milliseconds | |
| median | real32 | O | median of all measurements in test (optional, but strongly encouraged) | |
| average | real32 | O | average result in milliseconds | |
| minimum | real32 | O | minimum of all measurements in test | |
| maximum | real32 | O | maximum of all measurements in test | |
| StdDev | real32 | O | standard deviation of the results | |
NOTE: everyone agreed that median is more useful than average. However most tools currently only report average, so average is the only mandatory value.
Note: current idea: for average, median, maximum, etc, use -1 to indicate value > loss threshold value. Need to discuss this more. Are there standard CIM properties for these sorts of stats?
For more information on the details of these properties, see the following IETF documents:
One way Delay: http://www.ietf.org/rfc/rfc2679.txt
Round Trip Delay: http://www.ietf.org/rfc/rfc2681.txt
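To illustrate how the DelayResults statistics might be derived from raw samples, here is a rough Python sketch. The function names are hypothetical, the nearest-rank percentile method and the population standard deviation are assumptions (this document does not prescribe either), and the sample values are invented.

```python
import math
import statistics

def nearest_rank_percentile(ordered, p):
    """p-th percentile of a sorted list, nearest-rank method (an assumption)."""
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def delay_results(samples_ms, percentiles=(50, 95)):
    """Summarize delay samples (milliseconds) using the DelayResults property names."""
    ordered = sorted(samples_ms)
    return {
        "percentile": list(percentiles),
        "percentileValue": [nearest_rank_percentile(ordered, p) for p in percentiles],
        "median": statistics.median(ordered),
        "average": statistics.fmean(ordered),
        "minimum": ordered[0],
        "maximum": ordered[-1],
        "StdDev": statistics.pstdev(ordered),  # population standard deviation
    }

# Invented samples with one outlier.
results = delay_results([10.0, 12.0, 11.0, 50.0, 13.0])
print(results["median"], results["average"])
```

With these samples the single 50 ms outlier pulls the average well above the median, which is the kind of behaviour behind the preference for median noted above. For an even number of samples the nearest-rank 50th percentile can differ from `statistics.median`, which averages the two middle values.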
path.loss.roundTrip
path.loss.oneWay
LossResults (subclass of NetworkToolSetting)
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| Loss-Distance | uint16 | O | number of packets since the previous loss (See RFC3357) | |
| Loss-Period | uint16 | O | number of groups of lost packets (See RFC3357) | |
| Noticeable-Rate | uint16 | O | percent of lost packets for which the distance between the lost packet and the previously lost packet is no greater than the "loss constraint" (See RFC3357) | |
| Period-Total | uint16 | O | total number of loss periods (See RFC3357) | |
| Period-Lengths | uint | O | number of packets in a burst of loss (See RFC3357) | |
| Inter-Loss-Period-Lengths | uint | O | number of packets between bursts of loss (See RFC3357) | |
| NumPacketsLost | uint | O | number of packets lost during the test | |
| PercentLoss | real32 | M | average packet loss (in percent) (See RFC2680) | |
For more information on the details of these properties, see the following IETF documents:
One-way Loss: http://www.ietf.org/rfc/rfc2680.txt
Loss Patterns: http://www.ietf.org/rfc/rfc3357.txt
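The burst-related loss properties can be sketched as follows. This Python example follows the one-line property descriptions in this section rather than the full RFC 3357 definitions, so it should be read as an approximation; the function name and the input format (one boolean per packet in sending order, True meaning lost) are assumptions.

```python
def loss_results(lost):
    """Summarize a per-packet loss record (True = packet lost)."""
    if not lost:
        raise ValueError("need at least one packet")
    periods = []        # length of each run of consecutive lost packets
    gaps = []           # packets received between consecutive loss periods
    run = 0             # length of the loss run currently in progress
    gap = 0             # packets received since the last loss
    seen_loss = False
    for is_lost in lost:
        if is_lost:
            if run == 0 and seen_loss:
                gaps.append(gap)      # a new burst starts after an earlier one
            run += 1
            seen_loss = True
            gap = 0
        else:
            if run:
                periods.append(run)   # the current burst has ended
                run = 0
            gap += 1
    if run:
        periods.append(run)           # record ended in the middle of a burst
    num_lost = sum(periods)
    return {
        "NumPacketsLost": num_lost,
        "PercentLoss": 100.0 * num_lost / len(lost),
        "Period-Total": len(periods),
        "Period-Lengths": periods,
        "Inter-Loss-Period-Lengths": gaps,
    }

# Ten packets: packets 2-3, 7 and 10 are lost (invented data).
pattern = [False, True, True, False, False, False, True, False, False, True]
summary = loss_results(pattern)
print(summary)
```

Two tests with the same PercentLoss can have very different Period-Lengths, which is why the burst properties are worth publishing alongside the average.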
path.delay.jitter.oneWay
path.delay.jitter.roundTrip
Note: someone who understands this better needs to flesh this out
JitterResults (subclass of NetworkToolResults)
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| percentile | uint | O | eg: 50th percentile is the median (See RFC3393) | |
| peak-to-peak-ipdv | uint | O | (See RFC3393) | |
| AverageJitter | real32 | M | number of ms | |
For more information on the details of these properties, see the following IETF documents:
Delay Variation (Jitter): http://www.ietf.org/rfc/rfc3393.txt
path.reordering.oneWay
ReorderingResults (subclass of NetworkToolResults)
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| lateTime | uint16 | O | (see IPPM draft) | |
| gap | uint16 | O | number of positions out of order | |
| PercentReordered | uint16 | M | result, in percent | |
For more information on the details of these properties, see the following IETF documents:
http://www.ietf.org/internet-drafts/draft-ietf-ippm-reordering-02.txt
path.bandwidth.available
path.bandwidth.utilization
path.bandwidth.capacity
BandwidthResults (subclass of NetworkToolResults)
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| confidence | real32 | O | the tool's reported confidence of this measurement | |
| bottleneck | target | O | which hop is the bottleneck ("tight link") | |
| bandwidth | real32 | M | result, in Mbits/sec | |
path.bandwidth.achievable.TCP
path.bandwidth.achievable.TCP.multiStream
path.bandwidth.achievable.UDP
AchievableResults (subclass of NetworkToolResults)
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| bottleneck | string | O | indication of what is the bottleneck (network, CPU, NIC, memory, disk, etc.) | |
| throughput | real32 | M | result, in Mbits/sec | |
Achievable bandwidth is defined in the NM-WG Characteristics document.
path.bandwidth.available: Percentage of the time that a path is up
AvailabilityStatistics (subclass of NetworkToolResults)
| Property | Type | Requirement Level | Description | Units | CIM ClassOrigin |
| MTBF | uint32 | O | Mean Time Between Failures | seconds | |
| MTTR | uint32 | O | Mean Time To Repair | seconds | |
| Downs | uint32 | O | Number of periods when path was not available | periods | |
| Median-Outage-Length | uint32 | O | Median outage length | seconds | |
| PercentUp | real32 | M | percent availability = 100*(#available cycles/total cycles) | percent | |
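A rough Python sketch of how these availability statistics could be computed from a series of up/down probe cycles follows. Several choices here are assumptions not fixed by this document: the function name, the fixed cycle length, MTTR taken as the mean outage length, MTBF taken as total up time divided by the number of outages, and PercentUp taken as the share of cycles in which the path was up (matching "percentage of the time that a path is up"). The schema types are integers, so a publisher would still need to round.

```python
import statistics
from itertools import groupby

def availability_statistics(up, cycle_seconds):
    """Summarize a per-cycle availability record (True = path was up)."""
    if not up:
        raise ValueError("need at least one cycle")
    # Each run of consecutive down cycles counts as one outage.
    outages = [sum(1 for _ in run) * cycle_seconds
               for is_up, run in groupby(up) if not is_up]
    up_seconds = sum(up) * cycle_seconds
    return {
        "Downs": len(outages),
        "Median-Outage-Length": statistics.median(outages) if outages else None,
        "MTTR": sum(outages) / len(outages) if outages else None,
        "MTBF": up_seconds / len(outages) if outages else None,
        "PercentUp": 100.0 * sum(up) / len(up),
    }

# Invented record: 20 one-minute cycles containing two 2-minute outages.
cycles = [True] * 6 + [False] * 2 + [True] * 10 + [False] * 2
stats = availability_statistics(cycles, cycle_seconds=60)
print(stats)
```

Statistics that are undefined when no outage occurred are returned as `None` here rather than zero, so that a perfect path is not reported as having a zero-second repair time.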
Topology related Classes
path.hoplist: list of routers along a path
HopList (subclass of ??)
| Property | Type | Requirement Level | Description | Units | CIM ClassOrigin |
| routerInterfaceList | string [ ] | M | array of IP addresses of the routers on the path | | ??? |
| hopcount | uint32 | M | hopcount | | |
still need to add:
Forwarding Table and Policy description
what else??
TimerDevice (being added to CIM V2.9, but will look something like this)
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| timeResolution | real32[] | O | resolution of the timestamp of source and destination (in seconds) | |
| timeOffsetFromUTC | real32 | O | distance from GPS time | |
| timeAccuracy | real32[] | O | accuracy of the timestamp (src and dest) (in seconds) | |
| timeAccuracyMethod | string[] | O | clock sync method (src and dst): e.g.: NTP, GPS, AFS, etc. | |
Note: It is very difficult to accurately determine timeAccuracy without something like a GPS clock on the measurement host. However, it is important to be able to trust the timestamps, so even though these properties are optional, their use is strongly encouraged.
non-OS specific properties
(Note: all these exist in one of several CIM objects, but it's useful to collect them together for our purposes. We can use CIM "Model correspondence" to point to the source of the data.)
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| OStype | string | M | type of OS used | OperatingSystem |
| OSversion | string | M | version of OS used | OperatingSystem |
| NICtype | uint16 | M | 100BT, 1000BT, etc | |
| NICchipSet | uint16 | O? | Intel, syskonnect, etc. | |
| MTU | string | M | MTU size set by host (number of bytes) | |
| disk | string | O | type of disk for disk-to-disk tests | |
| MemorySize | uint16 | O | amount of memory | |
| MemorySpeed | string | O | type/speed of memory | |
| CPU | string | O | type/speed of CPU | |
| IOBusSpeed | string | O | type/speed of IO Bus | |
| time | datetime | M | time last measured | |
OS specific properties
Linux 2.4
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| net.core.rmem_max | uint32 | O | | |
| net.core.wmem_max | uint32 | O | | |
| net.core.rmem_default | uint32 | O | | |
| net.core.wmem_default | uint32 | O | | |
| net.ipv4.tcp_rmem | uint32 | O | | |
| net.ipv4.tcp_wmem | uint32 | O | | |
| net.ipv4.tcp_mem | uint32 | O | | |
| net.core.netdev_max_backlog | uint32 | O | | |
| txqueuelen | uint32 | O | ?? | |
| time | datetime | M | time last measured | |
FreeBSD
Solaris
etc.
A measurement for delay might include all of the following (several are optional)
path.delay.roundTrip
(need to finish this)
| Property | Type | Requirement Level | Description | CIM ClassOrigin |
| value | uint16 | M | average result, in milliseconds | |
Mapping of NM-WG Terminology to CIM Terminology:
Note: still need to resolve which terminology to use in this document.
| Current NM-WG Term | CIM Term |
| Characteristic | A subclass of StatisticalData |
| Measurement Methodology / Tools | Service |
| Observation | An instance of the subclass of StatisticalData |
| Nodes | Systems: subclasses are AdminDomain and AutonomousSystem, and ComputerSystem which may be virtual, dedicated to switching or routing, or a user's computer |
| Paths | NetworkPipes |