Schema/Profile for Network Performance Measurements for Grids

version .07 (June 19, 2003)

 

This document is a first attempt to define names and properties for the most important network measurements for Grid middleware. This document is not yet complete.


This document describes a set of schemas for publishing network measurement data. It is assumed that the reader is familiar with the GGF NM-WG document "A Hierarchy of Network Performance Characteristics for Grid Applications and Services", which defines a classification hierarchy for network measurements that are useful for Grid applications and services. Use of the schemas described in this document should facilitate the development of interoperable Grid services.

As an example of how such network measurements could be used in a Grid environment, we use the case of a Grid file transfer service. Assume that a Grid Scheduler determines that a copy of a given file needs to be copied to site A before a job can be run. Several copies of this file are registered in a Data Grid Replica Catalogue, so there is a choice of where to copy the file from. The Grid Scheduler needs to determine the optimal method to create this new file copy, and to estimate how long this file creation will take. To make this selection the scheduler must determine what is the best source (or sources) to copy the data from. Selecting the best source to copy the data from requires a prediction of future end-to-end path characteristics between the destination and each possible source. Accurate prediction of the performance obtainable from each source requires measurement of available bandwidth (both end-to-end and hop-by-hop), latency, loss, and other characteristics important to file transfer performance.
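As a rough illustration of the selection step in this scenario, the following Python sketch ranks candidate sources by an estimated transfer time. The class and function names, the host names, and the simple cost model (one round trip plus file size divided by achievable bandwidth) are assumptions made for this example only; a real scheduler would need a proper prediction method.

```python
from dataclasses import dataclass


@dataclass
class SourceEstimate:
    """Hypothetical per-source summary built from published measurements."""
    host: str
    achievable_mbits_per_sec: float  # e.g. a path.bandwidth.achievable.TCP value
    round_trip_ms: float             # e.g. a path.delay.roundTrip value


def estimated_transfer_seconds(src: SourceEstimate, file_bytes: int) -> float:
    """Assumed model: one round trip of setup plus size / bandwidth."""
    bits = file_bytes * 8
    return src.round_trip_ms / 1000.0 + bits / (src.achievable_mbits_per_sec * 1e6)


def best_source(sources, file_bytes):
    """Return the source with the smallest estimated transfer time."""
    return min(sources, key=lambda s: estimated_transfer_seconds(s, file_bytes))


if __name__ == "__main__":
    candidates = [
        SourceEstimate("replica-a.example.org", 90.0, 20.0),
        SourceEstimate("replica-b.example.org", 400.0, 150.0),
    ]
    choice = best_source(candidates, 10**9)
    print(choice.host, round(estimated_transfer_seconds(choice, 10**9), 2))
```

In this made-up case the source with the higher delay still wins for a large file, because bandwidth dominates the estimate.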

A simple example is the following. Publication of network delay information such as that measured using ping (named path.delay.roundTrip in the NM-WG "Characteristics" document) requires a great deal of additional information before the results can be interpreted. A number of test parameters must also be published, such as the tool name, the number of samples used, the protocol used, the packet size, and so on. Some of this information is mandatory, and other information is optional. The final result for a path.delay.roundTrip test may look like this.

path.delay.roundTrip

Property Value
source 131.243.2.11
destination 137.138.28.230
time 20030521060902.893847
toolName ping
toolVersion redhat 7.2
packetSize 64
numPackets 25
packetSpacing periodic
packetGap 1.0
packetType ICMP
minimum 275.149
maximum 277.674
median 275.121
StdDev 0.375
value 275.727

The classes that this result is made up from are all described in detail below.
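To show how a consumer might handle such a result, here is a minimal Python sketch that reads the property/value lines above into a dictionary. The function name and the choice of which properties to convert to numbers are assumptions for this example; the document does not define a wire format.

```python
SAMPLE = """\
source 131.243.2.11
destination 137.138.28.230
time 20030521060902.893847
toolName ping
toolVersion redhat 7.2
packetSize 64
numPackets 25
packetSpacing periodic
packetGap 1.0
packetType ICMP
minimum 275.149
maximum 277.674
median 275.121
StdDev 0.375
value 275.727
"""

# Assumed typing for this example only, loosely following the class tables.
INT_PROPERTIES = {"packetSize", "numPackets"}
FLOAT_PROPERTIES = {"packetGap", "minimum", "maximum", "median", "StdDev", "value"}


def parse_result(text):
    """Split each line into a property name and the rest of the line as its value."""
    result = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, raw = line.strip().partition(" ")
        if name in INT_PROPERTIES:
            result[name] = int(raw)
        elif name in FLOAT_PROPERTIES:
            result[name] = float(raw)
        else:
            result[name] = raw  # addresses, time, and names stay as strings
    return result
```

Splitting on the first space only keeps multi-word values such as the toolVersion string intact.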


Terminology:

Target: Defined in GGF DAMED WG naming document.

The tables in this document use the following letters to describe the requirement level: (from CIM)

M – Mandatory
O – Optional
C – Conditional (See CIM documentation for explanation)

Types: (from CIM)

string
uint16
uint16[ ] (array of 16bit ints)
uint32
uint64
real32
boolean
datetime: standard timestamp

All measurements must have an IETF RFC 3339 timestamp and a value.
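The example result earlier in this document shows its time in a compact yyyymmddHHMMSS.ffffff form rather than in RFC 3339 notation. The following Python sketch converts between the two. It assumes the compact value is in UTC, since the example carries no offset; the function names are illustrative only.

```python
from datetime import datetime, timezone


def to_rfc3339(compact: str) -> str:
    """Convert 'yyyymmddHHMMSS.ffffff' to an RFC 3339 string.

    Assumption: the compact value is in UTC; the example carries no offset.
    """
    parsed = datetime.strptime(compact, "%Y%m%d%H%M%S.%f")
    return parsed.replace(tzinfo=timezone.utc).isoformat()


def from_rfc3339(text: str) -> datetime:
    """Parse an RFC 3339 timestamp that uses a numeric offset such as +00:00."""
    return datetime.fromisoformat(text)


if __name__ == "__main__":
    print(to_rfc3339("20030521060902.893847"))
```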

We are using the naming conventions defined by the GGF DAMED working group.

Classes for the following network measurement characteristics are defined in this document.

path.delay.roundTrip
path.delay.oneWay
path.delay.jitter

path.loss.oneWay
path.reordering.oneWay

path.bandwidth.achievable.TCP
path.bandwidth.achievable.TCP.multiStream
path.bandwidth.achievable.UDP


path.bandwidth.available
path.bandwidth.utilization
path.bandwidth.capacity

All of the above measurement characteristics except bandwidth.achievable can apply to hops as well as to paths. For example:

hop.bandwidth.capacity
hop.bandwidth.utilization
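The naming rule can be sketched in a few lines of Python. The helper names below are hypothetical; the only rule encoded is the one stated here, that bandwidth.achievable has no hop form.

```python
def split_name(name):
    """Return (scope, characteristic parts), e.g. ('path', ['delay', 'roundTrip'])."""
    scope, *parts = name.split(".")
    if scope not in ("path", "hop"):
        raise ValueError(f"unknown scope in {name!r}")
    if not parts:
        raise ValueError(f"no characteristic in {name!r}")
    return scope, parts


def hop_equivalent(path_name):
    """Map a path characteristic to its hop form, refusing bandwidth.achievable."""
    scope, parts = split_name(path_name)
    if scope != "path":
        raise ValueError(f"{path_name!r} is not a path characteristic")
    if parts[:2] == ["bandwidth", "achievable"]:
        raise ValueError("bandwidth.achievable is defined for paths only")
    return ".".join(["hop", *parts])
```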

Additional topology characteristics will be included in a future version of this document.


All measurements require the classes NetworkTestTool, NetworkTestInfo, NetworkToolSetting, and a specific characteristic test, all described below. An example of a complete measurement result is given in the "Sample Complete Measurement" section near the end of this document.

NetworkTestTool: (subclass of CIM_SERVICE)

Property | Type | Requirement Level | Description | CIM ClassOrigin
toolName | string | M | name of tool used |
toolVersion | string | O? | version of tool used |
toolAccuracy | real32 | O | some indication of the accuracy of the tool (what are the units? % error? needs more discussion) |

 

NetworkTestInfo (subclass of CIM_StatisticalData?)

Property | Type | Requirement Level | Description | CIM ClassOrigin
source | string | M | source IP[:port] |
destination | string | M | destination IP[:port] |
startTime | datetime | O | time test was started |
time | datetime | M | time test was completed |
timeResolution | real32[] | O | resolution of the timestamp at source and destination (in seconds) |
timeAccuracy | real32[] | O | accuracy of the timestamp at source and destination (in seconds) |
timeAccuracyMethod | string[] | O | clock synchronization method at source and destination, e.g. NTP, GPS, AFS |

Note: It is very difficult to accurately determine timeAccuracy without something like a GPS clock on the measurement host. However, it is important to be able to trust the timestamps, so although these properties are optional, their use is strongly encouraged.

 

Delay, Loss, Jitter, capacity, reordering, available bandwidth, and achievable.UDP measurements all require the following base class:

NetworkToolSetting: (subclass of CIM_StatisticalData ?)

Property | Type | Requirement Level | Description | CIM ClassOrigin
packetSize | uint16 | M | size of test packet | CIM_StatisticalData
numPackets | uint16 | M | number of test packets |
packetSpacing | boolean | O | Poisson or periodic |
packetGap | real32 | C | time between test packets, in seconds (for periodic tests) |
packetType | string | M | ICMP or UDP or TCP |
portNum | uint16 | O | port number used for test |
priority | uint16 | O | IP precedence bit set, etc. |
lossThreshold | uint16 | M | the threshold used to distinguish between a large finite delay and loss |
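As an illustration of how the requirement levels could be enforced, the Python sketch below checks a record against the NetworkToolSetting properties. The function name is hypothetical, packetSpacing is treated as the string used in the example result rather than a boolean, and the reading of the conditional level (packetGap required for periodic tests) is an assumption based on the property description.

```python
# Requirement levels copied from the NetworkToolSetting table.
NETWORK_TOOL_SETTING = {
    "packetSize": "M",
    "numPackets": "M",
    "packetSpacing": "O",
    "packetGap": "C",
    "packetType": "M",
    "portNum": "O",
    "priority": "O",
    "lossThreshold": "M",
}


def check_tool_setting(record):
    """Return a list of problems; an empty list means the record passed."""
    problems = []
    for name, level in NETWORK_TOOL_SETTING.items():
        if level == "M" and name not in record:
            problems.append(f"missing mandatory property {name}")
    for name in record:
        if name not in NETWORK_TOOL_SETTING:
            problems.append(f"unknown property {name}")
    # Assumed reading of the C level: packetGap is required for periodic tests.
    if record.get("packetSpacing") == "periodic" and "packetGap" not in record:
        problems.append("packetGap is required when packetSpacing is periodic")
    return problems
```

Applied to the tool settings in the path.delay.roundTrip example near the start of this document, this check would report lossThreshold as missing, since that example does not list one.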

For more information on the details of these properties, see the following IETF documents:

One way Delay: http://www.ietf.org/rfc/rfc2679.txt

Round Trip Delay: http://www.ietf.org/rfc/rfc2681.txt

 


These characteristics use the following class to describe their properties:

path.delay.roundTrip
path.delay.oneWay

NetworkPathDelayStatistics (subclass of NetworkToolSetting)

Property | Type | Requirement Level | Description | CIM ClassOrigin
percentile | uint16[ ] | O | array of percentiles, e.g. the 50th percentile is the median (See RFC2679) |
percentileValue | real32[ ] | C | value for above percentile, in milliseconds |
median | real32 | O | median of all measurements in test (optional, but strongly encouraged) |
minimum | real32 | O | minimum of all measurements in test |
maximum | real32 | O | maximum of all measurements in test |
StdDev | real32 | O | standard deviation of the results |
value | real32 | M | average result in milliseconds |

NOTE: everyone agreed that median is more useful than average. However most tools currently only report average, so average is the only mandatory value.

Note: current idea: for value, median, maximum, etc., use -1 to indicate a value greater than the loss threshold. This needs more discussion.
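The following Python sketch shows one way the statistics properties might be derived from per-packet delays, including the tentative -1 convention from the note above. It assumes that the loss threshold is expressed in milliseconds and that StdDev is the population standard deviation; neither is stated in this document, and the function name is illustrative.

```python
import statistics


def summarize_delays(samples_ms, loss_threshold_ms):
    """Build the statistics properties from per-packet delays in milliseconds."""
    if not samples_ms:
        raise ValueError("at least one sample is required")
    summary = {
        "minimum": min(samples_ms),
        "maximum": max(samples_ms),
        "median": statistics.median(samples_ms),
        # Assumption: population standard deviation; the document does not say which.
        "StdDev": statistics.pstdev(samples_ms),
        "value": statistics.mean(samples_ms),
    }
    # Tentative convention: -1 marks a statistic that exceeds the loss threshold.
    return {
        name: (-1 if name != "StdDev" and stat > loss_threshold_ms else stat)
        for name, stat in summary.items()
    }
```

StdDev is left untouched by the sentinel because it is a spread rather than a delay.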

For more information on the details of these properties, see the following IETF documents:

One way Delay: http://www.ietf.org/rfc/rfc2679.txt

Round Trip Delay: http://www.ietf.org/rfc/rfc2681.txt


These characteristics use the following class to describe their properties:


path.loss.roundTrip
path.loss.oneWay

 

NetworkPathLossStatistics (subclass of NetworkToolSetting)

Property | Type | Requirement Level | Description | CIM ClassOrigin
Loss-Distance | uint16 | O | number of packets since the previous loss (See RFC3357) |
Loss-Period | uint16 | O | number of groups of lost packets (See RFC3357) |
Noticeable-Rate | uint16 | O | percentage of lost packets for which the distance between the lost packet and the previously lost packet is no greater than the "loss constraint" (See RFC3357) |
Period-Total | uint16 | O | total number of loss periods (See RFC3357) |
Period-Lengths | uint | O | number of packets in a burst of loss (See RFC3357) |
Inter-Loss-Period-Lengths | uint | O | number of packets between bursts of loss (See RFC3357) |
NumPacketsLost | uint | O | number of packets lost during the test |
value | real32 | M | average packet loss (in percent) (See RFC2680) |
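To make the burst-related properties more concrete, the Python sketch below derives several of them from a per-packet loss sequence. It follows the informal descriptions in the table above, not the normative definitions in RFC 3357, which may differ in detail. The function name is hypothetical, and the lengths are returned as lists (one entry per burst or gap) although the table gives them a scalar type.

```python
from itertools import groupby


def loss_summary(lost):
    """Summarize a per-packet loss sequence (True means the packet was lost)."""
    runs = [(flag, len(list(group))) for flag, group in groupby(lost)]
    period_lengths = [length for flag, length in runs if flag]
    # Runs of received packets that sit between two bursts of loss.
    inter_lengths = [
        length
        for index, (flag, length) in enumerate(runs)
        if not flag and 0 < index < len(runs) - 1
    ]
    num_lost = sum(period_lengths)
    return {
        "Period-Total": len(period_lengths),
        "Period-Lengths": period_lengths,
        "Inter-Loss-Period-Lengths": inter_lengths,
        "NumPacketsLost": num_lost,
        "value": 100.0 * num_lost / len(lost) if lost else 0.0,
    }
```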

For more information on the details of these properties, see the following IETF documents:

One-way Loss: http://www.ietf.org/rfc/rfc2680.txt

Loss Patterns: http://www.ietf.org/rfc/rfc3357.txt


This characteristic uses the following class to describe its properties:


path.delay.jitter

Note: someone who understands this better needs to flesh this out.

 

NetworkPathJitterStatistics (subclass of NetworkToolSetting)

Property | Type | Requirement Level | Description | CIM ClassOrigin
percentile | uint | O | e.g. the 50th percentile is the median (See RFC3393) |
peak-to-peak-ipdv | uint | O | (See RFC3393) |
value | real32 | M | number of ms |

For more information on the details of these properties, see the following IETF documents:

Delay Variation (Jitter): http://www.ietf.org/rfc/rfc3393.txt

 


This characteristic uses the following class to describe its properties:


path.reordering.oneWay

 

NetworkPathReorderingStatistics (subclass of NetworkToolSetting)

Property | Type | Requirement Level | Description | CIM ClassOrigin
lateTime | uint16 | O | (see IPPM draft) |
gap | uint16 | O | number of positions out of order |
value | uint16 | M | result, in percent |

For more information on the details of these properties, see the following IETF documents:

http://www.ietf.org/internet-drafts/draft-ietf-ippm-reordering-02.txt


 

path.bandwidth.available
path.bandwidth.utilization

NetworkPathABWStatistics (subclass of NetworkToolSetting)

Property | Type | Requirement Level | Description | CIM ClassOrigin
Measured | boolean | M | measured (e.g. SNMP) vs. estimated |
measurementMethod | string | O | e.g. SNMP, packet pair, packet train, etc. (or a URI pointer to a description) |
confidence | real32 | O | the tool's reported accuracy of this measurement |
bottleneck | target | O | which hop is the bottleneck ("tight link") |
value | real32 | M | result, in Mbits/sec |

Note: measurementMethod will be hard to get right, as tools often use multiple methods or a combination of methods. One solution is to just put a URL to the tool web page here.

Q: should "samplingMethod" be separate from measurementMethod?


path.bandwidth.capacity

NetworkPathCapacityStatistics (subclass of NetworkToolSetting)

Property | Type | Requirement Level | Description | CIM ClassOrigin
Measured | boolean | M | measured (e.g. SNMP) vs. estimated |
measurementMethod | string | O | e.g. SNMP, packet pair, packet train, etc. |
confidence | real32 | O | the tool's reported accuracy of this measurement |
bottleneck | target | O | which hop is the bottleneck ("narrow link") |
value | real32 | M | result, in Mbits/sec |

Q: should "samplingMethod" be separate from measurementMethod? Also "bottleneckDetectionMethod"?


These characteristics use the following class to describe their properties:

path.bandwidth.achievable.TCP
path.bandwidth.achievable.TCP.multiStream

 

Property | Type | Requirement Level | Description | CIM ClassOrigin
numBytes | uint32 | O | amount of test traffic |
duration | real32 | O | how many seconds the test ran |
measurementMethod | string | O | e.g. 1 long test, average of shorter tests, etc. |
TCPBufferSize | uint32 | M | size of TCP buffers used |
TCPType | string | O | Reno, Vegas, HSTCP, ScalableTCP, etc. |
numStreams | uint16 | O | number of parallel streams |
includesDisk | boolean | O | memory to memory or disk to disk (need pointer to disk object?) |
bottleneck | string | O | indication of what is the bottleneck (network, CPU, NIC, memory, disk, etc.) |
value | real32 | M | result, in Mbits/sec |

Achievable bandwidth is defined in the NM-WG Characteristics document.
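For a single long test, the value property can be related to numBytes and duration with simple arithmetic. The Python sketch below assumes that numBytes counts bytes of test traffic and that 1 Mbit means 10**6 bits; the function name is illustrative.

```python
def achieved_mbits_per_sec(num_bytes, duration_seconds):
    """Convert a byte count and a duration into Mbits/sec (1 Mbit = 10**6 bits)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return num_bytes * 8 / duration_seconds / 1e6
```

For example, 125,000,000 bytes moved in 10 seconds corresponds to 100 Mbits/sec under these assumptions.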

Q: should "includesDisk" be a property, or a characteristic (i.e. path.bandwidth.achievable.TCP.disk2disk)?

 

path.bandwidth.achievable.UDP

E2EAchievableUDPStatistics (subclass of ??)

Property | Type | Requirement Level | Description | CIM ClassOrigin
numBytes | uint32 | O | amount of test traffic |
duration | real32 | O | how many seconds the test ran |
measurementMethod | string | O | e.g. 1 long test, average of shorter tests, etc. |
numStreams | uint16 | O | number of parallel streams |
includesDisk | boolean | O | memory to memory or disk to disk (need pointer to disk object?) |
bottleneck | string | O | indication of what is the bottleneck (network, CPU, NIC, memory, disk, etc.) |
value | real32 | M | result, in Mbits/sec |

 

Achievable bandwidth tests should all reference end-host information (for both source and destination) that includes the following class:

non-OS specific properties

(Note: all these exist in one of several CIM objects, but it's useful to collect them together for our purposes. We can use CIM "Model correspondence" to point to the source of the data.)

Property | Type | Requirement Level | Description | CIM ClassOrigin
OSversion | string | M | name and version of OS used |
NICtype | uint16 | M | 100BT, 1000BT, etc. |
NICchipSet | uint16 | O? | Intel, SysKonnect, etc. |
MTU | string | M | MTU size set by host (number of bytes) |
disk | string | O | type of disk for disk-to-disk tests |
MemorySize | uint16 | O | amount of memory |
MemorySpeed | string | O | type/speed of memory |
CPU | string | O | type/speed of CPU |
IOBusSpeed | string | O | type/speed of IO Bus |
time | datetime | M | time last measured |

 

OS specific properties

Linux 2.4

Property | Type | Requirement Level | Description | CIM ClassOrigin
net.core.rmem_max | uint32 | O | |
net.core.wmem_max | uint32 | O | |
net.core.rmem_default | uint32 | O | |
net.core.wmem_default | uint32 | O | |
net.ipv4.tcp_rmem | uint32 | O | |
net.ipv4.tcp_wmem | uint32 | O | |
net.ipv4.tcp_mem | uint32 | O | |
net.core.netdev_max_backlog | uint32 | O | |
txqueuelen | uint32 | O | |
?? | | | |
time | datetime | M | time last measured |
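On a Linux host, the kernel parameters in this table can usually be read from files under /proc/sys, where the dotted name maps to a directory path. The Python sketch below is one possible way to collect them; the function names are hypothetical. It returns a list for each parameter because net.ipv4.tcp_rmem, net.ipv4.tcp_wmem, and net.ipv4.tcp_mem each hold three numbers, which the single uint32 type in the table does not capture. txqueuelen is a per-interface setting rather than a sysctl and is left out here.

```python
from pathlib import Path

# Kernel parameters named in the Linux 2.4 table (txqueuelen is not a sysctl).
SYSCTL_NAMES = [
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.core.rmem_default",
    "net.core.wmem_default",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.ipv4.tcp_mem",
    "net.core.netdev_max_backlog",
]


def sysctl_path(name, root="/proc/sys"):
    """Map a dotted sysctl name to its file under the given root."""
    return Path(root).joinpath(*name.split("."))


def read_sysctl(name, root="/proc/sys"):
    """Return the values as a list of ints, or None if the file is unavailable."""
    try:
        text = sysctl_path(name, root).read_text()
    except OSError:
        return None
    return [int(field) for field in text.split()]


def collect_linux_settings(root="/proc/sys"):
    """Read every parameter in SYSCTL_NAMES; missing ones map to None."""
    return {name: read_sysctl(name, root) for name in SYSCTL_NAMES}
```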

 

 

FreeBSD

Solaris

etc.


 

Sample Complete Measurement:

A measurement for delay might include all of the following (several are optional)

path.delay.roundTrip

(need to finish this)

Property | Type | Requirement Level | Description | CIM ClassOrigin
value | uint16 | M | average result, in milliseconds |

 


Mapping of NM-WG Terminology to CIM Terminology:

Note: still need to resolve which terminology to use in this document.

Current NM-WG Term | CIM Term
Characteristic | A subclass of StatisticalData
Measurement Methodology / Tools | Service
Observation | An instance of the subclass of StatisticalData
Nodes | Systems: subclasses are AdminDomain and AutonomousSystem, and ComputerSystem which may be virtual, dedicated to switching or routing, or a user's computer
Paths | NetworkPipes