Schema/Profile for Network Performance Measurements for Grids

version .11 (Sept 8, 2003)

 

This document is a first attempt to define names and properties for the most important network measurements for Grid middleware. This document is not yet complete.


This document describes a set of schemas for publishing network measurement data. The reader is assumed to be familiar with the GGF NM-WG document "A Hierarchy of Network Performance Characteristics for Grid Applications and Services", which defines a classification hierarchy for network measurements that are useful for Grid applications and services. Use of the schemas described in this document should facilitate the development of interoperable Grid services.

As an example of how such network measurements could be used in a Grid environment, consider a Grid file transfer service. Assume that a Grid Scheduler determines that a given file must be copied to site A before a job can run. Several copies of this file are registered in a Data Grid Replica Catalogue, so there is a choice of where to copy the file from. The Grid Scheduler needs to determine the best way to create the new copy and to estimate how long the transfer will take. To do so, it must select the best source (or sources), which requires a prediction of future end-to-end path characteristics between the destination and each possible source. Accurate prediction of the performance obtainable from each source requires measurement of available bandwidth (both end-to-end and hop-by-hop), latency, loss, and other characteristics important to file transfer performance.

Consider a simple example. Network delay information such as that measured using ping (named path.delay.roundTrip in the NMWG "Characteristics" document) cannot be interpreted without a good deal of accompanying information. A number of test parameters must also be published, such as the tool name, the number of samples used, the protocol used, the packet size, and so on. Some of this information is mandatory, and other information is optional.

Here are some sample results.

 


Terminology:

Target: Defined in GGF DAMED WG naming document.

The tables in this document use the following letters to describe the requirement level: (from CIM)

M – Mandatory
O – Optional
C – Conditional (See CIM documentation for explanation)

Types: (from CIM)

string
uint16
uint16[ ] (array of 16-bit ints)
uint32
uint64
real32
boolean
datetime: standard timestamp

All measurements must have an IETF RFC 3339 timestamp and a value.
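As a rough illustration of the timestamp-plus-value requirement, the following Python sketch pairs a value with a timezone-aware timestamp and renders it in an RFC 3339 compatible form. The class name Measurement and its fields are hypothetical and are not part of the schema.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Measurement:
    """Hypothetical minimal record: one timestamp plus one value."""
    timestamp: datetime  # must be timezone-aware
    value: float

    def rfc3339(self) -> str:
        # RFC 3339 timestamps carry an explicit UTC offset, so a naive
        # datetime (no tzinfo) is rejected.
        if self.timestamp.tzinfo is None:
            raise ValueError("timestamp must carry a UTC offset")
        # isoformat() on an aware datetime yields date, "T", time and offset.
        return self.timestamp.isoformat()


m = Measurement(datetime(2003, 9, 8, 12, 0, 0, tzinfo=timezone.utc), 23.5)
print(m.rfc3339(), m.value)
```

Rejecting naive datetimes is a design choice of this sketch: without an offset the timestamp could not be compared across measurement hosts.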

We are using the naming conventions defined by the GGF DAMED working group.

Classes for the following network measurement characteristics are defined in this document.

path.delay.roundTrip
path.delay.oneWay
path.delay.jitter.roundTrip
path.delay.jitter.oneWay

path.loss.roundTrip
path.loss.oneWay

path.reordering.oneWay

path.bandwidth.achievable.TCP
path.bandwidth.achievable.TCP.multiStream
path.bandwidth.achievable.UDP

path.bandwidth.available
path.bandwidth.utilization
path.bandwidth.capacity

All of the above measurement characteristics except bandwidth.achievable can apply to hops as well as to paths, e.g.:

hop.bandwidth.capacity
hop.bandwidth.utilization
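The path-to-hop rule above can be sketched in Python as follows. The function name hop_variants is hypothetical; the sketch assumes a hop name is formed simply by replacing the leading "path" with "hop", and that every characteristic under bandwidth.achievable is excluded.

```python
PATH_CHARACTERISTICS = [
    "path.delay.roundTrip",
    "path.delay.oneWay",
    "path.delay.jitter.roundTrip",
    "path.delay.jitter.oneWay",
    "path.loss.roundTrip",
    "path.loss.oneWay",
    "path.reordering.oneWay",
    "path.bandwidth.achievable.TCP",
    "path.bandwidth.achievable.TCP.multiStream",
    "path.bandwidth.achievable.UDP",
    "path.bandwidth.available",
    "path.bandwidth.utilization",
    "path.bandwidth.capacity",
]


def hop_variants(names):
    """Return hop.* names for every path.* characteristic that allows one."""
    hops = []
    for name in names:
        prefix, _, rest = name.partition(".")
        if prefix != "path" or rest.startswith("bandwidth.achievable"):
            continue  # achievable bandwidth is defined for paths only
        hops.append("hop." + rest)
    return hops


for hop_name in hop_variants(PATH_CHARACTERISTICS):
    print(hop_name)
```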

Additional topology characteristics will be included in a future version of this document.


There are 5 main classes:

NetworkTestTool: describes the measurement tool

NetworkTestCharacteristic: specifies which NMWG characteristic is being measured.

NetworkTestSetting: describes the input parameters and/or built-in testing methodology

NetworkTestInfo: describes the endpoints of the test (hosts, etc.)

NetworkTestResults: describes the output
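To make the relationship between the five classes more concrete, here is a minimal Python sketch using dataclasses. Only a few properties per class are shown, with names borrowed from the property tables in this document. The container class PublishedMeasurement, the characteristicResults dictionary, and all sample values (including the documentation-range IP addresses) are illustrative assumptions, not part of the schema.

```python
from dataclasses import dataclass, asdict


@dataclass
class NetworkTestTool:
    toolName: str        # e.g. "ping"
    measured: bool       # measured vs. estimated


@dataclass
class NetworkTestCharacteristic:
    characteristic: str  # e.g. "path.delay.roundTrip"


@dataclass
class NetworkTestSetting:
    packetType: str      # "ICMP", "UDP" or "TCP"
    numPackets: int


@dataclass
class NetworkTestInfo:
    source: str
    destination: str


@dataclass
class NetworkTestResults:
    endTime: str                  # RFC 3339 timestamp
    characteristicResults: dict   # results specific to the characteristic


@dataclass
class PublishedMeasurement:
    """Hypothetical container grouping one instance of each class."""
    tool: NetworkTestTool
    characteristic: NetworkTestCharacteristic
    setting: NetworkTestSetting
    info: NetworkTestInfo
    results: NetworkTestResults


sample = PublishedMeasurement(
    NetworkTestTool("ping", True),
    NetworkTestCharacteristic("path.delay.roundTrip"),
    NetworkTestSetting("ICMP", 10),
    NetworkTestInfo("192.0.2.1", "198.51.100.7"),
    NetworkTestResults("2003-09-08T12:00:00Z", {"average": 23.5}),
)
print(asdict(sample))
```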

 

NetworkTestTool: (subclass of CIM_Service)

Property | Type | Requirement Level | Description | CIM ClassOrigin
toolName | string | M | name of tool used | SoftwareIdentity
majorVersion | string | O? | major version of tool used | SoftwareIdentity
minorVersion | string | O | minor version of tool used | SoftwareIdentity
Measured | boolean | M | measured (e.g., SNMP) vs. estimated |
measurementMethod | string | O | e.g., SNMP, packet pair, packet train, etc. (e.g., URI pointer to description) |

 

Note: Should "NetworkTestMethodology" be a separate class? Also, measurementMethod will be hard to get right, as tools often use multiple methods or a combination of methods. One solution is to just put a URL to the tool web page here. Q: should samplingMethod or statisticalMethod be separate from measurementMethod?

 

NetworkTestCharacteristic:

Property | Type | Requirement Level | Description | CIM ClassOrigin
Characteristic | string | M | NMWG Characteristic |

Note: a given tool may measure several of these.

 

NetworkTestInfo

Property | Type | Requirement Level | Description | CIM ClassOrigin
source | string | M | Source IP[:port] | IPProtocolEndpoint
destination | string | M | Destination IP[:port] | IPProtocolEndpoint
sourceHostInfo | | | see below | ComputerSystem
sourceHostTimerDevice | | | see below |
destHostInfo | | | | ComputerSystem
destHostTimerDevice | | | |

 

NetworkTestSetting: (subclass of CIM_StatisticalData ?)

The first group of properties is for UDP/ICMP-based tests, and the second group is for TCP-based tests. Some properties only make sense for one kind of test.

Property | Type | Requirement Level | Description | CIM ClassOrigin
packetType | string | M | ICMP or UDP or TCP |
packetSize | uint16 | O | size of test packet |
numPackets | uint16 | O | number of test packets |
packetSpacing | boolean | O | (UDP) Poisson or periodic |
packetGap | real32 | C | (UDP) time between test packets, in seconds (for periodic tests) |
portNum | uint16 | O | port number used for test |
ToS | uint16 | O | Type of Service (IP precedence) |
ProtocolID | uint8 | O | IPv4 or IPv6 |
DSCP | uint8 | O | differentiated services code point |
FlowLabel | uint8 | O | IPv6 option for QoS |
lossThreshold | uint16 | M | (UDP) the threshold used to distinguish between a large finite delay and loss |
numBytes | uint32 | O | (TCP) amount of test traffic |
duration | real32 | O | how many seconds the test ran |
TCPBufferSize | uint32 | M | (TCP) size of TCP buffers used |
TCPType | string | O | (TCP) Reno, Vegas, HSTCP, ScalableTCP, etc. |
numStreams | uint16 | O | number of parallel streams |
includesDisk | boolean | O | memory to memory or disk to disk |
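The requirement levels in this table could be checked along the following lines. This Python sketch rests on several assumptions that the document does not settle: that lossThreshold is mandatory only for UDP/ICMP tests, that TCPBufferSize is mandatory only for TCP tests, that the conditional packetGap is required exactly when the test is periodic, and that packetSpacing is carried here as the string "periodic" or "Poisson" rather than a boolean. The function name missing_settings is hypothetical.

```python
def missing_settings(setting):
    """Return names of required properties absent from a NetworkTestSetting dict."""
    required = ["packetType"]
    packet_type = setting.get("packetType")
    if packet_type in ("UDP", "ICMP"):
        required.append("lossThreshold")   # assumed mandatory for UDP/ICMP only
    elif packet_type == "TCP":
        required.append("TCPBufferSize")   # assumed mandatory for TCP only
    # Assumption: packetSpacing is the string "periodic" or "Poisson".
    if setting.get("packetSpacing") == "periodic":
        required.append("packetGap")       # conditional (C) property
    return [name for name in required if name not in setting]


print(missing_settings({"packetType": "UDP", "packetSpacing": "periodic"}))
print(missing_settings({"packetType": "TCP", "TCPBufferSize": 65536}))
```

Returning the list of missing names, rather than raising on the first one, lets a publisher report every omission at once.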

 

For more information on the details of these properties, see the following IETF documents:

One way Delay: http://www.ietf.org/rfc/rfc2679.txt

Round Trip Delay: http://www.ietf.org/rfc/rfc2681.txt


NetworkTestResults

All results use this base class:

Property | Type | Requirement Level | Description | CIM ClassOrigin
startTime | datetime | O | time test was started |
endTime | datetime | M | time test was completed |
CharacteristicResults | | | specific results for this characteristic (see below) |

 

Different characteristics require different results, listed below:

path.delay.roundTrip
path.delay.oneWay

DelayResults (subclass of NetworkTestResults)

Property | Type | Requirement Level | Description | CIM ClassOrigin
percentile | uint16[ ] | O | array of percentiles, e.g., the 50th percentile is the median (see RFC 2679) |
percentileValue | real32[ ] | C | value for each percentile above, in milliseconds |
median | real32 | O | median of all measurements in test (optional, but strongly encouraged) |
average | real32 | M | average result in milliseconds |
minimum | real32 | O | minimum of all measurements in test |
maximum | real32 | O | maximum of all measurements in test |
StdDev | real32 | O | standard deviation of the results |

Note: everyone agreed that the median is more useful than the average. However, most tools currently report only the average, so average is the only mandatory value. Current idea: for average, median, maximum, etc., use -1 to indicate a value greater than the loss threshold. This needs more discussion. Are there standard CIM properties for these sorts of statistics?
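The DelayResults properties could be derived from raw delay samples roughly as follows. This is a Python sketch under stated assumptions: percentiles use the nearest-rank rule (RFC 2679 should be consulted for the exact definition), StdDev is the population standard deviation, and samples beyond the loss threshold are assumed to have been removed beforehand, since the -1 convention is still under discussion. The function name delay_results is hypothetical.

```python
import statistics


def delay_results(samples_ms, percentiles=(50, 90)):
    """Summarise delay samples (milliseconds) into DelayResults-style properties."""
    if not samples_ms:
        raise ValueError("at least one delay sample is required")
    ordered = sorted(samples_ms)
    n = len(ordered)

    def nearest_rank(p):
        # Integer ceiling of p * n / 100, clamped so that p = 0 maps to rank 1.
        rank = max(1, (p * n + 99) // 100)
        return ordered[rank - 1]

    return {
        "percentile": list(percentiles),
        "percentileValue": [nearest_rank(p) for p in percentiles],
        "median": statistics.median(ordered),
        "average": statistics.mean(ordered),
        "minimum": ordered[0],
        "maximum": ordered[-1],
        "StdDev": statistics.pstdev(ordered),
    }


results = delay_results([10.0, 12.0, 11.0, 50.0, 13.0])
print(results)
```

With one outlier at 50 ms in the sample above, the average is pulled well above the median, which illustrates why the median is the more useful summary.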

For more information on the details of these properties, see the following IETF documents:

One way Delay: http://www.ietf.org/rfc/rfc2679.txt

Round Trip Delay: http://www.ietf.org/rfc/rfc2681.txt



path.loss.roundTrip
path.loss.oneWay

LossResults (subclass of NetworkTestResults)

Property | Type | Requirement Level | Description | CIM ClassOrigin
Loss-Distance | uint16 | O | number of packets since the previous loss (see RFC 3357) |
Loss-Period | uint16 | O | number of groups of lost packets (see RFC 3357) |
Noticeable-Rate | uint16 | O | percent of lost packets for which the distance between the lost packet and the previously lost packet is no greater than the "loss constraint" (see RFC 3357) |
Period-Total | uint16 | O | total number of loss periods (see RFC 3357) |
Period-Lengths | uint | O | number of packets in a burst of loss (see RFC 3357) |
Inter-Loss-Period-Lengths | uint | O | number of packets between bursts of loss (see RFC 3357) |
NumPacketsLost | uint | O | number of packets lost during the test |
PercentLoss | real32 | M | average packet loss (in percent) (see RFC 2680) |
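As an illustration of how some of the LossResults properties relate to each other, the following Python sketch derives them from a per-packet arrival record. It follows the short descriptions in the table above rather than the full RFC 3357 definitions, which should be consulted for anything authoritative. The function name loss_results and the boolean input format are assumptions.

```python
def loss_results(received):
    """received: booleans in send order, True if the packet arrived."""
    total = len(received)
    lost = sum(1 for ok in received if not ok)
    period_lengths = []  # number of packets in each burst of consecutive losses
    run = 0
    for ok in received:
        if not ok:
            run += 1
        elif run:
            period_lengths.append(run)
            run = 0
    if run:  # the test ended in the middle of a loss burst
        period_lengths.append(run)
    return {
        "NumPacketsLost": lost,
        "PercentLoss": 100.0 * lost / total if total else 0.0,
        "Period-Total": len(period_lengths),
        "Period-Lengths": period_lengths,
    }


arrivals = [True, False, False, True, True, False, True, True, True, True]
print(loss_results(arrivals))
```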

For more information on the details of these properties, see the following IETF documents:

One-way Loss: http://www.ietf.org/rfc/rfc2680.txt

Loss Patterns: http://www.ietf.org/rfc/rfc3357.txt


path.delay.jitter.oneWay
path.delay.jitter.roundTrip

Note: someone who understands this better needs to flesh this out

JitterResults (subclass of NetworkTestResults)

Property | Type | Requirement Level | Description | CIM ClassOrigin
percentile | uint | O | e.g., the 50th percentile is the median (see RFC 3393) |
peak-to-peak-ipdv | uint | O | (see RFC 3393) |
AverageJitter | real32 | M | number of ms |

For more information on the details of these properties, see the following IETF documents:

Delay Variation (Jitter): http://www.ietf.org/rfc/rfc3393.txt



path.reordering.oneWay

ReorderingResults (subclass of NetworkTestResults)

Property | Type | Requirement Level | Description | CIM ClassOrigin
lateTime | uint16 | O | (see IPPM draft) |
gap | uint16 | O | number of positions out of order |
PercentReordered | uint16 | M | result, in percent |

For more information on the details of these properties, see the following IETF documents:

http://www.ietf.org/internet-drafts/draft-ietf-ippm-reordering-02.txt


path.bandwidth.available
path.bandwidth.utilization
path.bandwidth.capacity

BandwidthResults (subclass of NetworkTestResults)

Property | Type | Requirement Level | Description | CIM ClassOrigin
confidence | real32 | O | the tool's reported confidence of this measurement |
bottleneck | target | O | which hop is the bottleneck ("tight link") |
bandwidth | real32 | M | result, in Mbits/sec |

 


path.bandwidth.achievable.TCP
path.bandwidth.achievable.TCP.multiStream
path.bandwidth.achievable.UDP

AchievableResults (subclass of NetworkTestResults)

Property | Type | Requirement Level | Description | CIM ClassOrigin
bottleneck | string | O | indication of what is the bottleneck (network, CPU, NIC, memory, disk, etc.) |
throughput | real32 | M | result, in Mbits/sec |

Achievable bandwidth is defined in the NM-WG Characteristics document.

 

path.bandwidth.available: Percentage of the time that a path is up

AvailabilityStatistics (subclass of NetworkTestResults)

Property | Type | Requirement Level | Description | Units | CIM ClassOrigin
MTBF | uint32 | O | Mean Time Between Failures | seconds |
MTTR | uint32 | O | Mean Time To Repair | seconds |
Downs | uint32 | O | Number of periods when path was not available | periods |
Median-Outage-Length | uint32 | O | Median outage length | seconds |
PercentUp | real32 | M | percent availability = 100 * (#available cycles / total cycles) | percent |
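A few of these statistics could be computed from a series of per-cycle up/down observations as in the Python sketch below. It reads PercentUp as the percentage of cycles in which the path was available, an assumption based on the property name, and it leaves out MTBF and MTTR because their exact definitions are not given here. The function name availability_statistics and the fixed cycle length are also assumptions.

```python
import statistics


def availability_statistics(up_samples, cycle_seconds):
    """up_samples: one boolean per measurement cycle, True if the path was up."""
    total = len(up_samples)
    if total == 0:
        raise ValueError("at least one measurement cycle is required")
    outages = []  # length of each outage, in cycles
    run = 0
    for up in up_samples:
        if not up:
            run += 1
        elif run:
            outages.append(run)
            run = 0
    if run:  # the observation window ended during an outage
        outages.append(run)
    up_cycles = sum(1 for up in up_samples if up)
    stats = {
        "PercentUp": 100.0 * up_cycles / total,
        "Downs": len(outages),
    }
    if outages:
        stats["Median-Outage-Length"] = statistics.median(outages) * cycle_seconds
    return stats


observed = [True] * 6 + [False] * 2 + [True] + [False]
print(availability_statistics(observed, cycle_seconds=60))
```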

 


Topology related Classes

path.hoplist: list of routers along a path

HopList (subclass of ??)

Property | Type | Requirement Level | Description | Units | CIM ClassOrigin
routerInterfaceList | string[ ] | M | array of IP addresses of the routers on the path | |
??? | | | | |
hopcount | uint32 | M | hopcount | |

 

still need to add:

Forwarding Table and Policy description

what else??

 


 

TimerDevice (being added to CIM V2.9, but will look something like this)

Property | Type | Requirement Level | Description | CIM ClassOrigin
timeResolution | real32[] | O | resolution of the timestamp of source and destination (in seconds) |
timeOffsetFromUTC | real32 | O | distance from GPS time |
timeAccuracy | real32[] | O | accuracy of the timestamp (src and dest) (in seconds) |
timeAccuracyMethod | string[] | O | clock sync method (src and dst), e.g., NTP, GPS, AFS, etc. |

Note: It is very difficult to accurately determine timeAccuracy without something like a GPS clock on the measurement host. However, it is important to be able to trust the timestamps, so although these properties are optional, their use is strongly encouraged.

 

non-OS specific properties

(Note: all these exist in one of several CIM objects, but it's useful to collect them together for our purposes. We can use CIM "Model correspondence" to point to the source of the data.)

Property | Type | Requirement Level | Description | CIM ClassOrigin
OStype | string | M | type of OS used | OperatingSystem
OSversion | string | M | version of OS used | OperatingSystem
NICtype | uint16 | M | 100BT, 1000BT, etc. |
NICchipSet | uint16 | O? | Intel, syskonnect, etc. |
MTU | string | M | MTU size set by host (number of bytes) |
disk | string | O | type of disk for disk-to-disk tests |
MemorySize | uint16 | O | amount of memory |
MemorySpeed | string | O | type/speed of memory |
CPU | string | O | type/speed of CPU |
IOBusSpeed | string | O | type/speed of IO Bus |
time | datetime | M | time last measured |

 

 

OS specific properties

Linux 2.4

Property | Type | Requirement Level | Description | CIM ClassOrigin
net.core.rmem_max | uint32 | O | |
net.core.wmem_max | uint32 | O | |
net.core.rmem_default | uint32 | O | |
net.core.wmem_default | uint32 | O | |
net.ipv4.tcp_rmem | uint32 | O | |
net.ipv4.tcp_wmem | uint32 | O | |
net.ipv4.tcp_mem | uint32 | O | |
net.core.netdev_max_backlog | uint32 | O | |
txqueuelen | uint32 | O | |
?? | | | |
time | datetime | M | time last measured |
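On a Linux host, most of the properties above are sysctl values exposed as files under /proc/sys, with the dots in the name replaced by directory separators. A collection tool might read them as in the Python sketch below. The function name read_sysctl and the root parameter are hypothetical. txqueuelen is a per-interface setting rather than a sysctl, so it is not covered. Some entries, such as net.ipv4.tcp_rmem, hold several integers, so the sketch returns a list.

```python
from pathlib import Path

SYSCTLS = [
    "net.core.rmem_max",
    "net.core.wmem_max",
    "net.core.rmem_default",
    "net.core.wmem_default",
    "net.ipv4.tcp_rmem",
    "net.ipv4.tcp_wmem",
    "net.ipv4.tcp_mem",
    "net.core.netdev_max_backlog",
]


def read_sysctl(name, root="/proc/sys"):
    """Return the integer fields of a sysctl such as net.core.rmem_max."""
    # net.core.rmem_max maps to <root>/net/core/rmem_max
    path = Path(root).joinpath(*name.split("."))
    return [int(field) for field in path.read_text().split()]


if __name__ == "__main__":
    for name in SYSCTLS:
        try:
            print(name, read_sysctl(name))
        except (OSError, ValueError):
            print(name, "unavailable")  # e.g. not running on Linux
```

The root parameter exists only so that the function can be exercised against a scratch directory instead of the live /proc/sys tree.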

 

 

FreeBSD

Solaris

etc.


 

Sample Complete Measurement:

A measurement for delay might include all of the following (several are optional)

path.delay.roundTrip

(need to finish this)

Property | Type | Requirement Level | Description | CIM ClassOrigin
value | uint16 | M | average result, in milliseconds |

 


Mapping of NM-WG Terminology to CIM Terminology:

Note: still need to resolve which terminology to use in this document.

Current NM-WG Term | CIM Term
Characteristic | A subclass of StatisticalData
Measurement Methodology / Tools | Service
Observation | An instance of the subclass of StatisticalData
Nodes | Systems: subclasses are AdminDomain and AutonomousSystem, and ComputerSystem which may be virtual, dedicated to switching or routing, or a user's computer
Paths | NetworkPipes