Network Characterization Service Inquiry Protocol Specification v1.1

Jin Guojun

DSD

Lawrence Berkeley National Laboratory

1 Cyclotron Road, Berkeley, CA 94720

June 2000

Revised: April 18, 2001

Table of Contents

Introduction.

Reference.

Inquiry data structure and request sequence.

Commands.

Status.

Use Cases.

Who should use NCS and How to use NCS.

Platform support.

Introduction

This is a user API that provides a remote information inquiry interface. The protocol is described in the API specification document, named API, under the design/ncsd/api directory. Related data structures and definitions are in the netest/include/ncs-api.h file.

This API specifies a protocol rather than a program calling interface (see Figure 1, NCS API command and return data structure), so clients can use any language to query an NCSD. The API is not language specific. Currently, a C API is available as a library, and a library for any other language can be built by following this C library. For example, a Python interface can be used for web programming, or Perl can be used to query an NCSD. Note that no client program needs to know the NCSD's internal data structures or implementation. A client only needs to know the inquiry sequence and data structures described in the following sections and in the API specification.

Reference

netest/include/ncs-api.h in the NCS distribution is a C header file that contains the concrete definitions of all data structures, inquiry commands, and status codes.

Inquiry data structure and request sequence

Figure 1 (NCS API command and return data structure) shows the NCS inquiry protocol data structure. It is 8 bytes long, and it is the primary data structure used for the inquiry and data retrieval process. The inquiry sequence is very simple: a client sends a command to an NCSD in the data structure shown in Figure 1, and the NCSD returns the inquiry status and/or data to the client in the same data structure. This structure is called an overlapped structure, i.e., a structure cell (a 16-bit field or a 32-bit field) can be used for different purposes. To allow distinct names for different inquiries, a set of overlapped (pseudo) structures, or aliases, is created to mirror the retrieved-data structure (the bottom half of Figure 1), from Figure 3 (return header data type) to Figure 5 (user defined variable length data format).

Any request (command or inquiry) must send this data structure to an NCSD with a meaningful command or inquiry in the nq_command field and, if the command is a path-specific inquiry, with a valid nq_ipaddr (for init) or a valid nq_path_id (after init).

Straight commands and non-path-specific inquiries, such as NQ_DST_IN_CACHE, may not require an nq_ipaddr or nq_path_id, but may require data. In the field list below, d.e.f. stands for "data equivalent field".

nq_command: uint16_t command field
nq_path_id: uint16_t a path ID
    aliases -- nq_short_1 for NQ_SHORT_RET_{1S, 2, 3}
    d.e.f. -- nrh_path_id in ncsd_return_hd_t
nq_datalen: uint16_t for non-initial requests (o) (not used)
    aliases -- nq_short_2 for returning an int16 value (i)
nq_flags: uint16_t for both i/o
    aliases -- nq_hopID a node ID (TTL-1) (o)
               nq_short_3 for returning the third int16 (i)
nq_ipaddr: uint32_t used only for NQ_INIT_QUERY
    alias -- nq_size for the returned data size; equivalent to ncsd_return_hd_t.nrh_size
    d.e.f. -- nsr_ipaddr in ncsd_shared_t, or nrd_ipaddr in ncsd_return_data_t
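The field list above can be pictured as an 8-byte overlay. The following C sketch shows one possible layout, assuming that the two 16-bit fields (nq_datalen, nq_flags) share storage with the 32-bit field (nq_ipaddr), which is what the stated 8-byte size suggests. The type name sk_query_t, the union member names, and the helper sk_make_init are hypothetical; the authoritative definition is the one in ncs-api.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <arpa/inet.h>

/* Hypothetical mirror of the 8-byte inquiry structure (not the real
 * ncs_query_t). Field names follow the list above. */
typedef struct {
    uint16_t nq_command;          /* command (out) or status (in)   */
    uint16_t nq_path_id;          /* alias: nq_short_1              */
    union {
        struct {
            uint16_t nq_datalen;  /* alias: nq_short_2              */
            uint16_t nq_flags;    /* aliases: nq_hopID, nq_short_3  */
        } s;
        uint32_t nq_ipaddr;       /* aliases: nq_size, nq_ret_long  */
    } u;
} sk_query_t;

/* The overlay only works if the compiler adds no padding. */
_Static_assert(sizeof(sk_query_t) == 8, "inquiry structure must be 8 bytes");

/* Fill in an initial request for a destination given as dotted-quad text.
 * init_cmd is the caller's value for NQ_INIT_QUERY (taken from ncs-api.h).
 * Returns 0 on success, -1 if the address cannot be parsed. */
static int sk_make_init(sk_query_t *nq, uint16_t init_cmd, const char *dst)
{
    struct in_addr a;

    assert(nq != NULL && dst != NULL);
    if (inet_pton(AF_INET, dst, &a) != 1)
        return -1;
    memset(nq, 0, sizeof(*nq));
    nq->nq_command = htons(init_cmd);
    nq->u.nq_ipaddr = a.s_addr;   /* inet_pton already yields network order */
    return 0;
}
```

The static assertion is the useful part when porting: a binding in another language has to reproduce the same 8-byte size and the same overlap of the last four bytes.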

"Other embedded structures" shows two data structures for returning hop information: one for maximum and available bandwidth, and one for minimum and average round trip time. These data structures are embedded in both ncsd_shared_t and ncsd_return_data_t.

The bandwidth (ncs_hop_info_t) uses a 16-bit float data type. The exponent uses 4 bits and the mantissa uses the remaining 12 bits; however, the size of the mantissa may vary between 10 and 12 bits, and it is defined as RateBITS in ncs-api.h. Be sure to comply with it when writing APIs in languages other than C. Examples of how to convert between a regular float and this 16-bit float format are given under the NCS inquiry commands NQ_GET_DBOTTLENECK_HOP and NQ_GET_SBOTTLENECK_HOP.

NCS 16-bit exponential data type format
nibble 1-3			nibble 4
mantissa			exponent

A regular reply starts with the ncs_query_t (equivalent to ncsd_return_hd_t) data structure, followed by any data of the size indicated in the nq_size (nrh_size) field.

A short format reply is indicated by a returned status of NQ_SHORT_RET_??:

NQ_SHORT_RET_1S: one short return value in ret_short_1 field
NQ_SHORT_RET_1L: one long return value in ret_long_1 field
NQ_SHORT_RET_2: one short and one long return values
NQ_SHORT_RET_3: three short return values
NQ_SHORT_RET_X: depends on inquiry request

Because ncsd_return_hd_t does not have nrh_short_2 and nrh_short_3 fields, once nrh_stat == NQ_SHORT_RET_3 is detected, use ncs_query_t to get the data if ncsd_return_hd_t is being used instead of ncs_query_t. ncsd_return_hd_t is a symbolic data structure that is used only in the long data format case. Normally, ncs_query_t is used for sending commands and receiving status; see Commands for details.

Some examples of the inquiry sequence are shown in Figure 5 (NCS inquiry sequence). The inquiry process has two slightly different formats: long format and short format. The long format includes three steps:

sending a command
receiving a status and data length
receiving data (in length indicated in nq_size)

The short format has only two steps: sending a command, and receiving a status plus data that fits into the remaining 48 bits of the primary data structure in Figure 1 (64 bits minus the 16 bits used for the status field).

This 48-bit data container can hold data in one of the following formats, as indicated by the returned nq_stat field:

NQ_SHORT_RET_1S: one short return value in ret_short_1 field
NQ_SHORT_RET_1L: one long return value in ret_long_1 field
NQ_SHORT_RET_2: one short and one long return values
NQ_SHORT_RET_3: three short return values
NQ_SHORT_RET_X: depends on inquiry request

That is, the returned nq_stat field (or nrh_stat, depending on which structure is used) may contain an error code; a success status that is one of the above short data formats, which tells what data is in the 48-bit container; or NQ_DATA_READY, which uses nq_size (nrh_size) to indicate the length of the data that follows in the long data-return format.

The example in Figure 5 (NCS inquiry sequence) shows both inquiry formats. More information can be found in ncs-api.h.

Note that all data are in network byte order, and the above example does not convert them!
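The short-format rules above can be sketched as a small decoder. This is a minimal illustration under stated assumptions, not the ncsCapi implementation: the numeric status values in the enum are placeholders (the real ones are in ncs-api.h), the names sk_short_reply_t and sk_decode_short are hypothetical, and the byte offsets assume the 16-bit status is followed by short 1, with shorts 2 and 3 sharing storage with the 32-bit long value.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <arpa/inet.h>

/* Placeholder status codes; the real values are defined in ncs-api.h. */
enum { SK_SHORT_RET_1S = 1, SK_SHORT_RET_1L, SK_SHORT_RET_2, SK_SHORT_RET_3 };

typedef struct {
    int      n_short;   /* how many entries of s[] are valid */
    int      has_long;  /* non-zero if l is valid            */
    uint16_t s[3];      /* short values, host byte order     */
    uint32_t l;         /* long value, host byte order       */
} sk_short_reply_t;

/* Read a 16-bit network-order value from an unaligned buffer. */
static uint16_t sk_get16(const unsigned char *p)
{
    uint16_t v;
    memcpy(&v, p, sizeof v);
    return ntohs(v);
}

/* Read a 32-bit network-order value from an unaligned buffer. */
static uint32_t sk_get32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return ntohl(v);
}

/* Decode an 8-byte reply held in buf. Returns 0 on success, or -1 if the
 * status is not one of the short-format codes handled here. */
static int sk_decode_short(const unsigned char buf[8], sk_short_reply_t *out)
{
    assert(buf != NULL && out != NULL);
    memset(out, 0, sizeof *out);
    switch (sk_get16(buf)) {
    case SK_SHORT_RET_1S:               /* one short */
        out->n_short = 1;
        out->s[0] = sk_get16(buf + 2);
        return 0;
    case SK_SHORT_RET_1L:               /* one long */
        out->has_long = 1;
        out->l = sk_get32(buf + 4);
        return 0;
    case SK_SHORT_RET_2:                /* one short and one long */
        out->n_short = 1;
        out->s[0] = sk_get16(buf + 2);
        out->has_long = 1;
        out->l = sk_get32(buf + 4);
        return 0;
    case SK_SHORT_RET_3:                /* three shorts */
        out->n_short = 3;
        out->s[0] = sk_get16(buf + 2);
        out->s[1] = sk_get16(buf + 4);
        out->s[2] = sk_get16(buf + 6);
        return 0;
    default:
        return -1;
    }
}
```

Every multi-byte value is converted from network byte order as it is extracted, which is the step the note above warns is easy to forget.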

Commands

All client requests should use a command/reply paired routine

SendCmd_RecvStat(fd, nq_p, command)

to send a command, because almost every command replies with either a status or a reverse command in nq. This routine guarantees that the reply is read, which prevents the command channel from being blocked by an unread TCP reply.

Additional data can then be transferred with read/write.
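The paired send-then-receive pattern can be sketched as follows. This is a hypothetical stand-in written for illustration, not the SendCmd_RecvStat routine shipped in the C library: the function names, the choice to operate on a raw 8-byte buffer, and the return convention (first 16-bit field of the reply in host order, or -1 on I/O failure) are all assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <arpa/inet.h>

#define SK_NQ_SIZE 8   /* size of the inquiry structure per this specification */

/* Read exactly len bytes, retrying on short reads and EINTR. */
static int sk_read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0)
            return -1;                   /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Write exactly len bytes, retrying on short writes and EINTR. */
static int sk_write_full(int fd, const void *buf, size_t len)
{
    const unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Store cmd in the first two bytes of nq, send all 8 bytes, then always
 * read the 8-byte reply back into nq so no reply is left unread. */
static int sk_send_cmd_recv_stat(int fd, unsigned char nq[SK_NQ_SIZE], uint16_t cmd)
{
    uint16_t w = htons(cmd);

    assert(nq != NULL);
    memcpy(nq, &w, sizeof w);
    if (sk_write_full(fd, nq, SK_NQ_SIZE) < 0)
        return -1;
    if (sk_read_full(fd, nq, SK_NQ_SIZE) < 0)
        return -1;
    memcpy(&w, nq, sizeof w);
    return ntohs(w);
}
```

Looping on short reads matters on a TCP stream, where a single read() is not guaranteed to return the whole 8-byte reply.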

Precaution --

The nq_path_id field in the ncs_query_t data structure used for an inquiry must be valid, or 0 if unknown, for most QUERY related commands. This field does not need hton?/ntoh?() conversion on the client side, because clients never use it for anything except returning it to the daemon. However, if this field is used to return another type of data, it may need ntohs() on the client side. In general, when using this field for other purposes, use its alias, nq_short_1.

MACRO --

Daemon side:

#define reply_nq(tcp, nq, status) { \
    nq.nq_command = htons(NQ_STATUS); \
    nq.nq_flags = htons(status); \
    write(tcp, &nq, sizeof(nq)); \
}

NQ_INIT_QUERY

All clients contacting an NCSD for an inquiry must start with this command and a valid destination IP, followed by level-2 commands.

To implement this command, please refer to ncsC_example.c.

Daemon side:

Unless there is a memory failure, it always returns NS_READY.

Client side:

Once the return status == NS_READY is confirmed, issue additional inquiry requests.

NQ_DST_IN_CACHE

Query how many paths have been cached at a specific NCSD. This is implemented in the ncsCapi library as:

int NC_Query(pc_d_t *pcd_p)

Because this inquiry is not path specific, it does not need to be preceded by an NQ_INIT_QUERY command. That is, this inquiry can be sent directly to any NCSD.

Daemon side:

To simplify the process, do NOT return the status by using the standard reply_nq --

    nq.nq_command = htons(NQ_STATUS);
    nq.nq_flags = htons(status);

Instead, simply return a command by using the standard data return procedure -- 4-Byte command/data size + nq_size-Byte data:

    nq.nq_command = htons(NS_DONE);
    nq.nq_size = htonl(npcd * sizeof(*ncip));
    write(ncsd_ctl, &nq, sizeof(nq));
    if (!npcd)
        break; /* nq.nq_size is zero */
    /* if non-zero, send cache info */
    write(ncsd_ctl, ncip, npcd * sizeof(*ncip));

Client side:

Decode the data in the above order. If the NCSD cache is empty, nq.nq_size will be zero.
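A client-side reader for this header-plus-payload reply might look like the sketch below. It assumes the size sits in the last four bytes of the 8-byte header in network byte order; the function name sk_recv_long and the 1 MB cap are choices made for this illustration, and the payload is left opaque because the cache entry layout is defined in ncs-api.h rather than here.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>

#define SK_MAX_PAYLOAD (1u << 20)   /* arbitrary safety cap for this sketch */

/* Read exactly len bytes, retrying on short reads and EINTR. */
static int sk_read_full(int fd, void *buf, size_t len)
{
    unsigned char *p = buf;
    while (len > 0) {
        ssize_t n = read(fd, p, len);
        if (n == 0)
            return -1;                   /* peer closed the connection */
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read the 8-byte header, then the payload it announces.
 * On success returns 0; *data is malloc'd (caller frees) or NULL when the
 * announced size is zero, e.g. for an empty cache. Returns -1 on error. */
static int sk_recv_long(int fd, uint16_t *cmd, void **data, uint32_t *size)
{
    unsigned char hdr[8];
    uint16_t c;
    uint32_t n;
    void *p;

    assert(cmd != NULL && data != NULL && size != NULL);
    *data = NULL;
    *size = 0;
    if (sk_read_full(fd, hdr, sizeof hdr) < 0)
        return -1;
    memcpy(&c, hdr, sizeof c);
    memcpy(&n, hdr + 4, sizeof n);
    *cmd = ntohs(c);
    n = ntohl(n);
    if (n == 0)
        return 0;                        /* nothing follows the header */
    if (n > SK_MAX_PAYLOAD)
        return -1;                       /* refuse implausible sizes */
    p = malloc(n);
    if (p == NULL)
        return -1;
    if (sk_read_full(fd, p, n) < 0) {
        free(p);
        return -1;
    }
    *data = p;
    *size = n;
    return 0;
}
```

The number of cached entries is then the returned size divided by the size of one cache entry, mirroring the daemon's npcd * sizeof(*ncip).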

NQ_REMOVE_QUERY

Remove a path from a specific NCSD. This command can be issued only by the initiator or the ncsd host.

Implemented as RemovePath(pc_d_t *pcd_p) in ncsCapi library.

Daemon side:

Check whether the target exists. If not, reply_nq(tcp, nq, NS_NONEXIST).

Check whether the requester is the initiator or the daemon host. If so, remove the path; otherwise go to the error handler (rpy_fatal).

After successfully removing a path, NCSD will close this connection; otherwise, the connection will be left open.

NQ_STATUS

Daemon side:

Return the specific NCSD status by using

reply_nq(tcp, nq, (l_pcd.pd_flags & NQ_STAT_MASK)).

Client side:

Check that nq.nq_command == NQ_STATUS, then examine the value of nq.nq_flags.

NCHG_NumProbes

NCHG_BurstSize

Change the NCSD parent configuration parameters. These commands will NOT change the configuration parameters of existing paths.

Daemon side:

Return NS_DONE by reply_nq(tcp, nq, NS_DONE), or reply_nq(tcp, nq, NS_FATAL) if the value is out of range.

NCMD_NCSD_EXIT

Instruct an NCSD to exit and save its cache info. This may not be implemented on all daemons. An alternative way to do so is to use "kill":

# kill -INT `ps axuwg | grep ncsd | awk '{print $2}'`

Client side:

The client needs to read nq back regardless of whether it checks the status, because the daemon side will close the connection and we do not want the connection to linger.

NCMD_MERGE_CACHE

Send a cache file to a NCSD for merging.

Daemon side:

Returns the number of entries that have been merged into the cache in nq.nq_short_1 by reply_nq(tcp, nq, NS_DONE).

Client side:

Must read nq back regardless of whether the status is checked.

NCMD_En_ACTIVE_SERVICE

NCMD_De_ACTIVE_SERVICE

Not implemented.

NCMD_RESERVED_128

reserved.

NCMD_START_MONITOR and NCMD_STOP_MONITOR

NCMD_START_MONITOR: start monitoring a merged path, or resume a stopped path.

NCMD_STOP_MONITOR: stop monitoring (scheduled re-probing) of a path.

Daemon side:

reply_nq(tcp, nq, NS_DONE), or goto rpy_fatal if permission is denied.

Client side:

See examples at main() in ncsC_example.c:

ncsC_example.c :: main(...) {
    ...
    if (ncmd) {
        ncmd = NCS_ConfirmReq(u_pcd.pd_raw, ncmd, n ? conf_f[n-1] : NULL);
        if (ncmd != NS_DONE)
            prgmerr(-1, "NCMD RET Status = %s\n", NCS_CodeToName(ncmd));
    }
    ...
}

Must read nq for status checking.

NQ_NOP

No real operation. This may be used for passive monitoring and exchanging some information.

----- inquiry commands ----- Level-2 commands

NQ_GET_ALL and NQ_GET_INFO

No definition yet.

No implementation on server side.

NQ_GET_DBOTTLENECK_HOP and NQ_GET_SBOTTLENECK_HOP

Implemented as a ncsCapi library routine

Get_Bottleneck(pc_d_t *pcd_p, ncsd_shared_t *nsp, int Xbn, bool prt)

Xbn can be either NQ_GET_DBOTTLENECK_HOP or NQ_GET_SBOTTLENECK_HOP. The data, in network byte order, is stored in nsp->nsd_hi.nhi_avl_BW for NQ_GET_DBOTTLENECK_HOP, and in nsp->nsd_hi.nhi_max_BW otherwise.

Note: The section "Inquiry data structure and request sequence" describes the NCS float data type (ExpBW). Here are the data conversion macros --

Daemon side: // convert a real bandwidth value to the 16-bit float

#define RealBW_to_ExpBW(rbw, ebw) { \
    unsigned long v = (rbw); /* working copy of the input */ \
    int i;                   /* exponent: number of 1K shifts */ \
    for (i = 0; v > MAX_RATE_MANTISSA; ++i) \
        v >>= SHIFT_BITS_4_1K; \
    (ebw) = (i << RateBITS) | v; \
}

Client side: // convert the 16-bit float to an integer

#define ExpBW_to_LongBW(ebw) \
    (((ebw) & RATE_MANTISSA_MASK) << (((ebw) >> RateBITS) * SHIFT_BITS_4_1K))
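The conversion can also be written as plain functions, which makes it easier to port to other languages and to check with concrete numbers. The sketch below assumes RateBITS is 12 and that SHIFT_BITS_4_1K is 10 (one exponent step per factor of 1024); both are assumptions to be checked against ncs-api.h, and the sk_ names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; the real RateBITS and SHIFT_BITS_4_1K are in ncs-api.h. */
#define SK_RATE_BITS      12u
#define SK_MANTISSA_MASK  ((1u << SK_RATE_BITS) - 1u)
#define SK_SHIFT_1K       10u    /* one exponent step = a factor of 1024 */

/* Encode: drop 10 bits at a time until the value fits in the mantissa. */
static uint16_t sk_real_to_exp(uint64_t bw)
{
    unsigned exp = 0;

    while (bw > SK_MANTISSA_MASK) {
        bw >>= SK_SHIFT_1K;
        ++exp;
    }
    assert(exp <= 0xFu);                 /* must fit the 4-bit exponent */
    return (uint16_t)((exp << SK_RATE_BITS) | (unsigned)bw);
}

/* Decode: shift the mantissa back up. Bits dropped while encoding are
 * lost, so the result is a truncated approximation of the original. */
static uint64_t sk_exp_to_long(uint16_t ebw)
{
    unsigned shift = (unsigned)(ebw >> SK_RATE_BITS) * SK_SHIFT_1K;

    if (shift > 63u)
        return UINT64_MAX;               /* not representable in 64 bits */
    return (uint64_t)(ebw & SK_MANTISSA_MASK) << shift;
}
```

Under these assumptions, 100,000,000 encodes with exponent 2 and mantissa 95, and decodes to 99,614,720, which illustrates the precision lost to truncation. Values received from a daemon would first need ntohs(), since the data is stored in network byte order.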

NQ_GET_BOTTLENECKS

Not implemented on the server (daemon) side. In C, it is currently substituted by Get_Bottleneck(...), which handles both NQ_GET_DBOTTLENECK_HOP and NQ_GET_SBOTTLENECK_HOP.

NQ_GET_RTT

The C API uses Get_RTT_n_TCPWin() to cover both this command and the TCP window size.

Returns both the minimum RTT (in nri_rtt) and the average RTT (in nri_avg_rtt) in standard (long) format -- 4-Byte NQ + N-Byte NR. These values are in units of 0.1 ms!

NQ_GET_RTTS

Returns RTT for each hop (node).

No implementation on server side. Use NQ_GET_RTT or NQ_GET_HOP_INFO for the moment.

NQ_GET_HOP_INFO

Implemented as an information printing routine

bool Get_HopInfo(pc_d_t *pcd_p, ncsd_return_data_t *nrp, bool prt)

in the ncsCapi library. This routine can be modified to allocate a hop_info_t for each link and store the queried data there for returning. For the inquiry, nq.nq_hopID contains the node ID and nq.nq_path_id contains the path ID.

NQ_GET_TOTAL_HOPS

Implemented as Get_N_Hops(pc_d_t *pcd_p) in ncsCapi library.

Gets the total number of hops for the path (pc_d_t *pcd_p). The value is stored in pcd_p->pd_nhops, taken from nq.nq_short_1 as returned by the daemon in both of the forms described below. If the last-update timestamp requires high resolution (sec + usec), it is returned in standard form -- nq.nq_size (in 4-Byte NQ) followed by N-Byte NR: ncsd_shared_t. Otherwise, the timestamp is returned in nq.nq_ret_long in shortcut form.

NQ_GET_BEST_TxWIN:

Returns the best TCP transmission (Tx) window size in KBytes. The value is in nq_p->nq_ret_long (or nrh->nrh_size), returned in shortcut format -- 4-Byte quick reply. It is integrated in Get_RTT_n_TCPWin().

Status

NS_INITIALIZED

A path connection exists, so further inquiries can proceed.

NS_PROBING

The specified path is being probed. Inquiries need to wait until probing finishes.

NS_READY

This path is ready for inquiry.

NS_STOPPED

Scheduled re-probing of this path has been stopped, but inquiries are accepted for getting the latest monitoring status.

NS_DONE

This confirms that a previous command/inquiry completed successfully.

NS_NONEXIST

The requested path does not exist for inquiry.

NS_RTCHANGE

Routing path changed during the path probing.

NS_UNREACH

The destination is not reachable (due to not alive or other problems).

NS_FATAL

The command or inquiry failed because it is either unimplemented or was given wrong parameters. See the specific command for detailed error information.

NS_MULTIP

A multiple-path inquiry is allowed.

NR_COMPLETED

A pseudo bit-mask indicator (Do NOT Use it). Use NR_PARTIAL for real mask operation.

NR_PARTIAL

A bit-mask for checking whether a task sent to the NCS server (daemon) is completed or only partially completed.

Use Cases

This section describes what NCS is designed for.

Who should use NCS and How to use NCS

The NCSD is designed to be a generic network information service. It is intended for applications that need the best network throughput, such as ftp clients/servers, network storage systems, and remote file systems; for applications that need QoS, such as available-bandwidth-based queuing; for applications that need to control network traffic, such as adaptive gateways; and for users who need to know the network status, e.g., system administrators and network managers.

Usually, system administration and network diagnosis require frequent network probes. In this case, the command line tool pipechar may be more convenient than rescheduling an NCSD for re-probing. Below are some NCS use cases showing inquiry procedures by both program and command line methods.

Inquire the optimized TCP congestion window size for best TCP throughput:

Get_TCP_CongestWindow(dest_IP/NAME)
{
    ncs_query_t nq;
    int tcp = open a socket(TCP, NCSD_SERVICE_PORT);
    return error if fails;
    gethostbyname(dest_IP/NAME)
    Loop { // initialize inquiry for the path to destination
        copy dest IP into nq.nq_ipaddr;
        n = SendCmd_RecvStat(tcp, &nq, NQ_INIT_QUERY);
        if (n > 0) {
            if (nq.nq_command == NQ_STATUS) {
                if (nq.nq_flags == NS_PROBING)
                    continue; // not ready yet, so go back to loop
                break; // data is ready
            }
        }
    }
    nq.nq_path_id = 0; // must be valid even if it is not used!
    if ((value = SendCmd_RecvStat(tcp, &nq, NQ_GET_BEST_TxWIN)) >= 0)
        value = ntohl(nq.nq_ret_long);
    // otherwise, value = error code
    close(tcp);
    return value;
}
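The initialization loop in the example above retries for as long as the daemon reports NS_PROBING. A bounded variant of that loop is sketched below, with the network round trip abstracted behind a callback so the control flow can be shown on its own. The status values, the callback type, and every sk_ name are hypothetical, and limiting the number of attempts is a design choice of this sketch rather than something the specification requires.

```c
#include <assert.h>
#include <stddef.h>

/* Placeholder status codes; the real values are defined in ncs-api.h. */
enum { SK_NS_PROBING = 1, SK_NS_READY = 2 };

/* Performs one NQ_INIT_QUERY round trip and returns the status in host
 * byte order, or a negative value on I/O error. */
typedef int (*sk_init_fn)(void *ctx);

/* Repeat the init inquiry while the daemon is still probing the path.
 * pause_fn, if not NULL, is called between attempts (e.g. to sleep).
 * Returns the final status, or -1 on I/O error or when the path is still
 * being probed after max_tries attempts. */
static int sk_wait_ready(sk_init_fn init_once, void *ctx, int max_tries,
                         void (*pause_fn)(void))
{
    int i;

    assert(init_once != NULL && max_tries > 0);
    for (i = 0; i < max_tries; ++i) {
        int st = init_once(ctx);
        if (st < 0)
            return -1;                   /* I/O failure */
        if (st != SK_NS_PROBING)
            return st;                   /* ready, or some other final status */
        if (pause_fn != NULL)
            pause_fn();
    }
    return -1;
}

/* Demo callback: reports PROBING until the counter in ctx reaches zero. */
static int sk_fake_init(void *ctx)
{
    int *remaining = ctx;

    if (*remaining > 0) {
        --*remaining;
        return SK_NS_PROBING;
    }
    return SK_NS_READY;
}
```

In a real client the callback would wrap the command/reply exchange on the NCSD socket, and pause_fn would sleep briefly so the daemon is not polled in a tight loop.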

A program example of getting the dynamic bottleneck of a path for congestion control:

GetDynamicBottleneck(dest)
{
    ncs_query_t nq;
    ncsd_shared_t *ns_p = (ncsd_shared_t *)&nq;
    do NQ_INIT_QUERY same as in example 1;
    if (SendCmd_RecvStat(tcp, &nq, NQ_GET_DBOTTLENECK_HOP) >= 0
        && ntohs(nq.nq_stat) == NQ_DATA_READY)
        value = ntohs(ns_p->nsd_hi.nhi_avl_BW);
    else
        value = ERROR;
    close(tcp);
    return value;
}

If multiple inquiries are needed, the NCSD TCP connection and the NQ_INIT_QUERY should be set up outside the inquiry body to reduce overhead. The above examples are simple one-time inquiries.

Use the command line tool to query path information and find the bottleneck (network managers can use this method to analyze a path). The "-l" option is for penetrating firewalls or non-responsive routers/switches:

pipechar -l yukon.mcs.anl.gov

0: localhost [9 hops]

1: 16.0s ir100gw-r2.lbl.gov (131.243.2.1) 0.28 0.74 1.66ms

2: 16.4s er100gw.lbl.gov (131.243.128.5) 0.23 -3.00 5.80ms

3: 16.0s lbl2-gig-e.es.net (198.129.224.2) 0.21 0.26 1.35ms

4: 16.1s snv-nton-lbl2.es.net (134.55.208.182) 0.26 0.20 4.75ms

5: 19.2s chi-s-snv.es.net (134.55.205.102) 0.29 0.47 52.09ms

6: 3.6s anl-chi-ds3.es.net (134.55.208.150) 3.41 4.38 69.05ms

7: 2.4s anl-esanl2.es.net (198.124.254.166) 3.58 4.63 77.14ms

8: 6.6s stardust-msfc-20.mcs.anl.gov(140.221.20.124) 3.54 5.77 66.34ms

9: 7.2s tundra.mcs.anl.gov (140.221.9.176) 3.27 3.79 63.61ms

PipeCharacter statistics: 80.68% reliable

From localhost:

| 261.818 Mbps possible GigE (980.6149 Mbps)

1: ir100gw-r2.lbl.gov (131.243.2.1 )

| 300.797 Mbps unKnown link ??? congested bottleneck <68.4211% BW used>

2: er100gw.lbl.gov (131.243.128.5)

| 325.409 Mbps unKnown link ??? congested bottleneck <66.0377% BW used>

3: lbl2-gig-e.es.net (198.129.224.2)

| 278.840 Mbps unKnown link ??? congested bottleneck <71.8750% BW used>

4: snv-nton-lbl2.es.net (134.55.208.182)

| 990.290 Mbps GigE <13.2198% BW used>

5: chi-s-snv.es.net (134.55.205.102)

| 44.319 Mbps T3 <53.0654% BW used> May get 91.35% congested

6: anl-chi-ds3.es.net (134.55.208.150)

| 19.711 Mbps unKnown link ??? congested bottleneck <55.2448% BW used>

7: anl-esanl2.es.net (198.124.254.166)

| 20.180 Mbps unKnown link ??? congested bottleneck <54.8150% BW used>

8: stardust-msfc-20.mcs.anl.gov (140.221.20.124)

| 21.985 Mbps possible 100BT (95.8805 Mbps)

9: tundra.mcs.anl.gov (140.221.9.176)

An NCS client uses the API to query an NCSD about the same destination (the "-ah" option asks for all hops' info to be returned):

ncsC -ah yukon.mcs.anl.gov

IP = 131.243.2.35

9 hops to yukon.mcs.anl.gov: last update at Fri Jun 1 09:42:04 2001

hop 1: 131.243.2.1: BW avl 413 Mb max 953 Mb; RTT min 0.7 avg 2.0 ms

hop 2: 131.243.128.5: BW avl 255 Mb max 593 Mb; RTT min 0.9 avg 1.1 ms

hop 3: 198.129.224.2: BW avl 381 Mb max 953 Mb; RTT min 0.9 avg 1.1 ms

hop 4: 134.55.208.182: BW avl 259 Mb max 593 Mb; RTT min 3.0 avg 3.3 ms

hop 5: 134.55.205.102: BW avl 940 Mb max 954 Mb; RTT min 50.8 avg 59.5 ms

hop 6: 134.55.208.150: BW avl 18 Mb max 42 Mb; RTT min 53.5 avg 64.6 ms

hop 7: 198.124.254.166: BW avl 17 Mb max 42 Mb; RTT min 53.6 avg 67.4 ms

hop 8: 140.221.20.124: BW avl 17 Mb max 95 Mb; RTT min 54.1 avg 54.7 ms

hop 9: 140.221.9.176: BW avl 20 Mb max 95 Mb; RTT min 54.1 avg 54.4 ms

hop 7: 198.124.254.166: Dynamic bottleneck -- BW 17 Mb

hop 6: 134.55.208.150: Static bottleneck -- BW 42 Mb

Platform support

NCS has currently been tested on the following O.S. platforms:

FreeBSD (best performance)
BSD/OS and possibly all other BSD O.S.s
Linux
Solaris
IRIX
Digital UNIX

It does not and will not run on the T3E, because the T3E compiler has no 16-bit integer type.

The generic NCS functions should compile and run on platforms that comply with the IPv4 standard. Only the kernel timer related functions need the FreeBSD KLD mechanism, so those functions are available only on the FreeBSD platform.
