Tutorial for using netest and pipechar



	Brief note on the difference between pipechar and pathchar
pipechar and pathchar give similar-looking results, and both will tell you which segment is the bottleneck. However, they are in fact quite different tools and should not be confused. pathchar attempts to accurately report the bandwidth (OC-3 or slower) and loss characteristics of every hop in the path. By default, pipechar accurately reports the slowest hop, limited by the NIC speed; results for all segments beyond the slowest segment (a static bottleneck) will not be accurate. However, if the slowest segment is a dynamic bottleneck, pipechar can penetrate that node and report the characteristics of the remaining links well.
For example, if the first hop (the NIC) is the slowest, pipechar results for all other segments will be limited by its speed. This may be meaningless if the goal is to analyze the bandwidth of every link, but it is useful for TCP applications that need to adjust their windows. To measure the bandwidth of every link from a host with a slow NIC, use the advanced pipechar options.

Another significant difference between the tools is the time they take to run. For a typical WAN path of 8 hops, basic pipechar operation takes about 1-2 minutes and advanced pipechar operation takes four (4) minutes longer, whereas pathchar may take up to 1 hour to do the equivalent operation. If you are trying to determine the optimal TCP window size, pipechar is clearly the better tool, since it takes much less time to identify the slowest hop.
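
As a rough illustration of why the slowest hop matters for TCP window sizing, the sketch below applies the common bandwidth-delay product rule of thumb (window = bottleneck bandwidth x round trip time). The function name is hypothetical, and feeding it a bottleneck bandwidth and RTT taken from pipechar output is an assumption; pipechar itself does not compute a window size.

```python
def tcp_window_bytes(bottleneck_mbps, rtt_ms):
    """Bandwidth-delay product in bytes (rule-of-thumb TCP window size)."""
    # Bits that can be in flight on the path during one round trip.
    bits_in_flight = bottleneck_mbps * 1e6 * (rtt_ms / 1e3)
    return bits_in_flight / 8.0


if __name__ == "__main__":
    # Hypothetical inputs: a 100 Mbps bottleneck and a 40 ms round trip time.
    print(round(tcp_window_bytes(100.0, 40.0)), "bytes")
```

A window much smaller than this value leaves the bottleneck idle part of the time; a much larger one mostly adds queueing.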


1. Basic Usage:

Both netest and pipechar have several options that may make them seem hard to understand and use. These options are designed for special case studies and intensive testing, not for general-purpose use. Furthermore, both tools have a degree of adaptive control that lets users get by with as few options as possible. The common usage is fairly simple.

Section 1: pipechar

In this section, let's briefly preview the basic usage of these tools. A simple start:

	pipechar destination
or
	pipechar [-hop 0 | -bot] destination

The second form tells pipechar to report only the bottleneck available bandwidth. The "-hop 0" option instructs pipechar to probe only the bottleneck available bandwidth, and the "-bot" option instructs pipechar to probe all links but report only the bottleneck link.
This is the basic approach to network bandwidth analysis. Occasionally, you may see a negative result at the last hop or an odd value at a middle hop (which should not happen), because pipechar uses only a subset of the network analysis filter in the NCSD. Simply restart the test and you should get a correct result. If the problem persists, try slowing down the probing as described later in this section. Contact the NCSD team at ncsd#lbl.gov only if odd results keep occurring. Now let's see how to read the output in a short example.
iss-p10.lbl.gov: pipechar ux5
0: localhost [3 hops]
1: ir100gw-r2.lbl.gov                  (131.243.2.1)      0.19  -0.13   0.98ms
2: ir30gw.lbl.gov                      (131.243.128.30)   0.77   0.81   2.52ms
3: ux5.lbl.gov                         (128.3.7.103)      1.02   1.53   3.42ms

PipeCharacter statistic: 1.05% reliable
From localhost:
|       387.097 Mbps GigE (1019.3851 Mbps)

1: ir100gw-r2.lbl.gov              (131.243.2.1 )
|
|       100.716 Mbps  100BT      <24.1413% BW used>
2: ir30gw.lbl.gov                  (131.243.128.30)
|       70.658 Mbps 100BT (100.7303 Mbps)

3: ux5.lbl.gov                     (128.3.7.103 )

re-testing

iss-p10.lbl.gov: pipechar ux5
Warning: Host [ux5] is not alive.
Reduce TTL [-max] if takes long time to probe
0: localhost [3 hops]
1: ir100gw-r2.lbl.gov                  (131.243.2.1)      0.22   0.19   1.02ms
2: ir30gw.lbl.gov                      (131.243.128.30)   0.75   0.79   2.42ms
3: ux5.lbl.gov                         (128.3.7.103)      0.74   0.98   2.79ms

PipeCharacter statistic: 94.05% reliable
From localhost:
|       327.273 Mbps GigE (1019.3851 Mbps)

1: ir100gw-r2.lbl.gov              (131.243.2.1 )
|
|       100.716 Mbps  100BT      <0.2678% BW used>
2: ir30gw.lbl.gov                  (131.243.128.30)
|       96.644 Mbps 100BT (100.7303 Mbps)

3: ux5.lbl.gov                     (128.3.7.103 )

pipechar has two report sections. First, pipechar prints some timings for each hop, which show where the probing is going and that the hop is alive. In this section, three times are reported: minimum packet forwarding time, average packet differential time, and minimum round trip time (RTT). The first and third timings represent how fast packets can travel through this hop; the second gives some error information (it is frequently negative).
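
To make the three timing columns concrete, here is a small sketch that splits one line of the first report section into named fields. The regular expression is an assumption based only on the sample output in this tutorial, and the function and field names are hypothetical; real pipechar output may be laid out differently.

```python
import re

# Matches lines such as:
# "1: ir100gw-r2.lbl.gov    (131.243.2.1)      0.19  -0.13   0.98ms"
HOP_LINE = re.compile(
    r"^\s*(\d+):\s+(\S+)\s+\(\s*([\d.]+)\s*\)"
    r"\s+(-?[\d.]+)\s+(-?[\d.]+)\s+(-?[\d.]+)ms"
)


def parse_hop_line(line):
    """Return the fields of a per-hop timing line, or None if the line does not match."""
    match = HOP_LINE.match(line)
    if match is None:
        return None
    return {
        "hop": int(match.group(1)),
        "host": match.group(2),
        "address": match.group(3),
        "min_forwarding_ms": float(match.group(4)),
        "avg_differential_ms": float(match.group(5)),  # may be negative
        "min_rtt_ms": float(match.group(6)),
    }


if __name__ == "__main__":
    sample = "1: ir100gw-r2.lbl.gov                  (131.243.2.1)      0.19  -0.13   0.98ms"
    print(parse_hop_line(sample))
```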
In the second section, pipechar does hop-by-hop statistical analysis. A higher reliability percentage indicates higher quality of the information collected. A low reliability percentage means that more errors were encountered during the probing and that network utilization was high at the time. In the example above, the reliability is 1.05%: because it took a very long time to finish probing just three (3) hops, there must have been some re-probing and/or an alternative method used to improve the data accuracy. The result is nevertheless nearly accurate when compared with the re-testing result. From the re-testing result, we can see that ux5 sometimes does not respond to probing correctly (or even appears not alive), and that the middle router was much busier (24.14% vs. 0.27%) during the first test.

In the statistics section, bandwidth utilization is analyzed for every hop except the two ends. If the analyzed hop is not a congested bottleneck, the number on the left is the maximum bandwidth that can be used on this link, and the percentage of bandwidth currently in use is on the right. If the hop is congested, the number on the left is the bandwidth currently available on this hop. The statistics reported for the two ends are approximately the maximum interface speed and the link bandwidth.
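
For a hop that is not congested, the two numbers can be combined into an estimate of the bandwidth still unused on the link. The sketch below does this for a statistics line of the form shown in the example output. The regular expression is an assumption based only on the sample lines in this tutorial, the helper name is hypothetical, and the calculation does not apply to a congested hop, where the left-hand number is already the available bandwidth.

```python
import re

# Matches lines such as "|       100.716 Mbps  100BT      <24.1413% BW used>"
STAT_LINE = re.compile(r"\|\s*([\d.]+)\s*Mbps.*<\s*([\d.]+)%\s*BW used>")


def unused_bandwidth_mbps(line):
    """Return (max_mbps, used_percent, unused_mbps), or None if the line has no utilization field."""
    match = STAT_LINE.search(line)
    if match is None:
        return None
    max_mbps = float(match.group(1))
    used_percent = float(match.group(2))
    unused = max_mbps * (1.0 - used_percent / 100.0)
    return max_mbps, used_percent, unused


if __name__ == "__main__":
    first_run = "|       100.716 Mbps  100BT      <24.1413% BW used>"
    retest = "|       100.716 Mbps  100BT      <0.2678% BW used>"
    for line in (first_run, retest):
        print(unused_bandwidth_mbps(line))
```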

Here is more pipechar output analysis data that describes how pipechar works in general. The data were collected at a normal time in the morning. There are three sets of data on this page. The first set contains four test results: three from a 1 Gbps host (regular probe, fast probe, and even faster probe) and one from a 100BT host.
In the fast probe case, the bandwidth between hops 4 and 5 is negative, so the NCSD analysis filter is invoked (hop analyzed: 279.95 : 0.00) to compute the current available bandwidth. Remember that this filter is only partially installed, so results may not match those obtained by querying the NCSD. Also, these are dynamic data that vary from time to time, so do not be surprised when the data change. If they are not changing, you are measuring a lightly used (quiet) network.

Compare the rest of the data to see how results from a fast NIC (1 Gbps) differ from those from a slow NIC (100BT). Remember that this is under normal network conditions; under abnormal conditions, you may be surprised by FSE. More interesting results may be posted here when they can be captured.

Options for pipechar:

	pipechar [-P <maximum probes>] [-S <slow timer>] [-hop #] destination

The default value is 32 for [-P], 234000us for [-S], and 1 for [-hop].
Whenever you need to slow down the probing process to get a more stable available-bandwidth figure, try "-S 567000" or an even larger delay such as "-S 1234567". A large -P value may lengthen the probing period if network traffic fluctuates, so when you increase the number of probes, you may also want to use a larger -S delay. This takes longer to run, but it reduces the effect of network fluctuation, and the result will be more stable and solid. In particular, if you only need the characteristics of a certain segment, you can jump directly to that hop with "-hop hop_number [-max hop_number+n]". This sends data to test only those few hops instead of probing every hop.
pipechar output analysis for busy and problematic networks will be available once we can recollect the data.

Section 2: netest

In this section, netest is used to analyze a problematic router that has been detected by pipechar. As we know, netest is a SRP tool; that is, it may have more options than a SO tool. To make it easier to use, we simplify the usage by saying: "use UDP only for network analysis, and do not use other netest options." netest is a network traffic simulator (wave generator) and analyzer. Here, we use it solely for network analysis. Usage:

receiver% netest -u -P [-N]
sender% netest -u -P [-N] -t receiver

The [-N] option should not be used the first time unless the network is in very bad shape. This option tells netest not to do clock synchronization. Without synchronized system clocks, the one-way throughput report may be wrong. Also, the netest clock synchronization mechanism tests the network's reliability. If the netest clock synchronization process fails, the network is known to be very problematic, and the [-N] option must then be used on both the receiver and sender sides to turn off clock synchronization.

Here is a real example of netest receiver output. By default, netest transfers 64 packets in a burst. The packet size is automatically determined by the operating system. Because clock synchronization failed, we added the "-N" option for further testing. In this example, we see that all packets (size=32739) are lost in the first three transmissions. Since the transmission changed from "2 000001 buf_seq#:" to "2 000003 buf_seq#:", we know that only large packets are getting lost; otherwise, the receiver would not report anything and would just sit there quietly. Therefore, we restarted the sender with a smaller packet size:

sender% netest -u -P -N -l 1472 -t receiver

and all packets got through. We then increased the packet size by 28 bytes (1 byte would actually have been enough),

sender% netest -u -P -N -l 1500 -t receiver

and we saw six (6) packets arrive, interleaved with (mis-ordered) 8-byte synch packets, then another packet, and then some corrupt packets that made IRIX dump core. (Only IRIX passes bad packets up to the application; other operating systems discard them, and the program reports the lost packets as "x x 6 x x 9 ...". That is why we used an IRIX host as the receiver for this example.) The next example below uses a Sun workstation as the receiver, which reports many lost and mis-ordered packets (strange packets without a sync header: 1478 x x x 51).

At this point, we know that the router randomly drops fragmented packets. Why? We later found out that a filter was trying to filter packets based on transport-layer information. Fragments after the first, however, carry no transport-layer header, so they are the ones sacrificed.
This testing takes only a few minutes, and the problematic router can be found in real time. As long as we can access a remote host, we can quickly understand the pattern in which the router drops packets. Then contact the router administrator and have them fix the problem.
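
The packet sizes in this example can be checked with some arithmetic. The sketch below assumes a 1500-byte Ethernet MTU, a 20-byte IPv4 header without options, and an 8-byte UDP header (none of these values are stated above; they are the usual defaults), and it assumes that the -l value is the UDP payload length. Under these assumptions, 1472 bytes is the largest payload that fits in a single unfragmented packet, which is consistent with 1 extra byte being enough to trigger the problem.

```python
MTU = 1500        # assumed Ethernet MTU in bytes
IP_HEADER = 20    # assumed IPv4 header without options
UDP_HEADER = 8    # UDP header size in bytes


def fragment_count(udp_payload, mtu=MTU):
    """Number of IP fragments needed to carry one UDP datagram of the given payload size."""
    ip_payload = UDP_HEADER + udp_payload
    # Data in every fragment except the last must be a multiple of 8 bytes.
    per_fragment = (mtu - IP_HEADER) // 8 * 8
    return -(-ip_payload // per_fragment)  # integer ceiling division


if __name__ == "__main__":
    for size in (1472, 1473, 1500, 32739):
        print(size, "bytes ->", fragment_count(size), "fragment(s)")
```

Only the first fragment of a datagram carries the UDP header, so a filter that inspects transport-layer fields has nothing to match in the later fragments.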

2. Advanced Usage:

Advanced options are designed for network gurus doing specific diagnostics. These options may either take longer to run or require special understanding to use. Because there are many advanced options, and they are not easy to explain without specific situations and examples, I give a couple of examples of advanced options here to clarify the difference mentioned at the top of this tutorial. If you have specific requests about how these tools can help you solve certain network problems, and you are willing to provide information about the problematic network for diagnosis, please send all the information (in a good trouble-shooting request form) to the network diagnosis team, and we will try to analyze the problem. If you are willing to let us use the diagnostic results in this tutorial, please say so. Here are a couple of examples of advanced options:
	pipechar -PxCHAR destination
	pipechar -hsw destination

are all considered advanced uses of these tools. The option "-PxCHAR" instructs pipechar to do modified pathchar probing after its regular analysis. The option "-hsw" tells pipechar to find hidden switches in the gateway area. Advanced usage means using more of the options in these tools, which are designed for professional network analysis and monitoring. More detailed information will be available later.


iss-p10.lbl.gov: pipechar -PxCHAR ux5
0: localhost [3 hops]
1: ir100gw-r2.lbl.gov                  (131.243.2.1)      7.40  11.02  15.70ms
2: ir30gw.lbl.gov                      (131.243.128.30)   7.34   9.74  14.74ms
3: ux5.lbl.gov                         (128.3.7.103)      7.38  14.18  23.09ms

PipeCharacter statistics: 1.05% reliable
From localhost:
|       9.732 Mbps 10BT (10.4437 Mbps)

1: ir100gw-r2.lbl.gov              (131.243.2.1 )
|
|       10.108 Mbps              <0.7434% BW used>
2: ir30gw.lbl.gov                  (131.243.128.30)
|       9.749 Mbps 10BT (10.1105 Mbps)

3: ux5.lbl.gov                     (128.3.7.103 )
Modified PATHCHAR Method:
0: localhost
% error range= 3.649635 Mbps sending probes...
   56bytes: 0.452(0.466)     417bytes: 0.854(0.854)     778bytes: 1.233(1.234)
   1139bytes: 1.620(1.628)    1500bytes: 1.986(1.992)
preslope= 0.000000;     slope= 0.001062
|       7.533Mbps
% error range= 3.676971 Mbps sending probes...
   56bytes: 0.829(0.942)     417bytes: 1.236(1.276)     778bytes: 1.703(1.715)
   1139bytes: 2.086(2.099)    1500bytes: 2.453(2.473)
preslope= 0.001062;     slope= 0.001135
|       109.393Mbps
% error range= 3.656060 Mbps sending probes...
   56bytes: 0.677(0.681)     417bytes: 1.141(1.161)     778bytes: 1.691(1.697)
   1139bytes: 2.089(2.179)    1500bytes: 2.620(2.637)
preslope= 0.001135;     slope= 0.001339
|       39.239Mbps
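
The numbers in the "Modified PATHCHAR Method" output can be reproduced approximately with a short calculation. The sketch below is an interpretation inferred from the printed values, not taken from the pipechar source: it assumes the first time printed for each probe size is in milliseconds, fits a least-squares line through the (size, time) points, and takes the hop bandwidth as 8 bits divided by the increase in slope over the previous hop. The function names are hypothetical.

```python
def least_squares_slope(sizes, times):
    """Slope of the best-fit line through (size, time) points, in ms per byte."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    return sxy / sxx


def hop_bandwidth_mbps(preslope, slope):
    """Bandwidth implied by the per-byte delay added at this hop (slopes in ms per byte)."""
    seconds_per_bit = (slope - preslope) * 1e-3 / 8.0
    return 1.0 / seconds_per_bit / 1e6


if __name__ == "__main__":
    sizes = [56, 417, 778, 1139, 1500]
    # First value of each pair printed for the first hop in the output above.
    first_hop_times = [0.452, 0.854, 1.233, 1.620, 1.986]
    slope = least_squares_slope(sizes, first_hop_times)
    print("slope:", slope, "bandwidth (Mbps):", hop_bandwidth_mbps(0.0, slope))
```

With the first hop's data this gives a slope close to the printed 0.001062 and a bandwidth close to the printed 7.533 Mbps. Later hops agree only approximately when computed from the rounded slopes printed above.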

Please send any suggestions on how to improve this tutorial to the author.

Updated Wednesday, 18-Oct-2000 14:22:55 PDT