DOE/MICS Mid-Year Project Report
Date: December 15, 2003
Project Title: Net100
Project Type: Base
PI: Brian Tierney
Institution: LBNL
1. Executive Summary
The Net100 Collaboration (PSC, NCAR, UT, LBNL, and ORNL) is developing a model for
network-aware operating systems using Web100 as the means for incorporating network
information and its analysis into host operating systems to improve performance. To
investigate how effective network-aware operating systems can be, we are using a
three-phase approach. First, we will use the network-aware, Web100-based operating
system to tune a simple bulk-transport application and demonstrate its use over
high-performance network links. We will then extend this model to support more advanced
and complex applications, moving from point-to-point optimization to optimizations for
fully distributed environments. Finally, as proof that a network-aware operating
system can tune and optimize performance on behalf of applications, we will also develop
application-internal tools (based on NetLogger) to monitor the efficiency of application
support, and provide an external monitoring methodology to gauge the impact this system
has on the rest of the network.
Significant Net100 accomplishments to date (all sites):
- systematic testing of WAD using NTAF
- WAD tuning of parallel and single stream GridFTP
- demonstrated continuous flow tuning by WAD
- added Sally Floyd's high-speed TCP extensions (HS-TCP) to WAD and the Linux kernel
- PSC submitted TCP MIB to IETF (Web100)
- C WAD daemon (ORNL) and Python WAD (LBNL)
- WAD daemons developed and demonstrated 13x speedup (LBNL, ORNL)
- PSC developed a new tool (pathprobe) based on Web100
- network probes and database deployed (LBNL)
- combined kernel mods for Web100 and LANL's DRS (ORNL)
- WAD tuning of TCP AIMD parameters and "virtual MSS", with 6x faster recovery from TCP loss (PSC, ORNL)
- event notification extensions to Web100 (ORNL, PSC)
- NetLogger/Web100 extensions to iperf (LBNL) and ttcp (ORNL)
- added Tom Kelly's Scalable TCP; testing and comparison with HS-TCP are in progress (ORNL/LBNL)
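The effect of AIMD tuning on loss recovery, mentioned in the list above, can be illustrated with a rough calculation. The sketch below is a textbook AIMD model, not the WAD implementation: it assumes one loss event, a multiplicative decrease to half the window, and a fixed additive increase per round trip. The path parameters (1 Gbit/s, 100 ms RTT, 1500-byte segments) and the increase of 6 segments per RTT are hypothetical values chosen only to show how a larger additive increase shortens recovery by roughly the same factor.

```python
import math


def recovery_rtts(cwnd_segments, increase_per_rtt=1.0, decrease_factor=0.5):
    """Round trips needed for AIMD to regrow cwnd to its pre-loss size.

    Simplified model: a single loss shrinks the window to
    cwnd * decrease_factor, after which the window grows by
    increase_per_rtt segments every round trip.
    """
    if cwnd_segments <= 0 or increase_per_rtt <= 0:
        raise ValueError("window and increase must be positive")
    if not 0 < decrease_factor < 1:
        raise ValueError("decrease_factor must be between 0 and 1")
    deficit = cwnd_segments - cwnd_segments * decrease_factor
    return math.ceil(deficit / increase_per_rtt)


def recovery_seconds(cwnd_segments, rtt_s, increase_per_rtt=1.0):
    """Recovery time in seconds for the same simplified model."""
    return recovery_rtts(cwnd_segments, increase_per_rtt) * rtt_s


# Hypothetical path (assumption, not a measured Net100 link).
bandwidth_bps = 1e9      # 1 Gbit/s
rtt_s = 0.1              # 100 ms round-trip time
segment_bytes = 1500     # segment size, approximated by the Ethernet MTU

# Window (in segments) needed to fill the path: bandwidth-delay product.
cwnd = int(bandwidth_bps * rtt_s / (8 * segment_bytes))

standard = recovery_seconds(cwnd, rtt_s, increase_per_rtt=1.0)
# Assumed tuning: additive increase of 6 segments per RTT.
tuned = recovery_seconds(cwnd, rtt_s, increase_per_rtt=6.0)

print(f"window: {cwnd} segments")
print(f"standard AIMD recovery: {standard:.1f} s")
print(f"tuned AIMD recovery:    {tuned:.1f} s")
print(f"speedup: {standard / tuned:.2f}x")
```

In this model the recovery time scales inversely with the additive increase, so the speedup simply tracks the chosen increase; real flows also depend on slow start, repeated losses, and receiver limits, which the sketch ignores.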
Publications:
new in the past 6 months:
- Wim Sjouw, Antony Antony, Johan Blom, Cees de Laat and Jason Lee, "TCP behaviour on transatlantic Lambda's", accepted for publication in LNCS, Springer Verlag (2003).
- Antony Antony, Johan Blom, Cees de Laat, Jason Lee, "Exploring practical limitations of TCP over Trans Atlantic networks", to be submitted to High-Speed Networks and Services for Data-Intensive Grids, special issue of Future Generation Computer Systems (FGCS).
previous:
- T. Dunigan, M. Mathis and B. Tierney, "A TCP Tuning Daemon", Proceedings of the IEEE Supercomputing 2002 Conference, Nov. 2002, LBNL-51022.
- B. Tierney, "Using NetLogger and Web100 for TCP Analysis", Invited Paper, First International Workshop on Protocols for Fast Long-Distance Networks, LBNL-51776.
- A. Antony, J. Blom, C. de Laat, Jason Lee, W. Sjouw, "Microscopic Examination of TCP flows over transatlantic Links", iGrid2002 special issue, Future Generation Computer Systems, volume 19, issue 6.
- Brian L. Tierney, Jason R. Lee, Dan Gunter, Martin Stoufer, "Improving Distributed Application Performance Using TCP Instrumentation", LBNL report.
2. Recent LBNL Accomplishments: (June 2003 - December 2003)
The main task of the past 6 months has been organizing the "Protocols for Fast Long-Distance Networks" workshop (PFLDnet) with Les Cottrell of SLAC. I also served on the program committee for the Bandwidth Estimation Workshop. We collected a large amount of Web100 data at the Supercomputing Conference in Phoenix, which we will analyze soon. We worked with the GridFTP developers on ways to improve network performance, tested 9K MTUs with PSC, and began testing 10 GigE. We also attended the FAST Project Review at Caltech and an Internet2 Network Monitoring Workshop.
NTAF progress:
- continued to design and test the GGF Network Measurement publication schemas, and provided feedback to the GGF working group
- continued integration of SCNM results with NTAF, and analysis of the results
- continued to evaluate network analysis tools for possible inclusion in the Net100 NTAF
- some minor bug fixes
Progress on the web interface to netarchd:
- continued to enhance the web interfaces with new statistical graphics. This will help with the exploration and discovery of more subtle correlations between NTAF measurements.
- some bug fixes
- explored the conversion of the interface to the Woven framework
3. LBNL Plans for the Coming 6 Months
The NTAF monitoring infrastructure is now completely in place, with new data being archived daily. The main task for the next few months will be to analyze the data in the archive and to look for correlations between TCP settings and throughput. We will also continue to monitor the infrastructure for robustness and continue adding more tools to NTAF. We are also evaluating the network monitoring publication schema being defined by the GGF Network Measurement working group.
Specific tasks include:
- continue to evolve NTAF to act as an OGSA Web Service for use in Grid network monitoring
- finish the integration of SCNM results with NTAF, and analyze the results
- continue to test the GGF Network Measurement publication schemas, and provide feedback to GGF.
- continue to evaluate network analysis tools for possible inclusion in the Net100 NTAF
- continue 10 GigE testing
- continue monitoring, probing, and analyzing ESnet links
- continue to enhance web interfaces with new statistical graphics. This will help with the exploration and discovery of more subtle correlations between NTAF measurements.
- continue to develop and apply more involved data mining techniques to analyze the results generated by the NTAF. We hope these will give a deeper understanding of the various tests and of the meaning of synthesized metrics and multi-metric correlations.
- continue to test and gather performance data from all Net100 sites (ORNL, NCSA, NCAR, PSC, NERSC)
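As a minimal sketch of the planned search for correlations between TCP settings and throughput, the example below computes a Pearson correlation coefficient over a handful of test records. The record layout, the field names (sndbuf_kb, throughput_mbps), and all values are hypothetical and made up for illustration; they are not the NTAF archive schema or measured results.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical archive records (illustrative values only): one row per
# test run, pairing a TCP setting with the throughput that was observed.
records = [
    {"sndbuf_kb": 64, "throughput_mbps": 12.0},
    {"sndbuf_kb": 256, "throughput_mbps": 45.0},
    {"sndbuf_kb": 1024, "throughput_mbps": 160.0},
    {"sndbuf_kb": 4096, "throughput_mbps": 410.0},
    {"sndbuf_kb": 8192, "throughput_mbps": 430.0},
]

buffers = [r["sndbuf_kb"] for r in records]
rates = [r["throughput_mbps"] for r in records]

r = pearson(buffers, rates)
print(f"correlation between send buffer size and throughput: {r:.3f}")
```

A single coefficient only captures a linear relationship between one setting and one metric; throughput typically levels off once the buffer exceeds the bandwidth-delay product, so a full analysis would likely need to group runs by path and look at more than one metric at a time.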
4. Research Interactions
We have ongoing interactions with:
- Cees de Laat, University of Amsterdam
- Sally Floyd, modified slow-start, modified AIMD
- Linda Winkler, Europe-US OC48 WAD testing
- Thomas Hacker, parallel TCP flows
- Wu Feng and his Dynamic Right Sizing and TCP Vegas work
- LBL self-configuring network monitoring project (tcpdump server)
- kc claffy and Les Cottrell on INCITE and PingER data collection
- C. Dovrolis pathrate/pathload work
- R. Wolski and NWS
- NCS/pipechar project (Guojun Jin)
- various data grid projects (SciDAC)
- Probe/HPSS projects at NERSC/ORNL
- GGF Network Measurements working group
- Internet2 end-to-end projects (Surveyor, NIMI, etc.)
5. Remarks
Detailed information on progress for the LBNL portion of Net100 is maintained at:
http://www-didc.lbl.gov/net100/
Detailed status is at:
http://www-didc.lbl.gov/~jason/net100/
The full project web page is
http://www.net100.org