Introduction - Network Analysis Visualization (NAV)
The objective of this project is to develop a tool for visualizing network performance and connectivity data. Networking involves many layers of protocols, from the physical layer through to the application layer; we will focus on the most commonly used protocols at the network and transport layers, with particular attention to TCP/IP and the services that run on this protocol suite. Networking has been used extensively in the enterprise for decades, and today most organizations have some level of internal and external network connectivity. In the past few years, increased availability and falling prices have also enabled a large number of home users to gain high speed access to internet resources; as many as 64% of Canadian home internet users had broadband as of April 2003 [1].
Based on the pervasiveness of networking, we envision two primary tasks to which our application will be suited: enterprise traffic analysis and home network traffic monitoring. Enterprise users may be interested in monitoring the data flow between internal subnets as well as with the external network. In addition, they may be interested in detecting known attacks and in monitoring which services are using the bandwidth. A home user may wish to monitor their network to determine the amount of data being downloaded and uploaded, as well as what services are being used. Sophisticated home users could also be interested in trying to detect unauthorized access to their system.
This project is being pursued by Meghan Allen and Peter McLachlan.
I do not have very much experience with networking. I have taken one networking course, CPSC 417 at UBC, so I am familiar with the basic terms and concepts.
I have experience doing network programming of TCP sockets, as well as experience using 'sniffer' tools such as tcpdump, ethereal and Sniffer Pro. I also have a background in network switching and routing, and have run through the core Cisco CCNA curriculum which includes bit level analysis of packets.
Solution (high level)
We have determined that our solution should focus on the replay of log files; the ability to capture live traffic is a potential extension to the project that could be considered at a later time. We intend to provide two primary 'views' into the data flow: a service centric view and an IP centric view.
The IP centric view involves the creation of two IP 'walls' of addresses, with one side representing local addresses and the other remote hosts. The local address range is specified by the user of the application. Lines are drawn to connect the local and remote hosts, with further information encoded in the lines: line color indicates traffic type and line width indicates traffic volume. After the 'basic' implementation is complete we intend to allow users to 'aggregate' ranges of IP addresses to reduce the number of line crossings. Animation patterns along the lines could be implemented to signal the rate of traffic flow as an alternative to line width. Dynamic queries implemented with sliders will help users sort through the data, allowing them to further reduce the number of crossings on the wall. Options for these queries include connection status, service type, IP address ranges, TCP flags, etc. The duration a connecting line lingers after the connection has been 'closed' is one aspect of this design that will require further consideration; the best solution may be to make this user configurable.
For intrusion detection, the system can be evolved to include capabilities such as highlighting connections containing known attack payloads. Examples of such attacks include Code Red and Nimda. For the purposes of this project any implementation of this hostile payload detection would be a proof of concept only; internet attacks evolve daily, and to be effective the tool would need to maintain an automatically updating database of known attacks, which is beyond the scope of this project.
The user may want to define lists of hosts that do not normally accept connections from the internet, or IP addresses in their local range for which no hosts exist. Traffic that attempts connections to these addresses has a high probability of being malicious or 'scanning' traffic. These connections could be highlighted, giving a simplified view of the sources of the probing attempts so that the administrator can impose firewall rules to block them. Overall, the focus of the IP centric wall view is to minimize edge crossings and provide a scalable visualization.
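To make the local/remote split and the 'unused address' highlighting concrete, the following is a minimal Java sketch, assuming the user supplies the local range in CIDR notation and the list of unused or non-serving hosts as a set of dotted-quad strings. The class and method names (LocalRange, isLocal, isSuspicious) are hypothetical and are not part of any library named in this proposal.

```java
import java.util.Set;

/** Sketch: classify IPv4 addresses against a user-specified local range. */
final class LocalRange {
    private final int network; // network address as a 32-bit value
    private final int mask;    // netmask as a 32-bit value

    /** Parses CIDR notation such as "192.168.1.0/24" (assumed input format). */
    LocalRange(String cidr) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) throw new IllegalArgumentException(cidr);
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) throw new IllegalArgumentException(cidr);
        // Prefix 0 is handled separately: shifting an int by 32 is a no-op in Java.
        this.mask = prefix == 0 ? 0 : -1 << (32 - prefix);
        this.network = toInt(parts[0]) & mask;
    }

    /** Converts dotted-quad text to a 32-bit integer. */
    static int toInt(String dotted) {
        String[] octets = dotted.split("\\.");
        if (octets.length != 4) throw new IllegalArgumentException(dotted);
        int value = 0;
        for (String octet : octets) {
            int o = Integer.parseInt(octet);
            if (o < 0 || o > 255) throw new IllegalArgumentException(dotted);
            value = (value << 8) | o;
        }
        return value;
    }

    /** True when the address belongs on the local 'wall'. */
    boolean isLocal(String address) {
        return (toInt(address) & mask) == network;
    }

    /** True when the destination is local and listed as unused or non-serving. */
    boolean isSuspicious(String destination, Set<String> darkHosts) {
        return isLocal(destination) && darkHosts.contains(destination);
    }
}
```

A view could call isLocal to decide which wall an address is drawn on, and isSuspicious to pick a highlight color for the connecting line.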
The service based view is built on a trellis of 2D scatterplots or line graphs. Each graph represents a service; the x-axis is time and the y-axis is traffic in bytes/s. The IP and service based views can be brushed to highlight the corresponding information in either view. For example, if the user clicks and drags the HTTP line graph onto the IP centric view, the filter on the IP centric view is set to exclude all non-web connections, which vanish until the filter is reset or a different brushing operation takes place. Similarly, dragging an IP address from the wall view onto a service graph ties that graph to showing traffic from that specific address. One scalability enhancement that can be made to this view is to allow portions of the time axis to 'stretch' or 'compress' through user interaction, allowing users to gain details on demand. If this proves too problematic, the user could be given the ability to 'pan' through the time axis by scrolling left and right on the graph.
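The data behind each service graph can be sketched as a per-second byte count per service. The Java sketch below assumes that each logged packet yields a timestamp in milliseconds, the server-side port and a length in bytes, and that the service is inferred from well-known port numbers. The class ServiceSeries and its methods are hypothetical names used only for illustration.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: accumulate bytes per second for each service graph in the trellis. */
final class ServiceSeries {
    // service name -> (second -> bytes observed during that second)
    private final Map<String, TreeMap<Long, Long>> series = new TreeMap<>();

    /** Maps a few well-known server ports to service names; everything else is "other". */
    static String serviceFor(int port) {
        switch (port) {
            case 21:  return "ftp";
            case 25:  return "smtp";
            case 80:  return "http";
            case 443: return "https";
            default:  return "other";
        }
    }

    /** Adds one packet to the bucket for its service and second. */
    void add(long timestampMillis, int serverPort, int lengthBytes) {
        long second = timestampMillis / 1000;
        series.computeIfAbsent(serviceFor(serverPort), k -> new TreeMap<>())
              .merge(second, (long) lengthBytes, Long::sum);
    }

    /** Returns the y-value (bytes/s) for one service at one second, or 0 if none was seen. */
    long bytesAt(String service, long second) {
        TreeMap<Long, Long> points = series.get(service);
        return points == null ? 0L : points.getOrDefault(second, 0L);
    }
}
```

Keeping the points in a sorted map means a graph can iterate them in time order when drawing, which also suits panning along the time axis.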
Scenario of Use
We have developed two scenarios of use for our visualization design.
1) Network administrator
Sally, a network administrator, is browsing the IP centric view of our NAV tool. She notices that there is an unusually large amount of traffic coming from a particular range of IP addresses. Because she was able to observe the unusual amount of traffic, she is able to further investigate this anomaly and determine that someone is attempting a denial of service attack.
2) Home user
John wants to determine whether the ftp server that he hosts at home has been using an excessive amount of bandwidth; his ISP has recently implemented bandwidth caps on its users. He looks at the ftp service graph and notices that the usage is fairly consistent at approximately one gigabyte per month, but that there are a number of very recent spikes that push his usage for the current month close to two gigabytes. John manipulates the time scale of the ftp service graph so that the spike is more clearly in view and is able to see that his ftp server was transferring X b/s 5 days ago. He remembers that his friend got a very large set of digital pictures from his ftp server on that day and realizes that the large spike in usage must be due to his friend downloading pictures. John is no longer worried about the bandwidth his ftp server uses.
We plan to use jpcap, a network packet capture library, to extract data from log files and potentially to capture real time data. We plan to use the InfoVis Toolkit [2] for the trellis display of services, and Java 2D for the IP centric view. The application will use the Standard Widget Toolkit (SWT), a native library widget toolkit, for layout. We will use an Eclipse development environment with Subversion to manage source control.
November 3, 2004 - Conduct user interview
Our user interview helped confirm many of our intuitions about the data that our tool should visualize; it was established that per-service traffic flows were of the highest importance and that the user was less concerned about the actual packet internals. Based on this interview we developed several new ideas for our design including using 'expanding' bar graphs next to each IP in the wall view to signify the amount of traffic sent to that host.
November 5, 2004 - Proposal completed
November 7, 2004 - Development environment operational
The complete development environment includes two installations of the Eclipse IDE, the SWT native library, the jpcap native library, relevant documentation files as well as the Subversion source control client on the development hosts. The Subversion server will also be activated and initial project code will be uploaded.
November 10, 2004 - Class design complete
The class layout and high level interfaces will be completed by this date.
November 26, 2004 - Simple working system
By November 26, we will have a simple working system that we can build on. Our system will read log files and display the data in a split window, using one side for the IP centric view and the other for the service centric view. We will concentrate on getting a working system, and only deal with scalability and other issues that we encounter once our basic system is working.
December 5, 2004 - Brushing and Aggregation implemented
By December 5, we will have brushing and aggregation implemented. Brushing either view will set filters on the visible data in the other view. For instance, if you brushed over http in the service centric view, you would only see the http traffic in the IP centric view.
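One way to think about brushing is as a shared filter that either view can replace and both views consult. The Java sketch below illustrates that idea under the assumption that each connection carries a local address, a remote address and a service name. Flow and BrushModel are hypothetical names, and this is not a committed class design.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

/** Hypothetical record of one connection shown in both views. */
final class Flow {
    final String localIp;
    final String remoteIp;
    final String service;

    Flow(String localIp, String remoteIp, String service) {
        this.localIp = localIp;
        this.remoteIp = remoteIp;
        this.service = service;
    }
}

/** Shared filter state: a brush in one view replaces the predicate applied by the other. */
final class BrushModel {
    private Predicate<Flow> filter = f -> true; // no brush: everything is visible

    /** Brushing a service graph restricts the IP centric view to that service. */
    void brushService(String service) {
        filter = f -> f.service.equalsIgnoreCase(service);
    }

    /** Brushing a local address restricts the service graphs to that host. */
    void brushLocalIp(String ip) {
        filter = f -> f.localIp.equals(ip);
    }

    /** Clears the brush so that all connections are shown again. */
    void reset() {
        filter = f -> true;
    }

    /** Returns the connections that pass the current brush. */
    List<Flow> visible(List<Flow> all) {
        List<Flow> result = new ArrayList<>();
        for (Flow f : all) {
            if (filter.test(f)) result.add(f);
        }
        return result;
    }
}
```

In this sketch a new brush replaces the previous one, matching the behavior described above where a filter stays in effect until it is reset or another brushing operation takes place.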
We envision that the IP centric view will quickly become prohibitively dense with crossings, so we are planning on implementing some aggregation algorithm. We have not yet determined how we will aggregate the data, but very likely it will be done by 'collapsing' IP ranges.
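As one possible form of 'collapsing', addresses could be grouped by their leading octets and their traffic summed, so that many lines to neighboring hosts become a single line to a range. The Java sketch below assumes that traffic totals are available per dotted-quad address. IpAggregator, rangeKey and collapse are hypothetical names, and grouping on octet boundaries is only one candidate aggregation scheme.

```java
import java.util.Map;
import java.util.TreeMap;

/** Sketch: collapse individual IPv4 addresses into ranges that share leading octets. */
final class IpAggregator {

    /** Returns e.g. "10.1.2.*" for ("10.1.2.3", 3); keepOctets must be between 1 and 4. */
    static String rangeKey(String address, int keepOctets) {
        if (keepOctets < 1 || keepOctets > 4) {
            throw new IllegalArgumentException("keepOctets must be 1..4");
        }
        String[] octets = address.split("\\.");
        if (octets.length != 4) throw new IllegalArgumentException(address);
        StringBuilder key = new StringBuilder(octets[0]);
        for (int i = 1; i < keepOctets; i++) {
            key.append('.').append(octets[i]);
        }
        if (keepOctets < 4) key.append(".*"); // mark the collapsed part of the address
        return key.toString();
    }

    /** Sums the byte counts of all addresses that fall into the same range. */
    static Map<String, Long> collapse(Map<String, Long> bytesByIp, int keepOctets) {
        Map<String, Long> collapsed = new TreeMap<>();
        for (Map.Entry<String, Long> e : bytesByIp.entrySet()) {
            collapsed.merge(rangeKey(e.getKey(), keepOctets), e.getValue(), Long::sum);
        }
        return collapsed;
    }
}
```

Lowering keepOctets gives coarser ranges and fewer lines, which could be tied to a slider so that the user controls the level of aggregation.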
December 10, 2004 - Implementation phase completed
There are a few implementation options that we will consider during this phase. Our simple system will only be able to show a limited amount of data on the time axis in our service centric view. We will consider implementing a logarithmic based scaling approach, or a rubber-band type stretching of the axis to allow users to view data from further in the past.
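To illustrate the logarithmic option, the Java sketch below maps the age of a sample (seconds before the newest data) to a horizontal pixel position, so that recent data receives most of the width and older data is compressed toward the left edge. The class LogTimeAxis and the choice of log(1 + age) as the scaling function are assumptions for illustration, not a settled design.

```java
/** Sketch: logarithmic time axis where the newest sample sits at the right edge. */
final class LogTimeAxis {

    /**
     * Maps a sample's age to an x position in [0, widthPixels].
     * Age 0 maps to the right edge and maxAgeSeconds maps to the left edge.
     */
    static double xFor(double ageSeconds, double maxAgeSeconds, double widthPixels) {
        if (ageSeconds < 0 || maxAgeSeconds <= 0 || ageSeconds > maxAgeSeconds) {
            throw new IllegalArgumentException("age must lie in [0, maxAge]");
        }
        // log1p(x) = ln(1 + x), which is defined at age 0 and grows slowly for old samples.
        double fraction = Math.log1p(ageSeconds) / Math.log1p(maxAgeSeconds);
        return widthPixels * (1.0 - fraction);
    }
}
```

With an hour of history, the most recent minute takes up roughly half of the axis under this mapping, whereas a linear axis would give it one sixtieth of the width.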
We believe our IP centric view will benefit from allowing users to specify which type of information they want to emphasize. We will consider allowing users to emphasize based on IP address ranges, amount of traffic, or other factors that we discover during our initial implementation phase.
Since we are concentrating on replaying log files, we think it will be useful to show bars above each service view, representing the relative amount of traffic used by that service in the log file. We will consider implementing this feature in this phase.
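The bar lengths could be derived from each service's fraction of the total bytes in the log file. The following Java sketch assumes per-service byte totals have already been computed; ServiceShare and shares are hypothetical names.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: relative traffic share per service, used to size the summary bars. */
final class ServiceShare {

    /** Returns each service's fraction of the total bytes (0.0 to 1.0), in input order. */
    static Map<String, Double> shares(Map<String, Long> bytesByService) {
        long total = 0;
        for (long bytes : bytesByService.values()) {
            total += bytes;
        }
        Map<String, Double> result = new LinkedHashMap<>();
        for (Map.Entry<String, Long> e : bytesByService.entrySet()) {
            // An empty log gives every service a share of zero instead of dividing by zero.
            double share = total == 0 ? 0.0 : e.getValue().doubleValue() / total;
            result.put(e.getKey(), share);
        }
        return result;
    }
}
```

Multiplying a share by the available bar width in pixels gives the length of the bar drawn above that service's graph.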
December 14, 2004 - Final presentation and final report completed
[1] April Bandwidth Report: Canadian Broadband Continues Record Growth. URLwire, http://www.urlwire.com/news/042503.html
[2] InfoVis Toolkit SourceForge page, http://ivtk.sourceforge.net