See also the InfoVis04 Workshop on InfoVis Software Infrastructures page.
Vis Contest 2008: multifield 3D scalar data, galaxy formation
- IEEE VisWeek Contests are great resources because they have not only
data, but also tasks and examples of how others have tackled
the problem. See the HCIL Benchmark Repository for a complete list.
- VAST Contests
- VAST Challenge 2011: geospatial and microblogging, cybersecurity, text.
- VAST Challenge 2010: text, medical records, gene sequences.
- VAST Challenge 2009: internet traffic, social network with geographic component, video.
- VAST Challenge 2008: Grand challenge: phone records for social network analysis, geotemporal records, Wikipedia edit data/history (unstructured text analysis), location tracking (evacuation modelling).
- VAST Contest 2007: Blue Iguanodon: News stories, blog entries, background info, multimedia materials.
- VAST Contest: News stories, background info, multimedia materials.
- InfoVis Contests
- Vis (scivis) Contests
- Health 2.0 Developer Challenge
- visualizing.org challenge list
- WikiVis 2011: Wikipedia Data Visualization Challenge
- Mozilla Labs Open Data Visualization Competition Fall 2010: How do people use Firefox?
- A Day In The Life of a Browser - Version 2
a variety of general browsing data, such as: startup/shutdown events, session restore information, memory usage statistics, profile age, history size, and more. Also user demographic data, e.g. gender, age, self-reported technical level, etc.
- Firefox 4 Beta Interface - Version 2
user interactions with the main Firefox UI, from the Back and Forward buttons, to smaller controls like the Web Feed icon. Demographic data as above.
- also, dozens of other test case datasets at Mozilla Labs Test Pilot
Other Dataset Lists
Networking project ideas
- Tor Anonymity Network:
See me if you're interested and I'll put you in contact with the folks at Tor. In their own words:
The short version is that we're trying to visualize the changes in
usage of the Tor anonymity network. We've been collecting a
variety of (not terribly well documented yet) data sets.
We use them to generate graphs like total
network capacity and per-country usage, including censorship
2 (red dots are 'statistical drop' whereas blue dots are
The challenge is that we have people who want to see pretty
visualizations of exactly who is benefiting from Tor at that
moment. Alas, since Tor is an anonymity system, we intentionally
only collect aggregated data -- so the idea that they keep
suggesting of "just put a dot on the map for each IP address
that's using Tor right now" is not something we can (or should) do.
We could imagine an alternative approach of doing a little video of "how
growth has changed in that country over the past month", since while they
ask for "right now" I bet that's not really what they most want to see. Or
a map where the size of the country is based on how much Tor use it's seen
in some time period. But I bet there are other better ideas out there.
- Visual Tcpdump:
Tcpdump is a powerful tool that shows all network traffic on a link,
but it can be quite hard to understand what's going on when confronted with
the raw tcpdump output. "Visual tcpdump" would ideally run either off a
log file of a past tcpdump session or in real time from a live tcpdump
connection. There are several tasks one might target from this dataset. First,
visually characterizing traffic patterns - for example, showing the
distribution of session lengths or packet types. Second, highlighting
dangerous packets that could occur in a stream - for example, passwords sent
in plain text. Third, characterizing protocols - for example, showing the TCP
window size changes over the course of a session. Some previous knowledge of
networking will be helpful for this project.
- Intrusion Detection:
Noticing that a network is under attack is difficult because of the sheer
volume of benign traffic, and the number of attack methods. The two main tasks
are real-time detection that an attack is occurring, and forensic analysis of
a past attack. There is a publicly
available dataset of network traces with four different simulated attacks
plus a control baseline with no attacks. Previous knowledge of networking and
security issues will be helpful for this project.
One way to map "the Internet" is to consider the structure of the
backbone router interconnections. Bill Cheswick has been keeping archives
of the daily changes in the roughly 100,000 core reachable routers for over
three years. Even the static dataset from a single day is a difficult
challenge to show comprehensibly, and showing growth and changes over time is
an even more interesting problem. The H3
browser for large graphs is a potential resource. This project should be
feasible without previous knowledge of networking.
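For the Visual Tcpdump idea above, a first step before any drawing is turning the raw text into counts you can plot. Here is a minimal sketch in Python, assuming the common one-line-per-packet text format that tcpdump prints for IPv4 traffic (the exact format varies with tcpdump version and flags, so treat the regular expressions as a starting point). The names `summarize`, `classify`, and `SAMPLE` are made up for this example, and the sample lines are hand-written rather than taken from a real capture. It tallies coarse packet types and collects the advertised TCP window sizes per direction, which covers the first and third tasks listed.

```python
import re
from collections import Counter, defaultdict

# Assumed shape of a typical IPv4 summary line:
#   <time> IP <src>.<port> > <dst>.<port>: <protocol details>
LINE_RE = re.compile(
    r"^(?P<time>\S+) IP (?P<src>\S+) > (?P<dst>[^:]+): (?P<rest>.*)$"
)
# TCP summary lines usually carry the advertised window as "win <n>".
WIN_RE = re.compile(r"\bwin (\d+)")


def classify(rest):
    """Return a coarse packet type from the text after the addresses."""
    if rest.startswith("Flags ["):
        return "TCP"
    if rest.startswith("UDP"):
        return "UDP"
    if rest.startswith("ICMP"):
        return "ICMP"
    return "other"


def summarize(lines):
    """Count packet types and collect TCP window sizes per (src, dst)."""
    type_counts = Counter()
    windows = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            # Non-IPv4 lines (ARP, IPv6, ...) are only counted, not parsed.
            type_counts["unparsed"] += 1
            continue
        kind = classify(match["rest"])
        type_counts[kind] += 1
        if kind == "TCP":
            win = WIN_RE.search(match["rest"])
            if win:
                key = (match["src"], match["dst"])
                windows[key].append(int(win.group(1)))
    return type_counts, dict(windows)


# Hand-written sample lines in the assumed format (not a real capture).
SAMPLE = [
    "12:00:00.000001 IP 10.0.0.2.50000 > 10.0.0.1.80: Flags [S], seq 1, win 64240, length 0",
    "12:00:00.000200 IP 10.0.0.1.80 > 10.0.0.2.50000: Flags [S.], seq 9, ack 2, win 65160, length 0",
    "12:00:00.000300 IP 10.0.0.2.50000 > 10.0.0.1.80: Flags [.], ack 10, win 502, length 0",
    "12:00:00.000400 IP 10.0.0.2.5353 > 224.0.0.251.5353: UDP, length 45",
    "12:00:00.000500 ARP, Request who-has 10.0.0.1 tell 10.0.0.2, length 28",
]

if __name__ == "__main__":
    counts, windows = summarize(SAMPLE)
    print(counts)
    for (src, dst), sizes in windows.items():
        print(src, "->", dst, sizes)
```

The per-direction window lists can go straight into a line chart over packet index or timestamp, and the type counts into a bar chart. The same function works on a saved log or on lines read from a live tcpdump pipe, since it only needs an iterable of strings.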
Other project ideas
- Graph sparsification:
New theory faculty member Nick
Harvey has some interesting and cool ideas on how to approach
graph sparsification in a very different direction from previous
work on multilevel graph coarsening. See me to discuss further.
The following books are on reserve in the CS reading room:
- Information Visualization: Perception for Design, Colin Ware
- The Visual Display of Quantitative Information, Edward R. Tufte,
Graphics Press 1983
- Envisioning Information, Edward R. Tufte, Graphics Press 1990
- Visual Explanations, Edward R. Tufte, Graphics Press 1997
- Readings in Information Visualization: Using Vision To Think;
Card, Mackinlay, and Shneiderman, eds; Morgan Kaufmann 1999.
- The Visualization Toolkit, 3rd edition; Schroeder, Martin and
Lorensen; Kitware Inc, 2004.
- Fell In Love With Data, Enrico Bertini
- Visual Business Intelligence, Stephen Few
- Eager Eyes, Robert Kosara
- infovis wiki, Vienna
- infovis.net, Juan C. Durstler
- Statistical Graphics, Martin Theus
- Functional Color, Maureen Stone
- Well-Formed Data, Moritz Stefaner
- Flowing Data, Nathan Yau
- information aesthetics, Andrew Vande Moere
- visual complexity, Manuel Lima
- Ask ET, Edward Tufte
- Information Wants to be Seen, TJ Jankun-Kelly
- Visuale, Enrico Bertini (no longer updated)
- Atlas of Cyberspaces, Martin Dodge (no longer updated)
Last modified: Fri Jun 15 14:55:13 PDT 2012