dmitry nekrasovski : courses : cpsc533c assignment 1

Example of a good visualization

Source: Global Assessment of Organic Contaminants in Farmed Salmon, Ronald A. Hites, Jeffery A. Foran, David O. Carpenter, M. Coreen Hamilton, Barbara A. Knuth, and Steven J. Schwager, Science 2004 303: 226-229.

This figure is from an article that compares pollutant concentrations in farmed and wild salmon. It shows the concentrations of 14 contaminants found in farm-raised (red bars) and wild (green bars) salmon. The horizontal axis gives the concentration of each chemical in nanograms per gram of wet weight. The vertical lines mark the 10th, 50th, and 90th percentiles, and the boxes span the 25th to 75th percentiles.

I believe the visualization in this figure is good because it presents a large quantity of information effectively. The box-and-whisker plots convey several statistics for each chemical/fish type combination at once: the median, the 25th to 75th percentile box, and the 10th to 90th percentile spread. The logarithmic scale allows compact presentation of chemical quantities that differ by orders of magnitude. The names of the chemicals are abbreviated so as not to clutter the graph. The two contrasting colours, red and green, clearly distinguish the results for farmed and wild salmon, and serve as a visual reminder of the article's conclusion that farmed salmon are significantly more contaminated. Overall, the graph is clear, requires a minimum of textual explanation, and is accessible to both lay readers and experts.
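To make the encoding concrete, here is a minimal Python sketch of the two ideas the figure relies on: a percentile summary (10th, 25th, 50th, 75th, 90th) and placement on a logarithmic axis. The sample values are made up for illustration and are not data from the article; the function names and the linear-interpolation percentile rule are my own assumptions, not necessarily what the authors used.

```python
import math

def percentile(sorted_values, p):
    """Linearly interpolated percentile (p in 0..100) of a pre-sorted list.
    The interpolation rule is an assumption; other conventions exist."""
    if not sorted_values:
        raise ValueError("need at least one value")
    rank = (len(sorted_values) - 1) * p / 100.0
    lo = math.floor(rank)
    hi = math.ceil(rank)
    frac = rank - lo
    return sorted_values[lo] * (1 - frac) + sorted_values[hi] * frac

def box_summary(values):
    """Return the five percentiles drawn for each bar in the figure."""
    ordered = sorted(values)
    return {p: percentile(ordered, p) for p in (10, 25, 50, 75, 90)}

def log_position(value, axis_min, axis_max):
    """Fraction (0..1) of the way along a log10 axis from axis_min to axis_max."""
    span = math.log10(axis_max) - math.log10(axis_min)
    return (math.log10(value) - math.log10(axis_min)) / span

# Hypothetical concentrations in ng/g wet weight (illustrative only).
farmed = [0.8, 1.5, 2.0, 2.4, 3.1, 4.0, 5.5]
wild = [0.02, 0.03, 0.05, 0.06, 0.08, 0.10, 0.15]

for label, sample in (("farmed", farmed), ("wild", wild)):
    summary = box_summary(sample)
    # Map each percentile onto an axis running from 0.01 to 10 ng/g.
    positions = {p: round(log_position(v, 0.01, 10.0), 2) for p, v in summary.items()}
    print(label, summary, positions)
```

On a linear axis from 0.01 to 10, the wild values would be squeezed into the first few percent of the width; on the logarithmic axis both groups occupy a readable share of it, which is why the scale suits data spanning several orders of magnitude.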

Example of a bad visualization

Source: Periodic Pulsing of Characteristic Microearthquakes on the San Andreas Fault, Robert M. Nadeau and Thomas V. McEvilly, Science 2004 303: 220-222.

This figure is from an article that reports a new finding of repetitive patterns of micro-earthquakes along California's San Andreas Fault (SAF). Part A shows the long-term tectonic plate slip rate going northwest along the SAF. The triangles represent creep meter data for the same period. Part B shows a profile of short-term rates, given as percent difference from the long-term rates. The vertical black line marks the time of the magnitude 7.1 Loma Prieta (LP) earthquake, and the open black circles mark the times and relative sizes of earthquakes of magnitude greater than 3.5. Part C shows the slip and seismicity data for the five segments of the SAF denoted by the same letters in part B.

I believe the visualization in this figure is ineffective because it presents too much information in a convoluted way. Each of the three parts of the figure has its own horizontal and vertical scale, making it difficult to correlate them. The black triangles in part A lack contrast with the rest of the graph, as do the circles and the letters denoting fault sections in part B. The use of the whole colour spectrum in part B is visually overwhelming, and the graph at the top of this part is too small to interpret meaningfully and could easily be overlooked. Finally, the graph requires a dense 300-word explanation, making it inaccessible to all but the most expert or dedicated readers.