Figure 2 is a very simple visualization that attempts to display the
distribution of arsenic in Bangladesh, where arsenic contamination of
the drinking water has reached crisis levels. Figure 2 presents this
relatively simple information on a 2D geographic map of Bangladesh,
superimposing the arsenic concentration of each sub-region on the
map using a modified pseudocolour encoding.
Why is it bad?
1. Use of Pseudocolouring
Pseudocolour encoding is a technique for representing continuously varying
map values with a sequence of colours. Using this technique to
encode a single variable is problematic in itself. Hue (or "colour",
as used in common English) is best suited to discrete encoding,
since hue is not perceptually ordered. For example, it is difficult
to answer the question, "is blue higher or lower than green?".
For continuous (and therefore ordered) data, encoding with luminance
or saturation is much preferred perceptually. For example, it is easier
to understand and learn that "red is higher than pink". A
related problem is reading values directly from the map. Because adjacent
levels differ only slightly in colour, it is difficult to match colours
on the map to values on the scale. For example,
the colour used for 200-300 ug/L is perceptually almost the same as
that used for the >300 ug/L level. A further consideration is users
with colour blindness. For users with red-green colour blindness,
the greens and the reds are hard to tell apart, which is particularly
problematic in this pseudocolouring, as the "greens" are in the mid-range
of the scale (in this case, 20-50 ug/L), and the reds are in the high
range (from 50 to >300 ug/L).
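The ordering argument above can be checked numerically. The following is a minimal sketch, not a reconstruction of Figure 2: the two colour sequences are hypothetical, and the luminance formula is the standard sRGB relative luminance. It suggests that a spectral hue sequence does not rise or fall steadily in luminance, while a single-hue ramp from white through pink to red does.

```python
import colorsys

def relative_luminance(rgb):
    """Relative luminance of an sRGB colour with channels in [0, 1]."""
    def linearize(c):
        # Undo the sRGB transfer curve before weighting the channels.
        return c / 12.92 if c <= 0.04045 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def is_monotonic(values):
    """True if the values strictly increase or strictly decrease."""
    pairs = list(zip(values, values[1:]))
    return all(a < b for a, b in pairs) or all(a > b for a, b in pairs)

# Hypothetical spectral sequence, low to high:
# purple, blue, green, yellow, orange, red (hue angles in degrees).
hues_deg = [270, 240, 120, 60, 30, 0]
rainbow = [colorsys.hsv_to_rgb(h / 360, 1.0, 1.0) for h in hues_deg]

# Hypothetical single-hue ramp: red hue with increasing saturation
# (white -> pink -> red).
ramp = [colorsys.hsv_to_rgb(0.0, s, 1.0) for s in (0.0, 0.2, 0.4, 0.6, 0.8, 1.0)]

rainbow_lum = [relative_luminance(c) for c in rainbow]
ramp_lum = [relative_luminance(c) for c in ramp]

print("rainbow ordered by luminance:", is_monotonic(rainbow_lum))  # False
print("ramp ordered by luminance:   ", is_monotonic(ramp_lum))     # True
```

In the spectral sequence, yellow is the lightest colour and blue the darkest, so lightness gives the reader no consistent cue about which end of the scale is "higher".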
2. Deviations from Pseudocolouring
Figure 2 deviates from pseudocolour encoding in two significant
ways: it creates the impression that the scale, and hence the encoding,
is discrete, and it departs from the original encoding by using only
parts of the visible spectrum.
Figure 2: The distribution
Source: Scientific American,
Aug 2004, p. 89.
Since arsenic concentration is continuous data, and pseudocolouring
is also continuous, it is unclear why a discontinuous scale is used
to represent the encoding. This choice of scale creates confusion:
"is the encoding itself discrete, or continuous?". This
cannot be easily determined by inspecting the diagram, since the
colours used in the scale are barely discernible from one another
(e.g., 200-300 and >300 ug/L seem the same). If it is discrete,
then the encoding unnecessarily reduces the resolution of the visualization
and imposes arbitrary divisions.
The same difference of 2 ug/L is presented in dramatically different
ways: going from 21 to 23 ug/L, the green colour remains unchanged;
going from 19 to 21 ug/L, the colour changes from light blue to green;
going from 49 to 51 ug/L, the colour changes from green to the much
more "alarming" pink. This colour coding implies a threshold
of concern at 50 ug/L, which may or may not be the intent of the
author of the diagram.
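The effect of a discrete scale on equal 2 ug/L differences can be sketched as follows. This is only an illustration: the bin edges are the ones mentioned in the text (the legend of Figure 2 may contain further divisions between 50 and 200 ug/L), and the rule that each bin includes its lower edge is an assumption.

```python
from bisect import bisect_left, bisect_right

# Bin edges in ug/L, taken from the ranges mentioned in the text.
# The actual legend may subdivide the 50-200 range further.
EDGES = [10, 20, 50, 200, 300]

def bin_index(concentration, edges=EDGES):
    """Index of the colour bin, assuming each bin includes its lower edge."""
    return bisect_right(edges, concentration)

def same_colour(a, b):
    """True if two concentrations fall in the same colour bin."""
    return bin_index(a) == bin_index(b)

# Three pairs that each differ by 2 ug/L.
for low, high in [(21, 23), (19, 21), (49, 51)]:
    verdict = "same colour" if same_colour(low, high) else "different colours"
    print(f"{low} -> {high} ug/L: {verdict}")
# 21 -> 23 ug/L: same colour
# 19 -> 21 ug/L: different colours
# 49 -> 51 ug/L: different colours

# A value exactly on an edge depends entirely on the chosen convention.
print(bisect_right(EDGES, 50))  # 3: grouped with the bin above 50
print(bisect_left(EDGES, 50))   # 2: grouped with the 20-50 bin
```

The last two lines show that a value lying exactly on a boundary, such as 50 ug/L, is assigned to one bin or the other purely by convention, which the scale itself does not state.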
colours? The original colour sequence approximates the
visible spectrum, going from red (at the high end) to orange,
yellow, green, blue and finally purple. In Figure 2, the yellow
part of the spectrum is missing from both the scale and the
encoding, which further emphasizes the 50 ug/L division perceptually
and which, again, may not be intentional.
Small notes on the scale:
(1) Should a concentration of 50 ug/L be green, pink, or
somewhere in between on this diagram?
(2) Does the Bay of Bengal contain arsenic at 10-20 ug/L, as
indicated by the light-blue colour of that area?