
DSCI 532 Lab 2 - Multiple views with ggplot2 and interactive multiple linked views (coupled events) with shiny and ggmap

General Lab Instructions

rubric={mechanics:2,writing:4,reasoning:4}

  • Ensure your lab meets these basic lab requirements: https://github.ubc.ca/ubc-mds-2016/general/blob/master/general_lab_instructions.md
  • Ensure that the low-level flow of your writing is appropriate, with full sentences and using proper English, spelling, and grammar.
  • Ensure that the high-level structure of your solution is clear and thorough for each answer, with a well-thought-out organization of concepts that avoids repetition.
  • This assignment is to be completed in R, submitting both a .Rmd markdown file you create in RStudio (you can add your answers directly to this one) along with a rendered .md AND .pdf file.
  • Ensure that your markdown submission as a whole is easy to read: use appropriate formatting that clearly distinguishes between our questions and your answer, between different sub-parts of your answer, and with visible differences between code and English.

Exercise 1: Displaying trends: Animation vs traces vs faceting.

rubric={reasoning:15,code:5,viz:10}

Animations are often used to convey trends over time. This exercise compares the effectiveness of animations to that of two alternatives: traces and faceting.

Load the gapminder dataset (you can do this by loading the gapminder library; the dataset will then appear as a tibble called gapminder). Make the following three plots using ggplot2 -- each with GDP per capita on the x-axis, and life expectancy on the y-axis.

  1. An animation of a single scatterplot:
    • Use colour to indicate the continent.
    • Animate this plot over time (year), using the gganimate package.
  2. A single plot with traces:
    • For each country, use traces (with ggplot2::geom_path) to indicate progression over time. These should be overlaid on one plot.
    • Colour each country by continent.
  3. A trellis plot with traces:
    • Use faceting to separate the countries into their own panels (with ggplot2::facet_wrap).
    • Use traces (with ggplot2::geom_path) to indicate progression over time.
    • Colour each country by continent.
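As a starting point for plots 2 and 3, the traces and the faceting could be sketched roughly as follows. This is a minimal sketch rather than a model solution: the log-scaled x-axis, the transparency, and the small strip text are assumptions of this example, not requirements of the exercise.

```r
library(ggplot2)
library(gapminder)

# Plot 2: one trace per country, overlaid on a single panel.
# group = country keeps each country's path separate; colour maps to continent.
# geom_path() connects points in row order, and gapminder is already sorted
# by year within each country.
p_traces <- ggplot(gapminder,
                   aes(x = gdpPercap, y = lifeExp,
                       group = country, colour = continent)) +
  geom_path(alpha = 0.6) +
  scale_x_log10() +  # assumption: a log scale spreads out low-GDP countries
  labs(x = "GDP per capita", y = "Life expectancy")

# Plot 3: the same traces, one panel per country.
p_facets <- p_traces +
  facet_wrap(~ country) +
  theme(strip.text = element_text(size = 5))  # assumption: shrink panel labels
```

Printing `p_traces` or `p_facets` renders the plot. The animated version (plot 1) is left out here because the gganimate interface has changed between releases.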

Discuss the pros and cons of the three approaches.

Exercise 2: Linked views: spatial and tabular

The seal datasets contain positional (GPS) and movement data of a seal during her 3-day trip to sea. The seal_gps.csv data contains the GPS coordinates (recorded whenever the seal surfaces), and the seal.csv contains motion and environmental data, collected at a set period during the journey.

In this exercise, you'll link a plot of spatial data with tabular data. Your objective is to discover different behaviours that the seal exhibits on her trip. For example, some behaviours could be foraging, in transit, or sleeping, and these can be associated with certain environmental variables.

2(a): Linked Highlighting

rubric={reasoning:5,code:5,viz:20}

Use ggplot2 and ggmap to make a shiny app that contains the following two plots:

  1. A plot of a bird's-eye view of the seal's path, with a satellite map view underneath. Just use interpolation to re-construct the path from the GPS coordinates in the seal_gps.csv data (i.e., connect the dots).
  2. A scatterplot that shows depth vs acceleration. You can calculate the acceleration as the length of the acceleration vector, sqrt(aX^2 + aY^2 + aZ^2). These data can be found in the seal.csv dataset.
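The acceleration magnitude in plot (2) can be computed in one vectorised step. The sketch below uses a made-up toy data frame in place of seal.csv; the column names aX, aY, and aZ follow the formula above, and `accel` is a name chosen for this example.

```r
# Toy stand-in for seal.csv; the values are invented for illustration.
seal_toy <- data.frame(
  aX = c(3, 0, 1),
  aY = c(4, 0, 2),
  aZ = c(0, 9.8, 2)
)

# Euclidean length of the acceleration vector, computed row by row.
seal_toy$accel <- with(seal_toy, sqrt(aX^2 + aY^2 + aZ^2))
seal_toy$accel
```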

Using shiny, link plot (2) to plot (1), so that when the user highlights (does a "mouse-over") a path segment in plot (1), the points in plot (2) occurring during that path segment are highlighted.
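One way to decide which points occur "during" a path segment is to match timestamps: segment i runs from GPS fix i to fix i + 1, and a sensor record belongs to the segment whose time window contains it. The sketch below illustrates that idea on toy data. The column name `time`, the numeric timestamps, and the variable `hovered_segment` are assumptions of this example; check the actual CSV headers before adapting it.

```r
# Hypothetical toy data: GPS fixes (one row per surfacing) and sensor records.
gps <- data.frame(time = c(0, 10, 25, 40),
                  lon  = c(-123.0, -123.1, -123.3, -123.2),
                  lat  = c(49.0, 49.1, 49.1, 49.3))
sensor <- data.frame(time  = c(2, 8, 12, 20, 30, 39),
                     depth = c(5, 40, 3, 60, 80, 10))

# findInterval() assigns each sensor record to the segment whose
# start time is the latest GPS fix at or before the record.
sensor$segment <- findInterval(sensor$time, gps$time)

# Flag the records that fall in the hovered segment (here, segment 2).
hovered_segment <- 2
sensor$highlight <- sensor$segment == hovered_segment
sensor[sensor$highlight, ]
```

In a shiny app, `hovered_segment` would come from a hover input on plot (1), for example via the `hover` argument of `plotOutput()`, and `highlight` could then be mapped to colour in plot (2).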

Can you discover something about these data based on different legs of the seal's journey? For example, highlight a portion of the path that is highly turbulent; highlight a portion of the path that is on a straight-away. Could you have made this discovery without the linked views?

In 1-2 paragraphs, discuss the impact (or lack thereof) of adding reactivity to this visualization.

2(b): Detail on Demand

rubric={reasoning:5,code:5,viz:20}

In the previous exercise, the "slave" plot (2) only highlighted certain points in a scatterplot. In this exercise, we'll consider a different plot (2) that gets refreshed each time a segment in plot (1) is highlighted.

Use ggplot2 and ggmap to make a new shiny app that contains the following two plots:

  1. Plot (1) from the previous exercise.
  2. A histogram of light level, which can be found in the seal.csv dataset.

Using shiny, link plot (2) to plot (1), so that when the user highlights (does a "mouse-over") a path segment in plot (1), plot (2) is replaced by a new histogram of the light levels, restricted to the observations recorded during the selected path segment.
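The overall wiring of such an app might look like the sketch below. It is a rough outline under several assumptions, not a reference solution: the data are simulated, the column names `time` and `light` are guesses, a plain ggplot2 path stands in for the ggmap satellite layer, and hovering near a GPS fix is treated as selecting the segment that starts at that fix.

```r
library(shiny)
library(ggplot2)

# Hypothetical toy data standing in for seal_gps.csv and seal.csv.
gps <- data.frame(id = 1:4, time = c(0, 10, 25, 40),
                  lon = c(-123.0, -123.1, -123.3, -123.2),
                  lat = c(49.0, 49.1, 49.1, 49.3))
set.seed(1)
sensor <- data.frame(time = 0:39, light = runif(40, 0, 100))
sensor$segment <- findInterval(sensor$time, gps$time)

ui <- fluidPage(
  plotOutput("path", hover = hoverOpts(id = "path_hover")),
  plotOutput("light_hist")
)

server <- function(input, output, session) {
  output$path <- renderPlot({
    ggplot(gps, aes(lon, lat)) + geom_path() + geom_point()
  })

  # GPS fix nearest to the pointer; zero rows when nothing is hovered.
  hovered_fix <- reactive({
    nearPoints(gps, input$path_hover, threshold = 20, maxpoints = 1)
  })

  output$light_hist <- renderPlot({
    fix <- hovered_fix()
    shown <- if (nrow(fix) == 1) {
      # The last fix starts no segment, so fall back to the final segment.
      sensor[sensor$segment == min(fix$id, nrow(gps) - 1), ]
    } else {
      sensor  # nothing hovered: show all observations
    }
    ggplot(shown, aes(light)) + geom_histogram(bins = 15)
  })
}

# shinyApp(ui, server)  # uncomment to launch interactively
```

Because the histogram is rebuilt inside `renderPlot()`, it refreshes whenever the hover input changes, which is the "detail on demand" behaviour this exercise asks for.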

Can you discover something about these data based on different legs of the seal's journey? For example, highlight a portion of the path that is highly turbulent; highlight a portion of the path that is on a straight-away. Could you have made this discovery without the linked views?

In 1-2 paragraphs, discuss the impact (or lack thereof) of adding reactivity to this visualization.

2(c): (Optional) Scalability

rubric={code:3,viz:3}

The seal data are also available with recordings at 1Hz, resulting in about half a million observations. The file is called seal1Hz.RData, and is available in the course repository under "lab_data". It can be loaded into R with the load function, which adds a data frame called seal1Hz to your environment.

Use the seal1Hz dataset in place of the seal dataset for one of the shiny apps in 2(a) or 2(b), such that the interactivity is not compromised -- that is, there should not be much of a lag in response to a new mouse-over event.
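One possible way to keep the response fast, sketched below on simulated data, is to do the expensive work once at start-up: label every row with its segment and store the row indices per segment, so that each mouse-over event only needs a list lookup. The data frame `seal_big`, the fix times, and the helper `segment_rows()` are hypothetical names for this example; other strategies, such as downsampling before plotting, may suit your app better.

```r
# Simulated stand-in for the 1 Hz data; sizes and column names are assumptions.
set.seed(532)
n <- 500000
seal_big <- data.frame(time = seq_len(n), light = runif(n, 0, 100))
fix_times <- seq(1, n, by = 5000)  # hypothetical GPS fix times

# One-off cost at start-up: label every row with its segment, then store
# the row indices of each segment in a named list.
seal_big$segment <- findInterval(seal_big$time, fix_times)
rows_by_segment <- split(seq_len(n), seal_big$segment)

# Per-event cost: a list lookup plus a subset of a few thousand rows,
# instead of a logical scan over all half a million rows.
segment_rows <- function(i) seal_big[rows_by_segment[[as.character(i)]], ]
nrow(segment_rows(3))
```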