## Lab 3 - Exploring colour and visualizing spatial data in R with `ggplot2`'s `geom_polygon` and `ggmap`
### General lab instructions
- Ensure your lab meets these basic lab requirements: [https://github.ubc.ca/ubc-mds-2016/general/blob/master/general_lab_instructions.md](https://github.ubc.ca/ubc-mds-2016/general/blob/master/general_lab_instructions.md)
- For this lab, you must submit both the source code (`.py`, `.ipynb`, `.R`, etc.) **AND** a final report in `.md` format that contains the visualizations and your reflection/discussion on them.
### Overview
In this lab you will explore how to visualize spatial data in R using `geom_polygon` from `ggplot2` and the `ggmap` package. You will also explore the effective use of colour.
### Resources:
[`ggmap` cheatsheet](https://www.nceas.ucsb.edu/~frazier/RSpatialGuides/ggmap/ggmapCheatsheet.pdf)
### Exercise 1 - Create a choropleth map of Canada to visualize statistics
- Follow [this example](https://github.ubc.ca/ubc-mds-2016/DSCI_531_viz-1_students/blob/master/scripts/spatial.R) on how to create a choropleth map of Canadian provinces & territories to visualize population count data by province. The data required for this example are [here](https://github.ubc.ca/ubc-mds-2016/DSCI_531_viz-1_students/tree/master/data). The population data was sourced from [this table from Statistics Canada](http://www5.statcan.gc.ca/cansim/a26?lang=eng&id=510005).
#### 1a
- Choose a statistic you can download from the [datasets from the Canadian Open Government Portal](http://open.canada.ca/data/en/dataset) that is divided by one of the boundaries available from [Statistics Canada's shape files](http://www12.statcan.gc.ca/census-recensement/2011/geo/bound-limit/bound-limit-2011-eng.cfm) (use the digital boundary files, not the cartographic ones).
- Example datasets could include:
    - [Provincial Wait time for priority procedures in Canada](http://open.canada.ca/data/en/dataset/93e718fb-f9c7-44bb-89cd-09d9a6f0980d)
    - [Physical activity during leisure time, by sex, provinces and territories](http://open.canada.ca/data/en/dataset/8c81f820-3116-46ed-8286-4d10f028c92b)
    - [Sales of liquor, wine and beer, by kind of business and class of customer, Canada, provinces and territories](http://open.canada.ca/data/en/dataset/541a92e1-b33d-4b5c-8516-f1a0ea4c90ae)
- Download the shape file for the boundary you chose to use.
- Create a choropleth map similar to the example provided to visualize your statistic of interest.
- Experiment with the use of colour, and choose a palette that you think best visualizes the data.
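
The steps above can be sketched roughly as follows. This is a minimal illustration and not the course example script: the two square "regions" and their values are made up so that the block runs without a shape file, and the column names (`region`, `long`, `lat`, `value`) are assumptions. With a real shape file you would first convert the boundaries into a data frame of polygon vertices.

```r
library(ggplot2)

# Hypothetical polygon vertices: two unit squares standing in for regions.
# Real boundaries would come from a Statistics Canada shape file.
regions <- data.frame(
  region = rep(c("A", "B"), each = 4),
  long   = c(0, 1, 1, 0,  1, 2, 2, 1),
  lat    = c(0, 0, 1, 1,  0, 0, 1, 1)
)

# Hypothetical statistic, one row per region.
stats <- data.frame(region = c("A", "B"), value = c(10, 25))

# Attach the statistic to every vertex of its region.
# merge() can reorder rows, so keep an explicit vertex order and restore it.
regions$order <- seq_len(nrow(regions))
map_data <- merge(regions, stats, by = "region")
map_data <- map_data[order(map_data$order), ]

choropleth <- ggplot(map_data,
                     aes(x = long, y = lat, group = region, fill = value)) +
  geom_polygon(colour = "grey30") +
  # A sequential palette ordered by luminance suits an ordered quantity.
  scale_fill_distiller(palette = "Blues", direction = 1, name = "Value") +
  coord_quickmap() +
  labs(title = "Hypothetical statistic by region",
       x = "Longitude", y = "Latitude")

print(choropleth)
```

Swapping the `palette` argument (for example to another ColorBrewer sequential palette) is a quick way to compare colour choices on the same map.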
#### 1b
- Discuss the results of your visualization (what did you find out about the data by creating the visualization).
- Reflect and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, any choices you made to derive additional attributes beyond the input dataset, and the scale of the data in terms of number of observations, and the number of levels of each categorical or ordered attribute (if categorical or ordered attributes are present). Provide a rationale for your use of colour, with explicit discussion of its constituent visual channels of luminance, saturation, and hue; and of sequential, diverging, or cyclic attribute characteristics.
### Exercise 2 - Correcting count data for population size
![alt text](imgs/heatmap.png "source: http://xkcd.com/1138/")
*source: http://xkcd.com/1138/*
#### 2a
- Using a similar strategy to the one you used for Exercise 1, make a choropleth map that illustrates the number of asthma cases per Canadian province/territory for 2014. The asthma data is available [here](https://github.ubc.ca/ubc-mds-2016/DSCI_531_viz-1_students/raw/master/data/health50a-eng.csv), and was sourced from [here](http://open.canada.ca/data/en/dataset/a48c51c2-b304-4d79-81a3-b0fcbe42eb01). You can decide how you want to deal with, or ignore, the gender aspect of the data. *Note: you will have to do some data wrangling to get this data into a usable shape for plotting.*
#### 2b
- Make another choropleth map using this data, but this time standardize to the population size of each province/territory (*e.g.*, provide cases of asthma per 1000, or whatever seems reasonable). You can use the 2014 provincial/territory population data from the [population data set used in Exercise 1](https://github.ubc.ca/ubc-mds-2016/DSCI_531_viz-1_students/raw/master/data/canadian_pop_by_prov.csv) to do this.
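
The standardization itself is a join followed by a division. The sketch below uses made-up counts and populations, and the column names (`province`, `cases`, `population`) are assumptions; the real files will need wrangling before they look like this.

```r
# Hypothetical counts and populations; replace with the wrangled lab data.
asthma <- data.frame(
  province = c("X", "Y", "Z"),
  cases    = c(5000, 1200, 300)
)
population <- data.frame(
  province   = c("X", "Y", "Z"),
  population = c(1000000, 150000, 30000)
)

# Join on the shared province column, then convert counts to a rate.
rates <- merge(asthma, population, by = "province")
rates$cases_per_1000 <- rates$cases / rates$population * 1000

print(rates)
```

In this made-up data the region with the most cases has the lowest rate per 1000, which is the kind of difference between raw counts and standardized values that the two maps can reveal.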
#### 2c
- Discuss the results of the two visualizations. How are they similar? How do they differ? Which visualization do you think is more informative? Explain.
### Exercise 3 - Plotting points on a map using `ggmap`
#### 3a
- Download Canadian earthquake data from the [Earthquake Database](http://www.earthquakescanada.nrcan.gc.ca/stndon/NEDB-BNDS/bull-en.php) for a time period you are interested in (the database goes all the way back to the 1960s). *Note: you will have to do some data wrangling to get the data in good shape to plot.*
- Get a map of Canada using `ggmap`'s `get_map()` function, and overlay the locations of the earthquakes on the map using `ggplot2`'s `geom_point`.
- Colour-code the points based on the depth of the earthquake.
- Have the size of the point/bubble represent the magnitude of the earthquake.
- Don't forget to label the visualization so that all someone has to do is look at it to understand it.
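
One possible shape for this plot is sketched below, under several assumptions: the earthquake records are made up, the column names (`lon`, `lat`, `depth_km`, `magnitude`) are hypothetical, and the base map is replaced by an empty `ggplot()` so the block runs offline. The `get_map()` call is shown only as a comment because it needs network access and, depending on the map source, an API key.

```r
library(ggplot2)

# Hypothetical earthquake records; the column names are assumptions.
quakes_ca <- data.frame(
  lon       = c(-123.1, -128.5, -75.7, -66.0),
  lat       = c(49.3, 50.8, 45.4, 47.5),
  depth_km  = c(20, 10, 18, 5),
  magnitude = c(3.2, 4.8, 2.5, 3.9)
)

# In the lab, start from a ggmap base layer instead of an empty plot, e.g.:
# library(ggmap)
# basemap   <- get_map(location = "Canada", zoom = 3)
# base_plot <- ggmap(basemap)
base_plot <- ggplot()

quake_map <- base_plot +
  geom_point(
    data = quakes_ca,
    aes(x = lon, y = lat, colour = depth_km, size = magnitude),
    alpha = 0.7
  ) +
  # Depth is a sequential quantity, so a sequential colour scale is used.
  scale_colour_viridis_c(name = "Depth (km)") +
  scale_size_continuous(name = "Magnitude") +
  labs(title = "Hypothetical earthquakes",
       x = "Longitude", y = "Latitude")

print(quake_map)
```

Passing the data to `geom_point()` rather than to the base plot keeps the layer unchanged when `ggplot()` is swapped for `ggmap(basemap)`.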
#### 3b
- Discuss the results of your visualization (what did you find out about the data by creating the visualization).
- Reflect and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, any choices you made to derive additional attributes beyond the input dataset, and the scale of the data in terms of number of observations, and the number of levels of each categorical or ordered attribute (if categorical or ordered attributes are present). Provide a rationale for your use of colour, with explicit discussion of its constituent visual channels of luminance, saturation, and hue; and of sequential, diverging, or cyclic attribute characteristics.
### Exercise 4 - Contour heat maps with `ggmap`
#### 4a
- Download the `.csv` file for the [Vancouver Police Department Crime dataset](http://data.vancouver.ca/datacatalogue/crime-data.htm) (or find another dataset with similar data) and create a contour heat map (also known as a contour plot, a filled isocontour plot, or a level plot) with `ggmap`. Refer to Lesson # 3 in [this tutorial](http://data-analytics.net/cep/Schedule_files/geospatial.html) as a guide on how to do this.
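
A filled contour layer can be built from point locations with a 2D kernel density estimate. The sketch below is one way to do that and is not taken from the linked tutorial: the incident coordinates are simulated, the column names (`lon`, `lat`) are assumptions, and an empty `ggplot()` stands in for the `ggmap` base layer so the block runs offline.

```r
library(ggplot2)

set.seed(1)
# Hypothetical incident coordinates clustered around a single point.
incidents <- data.frame(
  lon = rnorm(500, mean = -123.12, sd = 0.02),
  lat = rnorm(500, mean = 49.28, sd = 0.01)
)

# In the lab, replace ggplot() with ggmap(basemap) to draw over a map.
density_map <- ggplot() +
  stat_density_2d(
    data = incidents,
    aes(x = lon, y = lat, fill = after_stat(level)),
    geom = "polygon",
    alpha = 0.4
  ) +
  scale_fill_viridis_c(name = "Density level") +
  labs(title = "Hypothetical incident density",
       x = "Longitude", y = "Latitude")

print(density_map)
```

The semi-transparent fill (`alpha`) is a design choice that lets an underlying map remain visible beneath the contours.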
#### 4b
- Discuss the results of your visualization (what did you find out about the data by creating the visualization).
- Reflect and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, any choices you made to derive additional attributes beyond the input dataset, and the scale of the data in terms of number of observations, and the number of levels of each categorical or ordered attribute (if categorical or ordered attributes are present). Provide a rationale for your use of colour, with explicit discussion of its constituent visual channels of luminance, saturation, and hue; and of sequential, diverging, or cyclic attribute characteristics.