Lab 2 - Plotting using Python's matplotlib and pandas

General lab instructions

3 [Mechanics]

Overview

In this lab you will explore how to visualize tabular data using Python's matplotlib and pandas packages. You will recreate many of the plots you have made in previous courses using the titanic and Gapminder datasets.

You may have to wrangle some data to make some of the plots, and you can do this in R with dplyr and purrr, or in Python with pandas. The choice is up to you. If you do it in R, you will have to save the wrangled data as a feather or csv file to pass it to Python. Alternatively, you can call R from Python using the rpy2 package.
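
If you wrangle in R and hand the result to Python through a csv file, the Python side can be as small as the sketch below. The csv text is an in-memory stand-in for a file written from R, and the column names (`country`, `year`, `pop`) and values are placeholders rather than real Gapminder data. For a feather file, `pandas.read_feather` plays the same role, assuming `pyarrow` is installed.

```python
import io

import pandas as pd

# In-memory stand-in for a csv file exported from R; the rows are
# placeholders, not real Gapminder values.
csv_from_r = io.StringIO(
    "country,year,pop\n"
    "Examplia,2000,100\n"
    "Examplia,2005,120\n"
    "Examplia,2010,150\n"
)

# With a real file, pass its path instead of the StringIO object.
wrangled = pd.read_csv(csv_from_r)
print(wrangled.dtypes)
```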

Some useful documentation and tutorials to help you with this lab:

Exercise 1 - Plotting two quantitative variables (value attributes) as a scatterplot in Python

1 [Code]   2 [Reasoning]   2 [Visualizations]
  • Plot the two quantitative variables (value attributes) from the titanic dataset (age and fare) and create a scatterplot to visualize the relationship between the two variables.
  • Focus on making effective plots, keeping in mind what you have learned in lecture.
  • Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, and the scale of the data in terms of number of observations, and the number of levels of each categorical or ordered attribute (if categorical or ordered attributes are present).
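
As a starting point for the mechanics only, a scatterplot in matplotlib might look like the sketch below. The data frame is a made-up stand-in for the titanic data, and the column names `age` and `fare` are assumed to match your file; the labels, transparency, and title are design choices you should revisit for your own plot.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in rows; in the lab, load the real titanic data instead.
titanic = pd.DataFrame({
    "age": [5, 18, 24, 30, 36, 42, 50, 63],
    "fare": [20.0, 8.0, 10.0, 55.0, 12.0, 80.0, 30.0, 26.0],
})

fig, ax = plt.subplots()
# Point marks; horizontal and vertical position are the channels.
ax.scatter(titanic["age"], titanic["fare"], alpha=0.6)
ax.set_xlabel("Age (years)")
ax.set_ylabel("Fare")
ax.set_title("Fare versus age")
```

In a notebook the figure is displayed when the cell finishes; in a script, call `plt.show()` or `fig.savefig(...)` afterwards.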

Exercise 2 (optional) - Plotting two quantitative variables (value attributes) and one categorical variable (key attribute) as a scatterplot in Python

1 [Reasoning]
  • Plot the two quantitative variables (value attributes) from the titanic dataset (age and fare) and create a scatterplot where the points are coloured based upon a categorical variable (key attribute) you are interested in.
  • Focus on making effective plots, keeping in mind what you have learned in lecture.
  • Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, and the scale of the data in terms of number of observations, and the number of levels of each categorical or ordered attribute.
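
One way to colour points by a categorical attribute is to draw one scatter layer per group, so that matplotlib assigns each group its own colour and legend entry. This is a sketch under assumptions: the rows are made up, and `sex` is only a guess at a column you might choose.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in rows; "sex" is an assumed column name.
titanic = pd.DataFrame({
    "age": [5, 18, 24, 30, 36, 42, 50, 63],
    "fare": [20.0, 8.0, 10.0, 55.0, 12.0, 80.0, 30.0, 26.0],
    "sex": ["female", "male", "male", "female",
            "male", "female", "male", "female"],
})

fig, ax = plt.subplots()
# One scatter call per level: colour hue becomes the channel for the key.
for level, group in titanic.groupby("sex"):
    ax.scatter(group["age"], group["fare"], label=level, alpha=0.7)
ax.set_xlabel("Age (years)")
ax.set_ylabel("Fare")
ax.legend(title="sex")
```

Drawing each level separately keeps the legend in sync with the colours without building a colour map by hand.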

Exercise 3 - Plotting two quantitative variables (value attributes) as a line-chart in Python

1 [Code]   2 [Reasoning]   2 [Visualizations]
  • Choose one country of interest from one of these 10 Gapminder datasets and plot population versus year as a line-chart.
  • Focus on making effective plots, keeping in mind what you have learned in lecture.
  • Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, and the scale of the data in terms of number of observations, and the number of levels of each categorical or ordered attribute.
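
The filter-then-plot pattern for a single country can be sketched as follows. The countries, years, and populations are invented placeholders, and the column names `country`, `year`, and `pop` are assumptions about how your Gapminder file is laid out.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented placeholder rows, not real Gapminder values.
gapminder = pd.DataFrame({
    "country": ["Examplia"] * 4 + ["Otherland"] * 4,
    "year": [1990, 2000, 2010, 2020] * 2,
    "pop": [100, 120, 150, 170, 50, 55, 53, 60],
})

# Keep one country and make sure the rows are in time order.
one_country = gapminder[gapminder["country"] == "Examplia"].sort_values("year")

fig, ax = plt.subplots()
# Line mark connecting the observations; markers show where data exists.
ax.plot(one_country["year"], one_country["pop"], marker="o")
ax.set_xlabel("Year")
ax.set_ylabel("Population")
ax.set_title("Population of Examplia over time")
```

Sorting by year before plotting matters because `ax.plot` connects points in row order, not in order along the x-axis.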

Exercise 4 - Plotting a single variable in Python as a histogram

1 [Code]   2 [Reasoning]   2 [Visualizations]
  • Choose one variable from the titanic dataset you are interested in and plot the data as a histogram.
  • Focus on making effective plots, keeping in mind what you have learned in lecture.
  • Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, and the scale of the data in terms of number of observations, and the number of levels of each categorical or ordered attribute.
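
A histogram of one variable might be sketched as below. The ages are made up, `age` is an assumed column name, and the choice of five bins is arbitrary; try several bin counts on the real data, since the apparent shape of the distribution depends on it.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Made-up stand-in values, including one missing age.
titanic = pd.DataFrame({
    "age": [5, 12, 19, 23, 27, 31, 36, 44, 58, float("nan")],
})

# Drop missing values explicitly so the count of plotted rows is known.
ages = titanic["age"].dropna()

fig, ax = plt.subplots()
counts, bin_edges, patches = ax.hist(ages, bins=5, edgecolor="white")
ax.set_xlabel("Age (years)")
ax.set_ylabel("Number of passengers")
```

`ax.hist` returns the per-bin counts and the bin edges, which is handy when you report the scale of the data in your discussion.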

Exercise 5 - Plotting one quantitative variable (value) stratified by one categorical variable (key) in Python

1 [Code]   2 [Reasoning]   2 [Visualizations]
  • Choose a quantitative variable (value attribute) from the titanic dataset (either fare or age) and one categorical variable (key attribute) you are interested in. Create boxplots to visualize the relationship between the two variables.
  • Focus on making effective plots, keeping in mind what you have learned in lecture.
  • Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, and the scale of the data in terms of number of observations, and the number of levels of each categorical or ordered attribute.
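
Boxplots stratified by a key can be built by splitting the value column per level and passing the pieces to `ax.boxplot`. The sketch below uses invented rows and assumes the columns are called `fare` and `pclass`; substitute whichever pair you choose.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in rows; "pclass" is an assumed column name.
titanic = pd.DataFrame({
    "pclass": [1, 1, 1, 2, 2, 2, 3, 3, 3],
    "fare": [80.0, 60.0, 95.0, 25.0, 30.0, 20.0, 8.0, 10.0, 7.0],
})

# One array of fares per level of the key, in sorted level order.
groups = {
    level: group["fare"].dropna().to_numpy()
    for level, group in titanic.groupby("pclass")
}

fig, ax = plt.subplots()
result = ax.boxplot(list(groups.values()))
# Boxes are drawn at x = 1, 2, ...; label them with the level names.
ax.set_xticks(list(range(1, len(groups) + 1)))
ax.set_xticklabels([str(level) for level in groups])
ax.set_xlabel("Passenger class")
ax.set_ylabel("Fare")
```

Setting the tick labels separately avoids depending on the name of the labelling keyword in `ax.boxplot`, which has changed between matplotlib versions.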

Exercise 6 - Plotting two categorical variables (keys) in Python

1 [Code]   2 [Reasoning]   2 [Visualizations]
  • Pick two categorical variables/attributes from the titanic dataset and create a stacked bar-chart to visualize the relationship between the two variables/attributes.
  • Focus on making effective plots, keeping in mind what you have learned in lecture.
  • Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, and the scale of the data in terms of number of observations, and the number of levels of each categorical or ordered attribute.
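
For two categorical attributes, one possible route is to count the combinations with `pandas.crosstab` and hand the resulting table to the pandas plotting interface with `stacked=True`. The rows below are invented, and `pclass` and `survived` are assumed column names chosen only for illustration.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; omit this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Invented stand-in rows; both column names are assumptions.
titanic = pd.DataFrame({
    "pclass": [1, 1, 1, 2, 2, 2, 3, 3, 3, 3],
    "survived": [1, 1, 0, 1, 0, 0, 0, 0, 0, 1],
})

# Rows are levels of the first key, columns are levels of the second.
counts = pd.crosstab(titanic["pclass"], titanic["survived"])

fig, ax = plt.subplots()
# One bar per row of the table, with one stacked segment per column.
counts.plot(kind="bar", stacked=True, ax=ax)
ax.set_xlabel("Passenger class")
ax.set_ylabel("Number of passengers")
ax.legend(title="survived")
```

Which attribute goes on the axis and which becomes the stacked segments changes which comparisons are easy, so it is worth trying both orientations before you write your reflection.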