## Lab 2 - Plotting using Python's `matplotlib` and `pandas`

### General lab instructions

- Ensure your lab meets these basic lab requirements: [https://github.ubc.ca/ubc-mds-2016/general/blob/master/general_lab_instructions.md](https://github.ubc.ca/ubc-mds-2016/general/blob/master/general_lab_instructions.md)
- For this lab, you must submit both the source code (`.py`, `.ipynb`, `.R`, etc.) **AND** a final report in `.md` format that contains the visualizations and your reflection/discussion on them.

### Overview

In this lab you will explore how to visualize tabular data using Python's `matplotlib` and `pandas` packages. You will recreate many of the plots you have made in previous courses using the [titanic dataset](https://github.com/UBC-MDS/DSCI_551_eda-dsci/blob/master/data/titanic.csv) and the Gapminder datasets.

You may have to wrangle some data to make some of the plots. You can do this in R with `dplyr` and `purrr`, or in Python with `pandas`; the choice is up to you. If you wrangle in R, you will have to save the wrangled data as a `feather` or `csv` file to pass it to Python. Alternatively, you can call R from Python using the [`rpy2` package](http://rpy.sourceforge.net/rpy2/doc-2.1/html/introduction.html).

##### Some useful documentation and tutorials to help you with this lab:

- [Beginner’s Guide to `matplotlib`](http://matplotlib.org/users/beginner.html)
- [Visualization in `pandas`](http://pandas.pydata.org/pandas-docs/version/0.18.1/visualization.html)

### Exercise 1 - Plotting two quantitative variables (value attributes) as a scatterplot in Python

- Create a scatterplot of the two quantitative variables (value attributes) age and fare from the [titanic dataset](https://github.ubc.ca/ubc-mds-2016/datasets/raw/master/data/titanic.csv) to visualize the relationship between them.
- Focus on making effective plots, keeping in mind what you have learned in lecture.
- Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, the scale of the data in terms of the number of observations, and the number of levels of each categorical or ordered attribute (if categorical or ordered attributes are present).

### Exercise 2 (optional) - Plotting two quantitative variables (value attributes) and one categorical variable (key attribute) as a scatterplot in Python

- Create a scatterplot of the two quantitative variables (value attributes) age and fare from the [titanic dataset](https://github.ubc.ca/ubc-mds-2016/datasets/raw/master/data/titanic.csv), with the points coloured by a categorical variable (key attribute) you are interested in.
- Focus on making effective plots, keeping in mind what you have learned in lecture.
- Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, the scale of the data in terms of the number of observations, and the number of levels of each categorical or ordered attribute.

### Exercise 3 - Plotting two quantitative variables (value attributes) as a line chart in Python

- Choose **one** country of interest from one of these [10 Gapminder datasets](https://github.ubc.ca/ubc-mds-2016/datasets/tree/master/data/gapminder_countries) and plot population versus year as a line chart.
- Focus on making effective plots, keeping in mind what you have learned in lecture.
- Reflect on and discuss how the data is represented visually and why you think it is or is not effective.
  Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, the scale of the data in terms of the number of observations, and the number of levels of each categorical or ordered attribute.

### Exercise 4 - Plotting a single variable in Python as a histogram

- Choose one variable you are interested in from the [titanic dataset](https://github.ubc.ca/ubc-mds-2016/datasets/raw/master/data/titanic.csv) and plot its values as a histogram.
- Focus on making effective plots, keeping in mind what you have learned in lecture.
- Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, the scale of the data in terms of the number of observations, and the number of levels of each categorical or ordered attribute.

### Exercise 5 - Plotting one quantitative variable (value) stratified by one categorical variable (key) in Python

- Choose a quantitative variable (value attribute) from the [titanic dataset](https://github.ubc.ca/ubc-mds-2016/datasets/raw/master/data/titanic.csv) (either fare or age) and one categorical variable (key attribute) you are interested in. Create boxplots to visualize the relationship between the two variables.
- Focus on making effective plots, keeping in mind what you have learned in lecture.
- Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, the scale of the data in terms of the number of observations, and the number of levels of each categorical or ordered attribute.
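
As a possible starting point for the titanic-based plots in Exercises 1, 4, and 5, here is a minimal sketch using `pandas` and `matplotlib`. It assumes the CSV has already been loaded into a DataFrame and that the relevant columns are named `age`, `fare`, and `pclass`; check the header of your copy of the dataset and pass different names if they differ. The helper names `plot_scatter`, `plot_histogram`, and `plot_boxplot` are hypothetical, not part of either library.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_scatter(df, x="age", y="fare"):
    """Exercise 1: point marks; horizontal and vertical position encode two values."""
    fig, ax = plt.subplots()
    # Semi-transparent points so overlapping passengers show up darker;
    # rows where either value is missing are not drawn.
    df.plot.scatter(x=x, y=y, ax=ax, alpha=0.5)
    ax.set_xlabel(x.capitalize())
    ax.set_ylabel(y.capitalize())
    ax.set_title(f"{y.capitalize()} versus {x} of Titanic passengers")
    return ax


def plot_histogram(df, column="age", bins=20):
    """Exercise 4: one bar per bin; bar height encodes the count in that bin."""
    fig, ax = plt.subplots()
    # Drop missing values explicitly: NaN cannot be placed in any bin.
    df[column].dropna().plot.hist(bins=bins, ax=ax, edgecolor="white")
    ax.set_xlabel(column.capitalize())
    ax.set_ylabel("Number of passengers")
    return ax


def plot_boxplot(df, value="fare", by="pclass"):
    """Exercise 5: one box per level of a categorical key, on a shared value axis."""
    data = df[[value, by]].dropna()
    levels = sorted(data[by].unique())
    fig, ax = plt.subplots()
    # One array of values per category level.
    ax.boxplot([data.loc[data[by] == level, value].to_numpy() for level in levels])
    # matplotlib draws the boxes at x = 1, 2, ...; label each with its category.
    ax.set_xticks(range(1, len(levels) + 1))
    ax.set_xticklabels([str(level) for level in levels])
    ax.set_xlabel(by)
    ax.set_ylabel(value.capitalize())
    return ax
```

For example, after `df = pd.read_csv("titanic.csv")` (assuming a local copy of the file), call `plot_scatter(df)` and then `plt.show()`. The `bins` argument of `plot_histogram` is worth varying, since the apparent shape of a histogram can change noticeably with bin width.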

### Exercise 6 - Plotting two categorical variables (keys) in Python

- Pick two categorical variables (attributes) from the [titanic dataset](https://github.ubc.ca/ubc-mds-2016/datasets/raw/master/data/titanic.csv) and create a stacked bar chart to visualize the relationship between them.
- Focus on making effective plots, keeping in mind what you have learned in lecture.
- Reflect on and discuss how the data is represented visually and why you think it is or is not effective. Explicitly state and comment on the marks and channels used in your visual encoding, the tasks that are well supported by it, the scale of the data in terms of the number of observations, and the number of levels of each categorical or ordered attribute.
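
For Exercise 6, one way to build a stacked bar chart is to cross-tabulate the two keys with `pd.crosstab` and pass the resulting table to `DataFrame.plot.bar(stacked=True)`. This is a minimal sketch assuming the two chosen columns are named `pclass` and `survived`; `plot_stacked_bar` is a hypothetical helper name, and any two categorical columns can be passed instead.

```python
import matplotlib.pyplot as plt
import pandas as pd


def plot_stacked_bar(df, x="pclass", stack="survived", normalize=False):
    """Exercise 6: one bar per level of `x`, split into segments by level of `stack`."""
    # Count passengers in every (x, stack) combination:
    # rows become bars, columns become stacked segments.
    # normalize="index" rescales each row so that it sums to 1.
    counts = pd.crosstab(df[x], df[stack], normalize="index" if normalize else False)
    fig, ax = plt.subplots()
    counts.plot.bar(stacked=True, ax=ax, rot=0)
    ax.set_xlabel(x)
    ax.set_ylabel("Proportion of passengers" if normalize else "Number of passengers")
    ax.legend(title=stack)
    return ax, counts
```

With raw counts, bar length supports comparing the total size of each group, but only the bottom segment shares a common baseline, so the upper segments are harder to compare across bars. Passing `normalize=True` makes every bar the same height, giving up the group totals in exchange for easier comparison of proportions; either choice can be discussed in terms of marks, channels, and supported tasks.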