Week 2 - Project Milestone 1: Proposal

Over the next three weeks you will work toward creating an interactive visualization app using Shiny. You'll submit three milestones:

A proposal for your app (this week)
An initial attempt to implement your app based upon your proposal
Refining your app with input from your peers

You'll be submitting work for each milestone to your individual github.ubc.ca repositories, like you usually do for lab assignments. We'll be referring to these repos as your private repos.

In addition, you will develop your Shiny app in a public github.com repository. It's up to you to create this repository! All app development should happen there -- we'll be considering your commit messages when grading milestone 3, so be sure to commit early and often.

For this week, you won't need to submit a link to your public repo.

This week's proposal stage will require you to establish a dataset that you will visualize, a simple usage scenario for your visualization and tasks that are entailed in that scenario, and finally a preliminary description and sketch of your app. Each of these requirements is described in detail below.

Tidy submission (5%)

rubric={mechanics:5}

When submitting your proposal as a (separate) markdown document, please include the following sections in this order:

1. Overview
2. Description of the data
3. Usage scenario & tasks
4. Description of app & initial sketch

Your proposal should be no more than 1,000 words (for context, this equates to a little over one page of single-spaced Times New Roman size 11 text). You will also submit a sketch of your app, which is separate from this limit.

Sections 1-3 (75%)

The first three sections of your proposal will be marked as a whole, and you will be assessed on the quality and clarity of your writing, the feasibility of what you propose, and your initial app description and sketch, using the following rubrics:

rubric={reasoning:45,writing:30}

Each of the proposal sections is described below and includes an example of what is expected. You don't have to write your own proposal exactly the same as the examples; the examples just serve as a guide. When writing your proposal, consider whether what you are proposing is realistic to implement in a three-week time frame.

Section 1: Overview

A few sentences that describe what problem your visualization is tackling and how. Be brief and clear.

Example:

Missed medical appointments cost the healthcare system a lot of money and affect the quality of care. If we could understand what factors lead to missed appointments, it may be possible to reduce their frequency. To address this challenge, I propose building a data visualization app that allows healthcare administrators to visually explore a dataset of missed appointments. My app will show the distribution of factors contributing to appointment show/no-show and allow users to explore different aspects of this data by filtering and re-ordering on different variables in order to compare factors that contribute to absence.

Section 2: Description of the data

You can visualize your own data, if you have an interesting dataset available, or you may pick one of the following datasets.

If you are using your own data, make sure you clear it with a TA first, and state who approved it in your writeup.

In your report, briefly describe the dataset and the variables that you will visualize. Note, all data has to be publicly available since you are required to create a public repo.

Please note, if your dataset has a lot of variables and you plan to visualize them all, then provide a high-level description of the variable types; for example, say that the dataset contains demographic variables instead of describing every single variable. You may also want to consider visualizing a smaller set of variables given the short duration of this course.

Example:

I will be visualizing a dataset of approximately 300,000 missed patient appointments. Each appointment has 15 associated variables that describe the patient who made the appointment (PatientID, Gender, Age), the health status of the patient (Hypertension, Diabetes, Alcohol intake, physical disabilities), information about the appointment itself (appointment ID, appointment date), whether the patient showed up (status), and if a text message was sent to the patient about the appointment (SMSsent). Using this data I will also derive a new variable, which is the predicted probability that a patient will show up for their appointment (ProbShow).

In the above example, specific variable names are given in parentheses; remember, if your dataset has a lot of variables, stick to summaries and don't provide specific variable names. The example also differentiates variables that come with the dataset (e.g. Age) from new variables that you might derive for your visualizations (e.g. ProbShow) - you should make a similar distinction in your write-up.

Section 3: Usage scenario & tasks

The purpose of the usage scenario is to get you to think about how someone else might use the app you're going to design, and to think about those needs before you start hacking. Usage scenarios are typically written in a narrative style and include the specific context of usage, tasks associated with that usage context, and a hypothetical walkthrough of how the user would accomplish those tasks with your app. If you are using a Kaggle dataset, you may use their "Overview (inspiration)" to create your usage scenario, or you may come up with your own inspiration.

Example usage scenario with tasks (tasks are indicated in brackets, e.g. [task]):

Mary is a policy maker with the Canadian Ministry of Health and she wants to understand what factors lead to missed appointments in order to devise an intervention that improves attendance numbers. She wants to be able to [explore] a dataset in order to [compare] the effect of different variables on absenteeism and [identify] the most relevant variables around which to frame her intervention policy. When Mary logs on to the "Missed Appointments app", she will see an overview of all the available variables in her dataset, according to the number of people that did or did not show up to their medical appointment. She can filter out variables for head-to-head comparisons, and/or rank-order patients according to their predicted probability of missing an appointment. When she does so, Mary may notice that "physical disability" appears to be a strong predictor of missing appointments, and in fact patients with a physical disability also have the largest number of missed appointments. She hypothesizes that patients with a physical disability could be having a hard time finding transportation to their appointments, and decides she needs to conduct a follow-on study since transportation information is not captured in her current dataset.

Note that in the above example, "physical disability" being an important variable is fictional - you don't need to conduct an analysis of your data to figure out what is important or not, you just need to imagine what someone could find, and how they may use this information.

Section 4: Description of your app & sketch (20%)

rubric={viz:20}

Building from your usage scenario, give a high-level description of the interface for the app you will build. Remember to be realistic since you are actually required to implement this app, and you will be assessed on how much, and why, your final app deviates from this initial proposal.

In this description you are not required to use terminology specific to Shiny apps (e.g. widgets) or make reference to specific R libraries. Your sketch can be hand-drawn or mocked up using a graphics editor. Ideally, a single image shows both the visual design and the interaction design of the app, but if you need more space to show other planned features you can include up to a total of THREE images for this proposal.

Do not submit a preliminary Shiny app as a sketch; you will be submitting the first version of your app in the next project milestone.

Example description

The app contains a landing page that shows the distribution (depending on data type, bar chart, density chart, etc.) of dataset factors (hypertension, physical disabilities, etc.) colour-coded according to whether patients showed up or didn't show up for an appointment. From a dropdown list, users can filter out variables from the distribution display, by patient demographics (e.g. only show female patients), by appointment data (e.g. if an SMS was sent), and finally by the date range of appointments. A different dropdown menu will allow users to re-order variables according to the probability of patients being a no-show or in alphabetical order of comorbidities. Users can compare the distribution of comorbidities by scrolling down through the app interface.

Example sketch

The example sketch shows the visual design of the app and one interactive feature (a tooltip).

A further note on the app sketches

I've chosen to draw up my sketch using PowerPoint and using icons from the Noun Project. You can use other graphics tools (e.g. Photoshop, Illustrator, GIMP, or Inkscape) or you can even draw your app by hand and upload the scanned version of your drawing. Whatever you choose to do, make sure that the final image in your report is legible. Please note, this is a very basic illustrative guide to help you in this milestone; it is by no means the limit of what you should submit.