Comprehension and Analysis of Large Systems

538B, Winter 2014, Ivan Beschastnikh

MW 9:30-11:00, ICCS 206, UBC course page


Course description

Billions of people rely on correct and efficient execution of large software systems, such as the distributed systems that power Google and Facebook. Yet these systems are complex and challenging to build, maintain, extend, and understand.

  • How do engineers, those who design and implement such systems, cope with these issues today?
  • Are these pain-points inherent to complex software, or can they be mitigated?
  • As researchers, what tools/abstractions/software processes can we propose to help engineers do a better job in building reliable and robust systems?

This course will cover a broad range of topics that fall into the general category of comprehension and analysis of software systems, with the hope of shedding light on some of the above questions.

Format

This seminar-style course will be grounded in discussion of papers that appeared in top conferences in systems, software engineering, program analysis, and more. There is no textbook. The course will also include a substantial team-based course project.

  • The discussion in each class will focus on 1-2 papers
  • The discussion for each paper is led by two students, an advocate and a skeptic
  • Everyone except for the paper advocate and skeptic writes a review of each of the assigned papers, due at 9 PM the day before class
  • Students work in pairs on a course-long project in an area related to the class

The advocate and skeptic are in charge of leading the discussion for a paper. Do not use slides. Instead, give a concise summary of the reading and pose questions to dig deeper into why the paper was written, what we can take away from it, how it relates to the course, why it is or is not the final word on the subject, and so on.

The advocate and skeptic must review any other paper(s) discussed on the same day, but they do not need to write a review for the paper they are leading.

Please create an account on the HotCRP server. This is where you will enter your reviews of the papers we discuss in class.

Also, sign up for the class Piazza instance. We will use this for most class communication.

Schedule (a work in progress)

In the schedule below, A1/S1 and A2/S2 denote the advocate and skeptic assigned to papers 1 and 2, respectively.

1/6 (Mon): Introductions, course overview, and academic paper reading
  1. How to Read a Paper. Srinivasan Keshav.
  2. Writing reviews for systems conferences. Timothy Roscoe.

1/8 (Wed): The software challenge
  1. No Silver Bullet: Essence and Accidents of Software Engineering. Fred Brooks. Computer 1987.
  2. A few billion lines of code later: using static analysis to find bugs in the real world. Bessey et al. CACM 2010.
  A1: Nasim, S1: Daniel; A2: Bobak, S2: Nodir

1/13 (Mon): Code investigation and mental models
  1. How effective developers investigate source code: an exploratory study. Robillard, Coelho, and Murphy. TSE 2004.
  2. Debugging Reinvented: Asking and Answering Why and Why Not Questions About Program Behavior. Andrew J. Ko and Brad A. Myers. ICSE 2008.
  Come with 2 project ideas (project speed dating in class).
  A1: Tahsin, S1: Nayantara; A2: Nithya, S2: Glenn

1/15 (Wed)
  1. Maintaining Mental Models: A Study of Developer Work Habits. Thomas D. LaToza, Gina Venolia, Robert DeLine. ICSE 2006.
  A1: Joyita, S1: Yidan

1/20 (Mon): Bridging design and implementation
  1. Software Reflexion Models: Bridging the Gap between Design and Implementation. Gail C. Murphy, David Notkin, Kevin J. Sullivan. TSE 2001.
  2. ArchJava: Connecting Software Architecture to Implementation. Jonathan Aldrich, Craig Chambers, David Notkin. ICSE 2002.
  Project proposal drafts due Monday 1/20, before 9 PM.
  A1: Nayantara, S1: RJ; A2: Ben, S2: Tahsin

1/22 (Wed)
  1. Bandera: Extracting Finite-state Models from Java Source Code. Corbett et al. ICSE 2000.
  Comments/feedback on proposal draft returned.
  A1: Daniel, S1: Nithya

1/27 (Mon): Symbolic execution and software testing
  1. KLEE: Unassisted and Automatic Generation of High-Coverage Tests for Complex Systems Programs. Cristian Cadar, Daniel Dunbar, Dawson Engler. OSDI 2008.
  2. Symbolic Execution for Software Testing: Three Decades Later. Cristian Cadar and Koushik Sen. CACM 2013.
  Project proposals due Monday 1/27, before 9 PM.
  A1: Yidan, S1: Nodir; A2: Joyita, S2: Nasim

1/29 (Wed)
  1. Automated Whitebox Fuzz Testing. Patrice Godefroid, Michael Y. Levin, David Molnar. NDSS 2008. See also [1].
  On 1/30 Ivan is giving a DLS talk relevant to the course (DMP 110, 3:30-5:00 PM). If you have time, please attend.
  A1: Ben, S1: Glenn

2/3 (Mon): Building and checking distributed systems. Background reading: 1, 2, 3.
  1. Mace: Language Support for Building Distributed Systems. Killian et al. PLDI 2007.
  2. Life, Death, and the Critical Transition: Finding Liveness Bugs in Systems Code. Killian et al. NSDI 2007.
  A1: Daniel, S1: Nasim; A2: Joyita, S2: RJ

2/5 (Wed)
  1. MODIST: Transparent Model Checking of Unmodified Distributed Systems. Yang et al. NSDI 2009.
  A1: Nodir, S1: Nayantara

2/10 (Mon): No class (Family Day)

2/12 (Wed): Performance and tracing of distributed systems
  1. Performance debugging for distributed systems of black boxes. Aguilera et al. SOSP 2003.
  2. Using Magpie for request extraction and workload modeling. Barham et al. OSDI 2004.
  A1: Ben, S1: RJ; A2: Tahsin, S2: Yidan

2/17 (Mon): No class (UBC midterm break)
  Modeling web-assignment released by 9 PM.

2/19 (Wed): No class (UBC midterm break)

2/24 (Mon): Networks
  1. X-Trace: A Pervasive Network Tracing Framework. Fonseca et al. NSDI 2007.
  2. Reverse traceroute. Katz-Bassett et al. NSDI 2010.
  Modeling web-assignment due by 9 PM.
  You must send email by 9 PM to schedule a private meeting with the instructor to discuss your project status.
  A1: Nodir, S1: Nasim; A2: Tahsin, S2: Joyita

2/26 (Wed)
  1. Header Space Analysis: Static Checking for Networks. Peyman Kazemian, George Varghese, Nick McKeown. NSDI 2012.
  A1: Nithya, S1: RJ

3/3 (Mon): Tracing and log analysis
  1. Diagnosing performance changes by comparing request flows. Sambasivan et al. NSDI 2011.
  2. Structured Comparative Analysis of Systems Logs to Diagnose Performance Problems. Karthik Nagaraj, Charles Killian, Jennifer Neville. NSDI 2012.
  A1: Joyita, S1: Nayantara; A2: Glenn, S2: Nasim

3/5 (Wed)
  1. Detecting large-scale system problems by mining console logs. Xu et al. SOSP 2009.
  A1: Glenn, S1: Daniel

3/10 (Mon): Logging support
  1. SherLog: Error Diagnosis by Connecting Clues from Run-time Logs. Yuan et al. ASPLOS 2010.
  2. Improving Software Diagnosability via Log Enhancement. Yuan et al. ASPLOS 2011.
  A1: Ben, S1: Tahsin; A2: Nithya, S2: Daniel

3/12 (Wed)
  1. Be Conservative: Enhancing Failure Diagnosis with Proactive Logging. Yuan et al. OSDI 2012.
  A1: Yidan, S1: Nayantara

3/17 (Mon): Crowd-sourcing analysis and repository mining
  1. Bug Isolation via Remote Program Sampling. Liblit et al. PLDI 2003.
  2. RaceMob: Crowdsourced Data Race Detection. Baris Kasikci, Cristian Zamfir, George Candea. SOSP 2013.
  A1: Yidan, S1: Tahsin; A2: Joyita, S2: Nithya

3/19 (Wed)
  1. Predicting source code changes by mining change history. Ying et al. TSE 2004.
  A1: Nasim, S1: Glenn

3/24 (Mon): Inferring errors, invariants, and abstract types
  1. Bugs as deviant behavior. Engler et al. SOSP 2001.
  2. Dynamically discovering likely program invariants to support program evolution. Ernst et al. TSE 2001.
  A1: Nayantara, S1: Daniel; A2: RJ, S2: Nodir

3/26 (Wed)
  1. Dynamic Inference of Abstract Types. Guo et al. ISSTA 2006.
  A1: RJ, S1: Ben

3/31 (Mon): Natural language processing
  1. /* iComment: Bugs or Bad Comments? */. Tan et al. SOSP 2007.
  2. Using Natural Language Program Analysis to Locate and Understand Action-Oriented Concerns. Shepherd et al. AOSD 2007.
  Project report drafts due Monday 3/31, before 9 PM.
  A1: Yidan, S1: Nodir; A2: Ben, S2: Nithya

4/2 (Wed): No class (NSDI)
  Comments/feedback on report draft returned.

4/7 (Mon): Last day of class
  Project presentations/demos due in class.

4/14 (Mon)
  Project reports due.

Papers pool

Project

The project must be a substantial (three-month-long) software effort and must be done in collaboration with one other student who is taking the course for credit. Example project categories (don't limit yourself to these!):

  • Implement a new comprehension/analysis technique or re-create an existing technique and evaluate it.
  • Extend an existing comprehension/analysis tool in some interesting way, and evaluate it.
  • Simplify/constrain an existing technique/tool and evaluate it on a large software system.
  • Organize and perform a pilot study on a small set of developers to better understand/refine a software comprehension issue.

The project is structured as a series of regularly occurring deadlines (listed in the schedule above). Don't miss these! Each deliverable must be submitted by email to the instructor by 9 PM on the day of the deadline.

  • The project proposal must be no longer than 5 pages and include at least the following sections: introduction, motivation, research questions, proposed methodology, timeline.
  • The project presentation/demo must be no longer than 10 minutes and will be followed by a five-minute Q/A.
  • The project report should be no longer than 10 pages and resemble a research paper.

Grading

The final course mark will be based on:

  • Paper reviews: 30%
  • Class participation: 20%
  • Project: 50%
    • Proposal: 10%
    • Presentation: 10%
    • Final report: 30%
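
To make the weighting concrete, here is a minimal sketch of the final-mark arithmetic (in Python, using hypothetical component scores out of 100; the scores are illustrative assumptions, only the weights come from the breakdown above):

    # Hypothetical component scores, each out of 100 (for illustration only).
    scores = {"reviews": 85, "participation": 90,
              "proposal": 80, "presentation": 75, "report": 88}

    # Weights from the grading breakdown above; the three project
    # components (10% + 10% + 30%) together make up the 50% project weight.
    weights = {"reviews": 0.30, "participation": 0.20,
               "proposal": 0.10, "presentation": 0.10, "report": 0.30}

    assert abs(sum(weights.values()) - 1.0) < 1e-9  # weights cover 100%

    final_mark = sum(scores[k] * weights[k] for k in weights)
    print(f"Final mark: {final_mark:.1f} / 100")  # -> 85.4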

How to do well in the course

Show up to class on time and be prepared to participate in discussion. This is a seminar-style course, which means that all of the class time will be devoted to discussion. It is expected that each student contributes to the discussion in each class session. The best way to prepare for class is to read the assigned paper(s), write a thoughtful review, and then read and carefully consider the reviews submitted by your peers. Periodically re-read the readings from 1/6 and work to improve your paper reading and reviewing abilities.

Invest time into the project. Do not underestimate the importance of a thorough (and interesting!) project proposal. Proposal write-ups that are vague or incomplete will not be accepted. Do not delay finding a partner, and put in consistent weekly effort throughout the term. Rehearse and polish your presentation, and make sure your final report is well-written and conveys its ideas clearly.

Email and schedule a time to meet with the instructor to discuss the course, project ideas, etc. Be proactive.