Assignment 7: Video Segmentation

Due: At the start of class, Thursday, April 6, 2017.

The purpose of this assignment is to understand and implement methods for shot boundary detection in a video sequence.

The assignment

In this assignment we'll detect shot boundaries in 151 frames from a well-known cartoon (search for “Snow White and the Seven Dwarfs Heigh Ho Song”). Shot boundaries mark transitions between “shots.” A shot is a sequence of frames in which the camera sees more or less the same scene. In this case, shot boundaries mark the transition from one dwarf to another.

We can identify shot boundaries in two different ways:

You are given skeleton code for the assignment,

The data

You are given 151 frames in a directory of .jpg files, each of size 90 by 120. You can get the images from

Manually find the shot boundaries. Record the last frame number of each shot (i.e., if the shot changes from frame 3 to 4, you should record 3 as a shot boundary). Note that the first and last frames in the entire sequence do not count as shot boundaries.

  1. (1 mark) Write the frame numbers of the shot boundaries as a list in the variable good_boundaries in the skeleton code,

Detecting boundaries using k-means

First, we will try k-means clustering as a method to detect the shot boundaries. kmeans is a SciPy function (in the scipy.cluster.vq module) that implements the k-means clustering algorithm; look at its documentation to see how it works.

We are going to construct features for each frame using histograms of pixel values. The code to read the images and to create gray histograms is provided (see the function compute_gray_histograms).
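To make the idea of a histogram feature concrete, here is a minimal sketch for a single frame. The helper name gray_histogram, the normalization to unit sum, and the random test frame are assumptions made for illustration; the provided compute_gray_histograms may differ in its details.

```python
import numpy as np

def gray_histogram(frame, n_bins):
    """Return a normalized histogram of 8-bit gray values for one frame.

    Hypothetical helper for illustration only.
    """
    # Fix the range so every frame uses the same bin edges.
    counts, _ = np.histogram(frame, bins=n_bins, range=(0, 256))
    return counts / counts.sum()

# Synthetic 90 x 120 gray frame standing in for one of the .jpg images.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(90, 120), dtype=np.uint8)
hist = gray_histogram(frame, n_bins=10)
```

Using the same fixed range for every frame keeps the bins comparable from one frame to the next, which matters once the histograms are clustered or differenced.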

Once we have the histograms for each frame, the task is to cluster them using k-means. We know, a priori, that there are 4 shots, so we use k=4 clusters. Once we have the cluster centers, we assign each frame's histogram to its nearest cluster center. Finally, we find the frames where the cluster assignment changes from one frame to the next, using the provided function cluster2boundaries. As a test, use the function get_boundaries_cost to evaluate how good this prediction is.
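The clustering step can be sketched on synthetic data as follows. Everything here apart from the SciPy calls kmeans and vq is an assumption made for illustration: the synthetic features, the hypothetical helper label_changes (a stand-in, not the provided cluster2boundaries), the 0-based frame indices, and the choice of initial centroids from evenly spaced frames, which is used only to make the example reproducible.

```python
import numpy as np
from scipy.cluster.vq import kmeans, vq

def label_changes(labels):
    """Indices i (0-based) where labels[i] differs from labels[i + 1].

    Hypothetical helper; index i is the last frame of a shot.
    """
    return np.flatnonzero(np.diff(labels) != 0)

# Synthetic stand-in for per-frame histograms: 4 "shots", 10 bins each.
rng = np.random.default_rng(0)
shot_lengths = [40, 35, 40, 36]            # 151 frames in total
prototypes = 0.03 + 0.7 * np.eye(4, 10)    # one dominant bin per shot
features = np.vstack([
    prototypes[s] + rng.normal(0.0, 0.01, size=(n, 10))
    for s, n in enumerate(shot_lengths)
])

# Passing an array instead of k gives kmeans explicit initial centroids
# (assumption: evenly spaced frames, chosen for reproducibility).
init = features[np.linspace(0, len(features) - 1, 4).astype(int)]
centroids, _ = kmeans(features, init)

# Assign every frame to its nearest centroid, then find label changes.
labels, _ = vq(features, centroids)
boundaries = label_changes(labels)
```

With this synthetic data the detected boundaries are the indices 39, 74, and 114, that is, the last frame of each of the first three shots. On real frames the clusters are less cleanly separated, so a frame may briefly flip to another cluster and produce spurious boundaries.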

  2. (4 marks) Write the loop inside the code section starting with # === GRAY HISTOGRAMS ===, to compute histograms, to cluster them using k-means, to obtain shot boundaries and to evaluate the detected boundaries (storing the associated cost in the array gray_costs). This will produce a plot showing “Error in boundary detection” (vertical axis) versus the “Number of bins” used (horizontal axis).
  3. (2 marks) Write the body of the function compute_color_histograms. The function is analogous to compute_gray_histograms, but should compute one histogram for each color channel, and then concatenate the three histograms into one larger histogram.
  4. (4 marks) Write the loop inside the code section starting with # === COLOR HISTOGRAMS ===, analogous to the gray histogram loop above, but using the color histograms. This will produce a plot showing “Error in boundary detection” (vertical axis) versus the “Number of bins” used (horizontal axis).
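The per-channel construction described for the color histograms can be sketched for a single frame as below. The helper name color_histogram, the normalization, and the random test frame are assumptions for illustration; the signature expected of compute_color_histograms in the skeleton code is not shown here and may differ.

```python
import numpy as np

def color_histogram(frame, n_bins):
    """Concatenate per-channel histograms of an H x W x 3 8-bit image.

    Hypothetical helper for illustration only.
    """
    per_channel = [
        np.histogram(frame[:, :, c], bins=n_bins, range=(0, 256))[0]
        for c in range(3)
    ]
    # The result has 3 * n_bins entries, one block per channel.
    hist = np.concatenate(per_channel).astype(float)
    return hist / hist.sum()

# Synthetic 90 x 120 color frame standing in for one of the .jpg images.
rng = np.random.default_rng(0)
frame = rng.integers(0, 256, size=(90, 120, 3), dtype=np.uint8)
hist = color_histogram(frame, n_bins=8)
```

Because each channel contributes the same number of pixels, each block of n_bins entries carries one third of the total mass after normalization.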

Detecting boundaries using frame differences

Another way to detect shot boundaries is to detect changes between successive frames. For this part of the assignment, use only the gray value images.

Add code to your file to implement the following measures of change between successive frames:

  5. (2 marks) Absolute frame difference: Compute the sum of the absolute pointwise differences between successive frames.
  6. (2 marks) Squared frame difference: Compute the sum of the squared pointwise differences between successive frames.
  7. (2 marks) Average gray level difference: Compute the average gray level for each frame and compute the difference between the averages in successive frames.
  8. (2 marks) Histogram difference: Compute the gray level histogram for each frame and compute the Euclidean distance between the histograms in successive frames. For the histograms, use 10 bins.

In each part, 5–8, produce the indicated titled/labeled plots of the difference measure (vertical axis) versus the frame number (horizontal axis).
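The four measures can be sketched on a tiny synthetic sequence as follows. The function name frame_differences, the (T, H, W) array layout, the use of the absolute value for the average gray level difference, and the test frames are all assumptions for illustration, not part of the skeleton code.

```python
import numpy as np

def frame_differences(frames, n_bins=10):
    """Four change measures between successive gray frames.

    frames: array of shape (T, H, W). Each returned array has length T - 1.
    Hypothetical helper for illustration only.
    """
    # Convert to float first: subtracting uint8 images wraps around.
    f = frames.astype(np.float64)
    diff = f[1:] - f[:-1]
    means = f.mean(axis=(1, 2))
    hists = np.stack([
        np.histogram(x, bins=n_bins, range=(0, 256))[0] for x in f
    ]).astype(float)
    return {
        "absolute": np.abs(diff).sum(axis=(1, 2)),
        "squared": (diff ** 2).sum(axis=(1, 2)),
        # Assumption: absolute value of the difference between averages.
        "mean": np.abs(np.diff(means)),
        "histogram": np.linalg.norm(np.diff(hists, axis=0), axis=1),
    }

# Six 4 x 5 frames: three dark frames followed by three bright frames.
frames = np.concatenate([
    np.full((3, 4, 5), 50, dtype=np.uint8),
    np.full((3, 4, 5), 200, dtype=np.uint8),
])
measures = frame_differences(frames)
```

In this toy sequence every measure is zero except at index 2, the last frame before the change, where the absolute difference is 3000 and the average gray level difference is 150. Real footage gives nonzero values within a shot as well, so a boundary shows up as a peak rather than as the only nonzero entry.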

Answer the following written questions:

  9. (3 marks) Which frame difference method is best? Why?
  10. (3 marks) We worked with shots without camera movement. Which of our six methods do you think would work best if the camera were slowly moving instead of static? Why?


Hand in a printed copy of your file. This should include all the code added to complete parts 1–8. Include sufficient comments for others to easily understand what you have done. In addition, hand in the plots requested in parts 2, 4, and 5–8, and your answers to the questions in parts 9 and 10.

Electronic Handin

For this assignment, we are offering the option of electronic handin for the requested plots. Refer to the department's Handin instructions for information about submitting assignments electronically. The course account is cs425 and the assignment name is hw7.