Poetry Form Checker

Limericks, sonnets, haiku, and other forms of poetry each follow prescribed patterns that give the number of lines, the number of syllables on each line, and a rhyme scheme. For example, limericks are five lines long; the first, second, and fifth lines each have eight syllables and rhyme with each other; and the third and fourth lines each have five syllables and rhyme with each other. (There are additional rules about the location and number of stressed vs. unstressed syllables, but we'll ignore those rules for this assignment; we will be counting syllables, but not paying attention to whether they are stressed or unstressed.)

Here is a stupendous work of limerick art:

    I wish I had thought of a rhyme
    Before I ran all out of time!
    I'll sit here instead,
    A cloud on my head
    That rains 'til I'm covered with slime.

We're sure that you've all kept yourselves awake wondering if there was a way to have a computer program check whether a poem is a limerick or if it follows some other poetry pattern. Here's your chance to resolve the question!

The CMU Pronouncing Dictionary

The Carnegie Mellon University Pronouncing Dictionary describes how to pronounce words. Head there now and look up a couple of words; try searching for words like "Daniel", "is", and "goofy", and see if you can interpret the results. Do contractions like "I'll" (short for "I will") and "we'll" (short for "we will") work? Try clicking the "Show Lexical Stress" checkbox too, and see how that changes the result.

Here is the output for "Daniel" (with "Show Lexical Stress" turned on): D AE1 N Y AH0 L. The separate pieces are called phonemes, and each phoneme describes a sound. The sounds are either vowel sounds or consonant sounds. We will refer to phonemes that describe vowel sounds as vowel phonemes, and similarly for consonants. The phonemes that are used were defined in a project called Arpabet that was created by the Advanced Research Projects Agency (ARPA) back in the 1970s.

In the CMU Pronouncing Dictionary, all vowel phonemes end in a 0, 1, or 2, with the digit indicating a level of stress. Consonant phonemes do not end in a digit. The number of syllables in a word is the same as the number of vowel sounds in the word, so you can determine the number of syllables in a word by counting the number of phonemes that end in a digit.

As an example, in the word "secondary" (S EH1 K AH0 N D EH2 R IY0), there are four vowel phonemes, and therefore four syllables. The vowel phonemes are EH1, AH0, EH2, and IY0.
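
As a minimal sketch of this counting rule (the name count_syllables is hypothetical and is not one of the required functions), the digit test might look like this:

```python
def count_syllables(phonemes):
    """Return the number of vowel phonemes in a list of phonemes.

    A vowel phoneme is one whose last character is 0, 1, or 2.
    """
    return sum(1 for phoneme in phonemes
               if phoneme.endswith(('0', '1', '2')))


# Pronunciation of "secondary", as quoted in this handout.
secondary = ['S', 'EH1', 'K', 'AH0', 'N', 'D', 'EH2', 'R', 'IY0']
print(count_syllables(secondary))  # 4
```

Because the stress digit is only tested and never compared, the three stress levels are treated identically, which is all this assignment needs.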

In case you're curious, 0 means unstressed, 1 means primary stress, and 2 means secondary stress. Try saying "secondary" out loud to hear for yourself which syllables have stress and which do not. In this assignment, your program will not need to distinguish between the levels of syllabic stress (although we cannot guarantee a completely stress-free experience while you work on this project).

Your program will read the file dictionary.txt, which is our version of the Pronouncing Dictionary. You must use this file, not any files from the CMU website. Our version differs from the CMU version: we have removed alternate pronunciations for words and words that do not start and end with alphanumeric characters (like #HASH-MARK, #POUND-SIGN and #SHARP-SIGN). Take a look at our dictionary.txt file to see the format; notice that any line beginning with ;;; is a comment and not part of the dictionary.

The words in dictionary.txt are all uppercase and do not contain surrounding punctuation. When your program looks up a word, use the uppercase form, with no leading or trailing punctuation. Function clean_up in the starter code file poetry_functions.py will be helpful here.
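
To illustrate the kind of normalization described above, here is a rough sketch. The helper normalize_word is hypothetical; it only approximates what the provided clean_up function is meant to do, and the starter code remains the authority on the exact behaviour.

```python
import string


def normalize_word(word):
    """Return word in uppercase with leading and trailing punctuation removed.

    Hypothetical stand-in for the starter code's clean_up function.
    """
    return word.strip(string.punctuation).upper()


print(normalize_word('slime.'))  # SLIME
print(normalize_word("I'll"))    # I'LL
```

Note that only surrounding punctuation is stripped, so the apostrophe inside a contraction such as "I'll" survives.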

Poetry Form Descriptions

For each type of poetry form (limerick, haiku, etc.), we will write its rules as a poetry form description. For example, at the beginning of this handout, we gave the rules for what it means to be a limerick. Here's our poetry form description for the limerick poetry form:

    8 A
    8 A
    5 B
    5 B
    8 A

On each line, the first piece of information is a number that indicates the number of syllables required on that line of the poem. The second piece of information on each line is a letter that indicates the rhyme scheme. Here, lines 1, 2, and 5 must rhyme with each other because they're all marked with the same letter (A), and lines 3 and 4 must rhyme with each other because they're both marked with the same letter (B). (Note that the choice to use the letters A and B was arbitrary. Other letters could have been used to describe this rhyme scheme.) We say that two lines rhyme with each other when the final vowel phonemes and all subsequent consonant phoneme(s) after the final vowel phonemes match (i.e., are the same and are in the same order).
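
The rhyme rule can be sketched as follows. The names rhyme_tail and words_rhyme are hypothetical, and the pronunciations are the ones quoted elsewhere in this handout; treat this as one possible reading of the rule, not the required implementation.

```python
def rhyme_tail(phonemes):
    """Return the last vowel phoneme and every phoneme after it.

    Return the empty list if there is no vowel phoneme.
    """
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i].endswith(('0', '1', '2')):
            return phonemes[i:]
    return []


def words_rhyme(first, second):
    """Return True if the two phoneme lists end with the same rhyme tail."""
    return rhyme_tail(first) == rhyme_tail(second)


treetops = ['T', 'R', 'IY1', 'T', 'AO2', 'P', 'S']
triceratops = ['T', 'R', 'AY2', 'S', 'EH1', 'R', 'AH0', 'T', 'AO2', 'P', 'S']
absurd = ['AH0', 'B', 'S', 'ER1', 'D']
adjourns = ['AH0', 'JH', 'ER1', 'N', 'Z']

print(words_rhyme(treetops, triceratops))  # True
print(words_rhyme(absurd, adjourns))       # False
```

The stress digit is part of the comparison here, so AO2 matches AO2 but would not match AO1.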

Some poetry forms don't require lines that rhyme. For example, a haiku has 5 syllables in the first line, 7 in the second line, and 5 in the third line, but there are no rhyme requirements. Here is an example:

    Dan's hands are quiet.
    Soft peace surrounds him gently:
    No thought moves the air.

And another one:

    Jen sits quietly,
    Thinking of assignment three.
    All ideas bad.

We'll indicate the lack of a rhyme requirement by using the symbol *. Here is our poetry form description for the haiku poetry form:

    5 *
    7 *
    5 *

Some poetry forms have rhyme requirements but don't have a specified number of syllables per line. Quintain (English) is one such example; these are 5-line poems with an ABABB rhyme scheme, but with no syllable requirements. Here is our poetry form description for the Quintain (English) poetry form (notice that the number 0 is used to indicate that there is no requirement on the number of syllables in the line):

    0 A
    0 B
    0 A
    0 B
    0 B

Here's an example of a Quintain (English) from Percy Bysshe Shelley's Ode To A Skylark:

    Teach us, Sprite or Bird,
    What sweet thoughts are thine:
    I have never heard
    Praise of love or wine
    That panted forth a flood of rapture so divine.
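
The convention that 0 means "no syllable requirement" can be sketched like this. The function lines_with_wrong_counts is hypothetical, and it assumes the syllable count of each line has already been computed; the placeholder lines and counts are made up for illustration.

```python
def lines_with_wrong_counts(lines, actual_counts, required_counts):
    """Return the lines whose syllable count differs from the requirement.

    A requirement of 0 means that any number of syllables is acceptable.
    """
    wrong = []
    for line, actual, required in zip(lines, actual_counts, required_counts):
        if required != 0 and actual != required:
            wrong.append(line)
    return wrong


lines = ['first line', 'second line', 'third line']
print(lines_with_wrong_counts(lines, [5, 6, 5], [5, 7, 5]))  # ['second line']
print(lines_with_wrong_counts(lines, [5, 6, 5], [0, 0, 0]))  # []
```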

Your program will read a poetry form description file containing poetry form names and descriptions. For each poetry form in the file, the first line gives the name of the poetry form, and subsequent lines contain the number of syllables and rhyme scheme as described in this section. Each poetry form is separated from the next by a blank line. We have provided poetry_forms.txt as an example poetry form description file. We will test your code with other poetry form descriptions as well. You should assume that the poetry form names given in a poetry form description file are all different.

Data Representation

Poetry Pattern

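A poetry pattern is our data structure for representing a poetry form description. It is a two-item tuple of:

  • a list of int giving the required number of syllables on each line (0 means no requirement), and
  • a list of str giving the rhyme scheme symbol for each line (* means no rhyme requirement).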

For example, here is the poetry pattern for a limerick:

([8, 8, 5, 5, 8], ['A', 'A', 'B', 'B', 'A'])
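
As a sketch of how a poetry form description file could be turned into poetry patterns, the following assumes the layout described in the previous section (a name line, then one "count symbol" line per poem line, with blank lines between forms). The function name parse_forms and the sample text are illustrative only; the real poetry_forms.txt may differ in its form names.

```python
import io


def parse_forms(form_file):
    """Return a dict mapping each form name to its poetry pattern."""
    forms = {}
    name = None
    for raw_line in form_file:
        line = raw_line.strip()
        if not line:
            name = None              # a blank line ends the current form
        elif name is None:
            name = line              # first line of a block is the form name
            forms[name] = ([], [])
        else:
            count, symbol = line.split()
            forms[name][0].append(int(count))
            forms[name][1].append(symbol)
    return forms


# Illustrative file contents, wrapped so it can be read like an open file.
sample = io.StringIO(
    'Limerick\n8 A\n8 A\n5 B\n5 B\n8 A\n\nHaiku\n5 *\n7 *\n5 *\n')
forms = parse_forms(sample)
print(forms['Limerick'])  # ([8, 8, 5, 5, 8], ['A', 'A', 'B', 'B', 'A'])
```

Using the whole stripped line as the name means that names containing spaces, such as Quintain (English), are kept intact.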

Pronunciation Dictionary

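A pronunciation dictionary is our data structure for representing the mapping of words to phonemes. It is a dict of {str: list of str}, where:

  • each key is a word, in uppercase, and
  • each value is the list of phonemes that describes how to pronounce that word.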

For example, here is a (very tiny) pronunciation dictionary:

{'DANIEL': ['D', 'AE1', 'N', 'Y', 'AH0', 'L'],
 'IS': ['IH1', 'Z'],
 'GOOFY': ['G', 'UW1', 'F', 'IY0']}
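
A dictionary like this could be built from the text of the file roughly as follows. This sketch assumes that each non-comment line holds a word followed by its phonemes, all separated by whitespace; check dictionary.txt itself for the exact layout. The name parse_pronunciations is hypothetical.

```python
import io


def parse_pronunciations(dict_file):
    """Return a pronunciation dictionary built from the lines of dict_file."""
    word_to_phonemes = {}
    for raw_line in dict_file:
        line = raw_line.strip()
        # Skip blank lines and comment lines, which begin with ;;;
        if not line or line.startswith(';;;'):
            continue
        word, *phonemes = line.split()
        word_to_phonemes[word] = phonemes
    return word_to_phonemes


# Illustrative file contents, wrapped so it can be read like an open file.
sample = io.StringIO(
    ';;; a comment line\n'
    'DANIEL  D AE1 N Y AH0 L\n'
    'IS  IH1 Z\n'
    'GOOFY  G UW1 F IY0\n')
tiny_dictionary = parse_pronunciations(sample)
print(tiny_dictionary['IS'])  # ['IH1', 'Z']
```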

Valid Input

For all poetry samples used in this assignment, you should assume that all words in the poems will appear as keys in the pronunciation dictionary. We will test with other pronunciation dictionaries, but we will always follow this rule.

Required Functions

In the starter code file poetry_functions.py, complete the following function definitions. In addition, you must add some helper functions to aid with the implementation of these required functions.

Function name: (Parameter types) -> Return type

Full Description (paraphrase to get a proper docstring description)

get_poem_lines: (str) -> list of str

The parameter represents a poem. Return a list of non-blank, non-empty lines from the poem with whitespace removed from the beginning and end of each line.

count_vowel_phonemes: (list of list of str) -> int

A vowel phoneme is a phoneme whose last character is 0, 1, or 2. As examples, the word BEFORE (B IH0 F AO1 R) contains two vowel phonemes and the word GAP (G AE1 P) has one.

The parameter represents a list of lists of phonemes. The function is to return the total number of vowel phonemes found in the list of lists of phonemes.

last_phonemes: (list of str) -> list of str

A vowel phoneme is a phoneme whose last character is 0, 1, or 2. As examples, the word BEFORE (B IH0 F AO1 R) contains two vowel phonemes and the word GAP (G AE1 P) has one.

The parameter represents a list of phonemes. The function is to return a list that contains the last vowel phoneme and any subsequent consonant phoneme(s) in the given list of phonemes. The ordering must be the same as in the given list. The empty list is to be returned if the list of phonemes does not contain a vowel phoneme.

check_syllable_counts: (list of str, poetry pattern, pronunciation dictionary) -> list of str

The first parameter represents a poem as a list of lines (as produced by get_poem_lines), the second represents a poetry pattern, and the third represents a pronunciation dictionary. Return the list of the lines from the poem that do not have the right number of syllables for the poetry pattern. The lines should appear in the list in the same order as they appear in the poem. If all lines have the right number of syllables, return the empty list. (The number of syllables in a line is the same as the number of vowel phonemes in the line.)

check_rhyme_scheme: (list of str, poetry pattern, pronunciation dictionary) -> list of list of str

A vowel phoneme is a phoneme whose last character is 0, 1, or 2. We say that two lines rhyme if and only if their final vowel phonemes and all subsequent consonant phoneme(s) after the final vowel phonemes match (i.e., are the same and are in the same order).

For example:
  • THE (DH AH0) and A (AH0) rhyme
  • TREETOPS (T R IY1 T AO2 P S) and TRICERATOPS (T R AY2 S EH1 R AH0 T AO2 P S) rhyme
  • ABSURD (AH0 B S ER1 D) and ADJOURNS (AH0 JH ER1 N Z) do not rhyme

The first parameter represents a poem as a list of lines (as produced by get_poem_lines), the second represents a poetry pattern, and the third represents a pronunciation dictionary. Return a list of lists of lines in the poem that should rhyme with each other (according to the poetry pattern) but don't. If all lines rhyme as they should, return the empty list.

Notes:
  • The lines should appear in the inner lists in the same order as they appear in the poem.
  • If n lines are supposed to rhyme with each other and at least one line does not, all n lines should appear in the inner list. For example:
    • if the rhyme scheme is ['A', 'A', 'B', 'B', 'A'],
    • and the lines are ['On the', 'plains, a', 'triceratops climbs treetops.', 'The day adjourns.', 'Absurd!'],
    • this function should return [['On the', 'plains, a', 'Absurd!'], ['triceratops climbs treetops.', 'The day adjourns.']] or [['triceratops climbs treetops.', 'The day adjourns.'], ['On the', 'plains, a', 'Absurd!']].
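
The grouping rule in the notes above can be sketched as follows. This is only one possible approach: the names non_rhyming_groups and rhyme_tail are hypothetical, the function takes just the rhyme scheme list rather than a whole poetry pattern, and the tiny dictionary holds only the last word of each example line, using the pronunciations quoted in this handout.

```python
import string

# Pronunciations quoted in this handout, for the last word of each line.
TINY_DICTIONARY = {
    'THE': ['DH', 'AH0'],
    'A': ['AH0'],
    'TREETOPS': ['T', 'R', 'IY1', 'T', 'AO2', 'P', 'S'],
    'ADJOURNS': ['AH0', 'JH', 'ER1', 'N', 'Z'],
    'ABSURD': ['AH0', 'B', 'S', 'ER1', 'D'],
}


def rhyme_tail(phonemes):
    """Return the last vowel phoneme and every phoneme after it."""
    for i in range(len(phonemes) - 1, -1, -1):
        if phonemes[i].endswith(('0', '1', '2')):
            return phonemes[i:]
    return []


def non_rhyming_groups(lines, scheme, word_to_phonemes):
    """Return the groups of lines that should rhyme but do not."""
    # Collect the lines that share a rhyme symbol, in poem order.
    groups = {}
    for line, symbol in zip(lines, scheme):
        if symbol != '*':            # * means no rhyme requirement
            groups.setdefault(symbol, []).append(line)

    problems = []
    for group in groups.values():
        tails = []
        for line in group:
            last_word = line.split()[-1].strip(string.punctuation).upper()
            tails.append(rhyme_tail(word_to_phonemes[last_word]))
        # One mismatch is enough to report the whole group.
        if any(tail != tails[0] for tail in tails):
            problems.append(group)
    return problems


poem = ['On the', 'plains, a', 'triceratops climbs treetops.',
        'The day adjourns.', 'Absurd!']
result = non_rhyming_groups(poem, ['A', 'A', 'B', 'B', 'A'], TINY_DICTIONARY)
print(result)
```

Because Python dicts preserve insertion order (Python 3.7 and later), the A group is reported before the B group here, which corresponds to the first of the two acceptable answers in the example.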


In the starter code file poetry_reader.py, complete the following function definitions.

Function name: (Parameter types) -> Return type

Full Description (paraphrase to get a proper docstring description)

read_pronunciation: (file open for reading) -> pronunciation dictionary

The parameter represents an open file in the format of the CMU Pronouncing Dictionary. Return the pronunciation dictionary based on the given file.

read_poetry_form_descriptions: (file open for reading) -> dict of {str: poetry pattern}

The parameter represents a poetry form description file that has been opened for reading. Return a dictionary where each key is a poetry form name and each value is the poetry pattern for that form based on the given file.

The main program

Once you have correctly implemented the functions in poetry_functions.py and poetry_reader.py, execution of the main program (poetry_program.py) will:

  1. Read our version of the CMU Pronouncing Dictionary (dictionary.txt)
  2. Read poetry_forms.txt
  3. Repeatedly ask the user for a poetry form to check and the name of a file containing a poem. The program will report on whether or not the poem satisfies the poetry form description for the chosen poetry form.

Required Testing (unittest)

Write (and submit) a set of unittests for functions count_vowel_phonemes and check_syllable_counts. Name these two files test_count_vowel_phonemes.py and test_check_syllable_counts.py. For each test method, include a brief docstring description specifying what is being tested. For unittest methods, the docstring description should not include a type contract or example calls.
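
To show the expected shape of such a test file, here is a small sketch. So that it runs on its own, it defines a stand-in count_vowel_phonemes; in your actual test file you would import your own function from poetry_functions instead. The test cases themselves are only examples and are far from a complete set.

```python
import unittest


def count_vowel_phonemes(phonemes):
    """Stand-in for the function under test, included only so that this
    sketch is self-contained."""
    return sum(1 for word in phonemes for phoneme in word
               if phoneme.endswith(('0', '1', '2')))


class TestCountVowelPhonemes(unittest.TestCase):

    def test_empty_list(self):
        """An empty list of words contains no vowel phonemes."""
        self.assertEqual(count_vowel_phonemes([]), 0)

    def test_one_word_one_vowel(self):
        """A single word with one vowel phoneme."""
        self.assertEqual(count_vowel_phonemes([['G', 'AE1', 'P']]), 1)

    def test_two_words_several_vowels(self):
        """Vowel phonemes are totalled across all of the words."""
        words = [['B', 'IH0', 'F', 'AO1', 'R'], ['G', 'AE1', 'P']]
        self.assertEqual(count_vowel_phonemes(words), 3)
```

A file like this can be run with python -m unittest followed by the module name, for example test_count_vowel_phonemes.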

Files to Download

All of the files that you need to download for the assignment are listed in this section. These files must all be placed in the same directory (folder).

a3_type_checker.py and doctest

Additional requirements

How to tackle this assignment

Principles:

Steps:

Here is a good order in which to solve the pieces of this assignment.

  1. Read this handout thoroughly and carefully, making sure you understand everything in it.
  2. Read the poetry_functions.py starter code to get an overview of what you will be writing.
  3. Implement and test the required functions in poetry_functions.py, along with helper functions. Now is also a good time to write the unittest test files test_count_vowel_phonemes.py and test_check_syllable_counts.py.
  4. Next, read the starter code poetry_reader.py, and implement and test those functions.
  5. Read the code provided in poetry_program.py and run it. If there are any problems with the results, try to identify which of your functions has an issue, and go back to testing that function.

Marking

These are the aspects of your work that we will focus on in the marking:

Submitting your assignment

You must hand in your work electronically, using the MarkUs online system. Instructions for doing so are posted on the Assignments page of the course website.

The very last thing you do before submitting should be to run a3_type_checker.py one last time and ensure that the type checks pass. This will prevent your code from receiving a correctness grade of zero due to a small error that was made during your final changes before submission.

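For this assignment, hand in four files:

  • poetry_functions.py
  • poetry_reader.py
  • test_count_vowel_phonemes.py
  • test_check_syllable_counts.py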

Once you have submitted, be sure to check that you have submitted the correct version; new or missing files will not be accepted after the due date. Remember that the correct spelling of filenames, including case, is necessary. If your files are not named exactly as above, your code will receive zero for correctness.