Courselets: Learning by a Rigorous Approach to Elementary Computer Programming

Michael R. Levy
Department of Computer Science
University of Victoria
Box 3055, Victoria, BC
V8W 3P6

Janice Miers
Department of Computer Science
University of Victoria
Box 3055, Victoria, BC
V8W 3P6

---

ABSTRACT

Superficially one might believe that approaches to teaching of Computer Programming must lie on an axis of formality. The dominant approach is informal: present the basic facts, show some examples, give exercises. The formalists use "methods", such as Dijkstra's weakest pre-conditions, teaching formal programming derivation instead of programming. Although the formalists can state good cause for their approach, they are fatally undermined by the fact that, for the most part, practicing programmers do not use formal methods. However, we claim that the apparent dichotomy (formal/informal) is a false one: the issue is really one of teaching rigor: good programming is a rigorous activity, and this can (and should) be taught. Teaching rigor is, by its nature, an interactive process. Where can the time be found to add this interactively into a computer-programming course? We propose an answer here: by judicious use of World-Wide-Web tools, and by a conscious restriction of the goals of the offered courses, we can offer students "courselets": interactive tutorials they can use to enhance their programming skills. The topic is broad: in this paper, we briefly argue the difference between rigor and formalism, survey and critique the Web tools available for developing a courselet, and we describe a sample courselet.

Keywords: Introductory Programming, Interactive Course, Web, Invariants, While loops

---

1. Introduction

The thesis of this paper is applicable to many university courses: class time is an inadequate and inappropriate place for a student to learn how apply rigorous thought to the subject material of the course. Inadequate because lectures work best when they are used to present and survey material and to motivate each student to spend additional time on the presented material. Inadequate because this kind of learning is intrinsically self-paced, and the time required is likely to vary significantly between students.

A professor teaching a typical 39-hour lecture course can not devote the time required helping each student learn to be rigorous.

Not every section of every course requires the kind of learning that we have in mind. In a typical CS1 or CS2 curriculum, for example, there may be two or three topics that cry for this approach. We have identified, in particular, just two: the development of while loops and of recursive functions. We believe that, by restricting attention to a tiny but significant part of the course, we can offer students a feasible option for self-paced, interactive tiny courses - "courselets", using Web technology for course delivery.

The reader may be wondering about the use of the word "rigor" in the above discussion. We believe that, in Computer Science at least, attempts to teach other than the most common informal approach have been thwarted by the use of excessive formalism. There is an alluring appeal of the idea that computer programming is an activity that is identical to mathematical theorem proving. That is, a program specification is a theorem, and a program is something derived from a proof that the theorem holds. In fact, this view is widely studied in theoretical computer science, but has seen no real use in software development. Formalisms such as Floyd-Hoare style assertions, Dijkstra's weakest pre-conditions and others have been used in CS1 courses - see for example [1] and [2]. However, most such endeavors do not succeed. A telling reason is given in [2] - essentially student disillusionment that the techniques they have been taught seem to be abandoned in all subsequent CS courses. Actually, the disillusionment should go further: formalism systems are not used to develop real software. To reject formalism is not to reject rigor. This is the main point that we will elaborate in the next section of the paper.

The paper is organized in the following way: In the next section we will suggest how some formal techniques can be abstracted into informal but rigorous techniques that are practical. We then discuss, in a general way, how these techniques can be presented using standard Web technology. We describe a course on the using the formal notion of invariants to develop while loops. We discuss our plans to monitor and evaluate the course, and we present our conclusions.

2. Teaching programming fundamentals

From the earliest days of the development of the subject of programming, most effort in developing methods for programming has focused on formalism rather than on rigor. This is a distinction well recognized by mathematicians, most of whom have no difficulty distinguishing, for example, between a working mathematician's proof and an exercise in formal logic. In Computer Science, the inherent difficulties of scaling formalism to realistic examples has led to the abandonment of these methods in practical software development. Nevertheless, attempts to use formalist approaches in CS1 are reported from time to time. The papers by Denman [1] and McLoughlin and Hely[2] are just two examples. McLoughlin and Hely report that the grades of their students were higher, but they report interesting reactions from the students: the students reflect frustration and disappointment that the techniques are not used in subsequent CS courses. A formalist would argue that the fault lies with the subsequent courses, an argument the authors of this paper make, pointing out that their own colleagues are resistant to the approach, but perhaps the fault lies elsewhere. We contend that the problem is the use of formalism itself. In principle, all software could be derived systematically using formalism. But this is like saying that, in principle all mathematical theorems can be checked mechanically - a statement that is completely untrue in the world of working mathematics.

Formal systems are not practical for several reasons. Firstly, they do not scale well. That is, large (realistic) programs have formal counterparts that are impossibly large. A formal derivation that is too large is not comprehensible. Hence confidence in the end product (the program) is not likely to be high, even though correctness is the goal of the formalism. A second problem with formalism is that they are indiscriminate: trivial matters, easily understood by humans may require intricate symbolic formulations. A simple example is an event loop: informally speaking, a loop that tests for an external event (such as a mouse click) and responds appropriately. Formalizing such a program is non-trivial and yet programs that handle event loops are the bread-and-butter of the desktop application market. Thirdly, formalisms rely on symbols so that their rules can be concisely stated. These symbols are usually not intuitive, and they must be taught to students before any progress can be made in applying the formal rule. This implies a level of detail that is not only unnatural but is unnecessary when considering the real issue, which is rigor. Finally, the use of symbolism requires an artificial mapping to the necessarily general domain of the symbols, diverting the programmer from the discovery of more appropriate and helpful mappings.

A fascinating dialogue between formalists and others can be found in a panel that was asked to respond to Dijkstras On the Cruelty of Really Teaching Computer Science, presented at the 1988 ACM Computer Science Conference. The talk and the response of several notable panelists were subsequently published in the Communications of the ACM[3]. In essence, Dijkstra advocates a formalist approach to teaching first year students, and we cannot do justice to his discourse here. Remarks by the two responders do bear on the thesis of this paper. Firstly, we completely agree with one of Terry Winograds comments:

"The primary subtext of his diatribe is a complaint about the lack of rigor in computer science education. I fully support his pleas that as educators we demand rigorous thinking. But, he confuses this with the claim that the essential part of computing education lies in the ability to manipulate formal abstractions, detached from considerations of operational devices, their behaviors, or their embedding in a world of people and activities. If he deludes the students into thinking this, they are in for a rude awakening" (page 1413).

Karp also argues with Dijkstras interpretation of mathematics as a formal process. But while he does argue convincingly against formalism, we would suggest that his conclusion is too extreme, although it is concordant with the dominant teaching paradigm in CS1:

"Instead [of formalism] I would advocate increased research on debugging methods, on the use of modern programming environments including work stations, revision control aids, editors, and high-level programming languages" (page 1411)

(Perhaps this helps explain why the most vigorous debate amongst CS1 curriculum developers revolves around the choice of programming language.) To be fair to Karp, he does go on to suggest that some training in set theory and predicate calculus is useful since these notations are useful in informal proof as well as formal proof. To us, the word informal does not imply invalid, erroneous or sloppy, but just what it says - not formal.

Dijkstra is, of course, is not the only person to offer a formalist solution to the problem of teaching programming. Gries argues for the teaching of calculation skill in the CS1 class[4]. His paper is a classic example of a common confusion between rigor and formalism. He in turns cites a US government report that calls for a more rigorous use of mathematical techniquesPromising directions include the application of formal methods

Gries presents this example to illustrate his point:

Mary has an even number of apples. Twice the number of apples that Mary has plus the number of apples that John has is some (unknown) constant C. Suppose Mary throws half her apples away. What should be done with Johns apples so that twice the number of apples that Mary has plus the number of apples that John has is still C?

He proceeds to demonstrate how this simple invariance problem can be solved formally, by finding an expression E that makes the following Hoare triple valid:

        {even(M) ^ 2*M+J = C}
        M,J := M div 2, E
        {2*M+J = C}
        

Even a student who is conversant with algebra must understand that the variable M in the third line is not the same variable M that appears in the first line. Of course, then it is fairly easy to solve the two simultaneous equations 2M + J = C and M + J + E = C (assuming the student realizes that this is what is required), but I would suggest that the diagram in Figure 1 is a much simpler way to go:

Figure 1: A Picture is worth at least 13 words

The problem is not with the use of invariants: it is the need, imposed by the formal system, to use a collection of symbols that, in this particular case obscures rather than clarifies the problem. If we separate the problems of formalism, and its obsession with symbols, and try to look at fundamental techniques applied rigorously, we can perhaps find realistic methods to develop high-quality computer programs. We believe that the best programmers use these skills. A superb example of the kind of rigorous, informal programming we have in mind can be found in the two-volume series Programming Pearls[5] and More Programming Pearls[6] by John Bently. We also believe that enough is known about these skills to teach them to first-year Computer Science students.

In computer programming, the real value of formalism is that they can help programmers to identify fundamental programming derivation techniques. The counter-part to induction, for example, is recursion. Loop invariants have also found their way into the informal tool-set of skilled programmers.

Formally, an assertion is a proposition that may use program variables. An invariant is an assertion that is always true. The way it is used formally depends upon the chosen formalism.

Informally, an invariant is a statement about the state of computation that captures the way in which the while loop operates. Deriving an invariant is not always easy. However, the effort usually pays off in much simpler code.

We will take a beautiful example from Bently, which he attributes to Nico Lomuto of Alsys, Inc. This problem is part of Quicksort[7]. Given an array X, a range A..B of indices, and a value T, we are to rearrange X[A..B] and compute the index M (for middle) such that all elements less than T are to one side of M, while all other elements are on the other side. Bently presents the invariant using the diagram shown in Figure 2.

Figure 2: An Invariant for part of Quicksort

He presents both the derivation and proof of correctness of the resultant Quicksort algorithm, using exactly the informal, rigorous approach that our students should be learning. (If you have not seen this example before, derive the loop by considering the relationship of X[I] to T, and compare your answer to the complex solutions presented in many otherwise fine CS1 or CS2 textbooks.)

This particular problem can be done formally, of course, but notational difficulties immediately arise. How for example, should arrays be described formally? For an elegant solution to this particular part of the problem, see Reynolds[8]. But surely the diagrammatic style used by Bently is adequate? Of course, pictures have their limitations, but any competent programmer will understand their limitations and use English if necessary, check all the boundary cases in detail and exercise the caution that rigorous thinking implies.

3. Rigor and the curriculum

Having accepted the premise that it is desirable to try to help students learn a rigorous approach to programming, how is this best accomplished? How are the significant parts of a formalism extracted? How are they best adapted to the rigorous non-formal approach that we propose? How are they best presented to students? Is there time within a crowded curriculum to present this additional material?

Using invariants to teach while loop derivation is now quite common. See, for example, articles by Astrachan[9], Troger[10] and Tam[11]. Arnow[12] describes their use in a literacy course. It is also possible to teach a systematic method for deriving recursive functions, although we know of only a report on formalist attempt[1]. Formally, recursion is the counterpart to proof by induction, and it is possible to derive various kinds of recursive programs using (informal) inductive reasoning. One other example of the application of formalism is the use of code inspections (see for example Russell[13]). Van Emden[14] describes how Floyds (formal) method for the verification of flowcharts can be adapted to the informal derivation of programs using what he calls an inspection protocol: basically a technique that can enhance the ease of the code inspection process. No doubt, future formal work on Object-Oriented programming will suggest techniques that could be applied to the problem of the derivation of classes and hierarchies.

Our experience suggests that it is difficult to teach rigor in the CS1 and CS2 curricula. Traditionally, there are three options for those who would try (we use invariants as the example):

i) Spend a substantial amount of lecture time and laboratory time teaching invariants;
ii) Give one or two lectures on invariants and hope that the students get it;
iii) Give a few lectures on invariants, and use them for all subsequent examples presented to students.

Option (i) is precluded by time constraints. Unfortunately, both other options are not that satisfactory. (ii) For obvious reasons, and (iii) because few examples in the course use non-trivial loops. Furthermore, we believe that traditional teaching methods are intrinsically inappropriate for teaching rigorous thinking. The difference between an explanation, however lucid, and a derivation is too significant. What we really want to do is force a student to go through all the steps required from problem formulation to program derivation, and to develop the skill of discriminating between a sound reasoning step and a fallacious one.

Doing this for an entire course is a lofty goal, but we do not think it is achievable or necessary. We do think that we can help students learn to program by offering very small interactive tutorial courses that serve to supplement only selected material from the class. The idea is not to supplement every lecture, but just those two or three areas that (a) are amenable to this approach and (b) are core subject areas for the given course. We call these tutorials courselets.

A courselet should be supplementary to lectures, easily accessible, learner-led, and ideally would provide interactivity and immediate feedback. Each courselet should focus on a single concept and develop the skills within context. We have developed two courselets, using the World Wide Web as a delivery medium. The courselets rely on extensive cgi-programming to provide interactivity and to track student progress. They assume only a basic knowledge of propositional logic, and use standard C syntax for all examples.

The courselets use a measured approach to each skill. This is done by presenting a problem, and guiding the students one step at a time, to a correct solution. We use a mixture of English, diagrams and some algebra to provide precise and complete assertions about the state of a program. The students fill-in expressions, statements and assertions, typically one at a time. For example, we teach "how to write a while loop" by forcing students to develop the code in the proper order: describe the invariant; write the body of the while loop; determine the stopping condition, and finally, add the preamble and "post"amble code.

What we hope will distinguish a courselet from a conventional exercise will be our ability to monitor the errors that students make. Our plan is to have emailed sent to the course developers whenever a previously unseen response is received from a student. Over time, we hope to build a database of common incorrect responses and an understanding of the reason these incorrect responses are made.

For the approach to work, courselet developers should be able to concentrate on the material at hand rather than the delivery technology. HTML is now widely known and supported and has proven to be an ideal language for development of courselets. On the server end, we found that Perl was too general and complex for our purposes. We have developed a small scripting system called Guide[15], which is used as the cgi language. A discussion of Guide is beyond the scope of this paper, but we will make the guide interpreter available for free to any University or College that would like to use it.

4. Sample Course

4.1 A While Courselet

The While Courselet guides the student through a systematic approach to writing a while loop using the formal notion of invariants.

First, the student is given a simple example (summing the elements of an array) and the loop invariant, and led through the process of completing a while loop. At each submission, the student's answer is compared to a set of possible answers, and if it is acceptable, this portion of the while loop code is filled in, and the student goes on to fill the next blank. If the answer given is in error, the student is given a hint, and asked to try again. After two tries, the correct answer is given and the student moves to the next screen.

The next portion of the courselet details the steps required to write a while loop, with particular emphasis to the task of deriving a suitable invariant. In another example the student fills in the code needed to partition an array, again with a derived invariant. As in the first example, immediate feedback and hints are provided as the student proceeds.

The final portion asks the student to create the loop invariant, and fill in the code to insert an element into an integer array.

A student can exit from a courselet at any time. When they next login, they can choose to resume the course where they left off, or to start again. Personal data and answers from previous pages are saved between sessions, and HTML pages are customized to reflect the student's data.

Figure 3 shows an early part of the while courselet. Notice that the only areas the student can fill in are the right-hand sides of the two statements in the body of the while loop. The idea is to force the students to complete the body of the while loop before doing the termination condition or the set-up statements.

Figure 3: A Sample of the While Courselet

4.2 PLANS FOR EVALUATION

Volunteers from a second-term programming class will be solicited to take part in an experiment to monitor and evaluate the effectiveness of this approach. To eliminate a possible bias (volunteers may be more motivated, perhaps affecting their grade), the volunteers will form the entire experimental population.

A random sample (the number in the sample will depend on the number of volunteers) of students will be selected from the volunteer population to work through the example courselet. The remainder of the population will be a control group. A standard two-sided t-test will test the null hypothesis: Working through the courselet had no effect on the student's grade in the course.

5. Conclusions

Many Computer Science instructors have a natural and appropriate aversion to formal systems. Regrettably, the dominant teaching paradigm for computer program is not just antithetical to formalists, it does not serve the average student well. We believe that formal approaches can be used for the purpose of extracting central ideas of program development. By recasting these ideas informally, but rigorously, that is, by using them precisely using natural aids such as English and diagrams, it is possible to develop a more systematic approach to teaching programming. Doing this in a fifty-minute lecture is difficult. Perhaps by exploiting Web technology, and by restricting attention to the smallest set of core concepts, we can improve the way in which students learn to program.

References

1.     Denman, Richard T. Derivation of Recursive Algorithms for CS2. Proc. 27th SIGCSE Tech. Symp. On Computer Science Education, March 1996.

2.     McLoughlin, Henry and Hely, Kevin. Teaching Formal Programming to First Year Computer Science Strudents. Proc. 27th SIGCSE Tech. Symp. On Computer Science Education, March 1996.

3.     Dijkstra, Edsger W., Parnas, David, Scherlis, William, van Emden, M. H., Hamming, Richard, Karp, Richard M and Winograd, Terry. A Debate on Teaching Computer Science, Comm. of the ACM, vol. 32, no. 12, December 1989, pp. 1397-1414.

4.     Gries, David. Teaching Calculation and Discrimination: a More Effective Curriculum, Comm. of the ACM, vol. 34, No. 3, March 1991, pp. 45-54.

5.     Bently, Jon. Programming Pearls, Addison-Wesley, 1986.

6.     Bently, Jon. More Programming Pearls: Confessions of a Coder, Addison-Wesley, 1988.

7.     Hoare, C.A.R, Quicksort, Computer journal 5, no. 1, April 1962, pp 10-15.

8.     Reynolds, John. Reasoning about Arrays, Technical Report, DAI-38, University of Edinburgh, Department of Computer Science, July 1977.

9.     Astrachan, Owen. Loop invariants as Pictures. Proc. 22nd SIGCSE Tech. Symp. On Computer Science Education, March 1991.

10. Troeger, D. Experiences Teaching Loop Invariants to Beginners, in Logic, Loops and Literacy Workshop, May 1993.

11. Tam, W. Teaching Loop Invariants by Example, Proc. 23rd SIGCSE Tech. Symp. On Computer Science Education, March 1992.

12. Arnow, David. Tecahing Programming to Liberal Arts Students: Using Loop Invariants, Proc. 24th SIGCSE Tech. Symp. On Computer Science Education, March 1994.

13. Russell, G. W. Experience with inspection in ultralarge-scale developments. IEEE Software vol. 8 no. 1, pp25-31.

14. van Emden, Maarten H. Structured Inspections of Code. Software Testing, Verification and Reliability, Vol. 2, (1992) pp. 133-153.

15. Levy, Michael R. Guide: A Tiny Programming Language for Guiding HTML Forms. In preperation.

---

Name: Michael Levy and Janice Miers
Department: Department of Computer Science
Institution: University of Victoria
Postal address: Box 3055, Victoria, BC V8W 3P6
E-mail address: mlevy@uvic.ca miers@uvic.ca