Automatic Testing and Submission
of Student Programs via E-mail

W. Gardner

Dept. of Computing and Information Science

University of Guelph

ABSTRACT

A new system is described that accepts C and C++ source files e-mailed by students to a host computer. It carries out automatic testing according to instructor-designed acceptance tests, and e-mails the test results back to the students in real time. They use the same procedure to submit their programming assignments via e-mail, after which the system is used to automate functional testing for marking purposes. The system is a script-driven front end for an existing software component testing tool called cew.

Keywords: automated software testing, functional testing, black box testing, teaching C++

1.    Introduction

For a number of years, the Computing Science Department of Malaspina University-College has been using a homegrown automatic testing tool called Component Exerciser Workbench or cew (pronounced “cue”) [1][2] for its C++ programming courses. The tool allows students with accounts on the department’s Unix system to test their assignments prior to submission, and allows markers to carry out a good deal of automated testing on the submissions. As explained in [1], experience has shown that the use of cew helps students break out of their prior pattern of turning in overly complex programs that typically accomplish 80% of the specified functionality.

In a paper on teaching OO programming at Trinity Western University [3], a new e-mail front end for this test suite was mentioned by the author in passing. The purpose of this paper is to explain the function of that front end so that other interested parties can use it, and to discuss its benefits and possible drawbacks.

1.1 Testing with cew

To utilize cew for a programming assignment, the instructor prepares a test script, which is actually a series of macro invocations for m4 (a standard Unix macro processor). Each line of the test script sets up a test case by instantiating objects, invoking methods on those objects, and evaluating an expression for comparison with a specified correct value. The test case may alternatively specify that a particular C++ exception must be thrown. Under control of a makefile, the m4 macro processor converts the test script into a C++ main program, which is then compiled and linked with the student-supplied source code. Upon execution of the main program (i.e., the compiled test script), each test is carried out in a separate process under strict Unix resource limitation (memory, CPU time, file creation) to ensure that the student code can neither damage the account nor get out of control. Cew keeps track of the number of passed and failed tests, and displays a detailed trace of the latter so that the expected and actual results are easy to compare.
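To give a feel for the script format, here is a hypothetical sketch of two test cases for an imaginary Stack assignment, using the macro names cew_Ncase (verify a value) and cew_Ecase (expect an exception) introduced in Section 2.2.2. The argument order, the m4 quoting, and the Stack and StackEmpty names are illustrative assumptions, not cew’s documented syntax:

    # Illustrative sketch only -- consult the cew documentation for the real
    # macro arguments.
    cew_Ncase(`pop returns the most recently pushed value',
              `Stack s; s.push(3); s.push(5)',
              `s.pop()',
              `5')
    cew_Ecase(`popping an empty stack throws StackEmpty',
              `Stack s',
              `s.pop()',
              `StackEmpty')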

Students are given accounts on the cew host, and are encouraged to run the prepared test script against their own source code as often as they like so as to perfect it before submitting. This has a number of distinct benefits, among them the following:

1.     Having to pass a rigorous external test emphasizes that programming is about accurately meeting specifications. It is not, as many students initially imagine, about creating a program to do something the programmer considers good, clever, or useful. They learn that putting in hours of effort is not a substitute for correct functionality.

2.     An external test serves as an oracle to answer students’ unanticipated questions about the problem specification. Since no programming assignment written in English prose is likely to have an airtight specification, especially when assignments are continually revised to deter plagiarism from prior semesters, having an oracle on-line makes up for unavoidable ambiguities in the spec.

3.     Using an external test prior to submission reduces the shock many students experience when they receive their assignment mark: they thought they had the program working, but the mark is so low! An external test both strips away illusions about a program’s completeness and correctness, and helps focus students’ energies on incomplete or buggy features which, when pointed out early, can be conquered in time to earn a higher mark.

Additional benefits of using cew are discussed in [2].

After submission, the instructor uses cew again to carry out testing for the purpose of marking. It is a common practice for the instructor to insert additional, more stringent tests in the script so that students will not rely exclusively on the supplied basic script, to the neglect of developing their own specification-based test cases as well. Used in this way, cew saves a tremendous amount of time in marking programming assignments, especially for large classes, and makes it feasible for the department to maintain a “must pass basic acceptance test” policy. Passing a basic acceptance test ensures a level of correct functionality that greatly facilitates more detailed testing and finer-grained evaluation. In contrast, assessing broken programs so as to come up with meaningful marks is a frustrating and overly time-consuming duty.

1.2 Motivation for an e-mail front end

Cew requires that students have their own accounts on which to run the cew software. But at Trinity Western University, we did not customarily give students in early programming courses accounts on our Linux server. Instead, our C++ course, based on textbooks [4] and [5], nominally utilized Visual C++, and we had VC++ installed throughout our labs under a site license. Students could install VC++ (student edition) on their Windows home computers, or install g++ under Unix variants, the theory being that, as long as they adhered strictly to ANSI C++, their choice of compiler should be immaterial. (Amazingly, this worked!) So, without Linux accounts or the knowledge to use them, and with most students programming on their home computers, it seemed that student testing using cew would not be an option.

In order to tap the benefits of cew in our circumstances, an e-mail front end was constructed by the author. The concept was that whenever students wished to test their code, they would send e-mail to a special testing account, specifying which assignment’s test script was to be executed on their attached source files. Cew would be automatically invoked, and a reply mailed back containing the test results. This allowed students in any location having even a low-speed Internet connection to conveniently carry out external testing, and get back their results with less than one-minute turnaround.

With the basic e-mail processing facilities described above in place, it was practical to add the capability of accepting student submissions by e-mail. People working at home could continue doing so up to the deadline, typically late at night, which was a boon to commuters living far from the campus labs and to those feeling insecure away from home at those hours.

The following section explains how the e-mail front end works, which will be useful to anyone wishing to adopt this approach or address similar needs in their own programming courses. The necessary files may be obtained via download (see Section 4.2).

2.    Technical description

In the description below, “Unix” refers to most Unix variants, including Linux.

2.1 E-mail front end

Figure 1 below shows the processing of a student’s e-mail addressed to the course account. Every e-mail receives a reply.

The sendmail Unix system program, which handles all incoming e-mail, looks in the destination account’s home directory to see if a .forward file exists. Since it does, the command found there is executed with the e-mail file piped into it.

For security reasons, users may be restricted from putting arbitrary commands in their .forward files. This restriction is enforced by having sendmail use smrsh (restricted shell for sendmail) in lieu of the ordinary shell sh. The system administrator must insert a link to slocal into smrsh’s directory (which is /etc/smrsh in Red Hat Linux) in order for the command in .forward to succeed.


The slocal system program provides an automatic mail filtering facility, taking its instructions from the .maildelivery file in the account’s home directory. This file can be used to redirect e-mail based on its subject field. At this point, e-mail with the subject of “test” is routed to the mailtest script, and that having “submit” to the mailsubmit script. If neither subject is recognized, a warning message is immediately returned to the sender.


The mailtest script is written in csh; its processing is shown in Figure 2. The mhn system program is invoked to unpack the MIME-encoded attachments, which ought to be the student’s source files. These are unpacked into a fresh, uniquely named subdirectory of /tmp so that the system can process multiple incoming e-mails concurrently without mutual interference.

The first word of the e-mail’s body is taken as the name of a test directory prepared by the instructor; if it does not exist, an error message is returned to the sender. The selected test directory will contain a cew makefile and basic acceptance tests (BATs) in a bats.script file prepared for the assignment. The makefile is now invoked. Cew requires running bats.script through the Unix m4 macro processor to generate C++ source code for each test case in the script. The resulting file is compiled together with the student’s source files using g++, and linked to create a test program. This test program is executed under control of the mailtest script in its own resource-limited process. Cew runs each test, keeping count of the number of tests passed or failed, and prints out a test log with details of failed test cases. C++ exceptions are caught (whether expected in a given test case or not), and out-of-memory conditions, segmentation faults (usually due to pointer abuse), and infinite looping are diagnosed for each individual test case. The mailtest script deletes the fresh /tmp subdirectory before exiting. Its stdout stream is piped by slocal (Figure 1) to the mail system program, which replies to the sender with the test results.
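As a rough picture of what the generated main program does, the following C++ fragment sketches a single expanded test case along the lines described here and in Section 2.2.2: the expression under test and the expected value are both converted to strings (via an operator+ helper) and compared, unexpected exceptions are caught, and pass/fail counts are kept. This is a simplified sketch, not cew’s actual generated code; the Stack class is a stand-in for the student-supplied source files.

    // Simplified sketch of one expanded test case (not cew's actual output).
    #include <string>
    #include <sstream>
    #include <iostream>
    using namespace std;

    // Minimal stand-in for the student-supplied class, so the sketch is self-contained.
    class Stack {
        int data[100];
        int top;
    public:
        Stack() : top(0) {}
        void push(int v) { data[top++] = v; }
        int pop() { return data[--top]; }
    };

    // Helper in the spirit of cew's operator+ for int arguments (see Section 2.2.2).
    const string operator+(const string& s, int i) {
        ostringstream os;
        os << i;
        return s + os.str();
    }

    static int passed = 0, failed = 0;

    static void test_case_1() {
        string prefix;                                // empty string used for conversion
        try {
            Stack s; s.push(3); s.push(5);            // set-up stage
            string actual   = prefix + s.pop();       // expression under test
            string expected = prefix + 5;             // specified correct value
            if (actual == expected) ++passed;
            else {
                ++failed;
                cout << "FAIL: expected " << expected << " but got " << actual << endl;
            }
        } catch (...) {                               // unexpected exceptions are caught
            ++failed;
            cout << "FAIL: unexpected exception" << endl;
        }
    }

    int main() {
        test_case_1();                                // one such function per test case
        cout << passed << " passed, " << failed << " failed" << endl;
        return 0;
    }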


The mailsubmit script, shown in Figure 3, is invoked when “submit” is entered in the e-mail’s subject. In this case, the script extracts the first word of the e-mail’s body, which should be a student ID number (which the student is expected to keep secret). The instructor will have prepared a submission directory with an ar-style archive for each student registered in the course. If no archive matches the given number, an error message is returned. Otherwise, the attached source files are unpacked, using mhn as before, and inserted into the student’s archive to await marking.

Mailsubmit allows a student to e-mail submissions as often as desired, with newer files replacing older files of the same names. The script also creates a file in the student’s archive containing the e-mail address of the sender, in case a student later claims that someone (who must have known their ID number) maliciously overwrote their files with a bogus submission. This feature was advertised to the class, and possibly acted as a deterrent; in any event, no such cases occurred.

Finally, to help students verify which files they have submitted, an archive listing is performed showing filenames, dates, times, and lengths. The listing is e-mailed back as part of the submit log. Students are warned to check this reply carefully to be sure that nothing went wrong with their submission.

It is easy for the instructor to arrange to turn off submissions at the announced deadline. Just use the Unix at command to execute chmod on the submissions directory. When mailsubmit finds that access to this directory has been disabled, it replies that submissions are closed.

2.2 Setting up an assignment

The sections below describe the kind of testing that can be performed with cew, which should be kept in mind when formulating a programming assignment. Setting up a test script is also explained.

2.2.1 Functional testing using cew

The kind of functional testing that cew provides is ideally suited for three situations, the first occurring in procedural programming and the last two in OO programming:

1.     Checking the output of a function called with certain arguments.

2.     Checking the output of a member function (perhaps with arguments) of an object that has been created.

3.     Checking that a member function called with certain arguments throws a particular exception.

The notion of “function output” of course includes a function’s return value, but can also encompass checking the value of variables used as output-type arguments.

Both stateless and “stateful” testing can be performed: A single test case can include a set-up stage—where local variables are defined, objects are created, and member functions are invoked—prior to obtaining the result that will be verified.
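As a small illustration of both points, a test case for a hypothetical procedural function void divmod(int a, int b, int& q, int& r) could define the output variables in a set-up stage, call the function, and then verify one of the output arguments (the macro’s argument order is again an illustrative assumption):

    # Illustrative only: verify an output-type argument after a set-up stage.
    cew_Ncase(`divmod returns the quotient through its third argument',
              `int q = 0; int r = 0; divmod(17, 5, q, r)',
              `q',
              `3')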

It is also possible to define global variables/objects that are in the scope of all test cases; however, changes to global variables made by one test case will not be seen by another. This is because every test case is executed in an independent Unix process using fork. This copies the memory of the parent process, so that global variables are in scope, but changes to the child’s memory are not reflected in the parent’s. A child process can “crash” and the cew parent will diagnose the reason as best it can, entering a failure message into the test log that will be e-mailed back to the student.
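The per-test isolation can be pictured with the following C++ sketch of the standard Unix fork-and-limit idiom: the parent forks a child for each test case, the child restricts its own resources with setrlimit before running the test, and the parent inspects the exit status to diagnose crashes or runaway loops. This is a generic sketch of the technique, not cew’s implementation; the particular limits and the run_test placeholder are assumptions for illustration.

    // Sketch of the fork-per-test-case idiom (not cew's actual code).
    #include <sys/types.h>
    #include <sys/resource.h>
    #include <sys/wait.h>
    #include <unistd.h>
    #include <csignal>
    #include <cstdio>

    // Placeholder for the code that actually executes test case n.
    static void run_test(int n) { (void)n; /* ... perform the test ... */ }

    static void run_isolated(int n) {
        pid_t pid = fork();
        if (pid == 0) {                              // child: limit resources, run the test
            struct rlimit cpu = { 5, 5 };            // e.g., 5 seconds of CPU time
            struct rlimit mem = { 16*1024*1024, 16*1024*1024 };  // e.g., 16 MB of memory
            setrlimit(RLIMIT_CPU, &cpu);
            setrlimit(RLIMIT_AS, &mem);
            run_test(n);
            _exit(0);                                // the child's memory changes die with it
        }
        int status = 0;
        waitpid(pid, &status, 0);                    // parent: diagnose how the child ended
        if (WIFSIGNALED(status)) {
            switch (WTERMSIG(status)) {
            case SIGSEGV: printf("test %d: segmentation fault\n", n); break;
            case SIGXCPU: printf("test %d: CPU time limit exceeded\n", n); break;
            default:      printf("test %d: killed by signal %d\n", n, WTERMSIG(status));
            }
        } else if (WEXITSTATUS(status) != 0) {
            printf("test %d: failed\n", n);
        } else {
            printf("test %d: passed\n", n);
        }
    }

    int main() {
        run_isolated(1);                             // each test case gets its own process
        return 0;
    }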

The instructor should keep the above “test palette” in mind when setting the terms of the assignment, so as to make it easy to test using cew. Unfortunately, GUI testing is ruled out (although the underlying functions can be tested before the GUI layer is put in place), if only because there is no way for the cew test program as it is currently constituted to turn execution over to an autonomous GUI framework (such as VC++ with MFC, or Tcl/Tk) and still retain the controllability and observability it needs.

2.2.2 Preparing the test scripts

Some or all of the chores described below can be farmed out to a competent teaching assistant, where available. Experience shows that they enjoy writing the model program and setting up the test cases “to torture the students.” The instructor should monitor the test cases, suggesting ones that have been overlooked, and keeping the degree of “torture” within bounds. This provides an excellent educational opportunity to help the TA understand how to set about black box testing in a sensible way, creating cases that exercise all the requirements of the specification without going over the same ground unnecessarily, and testing the boundaries of input and output ranges.

First of all, a subdirectory is created whose name is the test name that students will give in their e-mail messages (e.g., the assignment name or number). It is populated with a makefile and test scripts. The makefile is principally the boilerplate required by cew (running m4, running g++, etc.), but needs to be customized with the names of the source files that the students will be sending in. One may require the students to send needed .h files, or they can be provided in the test directory. Sometimes, students have been asked to submit a demo program, in which case the makefile must include the commands to compile it.

Instructors must then write BATs, which will be available for e-mail testing, and may wish to prepare additional RATs (Really Awful Tests) to be kept in reserve for use in marking. Typically, the BATs represent a level of minimal acceptable functionality that should be within the reach of an average student. The RATs test for odd or difficult cases that only the most diligent students will have covered. The author’s practice is to give 0 marks if any BAT is failed, or 10 marks if all are passed, and then to deduct 1 mark for each failed RAT case.

Note that the test script contains the cew test program’s main() function. Students will need to write their own main() function to test their own code apart from cew, but they must not e-mail it for automated testing, otherwise they will receive an error message complaining of multiple definitions for main(). If they modularize their code, keeping their main() separate from the source files to be submitted, they can avoid this problem. This kind of transgression is good fodder for an “autotester FAQ.”
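The simplest arrangement is to keep the driver in its own file. For example (all file names here are hypothetical), the student e-mails only stack.h and stack.cpp for testing, and keeps a personal driver in a separate main.cpp that is never sent:

    // main.cpp -- the student's own driver, kept separate and never e-mailed,
    // so it cannot collide with the main() in cew's generated test program.
    #include "stack.h"                       // hypothetical assignment header
    #include <iostream>

    int main() {
        Stack s;                             // ad hoc checks of the student's own code
        s.push(42);
        std::cout << s.pop() << std::endl;   // should print 42
        return 0;
    }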

Five features of cew testing must be clearly understood and catered for, or frustrating problems may occur with the test script:

1.     Since test scripts are passed through the m4 macro processor, the latter’s syntax must be respected. In particular, the number of comma-separated arguments in the test case macro invocation must be scrupulously observed. Each argument can be a single C++ statement (without trailing semicolon) or multiple statements separated by semicolons. Experience shows that problems creep in when copying and pasting the text of similar test cases.

2.     Macro arguments are treated as text by m4, and are substituted into the midst of quoted strings in the cew program. This means that the arguments must not include double quotes, otherwise C++ syntax errors will result. Single quotes are OK. If quoted strings are needed in test cases, they must be initialized into variables or arrays defined for that purpose outside the macro invocations.

3.     A test case provides for verifying only a single value (cew_Ncase macro) or checking for a single exception (cew_Ecase macro). Nonetheless, one can check multiple values by combining them in a Boolean conditional expression and then verifying if it equals “true”. This reduces the number of test cases, but also reduces visibility in the test log: The student sees that the output of the test case was “false” instead of the expected “true”, but cannot know which value was wrong. Of course, the student can investigate the matter by replicating the test case.

4.     Verification is done by converting the expression under test and the expected value to the string data type, and then comparing strings. To be precise, these arguments appear in expressions of the form <string object> + <argument>, utilizing the overloaded concatenation sense of “+”. This implies that arguments must be convertible to type string. The cew program provides a helper function to handle integer arguments (signature const string operator+(const string&, int)). Values that the compiler can implicitly convert to int or string, including bool, should work transparently. For other values, and particularly user-defined classes, the script writer should define an additional operator+ helper function (see the sketch following this list).

5.     Floating point results should be compared within a tolerance. A test case based on comparing fabs(test_expression - expect_value) < tolerance with true will be satisfactory (see the sketch following this list); however, the actual values will not print in the test log.
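As an illustration of point 4, here is the kind of helper a script writer might define, outside the macro invocations, for a hypothetical user-defined Fraction class (the class and the helper are illustrative assumptions, not part of cew):

    // Hypothetical helper so a user-defined Fraction can appear in verifications.
    #include <sstream>
    #include <string>
    using std::string;

    struct Fraction {                  // stand-in for a class an assignment might specify
        int num, den;
    };

    const string operator+(const string& s, const Fraction& f) {
        std::ostringstream os;
        os << f.num << '/' << f.den;   // render the value in a printable form
        return s + os.str();
    }

And for point 5, a tolerance comparison might be written along the following lines, where average3 and the macro arguments are again hypothetical:

    # Illustrative syntax only.
    cew_Ncase(`average3 of 1, 2, and 4 is close to 2.3333',
              `double avg = average3(1.0, 2.0, 4.0)',
              `fabs(avg - 2.3333) < 0.001',
              `true')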

In all but the most trivial cases, the instructor will wish to debug the BATs against a model program before opening up testing to the class, and should be prepared to make changes to the BATs if problems come to light.

2.3 Marking the submissions

A number of csh scripts have been created to set up the necessary directories and files for testing and submission, open and close submissions, and run the RATs and BATs for marking purposes. These are listed in the table below:

 

Script Name    Purpose
-----------    -------
opensub        create archives and set up to accept submissions
close          close submissions
procsub        process submissions to set up for marking
get            get one student’s submission; compile and run BATs, RATs, and demo
peek           look at student’s source files, applying given tab spacing
rat            used by marker for testing out rats.script prior to marking
scanarc        search archive for suspicious patterns (plagiarism check)

 

The get script is used to mark functionality, and peek is used to examine file headers, coding style, variable names, indentation, comments, and any other features that the instructor wishes to emphasize. The author tends to give 10 marks for functionality, as determined by the automated tests, and 10 marks for coding style, determined at the marker’s discretion. The marker can conveniently fill in a spreadsheet with the numerical marks and personal comments concerning tests failed and coding style.

After the assignment is marked, the BATs and RATs scripts may be made available so the students can confirm the failed tests for themselves. The author also favours making a model program available, ideally drawn from a superior student submission.

3.    Discussion

3.1 Technical challenges

One might expect that testing code written for disparate C++ compilers running on diverse hardware and operating system platforms—such as Microsoft VC++, Borland C++Builder, and Gnu g++—would create compatibility problems. In fact, very few problems of that nature arose in compiling e-mailed code on Linux g++. The main trouble spot proved to be the precision of floating point calculations. This made it necessary for test scripts dependent on floating point results to validate output within a fairly generous tolerance, otherwise students would experience the confusion of receiving failure traces which they were unable to duplicate in their own environments.

The biggest challenge arose from the divergent operation of the students’ different e-mail clients. Students were strongly encouraged to use clients known to be compatible, such as Netscape and the university’s webmail facility (which they all had accounts on). Nonetheless, some persisted in using clients that were configured to send HTML messages by default—an egregious problem with Hotmail accounts. The clients also took a variety of approaches to naming and flagging MIME-encoded attachments (thought to be an international standard). These two problems were largely overcome by making the mailtest and mailsubmit scripts smarter. If they detected the telltale tags of HTML in the e-mail body, they replied that the HTML feature should be turned off and the e-mail resent. Then the e-mail text was filtered through sed in an attempt to put the attachments into a canonical form that could be consistently unpacked by mhn. Finally, to be friendly to Eudora users on Macs, attachments found to be in binhex format were translated using hexbin.

3.2 Pedagogical questions

One question that arises when students have ready access to unlimited, automatic, instructor-provided testing facilities is whether they will treat it as a crutch and never learn to test their own code. When external testing is easy, one can adopt siege tactics (simply resubmitting until the tests pass) in lieu of attempting to deeply understand the specifications and debug one’s code using hypothesis testing, desk checking, and so on.

If this concern is strong, then the e-mail tester could be enhanced to put limits on the number of tests that any single student is allowed to run. This might force them to analyze their test results carefully and not squander the autotester resource.

On the whole, though, the author’s experience is that students without automated testing generally fail to test their code thoroughly, but instead jump to the conclusion that it works well enough. They discover the inadequacy of their own testing when surprised by low marks.

3.3 Testing other languages

While cew was developed specifically to test C++ programs, and is OO in its current design, it can be used successfully with C programs. Versions have been added for testing VHDL code and MC68HC11 assembly language programs, and these are in active use at Malaspina University-College. Furthermore, the approach, though not the literal cew code, should be adaptable to any language whose compiler can be invoked from a command line. Most likely, it would work with Java, whether in the form of Sun’s JDK bytecode or Gnu’s compiled Java. The e-mail front end would require relatively little modification.

4.    Conclusion

The e-mail front end to cew has been used successfully for two C++ courses at Trinity Western University, and for two courses at Kwantlen University College, one in C/C++ programming (based on Borland C++Builder) and the other in data structures and algorithms using C++. The autotester host was an x86 PC running Red Hat Linux. Experience has shown where some future work is needed, as described below.

4.1 Future enhancements

The most fussy aspect of the e-mail front end, in terms of day-to-day operation by the instructor, is that root access is occasionally required. In one case, this is probably unavoidable: the .maildelivery file must have a special set of permissions, or slocal considers it insecure and will not use it. But the main issue is changing the ownership of directories and files after submissions and in preparation for marking. Scripts make this relatively convenient, but the instructor still needs to type the root password, which is impractical in some system environments. It should be possible to obviate this need with more creative use of permissions, possibly employing groups and access control lists.

Aside from this, the most welcome enhancement would be clearer failure reports and feedback on submitted files that leave nothing to the students’ imagination. Despite having a FAQ on line, students are sometimes apt to misinterpret responses from the autotester, assuming, for instance, that their code has passed tests which it has not, or that their e-mailed files have been stored, when they have not.

There are currently no plans to incorporate automatic analysis of coding style, but this is another area that could prove very fruitful. As for detection of possible plagiarism, a basic scanarc script is provided to “grep” submissions for suspicious patterns, but it is preferable to separately e-mail all submitted source files in a batch to the Moss tester [6] for sophisticated analysis of matching code patterns.

4.2 Availability

A zip file with the e-mail front end scripts and the cew programs may be downloaded from the author’s website (see author contact information below; go to Research and Downloads link). The version of cew bundled with the e-mail front end has been modified slightly to return more information when a test crashes. For the official versions of cew sans e-mail front end, testing C++, VHDL, and MC68HC11 assembly code, contact Dr. Peter Walsh at Malaspina University-College (e-mail: pwalsh@csciun1.mala.bc.ca).

5.    References

1.        P. Walsh and J. Uhl. “cew: A C++ Component Exerciser Workbench.” Technical Report MC10, Department of Computing Science, Malaspina University-College, 1999.

2.        P. Walsh and J. Uhl. “On the Benefits of Integrating Systematic Verification into CS1 and CS2.” The First Annual Consortium for Computing in Small Colleges Northwest Conference, Spokane, Washington, 1999. Postscript version at http://malun1.mala.bc.ca:8080/~pwalsh/research/spokane99.ps.

3.        W. Gardner, R. Sutcliffe, and D. Ariel. “Refactoring the Teaching of Object-Oriented Programming.” WCCCE 2000, Kamloops, BC. HTML version at http://www.cs.ubc.ca/wccce/program00/sutcliffe/sutcliffe.html.

4.        Deitel & Deitel. C++: How to Program. Second Edition. Prentice Hall, 1998.

5.        Deitel & Deitel. Getting Started with Microsoft Visual C++ 6 with an Introduction to MFC. Prentice Hall, 2000. The book includes a CD-ROM with the student edition of Visual Studio.

6.        Moss: A Measure of Software Similarity. The Moss site at U.C. Berkeley accepts batch submissions of students’ source files and ferrets out cases of plagiarism, e-mailing back a report. http://www.cs.berkeley.edu/~moss/general/moss.html

Author

William B. Gardner

Dept. of Computing and Information Science

University of Guelph

Guelph, ON  N1G 2W1

wgardner@cis.uoguelph.ca

http://www.cis.uoguelph.ca/~wgardner