Assessing Individual Contributions to Group Software Projects

W. Gardner

Dept. of Computing and Information Science

University of Guelph

ABSTRACT

A simple numerical peer evaluation procedure is described for assessing individual contributions to group software projects. The technique is described in the context of assessment objectives, and is contrasted with less satisfactory techniques. Full instructions for administering the technique are provided, along with ideas for motivating student buy-in and cooperation.

Keywords: software projects, group projects, assessment methods

1.    Introduction

Most computer science curricula include one or more software projects that are carried out by groups of students working in teams. Despite the significant challenges involved for the instructor in preparing, guiding, and assessing such projects, faculty tend to feel that group projects are indispensable for a variety of sound reasons (some listed farther below).

A key problem that arises with group projects is how to assess them fairly. One can simply assign the same grade to every member of a project group, but it is difficult to feel an easy conscience about this degree of expediency. The instructor is turning a blind eye to inaccuracy in his or her assessment mechanism, and the students can rightfully object that hard workers are not being duly rewarded, nor slackers penalized. Clearly, some fair means of assessing individual contributions is needed.

The author has been teaching software engineering and other computer science courses with group project components, while experimenting with different assessment techniques. The purpose of this paper is to share the experiences of what did not work well, and, more important, to describe a technique found to be highly satisfactory. This technique is described in enough detail, with supporting materials, so that anyone may put it into practice. Before describing the assessment techniques, the background section following will give a rationale for utilizing group projects, examine what aspects of individual project work are open to assessment, and identify objectives for candidate assessment techniques to meet.

Note on terminology: The terms “assessment” and “evaluation” are used interchangeably. Canadians tend to use the term “mark” (a marker marks an assignment, gives 80 out of 100 marks, a student’s mark for the project) where Americans would say “grade” (a grader grades an assignment, a student’s grade for the project) and count “points” (80 points). These terms are also used interchangeably, without any intended shades of meaning.

2.    Background

2.1 Why group software projects?

Many benefits can be obtained through organizing students to carry out software projects in groups or teams, among them the following:

1.     Group projects allow students to participate in developing software of much larger scope than they can by working alone. This is more true-to-life than writing “toy” programs, and some activities of the software engineering process can only be motivated by applying them to larger designs. Students completing a group project may feel great pride in seeing what “we” have accomplished.

2.     Some software engineering activities are difficult for individuals to carry out effectively on their own (e.g., requirements elicitation, CRC cards), and most activities benefit greatly from group review (e.g., design review, code review).

3.     Working in groups helps weaker students learn theory and practical skills, because the stronger students in a well-functioning group will explain the theory and demonstrate the skills. This benefit can be enhanced by intentionally seeding the groups with students at different levels of competence.

4.     Group work develops vital interpersonal communication and relational skills that will be essential for success in students’ future careers.

5.     Group projects reflect the style of work that graduates will encounter in many companies, so they will benefit by getting prior exposure during their education.

Since group projects typically are associated with a large fraction of a course grade, even over 50%, the issue of fairly evaluating the students’ work is no small matter. The following section looks at qualities that assessment methods should be capable of measuring.

2.2 Assessable aspects of individual work

What is an instructor looking for in the way of assessment methodologies? Before listing some objectives, we should attempt to analyze the nature of an “individual contribution” to a group effort.

First, we take for granted that the instructor has some accurate means of assessing the quality of the group’s work products, so that assigning a group mark is not an issue. Then, turning to the individual dimension, there are several aspects one can focus on:

1.     Individual’s time devoted to the project: This includes time spent working alone or with other group members on delegated tasks, attending group meetings (whether in person or by Internet chat), and otherwise communicating with group members (telephone, e-mail, bulletin boards).

2.     Individual’s quality of effort: This aspect is intended to encompass “attitude” and the spirit of being a “team player.” It distinguishes members who may dutifully come to every meeting but sit there saying nothing, from those who come prepared to show what they’ve done, to actively discuss and to plan. It distinguishes those who do nothing unless having been ordered and badgered, from those who sacrifice their personal convenience, eagerly look for tasks, “fight fires,” and volunteer to do whatever is needed, especially in the face of looming deadlines.

3.     Individual’s quality of result: This is the objective evaluation of what a member produces (e.g., a written section of a design document, a software module, the final editing of a user’s manual), regardless how much or how little time and effort were needed to produce it.

It is fair to note that our typical assessment techniques only tend to measure quality of result. For example, if a student’s program works correctly, we don’t care how many hours it took them to debug it. Conversely, if it doesn’t work, few instructors would give someone more marks simply because they plead, “I worked 50 hours on this program!” Nonetheless, for the purpose of assessing an individual’s performance in a group, the first two aspects of time and effort should not be neglected. They are extremely germane to the exercise of working in a team, and if they are lacking, it is doubtful whether the individual is experiencing many of the benefits that group projects are intended to convey.

2.3 Objectives for assessment methodology

Here are some objectives that any proposed technique arguably should meet:

1.     It should yield a result that is likely to be correlated with some or all of the assessable aspects listed above.

2.     Students should feel that the assessment process is transparent and its results are fair, and that the instructor’s responsibility to deliver the official assessment has been carried out. Even if peer evaluation techniques are employed, there should be no sense that the instructor’s assessment authority has been largely abdicated to the students, otherwise legitimacy of the assessment may be undermined.

3.     Application of individual contribution assessment techniques should yield “appropriate” variance compared to simply giving every group member the same mark. “Appropriate” means, on the one hand, that the variance should be large enough to be viewed by everyone as worth the trouble, and not smack of a cosmetic exercise having negligible impact. On the other hand, the variance should not be so large as to become difficult for the instructor to justify, which could end up triggering grade appeals.

4.     Complexity of any manual input processing and calculation should not be unduly burdensome, say O(n) with the size of the class.

3.    Unsatisfactory approaches

These unsatisfactory experiences are worth studying, if only to avoid repeating the mistakes. But it is also possible that other practitioners could adapt and refine these techniques to create more workable and appealing methods.

Before getting into the evaluation methods per se, the context of the project courses will be described.

3.1 Course organization

The courses in question were second- and third-year software engineering courses, with projects covering requirements analysis, software design, implementation, and testing, carried out over two semesters for live clients. As is typical in such courses, milestones were set up based on formal documents and other assessable work products such as those modeled in a textbook like [3]: product request, draft requirements specification, SRS (software requirements specification), SDS (software design specification), demonstration and public presentation, and project report (updated SDS plus user’s manual and project experience reports). Groups were required to follow a rigid, uniform template for the principal sections of their documents.

All of the above were easiest to assess corporately, i.e., as products of the entire group. The project was worth 35% in the first semester, and 65% in the second semester where the focus was much more on implementation and testing, and less on learning course material.

The author prefers to appoint managers for project groups, based on demonstrated competence in precursor courses and on professed willingness to give managing a try. Thus, managers become the main interface between the groups and the instructor. Managers are told that their role is not to “do the lion’s share of the work.” If they do relatively little of the writing or coding, but succeed in delegating the work evenly among the team members, meanwhile scheduling and coordinating everything to a brilliant conclusion, then they will be excellent managers. As an encouragement to take up this burden, they are rewarded with a “manager’s bonus”: the manager’s project grade will not be less than the highest team member’s.

Following manager selection, team members are assigned by the instructor so as to achieve a mix of skill levels in each group. The objective is to avoid friends clubbing together, self-selection of “star” teams, ostracism of reputed “losers,” congealing of “dud” teams from the leftovers, and linguistic-based groupings that may escape having to function in English. One hope expressed above is that weaker students may learn from the strong ones in each group. This process of team composition can optimize educational “goodness” for the class as a whole, at the price of individual high-achievers feeling annoyed at being shackled with the company of lesser mortals. This organizational style is also justified as throwing students into something resembling a real-life work environment where employees can rarely choose with whom they will work on a project.

It is worth noting in passing that school-based project groups have one very non-real-life drawback: the evidently NP-complete problem of busy students dovetailing their schedules! In a normal workplace, they would likely be together five days a week, but students, especially commuters, may be forced to utilize alternative collaboration technologies in order to “meet”—e-mail, Internet chat, etc. This is unavoidable but regrettable, and has been found to be a significant source of inefficiency, and a contributing factor to much group tension and dysfunction.

Now two unsatisfactory assessment techniques will be introduced, based on attribution of work products to individuals and on peer evaluation, respectively.

3.2 Fine-grained attribution of work products

The first technique attempts to trace the principal authorship of small portions of work products (including the deliverable software as a whole) back to the responsible individuals. Intuitively, this should be much more accurate than giving everyone the same mark!

Managers are required, prior to the instructor’s evaluation of the work products, to identify on a spreadsheet row opposite each document section the members who contributed to it: members contributing 1/3 or more of a section are given a weight of 2, those contributing less than 1/3 a weight of 1, and non-contributors a weight of 0. After each section is marked, spreadsheet formulas calculate a weighted average for each member, multiplying each section’s mark by the manager-assigned weight of 0-2; these scores are then scaled to 0-100. The result is that a student who contributed substantially to the best sections would get a higher score and, similarly, any blame could be fairly apportioned. Students who did barely any work would get low scores by virtue of their low (0 or 1) weights.
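One plausible reading of this calculation, expressed as a short Python sketch (the data layout, function name, and exact scaling are assumptions; the original was done with spreadsheet formulas):

```python
def attribution_scores(weights, section_marks):
    # weights[member][section]: manager-assigned weight, 2 (contributed >= 1/3 of
    # the section), 1 (contributed < 1/3), or 0 (no contribution).
    # section_marks[section]: the instructor's mark for that section, out of 100.
    scores = {}
    for member, w in weights.items():
        total_weight = sum(w.values())
        if total_weight == 0:
            scores[member] = 0.0                    # contributed to no section
            continue
        weighted = sum(wt * section_marks[sec] for sec, wt in w.items())
        scores[member] = weighted / total_weight    # weighted average, already 0-100
    return scores
```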

In the event, the scores across all the teams (a class of 19 students in 5 teams) fell in the narrow range of 74 to 79 out of 100. A student’s project grade was then composed of ¼ of his/her individual score plus ¾ of the group score. The surprising result of all this fevered calculation was a total spread of barely 1 point in the final project grades (a 5-point spread weighted at ¼ contributes little more than 1 point)!

This method requires a fine-grained assessment of sections of documents, which may not be to every instructor’s taste. Furthermore, it relies heavily on input from only one source, the manager, which at least provides a way to settle possible conflicting claims from members concerning their own contributions. While appealing to a sense of fairness and accuracy, this method yielded too little variance to be useful, and was a nuisance to calculate. It was not used again.

3.3 Peer evaluation with many inputs

Looking for another approach, peer evaluation was given a try. Work products were evaluated on a group basis as above, but no attempt was made to attribute portions to individuals. Instead, team members were required to submit a 5-question survey evaluating each member of their group, including themselves. Thus, each member in a 6-person group would turn in 6 complete surveys. The questions are shown in the table below:

Q1  Did the member do an appropriate quantity of work?

Q2  How about the quality of the member’s work?

Q3  Rate the member’s attitude as a team player (eager to do assigned work, communicated with others, kept appointments, etc.).

Q4  Rate the overall value of the member’s technical contribution.

Scale for Q1-Q4:
5  Outstanding! Super asset to team
4  Good solid effort; took initiative
3  OK, but nothing special
2  Some obvious shortcomings
1  Better off without member, in this regard

Q5  Would you want to work with this person on a project again?

Scale for Q5:
5  I’d consider myself lucky!
4  Yes, I have no reservations
3  OK, but not my first choice
2  Only if no one else available
1  Definitely not

Note how the numerical scales were tied to explicit qualitative descriptions in hopes of getting uniform application of the scale. Despite this, inspection of the results showed that some students tended to take an optimistic view and give everyone in their group high numbers, while others showed the opposite trend. This phenomenon was identified by taking the “personal average” A_s of all the numbers coming from a given student s. Of the “optimistic” type, one student simply gave everyone “5” for all questions, and others had averages of 4.9 and 4.8. In contrast, “pessimists” exhibited averages as low as 3.6. Optimists and pessimists arose in the same groups, so it seemed difficult to support a theory that wonderful groups naturally had optimistic numbers, and lousy groups the opposite. Instead, these trends seemed to represent the internal “calibration” of the evaluators, despite an attempt at imposing externally calibrated scales.

The original intention was to average together all the numbers evaluating a particular team member in order to arrive at a single overall 1-5 rating R_s pertaining to member s. But to correct for the “personal calibration” phenomenon, normalization was undertaken by these operations: First, all the A_s in a group were averaged to give a “norm,” which might be interpreted as the group’s overall feeling about themselves, and then d_s was calculated as A_s minus the norm. This delta was considered to represent the personal calibration bias of student s relative to the group’s norm. Then, before averaging the ratings for a given member, the ratings from student s were corrected by subtracting his or her calibration bias, d_s. Finally, the 1-5 rating was scaled to 100 points.
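A minimal Python sketch of this normalization, assuming the survey answers have already been condensed to a single 1-5 number per evaluator per member and stored in a nested dictionary (the data layout is illustrative only):

```python
def normalized_ratings(ratings):
    # ratings[evaluator][member]: the 1-5 number that 'evaluator' gave to
    # 'member' within one project group (self-evaluations included).
    students = list(ratings.keys())
    # Personal average A_s of all the numbers coming from student s.
    personal_avg = {s: sum(ratings[s].values()) / len(ratings[s]) for s in students}
    # Group norm and each student's calibration bias d_s = A_s - norm.
    norm = sum(personal_avg.values()) / len(personal_avg)
    bias = {s: personal_avg[s] - norm for s in students}
    # Remove each evaluator's bias, average per member, and scale 1-5 to 0-100.
    result = {}
    for member in students:
        corrected = [ratings[s][member] - bias[s] for s in students]
        result[member] = 100 * (sum(corrected) / len(corrected)) / 5
    return result
```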

For example, before normalizing, the scaled ratings R_s for 4 members of one team were {97, 82, 89, 71}. After normalizing, the ratings became {96, 86, 94, 76}. This process did yield some significant variance: the top-rated student in the class received R_s = 98, and the lowest got 76, with an average of 88.

The last step was to calculate the student’s net project grade as 30% of his/her individual rating plus 70% of the group score. (Note that the individual evaluation component was raised from the first semester’s 25% in an attempt to give it more significance.) In each project group, the point spread on a 100-point scale from the highest member to the lowest was 1-6 points, or roughly ±3% relative to the group mark, with the average spread being about 5 points.

The above technique succeeded in producing a greater variance than the fine-grained attribution technique, but at the cost of vastly greater and error-prone calculations, including some perhaps dubious statistical manipulations. Because of the multiple passes over the inputs and the normalization calculation, the processing would become quite wearisome for a large class.

4.    Successful technique based on peer evaluation

Casting about for yet another approach resulted in consulting some literature on assessment techniques for groups [1][2]. A number of arguments can be advanced in favour of utilizing peer evaluation, but the bottom line is that the students themselves “are often the only ones who really know what contribution was made by each group member” [1]. The specific technique called “share-out” is suggested as “the simplest way,” and it appeared that by hedging it around with enough conditions to discourage abuse, it was worth trying. The modified technique and the results of applying it are described in the following sections.

4.1 Single input of “credit deserved”

The essence of share-out is that each student is conceptually given a pot of points, and must allocate them among their peers based on some stated criteria. The allocation then represents that student’s numerical evaluation of his or her team members.

In order to provide a straightforward, effective criterion for the sharing out, each student was given 100 points and told to allocate them according to the amount of credit each team member deserved in terms of the total project work. The term “credit” was intentionally selected to be both intuitive and elastic, a single metric that could easily serve as a proxy for time and quality of effort. In terms of quality of result, since the final project submission had not been marked when the peer evaluation was carried out, team members could not know precisely how the instructor would assess it, but since several earlier work products had been marked, the students could have a pretty good notion of whether the project was a success or not, as well as who was chiefly responsible for the result.

To turn this numerical allocation into a project grade, the numbers allocated to student s by his or her team members are averaged, and the resulting number converted to a personal adjustment factor A_s, so that 100% represents a normally expected contribution, >100% is praiseworthy, and <100% shows the team members’ disapproval. A_s is multiplied by the team grade T to arrive at an individual’s project grade.
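In symbols (with c_{p,s} denoting the points shared out to student s by teammate p, a notation introduced here only for clarity):

$$ A_s = f\Big(\operatorname*{mean}_{p \neq s} \, c_{p,s}\Big), \qquad \mathrm{grade}_s = A_s \times T, $$

where f is the points-to-factor mapping negotiated in Section 4.2 and computed in Section 4.4, and the averaging is refined slightly by the two rules described below.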

Thus, in a 5-person team, if everyone was thought to have contributed the same work and been worthy of the same credit, they would each be allocated 20 points. Therefore 100/N (here N=5) represents a “normal” contribution, so 20 should map to A_s = 100%. By extension, students were told that someone effectively performing the work of two members should get twice the points, and one performing half the expected work should get half the points.

We may here observe two great strengths of the share-out technique: First, its fixed sum constraint is essentially self-calibrating. Unlike the technique described in the preceding section, both optimists and pessimists will end up allocating 100/N for normally expected contributions. Second, the fixed sum prevents the kind of collusion that can occur with the previous technique, whereby team members can inflate all their grades simply by agreeing to give each other all 5’s.

Two further refinements were introduced to improve accuracy:

1.     Students were required to give themselves at least 100/N points, in recognition that there would likely be little enthusiasm for self-incrimination, but then the self-evaluations were not averaged in with those of the team members. With this constraint in place, ethical struggles over whether to admit one’s own poor performance were shunted aside. If some students really performed poorly, their team members could do the dirty work; their own confessions were not required, nor could their grade be inflated by making phony claims. This meant that each student was strictly at the mercy of his or her team members, and had to be seen to be contributing, which is what peer evaluation is all about.

2.     Since the managers were in the best position to know most clearly everyone’s contribution and true degree of cooperation, the manager’s evaluation was given double weight.
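A minimal sketch of these averaging rules, assuming the shared-out points arrive as a nested dictionary keyed by evaluator and then by member (the layout and function name are illustrative, not part of the actual spreadsheet):

```python
def average_credit(alloc, manager):
    # alloc[evaluator][member]: points out of 100 shared out by 'evaluator'.
    members = list(alloc.keys())
    averages = {}
    for m in members:
        votes = []
        for evaluator in members:
            if evaluator == m:                         # self-evaluations are discarded
                continue
            weight = 2 if evaluator == manager else 1  # manager's evaluation counts twice
            votes.extend([alloc[evaluator][m]] * weight)
        averages[m] = sum(votes) / len(votes)
    return averages
```

Applied to the sample spreadsheet of Section 4.4 below, this reproduces the Average column; for example, Paul’s average is (28 + 28 + 27 + 30 + 20) / 5 = 26.6.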

Two choices remained: (a) what range to allow for A_s, and (b) how to calculate A_s as a function of the shared-out points. These choices presented an opportunity to obtain “buy in” from the class, a step encouraged by [2]. That negotiation is described next.

4.2 Negotiating the evaluation parameters with the class

All the details of the above algorithm were explained to the students, then they were asked to discuss appropriate upper and lower bounds on the peer-evaluated adjustment factor. A demand for wide bounds would represent relative confidence in the evaluation process and eagerness to see marks meted out fairly in the team. Conversely, a desire for narrow bounds could reflect a sense that the process might be less than fair.

The instructor proposed bounds of ±10% relative to the team grade T, but this was overturned by a class vote, and ±15% (A_s = 85-115%) was adopted instead. Likely because the instructor offered the class this chance to negotiate some parameters of the process, there were subsequently no complaints about the results of peer evaluation.

4.3 Collecting the peer evaluations

Precise instructions (reproduced in the Appendix) were posted on the course website, and students were told to e-mail their peer evaluations in confidence to the instructor by a deadline (the last day of classes). This was before the final project submissions were assessed, but after earlier work products, including the demonstrations and presentations, had been marked. Thus, all team members would have formed realistic ideas of how the project had gone and who deserved credit for its successes or failures.

E-mail was an extremely easy method to process, since it simply involved copying the given numbers into a spreadsheet. The student e-mails were kept confidential, and they self-validated the sender. In case of forged e-mails purporting to come from a certain team member (none were received), it would have been easy to check back with the supposed sender.

As an inducement to cooperate, students were told that if they did not submit the numbers, their maximum adjustment factor would be 95%, representing a potential penalty of 5% below the team grade. All but one student sent in their peer evaluations.

4.4 Calculating the grades

Constructing an Excel spreadsheet to record the peer evaluations and compute the A factors was not difficult. The sample spreadsheet below shows the actual numbers turned in by a 5-member team (which turned out to have one “dud” member).

 

Parameters: floor = 10 points (maps to 85%); ceiling = 40 points (maps to 115%); A = 0.65 + 0.0225x - 0.00025x^2; manager's bonus: the manager gets the maximum factor in the team.

                    ............. Eval by .............
Eval of      Peter   Paul   Marie   Robin   Bozo    Average   Floor   Ceiling   A Factor   Mgr Bonus
Peter          20      25     25      14     20       21.0     21.0     21.0     101.2%      107.2%
Paul           28      26     27      30     20       26.6     26.6     26.6     107.2%
Marie          28      26     25      28     20       26.0     26.0     26.0     106.6%
Robin          22      20     20      28     20       20.8     20.8     20.8     101.0%
Bozo            2       3      3       0     20        2.0     10.0     10.0      85.0%
Sum           100     100    100     100    100                                  100.2%

The peer evaluation numbers coming from one member, say Marie, were entered under her column opposite the team members she is evaluating. The Sum row verifies that each student has followed the instructions to make the points add to 100. The cells on the diagonal (highlighted in yellow in the original spreadsheet) are self-evaluations and are not taken into the average. The manager’s column (Peter’s, highlighted in green) holds the manager’s evaluations, which are counted in the average twice.

Floor and ceiling are applied to each average; this aligns the input assumption, namely that a reasonable range of contributions runs from half-strength (10 points) to double-strength (40 points), with the output’s agreed range of 85-115%. The clamped average is then converted to a peer evaluation factor by the indicated formula, whose coefficients were derived to give a smooth function through the points (10, 85%), (20, 100%), and (40, 115%). Once the spreadsheet is set up, the manual processing just involves entering the columns of data, which is linear in the class size.
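For readers who prefer code to spreadsheet formulas, the following Python sketch reproduces the same mapping under the N=5 parameters of the sample spreadsheet (the member names and data layout are illustrative only):

```python
import numpy as np

FLOOR, NORMAL, CEILING = 10, 20, 40        # half, normal, and double contribution (N=5)
A_MIN, A_MID, A_MAX = 0.85, 1.00, 1.15     # class-negotiated factor range

def fit_quadratic():
    # Solve for (a, b, c) in A(x) = a + b*x + c*x^2 through the three anchor points.
    X = np.array([[1.0, x, x ** 2] for x in (FLOOR, NORMAL, CEILING)])
    y = np.array([A_MIN, A_MID, A_MAX])
    return np.linalg.solve(X, y)           # approximately (0.65, 0.0225, -0.00025)

def adjustment_factor(avg_points, coeffs):
    a, b, c = coeffs
    x = min(max(avg_points, FLOOR), CEILING)   # apply floor and ceiling
    return a + b * x + c * x ** 2

coeffs = fit_quadratic()
averages = {"Peter": 21.0, "Paul": 26.6, "Marie": 26.0, "Robin": 20.8, "Bozo": 2.0}
factors = {m: adjustment_factor(p, coeffs) for m, p in averages.items()}
factors["Peter"] = max(factors.values())   # manager's bonus: Peter is the manager
# Yields roughly 107.2% (Peter, via the bonus), 107.2% (Paul), 106.6% (Marie),
# 101.0% (Robin), and 85.0% (Bozo, clamped at the floor), matching the table.
```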

In this example, Bozo reported that all members contributed equally, but his team members strongly disagreed. Bozo was thought to have contributed practically nothing, and the floor constraint operated to keep his grade at the lowest amount agreed by the class, 85%. In contrast, they considered that Paul and Marie deserved extra credit, and their factors >100% produced that effect. For this team, the total spread from low to high was 22%, which is much greater than the variance produced by the unsatisfactory methods, and still justifiable.

It can be observed that the average of the factors, here 100.2% in the “Sum” row, is close to 100%, which is a result of the share-out method. It shows how the method hardly inflates or shrinks the size of the team grade “pie.”

4.5 Dealing with challenges

It was anticipated that challenges would arise when the results became known. Three goals in handling these were (a) preserving evaluator anonymity (to prevent possible backlash), (b) providing legitimate accountability, and (c) discouraging frivolous inquiries. The instructor announced that anyone could obtain his or her team mates’ anonymous evaluations in person, and that the raw numbers would simply be read out with no identification of source. The numbers would not be posted on the Internet or bulletin board, nor casually e-mailed. This allowed students concerned about a low adjustment factor to find out what their peers assigned them, without singling out any individuals for reprisal. It required initiative to obtain the numbers, and the instructor found that only two students, both having low factors, actually made inquiries. In both cases, as soon as they heard the numbers their peers had given them, they had nothing more to say.

4.6 Providing for exceptional cases

As noted above, the peer evaluation process was intended to deal with a typical range of individual contributions. It was not the instructor’s intention to allow students to deal out very low scores to their classmates, possibly triggering course failure, without some element of due process and the instructor’s hands-on involvement. Thus, in addition to the peer evaluation scheme, another mechanism was needed to handle exceptional cases, i.e., students whose performance was so poor that their team mates could not bear to see them receive even 85% of the group project grade. Still, it was the instructor’s hope that unhappy team members would be satisfied with imposing a 15% penalty in most cases and that it would prove unnecessary to invoke “exception processing” very often.

Students who do little or no work in a project group, sloughing off their responsibilities on others, are sometimes called “hitchhikers” or “couch potatoes” [4]. Naturally, they are deeply resented by their team mates. The best thing is to prevent hitchhiking behaviour by warning students up front that it will have serious consequences, and then by keeping the channels of communication open between the instructor and the managers. Some instructors may wish to carry out peer evaluation several times during the course of a project as a way of detecting emergent hitchhiking patterns.

What about allowing groups to “fire” a hopeless member? Some people favour firing as a sharp taste of the real world of employment, but the analogy is strained. Employees are paid to work, whereas students are paying tuition to be educated.

The author does not allow firing for several reasons: First, a fired student will become a burden on the instructor, who will then need to find something for the student to do that is comparable in weight to the abandoned project. That “something else” will need to be assessed using different techniques than the group project, and this assessment will create extra work for the instructor. Second, the group will be left short-handed, whereas it would be far better if they could settle their differences, perhaps with the instructor’s mediation, and learn to work together. They may be less apt to work at reconciliation if they know firing is an option. Third, in a project course, participating in a group project is the very essence of the course. If students can effectively opt out by getting themselves fired, hoping to carry out some less troublesome individual assignment instead of the messy group work, is it honest to let those students pass, or should they be required to repeat the course and attempt the group work again?

In conjunction with the peer evaluation scheme, students were told that managers would be allowed to bring exceptional cases to the instructor’s attention. In such cases, the manager would need to prepare a written statement documenting the unsatisfactory performance of the exceptional member. Meanwhile, the accused member would also prepare a statement in his or her own defence. Then the two statements would be exchanged, and a hearing set up for the instructor to ascertain the facts surrounding any conflicting stories. Finally, the instructor would issue a written decision on what share of the group’s project grade the accused team member deserved; i.e., the instructor determines the member’s adjustment factor directly. This could result in any factor that the instructor can justify, which in severe cases could mean failing the course.

This rather pedantic procedure has at least 3 benefits: First, it is onerous enough that only the most unhappy teams will bother to invoke it; most would prefer to utilize the peer evaluation route. Second, it is likely to result in a fair and accurate outcome: an intentional hitchhiker will get what he or she deserves, whereas less guilty parties will be shielded from the wrath of a dysfunctional team looking for a scapegoat. Third, the statements submitted by both parties and the instructor’s written decision provide all the documentation needed to justify the outcome should the grade be appealed.

In the semester where successful peer evaluation was applied, the instructor learned of three groups that claimed to have one “dud” member. Only in one case did the manager and team feel they needed to raise an exception. Written statements were solicited and the graduate teaching assistant, who met with the groups almost weekly and kept tabs on their progress, was asked to attend the hearing, both to provide input and to act as a witness and calming influence.  The result of the investigation was that the accused team member was assigned an adjustment factor of 30%. This outcome was not appealed. In the other two cases, the teams were content to let the peer evaluation take its course, and the dud members experienced penalties of 14-15%.

5.    Conclusion

As a result of the experiments described above, the single-input peer evaluation technique was judged successful and will be used in the future. Student acceptance of this assessment method appeared to be widespread, as evidenced by receiving zero appeals of project grades, even in cases where individuals received grades at the low end of the possible range.

Where details of groups’ inner workings were known to the instructor, it seemed that the peer evaluation technique was up to the challenge: In a semi-dysfunctional group that was nearly falling apart due to haphazard leadership and personality conflicts, one member was recognized by the manager and some other members as having made an outstanding contribution. Even though the group’s mark was relatively low, the member’s peer evaluation was so strong that it boosted her mark significantly, largely making up for the fact that she had landed by chance in a problematic group. Such an outcome was considerably more equitable than just giving her the group grade.

Variations on this technique are possible. For example, a colleague teaching a similar course is performing this kind of peer evaluation after each of three major project milestones rather than just once at the end of the project. Similarly, it would be easy to tailor the ceiling and floor, the mapping function, and the maximum spread of factor A to suit different philosophies.

It is hoped that with a solid way of assessing individual contributions, both instructors and students can be more confident and enthusiastic about group software projects.

6.    References

1.        S. Brown, C. Rust, and G. Gibbs. Strategies for Diversifying Assessment in Higher Education. Oxford Centre for Staff Development, 1994.

2.        S. Brown, P. Race, and B. Smith. 500 Tips on Assessment. London: Kogan Page Ltd., 1997.

3.        R. Pressman. Software Engineering: A Practitioner's Approach, Fifth Edition. McGraw-Hill, 2001.

4.        B. Oakley. “It Takes Two to Tango.” Journal of Student Centered Learning, vol. 1, no. 1, 2003, pp. 19-28 (New Forum Press). A shortened version of this article is available under the title “Coping with Hitchhikers and Couch Potatoes on Teams” from the Tomorrow’s Professor mailing list archive, http://ctl.stanford.edu/Tomprof/postings/441.html. It is recommended reading for instructors and students alike.

7.    Appendix: Wording of peer evaluation instructions

These instructions seem excessively complex, though the underlying concepts of the constrained share-out are not. There should be a better way to boil down the essentials while leaving enough verbiage to be unambiguous.

You will have the opportunity to evaluate the relative contributions of your team members toward the entire project effort. The simple method is as follows:

Take 100 points, and divide them among the N team members, including yourself. Give points based on your opinion of what proportion of the credit each member deserves. You may consider quality and quantity of contributions, team-player attitude, and/or any aspects that you feel are relevant. You must give yourself at least 100/N points, whether you honestly feel you deserve them or not. (This is to avoid an unrealistic “self-incrimination” requirement.)

E-mail the Instructor your list of team members and point breakdown by [the last day of classes]. Please wait until after your presentation, in case that affects your evaluation. Team members who otherwise feel shaky may wish to redeem themselves by making an outstanding presentation.

Each member’s total project grade will be the team grade T, multiplied by a peer-evaluated adjustment factor A in the range of 85% to 115%. In the base case where everyone in the team is allocated the same share of points, A=100%, so there’s no adjustment and everyone just gets T. In the unlikely case where everyone agrees that member M deserves 100% of the credit, M will get T*115% and the others T*85%. Members failing to submit peer evaluations will receive A not exceeding 95%.

Adjustment factor A for member X will be averaged from all members except X, and then scaled so that the range 50/N to 200/N points maps onto 85-115%. (This is to avoid the degenerate case where the members all give themselves 100 points, and to recognize that a likely range of contributions is “half as much” to “twice as much” as expected.) The implication is that teammates must recognize the value of your contributions! The manager’s evaluation will be given double weight. You may apply in person at the Instructor’s office to see your N numbers (without names attached), but they will not be posted publicly. Other requests for disclosure, such as via e-mail, will not receive a response.

Obviously, this simple method breaks down in extreme cases (such as where a member does virtually no work), and those can be handled on an exceptional basis by the manager’s reporting to the Instructor. Such exception reports must be received by [the day after the end of classes], or the peer-submitted evaluations will be taken as final. Drastic exceptions will result in interviews with the down-rated team member, so that anyone so accused will have a chance to defend themselves. The Instructor’s judgment in such cases is final. Reports to the effect that “member Y did nearly everything” will not be accepted, since that will be considered a management failure. For example, if the manager accurately claims to have done the entire project without assistance, the team will share out the grade nonetheless.

Finally, under a provision called the “manager bonus,” the manager’s project grade will not be lower than the highest team member’s grade. This is to compensate the manager for being willing to take on the risk of running the project, along with the additional stresses and responsibilities of this role. If there appears to be collusion within a team to obtain higher evaluations by manipulating the manager bonus—i.e., artificially depressing the manager’s evaluation without justification—the Instructor may declare the peer evaluations invalid. In that case, all team members will be given adjustment factors of 100%.

Author

William B. Gardner

Dept. of Computing and Information Science

University of Guelph

Guelph, ON  N1G 2W1

wgardner@cis.uoguelph.ca

http://www.cis.uoguelph.ca/~wgardner