CGI vs. PLT web-server

Abstract

@@@ CGI old and hoary, page-oriented; PLT new and whizzy, does cool things

The goal of this project was to evaluate PLT web-server in the context of an actual application, and to compare that experience to my past experience of developing with CGI. In particular, PLT was advertised as:

being more modular
handling persistence more cleanly

with an implication that development would be faster.

This project was not designed to evaluate PLTWS against other continuously-running web application frameworks like FastCGI, ColdFusion, or Active Server Pages.

Related work

To-do list managers

For this project, I developed a prototype of a to-do list manager.

To-do list managers have proliferated widely. Radicati recently reported that two groupware tools with integrated to-do list management, Microsoft Outlook and Lotus Notes, are on track to have 126 million and 88 million users, respectively, in 2005[1]. Personal Digital Assistants, like those from Palm and Blackberry, will have sold approximately 15 million units by the end of 2005[2]. However, Bellotti et al[3] found that few people actually use electronic task management tools, preferring instead to use post-it notes, index cards, notebooks, electronic mail, and calendars to manage their tasks. In their study, they found that only 6.7% of to-do items were logged in to-do applications.

While I have not found research to support or refute it, I believe that electronic to-do list managers suffer from an overabundance of information. Tasks that cannot (or should not) be started yet can crowd out more immediate tasks.

Tools

CGI

CGI, the Common Gateway Interface, was developed by Rob McCool [CGI@@@] in 1993[CGIlist@@@] to aid development of dynamic web content. It is conceptually extremely simple:

Each time a client requests a URL from the server, a new process is spawned.
A number of parameters from the request are put into environment variables (or, in one special case, to standard in).
The process executes, using information from the environment variables. Any characters sent to standard out of the program are captured and piped to the Web browser.
The process terminates.
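The four steps above can be sketched as a minimal CGI program. This is an illustrative sketch in Python, not code from this project; the function name handle_request and the request parameter "task" are hypothetical, and the sketch assumes a plain ASCII form body.

```python
import html
import os
import sys
from urllib.parse import parse_qs


def handle_request(environ, stdin):
    """Build a CGI response from the request environment (illustrative sketch)."""
    method = environ.get("REQUEST_METHOD", "GET")
    if method == "POST":
        # The one special case: POST bodies arrive on standard input.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = stdin.read(length)
    else:
        # GET parameters arrive in the QUERY_STRING environment variable.
        raw = environ.get("QUERY_STRING", "")
    params = parse_qs(raw)
    task = params.get("task", ["(none)"])[0]  # "task" is a hypothetical parameter
    body = "<html><body><p>Task: %s</p></body></html>" % html.escape(task)
    # A CGI response is headers, a blank line, then the document.
    return "Content-Type: text/html\r\n\r\n" + body


if __name__ == "__main__":
    # The server spawns this process once per request; it answers and exits.
    sys.stdout.write(handle_request(os.environ, sys.stdin))
```

Nothing survives between requests: every variable in the sketch is rebuilt from the environment each time the process starts.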

One of the most important features of CGI is that it is stateless. All information that goes from one web page to another must go through either the user's browser or out to persistent storage. State information that needs to persist across a multi-page transaction must be encoded in URLs (embedded in Web pages, bookmarks, and/or the user's browser history), stored in cookies, or written to the server's file system.

Because CGI input is exclusively environment variables and standard in, CGI is language-neutral. Any language that can read environment variables and write to standard out can act as a CGI program. Because it is so useful (and so simple), almost all servers support CGI natively.

Embedding state into URLs is annoying, tedious, and very important to get right. In order to not interfere with standard HTML, HTML meta-symbols (e.g. greater-than, less-than, space, and ampersand) must be escaped before they are released to the client. Because HTTP uses 7-bit ASCII, any characters outside of 7-bit ASCII must also be encoded.
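As an illustration of the two layers of escaping described above, here is a minimal Python sketch (not code from this project). The function names link_with_state and state_from_url, and the URL /todo.cgi in the usage note, are hypothetical; the library calls are from Python's standard urllib.parse and html modules.

```python
import html
from urllib.parse import parse_qs, urlencode, urlsplit


def link_with_state(base_url, state, label):
    """Return an HTML link that carries `state` in its query string."""
    # urlencode percent-encodes spaces, ampersands, and non-ASCII characters.
    url = base_url + "?" + urlencode(state)
    # html.escape protects the HTML meta-symbols, including the "&"
    # that separates the query parameters.
    return '<a href="%s">%s</a>' % (html.escape(url, quote=True), html.escape(label))


def state_from_url(url):
    """Reverse the URL encoding when the parameters come back in."""
    return {key: values[0] for key, values in parse_qs(urlsplit(url).query).items()}
```

For example, link_with_state("/todo.cgi", {"filter": "a<b & c"}, "Show <all>") encodes the value for the URL first and then escapes the whole URL for HTML; forgetting either layer corrupts the page or the parameter.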

This escaping and encoding must be reversed when a parameter enters the system, and carefully: any control sequences that enter a computer are a possible transmission vector for malware. Whenever a variable that derives from user input is used in any sort of system call, there is a danger that special characters in the string could be used to obtain unauthorized access to the system. For example, if a parameter is used for the name of a temp file in the current working directory, a slash (or a "../" sequence) in the name could allow the user to see other parts of the file system (e.g. "/etc/passwd").
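A minimal sketch of the kind of check this calls for, written in Python for illustration (the function name safe_temp_path is hypothetical, and this is one possible defence rather than a complete one): reject any user-supplied name that is not a plain file name, then confirm that the resolved path is still inside the intended directory.

```python
import os


def safe_temp_path(base_dir, name):
    """Return a path for `name` inside `base_dir`, or raise ValueError."""
    # Reject anything that is not a plain file name: separators, "..", empty.
    if not name or name in (".", "..") or "/" in name or "\\" in name or "\0" in name:
        raise ValueError("unsafe file name: %r" % name)
    base = os.path.realpath(base_dir)
    path = os.path.realpath(os.path.join(base, name))
    # Defence in depth: the resolved path must still be inside base_dir.
    if os.path.commonpath([base, path]) != base:
        raise ValueError("path escapes base directory: %r" % name)
    return path
```

With this check, a parameter such as "../../etc/passwd" raises an error instead of naming a file outside the working directory.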

There are many libraries (e.g. the perl MIME::Base64 package[MIMEperl]) and tools (e.g. taintperl[Taintperl]) that help with the encoding/decoding process, but it is still an area of significant effort and concern.

PLT web-server

The PLT web-server, by contrast, is very stateful. Each page is a new continuation; in practice, moving from one page to another corresponds to calling a procedure. State is held in global variables or parameters passed from function to function. (In reality, some state information is held in URLs in the user's browser, but that is almost invisible to the application developer.) @@@ references, related work
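The "page transition as procedure call" idea can be approximated in Python with a generator. This is only a rough analogy, not PLTWS's API: the names todo_session and Server are hypothetical, and each yield stands in for "send a page and suspend until the browser answers".

```python
def todo_session():
    """One user's session written as straight-line code.

    Each `yield` plays the role of "send a page and suspend until the
    browser answers"; local variables survive across pages.
    """
    tasks = []  # state lives in an ordinary local variable
    while True:
        form = yield "List page: %s" % ", ".join(tasks or ["(no tasks)"])
        if form.get("action") == "add":
            confirm = yield "Confirm page: add %r?" % form["title"]
            if confirm.get("ok"):
                tasks.append(form["title"])


class Server:
    """Toy stand-in for the server: keeps suspended sessions in memory."""

    def __init__(self):
        self.sessions = {}

    def start(self, session_id):
        session = todo_session()
        self.sessions[session_id] = session
        return next(session)  # run to the first page

    def resume(self, session_id, form):
        # Resume the suspended procedure with the submitted form data.
        return self.sessions[session_id].send(form)
```

Unlike a real continuation, a Python generator can be resumed only once from each suspension point, so this sketch does not model the back button. It does show why the suspended state lives only in the server's memory.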

This has several implications:

Initialization code relating to an application or to a transaction in that application only needs to happen once.

Fixing memory leaks is critical, not optional, in long-running applications. With CGI, the damage from memory leaks is limited: each process only lives a small number of seconds. With PLTWS, each process can gather and hoard resources for days.
System crashes obliterate state, causing PLTWS to return to the original state of the process. Because CGI has to keep state persistently in URLs and file systems, it is immune to crashes.
The overhead of encoding and decoding parameters when they cross to and from client and server -- and the concomitant danger of security breaches if it is done incorrectly -- vanishes.

Implementation

I developed a prototype of a to-do list manager using PLTWS. While there are not many web pages, the state variables interact in many interesting ways.

The main page of the application is the list of tasks, as shown in Figure 1. (showEverything.gif) In Figure 1, all tasks are visible in the top portion. Under the tasks is a text box for adding tasks, and below that are controls for hiding and showing tasks based on several different criteria.

Tasks can be "deferred", as shown in Figure 2 (hideDeferred.gif). After "Hide deferred" is clicked, all tasks whose "hide-until" date is later than the current time are hidden. When deferred tasks are visible (e.g. the "Show all deferred" link has been clicked), they have "Undefer" next to them instead of "Defer".

Tasks that have been completed have a strike through them. Completed tasks can be hidden by clicking on the "Hide completed" link, as shown in Figure 3 (hideCompleted.gif).

Tasks which can be acted upon -- ones that do not have pending subtasks and which are not yet completed -- have a "Done" link at the end of that line. Clicking on the "Done" link puts a strike through that task and removes the "Defer" and "Done" links.

Tasks that cannot be acted upon can be hidden by clicking on the "Hide all supertasks" link, as shown in Figure 4 (hideDependents.gif). Note that a blocking task is still indented, reminding the user that other tasks depend upon its completion.

When all of a blocked task's subtasks are done, that task becomes active: "Done" and "Defer" appear next to it, and if it was hidden by "Hide all supertasks", it becomes visible again, as shown in the sequence in Figures 5, 6, and 7. (doneFirst.gif, doneSecond.gif, doneThird.gif)
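The visibility rules described above can be summarized in a short sketch. This is a Python illustration under assumed names (Task, is_deferred, is_actionable, and visible_tasks are all hypothetical), not the Scheme code of the prototype.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Task:
    title: str
    done: bool = False
    hide_until: Optional[datetime] = None  # set when the task is deferred
    subtasks: List["Task"] = field(default_factory=list)

    def is_deferred(self, now):
        # Deferred: the "hide-until" date is later than the current time.
        return self.hide_until is not None and self.hide_until > now

    def is_actionable(self):
        # Actionable: not completed and no pending (unfinished) subtasks.
        return not self.done and all(sub.done for sub in self.subtasks)


def visible_tasks(tasks, now, hide_deferred=False, hide_completed=False,
                  hide_supertasks=False):
    """Apply the three hide/show controls to a flat list of tasks."""
    result = []
    for task in tasks:
        if hide_deferred and task.is_deferred(now):
            continue
        if hide_completed and task.done:
            continue
        # A "supertask" here is an unfinished task that is still blocked.
        if hide_supertasks and not task.done and not task.is_actionable():
            continue
        result.append(task)
    return result
```

Because is_actionable is computed from the subtasks each time, a blocked task becomes active (and reappears under "Hide all supertasks") as soon as its last subtask is marked done, with no extra bookkeeping.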

Notice that some tasks are displayed with different saturation: tasks with higher importance are less saturated and hence more visible.

Clicking on the short task description allows the user to see more detailed information about the task, and to edit it, as shown in Figure 8. (editPage.gif) A briefer page allows the user to defer the task in a similar way, as seen in Figure 9. (deferTask.gif)

@@@ more about code structure here

Results

Tool ecosystem

There were a number of significant areas where CGI's greater longevity and popularity made it much more compelling than PLTWS. The ecosystem surrounding CGI is much more abundant and mature.

This is perhaps not PLTWS's "fault" -- CGI had a @@@-year head start -- but it is a reality which indelibly colours the experience of using PLTWS or CGI.

Documentation.

Finding good documentation -- including example code -- was a major issue. CGI is very widespread, well-documented, and -- because it is conceptually so simple -- easy to explain. PLTWS is harder to explain and has much less documentation. I spent a lot of time trying to figure out how to do things.

One example of a gap was so trivial it was quite depressing. I was having trouble tracking down a bug; it was clear that there was something I was not understanding, and one of the areas that I didn't understand was the difference between send/suspend and send/suspend/callback. I looked all over for information about the difference, and finally asked a Scheme guru with whom I had a personal connection. He told me that the differences were so minor and buried so deep in the implementation that I didn't need to worry about it. This allowed me to stop wasting time on that blind alley; had I not had access to a guru, I would have spent even more time for want of better documentation.

There was also an issue which might have been documentation or might have been a bug. While normally in HTML, tags have attributes in the form foo="bar", radio buttons and check boxes have an exception. To specify that a radio button or checkbox is checked, the developer must specify the attribute "checked" without a value. I could find no way to specify an attribute without a value. I tried to drop down into raw HTML text, e.g. <input name="foo" checked> but found to my dismay that the server helpfully escaped the metacharacters for me.

I did eventually find a workaround, but it is fragile. It depends upon the browser treating checked="true" the same as checked. While Firefox exhibits this behavior, it is not guaranteed that other browsers will.

Infrastructure

It is normal for web hosting services to provide access to CGI and Apache. Conversely, it is not normal for web hosting services to run PLT web-server. While my web hosting provider (Dreamhost) originally said that they would allow me to run PLTWS, they later reversed themselves.

Installation

When I tried to install PLT on Dreamhost, the binary installation failed. Building from source, I got different failures, which I had not finished debugging before Dreamhost told me they would not allow PLTWS to run.

Widely-used tools evolve to deal more gracefully with different configurations. Simpler implementations are easier to build.

Persistence

I heard the buzz about how PLTWS took care of persistence and (perhaps naively) thought that meant that state was automatically stored to the file system. It does not.

This means that the server needs to store valuable data (like credit card numbers) every time that data changes, which is not so different from CGI's requirements.

It does mean that it is easier to deal with "the back button" problem with PLTWS. It is relatively common to have situations where the user does something to make the server perform some action which is difficult to reverse and which should not be done twice. (Examples include charging a credit card or deleting the current record.) With PLTWS, it is trivial to prevent a user from using the back button to repeat such an action: simply use a send/finish call in place of a send/suspend/callback call.

When using CGI, the server must keep a persistent token representing the client's position in a transaction, and give the client a matching token. Then, at crucial steps, the server must compare the server's token with the client's token to ensure that the client is not trying to perform an action out of sequence.
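A minimal sketch of that token scheme, in Python for illustration. The class name TransactionGuard and its methods are hypothetical, and the in-memory dictionary stands in for the persistent storage (a file or database) that a real CGI program would need, since the process itself does not survive between requests.

```python
import secrets


class TransactionGuard:
    """Server-side record of where each client is in a transaction."""

    def __init__(self):
        # Stand-in for persistent storage (a file or database in a real CGI setup).
        self.expected = {}

    def issue(self, session_id):
        # Generate a fresh token, remember it, and hand a copy to the client
        # (for example in a hidden form field).
        token = secrets.token_hex(16)
        self.expected[session_id] = token
        return token

    def consume(self, session_id, client_token):
        # Accept the step only if the client's token matches the server's.
        server_token = self.expected.get(session_id)
        if server_token is None or not secrets.compare_digest(server_token, client_token):
            return False
        # Discard the token so a replayed (back-button) request is rejected.
        del self.expected[session_id]
        return True
```

The server would call issue when it renders the confirmation page and consume just before performing the irreversible action; a second submission of the same form then fails the check.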

Security

PLTWS and CGI each have advantages and disadvantages in the security arena. CGI can be easily used with HTTPS, while the only well-documented method for using PLTWS with HTTPS is a complicated method involving a proxy server.[Cookbook] At the end of that web page, there is a suggestion for how one would approach the task of making PLTWS work with HTTPS, but there are no step-by-step instructions. This might be yet another documentation issue.

On the other hand, PLTWS is clearly superior to CGI in some ways.

With CGI, because state must pass through the client, care must be taken to avoid malware.
The security of any state that leaves a CGI application is only as good as the security on the client side. State is often embedded in URLs, which can live in browser history caches, get emailed, shoulder surfed, or posted on Web pages.

Memory leaks

Because PLTWS processes live for much longer than CGI processes, they need to manage memory more carefully. Furthermore, because PLTWS maintains state for visited pages (so that the user can go back and choose a different set of pages to traverse), PLTWS has a built-in memory leak.

To work around that, PLTWS allows developers to set an expiration time on pages. Unfortunately, the error message that greets users when they come to an expired page is absolutely identical to the error message they get if a page never existed. I observed that another user and I were both quite confused -- even distressed -- when pages "disappeared".
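To make the distinction concrete, here is a hypothetical sketch of a page store that frees expired state but can still tell "expired" apart from "never existed". It is written in Python as an illustration of the idea only; it does not describe how PLTWS is implemented, and the names PageStore, save, sweep, and lookup are assumptions.

```python
import time


class PageStore:
    """Holds suspended pages and expires them after `ttl` seconds."""

    def __init__(self, ttl, clock=time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self.pages = {}        # page id -> (saved state, creation time)
        self.expired = set()   # ids only, so the expired state can be freed

    def save(self, page_id, state):
        self.pages[page_id] = (state, self.clock())

    def sweep(self):
        # Free the state of old pages but remember that they once existed.
        now = self.clock()
        for page_id, (_, created) in list(self.pages.items()):
            if now - created > self.ttl:
                del self.pages[page_id]
                self.expired.add(page_id)

    def lookup(self, page_id):
        self.sweep()
        if page_id in self.pages:
            return "ok", self.pages[page_id][0]
        if page_id in self.expired:
            return "expired", None
        return "unknown", None
```

The set of expired ids still grows over time, but each entry is far smaller than the page state it replaces, and an "expired" result gives the application a chance to show a more helpful message.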

Debugging

CGI has not traditionally been the easiest system to debug, but PLTWS is worse. In both cases, text that normally goes to standard error goes off into unintuitive places. With CGI/Apache, it goes into an error file, usually under /var/log, with restricted read access. Libraries[Carp] also exist that will redirect error messages from stderr to stdout (and hence to the browser window).

With PLTWS, the documentation claimed that it would go to a log file, but I couldn't find any such log file. Instead, the errors went to the terminal window where the server was started. Particularly if PLTWS is running on a server in a data center, this is less than useful.

Finally, while most systems have somewhat cryptic error messages, I found the PLTWS messages to be particularly cryptic. For example, sometimes there were line numbers in the error messages; sometimes there weren't.

Modularity and code structure

One of the benefits touted for PLTWS was the ability to operate at a procedure-per-page level. The biggest benefit that I found was in not having to encode and decode all the state information. In particular, it was very nice to be able to pass around a task instead of the task's UID. While encoding/decoding the UID itself is not very difficult, finding the task that corresponds to that UID was trickier.

(I was in the middle of debugging exactly such a procedure when I slapped my head and asked myself, "Why am I doing this?" It was a joy to pass the task around directly instead.)
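For comparison, this is roughly the work that a UID-based design has to repeat on every request: decode the parameter, then search the task tree for the matching task. The sketch is in Python with a hypothetical dictionary representation of tasks ("uid" and "subtasks" keys); the function names are assumptions, not the prototype's code.

```python
def find_task(tasks, uid):
    """Depth-first search of a task tree for the task with the given UID."""
    for task in tasks:
        if task["uid"] == uid:
            return task
        found = find_task(task["subtasks"], uid)
        if found is not None:
            return found
    return None


def task_from_param(tasks, raw_uid):
    """Decode a UID taken from a request parameter and look up its task."""
    try:
        uid = int(raw_uid)
    except (TypeError, ValueError):
        return None  # malformed input from the client
    return find_task(tasks, uid)
```

When the task object itself can be passed from one page's procedure to the next, neither function is needed.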

User interface issues

In the course of the project, I changed several things in the user interface design as a result of either technical complications, UI difficulties, or UI opportunities.

First, I had expected (perhaps naively) PLTWS to handle persistence for me. I decided not to expand the scope of my work to include attaching to a database because

I heard from the other 511 group working on PLTWS that it was hard,
I cannot use this code myself after the class, since I am not allowed to run it on my web service, and
I am more interested in user interface issues than database issues.

With no persistence and no way to put this system into production, registration and authorization became less interesting. I did figure out how to make login work, and I will submit that, but I didn't get around to connecting the two.

I liked tying the saturation of the tasks to importance. This let me judge priority at pre-perceptual speed.

With the easy ability to defer tasks, I found that I didn't need an "urgency" field separate from the "importance" field. For urgency, the high order bit of information -- can I work on it now or can't I? -- seemed to outweigh everything else. There also wasn't a good way to convey the urgency information. I could have perhaps converted the main display's task tree into a task table, and put urgency into a column, but users wouldn't be able to acquire the urgency information rapidly.

Because it was easy to trim the number of tasks that were visible, I didn't feel compelled to be able to expand/collapse individual branches of the tree. I left that for last and didn't get to it.

I liked keeping the indentation of subtasks when their supertask was hidden, but in an informal user test, I got the feedback that the user found the extra indentation confusing.

I could not figure out how to specify that a task should be a child of a particular task. In particular, I couldn't come up with a good way to specify the parent on the command-line interface (the "Create new task" box on the main page). The only way I could see to specify the parent was to display the UID of all the tasks, but that seemed unbearably ugly to me. If UIDs were assigned in creation order (as they are implemented now), then the task numbering would not correspond to the ordering on-screen. If the UIDs rearranged themselves as the tasks were re-ordered, users might get confused about the numbers' impermanence.

With no good way to specify the parent, and no need for an urgency field, the need for a command line interface plummeted, so I did not implement one.

Speed

@@@ depends

Future work

I would like to redo this application in Javascript, as I believe Javascript has many of the advantages of both with relatively few disadvantages.

Like PLTWS, Javascript is able to maintain a rich (although transitory) state and pass parameters between functions that cause actions to happen in the browser.

Like PLTWS and CGI, I can keep persistent data on the server.
Like CGI, memory leaks will not negatively impact my server. (They might negatively impact the user, but better one user than all of them.)

Like CGI, Javascript is a very robust, mature technology with widespread availability, good documentation, and a thriving ecosystem.
As with CGI, I can use HTTPS easily.
Like PLTWS, I can avoid transmitting a lot of the state information (in the case of Javascript, by keeping it local).

The debuggers are better than for either CGI or PLTWS.
It is imperative to find a way to specify which task(s) a new task depends on or blocks. Javascript offers drag and drop functionality.

It is essential that I go beyond the current implementation to add persistence of user data, registration, and authentication.

I would also like to do some user testing, even if it is only brief questions.

Is the indentation level confusing?
Do users prefer specifying tasks in the order that they block or the order that they are blocked?
Do people need an urgency field?
How worthwhile is it to allow expanding/collapsing individual branches of the task tree?
Is it helpful to change the color based on the importance? Is saturation the appropriate characteristic, or should I use hue or brightness instead?

Parting words

PLTWS and its associated libraries, in general, do a very good job of abstracting away the details of transferring state information around a program. This is good, but when implementing abstractions, particular care must be taken that they are not leaky abstractions as described by Joel Spolsky [LeakyAbstraction].

The Scheme library attempted to hide the messy details of how checkboxes and radio buttons are described in HTML, but in doing so, kept me away from tools I needed to declare a button/box checked.

PLTWS also abstracts state persistence. While this is nice in theory, it is not a perfect abstraction because it does not persist information across server restarts. "Model information" needs to be explicitly persisted, and "view information" will be lost.

Working on this project, I sometimes felt like I was living in an exquisitely beautiful mansion which automatically adjusted the lighting for me and got it right 99% of the time -- but 1% of the time I needed the lighting to be something different, and the controls were hidden away so discreetly that I had a really hard time finding them to override the system. (And of course, the mansion had an inadequate users' manual.)

Visible light switches might be ugly, and might be annoying to have to operate all the time, but I always know where they are.

@@@ Thanks to Christopher Dutchyn, Jim DeLaHunt.

References

[CGI] Rob McCool, The Common Gateway Interface, http://hoohoo.ncsa.uiuc.edu/cgi/overview.html, 1993.

[CGIlist] Original discussions about the Common Gateway Interface -- originally called the Common Gateway Protocol -- are archived at http://www.webhistory.org/www.lists/www-talk.1993q4/0518.html

[WorseBetter] http://www.dreamsongs.com/WIB.html

[Taintperl] http://gunther.web66.com/FAQS/taintmode.html

[MIMEperl] http://search.cpan.org/~gaas/MIME-Base64-Perl-1.00/lib/MIME/Base64/Perl.pm

[LeakyAbstraction] http://www.joelonsoftware.com/articles/LeakyAbstractions.html

[Cookbook] http://schemecookbook.org/Cookbook/HttpsWebservering

[Carp] http://search.cpan.org/dist/CGI.pm/CGI/Carp.pm

Topic revision: r2 - 2006-04-21 - TWikiGuest