OmniMark Programming Principles

www.serverside.com.au

Chapter 6
Writing CGI programs with OmniMark.




6.1: CGI programming, in general

OmniMark CGI programs are no more complex than many of the sample programs seen in the previous chapters in this book - they employ exactly the same principles and syntax.

However, be warned that setting up CGI programs can be rather complex. You need access to a host machine on the web, OmniMark must be installed on that machine, and you need permission from the web server administrator to install and execute your programs there. Debugging CGI programs is also a little more tedious than debugging standalone programs.

I will cover all these complexities in later topics. The topics first cover the theory of the Common Gateway Interface in some detail, then the setting up issues and finally the programming principles and syntax itself.

Although you can write and run many of the programs in this chapter without using a real CGI environment, you won't actually see the results as they are intended - this is because the input and output for CGI is assumed to be managed by a web server and the user interface is assumed to be a web browser (like Netscape or MS Internet Explorer) which communicates with the web server.


6.2: How CGI actually works

CGI is an acronym for 'Common Gateway Interface' which is a set of basic rules for how web browsers can talk to executable programs on a host machine via web server software.

6.2.1 Simple web page serving

To understand this, first think about what happens when a person is browsing on the web and wants to link to another web page. The picture below shows an open web page. The person looking at it (called a 'client') has pointed their mouse onto a link in the page but has not yet clicked on it.

Notice the status bar along the bottom of the window frame. It contains the URL of the web page which will be loaded when the link is clicked. In this case, the URL is simply the name of an HTML file called 'index.html' which is located in the directory 'omnimark' on a host machine called 'clio.mit.csu.edu.au'.

When the client actually clicks the link, their browser will send a message to the web server on the host machine requesting that the indicated page be sent to the browser. This basic interaction between browser and web server is shown in the following diagram.

Following this request, the web server reads the file and transmits it back to the client's machine and the client's browser displays it. The web server on the host machine is just a computer program whose job it is to serve pages to clients.

6.2.2 CGI serving

The Common Gateway Interface allows clients to send similar messages to web servers, but instead of requesting a web page (an HTML file), the client asks the web server to execute a program on the host machine. Programs which can be executed this way must conform to the rules of the Common Gateway Interface and are usually just called 'CGI programs'.

The picture below shows a client's browser just before they click on a link to a CGI program - note the URL in the status bar.

We can see the location of the CGI program in the status bar. It is a program called 'formtest' which is located in a directory called 'cgi-bin' on the host machine 'clio.mit.csu.edu.au'.

When the client actually clicks the link, their browser will send a message to the web server on the host machine requesting that the indicated program be executed. This interaction between browser and web server is slightly more complicated than the page serving interaction because the browser may also send data to the server (such as the data typed into a form on the original page). When the server executes the CGI program, it passes this data to it. This is shown in the following diagram.

The CGI program may do any number of processing tasks and ultimately it will send data back to the web server which will pass this on to the client's browser for display. In typical CGI programs, the output is just HTML markup so when this arrives at the client's browser it appears like any standard web page. The following diagram shows the flow of output data from the CGI program through the web server and back to the client's browser.

An important point to know is that, unlike software which you typically run on your own local machine, CGI programs live no longer than the time it takes for them to accept input, process it and deliver output. Once a CGI program has terminated, it loses all knowledge of the interaction. This means that a CGI program does not retain any information about previous calls to it - it does not maintain any state between executions. There are several techniques for helping CGI programs to maintain state between calls, such as writing and reading temporary files or setting cookies, but these are not dealt with in this book.


6.3: Data streams and CGI programs

The above discussion indicates that a CGI program gets incoming data from the web server. Any data which is output by the CGI program is sent to the web server. As authors of CGI programs, we need to know what kind of data our program will get, how to interpret it, and how to output the correct data in response.

6.3.1 The CGI input stream

The Common Gateway Interface specifies that all incoming data that a program receives comes in through its standard input stream. As you will see later in this chapter, OmniMark can process this input stream very easily. In addition to the standard input stream, the web server also sets several environment variables which the CGI program can access - these environment variables will be discussed in a later topic. OmniMark can easily deal with these environment values by using functions in its CGI library.

The most common data sent to a CGI program is that which has been captured by a web form on the client's machine. When the form is submitted (usually via a SUBMIT button), the browser collects the data in the form fields and encodes it in a special way (called 'URL encoding') before submitting it.

When the CGI program gets this data it must decode it. CGI programs written with OmniMark can do this easily by, once again, using functions from the CGI library.
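As a language-neutral illustration of the input side, here is a minimal sketch in Python (not OmniMark, and not part of the OmniMark CGI library). It assumes the usual CGI/1.1 conventions: for a POST request the still-encoded form data arrives on standard input and its length is given by the CONTENT_LENGTH environment variable, while for a GET request the data is in QUERY_STRING. The function name 'read_raw_form_data' is hypothetical.

```python
import io


def read_raw_form_data(environ, stdin):
    """Return the still-encoded form data handed over by the web server.

    environ: a mapping of environment variables (for example os.environ)
    stdin:   a binary file-like object (for example sys.stdin.buffer)
    """
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # The server announces how many bytes of form data to expect.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        return stdin.read(length).decode("ascii")
    # For GET requests the data travels in the URL instead.
    return environ.get("QUERY_STRING", "")


if __name__ == "__main__":
    body = b"name=Hugo&age=31"
    fake_env = {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))}
    print(read_raw_form_data(fake_env, io.BytesIO(body)))
```

Passing the environment and the input stream in as parameters, rather than reading them globally, is only a convenience so the sketch can be tried from the command line without a web server.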

6.3.2 The CGI output stream

To send output data back to a web server, a CGI program simply writes the data onto its standard output stream. OmniMark does this with the simple 'output' actions seen often in the previous five chapters of this book.

Since the output from the program is destined eventually for display by a client's browser, the format of the output is usually just HTML. The program first outputs a special header indicating that the output is HTML, then just outputs the HTML markup.


6.4: Setting up OmniMark for CGI on a host machine

The following discussion assumes that the OmniMark CGI programs will be run on a host machine running the UNIX operating system. However, the principles apply equally to other systems such as MS Windows - albeit with some slight variation of directory naming etc.

6.4.1 Gaining CGI permissions

CGI programming is treated as a more sophisticated operation than just publishing web pages in HTML. Before you can install your CGI programs on a host machine, you must gain permission to do so from the web server administrator and/or the system administrator of the host. When you contact the administrator to seek permission, explain that you want to be able to save your programs in one of the special directories which are configured for CGI executables. On a typical UNIX machine, running a typical web server (such as 'Apache'), the most common directory is one called 'cgi-bin' which usually lives under the installation directory of the web server itself. A location of

  /local/apache/cgi-bin

is quite common.

The administrator must give you permission to save your programs in the 'cgi-bin' directory and also give you execute permission in that directory (for debugging).

6.4.2 Installing OmniMark

The OmniMark C/VM must be installed on the host machine along with its associated function libraries and include files - this is the standard installation. Once installed, you should make sure you know the absolute pathname to the OmniMark executable 'omnimark'. In a typical installation on a typical UNIX host, this might be:

  /local/bin/omnimark

You must also know the absolute pathnames of the external function libraries and the include file directories. On my machine the function libraries are in

  /local/omnimark53/lib

and the include files are in

  /local/omnimark53/xin


6.5: Getting started with OmniMark CGI programs

This topic presents the two files you need to create to build a first (and very simple) OmniMark CGI program. You can use these as a general guide for all the OmniMark CGI programs provided in later topics.

6.5.1 Create an executable OmniMark 'argument' file

Using any text editor, create a text file in the 'cgi-bin' directory of your web host machine. Name this file 'cgiTest'. This file will not contain the OmniMark program itself, it will contain the command-line arguments which set up the locations of the OmniMark libraries and specify the name of the OmniMark program. The name of this arguments file (ie 'cgiTest') will be used by clients when they call your program.

This file is actually an OmniMark 'arguments file' as discussed in Chapter 5, Topic 4. Using the example installation paths discussed above, this argument file will contain:

001  #!/local/bin/omnimark -f
002  -sb cgiTest.xom
003  -x /local/omnimark53/lib/=L.so
004  -i /local/omnimark53/xin/

Line 1 in the above file is crucial. It must start at the beginning of the first line, as shown, with the symbols '#!'. These symbols are called 'hash-bang', or sometimes just 'shebang', and when they are read by the host's operating system they are interpreted as 'call the following command'.

The command which is called is '/local/bin/omnimark -f' which calls OmniMark and instructs it to interpret the subsequent lines as command-line arguments. Note that '/local/bin/omnimark' is the absolute pathname of the executable OmniMark C/VM on the host machine.

Line 2 contains the call to our OmniMark CGI program called 'cgiTest.xom'. Note the typical '-sb' options which specify that 'cgiTest.xom' is our source file and it is to be run in 'brief' mode.

Line 3 contains the '-x' option (which specifies the location of OmniMark libraries) followed by the absolute path of the libraries. The notation '=L.so' appended to this path indicates that on this system (UNIX), the function libraries have a suffix of 'so'. If you are writing CGI programs on a Windows system, the notation will have to be '=L.dll' since 'dll' is the Windows file suffix for dynamic-link libraries.

Finally, on line 4, the option '-i' specifies the location of the OmniMark include files.

This arguments file 'cgiTest' must be set as 'executable' by the operating system. On UNIX, executable mode can be specified for the file with the command

  chmod 755 cgiTest

and can be confirmed with the UNIX command

  ls -l cgiTest

which, on my local machine displays

-rwxr-xr-x   1 echoppin academic      99 Nov  7 15:45 cgiTest

The three 'x' characters in the first part of this listing show that the file 'cgiTest' is executable by its owner, by its group and (the final 'x') by everyone else.
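If the connection between the number given to 'chmod' and the letters shown by 'ls -l' is unfamiliar, the following small Python sketch (Python is used here purely for illustration) converts one into the other using the standard 'stat' module. The helper name 'permission_string' is hypothetical.

```python
import stat


def permission_string(octal_mode):
    """Render a chmod-style octal mode for a regular file as 'ls -l' shows it."""
    # S_IFREG marks the entry as a regular file, giving the leading '-'.
    return stat.filemode(stat.S_IFREG | octal_mode)


if __name__ == "__main__":
    print(permission_string(0o755))  # prints -rwxr-xr-x
```

Each octal digit covers one group of three letters: 7 is read + write + execute for the owner, and each 5 is read + execute for the group and for everyone else.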

6.5.2 Create an OmniMark source file

Now to write the OmniMark program - at last!

Using any text editor, create the source file 'cgiTest.xom' in the same directory as the arguments file above. Our source will be a minimal CGI program which ignores any incoming data and simply outputs a header and the smallest possible HTML markup for display on the client's browser. Here is the program:

[Code Sample: C06T05a.xom]

001  ; a minimal OmniMark CGI program
002  
003  process
004    output "Content-type: text/html%n%n"
005    output "<HTML>%n"
006    output "<HEAD>%n"
007    output "<TITLE>OmniMark CGI says Hi</TITLE>%n"
008    output "</HEAD>%n"
009    output "<BODY>%n"
010    output "<H2>Hi there from my OmniMark CGI program</H2>%n"
011    output "</BODY>%n"
012    output "</HTML>%n"

Line 4 of the program outputs (to standard output, naturally) the special header which tells the browser what kind of data is to follow. In this case the header is

Content-type: text/html

followed by two newlines. The content type 'text/html' specified is a MIME type which informs the browser that the data to follow is plain text marked up as HTML. The two newlines are essential because the blank line they create after the header tells the recipient that the header is finished and that the real data follows.
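To see why the blank line matters, consider how a recipient might separate the header from the data. The following is only a sketch in Python (not OmniMark, and much simpler than what a real web server does): it cuts the program's output at the first blank line and treats everything before it as 'Name: value' header lines. The function name 'split_cgi_response' is hypothetical.

```python
def split_cgi_response(raw):
    """Split CGI output into (headers dict, body) at the first blank line."""
    head, _, body = raw.partition("\n\n")
    headers = {}
    for line in head.splitlines():
        name, _, value = line.partition(":")
        # Header names are not case sensitive, so store them in lowercase.
        headers[name.strip().lower()] = value.strip()
    return headers, body


if __name__ == "__main__":
    response = "Content-type: text/html\n\n<HTML>\n<BODY>Hi</BODY>\n</HTML>\n"
    headers, body = split_cgi_response(response)
    print(headers["content-type"])
    print(body)
```

If the program emitted only one newline after the header, the first line of HTML would be read as another (malformed) header line instead of as data.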

The eight lines from line 5 to line 12 output simple HTML markup which the browser will display.

6.5.3 Testing the program

Before calling the program from a browser, it is wise to test it from the command line. This can be done (in UNIX) just by entering the name of the executable arguments file as a command (or './cgiTest' if the current directory is not on your command search path)...

  cgiTest

When this is done, the operating system sees the hash-bang and so invokes the OmniMark C/VM on the file. OmniMark sees the option '-f' and realises that this is an arguments file, then sees the '-sb cgiTest.xom' and calls our program. The program's output appears on our console screen (our standard output device) and should appear as:

Content-type: text/html

<HTML>
<HEAD>
<TITLE>OmniMark CGI says Hi</TITLE>
</HEAD>
<BODY>
<H2>Hi there from my OmniMark CGI program</H2>
</BODY>
</HTML>

Of course, if there are OmniMark compile-time or run-time errors generated, they should be fixed and the program tested again from the command line. If it appears to work correctly it's time to test it live on the web.

The following picture shows my browser window after a call to 'cgiTest' has been made from within the browser. Note the location URL - it shows where I placed my program. Note also the title on the browser titlebar - it shows that the program's output has been interpreted correctly.


6.6: Some fine tuning for CGI programs

Although this sample program works correctly, it is a good idea in general OmniMark CGI programming to include a couple of extra settings. With these added, our sample program can be used as a template for all our following and more complex programs.

6.6.1 Settings for the IO streams

To make our programs more efficient in capturing incoming data we can specify that our standard input stream is not buffered. This means that the data is not accumulated in memory before being fed to our program. Unbuffering standard input can be specified with the instruction

  declare #main-input has unbuffered

at the top of our '.xom' source code file.

To ensure that our output is sent without modification and as efficiently as possible we can specify that our standard output stream is sent in binary rather than ASCII mode. This setting is specified by writing

  declare #main-output has binary-mode

at the top of our source code.

6.6.2 Including the OmniMark library functions

The external function libraries supplied with our OmniMark installation can be accessed simply by including them near the top of our source code. The following three lines bring the utility library, the CGI library and the date library respectively into our source code.

  include "omutil.xin"
  include "omcgi.xin"
  include "omdate.xin"

Note that the libraries are named with the '.xin' suffix because they are OmniMark include files. OmniMark knows where to find these on our system by following the '-i' path specified in the arguments file. We do not have to import the external function library files themselves - these are accessed by OmniMark automatically within the '.xin' files. We do have to tell OmniMark where on our system these libraries are and that was done with the '-x' option in the arguments file.

Be careful to always include 'omutil.xin' before including 'omcgi.xin' - this is because some of the functions in the CGI library require the use of some of those in the utility library.

With the include files available, we can easily and directly call any of the support functions we need.

6.6.3 Another example...

I present here, for completeness, another sample CGI program which includes the IO settings, and the libraries. It outputs HTML markup showing the exact date and time the program was called.

The executable arguments file (called 'showtime') which is called by a client is as follows

#!/local/bin/omnimark -f
-sb showtime.xom
-x /local/omnimark53/lib/=L.so
-i /local/omnimark53/xin/

The full listing of the source code (in a file called 'showtime.xom') is:

[Code Sample: C06T06a.xom]

001  ; An OmniMark CGI program which outputs the current
002  ; date and time within HTML markup
003  
004  ; set up the IO streams
005  declare #main-input has unbuffered
006  declare #main-output has binary-mode
007  
008  ; include three libraries
009  include "omutil.xin"
010  include "omcgi.xin"
011  include "omdate.xin"
012  
013  ; declare a variable to hold the time value
014  global stream theTime
015  
016  process
017    set theTime to now-as-ymdhms
018    output "Content-type: text/html%n%n"
019    output "<HTML>%n"
020    output "<HEAD>%n"
021    output "<TITLE>OmniMark showtime CGI</TITLE>%n"
022    output "</HEAD>%n"
023    output "<BODY>%n"
024    output "<CENTER>%n"
025    output "<P>This program was executed at</P>%n"
026    output "<PRE>%n"
027    output "%g(theTime)"
028    output "</PRE>%n"
029    output "</CENTER>%n"
030    output "</BODY>%n"
031    output "</HTML>%n"

The current date and time is captured into the stream variable 'theTime' on line 17 and output on line 27. The function 'now-as-ymdhms' is available to the program from the 'omdate.xin' library which was included on line 11.

When I call the program from my browser, the display I get is shown in the following picture. You can obviously see when I captured this picture - year 2000, month 11, day 08, hour 10, minute 34, second 22. The +1100 on the end of the time value indicates that here in Bathurst, Australia we are 10 hours ahead of GMT and we currently have 1 hour of daylight saving time operating.
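The time value can also be taken apart mechanically. The sketch below is in Python, for illustration only, and it assumes the value is laid out as 'YYYYMMDDhhmmss' followed by a signed four-digit zone offset; the sample string is reconstructed from the values described above rather than copied from the picture. The function name 'parse_ymdhms' is hypothetical.

```python
from datetime import datetime, timedelta


def parse_ymdhms(stamp):
    """Parse a 'YYYYMMDDhhmmss+ZZZZ' timestamp into a zone-aware datetime."""
    return datetime.strptime(stamp, "%Y%m%d%H%M%S%z")


if __name__ == "__main__":
    # Assumed sample value: 8 November 2000, 10:34:22, eleven hours ahead of GMT.
    moment = parse_ymdhms("20001108103422+1100")
    print(moment.isoformat())
    print(moment.utcoffset() == timedelta(hours=11))
```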


6.7: General form processing with OmniMark

Perhaps the most useful purpose of CGI programs is capturing and processing data which comes from web forms.

6.7.1 Sample Web form

The picture below shows a simple web form into which our friend Hugo has entered some data:

Here is the HTML used to create it.

001  <HTML>
002  <HEAD>
003  <TITLE>A simple web form</TITLE>
004  </HEAD>
005  <BODY>
006  <FORM METHOD=POST
007        ACTION="http://clio.mit.csu.edu.au:88/cgi-bin/ombook/form1">
008  
009  What is your name? <INPUT NAME="name" TYPE=TEXT SIZE=30><BR>
010  What is your age? <INPUT NAME="age" TYPE=TEXT SIZE=5><BR>
011  What is your email address? <INPUT NAME="email" TYPE=TEXT SIZE=20><BR>
012  Submit this form: <INPUT TYPE=SUBMIT VALUE="Hit Me">
013  
014  </FORM>
015  </BODY>
016  </HTML>

The FORM element starts on line 6 where the transmission method is defined as 'POST'. On line 7 the ACTION attribute contains the URL of the CGI program which will accept the form data when it is submitted. Note that this form contains only three fields into which data can be entered, one called 'name' on line 9, one called 'age' on line 10, and one called 'email' on line 11.

When the form is submitted, the browser encodes the names of the fields and their contents as shown here, and sends this data to the CGI program.

name=Hugo+First&age=31&email=hugo%40myplace.com

The encoding pattern is a sequence of name/value pairs separated by an ampersand symbol '&'. Each pair contains a field name, an equals sign and a field value. You might notice that spaces have been encoded as '+' symbols and some non-alphanumeric characters (like the '@' symbol) have been converted into the hexadecimal equivalent of their ASCII code, preceded by a percent symbol.
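To make these rules concrete, here is a minimal sketch of the decoding steps in Python (used for illustration only; this is not how the OmniMark library is implemented). It splits the data at each '&', splits each pair at the first '=', and then uses the standard 'unquote_plus' function to turn '+' back into a space and '%XX' back into the character it stands for. The function name 'decode_form_data' is hypothetical.

```python
from urllib.parse import unquote_plus


def decode_form_data(encoded):
    """Decode URL-encoded form data into a dict of field name -> value."""
    fields = {}
    for pair in encoded.split("&"):
        if not pair:
            continue  # ignore empty segments such as a trailing '&'
        name, _, value = pair.partition("=")
        fields[unquote_plus(name)] = unquote_plus(value)
    return fields


if __name__ == "__main__":
    data = "name=Hugo+First&age=31&email=hugo%40myplace.com"
    for name, value in decode_form_data(data).items():
        print(name, "=", value)
```

Run on Hugo's data, this prints 'name = Hugo First', 'age = 31' and 'email = hugo@myplace.com', one per line.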

We could write an OmniMark pattern matching program (see Chapter 3) to decode this raw data, but why bother? OmniMark has already provided a function which does it for us. The function is called 'cgiGetQuery' (within the 'omcgi.xin' library) and it neatly decodes the data into an OmniMark shelf. All our form processing CGI programs will make use of this function, as shown in the following example.

6.7.2 Form processing example

The following OmniMark CGI program responds when the form above is submitted. The output from the program is HTML markup which includes the form data so the client can verify the data they submitted. Although most form processing programs are longer and more complex, this example demonstrates the main principles.

The executable arguments file (called 'form1') is

#!/local/bin/omnimark -f
-sb form1.xom
-x /local/omnimark53/lib/=L.so
-i /local/omnimark53/xin/

The CGI program (called 'form1.xom') is

[Code Sample: C06T07a.xom]

001  ; An OmniMark CGI program which processes a form
002  ; The program simply sends the form data back to the
003  ; client.
004  
005  ; set up the IO stream settings
006  declare #main-input has unbuffered
007  declare #main-output has binary-mode
008  
009  ; include three libraries
010  include "omutil.xin"
011  include "omcgi.xin"
012  include "omdate.xin"
013  
014  ; a shelf to hold the field names and values
015  global stream formData variable initial-size 0
016  
017  ; a variable to hold the number of fields
018  global counter numFields
019  
020  process
021    cgiGetQuery into formData  ;; decode and capture incoming data
022  
023    ; output to client
024    output "Content-type: text/html%n%n"
025    output "<HTML>%n"
026    output "<HEAD>%n"
027    output "<TITLE>Form 1 CGI</TITLE>%n"
028    output "</HEAD>%n"
029    output "<BODY>%n"
030    output "<P>Thanks for submitting the form.</P>%n"
031  
032    set numFields to number of formData
033    do when numFields = 0
034      output "<P>No form fields were received.</P>%n"
035    else
036      output "<P>%d(numFields) form fields were received:</P>%n"
037      repeat over formData
038        output "The field '" || key of formData || "' "
039        output "contains <KBD>" || formData || "</KBD><BR>%n"
040      again
041    done
042  
043    output "</BODY>%n"
044    output "</HTML>%n"

Line 15 contains a declaration of an OmniMark stream shelf to hold the data.

Line 21:

   cgiGetQuery into formData

calls the 'cgiGetQuery' function, which decodes all the raw data and copies it into the 'formData' shelf. The names of the fields will be the keys of the shelf items and the field values will be the values of the shelf items.

The ten lines from 32 through 41 do all the processing of the form data. We first check if any fields have been received (lines 32 and 33), and if so we loop over all the elements of our shelf (lines 37 to 40) and output the shelf keys (line 38) and values (line 39).
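For comparison, the same reporting logic can be sketched in Python (for illustration only). The sketch adds one step that the listing above does not perform: it escapes the field names and values with the standard 'html.escape' function, so that a client who types markup characters such as '<' into a field sees them displayed rather than interpreted by the browser. The function name 'render_field_report' is hypothetical.

```python
import html


def render_field_report(fields):
    """Build the HTML that echoes decoded form fields back to the client."""
    if not fields:
        return "<P>No form fields were received.</P>\n"
    lines = ["<P>%d form fields were received:</P>\n" % len(fields)]
    for name, value in fields.items():
        # Escape both parts so submitted markup is shown as plain text.
        lines.append(
            "The field '%s' contains <KBD>%s</KBD><BR>\n"
            % (html.escape(name), html.escape(value))
        )
    return "".join(lines)


if __name__ == "__main__":
    print(render_field_report({"name": "Hugo First", "age": "31"}))
```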

When I submit the form to the above program via my browser, the resulting display is:


6.8: Processing specific form fields

The above program processes any web form, with any number of fields and without knowing what the names of the fields are. In many cases we want to process form data in a more specific way. The following example involves processing a registration type web form. Clients fill in and submit the form to register themselves for a conference. The CGI program captures the data, writes it onto the end of a log file and then outputs HTML markup to thank the client for their interest.

6.8.1 The 'registration' web form

The web form, as seen by a client, is shown in the following picture. The fields have been filled in with sample data:

The HTML which produces the form is:

001  <HTML>
002  <HEAD>
003  <TITLE>Registration Form</TITLE>
004  </HEAD>
005  <BODY>
006  <H2>Registration Form</H2>
007  <P>Please complete and submit this form to register
008  for the conference.</P>
009  
010  <FORM METHOD=POST
011        ACTION="http://clio.mit.csu.edu.au:88/cgi-bin/ombook/regoform">
012  
013  Your Name: <INPUT NAME="name" TYPE=TEXT SIZE=30><BR>
014  Your Email: <INPUT NAME="email" TYPE=TEXT SIZE=30><P>
015  
016  Please choose one of the following:<BR>
017  <OL type="a">
018  <LI><INPUT NAME="regotype" TYPE=RADIO VALUE="full" CHECKED>Full Registration
019  <LI><INPUT NAME="regotype" TYPE=RADIO VALUE="early">EarlyBird Registration
020  <LI><INPUT NAME="regotype" TYPE=RADIO VALUE="student">Student Registration
021  </OL>
022  
023  Submit this form: <INPUT TYPE=SUBMIT VALUE="Register">
024  
025  </FORM>
026  </BODY>
027  </HTML>

To process this data correctly, a CGI program must know the names of the expected fields. We can see there are two ordinary text fields called 'name' on line 13, and 'email' on line 14. The last field, defined on lines 18, 19 and 20, needs special attention. It is a radio button type field. Note that there are three radio buttons each defined with the same field name: 'regotype' so there is only one field called 'regotype'. Having multiple buttons with the same field name makes them mutually exclusive - the client can choose only one of them.

It is important, when mutually exclusive buttons are used, that one of them is automatically selected as the default - this is done with the 'CHECKED' attribute value on line 18. If the form is submitted with no radio button selected, the CGI program will receive neither the field name 'regotype' nor a value for it.

6.8.2 Processing the form

The full listing of the CGI program to process the registration form data is given below, followed by an explanation of its features, process by process.

[Code Sample: C06T08a.xom]

001  ; An OmniMark CGI program which processes a form
002  ; The program writes the data to a log file and
003  ; responds to the client.
004  
005  declare #main-input has unbuffered
006  declare #main-output has binary-mode
007  
008  include "omutil.xin"
009  include "omcgi.xin"
010  include "omdate.xin"
011  
012  global stream formData variable initial-size 0
013  global stream logFileName initial {"regolog.xml"}
014  
015  ;; Error Message Function
016  define function showError as
017    output "<HTML>%n"
018    output "<HEAD>%n"
019    output "<TITLE>Error 1</TITLE>%n"
020    output "</HEAD>%n"
021    output "<BODY>%n"
022    output "<H2>Error in form processing</H2>%n"
023    output "Can't process registration - form data is invalid.%n"
024    output "</BODY>%n"
025    output "</HTML>%n"
026  
027  process-start
028    cgiGetQuery into formData
029    output "Content-type: text/html%n%n"
030  
031  
032  ;; deal with bad forms
033  process
034    do unless number of formData = 3
035      showError
036      halt
037    done
038  
039    do unless (   formData has key "name"
040              AND formData has key "email"
041              AND formData has key "regotype" )
042      showError
043      halt
044    done
045  
046  ;; write data to a log file  
047  process
048    local stream fileStream
049    reopen fileStream as file logFileName
050    put fileStream "<REGO>%n"
051    put fileStream "<NAME>" || formData key "name" || "</NAME>%n"
052    put fileStream "<EMAIL>" || formData key "email" || "</EMAIL>%n"
053    put fileStream "<TYPE>" || formData key "regotype" || "</TYPE>%n"
054    put fileStream "</REGO>%n%n"
055    close fileStream
056  
057  ;; respond to client
058  process
059    output "<HTML>%n"
060    output "<HEAD>%n"
061    output "<TITLE>Registration Feedback</TITLE>%n"
062    output "</HEAD>%n"
063    output "<BODY>%n"
064    output "<H2>Registration Feedback</H2>%n"
065    output "<P>Thanks for registering, a full conference kit will"
066    output " be sent to you by email shortly.</P>%n"
067    output "</BODY>%n"
068    output "</HTML>%n"

Starting, obviously, at the 'process-start' rule on line 27, the program captures all the form data in the shelf 'formData' and replies to the client's browser with the content-type header.

The process rule at line 33 checks that the expected form data has arrived and all the fields are present. Line 34 tests that there are three fields and lines 39, 40 and 41 check that all the field names are correct. If either of these tests fails, we cannot process the form correctly and so we deliver a small error message (formatted in HTML) to the user. The function 'showError' defined on line 16 delivers the error message. Note that after we show the error message we halt the entire program with the 'halt' action on either line 36 or line 43.
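The two tests amount to asking whether the set of field names received is exactly the set we expect. Here is a minimal sketch of that check in Python (for illustration only; the names 'REQUIRED_FIELDS' and 'form_is_valid' are hypothetical).

```python
REQUIRED_FIELDS = {"name", "email", "regotype"}


def form_is_valid(fields):
    """True when exactly the three expected fields arrived, and no others."""
    return set(fields) == REQUIRED_FIELDS


if __name__ == "__main__":
    complete = {"name": "Rhoda Dendron",
                "email": "flower@someplace.org.au",
                "regotype": "early"}
    # A form submitted with no radio button selected has no 'regotype' field.
    no_radio = {"name": "Rhoda Dendron", "email": "flower@someplace.org.au"}
    print(form_is_valid(complete))
    print(form_is_valid(no_radio))
```

Comparing sets covers both conditions at once: a missing field, an extra field or a misspelt field name all make the two sets differ.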

The process starting on line 47 writes the form data onto a file called 'regolog.xml'. The log file is reopened (line 49) so each registration's information is appended to the file. In a typical setup, a CGI program can only write to a file on the host machine if the file already exists and is writable by the web server. So, before the program is run for the first time, the file 'regolog.xml' must be created and have its write permission enabled for the user account under which the web server runs.

In this program, the format of the data written to the log file is XML. The XML start tags and end tags are written on lines 50 through 54 so that they surround the form data.
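The append-a-record idea can be sketched in Python as follows (for illustration only; this is not a translation of the OmniMark listing). The sketch assumes it is sensible to escape the characters '&', '<' and '>' in the submitted values before writing them, a step the listing above does not take, and it uses the standard 'escape' function from 'xml.sax.saxutils' to do so. The function name 'append_registration' is hypothetical.

```python
import os
import tempfile
from xml.sax.saxutils import escape


def append_registration(log_path, fields):
    """Append one <REGO> record to the log file, escaping markup characters."""
    record = (
        "<REGO>\n"
        "<NAME>%s</NAME>\n"
        "<EMAIL>%s</EMAIL>\n"
        "<TYPE>%s</TYPE>\n"
        "</REGO>\n\n"
    ) % (
        escape(fields["name"]),
        escape(fields["email"]),
        escape(fields["regotype"]),
    )
    # Mode "a" appends, so earlier registrations are preserved.
    with open(log_path, "a", encoding="utf-8") as log:
        log.write(record)


if __name__ == "__main__":
    # A temporary directory stands in for the real cgi-bin directory.
    path = os.path.join(tempfile.mkdtemp(), "regolog.xml")
    append_registration(path, {"name": "Rhoda Dendron",
                               "email": "flower@someplace.org.au",
                               "regotype": "early"})
    with open(path, encoding="utf-8") as log:
        print(log.read())
```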

The final process (starting on line 58) outputs a response to the client in simple HTML.

When the form is submitted, with Rhoda's information, the file 'regolog.xml' contains

  <REGO>
  <NAME>Rhoda Dendron</NAME>
  <EMAIL>flower@someplace.org.au</EMAIL>
  <TYPE>early</TYPE>
  </REGO>

and the response the user sees on their browser is


6.9: Accessing Environment Variables

Each time a web server calls a CGI program, it creates several environment variables and sets them with appropriate values. These variables and their values can easily be read by an OmniMark CGI program by using the function 'cgiGetEnv' which comes from the 'omcgi.xin' library.

The function places all the environment variable names and values into a shelf - each key in the shelf is the name of a variable and the shelf element holds the associated value. Capturing this data into a shelf is done in the same way as we captured form data in the previous topic.

Different web servers, running on different machines and acting on behalf of different clients may set more or fewer environment variables.

6.9.1 Environment example

Below is a program which outputs all the names and values of all the environment variables which are set by my web server when it calls a CGI program.

[Code Sample: C06T09a.xom]

001  ; An OmniMark CGI program which outputs
002  ; the names and values of all environment variables
003  ; set by the web server which calls it.
004  
005  declare #main-input has unbuffered
006  declare #main-output has binary-mode
007  
008  include "omutil.xin"
009  include "omcgi.xin"
010  include "omdate.xin"
011  
012  global stream envData variable initial-size 0
013  
014  process-start
015    cgiGetEnv into envData
016    output "Content-type: text/html%n%n"
017  
018  process
019    output "<HTML>%n"
020    output "<HEAD>%n"
021    output "<TITLE>Environment variables</TITLE>%n"
022    output "</HEAD>%n"
023    output "<BODY>%n"
024    output "<H2>CGI Environment variables and their values.</H2>%n"
025    output "<PRE>%n"
026  
027    repeat over envData
028      output key of envData || " = "
029      output envData || "%n"
030    again
031  
032    output "</PRE>%n"
033    output "</BODY>%n"
034    output "</HTML>%n"

The stream shelf 'envData' is declared on line 12, ready to hold the environment variables and values. On line 15, we call the function 'cgiGetEnv' and capture the result in the shelf. Lines 27 through 30 output the keys and values of the shelf - these appear in a 'PRE' (preformatted) HTML element so the client sees them in a monospaced font with one variable name and value per line.

The results which appear in my browser contain the following 19 variables and values:

DOCUMENT_ROOT = /local/WWW
GATEWAY_INTERFACE = CGI/1.1
HTTP_ACCEPT = image/gif, image/x-xbitmap, image/jpeg, image/pjpeg, image/png, */*
HTTP_ACCEPT_CHARSET = iso-8859-1,*,utf-8
HTTP_ACCEPT_LANGUAGE = en
HTTP_CONNECTION = Keep-Alive
HTTP_HOST = clio.mit.csu.edu.au:88
HTTP_USER_AGENT = Mozilla/4.61 [en] (WinNT; I)
REMOTE_ADDR = 137.166.17.134
REMOTE_PORT = 1095
REQUEST_METHOD = GET
REQUEST_URI = /cgi-bin/ombook/cgiEnv
SCRIPT_FILENAME = /local/apache/cgi-bin/ombook/cgiEnv
SCRIPT_NAME = /cgi-bin/ombook/cgiEnv
SERVER_ADMIN = echopping@csu.edu.au
SERVER_NAME = clio.mit.csu.edu.au
SERVER_PORT = 88
SERVER_PROTOCOL = HTTP/1.0
SERVER_SOFTWARE = Apache/1.3.6 (Unix)

When you run the program on your host, with your browser, you will get similar but not identical results.

6.9.2 Accessing specific environment variables

Be careful when accessing any individual environment variable. Remember that the keys in OmniMark shelves are case sensitive and that you can't assume a particular variable will be set by a server.

Suppose you wanted to include the email address of the web server administrator in your response to a client. If you boldly write an action like this:

  output "Contact the administrator at " || envData key "server_admin"

then an OmniMark error would occur. There is no item in the 'envData' shelf with a key of 'server_admin', because the lowercase key name is not valid.

If you, again boldly, write

  output "Contact the administrator at " || envData key "SERVER_ADMIN"

and this variable has not been set by the server, you will get the same error from OmniMark. The safest way to deal with this is to guard against accessing a non-existent key...

  do when envData has key "SERVER_ADMIN"
    output "Contact the administrator at " || envData key "SERVER_ADMIN"
  else
    output "Have a nice day."
  done

6.10: Processing SGML or XML with CGI

Since OmniMark has strong built-in support for SGML and XML processing, it is easy to incorporate this into our CGI programs. I provided an example earlier (in topic 6.8) which wrote some form data in XML format to a log file. The data represented registrations for a conference. Here I present a program which reads the conference registration log file and delivers an HTML version of it to the client. The first version of the program delivers the entire list of registrations; the second version allows the client to search for a particular person by specifying part of their name as a command-line argument when the program is called.

Below is a copy of the contents of a sample log file which contains the registration details of several (fictitious) people. It is this file which will be processed by the CGI program. The file's name is 'regolog.xml'.

<!-- Registration Log File -->
<REGO>
<NAME>Rhoda Dendron</NAME>
<EMAIL>flower@someplace.org.au</EMAIL>
<TYPE>early</TYPE>
</REGO>
<REGO>
<NAME>Sean Lamb</NAME>
<EMAIL>sheepy@woolmark.com.au</EMAIL>
<TYPE>full</TYPE>
</REGO>
<REGO>
<NAME>Hugh Jass</NAME>
<EMAIL>jassy@chairs.com</EMAIL>
<TYPE>student</TYPE>
</REGO>
<REGO>
<NAME>Lorrie Driver</NAME>
<EMAIL>trucker@mobile.edu.au</EMAIL>
<TYPE>early</TYPE>
</REGO>
<REGO>
<NAME>Wayne Maker</NAME>
<EMAIL>maker_w@weather.net.au</EMAIL>
<TYPE>student</TYPE>
</REGO>

A header file (called 'regolog.header') containing a document type declaration, a DTD and a root element is also made available in the same directory. It contains:

<!DOCTYPE ENTRIES[
<!ELEMENT ENTRIES - o (REGO)+>
<!ELEMENT REGO - - (NAME,EMAIL,TYPE)>
<!ELEMENT (NAME,EMAIL,TYPE) - - (#PCDATA)>
]>
<ENTRIES>

6.10.1 Deliver all entries as HTML

The first version of the CGI program scans these files and uses element rules (see Chapter 4) to translate the information into HTML. It looks a little long but has a simple structure, discussed below. It is called 'showRego.xom'.

[Code Sample: C06T10a.xom]

001  ; An OmniMark CGI program to translate SGML data
002  ; to HTML
003  
004  declare #main-input has unbuffered
005  declare #main-output has binary-mode
006  
007  include "omutil.xin"
008  include "omcgi.xin"
009  include "omdate.xin"
010  
011  global stream logFileHeader initial {"regolog.header"}
012  global stream logFileName initial {"regolog.xml"}
013  global counter numPeople initial {0}
014  
015  process-start
016    output "Content-type: text/html%n%n"
017  
018  process
019    output "<HTML>%n"
020    output "<HEAD>%n"
021    output "<TITLE>Registration Data</TITLE>%n"
022    output "</HEAD>%n"
023    output "<BODY>%n"
024    output "<H2>Registration Data</H2>%n"
025    output "<P>The following people have submitted registrations"
026    output " for the conference</P>%n"
027  
028    do sgml-parse document
029      scan file logFileHeader || file logFileName
030      using group showPeople
031        output "%c"
032    done
033  
034    output "</BODY>%n"
035    output "</HTML>%n"
036  
037  
038  group showPeople
039  element entries
040    output "<HR>%n"
041    output "<DL>%n"
042    output "%c"
043    output "</DL>%n"
044    output "<HR>%n"
045    output "There are %d(numPeople) registrations listed.%n"
046    output "<HR>%n"
047  
048  element rego
049    increment numPeople
050    output "<DT>%d(numPeople): "
051    output "%c"
052  
053  element name
054    output "<STRONG>%c</STRONG>%n"
055  
056  element email
057    local stream address
058    set address to "%c"
059    output "<DD><EM>Email:</EM> "
060    output "<A HREF=%"mailto:%g(address)%">%g(address)</A>%n"
061  
062  element type
063    local stream theContent
064    set theContent to "%c"
065    do when theContent matches "full"
066      output "<DD>Full Registration%n"
067    else when theContent matches "early"
068      output "<DD>EarlyBird Registration%n"
069    else
070      output "<DD>Student Registration%n"
071    done
072  
073  
074  element #implied
075    suppress

The code which initiates the processing of the registration data file is shown above in lines 28 through 32. To make a legal SGML document, the header file and the log file are concatenated and scanned on line 29.

Although there is only one set of element rules in the program, I have chosen to place them in an element group and specify the group name explicitly, on line 30. This makes it easy to extend the program to include other groups for other types of processing when necessary.

The group of element rules starting on line 38 simply outputs the registration data wrapped in HTML tags for display by the client's browser. When I call the program from my browser I get the following display:
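
To see why the log file needs something wrapped around it before it can be parsed, here is a minimal Python sketch of the same translation. It is an illustration only, not a port of the OmniMark program. Python's standard parser expects XML and does not read an SGML DTD with tag minimisation, so the sketch supplies just the 'ENTRIES' root element and leaves out the DTD. The function name 'registrations_to_html' and the shortened sample data are assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# A shortened copy of the log data from this topic; it has no root element.
LOG_TEXT = """<!-- Registration Log File -->
<REGO>
<NAME>Rhoda Dendron</NAME>
<EMAIL>flower@someplace.org.au</EMAIL>
<TYPE>early</TYPE>
</REGO>
<REGO>
<NAME>Sean Lamb</NAME>
<EMAIL>sheepy@woolmark.com.au</EMAIL>
<TYPE>full</TYPE>
</REGO>
"""

TYPE_LABELS = {"full": "Full Registration", "early": "EarlyBird Registration"}


def registrations_to_html(log_text):
    """Wrap the log in a root element, parse it, and return an HTML list."""
    root = ET.fromstring("<ENTRIES>" + log_text + "</ENTRIES>")
    lines = ["<DL>"]
    count = 0
    for rego in root.findall("REGO"):
        count += 1
        name = rego.findtext("NAME")
        email = rego.findtext("EMAIL")
        # Anything other than 'full' or 'early' is shown as a student rate,
        # like the final 'else' of the TYPE element rule.
        label = TYPE_LABELS.get(rego.findtext("TYPE"), "Student Registration")
        lines.append(f"<DT>{count}: <STRONG>{name}</STRONG>")
        lines.append(f'<DD><EM>Email:</EM> <A HREF="mailto:{email}">{email}</A>')
        lines.append(f"<DD>{label}")
    lines.append("</DL>")
    lines.append(f"There are {count} registrations listed.")
    return "\n".join(lines)


if __name__ == "__main__":
    print(registrations_to_html(LOG_TEXT))
```

Without the added root element the parser would reject the data, since a well-formed document may have only one top-level element. The header file plays the same role for the SGML parser.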

6.10.2 Searching for a person

The following is a modification of the above program. It accepts a single command-line argument from the client. It then searches the registration log and displays the details of people who have a name containing the command-line argument. The program's features are discussed after the code.

[Code Sample: C06T10b.xom]

001  ; An OmniMark CGI program which accepts a single
002  ; command line argument and searches for people
003  ; whose name contains it.
004  
005  declare #main-input has unbuffered
006  declare #main-output has binary-mode
007  
008  include "omutil.xin"
009  include "omcgi.xin"
010  include "omdate.xin"
011    
012  ;; Error Message Function
013  define function showError as
014    output "<HTML>%n"
015    output "<HEAD>%n"
016    output "<TITLE>Error</TITLE>%n"
017    output "</HEAD>%n"
018    output "<BODY>%n"
019    output "<H2>Error in arguments</H2>%n"
020    output "<P>Can't search for people. This program requires"
021    output " a single command line argument.</P>%n"
022    output "</BODY>%n"
023    output "</HTML>%n"
024  
025  
026  global stream logFileHeader initial {"regolog.header"}
027  global stream logFileName initial {"regolog.xml"}
028  global counter numPeople initial {0}
029  global stream searchData
030  global switch foundPerson
031  
032  process-start
033    output "Content-type: text/html%n%n"
034  
035  ; deal with error in arguments
036  process
037    do unless number of #command-line-names = 1
038      showError
039      halt
040    else
041      set searchData to #command-line-names item 1
042    done
043  
044  
045  process
046    output "<HTML>%n"
047    output "<HEAD>%n"
048    output "<TITLE>Registration Search</TITLE>%n"
049    output "</HEAD>%n"
050    output "<BODY>%n"
051    output "<H2>Registration Search</H2>%n"
052  
053    do sgml-parse document
054      scan file logFileHeader || file logFileName
055      using group findPeople
056        output "%c"
057    done
058  
059    output "</BODY>%n"
060    output "</HTML>%n"
061  
062  group findPeople
063  element entries
064    output "<HR>%n"
065    output "<DL>%n"
066    output "%c"
067    output "</DL>%n"
068    output "<HR>%n"
069    output "There are %d(numPeople) registrations found.%n"
070    output "<HR>%n"
071  
072  element rego
073    output "%c"
074  
075  element name
076    local stream theContent
077    set theContent to "%c"
078    deactivate foundPerson
079    repeat scan theContent
080      match ul"%g(searchData)"
081        activate foundPerson
082        increment numPeople
083        output "<DT><STRONG>%g(theContent)</STRONG>%n"
084        exit
085      match any
086    again
087  
088  element email
089    local stream address
090    set address to "%c"
091    do when foundPerson
092      output "<DD><EM>Email:</EM> "
093      output "<A HREF=%"mailto:%g(address)%">%g(address)</A>%n"
094    done
095  
096  element type
097    local stream theContent
098    set theContent to "%c"
099    do when foundPerson
100      do when theContent matches "full"
101        output "<DD>Full Registration%n"
102      else when theContent matches "early"
103        output "<DD>EarlyBird Registration%n"
104      else
105        output "<DD>Student Registration%n"
106      done
107    done
108  
109  element #implied
110    suppress

I've called this program 'searchRego.xom'. In lines 36 through 42, we check the built-in shelf '#command-line-names' to see how many items are in it. If there is not exactly one item, we output an error page and halt. When there is exactly one command-line argument, we store it into the global variable 'searchData' and use it later when scanning the content of the NAME element.

On lines 79 through 86, when in the NAME element rule, we scan all the characters in the element's content - that is, we scan through every person's name. If the 'searchData' pattern is matched (line 80), we set a boolean variable 'foundPerson' to true, increment the number of people found and exit the scan.

In each of the EMAIL and TYPE element rules we only output the content if the boolean variable 'foundPerson' is true.
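
The search logic itself is small, and can be sketched outside OmniMark. The Python fragment below is an illustration under stated assumptions: it treats the registrations as a list of dictionaries (a layout invented for the example) and keeps those whose name contains the search text, ignoring letter case in the way the 'ul' modifier on line 80 does. The name 'find_people' is hypothetical.

```python
# Sample records taken from the log file shown earlier in this topic.
PEOPLE = [
    {"name": "Rhoda Dendron", "email": "flower@someplace.org.au"},
    {"name": "Sean Lamb", "email": "sheepy@woolmark.com.au"},
    {"name": "Hugh Jass", "email": "jassy@chairs.com"},
    {"name": "Lorrie Driver", "email": "trucker@mobile.edu.au"},
    {"name": "Wayne Maker", "email": "maker_w@weather.net.au"},
]


def find_people(people, search_data):
    """Return the records whose name contains search_data, ignoring case."""
    wanted = search_data.lower()
    return [person for person in people if wanted in person["name"].lower()]


if __name__ == "__main__":
    for person in find_people(PEOPLE, "amb"):
        print(person["name"], person["email"])
```

The OmniMark version has to carry the 'foundPerson' switch from the NAME rule to the EMAIL and TYPE rules because each element is handled by a separate rule. In the sketch, each record is a single value, so no such flag is needed.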

6.10.3 A note about command-line arguments

When running OmniMark as a normal console command, like this

  omnimark -sb myProgram.xom  one two three

any words typed in the command which do not begin with the option symbol ('-') are considered by OmniMark to be command-line arguments. This is the case with the words 'one', 'two' and 'three' above.

When calling a CGI program from a browser, we must append the first command-line argument to the location of the CGI program directly after a '?' (question mark) symbol. Any subsequent arguments are appended with plus signs ('+'). For example, a call to the above CGI program with no arguments looks like this:

A syntactically legal call with two arguments, as shown below, results in the same error message:

Finally, a legal call, with a single argument of 'amb', searches for all registered people whose name contains the argument...
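
The way a web server turns the part of a URL after the '?' into command-line arguments can be sketched as follows. This Python fragment is a simplified model of the rule in the CGI 1.1 specification, under which a query containing an unencoded '=' is treated as form data and produces no arguments. The function name 'query_to_arguments' and the URLs used in the example are assumptions for illustration; the chapter does not give the actual address of 'searchRego'.

```python
from urllib.parse import unquote


def query_to_arguments(url):
    """Return the command-line arguments a server would derive from url."""
    _, separator, query = url.partition("?")
    if not separator or not query or "=" in query:
        # No query, or a name=value form query: no command-line arguments.
        return []
    # Arguments are separated by '+', and each one is URL-decoded.
    return [unquote(word) for word in query.split("+")]


if __name__ == "__main__":
    # Hypothetical URLs, assuming the program lives beside the other samples.
    print(query_to_arguments("/cgi-bin/ombook/searchRego"))
    print(query_to_arguments("/cgi-bin/ombook/searchRego?sean+lamb"))
    print(query_to_arguments("/cgi-bin/ombook/searchRego?amb"))
```

Under this model the first call yields no arguments and the second yields two, so both would trigger the error page in 'searchRego.xom'. Only the third supplies the single argument the program requires.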

6.11: Debugging CGI programs

Even though CGI programs are designed to be executed by a web server, it is still possible to run them from the console where their output is just displayed on your screen. Unfortunately it is quite difficult to simulate the delivery of form data when doing this. Sometimes you have to actually call the programs from your browser, and sometimes they don't run correctly and you get an error message from the server. In these cases, any error messages output by OmniMark go onto the standard error stream where they are piped by the web server onto the end of the server's error log file.
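
One way to ease the form data problem is to prepare, by script, what the web server would normally supply. Under the CGI convention, posted form data arrives on standard input as URL-encoded text, with 'REQUEST_METHOD', 'CONTENT_TYPE' and 'CONTENT_LENGTH' set in the environment. The Python sketch below builds those pieces. It is an illustration only: the function name 'simulated_post' is hypothetical, and I am assuming, without having confirmed it, that the OmniMark CGI library reads form data from exactly these sources.

```python
from urllib.parse import urlencode


def simulated_post(fields):
    """Return (environment, body) describing a form POST for a CGI program."""
    # The body is what a browser would send: name=value pairs joined by '&'.
    body = urlencode(fields).encode("ascii")
    environment = {
        "GATEWAY_INTERFACE": "CGI/1.1",
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": "application/x-www-form-urlencoded",
        "CONTENT_LENGTH": str(len(body)),
    }
    return environment, body


if __name__ == "__main__":
    env, body = simulated_post({"pizzatype": "aussie", "garl": "on"})
    print(env)
    print(body)
```

The two results could then be handed to the program under test, for example through the 'env' and 'input' parameters of Python's 'subprocess.run'. The environment should be merged with the existing one rather than replacing it, so that the program can still find its libraries.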

At times, a CGI programmer must check the server's error log file to see what the error messages say. This is not difficult, but you may have to ask your web server administrator exactly where the server's error log file is on your system. On my system, the command:

  tail -20 /local/apache/logs/error_log

writes the last 20 lines of the server's error log file onto my screen from which I can usually figure out why my CGI program went off into the weeds.

So, CGI programming can be pretty frustrating at times, but when your programs work you start to see some of the real power of 'programming the web'. Much of the impressive work done by modern web sites is made possible by CGI programs or technologies which are allied to it.

Because CGI programs are more demanding, harder to debug and require a larger infrastructure, there are only two tasks below. However, they cover many of the principles discussed in this chapter.


Tasks

Task 1

This task is quite small but is followed up by the next task. You might like to read both tasks before starting this one.

Write an OmniMark CGI program called 'getForm' which accepts no input but delivers a web form as output. The idea is that when this program is called, the client gets a form to fill in on their browser. The program's structure can be similar to that shown in the example program in topic 6.5.2. Dealing with the data from the form is covered in the next task.

The HTML data which specifies the web form should contain the actual address of another CGI program in the ACTION attribute. At this stage it is recommended that you 'hand-code' the address into your output - using the same directory as the 'getForm' program but specifying a program name of 'processForm' - which I ask you to write in the next task.

It does not really matter to me what data your form collects, but my sample solution will output a form which asks the client for their choice of Pizza, and allows them a choice of some optional extra toppings.

Task 2

Write a program called 'processForm' which accepts the data posted by the client in response to the form given in the above task.

The output of the program should be HTML which simply provides a 'thank you' message for the client and confirms the type of Pizza they have ordered and any extra toppings they have specified. Note that this program will not get information about toppings which have not been checked on the form by the client; so, do not assume that your form data shelf will contain keys for toppings.

The program should contain tests on the form data to ensure that it has arrived and contains the correct field names.
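
The point about unchecked toppings follows from how browsers submit forms: an unchecked checkbox contributes nothing to the posted data, and a checked one with no VALUE attribute is normally sent with the value 'on'. The Python sketch below illustrates this with URL-encoded data of the kind a browser would post. The checkbox names and the function name 'chosen_toppings' are assumptions made for the example, and the sketch is not a solution to the task.

```python
from urllib.parse import parse_qs

# Checkbox names used for illustration; a real form may use others.
TOPPING_FIELDS = ("anch", "pina", "garl")


def chosen_toppings(posted_body):
    """Return the topping field names present in URL-encoded form data."""
    fields = parse_qs(posted_body)
    # Only checked boxes appear as keys, so test for presence, not value.
    return [name for name in TOPPING_FIELDS if name in fields]


if __name__ == "__main__":
    print(chosen_toppings("pizzatype=irish&garl=on"))
    print(chosen_toppings("pizzatype=irish"))
```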


Sample Solutions

Solution 1

The following program outputs a form which allows a client to order a Pizza. Note that the ACTION attribute's value points to another program called 'processForm' (on line 26). Note also that on line 26, I have used the escape symbol '%' to get literal quotation marks around the value.

[Code Sample: C06S01.xom]

001  ; An OmniMark CGI program which outputs
002  ; an HTML form. The form requests a choice
003  ; of Pizza.
004  
005  declare #main-input has unbuffered
006  declare #main-output has binary-mode
007  
008  include "omutil.xin"
009  include "omcgi.xin"
010  include "omdate.xin"
011  
012  process-start
013    output "Content-type: text/html%n%n"
014  
015  ; output form
016  process
017    output "<HTML>%n"
018    output "<HEAD>%n"
019    output "<TITLE>Pizza Time!</TITLE>%n"
020    output "</HEAD>%n"
021    output "<BODY>%n"
022    output "<H2>The OmniMark Pizza Gallery</H2>%n"
023    output "Choose your Pizza.<BR>%n"
024  
025    output "<FORM METHOD=POST%n"
026    output "ACTION=%"/cgi-bin/ombook/processForm%">%n"
027  
028    output "<SELECT NAME=pizzatype>%n"
029    output "<OPTION VALUE=aussie>Aussie Pizza (Bacon, Eggs and Kangaroo)%n"
030    output "<OPTION VALUE=supreme>Supreme (the lot)%n"
031    output "<OPTION VALUE=italian>Italian (Onion, Olives)%n"
032    output "<OPTION VALUE=irish>Irish (Potato, Guinness)%n"
033    output "</SELECT><P>%n"
034  
035    output "You can select extra toppings if you wish.<BR>%n"
036    output "<INPUT NAME=anch TYPE=checkbox>Extra Anchovies<BR>%n"
037    output "<INPUT NAME=pina TYPE=checkbox>Extra Pineapple<BR>%n"
038    output "<INPUT NAME=garl TYPE=checkbox>Garlic Sauce<P>%n"
039  
040    output "<INPUT TYPE=SUBMIT VALUE=%"Place Order%">%n"
041  
042    output "</FORM>%n"
043    output "</BODY>%n"
044    output "</HTML>%n"

Here is a picture of my browser when I call the above program:

Solution 2

This program captures the data from the above form and outputs a confirmation message. The form data shelf is checked before processing is attempted and error messages are output if the form data is not as expected. Note how access to the form fields 'anch', 'pina' and 'garl' is protected by the tests on lines 60 through 70.

[Code Sample: C06S02.xom]

001  ; An OmniMark CGI program which accepts
002  ; data from the Pizza form and outputs
003  ; a confirmation message.
004  
005  declare #main-input has unbuffered
006  declare #main-output has binary-mode
007  
008  include "omutil.xin"
009  include "omcgi.xin"
010  include "omdate.xin"
011  
012  ;; Error Message Function
013  define function showError( value counter enum ) as
014    output "<HTML>%n"
015    output "<HEAD>%n"
016    output "<TITLE>Error %d(enum)</TITLE>%n"
017    output "</HEAD>%n"
018    output "<BODY>%n"
019    output "<H2>Error number %d(enum)</H2>%n"
020    do when enum = 1
021      output "Incorrect number of form fields%n"
022    else when enum = 2
023      output "Incorrect field names received%n"
024    else
025      output "Some weird error%n"
026    done
027    output "</BODY>%n"
028    output "</HTML>%n"
029  
030  global stream formData variable initial-size 0
031  
032  process-start
033    cgiGetQuery into formData
034    output "Content-type: text/html%n%n"
035  
036  ; deal with form errors
037  process
038    do unless number of formData >= 1  ; pizza type plus toppings
039      showError( 1 )
040      halt
041    done
042  
043    do unless formData has key "pizzatype"
044      showError( 2 )
045      halt
046    done
047  
048  ; output confirmation
049  process
050    output "<HTML>%n"
051    output "<HEAD>%n"
052    output "<TITLE>Order Confirmation!</TITLE>%n"
053    output "</HEAD>%n"
054    output "<BODY>%n"
055    output "<H2>Thanks for your order.</H2>%n"
056    output "<P>We are now cooking your "
057    output "<STRONG>" || formData key "pizzatype" || "</STRONG>%n"
058    output " Pizza</P>%n"
059  
060    do when formData has key "anch"
061      output "Lots of anchovies are being added!<BR>%n"
062    done
063  
064    do when formData has key "pina"
065      output "Ripe pineapple slices are going on.<BR>%n"
066    done
067  
068    do when formData has key "garl"
069      output "We are smothering your pizza in garlic sauce.<BR>%n"
070    done
071  
072    output "</BODY>%n"
073    output "</HTML>%n"