SGML (Standard Generalised Markup Language) is a meta language; that is, it is used to define other languages. A language defined with SGML is then used to mark up files which are instances of a particular document type. Work usually starts with a need to define the structure of a set of similar documents or a set of data records. Sample data or documents are analysed to determine their common structural components (elements), the order in which these elements can appear, which ones are compulsory, how elements are nested within other elements, and which elements are qualified with attributes.
After the analysis, a Document Type Definition (DTD) is designed. The DTD is written in the SGML (meta) language and exactly defines the structure of any instance document according to the decisions above. Then instances of the document type are created using SGML markup. These instances can be checked for validity (parsed) to make sure they conform to the specifications in the DTD. Once any conforming instance is available it can be stored, translated, searched, merged or rendered into virtually any other required format. Importantly, the SGML instance is completely owned by the author or organisation who designed and created it. SGML is a non-proprietary standard so no particular commercial product need be used to save or read the instance and the data is not locked up behind any commercially secret file format. It is for this reason that SGML (and latterly XML) is considered most favourably when there is a need to share or transmit data between organisations, for storing data which needs regular updating, to produce documents or records in other formats in real time or to support information systems and web sites.
Many commercial products can be used to read and process SGML, but this does not mean that a particular commercial product has any copyright over it. Locally produced or in-house software can be developed reasonably easily to read and process SGML data and many free software products are available to parse and/or process it. OmniMark is one of these. It provides a built-in parser for both SGML and XML and a programming language which can accurately process SGML and XML instances.
At the markup level, an SGML instance is just an ASCII text file. It can be created or edited with any text editor or word processor and is equally usable on any current operating system or hardware platform. It is often said that the programming language Java produces 'platform independent software'. In a similar way, SGML (or XML) can be considered to produce 'software independent data'.
The basic structural component of any SGML instance is an element, which is marked up with a starttag and an endtag, with the element's content between them. The sample shown here is called a 'NAME' element:
<NAME>Wally Wallpaper</NAME>
The content of the element as shown is data content because it contains just raw text. An equally valid way to structure a person's name might be as follows:
<NAME>
   <FIRST>Wally</FIRST>
   <LAST>Wallpaper</LAST>
</NAME>
In this case the NAME element contains element content, and we say that the elements FIRST and LAST are nested inside the NAME element.
Any element can be qualified by adding attributes. Attributes appear only in the starttag and take the form of an attribute name, an equals sign ('='), and an attribute value. In the example below, the NAME element contains a single attribute called TITLE:
<NAME TITLE="Mr">
   <FIRST>Wally</FIRST>
   <LAST>Wallpaper</LAST>
</NAME>
Elements can be repeated an arbitrary number or a fixed number of times, and elements can be completely optional. Attributes can be optional too. The DTD specifies these structural rules precisely. The following fragment contains multiple NAMEs and shows slight variations of structure. Without access to the DTD it is impossible to say which parts are valid and which are not.
<PEOPLE DATE="15 6 2000">
   <NAME TITLE="Mr">
      <FIRST>Wally</FIRST>
      <LAST>Wallpaper</LAST>
   </NAME>
   <NAME>
      <LAST>Jackson</LAST>
   </NAME>
   <NAME TITLE="Dr">
      <FIRST>Susan</FIRST>
      <MIDDLE>Ramsay</MIDDLE>
      <LAST>Sukie</LAST>
   </NAME>
</PEOPLE>
To resolve these differences, an SGML processing tool needs to know exactly what is allowed and what is not and so needs access to the DTD. A reference to the DTD is normally provided as the first piece of markup in the instance with a doctype declaration which indicates the location of the file containing the DTD. With the doctype declaration included, the previous instance would become:
<!DOCTYPE PEOPLE SYSTEM "people.dtd">
<PEOPLE DATE="15 6 2000">
   <NAME TITLE="Mr">
      <FIRST>Wally</FIRST>
      <LAST>Wallpaper</LAST>
   </NAME>
   <NAME>
      <LAST>Jackson</LAST>
   </NAME>
   <NAME TITLE="Dr">
      <FIRST>Susan</FIRST>
      <MIDDLE>Ramsay</MIDDLE>
      <LAST>Sukie</LAST>
   </NAME>
</PEOPLE>
As well as the location of the DTD, the doctype declaration also declares the name of the topmost element (the root element) of the instance. Once the root element and the DTD are available, an SGML processor can establish an element tree for the document type. An element tree for the above instance can be considered as:
PEOPLE
 |
 +-- NAME
      |
      +-- FIRST
      +-- MIDDLE
      +-- LAST
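If you want to experiment outside OmniMark, you can reproduce the element tree with a short Python sketch. It is only a rough analogue, not OmniMark. It assumes the instance has been rewritten as well-formed XML (every endtag present, DOCTYPE line dropped) so that Python's standard xml.etree.ElementTree module can read it. It does not validate against people.dtd. The function name tree_lines is made up for the illustration.

```python
import xml.etree.ElementTree as ET

# The sample PEOPLE instance rewritten as well-formed XML (DOCTYPE line omitted).
PEOPLE_XML = """
<PEOPLE DATE="15 6 2000">
  <NAME TITLE="Mr"><FIRST>Wally</FIRST><LAST>Wallpaper</LAST></NAME>
  <NAME><LAST>Jackson</LAST></NAME>
  <NAME TITLE="Dr"><FIRST>Susan</FIRST><MIDDLE>Ramsay</MIDDLE><LAST>Sukie</LAST></NAME>
</PEOPLE>
"""


def tree_lines(element, depth=0):
    """Return one line per element, indented by nesting depth, parents before children."""
    lines = ["   " * depth + element.tag]
    for child in element:
        lines.extend(tree_lines(child, depth + 1))
    return lines


root = ET.fromstring(PEOPLE_XML.strip())
print("\n".join(tree_lines(root)))
```

The diagram above shows the general shape allowed by the DTD. This sketch, by contrast, prints the actual tree of this particular instance, so NAME appears three times.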
In the context of this booklet it is not crucial to understand DTDs completely or to be able to write them; I present one here just for completeness. A document type definition for the above 'people' instance could be as follows, and would be stored in a file called 'people.dtd':
001 <!ELEMENT people  - - (name+)>
002 <!ATTLIST people  date   NUMBERS #REQUIRED>
003
004 <!ELEMENT name    - - (first?, middle?, last)>
005 <!ATTLIST name    title  CDATA   #IMPLIED>
006
007 <!ELEMENT first   - - (#PCDATA)>
008 <!ELEMENT middle  - - (#PCDATA)>
009 <!ELEMENT last    - - (#PCDATA)>
Lines 1 and 2 define the element PEOPLE, which contains one or more NAME elements and has a compulsory attribute called DATE whose value must be a list of numbers separated by spaces (the NUMBERS declared value).
Line 4 defines the NAME element as containing one optional FIRST element, followed by one optional MIDDLE element followed by exactly one compulsory LAST element. In the NAME element a non-compulsory attribute called TITLE is allowed which can contain ordinary text. This is defined on line 5.
Lines 7, 8 and 9 specify that the content of the elements FIRST, MIDDLE and LAST is ordinary text data (#PCDATA).
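The symbols '- -' which appear between an element name and its content model specify that both a starttag and an endtag are required for the element. A letter 'o' in either position means that the corresponding tag may be omitted, so '- o' makes the starttag compulsory and the endtag optional.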
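The attribute declarations on lines 2 and 5 can be mimicked in a few lines of Python. The sketch below is hypothetical and much simplified. The ATTLISTS table is typed in by hand rather than read from people.dtd, and only the NUMBERS and CDATA declared values are handled. NUMBERS is taken to mean digits separated by one or more spaces. The sketch reports the same kind of problem that an SGML parser reports for a missing #REQUIRED attribute.

```python
import re
import xml.etree.ElementTree as ET

# Hand-written stand-in for the ATTLIST declarations in people.dtd:
# element name -> {attribute name: (declared value, default value)}
ATTLISTS = {
    "PEOPLE": {"DATE": ("NUMBERS", "#REQUIRED")},
    "NAME": {"TITLE": ("CDATA", "#IMPLIED")},
}

# Simplified NUMBERS check: one or more numbers separated by spaces.
NUMBERS = re.compile(r"[0-9]+( +[0-9]+)*")


def attribute_errors(root):
    """Return a message for each missing #REQUIRED attribute or malformed NUMBERS value."""
    errors = []
    for element in root.iter():
        for attr, (declared, default) in ATTLISTS.get(element.tag, {}).items():
            value = element.get(attr)
            if value is None:
                if default == "#REQUIRED":
                    errors.append(f'REQUIRED attribute "{attr}" missing on "{element.tag}"')
            elif declared == "NUMBERS" and not NUMBERS.fullmatch(value):
                errors.append(f'"{attr}" on "{element.tag}" is not a list of numbers')
    return errors


good = ET.fromstring('<PEOPLE DATE="15 6 2000"><NAME><LAST>Jackson</LAST></NAME></PEOPLE>')
bad = ET.fromstring('<PEOPLE><NAME><LAST>Jackson</LAST></NAME></PEOPLE>')
print(attribute_errors(good))  # []
print(attribute_errors(bad))
```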
In these topics I present some sample OmniMark programs to process SGML documents. The depth to which I discuss this processing is quite limited. I will deal with the elementary and fundamental principles only. At this level the OmniMark programs which process the documents are fairly small.
SGML processing in OmniMark almost always involves first parsing a source document to check that the markup conforms to the specifications in the DTD, and then applying element rules, which fire as each element of the document streams through the program.
A minimal OmniMark program to parse an SGML file is presented here:
[Code Sample: C04T02a.xom]
001 ; minimal parsing program
002
003 process
004    do sgml-parse document
005       scan file "people.sgml"
006       output "%c"
007    done
008
009 element #implied
010    suppress
In this program, the action starting on line 4 calls the OmniMark parser on an SGML document. For our purposes, an SGML 'document' is an SGML instance with an associated DTD.
Line 5 indicates which SGML file should be scanned. The file 'people.sgml' contains the markup; in this case it is the instance used as an example in the previous topic.
Line 9 contains the single element rule of the program. This element rule is the most general of all: '#implied' means every element seen in the input stream for which no specific rule is provided. Since there are no other (more specific) rules in this program, the rule on line 9 fires for all elements in the instance. The action on line 10, 'suppress', tells OmniMark not to output the element's content.
It is a principle of OmniMark SGML processing programs that there must be an element rule for every element in the instance. The 'element #implied' rule ensures that this principle is upheld.
The file 'people.sgml' does, in fact, conform to the DTD in the file 'people.dtd' so OmniMark produces no error messages. To demonstrate what happens when there are errors, consider the following incorrect instance of the PEOPLE document type. Can you see the error before reading on?
<!DOCTYPE PEOPLE SYSTEM "people.dtd">
<PEOPLE>
   <NAME TITLE="Mr">
      <FIRST>Wally</FIRST>
      <LAST>Wallpaper</LAST>
   </NAME>
   <NAME>
      <LAST>Jackson</LAST>
   </NAME>
   <NAME TITLE="Dr">
      <FIRST>Susan</FIRST>
      <MIDDLE>Ramsay</MIDDLE>
      <LAST>Sukie</LAST>
   </NAME>
</PEOPLE>
When the OmniMark parsing program is run the following error message is produced:
omnimark -- Markup Error (0259) on line 2 in file Markup Stream: In a start tag or ENTITY declaration, every required attribute must be given a value. In the start tag for element "PEOPLE", the REQUIRED attribute "DATE" is not specified. There was 1 SGML error detected.
This message indicates that the DATE attribute of the PEOPLE element is missing (did you pick it?). OmniMark knows that the DATE attribute is compulsory from line 2 in the DTD. Here is another incorrect version of the instance. See if you can detect the error before reading on.
<!DOCTYPE PEOPLE SYSTEM "people.dtd">
<PEOPLE DATE="15 6 2000">
   <NAME TITLE="Mr">
      <FIRST>Wally</FIRST>
      <LAST>Wallpaper</LAST>
   </NAME>
   <NAME>
      <LAST>Jackson</LAST>
   </NAME>
   <NAME TITLE="Dr">
      <MIDDLE>Ramsay</MIDDLE>
      <FIRST>Susan</FIRST>
      <LAST>Sukie</LAST>
   </NAME>
</PEOPLE>
Here the error message is less obvious but states:
omnimark -- Markup Error (0056) on line 12 in file Markup Stream: A start tag must not be used if the element is neither allowed by the current content model or by an inclusion. The element is "FIRST". There was 1 SGML error detected.
which actually means that, if Susan's middle name appears, it can only be followed by her last name, not her first name. OmniMark knows the correct order of FIRST, MIDDLE and LAST from line 4 of the DTD.
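One way to picture the check behind this error is to treat a content model such as (first?, middle?, last) as a regular expression over the sequence of child element names. The Python sketch below only illustrates that idea. The name NAME_MODEL and the comma-joined encoding are assumptions of the sketch. A real SGML parser also handles inclusions, exclusions and omitted tags, which this sketch ignores.

```python
import re

# (first?, middle?, last) from line 4 of people.dtd, written as a regular
# expression over child element names, each followed by a comma.
NAME_MODEL = re.compile(r"(FIRST,)?(MIDDLE,)?LAST,")


def name_children_valid(children):
    """True if a list of child element names satisfies (first?, middle?, last)."""
    sequence = "".join(name + "," for name in children)
    return NAME_MODEL.fullmatch(sequence) is not None


print(name_children_valid(["FIRST", "MIDDLE", "LAST"]))  # True
print(name_children_valid(["LAST"]))                     # True
print(name_children_valid(["MIDDLE", "FIRST", "LAST"]))  # False: FIRST after MIDDLE
```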
Mostly we want to process SGML files to extract or render the content in some way. Suppose we want to output all the last names of all the people in our sample SGML instance. To get to the LAST element we need to process the PEOPLE element and all of the NAME elements in order to get down into the tree structure to the level of the LAST element. When we reach the LAST element we want to output its content. We do not want to process the FIRST or the MIDDLE elements at all. The following program does this in a minimal way:
[Code Sample: C04T03a.xom]
001 ; Output last names
002
003 process
004    do sgml-parse document
005       scan file "people.sgml"
006       output "%c"
007    done
008
009 element people
010    output "%c"
011
012 element name
013    output "%c"
014
015 element last
016    output "%c%n"
017
018 element #implied
019    suppress
The element rule on line 9 fires when the PEOPLE element comes through the input stream, and its action 'output "%c"' means 'now process this element's content'. The rule for NAME elements on lines 12 and 13 uses the same technique.
The rule on line 15 fires each time a LAST element comes through. The content of every LAST element is ordinary text, so the action on line 16 writes each last name onto the output stream followed by a newline.
The 'catch-all' rule on line 18 is fired for all elements which don't have explicit rules of their own. It is necessary due to the principle that all elements must have matching rules.
The output from this program is
Wallpaper
Jackson
Sukie
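The rules in this program cooperate in a simple way: an element uses its explicit rule where one exists and the 'element #implied' rule otherwise. You can sketch this in Python as a dictionary of handler functions with a fallback entry. This is an analogy only, not how OmniMark is implemented. The sketch assumes well-formed XML input and assumes that each element holds either text or child elements, never both. Its 'suppress' simply skips the element.

```python
import xml.etree.ElementTree as ET

PEOPLE_XML = """
<PEOPLE DATE="15 6 2000">
  <NAME TITLE="Mr"><FIRST>Wally</FIRST><LAST>Wallpaper</LAST></NAME>
  <NAME><LAST>Jackson</LAST></NAME>
  <NAME TITLE="Dr"><FIRST>Susan</FIRST><MIDDLE>Ramsay</MIDDLE><LAST>Sukie</LAST></NAME>
</PEOPLE>
"""


def fire(element, rules, out):
    """Run the rule for this element's name, or the '#implied' rule if there is none."""
    rules.get(element.tag, rules["#implied"])(element, rules, out)


def content(element, rules, out):
    """Rough counterpart of output "%c": emit text, or fire the children's rules in order."""
    if len(element) == 0:
        out.append(element.text or "")
    else:
        for child in element:
            fire(child, rules, out)


def suppress(element, rules, out):
    """Discard the element (this sketch does not visit its children at all)."""


def last(element, rules, out):
    content(element, rules, out)
    out.append("\n")


RULES = {"PEOPLE": content, "NAME": content, "LAST": last, "#implied": suppress}

out = []
fire(ET.fromstring(PEOPLE_XML.strip()), RULES, out)
print("".join(out), end="")
```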
What is not immediately apparent is that, after the rules for PEOPLE and NAME fire, the program is actually working inside the PEOPLE element and also inside one of the NAME elements. When the FIRST, MIDDLE and LAST elements have been processed, control returns to the current NAME. When all the NAME elements have been processed, control returns to the PEOPLE element. One way of understanding this is to realise that we are traversing a tree structure: we move down the tree as each element opens and back up to its parent as it closes. Another way is to see the processing as stack based: each time a new element starts it is pushed onto a stack, and when its content is exhausted it is popped off and we return to the parent element.
This stack or tree behaviour can be demonstrated by generating output before and after the content of each element is processed. The following program is a modification of the previous one.
[Code Sample: C04T03b.xom]
001 ; Output last names
002
003 process
004    do sgml-parse document
005       scan file "people.sgml"
006       output "%c"
007    done
008
009 element people
010    output "-Starting the PEOPLE element%n"
011    output "%c"
012    output "-Ending the PEOPLE element%n"
013
014 element name
015    output "--Starting a NAME element%n"
016    output "%c"
017    output "--Ending a NAME element%n%n"
018
019 element last
020    output "---Starting a LAST element%n"
021    output "%c%n"
022    output "---Ending a LAST element%n"
023
024 element #implied
025    suppress
for which the output is:
-Starting the PEOPLE element
--Starting a NAME element
---Starting a LAST element
Wallpaper
---Ending a LAST element
--Ending a NAME element

--Starting a NAME element
---Starting a LAST element
Jackson
---Ending a LAST element
--Ending a NAME element

--Starting a NAME element
---Starting a LAST element
Sukie
---Ending a LAST element
--Ending a NAME element

-Ending the PEOPLE element
It is a principle of SGML that elements in an instance form a tree and that any processing of them includes stack-based behaviour.
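You can make the stack view explicit with a Python sketch. This is an analogue, not OmniMark. It uses the standard library's XMLPullParser to receive 'start' and 'end' events for the same instance rewritten as well-formed XML. It pushes each element name onto a list when the element starts and pops it when the element ends, and the stack depth sets the number of dashes. Unlike the OmniMark program, it logs every element, including FIRST and MIDDLE.

```python
import xml.etree.ElementTree as ET

PEOPLE_XML = """
<PEOPLE DATE="15 6 2000">
  <NAME TITLE="Mr"><FIRST>Wally</FIRST><LAST>Wallpaper</LAST></NAME>
  <NAME><LAST>Jackson</LAST></NAME>
  <NAME TITLE="Dr"><FIRST>Susan</FIRST><MIDDLE>Ramsay</MIDDLE><LAST>Sukie</LAST></NAME>
</PEOPLE>
"""


def trace(xml_text):
    """Log the start and end of every element, using an explicit stack for the depth."""
    stack, log = [], []
    parser = ET.XMLPullParser(events=("start", "end"))
    parser.feed(xml_text)
    parser.close()  # end of input; queued events can still be read afterwards
    for event, element in parser.read_events():
        if event == "start":
            stack.append(element.tag)  # push: we are now inside this element
            log.append("-" * len(stack) + "Starting " + element.tag)
        else:
            log.append("-" * len(stack) + "Ending " + element.tag)
            stack.pop()  # pop: back to the parent element
    assert not stack, "every element that started has also ended"
    return log


print("\n".join(trace(PEOPLE_XML.strip())))
```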
Knowing the stack behaviour makes the positioning of output actions clear. The sample program below outputs a heading before the list of last names and the number of names after the list. Note carefully where the counting is done and where heading and total are output.
[Code Sample: C04T03c.xom]
001 ; Output last names with a
002 ; heading and total
003
004 process
005 do sgml-parse document
006 scan file "people.sgml"
007 output "%c"
008 done
009
010 global counter numNames initial {0}
011
012 element people
013 output "List of last names.%n"
014 output "%c"
015 output "There were %d(numNames) last names listed.%n"
016
017 element name
018 output "%c"
019
020 element last
021 increment numNames
022 output "%c%n"
023
024 element #implied
025 suppress
The output is:
List of last names.
Wallpaper
Jackson
Sukie
There were 3 last names listed.
Adding the output of people's full names is a simple modification and is presented here:
[Code Sample: C04T03d.xom]
001 ; Output full names with a
002 ; heading and total
003
004 process
005 do sgml-parse document
006 scan file "people.sgml"
007 output "%c"
008 done
009
010 global counter numNames initial {0}
011
012 element people
013 output "List of names.%n"
014 output "%c"
015 output "There were %d(numNames) names listed.%n"
016
017 element name
018 output "%c"
019
020 element last
021 increment numNames
022 output "%c%n"
023
024 element first
025 output "%c "
026
027 element middle
028 output "%c "
and outputs:
List of names.
Wally Wallpaper
Jackson
Susan Ramsay Sukie
There were 3 names listed.
Note that element rules don't have to be placed in any particular order in a program's source code, and they do not necessarily fire in the order in which they are written. Just as in pattern matching programs, it is a principle of OmniMark that the order in which element rules fire is determined solely by the order in which the elements arrive in the SGML input stream. Also note that an 'element #implied' rule is not needed in the above program, because an explicit rule is provided for every possible element in the document type. An implied rule is still allowed in any program.
Some of the interesting data in an SGML instance can be held in attributes and their values. OmniMark provides easy access to these by using the format modifier '%v' which, when used on an attribute's name, converts its value into a stream ready for output or further processing.
The following sample is another modification of earlier programs. It tries to output each person's last name prefixed by their title. The title information is held in the attribute TITLE in each NAME element's starttag. Note that this program contains an error because some NAME elements do not specify a TITLE attribute.
[Code Sample: C04T04a.xom]
001 ; Output last names with titles
002 ; NOTE: contains an error!
003
004 process
005    do sgml-parse document
006       scan file "people.sgml"
007       output "%c"
008    done
009
010 element people
011    output "%c"
012
013 element name
014    output "%v(title) "
015    output "%c"
016
017 element last
018    output "%c%n"
019
020 element #implied
021    suppress
The program attempts to output the value of the TITLE attribute of the NAME element on line 14, followed by a space, before each of the last names. When the program is run, the error message obtained is an OmniMark error rather than an SGML error, and reads
omnimark -- OmniMark Error 6037 on line 13 in file C04T04a.xom: Attempting to access #IMPLIED attribute value. For element 'NAME': For attribute 'TITLE'. There was 1 error detected.
What this means is that, on line 14, we are outputting the value of the attribute TITLE and that there are some NAME elements which don't have one specified. If you check line 5 of the DTD you will see that the TITLE attribute is marked as '#IMPLIED' which means that it is not compulsory. When an attribute is implied, the implication is that if it is not specified then the processing program should take responsibility. Since it is our program that is doing the processing, we have to take some action when the TITLE attribute is missing.
The following (correct) program does this by checking whether the attribute is available for each NAME. If there is a TITLE we output it; if not, we output no title at all.
[Code Sample: C04T04b.xom]
001 ; Output last names with titles
002 ; Note: some names don't have titles
003
004 process
005    do sgml-parse document
006       scan file "people.sgml"
007       output "%c"
008    done
009
010 element people
011    output "%c"
012
013 element name
014    output "%v(title) " when attribute title is specified
015    output "%c"
016
017 element last
018    output "%c%n"
019
020 element #implied
021    suppress
and the output is:
Mr Wallpaper
Jackson
Dr Sukie
It is a principle of SGML that attributes marked '#REQUIRED' in the DTD must be present and contain legal values (or else the instance won't parse), and that attributes marked '#IMPLIED' in the DTD need not be specified, in which case it is the responsibility of any processing application to deal with the situation appropriately. Note that the DATE attribute of the PEOPLE element is required, so an application does not need to check for its availability. If DATE were implied and not specified, an application would probably insert today's date as a replacement.
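In a general-purpose language the same responsibility shows up as an explicit check for a missing value. In the Python sketch below (well-formed XML assumed, as before), Element.get returns None when TITLE was not specified, and the program decides what to do about it. The people_date function is hypothetical. It shows what an application might do if DATE were #IMPLIED: substitute today's date in the same 'day month year' form.

```python
import datetime
import xml.etree.ElementTree as ET

PEOPLE_XML = """
<PEOPLE DATE="15 6 2000">
  <NAME TITLE="Mr"><FIRST>Wally</FIRST><LAST>Wallpaper</LAST></NAME>
  <NAME><LAST>Jackson</LAST></NAME>
  <NAME TITLE="Dr"><FIRST>Susan</FIRST><MIDDLE>Ramsay</MIDDLE><LAST>Sukie</LAST></NAME>
</PEOPLE>
"""


def titled_last_names(root):
    """One line per NAME: the title (if specified) followed by the last name."""
    lines = []
    for name in root.iter("NAME"):
        title = name.get("TITLE")  # None when the #IMPLIED attribute is absent
        last = name.findtext("LAST")
        lines.append(last if title is None else f"{title} {last}")
    return lines


def people_date(root):
    """Hypothetical fallback if DATE were #IMPLIED: use today's date when it is missing."""
    today = datetime.date.today()
    return root.get("DATE", f"{today.day} {today.month} {today.year}")


root = ET.fromstring(PEOPLE_XML.strip())
print("\n".join(titled_last_names(root)))
print(people_date(root))
```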
Tests can be easily applied to attribute values. The following program appends the string 'Ph. D' onto the names of people whose TITLE attribute is 'Dr'.
[Code Sample: C04T04c.xom]
001 ; Add Ph.D to Drs
002
003 global switch isDoctor
004
005 process
006    do sgml-parse document
007       scan file "people.sgml"
008       output "%c"
009    done
010
011 element people
012    output "%c"
013
014 element name
015    do when attribute title is specified
016       do when "%v(title)" = "Dr"
017          activate isDoctor
018       else
019          deactivate isDoctor
020       done
021       output "%v(title) "
022    else
023       deactivate isDoctor   ;; no title, so not a doctor
024    done
025    output "%c"
026
027 element last
028    output "%c"
029    output " Ph. D" when isDoctor
030    output "%n"
031
032 element #implied
033    suppress
The actual attribute test is done on line 16.
It is a principle of OmniMark programming that every element rule must deal with the content of the element.
Notice that every element rule in every program in this chapter has either an 'output "%c"' action or a 'suppress' action. OmniMark insists that every rule either processes its element's content or suppresses it.
In SGML, elements can be declared as having no content - these are 'empty' elements and are marked up with a starttag but with no endtag. Even though these elements have by definition no content, any element rule which processes them must still either process their content or suppress it.
Further, OmniMark allows an element rule to deal with the element's content only once. Having more than one 'output "%c"' action causes an error.
When the output of an element's content is dependent on some condition, we can easily write a selection statement which outputs the content in one branch and suppress the content in the other branch, like this:
001 element someelement
002    do when .....
003       output "%c"
004    else
005       suppress
006    done
If the condition is true, this element's content (all its child elements and data content) is processed. If the condition is false, the content is suppressed and nothing from the child elements or data content appears in the output.
The following program outputs two copies of the last name of each person whose title is "Mr". It tests the TITLE attribute to detect the value "Mr", and the content of those NAME elements is output twice. Since we are allowed to process the content only once, we save it into a stream variable and then output the stream twice.
[Code Sample: C04T05a.xom]
001 ; Two copies of last names for Mr
002
003 process
004    do sgml-parse document
005       scan file "people.sgml"
006       output "%c"
007    done
008
009 element people
010    output "%c"
011
012 element name
013    local stream theContent
014    set theContent to "%c"               ;; process once only
015
016    do when attribute title is specified
017       do when "%v(title)" = "Mr"
018          output "%g(theContent) - "    ;; output No. 1
019          output "%g(theContent)%n"     ;; output No. 2
020       done
021    done
022
023 element last
024    output "%c"
025
026 element #implied
027    suppress
Since element data content is usually just text, it is difficult to do tests on it as was done with attribute values. Often we need to scan the content to locate certain values in it or to extract parts of it. OmniMark provides a way to apply pattern matching rules to any stream within an element rule.
As a gratuitous example, suppose we want to output just the first three characters of each person's last name. We proceed by writing a program to output just last names and inside the element rule for each last name we include a 'do scan' to apply pattern matching to the content.
[Code Sample: C04T05b.xom]
001 ; First three chars of last names
002
003 process
004 do sgml-parse document
005 scan file "people.sgml"
006 output "%c"
007 done
008
009 element people
010 output "%c"
011
012 element name
013 output "%c"
014
015 element last
016 do scan "%c"
017 match any{3} => firstThree
018 output "%x(firstThree)%n"
019 done
020
021
022 element #implied
023 suppress
The entire content of the LAST element is processed on line 16, and the 'do scan' action submits it to its 'match' alternatives. A 'do scan' tries its match alternatives once only, starting from the first character of the stream. The first alternative that matches has its actions performed, and the rest of the stream is then ignored. If no alternative matches, none of the actions are performed.
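Python's re.match behaves in a similar way to this 'do scan': the pattern is tried once, anchored at the start of the string, and the rest of the string is ignored. The sketch below is an analogue only. The function name first_three is made up. re.DOTALL is used so that '.' matches any character, as OmniMark's 'any' does.

```python
import re


def first_three(text):
    """Return the first three characters of text, or None if there are fewer than three."""
    match = re.match(r".{3}", text, re.DOTALL)  # roughly: match any{3} => firstThree
    return match.group(0) if match else None


for last_name in ["Wallpaper", "Jackson", "Sukie", "Li"]:
    print(last_name, "->", first_three(last_name))
```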
To make use of the full power of the pattern matching find rules seen in Chapter 3 within an element rule, a 'repeat scan...again' structure can be used. As the content of the people instance does not contain any interesting data to scan, the program below uses the technique to scan for each of the separate numbers which make up the value of the DATE attribute of the PEOPLE element. Each of the three numbers is found by repeatedly scanning the date for sequences of digits, and a check is performed to ensure that exactly three numbers are present. If there are three numbers they are output with slashes between them; if not, a message is written onto the standard error stream.
[Code Sample: C04T05c.xom]
001 ; Check and output the date of the
002 ; PEOPLE element. Uses repeat-scan
003
004 process
005 do sgml-parse document
006 scan file "people.sgml"
007 output "%c"
008 done
009
010 element people
011 local counter check3 initial {0}
012 local stream formDate
013 open formDate as buffer ;; ie a stream buffer
014
015 repeat scan "%v(date)"
016 match digit+ => nextNum
017 put formDate nextNum ;; write number onto the buffer
018 increment check3
019 put formDate "/" when check3 < 3
020
021 match any
022 ; consume space
023 again
024
025 close formDate ;; close the buffer
026
027 do when check3 = 3
028 output "Date of People instance is %g(formDate)%n"
029 else
030 put #error "Date of People instance appears to be invalid: %g(formDate)%n"
031 done
032
033 suppress
034
035
036 element #implied
037 suppress
Lines 13, 17 and 25 show a technique of opening a stream variable as a buffer, which allows information to be appended to the stream easily with the 'put' action. The repeat scan loop runs from line 15 to line 23. Each time through the loop, either a sequence of digits is matched (line 16) and written to the buffer, or a single character of any other kind, such as a space, is matched and discarded (line 21). A repeat scan loop terminates when none of its match alternatives matches at the current position or when the input stream is exhausted, whichever comes first.
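The same check can be written in Python with re.findall. Like the repeat scan loop above, it collects every run of digits and skips everything else. This is a loose analogue: it joins all the numbers it finds with slashes, whereas the OmniMark program writes a slash only after the first two numbers. The function name format_date is an assumption of the sketch.

```python
import re
import sys


def format_date(date_value):
    """Return the date with slashes if it holds exactly three numbers, else report on stderr."""
    numbers = re.findall(r"[0-9]+", date_value)  # every 'digit+' run; other characters skipped
    formatted = "/".join(numbers)
    if len(numbers) == 3:
        return "Date of People instance is " + formatted
    print("Date of People instance appears to be invalid: " + formatted, file=sys.stderr)
    return None


print(format_date("15 6 2000"))
```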
Given an unfamiliar SGML instance to process, you could use these guidelines to help organise the processing.
Below is a tiny SGML DTD and an instance. Write an OmniMark program to parse the instance against the DTD, locate any SGML errors, fix them by editing the instance and parse again.
You should store the instance in a file called 'movie.sgml' and the DTD in a file called 'movie.dtd', and place these files together in the same directory as your OmniMark parsing program.
The DTD is
<!ELEMENT movie - - (name,length,rating)>
<!ELEMENT name - - (#PCDATA)>
<!ELEMENT length - - (#PCDATA)>
<!ATTLIST length units (min|hour) min>
<!ELEMENT rating - - (#PCDATA)>
An instance is
<!DOCTYPE movie SYSTEM "movie.dtd">
<MOVIE>
   <NAME>Gladiator</NAME>
   <RATING>MA</RATING>
   <LENGTH UNITS="min">195</LENGTH>
The following DTD is contained in a file called 'transact.dtd' and specifies records which hold data about various transactions done with a credit card.
<!ELEMENT transact - o (purchase|payment)*>
<!ELEMENT purchase - o (from, amount)>
<!ATTLIST purchase date NUMBERS #REQUIRED>
<!ELEMENT payment - o (amount)>
<!ATTLIST payment date NUMBERS #REQUIRED>
<!ELEMENT from - o (#PCDATA)>
<!ELEMENT amount - o (#PCDATA)>
Transactions are either purchases containing a vendor and an amount or payments just containing an amount. Each purchase and each payment contains a date as an attribute.
The sample below, in a file called 'transact.sgml', shows a conforming instance of the above DTD. Because the DTD allows endtags to be omitted ('- o'), none appear in the instance.
<!DOCTYPE transact SYSTEM "transact.dtd">
<TRANSACT>
<PURCHASE DATE="1 3 1999">
<FROM>Toyworld
<AMOUNT>13067
<PURCHASE DATE="23 3 1999">
<FROM>Gowings
<AMOUNT>9713
<PURCHASE DATE="4 4 1999">
<FROM>Coles Myer
<AMOUNT>10600
<PAYMENT DATE="18 4 1999">
<AMOUNT>25000
<PURCHASE DATE="21 7 1999">
<FROM>City Fit
<AMOUNT>34500
<PURCHASE DATE="30 8 1999">
<FROM>Frank's Auto Shop
<AMOUNT>1050
<PURCHASE DATE="17 9 1999">
<FROM>Mobil
<AMOUNT>3500
<PAYMENT DATE="18 9 1999">
<AMOUNT>40000
<PURCHASE DATE="30 10 1999">
<FROM>Travel Land
<AMOUNT>56265
Write an OmniMark program which outputs the number of purchases and the number of payments in the instance.
Write a program to report the total amount for purchases and the total amount for payments and the balance owing on the account. Assume a zero balance at the beginning of the transactions. All amounts are in cents.
It should be apparent from the DTD that an AMOUNT element can occur within a purchase and also within a payment. OmniMark can distinguish between these by using a test of the form '... when parent is ...'.
Write an OmniMark program to output, from the transaction instance, the names of the vendors (the FROM element) for purchases made after June 1999. Use a 'scan' of each purchase date to capture the month and year. If the date falls after month 6 of 1999, process the content of the purchase; otherwise suppress it. No information about payments is required in the output.
The usual way of viewing credit card information is via a statement. Below is the output of an OmniMark program which produces a simple statement from the 'transact.sgml' instance. It has a tab character between each column and thus could easily be imported or copied into a spreadsheet file.
Date        Transaction          Vendor              Amount   Balance
1 3 1999    Purchase             Toyworld            13067    13067
23 3 1999   Purchase             Gowings             9713     22780
4 4 1999    Purchase             Coles Myer          10600    33380
18 4 1999   Payment - ThankYou                       25000    8380
21 7 1999   Purchase             City Fit            34500    42880
30 8 1999   Purchase             Frank's Auto Shop   1050     43930
17 9 1999   Purchase             Mobil               3500     47430
18 9 1999   Payment - ThankYou                       40000    7430
30 10 1999  Purchase             Travel Land         56265    63695
Here the figure in the balance column of each row is obtained from the previous balance by adding the row's amount if it belongs to a purchase and subtracting it if it belongs to a payment.
Write the OmniMark program to produce this statement.
A conforming instance of the document type MOVIE is presented here:
<!DOCTYPE movie SYSTEM "movie.dtd">
<MOVIE>
   <NAME>Gladiator</NAME>
   <LENGTH UNITS="min">195</LENGTH>
   <RATING>MA</RATING>
</MOVIE>
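The errors in the original version were: the RATING element appeared before the LENGTH element, although the content model (name,length,rating) requires LENGTH to come first; and the MOVIE endtag was missing, although the '- -' in the declaration of movie makes the endtag compulsory.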
A minimal parsing program was presented in subtopic 4.2.1 in this chapter.
No data content is output by this program so all rules except the transaction rule use a 'suppress' action. Two global counters are used to count purchases and payments. The output is generated after all transactions have been processed.
[Code Sample: C04S02.xom]
001 ; Count purchases and payments
002
003 global counter numPurch initial {0}
004 global counter numPay initial {0}
005
006 process
007 do sgml-parse document
008 scan file "transact.sgml"
009 output "%c"
010 done
011
012 element transact
013 output "%c"
014 output "There were %d(numPurch) purchases.%n"
015 output "There were %d(numPay) payments.%n"
016
017 element purchase
018 increment numPurch
019 suppress
020
021 element payment
022 increment numPay
023 suppress
024
025 element #implied
026 suppress
001 ; Calculate the balance owing
002
003 global counter totalPurch initial {0}
004 global counter totalPay initial {0}
005 global counter balance
006
007 process
008 do sgml-parse document
009 scan file "transact.sgml"
010 output "%c"
011 done
012
013 element transact
014 output "%c"
015 output "Total of purchases: %d(totalPurch)%n"
016 output "Total of payments: %d(totalPay)%n"
017 set balance to totalPurch - totalPay
018 output "Balance is %d(balance)%n"
019
020
021
022 element purchase
023 output "%c"
024
025 element payment
026 output "%c"
027
028 element amount
029 local counter thisAmount
030 set thisAmount to "%c"
031 increment totalPurch by thisAmount when parent is purchase
032 increment totalPay by thisAmount when parent is payment
033
034
035 element #implied
036 suppress
The output of this program for the sample instance is
Total of purchases: 128695
Total of payments: 65000
Balance is 63695
The output from my sample solution to this task is:
Purchases after 6/1999
21 7 1999: City Fit
30 8 1999: Frank's Auto Shop
17 9 1999: Mobil
30 10 1999: Travel Land
Here is my program:
[Code Sample: C04S04.xom]
001 ; Purchases after June 1999
002
003 global counter numVendors initial {0}
004 global counter afterMonth initial {6}
005 global counter afterYear initial {1999}
006
007 process
008 do sgml-parse document
009 scan file "transact.sgml"
010 output "%c"
011 done
012
013 element transact
014 output "Purchases after %d(afterMonth)/%d(afterYear)%n"
015 output "%c"
016
017 element purchase
018 do scan "%v(date)"
019 match digit+ space+ digit+ => month space+ digit+ => year
020 do when year > afterYear OR (year = afterYear AND month > afterMonth)
021 output "%v(date): %c"
022 else
023 suppress
024 done
025 done
026
027 element from
028 output "%c%n"
029
030 element #implied
031 suppress
In this program I have placed the critical month (6) and year (1999) into global variables. Doing this makes it possible to set the values of these counter variables from the command line. For example, a command line such as
omnimark -sb program.xom -c afterMonth 7 -c afterYear 2000
would run the program with 7 as the critical month and 2000 as the critical year.
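The comparison at the heart of this solution is easy to get wrong. A test of the form 'month > 6 and year >= 1999' would reject a date such as 3 2000, which is certainly after June 1999. Comparing the year first, and the month only when the years are equal, avoids this. The Python sketch below does that with tuple comparison. The PURCHASES list is transcribed by hand from transact.sgml, and the helper name is_after is hypothetical.

```python
import re

# (date, vendor) for each PURCHASE in transact.sgml, transcribed by hand.
PURCHASES = [
    ("1 3 1999", "Toyworld"),
    ("23 3 1999", "Gowings"),
    ("4 4 1999", "Coles Myer"),
    ("21 7 1999", "City Fit"),
    ("30 8 1999", "Frank's Auto Shop"),
    ("17 9 1999", "Mobil"),
    ("30 10 1999", "Travel Land"),
]


def is_after(date_value, month, year):
    """True if a 'day month year' value falls in a later month than month/year."""
    parts = re.fullmatch(r"\s*(\d+)\s+(\d+)\s+(\d+)\s*", date_value).groups()
    _day, m, y = (int(n) for n in parts)
    return (y, m) > (year, month)  # compare years first, then months


for date_value, vendor in PURCHASES:
    if is_after(date_value, 6, 1999):
        print(f"{date_value}: {vendor}")
```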
This program produces the credit card statement shown in the task.
[Code Sample: C04S05.xom]
001 ; Statement style report
002
003 global counter balance initial {0}
004
005 process
006 do sgml-parse document
007 scan file "transact.sgml"
008 output "%c"
009 done
010
011 element transact
012 output "Statement Report%n"
013 output "----------------%n"
014 output "Date%tTransaction%tVendor%tAmount%tBalance%n"
015 output "%c"
016
017
018 element purchase
019 output "%v(date)%tPurchase"
020 output "%c"
021
022 element from
023 output "%t%c"
024
025 element payment
026 output "%v(date)%tPayment - ThankYou%t"
027 output "%c"
028
029 element amount
030 local counter thisAmount
031 set thisAmount to "%c"
032 increment balance by thisAmount when parent is purchase
033 decrement balance by thisAmount when parent is payment
034 output "%t%d(thisAmount)%t%d(balance)%n"
035
036
037 element #implied
038 suppress
039
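To cross-check the statement figures, you can recompute the running balance in a few lines of Python. The TRANSACTIONS list is transcribed by hand from transact.sgml (amounts in cents), and the function name statement_rows is an assumption of this sketch. The rows it builds are tab-separated like the OmniMark output, with the vendor column left empty for payments.

```python
# (date, kind, vendor, amount in cents), transcribed by hand from transact.sgml.
TRANSACTIONS = [
    ("1 3 1999", "purchase", "Toyworld", 13067),
    ("23 3 1999", "purchase", "Gowings", 9713),
    ("4 4 1999", "purchase", "Coles Myer", 10600),
    ("18 4 1999", "payment", None, 25000),
    ("21 7 1999", "purchase", "City Fit", 34500),
    ("30 8 1999", "purchase", "Frank's Auto Shop", 1050),
    ("17 9 1999", "purchase", "Mobil", 3500),
    ("18 9 1999", "payment", None, 40000),
    ("30 10 1999", "purchase", "Travel Land", 56265),
]


def statement_rows(transactions, opening_balance=0):
    """Build tab-separated statement rows: purchases add to the balance, payments subtract."""
    balance = opening_balance
    rows = []
    for date_value, kind, vendor, amount in transactions:
        if kind == "purchase":
            balance += amount
            label = "Purchase"
        else:
            balance -= amount
            label = "Payment - ThankYou"
        rows.append("\t".join([date_value, label, vendor or "", str(amount), str(balance)]))
    return rows


print("Date\tTransaction\tVendor\tAmount\tBalance")
print("\n".join(statement_rows(TRANSACTIONS)))
```

The final balance of 63695 agrees with the 'Balance is 63695' figure reported by the earlier balance program.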