OmniMark started life as a program called XTran in the mid 1980s, owned and supported by a company called Exoterica in Canada.
It provided strong pattern-matching rules which could be used to extract data from arbitrary files and write SGML output.
Later, Exoterica changed its name to OmniMark, and a free cut-down version of the language, known as 'OmniMark Lite', was released so that more users could become aware of the product.
In 1999, OmniMark was released as a completely free and fully functional programming language in order to gain popular support and use in the Information Technology community. Version 5 of the language introduced an Integrated Development Environment (IDE) for PCs running Windows. The IDE provides a front end for OmniMark and includes project management and a fully featured debugger. The command-line version, called 'OmniMark C/VM' (Compiler/Virtual Machine), remains completely free, and a free 10-day trial of the IDE is available. A full, unlimited version of the IDE can be purchased from the OmniMark corporation.
Version increments since then have added many function libraries, which make OmniMark a truly general-purpose programming language. The SGML parser now also includes a validating XML parser. The language is still evolving, and the technical staff and policy makers involved are keen to see new features included as they are needed.
Before dealing with some of the features and general principles of the OmniMark language in the coming chapters, I present here a snippet of code. If you already have programming or scripting experience, even if you have never seen OmniMark code before, you will be able to see what is going on here.
[Code Sample: C01T03a.xom]
process
   local counter num initial {1}
   output "Hello%n"
   repeat
      output "Hello again%n"
      increment num
      exit when num = 10
   again
OmniMark has an edge over other programming languages for any task which involves pattern matching and SGML/XML processing. Even without these special qualities, it can be used effectively for general tasks associated with processing streams of data from files, sockets and similar sources.
OmniMark is quick to write. The syntax is mostly easy and natural to read and write, it has good error catching and good error messages, and it runs very quickly. It is the quickest way to solve many day-to-day data processing tasks. The following topics provide a quick overview of its use and some very small fragments of code with which you can gain a basic familiarity with the language.
To extract the right information from a stream or file requires that the data be searched. We can find the first occurrence of some string of characters in a text file with word processing or text editor software but we have to sequentially click a 'Find Next' button each time we want to locate the next matching string. OmniMark's find rules allow text (or binary data) to be located in a similar way but we can search for and capture all the occurrences of any pattern which matches various character classes.
Once data is matched, it can be extracted, manipulated, written out, used for calculations, counted and reformatted in a very flexible way. OmniMark's pattern matching strength is second to none and does not require the programmer to remember or type terse pattern syntax. It can also be used to lift legacy data (e.g. old documents) up into SGML or XML.
As an example, suppose a text file called 'timetable.dat' contains the complete timetable for a large university. A tiny fragment of the file is shown below. The actual file is very large and covers several hundred subjects taught in several hundred rooms throughout any academic week.
EEB121 THE E/C PROFESSION: AN INTRO
Subject co-ordinator: L. Harrison
L  Mon 1300 - 1350 S15 - 2.05
T1 Wed 1400 - 1450 C02 - 112
T2 Wed 1300 - 1350 C02 - 112
T2 Thu 1300 - 1350 S01 - 102
T1 Thu 0900 - 0950 S01 - 101
T3 Thu 1000 - 1050 S01 - 101
T3 Thu 1400 - 1450 C03 - 403
EEB322 ISSUES IN CARE & EDUCATION
Subject co-ordinator: T. Simpson
L  Tue 0900 - 0950 S01 - 102
T1 Tue 1100 - 1250 C08 - 1.04
T2 Tue 1400 - 1550 C08 - 1.04
A list of all the times a particular room (say S01-102) is used might be needed. Finding this information manually is difficult because the whole timetable is sorted by subject, not by room. To find, collect and display the list of times, we need to find all occurrences of the sequence 'S01 - 102' in the file and output the day and time information for these occurrences. By inspection we can identify a pattern which can be used to design the search: each class entry starts a line with a two-character field, followed by a three-letter day, a four-digit start time, a hyphen, a four-digit end time and finally the room.
An OmniMark find rule to locate and capture the day and time information might be:
[Code Sample: C01T05a.xom]
001 process
002 submit file "timetable.dat"
003
004 find line-start any{2} white-space+
005 (letter{3} white-space+
006 digit{4} white-space+
007 "-"
008 white-space+
009 digit{4}) => dayAndTime
010 white-space+ "S01 - 102"
011 output "%x(dayAndTime)%n"
012
013 find any
014
Here the 'find any' rule (on line 13) consumes all characters not found by the first find rule so that the only output is that delivered by the statement on line 11; that is, all the days and times used for room S01-102.
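For readers who know a general-purpose scripting language better than OmniMark, the same search can be sketched with a Python regular expression. This is only a rough analogue of the find rule above, not OmniMark code. The sample data, its column spacing and the names TIMETABLE, ROOM_PATTERN and room_times are assumptions made for illustration.

```python
import re

# A fragment of the timetable from the text. The exact spacing of the
# columns is an assumption; the pattern below tolerates any run of spaces.
TIMETABLE = """\
EEB121 THE E/C PROFESSION: AN INTRO
Subject co-ordinator: L. Harrison
L  Mon 1300 - 1350 S15 - 2.05
T1 Wed 1400 - 1450 C02 - 112
T2 Wed 1300 - 1350 C02 - 112
T2 Thu 1300 - 1350 S01 - 102
T1 Thu 0900 - 0950 S01 - 101
T3 Thu 1000 - 1050 S01 - 101
T3 Thu 1400 - 1450 C03 - 403
EEB322 ISSUES IN CARE & EDUCATION
Subject co-ordinator: T. Simpson
L  Tue 0900 - 0950 S01 - 102
T1 Tue 1100 - 1250 C08 - 1.04
T2 Tue 1400 - 1550 C08 - 1.04
"""

# Mirrors the shape of the find rule: two characters at the start of a line,
# then the day and time range (captured), then the room of interest.
ROOM_PATTERN = re.compile(
    r"^.{2}\s+"                             # line start, any two characters
    r"([A-Za-z]{3}\s+\d{4}\s+-\s+\d{4})"    # day, start time, hyphen, end time
    r"\s+S01 - 102",                        # the room being searched for
    re.MULTILINE,
)


def room_times(text):
    """Return the captured day-and-time string for every matching line."""
    return ROOM_PATTERN.findall(text)


for slot in room_times(TIMETABLE):
    print(slot)
```

Unlike the OmniMark program, no counterpart to the 'find any' rule is needed here, because findall simply ignores text that does not match. The terse regular expression also shows the kind of pattern syntax that OmniMark's keyword-based patterns avoid.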
OmniMark has a built-in, industrial-strength SGML and XML parser. This means it can be used to check whether SGML documents conform to the rules defined in a particular Document Type Definition (DTD). During parsing, any non-conforming markup in an SGML or XML file is located, and comprehensive error messages are issued by OmniMark.
As well as parsing, OmniMark allows any SGML or XML document to be translated into any other arbitrary format. A fragment of XML is shown below. It contains a group of people:
<!DOCTYPE PEOPLE SYSTEM "people.dtd">
<PEOPLE>
<NAME>Mary Smith</NAME>
<CITY PCODE="2795">Bathurst</CITY>
<COUNTRY>Australia</COUNTRY>
<NAME>Wally Wallpaper</NAME>
<CITY PCODE="2222">Hurstville</CITY>
<COUNTRY>Australia</COUNTRY>
<NAME>Sam Widge</NAME>
<CITY PCODE="1234">Bangalore</CITY>
<COUNTRY>India</COUNTRY>
</PEOPLE>
An OmniMark program containing element rules can be written to process this XML. As a trivial example, the following rules output all the people's names and postcodes. Each name and corresponding postcode is placed on a separate line and a tab character is inserted between the name and the postcode. The output file is thus a tab-delimited file which could easily be imported into a spreadsheet.
[Code Sample: C01T06a.xom]
process
   do xml-parse document
      scan file "people.xml"
      output "%c"
   done

element people
   output "%c"

element name
   output "%c"
   output "%t"

element country
   suppress

element city
   output "%v(pcode)%n"
   suppress
With this kind of process, the XML (or SGML) data is streamed into OmniMark and is parsed against a DTD. Then each element is fed to the program. As the program sees each element, one of the element rules is fired and does the appropriate work with the element's content and/or attributes. Even without too much previous knowledge of SGML, XML or OmniMark the program should be reasonably easy to follow; the symbol %c is a reference to the content of each element and the %v symbol is a reference to the value of an attribute. The statement 'suppress' avoids firing rules for the content of an element.
Note that the programmer does not need to worry about low level details like finding angle brackets, element names or attributes in the raw data - OmniMark handles all of this and leaves the programmer with the high-level task of doing something with the information.
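To make the intended output concrete, here is a rough Python sketch that produces the same tab-delimited lines using the standard xml.etree.ElementTree module. It is an analogue of the element rules, not a translation of them: it loads the whole document into a tree rather than firing rules as elements stream past, and it does not validate against a DTD (the DOCTYPE line is left out of the sample for that reason). The names PEOPLE_XML and names_and_postcodes are hypothetical.

```python
import xml.etree.ElementTree as ET

# The sample document from the text, without the DOCTYPE declaration,
# since this sketch performs no DTD validation.
PEOPLE_XML = """\
<PEOPLE>
<NAME>Mary Smith</NAME>
<CITY PCODE="2795">Bathurst</CITY>
<COUNTRY>Australia</COUNTRY>
<NAME>Wally Wallpaper</NAME>
<CITY PCODE="2222">Hurstville</CITY>
<COUNTRY>Australia</COUNTRY>
<NAME>Sam Widge</NAME>
<CITY PCODE="1234">Bangalore</CITY>
<COUNTRY>India</COUNTRY>
</PEOPLE>
"""


def names_and_postcodes(xml_text):
    """Return one 'name<TAB>postcode' string per person in the document."""
    root = ET.fromstring(xml_text)
    rows = []
    name = None
    for child in root:
        if child.tag == "NAME":
            # Remember the name until the matching CITY element arrives.
            name = child.text
        elif child.tag == "CITY":
            # Use the PCODE attribute and ignore the city's text content.
            rows.append(f"{name}\t{child.get('PCODE')}")
        # COUNTRY elements are skipped, like 'suppress' in the element rule.
    return rows


print("\n".join(names_and_postcodes(PEOPLE_XML)))
```

The sketch has to track the current name by hand while walking the children in order. In the OmniMark version that bookkeeping is implicit, because output is written as each element rule fires.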
Since the launch of OmniMark version 5, a comprehensive suite of function libraries has been available for use in programs. These libraries allow the language to be used for Common Gateway Interface programs (for use with web servers), ODBC programs (to interact with database systems), email, FTP and HTTP functions (for client or server applications across the net), and many more.
This booklet does not cover all the details of all these libraries, only the fundamental principles of OmniMark constructs and applications (see Chapter 5, Topic 6). However, a reasonable understanding of these principles should allow you to easily use any of the function libraries as needed.
OmniMark is available as a free download from www.omnimark.com.
The Windows (95/98/NT) version comes as a self-installing distribution file so all you have to do is download it, double click on it and follow the default prompts during the installation process.
There are two forms of the language available for Windows: the IDE (Integrated Development Environment) version, which provides an editor, debugger and project manager; and the OmniMark C/VM version, which provides just an executable MSDOS application.
Beginners will probably be most comfortable using the IDE, certainly as a starting point. The C/VM (Compiler/Virtual Machine) version can be used for more serious applications. The C/VM version (an executable called omnimark.exe) will run all the programs developed in the IDE, but it requires that you specify command line arguments for input files and other options.
The UNIX and Linux versions come as a .tar.Z distribution file. Once the file is uncompressed and untarred, the installation is complete. Most users like to copy the binary executable file (omnimark) into a standard location for binaries on their file system, say /local/bin, where it can be called from the command line.
Use your web browser to visit OmniMark's web site at www.omnimark.com and browse through the site. It contains several sections with information about new products, new features, events, feedback and policies.
Download OmniMark for your system and install it. On Windows systems the IDE is a recommended starting point. You can also download the C/VM version if you wish and set your MSDOS path to point to it.
For UNIX-style systems, download the appropriate distribution file for your local system. If you cannot install the package yourself, ask your system administrator to help you with the installation and to help you set the PATH environment variable to include the location of the binary executable file omnimark.