<Chapter Label=
"HowEnter">
<Heading>How To
Type a &GAPDoc; Document</Heading>
In this chapter we give a more formal description of what you need to start
typing documentation in &GAPDoc; XML format. Many details were already
explained by example in Section <Ref Sect="sec:3k+1expl"/> of the
introduction.<P/>
We do <E>not</E> answer the question <Q>How to <E>write</E> a &GAPDoc;
document?</Q> in this chapter. You can (hopefully) find an answer to
this question by studying the example in the introduction, see <Ref
Sect=
"sec:3k+1expl"/>, and learning about more details in the reference
Chapter <Ref Chap=
"DTD" />.<P/>
The definitive source for all details of the official XML standard with
useful annotations is:<P/>
<URL>https://www.xml.com/axml/axml.html</URL><P/>
Although this document is necessarily quite technical, it is surprisingly
readable.<P/>
<Section Label=
"EnterXML">
<Heading>General
XML Syntax</Heading>
We will now discuss the pieces of text which can occur in a general
XML
document. We start with those pieces which do not contribute to the actual
content of the document.
<Subsection Label=
"XMLhead">
<Heading>Head of
XML Document</Heading>
Each
XML document should have a head which states that it is an
XML document
in some
encoding and which XML-defined
language is used. In case of a
&GAPDoc; document this should always look as in the following example.
<Listing Type="Example">
<![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Book SYSTEM "gapdoc.dtd">]]>
</Listing>
See <Ref Subsect="XMLenc"/> for a remark on the <Q>encoding</Q>
statement.<P/>
(There may be local entity definitions inside the <C>DOCTYPE</C> statement,
see Subsection <Ref Subsect="GDent" /> below.)
</Subsection>
<Subsection Label=
"XMLcomment">
<Heading>Comments</Heading>
A <Q>comment</Q> in XML starts with the character sequence
<Q><C>&lt;!--</C></Q> and ends with the sequence <Q><C>--&gt;</C></Q>. Between
these sequences there must not be two adjacent dashes <Q><C>--</C></Q>.
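<P/>
As a small sketch of these rules (the comment texts are invented for
illustration), the first comment below is well formed, while the second
is not because of the two adjacent dashes in its interior:
<Listing Type="Example">
<![CDATA[<!-- This remark is ignored by the parser. -->
<!-- Not allowed: a -- b -->]]>
</Listing>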
</Subsection>
<Subsection Label=
"XMLprocinstr">
<Heading>Processing Instructions</Heading>
A <Q>processing instruction</Q> in XML starts with the character sequence
<Q><C>&lt;?</C></Q> followed by a name (<Q><C>xml</C></Q> is only allowed
at the very beginning of the document to declare that it is an XML document,
see <Ref Subsect="XMLhead"/>). After that any characters may follow, except
that the ending sequence <Q><C>?&gt;</C></Q> must not occur within the
processing instruction.
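<P/>
A minimal sketch of a processing instruction follows. The name
<C>myTool</C> and the text after it are hypothetical and carry no meaning
for &GAPDoc;:
<Listing Type="Example">
<![CDATA[<?myTool some instruction text for that tool?>]]>
</Listing>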
</Subsection>
<P/>
And now we turn to those parts of the document which contribute to its
actual content.
<Subsection Label=
"XMLnames">
<Heading>Names in
XML and Whitespace</Heading>
A <Q>name</Q> in
XML (used for
element and attribute identifiers, see below)
must start with a letter (in the
encoding of the document) or with a
colon <Q><C>:</C></Q> or underscore <Q><C>_</C></Q> character. The
following characters may also be digits, dots <Q><C>.</C></Q> or dashes
<Q><C>-</C></Q>.<P/>
This is a simplified description of the rules in the standard, which
involve many Unicode ranges to specify what a <Q>letter</Q> is.<P/>
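For illustration, and only with respect to the simplified rules above,
here are some strings that are names and some that are not (all strings
are made-up examples):
<Listing Type="Example">
<![CDATA[Allowed as names:      Book   sec:intro   _tmp   item-2.a
Not allowed as names:  2ndItem   -start   .hidden   my name]]>
</Listing>
<P/>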
Sequences consisting only of the following characters are considered
<E>whitespace</E>: blanks, tabs, carriage return characters and new line
characters.
</Subsection>
<Subsection Label=
"XMLel">
<Heading>Elements</Heading>
The actual content of an XML document consists of <Q>elements</Q>.
An element has some <Q>content</Q> with a leading <Q>start tag</Q>
(<Ref Subsect="XMLstarttag"/>) and a trailing <Q>end tag</Q> (<Ref
Subsect="XMLendtag"/>). The content can contain further elements but they
must be properly nested. One can define elements whose content is always
empty; such elements can also be entered with a single combined tag (<Ref
Subsect="XMLcombtag"/>).
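<P/>
A small sketch of proper nesting, using element names that appear
elsewhere in this manual (the text content is invented): the inner
element is closed before the outer one.
<Listing Type="Example">
<![CDATA[<Q>some <E>emphasized</E> words inside a quotation</Q>]]>
</Listing>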
</Subsection>
<Subsection Label=
"XMLstarttag">
<Heading>Start Tags</Heading>
A <Q>start tag</Q> consists of a less-than-character <Q><C>&lt;</C></Q>
directly followed (without whitespace) by an element name (see <Ref
Subsect="XMLnames"/>), optional attributes, optional whitespace, and a
greater-than-character <Q><C>&gt;</C></Q>.<P/>
An <Q>attribute</Q> consists of some whitespace and then its name
followed by an equal sign <Q><C>=</C></Q> which is optionally enclosed by
whitespace, and the attribute value, which is enclosed either in single
or double quotes. The attribute value may not contain the type of
quote used as a delimiter or the character <Q><C>&lt;</C></Q>; the
character <Q><C>&amp;</C></Q> may only appear to start an entity,
see <Ref Subsect="XMLent"/>. We describe in <Ref Subsect="AttrValRules"/> how
to enter special characters in attribute values.<P/>
Note especially that no whitespace is allowed between the starting
<Q><C>&lt;</C></Q> character and the element name. The quotes around an
attribute value cannot be omitted. The names of elements and attributes are
<E>case sensitive</E>.
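<P/>
The following lines sketch these rules with a hypothetical label value
<C>MyLabel</C>. The first two start tags are well formed, the last three
are not (the remarks in parentheses are explanations, not part of the
tags):
<Listing Type="Example">
<![CDATA[<Section Label="MyLabel">
<Section  Label = 'MyLabel' >
< Section Label="MyLabel">     (whitespace after the less-than-character)
<Section Label=MyLabel>        (quotes around the value are missing)
<Section Label="a<b">          (less-than-character inside the value)]]>
</Listing>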
</Subsection>
<Subsection Label=
"XMLendtag">
<Heading>End Tags</Heading>
An <Q>end tag</Q> consists of the two characters <Q><C>&lt;/</C></Q>
directly followed by the element name, optional whitespace and a
greater-than-character <Q><C>&gt;</C></Q>.
</Subsection>
<Subsection Label=
"XMLcombtag">
<Heading>Combined Tags for Empty Elements</Heading>
Elements which always have empty content can be written with a single
tag. This looks like a start tag (see <Ref Subsect="XMLstarttag"/>)
<E>except</E> that the trailing greater-than-character <Q><C>&gt;</C></Q>
is replaced by the two-character sequence <Q><C>/&gt;</C></Q>.
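<P/>
For instance, assuming an element declared with empty content, such as
the paragraph break <C>P</C> used throughout this manual, the following
two lines are equivalent ways of writing it:
<Listing Type="Example">
<![CDATA[<P/>
<P></P>]]>
</Listing>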
</Subsection>
<Subsection Label=
"XMLent">
<Heading>Entities</Heading>
An <Q>entity</Q> in XML is a macro for some substitution text. There are two
types of entities. <P/>
A <Q>character entity</Q> can be used to specify characters in the encoding
of the document (can be useful for entering non-ASCII characters which you
cannot manage to type in directly). They are entered with a sequence
<Q><C>&amp;#</C></Q>, directly followed by either some decimal digits
or an <Q><C>x</C></Q> and some hexadecimal digits, directly followed by a
semicolon <Q><C>;</C></Q>. Using such a character entity is equivalent
to typing the corresponding character directly.<P/>
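As a brief illustration, both of the following character entities stand
for the same character, the letter a with diaeresis (Unicode code point
228, which is E4 in hexadecimal notation):
<Listing Type="Example">
<![CDATA[&#228;
&#xE4;]]>
</Listing>
<P/>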
Then there are references to <Q>named entities</Q>. They are entered with an
ampersand character <Q><C>&amp;</C></Q> directly followed by a name which
is directly followed by a semicolon <Q><C>;</C></Q>. Such entities must be
declared somewhere by giving a substitution text. This text is included in
the document and the document is parsed again afterwards. The exact rules
are a bit subtle but you probably want to use this only in simple cases.
Predefined entities for &GAPDoc; are described in <Ref Subsect="XMLspchar"/>
and <Ref Subsect="GDent"/>.<P/>
</Subsection>
<Subsection Label=
"XMLspchar">
<Heading>Special Characters in
XML</Heading>
We have seen that the less-than-character <Q><C>&lt;</C></Q> and the
ampersand character <Q><C>&amp;</C></Q> start a tag or entity reference in
XML. To get these characters into the document text one has to use
entity references, namely <Q><C>&amp;lt;</C></Q> to get <Q><C>&lt;</C></Q>
and <Q><C>&amp;amp;</C></Q> to get <Q><C>&amp;</C></Q>. Furthermore
<Q><C>&amp;gt;</C></Q> must be used to get <Q><C>&gt;</C></Q> when the string
<Q><C>]]&gt;</C></Q> appears in element content (and not as delimiter of a
<C>CDATA</C> section explained below).<P/>
Another possibility is to use a <C>CDATA</C> statement explained
in <Ref Subsect="XMLcdata"/>.
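<P/>
As a sketch, the text <Q>a &lt; b &amp; c</Q> (an invented example) would
be typed in the document source as follows:
<Listing Type="Example">
<![CDATA[a &lt; b &amp; c]]>
</Listing>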
</Subsection>
<Subsection Label=
"AttrValRules">
<Heading>Rules for Attribute Values</Heading>
Attribute values can contain entities which are substituted recursively.
But except via the entity <C>&amp;lt;</C> or a character entity, the
substitution must not introduce a <C>&lt;</C> character (there is
no XML parsing for evaluating the attribute value, just entity
substitutions).
</Subsection>
<Subsection Label=
"XMLcdata">
<Heading><C>CDATA</C></Heading>
Pieces of text which contain many characters which can be
misinterpreted as markup can be enclosed by the character sequences
<Q><C>&lt;![CDATA[</C></Q> and <Q><C>]]&gt;</C></Q>. Everything
between these sequences is considered as content of the document and is not
further interpreted as XML text. All the rules explained so far in this
section do <E>not apply</E> to such a part of the document. The only
document content which cannot be entered directly inside a <C>CDATA</C>
statement is the sequence <Q><C>]]&gt;</C></Q>. This can be entered as
<Q><C>]]&amp;gt;</C></Q> outside the <C>CDATA</C> statement.
<Listing Type="Example">
A nesting of tags like <![CDATA[<a> <b> </a> </b>]]> is not allowed.
</Listing>
</Subsection>
<Subsection Label=
"XMLenc">
<Heading>Encoding of an XML Document</Heading>
We suggest using the UTF-8 encoding for writing &GAPDoc; XML documents.
But the tools described in Chapter <Ref Chap="ch:conv" /> also work
with ASCII or the various ISO-8859-X encodings (ISO-8859-1 is also
called latin1 and covers most special characters for western European
languages).
</Subsection>
<Subsection Label=
"XMLvalid">
<Heading>Well Formed and Valid
XML Documents</Heading>
We want to mention two further important words which are often used in the
context of
XML documents. A piece of text becomes a <Q>well formed</Q>
XML
document if all the formal rules described in this section are fulfilled.
<P/>
But this says nothing about the content of the document. To give
this content a meaning one needs a declaration of the element and
corresponding attribute names as well as of the named entities which are
allowed. Furthermore there may be restrictions on how such elements can be
nested. This <E>definition of an XML based markup language</E> is done in a
<Q>document type definition</Q>. An XML document which contains only
elements and entities declared in such a document type definition and obeys
the rules given there is called <Q>valid (with respect to this document
type definition)</Q>.<P/>
The main file of the &GAPDoc; package is <F>gapdoc.dtd</F>. This contains
such a definition of a markup
language. We are not going to explain the
formal syntax rules for document
type definitions in this section. But in
Chapter <Ref Chap=
"DTD"/> we will explain enough about it to understand
the file <F>gapdoc.dtd</F> and so the markup
language defined there.
</Subsection>
</Section>
<Section Label=
"EnterGD">
<Heading>Entering &GAPDoc; Documents</Heading>
Here are some additional rules for writing &GAPDoc;
XML documents.
<Subsection Label=
"otherspecchar">
<Heading>Other special characters</Heading>
As &GAPDoc; documents are used to produce &LaTeX; and HTML
documents, the question arises how to deal with characters with a
special meaning for other applications (for example
<Q><C>&amp;</C></Q>,
<Q><C>#</C></Q>,
<Q><C>$</C></Q>,
<Q><C>%</C></Q>,
<Q><C>~</C></Q>,
<Q><C>\</C></Q>,
<Q><C>{</C></Q>,
<Q><C>}</C></Q>,
<Q><C>_</C></Q>,
<Q><C>^</C></Q>,
<Q><C>&nbsp;</C></Q> (this is a non-breakable space,
<Q><C>~</C></Q> in &LaTeX;) have a special meaning for &LaTeX; and
<Q><C>&amp;</C></Q>,
<Q><C>&lt;</C></Q>,
<Q><C>&gt;</C></Q> have a special meaning for HTML (and XML)).
In &GAPDoc; you can usually just type these characters directly; it is
the task of the converter programs which translate to some output format
to take care of such special characters. The exceptions to this simple
rule are:
<List >
<Item>
&amp; and &lt; must be entered as <C>&amp;amp;</C> and
<C>&amp;lt;</C> as explained in <Ref Subsect="XMLspchar"/>.
</Item>
<Item>The content of the &GAPDoc; elements <C>&lt;M&gt;</C>,
<C>&lt;Math&gt;</C> and <C>&lt;Display&gt;</C> is &LaTeX; code,
see <Ref Sect="MathForm"/>.</Item>
<Item>The content of an <C>&lt;Alt&gt;</C> element with <C>Only</C>
attribute contains code for the specified output type, see
<Ref Subsect="Alt"/>.</Item>
</List>
Remark: In former versions of &GAPDoc; one had to use particular
entities for all the special characters mentioned above
(<C>&amp;tamp;</C>, <C>&amp;hash;</C>,
<C>&amp;dollar;</C>, <C>&amp;percent;</C>, <C>&amp;tilde;</C>,
<C>&amp;bslash;</C>, <C>&amp;obrace;</C>, <C>&amp;cbrace;</C>,
<C>&amp;uscore;</C>, <C>&amp;circum;</C>, <C>&amp;tlt;</C>, <C>&amp;tgt;</C>).
These are no longer needed, but they are still defined for backwards
compatibility with older &GAPDoc; documents.
</Subsection>
<Subsection Label=
"GDformulae">
<Heading>Mathematical Formulae</Heading>
Mathematical formulae in &GAPDoc; are typed as in &LaTeX;. They must be
the content of one of three types of &GAPDoc; elements concerned with
mathematical formulae: <Q><C>Math</C></Q>, <Q><C>Display</C></Q>, and
<Q><C>M</C></Q> (see Sections <Ref Subsect="Math"/> and <Ref
Subsect="M"/> for more details). The first two correspond to &LaTeX;'s
math mode and display math mode. The last one is a special form of the
<Q><C>Math</C></Q> element type that imposes certain restrictions on
the content. On the other hand the content of an <Q><C>M</C></Q> element
is processed in a well defined way for text terminal or HTML output. The
<Q><C>Display</C></Q> element also has an attribute such that its
content is processed as in <Q><C>M</C></Q> elements.<P/>
Note that the content of these elements is &LaTeX; code, but
the special characters
<Q><C>&lt;</C></Q> and <Q><C>&amp;</C></Q> for XML must be entered via
the entities described in <Ref Subsect="XMLspchar"/> or by using a
<C>CDATA</C> statement, see <Ref Subsect="XMLcdata"/>.<P/>
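A short sketch of both ways of entering the less-than-character in a
formula (the formulae themselves are invented examples): the first line
uses an entity reference, the second a <C>CDATA</C> statement.
<Listing Type="Example">
&lt;M&gt;a &amp;lt; b&lt;/M&gt;
&lt;Math&gt;&lt;![CDATA[a &lt; b]]&gt;&lt;/Math&gt;
</Listing>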
</Subsection>
<Subsection Label=
"GDent">
<Heading>More Entities</Heading>
In &GAPDoc; there are some more predefined entities:
<Table Align="|l|l|">
<Caption>Predefined Entities in the &GAPDoc; system</Caption>
<HorLine/>
<Row> <Item><C>&amp;GAP;</C></Item> <Item>&GAP;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;GAPDoc;</C></Item> <Item>&GAPDoc;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;TeX;</C></Item> <Item>&TeX;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;LaTeX;</C></Item> <Item>&LaTeX;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;BibTeX;</C></Item> <Item>&BibTeX;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;MeatAxe;</C></Item> <Item>&MeatAxe;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;XGAP;</C></Item> <Item>&XGAP;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;copyright;</C></Item> <Item>&copyright;</Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;nbsp;</C></Item> <Item><Q>&nbsp;</Q></Item> </Row>
<HorLine/>
<Row> <Item><C>&amp;ndash;</C></Item> <Item>&ndash;</Item> </Row>
<HorLine/>
</Table>
Here <C>&amp;nbsp;</C> is a non-breakable space character.
<P/>
Additional
entities are defined for some mathematical symbols, see <Ref
Sect=
"MathForm"/> for more details.
<P/>
One can define further local
entities right inside the head (see <Ref
Subsect=
"XMLhead"/>) of a &GAPDoc;
XML document as in the following example.
<Listing Type="Example">
<![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Book SYSTEM "gapdoc.dtd"
  [ <!ENTITY MyEntity "some longish text possibly with markup">
  ]>]]>
</Listing>
These additional definitions go into the <C>&lt;!DOCTYPE</C> tag in square
brackets. Such new entities are used like this: <C>&amp;MyEntity;</C> <P/>
</Subsection>
</Section>
</Chapter>