<Chapter Label="HowEnter">
<Heading>How To Type a &GAPDoc; Document</Heading>
In this chapter we give a more formal description of what you need to know
to start typing documentation in &GAPDoc; XML format. Many details were already
explained by example in Section <Ref Sect="sec:3k+1expl"/> of the
introduction.<P/>
We do <E>not</E> answer the question <Q>How to <E>write</E> a &GAPDoc;
document?</Q> in this chapter. You can (hopefully) find an answer to
this question by studying the example in the introduction, see <Ref
Sect="sec:3k+1expl"/>, and learning about more details in the reference
Chapter <Ref Chap="DTD" />.<P/>
The definitive source for all details of the official XML standard, with useful
annotations, is:<P/>
Although this document is necessarily quite technical, it is surprisingly
readable.<P/>
<Section Label="EnterXML">
<Heading>General XML Syntax</Heading>
We will now discuss the pieces of text which can occur in a general XML
document. We start with those pieces which do not contribute to the actual
content of the document.
<Subsection Label="XMLhead">
<Heading>Head of XML Document</Heading>
Each XML document should have a head which states that it is an XML document
in some encoding and which XML-defined language is used. In the case of a
&GAPDoc; document this should always look like the following example.
<Listing Type="Example">
<![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Book SYSTEM "gapdoc.dtd">]]>
</Listing>
See <Ref Subsect="XMLenc"/> for a remark on the <Q>encoding</Q>
statement.<P/>
(There may be local entity definitions inside the <C>DOCTYPE</C> statement,
see Subsection <Ref Subsect="GDent" /> below.)
</Subsection>
<Subsection>
<Heading>Comments</Heading>
A <Q>comment</Q> in XML starts with the character sequence
<Q><C>&lt;!--</C></Q> and ends with the sequence <Q><C>--&gt;</C></Q>. Between
these sequences there must not be two adjacent dashes <Q><C>--</C></Q>.
</Subsection>
<Subsection>
<Heading>Processing Instructions</Heading>
A <Q>processing instruction</Q> in XML starts with the character sequence
<Q><C>&lt;?</C></Q> followed by a name (<Q><C>xml</C></Q> is only allowed
at the very beginning of the document to declare it as an XML document,
see <Ref Subsect="XMLhead"/>). After that any characters may follow, except
that the ending sequence <Q><C>?&gt;</C></Q> must not occur within the
processing instruction.
</Subsection>
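As a small illustration, the following fragment contains a processing
instruction and a comment. The <C>xml-stylesheet</C> instruction and the
file name <F>style.xsl</F> are assumptions made only for this sketch;
&GAPDoc; documents do not need such an instruction.
<Listing Type="Example">
<![CDATA[<?xml-stylesheet type="text/xsl" href="style.xsl"?>
<!-- A comment may span several lines,
     but it must not contain two adjacent dashes. -->]]>
</Listing>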
<P/>
And now we turn to those parts of the document which contribute to its
actual content.
<Subsection Label="XMLnames">
<Heading>Names in XML and Whitespace</Heading>
A <Q>name</Q> in XML (used for element and attribute identifiers, see below)
must start with a letter (in the encoding of the document) or with a
colon <Q><C>:</C></Q> or underscore <Q><C>_</C></Q> character. The
following characters may also be digits, dots <Q><C>.</C></Q> or dashes
<Q><C>-</C></Q>.<P/>
This is a simplified description of the rules in the standard, which are
concerned with lots of Unicode ranges to specify what a <Q>letter</Q>
is.<P/>
Sequences consisting only of the following characters are considered
<E>whitespace</E>: blanks, tabs, carriage return characters and newline
characters.
</Subsection>
<Subsection>
<Heading>Elements</Heading>
The actual content of an XML document consists of <Q>elements</Q>.
An element has some <Q>content</Q> with a leading <Q>start tag</Q>
(<Ref Subsect="XMLstarttag"/>) and a trailing <Q>end tag</Q> (<Ref
Subsect="XMLendtag"/>). The content can contain further elements but they
must be properly nested. One can define elements whose content is always
empty; such elements can also be entered with a single combined tag (<Ref
Subsect="XMLcombtag"/>).
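For instance, in the following fragment the <C>E</C> element lies
completely inside the <C>Item</C> element, so the two elements are properly
nested. This is just a sketch with two &GAPDoc; elements; the nesting rule
itself is the same for all element names.
<Listing Type="Example">
<![CDATA[<Item>The end tag of <E>an inner element</E> comes before
the end tag of the element containing it.</Item>]]>
</Listing>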
</Subsection>
<Subsection Label="XMLstarttag">
<Heading>Start Tags</Heading>
A <Q>start tag</Q> consists of a less-than-character <Q><C>&lt;</C></Q>
directly followed (without whitespace) by an element name (see <Ref
Subsect="XMLnames"/>), optional attributes, optional whitespace, and a
greater-than-character <Q><C>&gt;</C></Q>.<P/>
An <Q>attribute</Q> consists of some whitespace and then its name
followed by an equal sign <Q><C>=</C></Q>, which is optionally enclosed by
whitespace, and the attribute value, which is enclosed either in single
or double quotes. The attribute value may not contain the type of
quote used as a delimiter or the character <Q><C>&lt;</C></Q>; the
character <Q><C>&amp;</C></Q> may only appear to start an entity,
see <Ref Subsect="XMLent"/>. We describe
in <Ref Subsect="AttrValRules"/> how
to enter special characters in attribute values.<P/>
Note especially that no whitespace is allowed between the starting
<Q><C>&lt;</C></Q> character and the element name. The quotes around an
attribute value cannot be omitted. The names of elements and attributes are
<E>case sensitive</E>.
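The following lines sketch these rules with start tags of elements used in
this chapter; the comments say which variants are well formed.
<Listing Type="Example">
<![CDATA[<Subsection Label="XMLnames">  <!-- value in double quotes -->
<Listing Type = 'Example' >     <!-- single quotes, whitespace around = and before > -->
<!-- not well formed: < Listing Type="Example">  (whitespace after <) -->
<!-- not well formed: <Listing Type=Example>     (quotes omitted) -->
<!-- <listing> is a different element name, since names are case sensitive -->]]>
</Listing>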
</Subsection>
<Subsection Label="XMLendtag">
<Heading>End Tags</Heading>
An <Q>end tag</Q> consists of the two characters <Q><C>&lt;/</C></Q>
directly followed by the element name, optional whitespace and a
greater-than-character <Q><C>&gt;</C></Q>.
</Subsection>
<Subsection Label="XMLcombtag">
<Heading>Combined Tags for Empty Elements</Heading>
Elements which always have empty content can be written with a single
tag. This looks like a start tag (see <Ref Subsect="XMLstarttag"/>)
<E>except</E> that the trailing greater-than-character <Q><C>&gt;</C></Q>
is replaced by the two character sequence <Q><C>/&gt;</C></Q>.
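Two combined tags as they occur in this chapter; according to the XML
standard each of them is equivalent to a start tag directly followed by the
corresponding end tag.
<Listing Type="Example">
<![CDATA[<P/>                      <!-- same as <P></P> -->
<Ref Subsect="XMLhead"/>  <!-- attributes are written as in a start tag -->]]>
</Listing>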
</Subsection>
<Subsection Label="XMLent">
<Heading>Entities</Heading>
An <Q>entity</Q> in XML is a macro for some substitution text. There are two
types of entities. <P/>
A <Q>character entity</Q> can be used to specify any Unicode character by
its code point (this can be useful for entering non-ASCII characters which
you cannot type directly). It is entered as the sequence
<Q><C>&amp;#</C></Q>, directly followed by either some decimal digits
or an <Q><C>x</C></Q> and some hexadecimal digits, directly followed by a
semicolon <Q><C>;</C></Q>. Using such a character entity is equivalent
to typing the corresponding character directly.<P/>
Then there are references to <Q>named entities</Q>. They are entered with an
ampersand character <Q><C>&amp;</C></Q> directly followed by a name which
is directly followed by a semicolon <Q><C>;</C></Q>. Such entities must be
declared somewhere by giving a substitution text. This text is included in
the document and the document is parsed again afterwards. The exact rules
are a bit subtle but you probably want to use this only in simple cases.
Predefined entities for &GAPDoc; are described in <Ref Subsect="XMLspchar"/>
and <Ref Subsect="GDent"/>.<P/>
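As a sketch, the following source line uses both kinds of entities. Here
<C>945</C> and <C>3B1</C> are the decimal and hexadecimal code point of the
Greek letter alpha, and <C>&amp;GAPDoc;</C> is one of the named entities
predefined for &GAPDoc;.
<Listing Type="Example">
<![CDATA[&#945; and &#x3B1; both give an alpha, &GAPDoc; gives the package name.]]>
</Listing>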
</Subsection>
<Subsection Label="XMLspchar">
<Heading>Special Characters in XML</Heading>
We have seen that the less-than-character <Q><C>&lt;</C></Q> and the
ampersand character <Q><C>&amp;</C></Q> start a tag or entity reference in
XML. To get these characters into the document text one has to use entity
references, namely <Q><C>&amp;lt;</C></Q> to get <Q><C>&lt;</C></Q>
and <Q><C>&amp;amp;</C></Q> to get <Q><C>&amp;</C></Q>. Furthermore
<Q><C>&amp;gt;</C></Q> must be used to get <Q><C>&gt;</C></Q> when the string
<Q><C>]]&gt;</C></Q> appears in element content (and not as delimiter of a
<C>CDATA</C> section explained below).<P/>
Another possibility is to use a <C>CDATA</C> statement explained
in <Ref Subsect="XMLcdata"/>.
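For example, the two lines <Q><C>x &lt; y &amp;&amp; y &lt; z</C></Q> and
<Q><C>a[b[i]]&gt;0</C></Q> of document text are typed as follows (in the
second line the <Q><C>&gt;</C></Q> must be escaped because it follows
<Q><C>]]</C></Q>):
<Listing Type="Example">
<![CDATA[x &lt; y &amp;&amp; y &lt; z
a[b[i]]&gt;0]]>
</Listing>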
</Subsection>
<Subsection Label="AttrValRules">
<Heading>Rules for Attribute Values</Heading>
Attribute values can contain entities which are substituted recursively.
But apart from the entity <C>&amp;lt;</C> or a character entity, the
substitution must not introduce a <C>&lt;</C> character (there is
no XML parsing for evaluating the attribute value, just entity substitutions).
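A sketch of these rules, using a hypothetical element <C>note</C> with a
hypothetical attribute <C>title</C> (neither is part of &GAPDoc;):
<Listing Type="Example">
<![CDATA[<note title="x &lt; y"/>   <!-- allowed: the entity lt -->
<note title="x &#60; y"/>  <!-- allowed: a character entity -->
<note title='say "hi"'/>   <!-- double quotes inside single quotes -->
<!-- not allowed: <note title="x < y"/> -->]]>
</Listing>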
</Subsection>
<Subsection Label="XMLcdata">
<Heading>CDATA</Heading>
Pieces of text which contain many characters which can be
misinterpreted as markup can be enclosed by the character sequences
<Q><C><![CDATA[<![CDATA[]]></C></Q> and <Q><C>]]&gt;</C></Q>. Everything
between these sequences is considered as content of the document and is not
further interpreted as XML text. All the rules explained so far in this
section do <E>not apply</E> to such a part of the document. The only
document content which cannot be entered directly inside a <C>CDATA</C>
statement is the sequence <Q><C>]]&gt;</C></Q>. This can be entered as
<Q><C>]]&amp;gt;</C></Q> outside the <C>CDATA</C> statement.
<Listing Type="Example">
A nesting of tags like <![CDATA[<a> <b> </a> </b>]]> is not allowed.
</Listing>
</Subsection>
<Subsection Label="XMLenc">
<Heading>Encoding of an XML Document</Heading>
We suggest using the UTF-8 encoding for writing &GAPDoc; XML documents.
But the tools described in Chapter <Ref Chap="ch:conv" /> also work
with ASCII or the various ISO-8859-X encodings (ISO-8859-1 is also
called latin1 and covers most special characters for western European
languages).
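For instance, a document stored in the latin1 encoding would start with the
following head. This is only a sketch; the declared encoding must match the
encoding in which the file is actually saved.
<Listing Type="Example">
<![CDATA[<?xml version="1.0" encoding="ISO-8859-1"?>
<!DOCTYPE Book SYSTEM "gapdoc.dtd">]]>
</Listing>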
</Subsection>
<Subsection Label="XMLvalid">
<Heading>Well Formed and Valid XML Documents</Heading>
We want to mention two further important terms which are often used in the
context of XML documents. A piece of text becomes a <Q>well formed</Q> XML
document if all the formal rules described in this section are fulfilled.
<P/>
But this says nothing about the content of the document. To give
this content a meaning one needs a declaration of the element and
corresponding attribute names as well as of named entities which are
allowed. Furthermore, there may be restrictions on how such elements can be
nested. This <E>definition of an XML based markup language</E> is done in a
<Q>document type definition</Q>. An XML document which contains only
elements and entities declared in such a document type definition and obeys
the rules given there is called <Q>valid (with respect to this document type
definition)</Q>.<P/>
The main file of the &GAPDoc; package is <F>gapdoc.dtd</F>. This contains
such a definition of a markup language. We are not going to explain the
formal syntax rules for document type definitions in this section. But in
Chapter <Ref Chap="DTD"/> we will explain enough about it to understand
the file <F>gapdoc.dtd</F> and so the markup language defined there.
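To illustrate the difference, the following fragment is well formed: its
tags are properly nested and its attribute values are quoted. It cannot be
valid with respect to <F>gapdoc.dtd</F>, however, because that file declares
no element with the hypothetical name <C>Unknown</C> used in this sketch.
<Listing Type="Example">
<![CDATA[<Book Name="Demo">
  <Unknown Kind="example"/>
</Book>]]>
</Listing>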
</Subsection>
</Section>
<Section>
<Heading>Entering &GAPDoc; Documents</Heading>
Here are some additional rules for writing &GAPDoc; XML documents.
<Subsection Label="otherspecchar">
<Heading>Other special characters</Heading>
As &GAPDoc; documents are used to produce &LaTeX; and HTML
documents, the question arises how to deal with characters with a
special meaning for other applications. For example,
<Q><C>&amp;</C></Q>,
<Q><C>#</C></Q>,
<Q><C>$</C></Q>,
<Q><C>%</C></Q>,
<Q><C>~</C></Q>,
<Q><C>\</C></Q>,
<Q><C>{</C></Q>,
<Q><C>}</C></Q>,
<Q><C>_</C></Q>,
<Q><C>^</C></Q> and
<Q><C>&#xA0;</C></Q> (this is a non-breakable space,
<Q><C>~</C></Q> in &LaTeX;) have a special meaning for &LaTeX;, and
<Q><C>&amp;</C></Q>,
<Q><C>&lt;</C></Q>,
<Q><C>&gt;</C></Q> have a special meaning for HTML (and XML).
In &GAPDoc; you can usually just type these characters directly; it is
the task of the converter programs which translate to some output format
to take care of such special characters. The exceptions to this simple
rule are:
<List>
<Item>
<C>&amp;</C> and <C>&lt;</C> must be entered as <C>&amp;amp;</C> and
<C>&amp;lt;</C> as explained in <Ref Subsect="XMLspchar"/>.
</Item>
<Item>The content of the &GAPDoc; elements <C>&lt;M></C>,
<C>&lt;Math></C> and <C>&lt;Display></C> is &LaTeX; code,
see <Ref Sect="MathForm"/>.</Item>
<Item>The content of an <C>&lt;Alt></C> element with <C>Only</C>
attribute contains code for the specified output type, see
<Ref Subsect="Alt"/>.</Item>
</List>
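For example, a line of ordinary text like the following can be typed as it
stands; only the ampersand needs an entity. It is left to the converters to
escape the other characters for &LaTeX; or HTML output.
<Listing Type="Example">
<![CDATA[A 50% share of $300 goes to item #2 of file my_data {draft}; see Q&amp;A.]]>
</Listing>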
Remark: In former versions of &GAPDoc; one had to use particular entities for all the special characters mentioned above
(<C>&amp;tamp;</C>, <C>&amp;hash;</C>,
<C>&amp;dollar;</C>, <C>&amp;percent;</C>, <C>&amp;tilde;</C>,
<C>&amp;bslash;</C>, <C>&amp;obrace;</C>, <C>&amp;cbrace;</C>,
<C>&amp;uscore;</C>, <C>&amp;circum;</C>, <C>&amp;tlt;</C>, <C>&amp;tgt;</C>).
These are no longer needed, but they are still defined for backwards
compatibility with older &GAPDoc; documents.
</Subsection>
<Subsection>
<Heading>Mathematical Formulae</Heading>
Mathematical formulae in &GAPDoc; are typed as in &LaTeX;. They must be
the content of one of three types of &GAPDoc; elements concerned with
mathematical formulae: <Q><C>Math</C></Q>, <Q><C>Display</C></Q>, and
<Q><C>M</C></Q> (see Subsections <Ref Subsect="Math"/> and <Ref
Subsect="M"/> for more details). The first two correspond to &LaTeX;'s
math mode and display math mode. The last one is a special form of the
<Q><C>Math</C></Q> element type, which imposes certain restrictions on
the content. On the other hand, the content of an <Q><C>M</C></Q> element
is processed in a well defined way for text terminal or HTML output. The
<Q><C>Display</C></Q> element also has an attribute such that its
content is processed as in <Q><C>M</C></Q> elements.<P/>
Note that the content of these elements is &LaTeX; code, but
the special characters
<Q><C>&lt;</C></Q> and <Q><C>&amp;</C></Q> for XML must be entered via
the entities described in <Ref Subsect="XMLspchar"/> or by using a
<C>CDATA</C> statement, see <Ref Subsect="XMLcdata"/>.<P/>
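A short sketch of the three elements. The content is ordinary &LaTeX; code,
except that each less-than sign is typed as the entity <C>&amp;lt;</C>:
<Listing Type="Example">
<![CDATA[If <M>a &lt; b</M> then <M>a + 1 &lt; b + 1</M>.
<Display>\sum_{i=1}^n i = \frac{n(n+1)}{2}</Display>
<Math>\{ x \mid 0 &lt; x &lt; 1 \}</Math>]]>
</Listing>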
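</Subsection>
<Subsection Label="GDent">
<Heading>More Entities</Heading>
In &GAPDoc;, the entity <C>&amp;nbsp;</C> denotes a non-breakable space character.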
<P/>
Additional entities are defined for some mathematical symbols, see <Ref
Sect="MathForm"/> for more details.
<P/>
One can define further local entities right inside the head (see <Ref
Subsect="XMLhead"/>) of a &GAPDoc; XML document as in the following example.
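A sketch of such a head, where the entity name <C>MyEntity</C> and its
substitution text are hypothetical:
<Listing Type="Example">
<![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Book SYSTEM "gapdoc.dtd" [
<!ENTITY MyEntity "some longer text">
]>]]>
</Listing>
After this declaration, <C>&amp;MyEntity;</C> can be used in the document
and is replaced by the declared text.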