module Xml_lexer: sig .. end
Simple XML lexer
An ocamllex lexer for XML files. It only
supports the most basic features of the XML specification.
The lexer altogether ignores the following 'events': comments, processing instructions, XML prolog and doctype declaration.
The predefined entities (&amp;, &lt;, etc.) are supported. The
replacement text for other entities whose entity value consists of
character data can be provided to the lexer (see
Xml_lexer.entities). Internal entity declarations are not
taken into account (the lexer just skips the doctype declaration).
CDATA sections and character references are supported.
See Xml_lexer.strip_ws about whitespace handling.
type error =
  | Illegal_character of char
  | Bad_entity of string
  | Unterminated of string
  | Tag_expected
  | Attribute_expected
  | Other of string
val error_string : error -> string
exception Error of error * int
Raised when the lexer encounters an error. The int argument indicates the
character position in the buffer. Note that some non-conforming XML documents
might not trigger an error.
type token =
  | Tag of string * (string * string) list * bool
      (* Tag (name, attributes, empty) denotes an opening tag
         with the specified name and attributes. If empty,
         then the tag ended in "/>", meaning that it has no
         sub-elements. *)
  | Chars of string
      (* Some text between the tags *)
  | Endtag of string
      (* A closing tag *)
  | EOF
      (* End of input *)
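Because the lexer emits a flat stream of tokens, callers usually have to rebuild the element nesting themselves. The following is a minimal sketch of one way to do that, not part of the module: the `token` type is a local stand-in for `Xml_lexer.token` (its payload types are an assumption of this sketch), and `tree`, `nodes` and `tree_of_tokens` are hypothetical names introduced for illustration.

```ocaml
(* Local stand-in for Xml_lexer.token so the sketch compiles on its own.
   The payload types are assumed, not taken from the library. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

(* Hypothetical tree type used only by this example. *)
type tree =
  | Element of string * (string * string) list * tree list
  | Text of string

(* Read sibling nodes until the closing tag of [parent] is found
   (or until the input ends, when [parent] is None).
   Returns the nodes read and the tokens that remain. *)
let rec nodes parent toks =
  match toks with
  | Tag (name, attrs, true) :: rest ->
      (* Empty tag: no children to collect. *)
      let siblings, rest = nodes parent rest in
      (Element (name, attrs, []) :: siblings, rest)
  | Tag (name, attrs, false) :: rest ->
      (* Opening tag: collect children up to the matching Endtag. *)
      let children, rest = nodes (Some name) rest in
      let siblings, rest = nodes parent rest in
      (Element (name, attrs, children) :: siblings, rest)
  | Chars s :: rest ->
      let siblings, rest = nodes parent rest in
      (Text s :: siblings, rest)
  | Endtag name :: rest when parent = Some name -> ([], rest)
  | Endtag name :: _ -> failwith ("unexpected closing tag: " ^ name)
  | EOF :: _ | [] ->
      (match parent with
       | None -> ([], [])
       | Some p -> failwith ("unclosed tag: " ^ p))

let tree_of_tokens toks = fst (nodes None toks)
```

The recursion is not tail-recursive, so for very deep or very long documents an explicit stack would be a safer choice.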
val strip_ws : bool Pervasives.ref
Whitespace handling: if strip_ws is true (the default),
whitespace next to a tag is ignored. Character data consisting
only of whitespace is thus suppressed (i.e. Chars "" tokens are
skipped).
val entities : (string * string) list Pervasives.ref
Association list of entity definitions. It contains the predefined entities
( ["amp", "&"; "lt", "<" ...] ).
val token : Lexing.lexbuf -> token
Reads the next token from the buffer. Raises Error in case of an invalid XML document.
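A typical caller pulls tokens until EOF and reports any Error together with its character position. The sketch below illustrates that loop under stated assumptions: module `X` is a local stand-in for `Xml_lexer` so that the code compiles without the library (its payload types and the wording of its `error_string` messages are assumptions), and `drain` is a hypothetical helper, not part of the module.

```ocaml
(* Local stand-in for Xml_lexer. Payload types and message texts are
   assumptions of this sketch; the real error_string wording may differ. *)
module X = struct
  type error =
    | Illegal_character of char
    | Bad_entity of string
    | Unterminated of string
    | Tag_expected
    | Attribute_expected
    | Other of string

  exception Error of error * int

  type token =
    | Tag of string * (string * string) list * bool
    | Chars of string
    | Endtag of string
    | EOF

  let error_string = function
    | Illegal_character c -> Printf.sprintf "illegal character %C" c
    | Bad_entity s -> "bad entity: " ^ s
    | Unterminated s -> "unterminated " ^ s
    | Tag_expected -> "tag expected"
    | Attribute_expected -> "attribute expected"
    | Other s -> s
end

(* With the real module, the lexer could be configured beforehand, e.g.
     Xml_lexer.strip_ws := false;
     Xml_lexer.entities := ("copy", "(c)") :: !Xml_lexer.entities;
   where the "copy" entity is a made-up example. *)

(* Call [next] until it yields EOF and return the tokens in order.
   A lexer error becomes an Error result holding a message that
   includes the character position. *)
let drain (next : Lexing.lexbuf -> X.token) (lexbuf : Lexing.lexbuf) =
  let rec loop acc =
    match next lexbuf with
    | X.EOF -> Ok (List.rev acc)
    | tok -> loop (tok :: acc)
  in
  try loop [] with
  | X.Error (e, pos) ->
      Error (Printf.sprintf "%s at character %d" (X.error_string e) pos)
```

With the library available, one would delete module `X`, replace `X.` with `Xml_lexer.`, and call something like `drain Xml_lexer.token (Lexing.from_channel ic)`. Taking the token function as a parameter keeps the loop testable with a fake lexer.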