module Xml_lexer: sig .. end
Simple XML lexer.
An ocamllex lexer for XML files. It only
supports the most basic features of the XML specification.
The lexer ignores the following 'events' altogether: comments, processing instructions, the XML prolog and the doctype declaration.
The predefined entities (&amp;, &lt;, etc.) are supported. The
replacement text for other entities whose entity value consists of
character data can be provided to the lexer (see Xml_lexer.entities).
Internal entity declarations are not
taken into account (the lexer just skips the doctype declaration).
CDATA sections and character references are supported.
See Xml_lexer.strip_ws about whitespace handling.
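The entity table described above can be pictured as a mutable association list. The following is a minimal sketch, not the module's implementation: `entities` here is a local stand-in for Xml_lexer.entities, its initial contents are an assumption based on the five entities predefined by the XML specification, and `resolve` is a hypothetical helper.

```ocaml
(* Stand-in for Xml_lexer.entities: an association list from entity
   names to replacement text. The initial contents are an assumption
   (the five entities predefined by the XML specification). *)
let entities : (string * string) list ref =
  ref [ "amp", "&"; "lt", "<"; "gt", ">"; "apos", "'"; "quot", "\"" ]

(* Hypothetical helper: look up the replacement text of an entity name. *)
let resolve (name : string) : string option =
  List.assoc_opt name !entities

(* Register an extra entity whose value is plain character data. *)
let () = entities := ("copy", "(c)") :: !entities

let () =
  match resolve "copy" with
  | Some text -> print_endline text
  | None -> print_endline "unknown entity"
```

With the real module, the corresponding registration would presumably be an update of Xml_lexer.entities performed before lexing starts.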
type
error =
| Illegal_character of
| Bad_entity of
| Unterminated of
| Tag_expected
| Attribute_expected
| Other of
val error_string : error -> string
exception Error of error * int
The int argument indicates the character position in
the buffer. Note that some non-conforming XML documents might not
trigger an error.
type
token =
| Tag of
(* Tag (name, attributes, empty) denotes an opening tag
with the specified name and attributes. If empty,
then the tag ended in "/>", meaning that it has no
sub-elements. *)
| Chars of
(* Some text between the tags *)
| Endtag of
(* A closing tag *)
| EOF
(* End of input *)
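As an illustration of how these tokens fit together, here is a sketch that folds a token stream into a tree. It is not part of the module. The `token` type is a local stand-in whose payload types are assumptions inferred from the comment "Tag (name, attributes, empty)"; `node`, `read_children` and `of_list` are hypothetical names introduced for the example.

```ocaml
(* Stand-in for Xml_lexer.token. Payload types are assumptions. *)
type token =
  | Tag of string * (string * string) list * bool
  | Chars of string
  | Endtag of string
  | EOF

(* Hypothetical tree type, not provided by the module. *)
type node =
  | Element of string * (string * string) list * node list
  | Text of string

(* Read sibling nodes until the closing tag of [parent], or until EOF
   at top level. [next] returns the following token on each call. *)
let rec read_children (next : unit -> token) (parent : string option)
  : node list =
  match next (), parent with
  | EOF, None -> []
  | EOF, Some name -> failwith ("missing closing tag for " ^ name)
  | Endtag name, Some expected when name = expected -> []
  | Endtag name, _ -> failwith ("unexpected closing tag " ^ name)
  | Chars s, _ ->
      let rest = read_children next parent in
      Text s :: rest
  | Tag (name, attrs, true), _ ->
      (* Empty tag ("/>"): no sub-elements to read. *)
      let rest = read_children next parent in
      Element (name, attrs, []) :: rest
  | Tag (name, attrs, false), _ ->
      (* Children must be consumed before the following siblings. *)
      let kids = read_children next (Some name) in
      let rest = read_children next parent in
      Element (name, attrs, kids) :: rest

(* Hypothetical token source backed by a list, for demonstration. *)
let of_list (tokens : token list) : unit -> token =
  let remaining = ref tokens in
  fun () ->
    match !remaining with
    | [] -> EOF
    | t :: rest -> remaining := rest; t

let example =
  read_children
    (of_list
       [ Tag ("a", [ "id", "1" ], false); Chars "hi";
         Tag ("br", [], true); Endtag "a"; EOF ])
    None
```

If the real constructors carry these payloads, `next` could be `fun () -> Xml_lexer.token lexbuf` for some `lexbuf`.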
val strip_ws : bool Pervasives.ref
If strip_ws is true (the default),
whitespace next to a tag is ignored. Character data consisting
only of whitespace is thus suppressed (i.e. Chars "" tokens are
skipped).
val entities : (string * string) list Pervasives.ref
An association list mapping entity names to their replacement text (e.g. ["amp", "&"; "lt", "<" ...]).
val token : Lexing.lexbuf -> token
Raises Error in case of an invalid XML document.
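A typical way to drive such a lexer is to call the token function until EOF and to report the position carried by the exception. The sketch below uses stand-ins only: `Lex_error` stands in for Xml_lexer.Error, the `error` and `token` types are reduced copies with assumed payload types, and `error_string`, `lex_all` and `fake_token` are hypothetical.

```ocaml
(* Reduced stand-ins for the module's declarations. The [string]
   payloads of [Other] and [Chars] are assumptions. *)
type error = Tag_expected | Attribute_expected | Other of string
exception Lex_error of error * int   (* stands in for Xml_lexer.Error *)
type token = Chars of string | EOF

(* Hypothetical stand-in for Xml_lexer.error_string. *)
let error_string = function
  | Tag_expected -> "tag expected"
  | Attribute_expected -> "attribute expected"
  | Other msg -> msg

(* Call [token] until EOF, collecting the tokens in order. A lexing
   error becomes a message that includes the character position. *)
let lex_all (token : Lexing.lexbuf -> token) (lexbuf : Lexing.lexbuf)
  : (token list, string) result =
  let rec loop acc =
    match token lexbuf with
    | EOF -> List.rev acc
    | t -> loop (t :: acc)
  in
  try Ok (loop []) with
  | Lex_error (e, pos) ->
      Error
        (Printf.sprintf "XML error at character %d: %s" pos
           (error_string e))

(* Fake token function for demonstration: ignores the buffer and
   replays a fixed list of tokens. *)
let fake_token : Lexing.lexbuf -> token =
  let remaining = ref [ Chars "a"; Chars "b" ] in
  fun _ ->
    match !remaining with
    | [] -> EOF
    | t :: rest -> remaining := rest; t

let result = lex_all fake_token (Lexing.from_string "")
```

With the real module, the handler would match on `Xml_lexer.Error (e, pos)` and format `e` with `Xml_lexer.error_string`.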