Content
The course first builds a common understanding of XML (specifically the XML Infoset) and some of its applications. The main part then covers efficient processing of XML (and a bit of HTML) in Python.
The tool set presented includes the ElementTree library, which has been part of Python's standard library since version 2.5, and the freely available lxml library, which combines an ElementTree-compatible Python API with a large set of additional XML features.
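To illustrate what the compatible API means in practice, here is a minimal sketch: the same function runs unchanged against the standard library's `xml.etree.ElementTree` and, if it is installed, `lxml.etree`. The sample document and the function name `item_names` are hypothetical, chosen only for illustration; the `xpath()` call at the end shows one of lxml's additional features.

```python
import xml.etree.ElementTree as ET

try:
    from lxml import etree as lxml_etree  # optional third-party package
except ImportError:
    lxml_etree = None

# Hypothetical sample document used only for this illustration
SAMPLE = b"<shop><item price='3'>apple</item><item price='5'>pear</item></shop>"


def item_names(etree_module, data):
    """Parse bytes and return the text of all direct <item> children.

    Only calls shared by both libraries are used: fromstring(), findall(), .text
    """
    root = etree_module.fromstring(data)
    return [item.text for item in root.findall("item")]


print(item_names(ET, SAMPLE))  # works with the standard library

if lxml_etree is not None:
    print(item_names(lxml_etree, SAMPLE))  # same code, lxml backend
    root = lxml_etree.fromstring(SAMPLE)
    # xpath() is an lxml extension; ElementTree only offers a limited
    # XPath subset through find()/findall()
    print(root.xpath("//item[@price > 4]/text()"))
```

Because only the shared subset of the API is used in `item_names`, code can start on ElementTree and switch to lxml later when its extra features (full XPath, XSLT, schema validation) become necessary.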
Introduction to XML
XML and the XML Infoset
XML Namespaces
Dealing with XML formats
Fast XML processing
Parsing and serializing XML files
Extracting information from XML documents (tree navigation, XPath, CSS selectors)
Processing and transforming XML documents in main memory
Generating XML documents
Stream processing of large XML files that do not fit into main memory
Advanced topics
Creating proprietary XML formats
Validating XML documents with schema languages (e.g. RELAX NG, Schematron)
Binding XML documents to Python objects (lxml.objectify)
Creating application-specific XML APIs with lxml
Introduction to stylesheet transformations (XSLT processing)
Note that coverage of the advanced topics depends on the time available; the selection will be based on the participants' interests.