Tutorial: Validation of TEI files
=================================
*Trafilatura* can produce and validate XML documents according to the guidelines of the `Text Encoding Initiative `_ (XML-TEI).
Producing TEI files
--------------------
In Python:
.. code-block:: python
# load the necessary components
from trafilatura import fetch_url, extract
# download a file
downloaded = fetch_url('https://github.blog/2019-03-29-leader-spotlight-erin-spiceland/')
# extract information as XML TEI and validate the result
result = extract(downloaded, output_format='xmltei', tei_validation=True)
From the command line:
.. code-block:: bash
trafilatura --xmltei --validate --URL "https://github.blog/2019-03-29-leader-spotlight-erin-spiceland/"
Validating existing files
-------------------------
The following code returns `True` if a document is valid and outputs a message related to the first error impeding validation otherwise:
.. code-block:: python
# load the necessary components
from lxml import etree
from trafilatura.xml import validate_tei
# open a file and parse it
mytree = etree.parse('document-name.xml')
# validate it
validate_tei(mytree)
# returns True or an error message
For more information please refer to this blog post: `Validating TEI-XML documents with Python `_