Tutorials

- Tutorial: Gathering a custom web corpus
  - Get your system up and running
  - Content discovery
  - Link filtering
  - Process a list of links
- Tutorial: From a list of links to a frequency list
  - Get your system up and running
  - Process a list of links
  - Build frequency lists
- Tutorial: Validation of TEI files
  - Producing TEI files
  - Validating existing files
- Tutorial: Reproducing DWDS corpus data (in German)
  - Goal
  - From a query to viewing the sources
  - Motivation and design options
  - Downloading and processing the data

Blog posts

- Extracting the main text content from web pages using Python
- Validating TEI-XML documents with Python
- Evaluating scraping and text extraction tools for Python
- Filtering links to gather texts on the web
- Using sitemaps to crawl websites on the command-line
- Using RSS and Atom feeds to collect web pages with Python
- Web scraping with R: Text and metadata extraction
- Web scraping with Trafilatura just got faster

Videos

- YouTube playlist: Web scraping how-tos and tutorials

External resources

- GLAM-Workbench
  - Harvesting collections of text from archived web pages
  - Compare two versions of an archived web page
- User Ethics & Legal Concerns
- Downloading web data & Preparing and managing data (tutorials in German by Noah Bubenhofer)