trafilatura 1.6.4 documentation - Home

Background

The pages below provide background information on scientific approaches to web data collection and processing, corpus linguistics, digital humanities, and natural language processing.

  • Compendium: Web texts in linguistics and humanities
    • Web corpora as scientific objects
    • Corpus types and resulting methods
    • Corpus construction steps
    • Methodological issues
    • References
  • Finding sources for web corpora
    • From link lists to web corpora
    • Existing resources
    • Search engines
    • Selecting random documents from the Web
    • Social networks
    • Remarks
    • References
  • Working with corpus data
    • Generic solutions in Python
    • Formats and software used in corpus linguistics
    • Generic NLP solutions

© Copyright 2024, Adrien Barbaresi.
