6. Pipelines, Processes, Docs, and Words


See the notebook at https://github.com/cltk/cltk/blob/master/notebooks/CLTK%20data%20types.ipynb for a detailed walkthrough of the CLTK data types.

The CLTK contains four important, native data types: Pipeline, Process, Doc, and Word.

digraph Pipeline {
  fontname = "Bitstream Vera Sans"
  fontsize = 8

  node [
    fontname = "Bitstream Vera Sans"
    fontsize = 8
    shape = "record"
  ]

  edge [
    arrowtail = "empty"
  ]

  Pipeline [
    label = "{Pipeline|\l| run(): Doc}"
  ]

  LatinPipeline [
    label = "{LatinPipeline|\l|processes: [LatinStanzaProcess,\l LatinEmbeddingsProcess,\l StopsProcess,\l LatinNERProcess]}"
  ]

  GreekPipeline [
    label = "{GreekPipeline|\l|processes: [GreekStanzaProcess,\l GreekEmbeddingsProcess,\l StopsProcess,\l GreekNERProcess]}"
  ]

  EtcPipeline [
    label = "{…|\l|processes: List[Process]}"
  ]

  Pipeline -> LatinPipeline [dir=back]
  Pipeline -> GreekPipeline [dir=back]
  Pipeline -> EtcPipeline [dir=back]
}

Inheritance of Pipeline class
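
The relationship in the diagram can be sketched in plain Python. This is a minimal illustration of the pattern only, not the CLTK's actual implementation: the `Word`, `Doc`, `Process`, and `Pipeline` classes below are simplified stand-ins, the fields `string`, `stop`, `raw`, and `words` are assumed for the example, and `ToyTokenizeProcess` and `ToyStopsProcess` are hypothetical processes invented here. A language-specific pipeline subclasses `Pipeline` and only declares its ordered list of processes; `run()` passes a `Doc` through each one in turn.

```python
from dataclasses import dataclass, field
from typing import List, Type


@dataclass
class Word:
    """One token plus its annotations (illustrative fields, not the CLTK's)."""
    string: str
    stop: bool = False


@dataclass
class Doc:
    """The input text and the Word objects accumulated while processing it."""
    raw: str
    words: List[Word] = field(default_factory=list)


class Process:
    """A single processing step: receives a Doc and returns it enriched."""

    def run(self, doc: Doc) -> Doc:
        raise NotImplementedError


class ToyTokenizeProcess(Process):
    """Hypothetical process: split the raw text on whitespace."""

    def run(self, doc: Doc) -> Doc:
        doc.words = [Word(string=token) for token in doc.raw.split()]
        return doc


class ToyStopsProcess(Process):
    """Hypothetical process: flag words found in a tiny stopword set."""

    stops = {"et", "in"}  # toy list, for illustration only

    def run(self, doc: Doc) -> Doc:
        for word in doc.words:
            word.stop = word.string.lower() in self.stops
        return doc


class Pipeline:
    """Base class: run each declared process, in order, over one Doc."""

    processes: List[Type[Process]] = []

    def run(self, text: str) -> Doc:
        doc = Doc(raw=text)
        for process_cls in self.processes:
            doc = process_cls().run(doc)
        return doc


class LatinPipeline(Pipeline):
    """A language-specific pipeline only declares which processes it uses."""

    processes = [ToyTokenizeProcess, ToyStopsProcess]


doc = LatinPipeline().run("Gallia est omnis divisa in partes tres")
print([word.string for word in doc.words if not word.stop])
```

Because the order of `processes` is the order of execution, a process that depends on earlier annotations (here, stopword flagging depends on tokenization) must be listed after the process that produces them.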