6. Pipelines, Processes, Docs, and Words

Tip

See the notebook https://github.com/cltk/cltk/blob/master/notebooks/CLTK%20data%20types.ipynb for a detailed walkthrough of CLTK data types.

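The CLTK contains four important, native data types: Pipeline, Process, Doc, and Word. The following diagram shows how language-specific pipelines inherit from the base Pipeline class, each declaring its own list of processes: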

digraph Pipeline {
  fontname = "Bitstream Vera Sans"
  fontsize = 8

  node [
    fontname = "Bitstream Vera Sans"
    fontsize = 8
    shape = "record"
  ]

  edge [
    arrowtail = "empty"
  ]

  Pipeline [
    label = "{Pipeline|\l| run(): Doc}"
  ]

  LatinPipeline [
    label = "{LatinPipeline|\l|processes: [LatinStanzaProcess,\l LatinEmbeddingsProcess,\l StopsProcess,\l LatinNERProcess]}"
  ]

  GreekPipeline [
    label = "{GreekPipeline|\l|processes: [GreekStanzaProcess,\l GreekEmbeddingsProcess,\l StopsProcess,\l GreekNERProcess]}"
  ]

  EtcPipeline [
    label = "{…|\l|processes: List[Process]}"
  ]

  Pipeline -> LatinPipeline [dir=back]
  Pipeline -> GreekPipeline [dir=back]
  Pipeline -> EtcPipeline [dir=back]
}

Inheritance of the Pipeline class
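To make the relationships in the diagram concrete, here is a minimal, self-contained sketch in plain Python. It does not import the CLTK: the Doc, Word, Process, and Pipeline classes below are simplified mock versions (the real ones carry many more fields), and everything prefixed with Toy, including the whitespace tokenizer and the three-word stopword list, is hypothetical. The sketch assumes only what the diagram shows: a Pipeline subclass declares an ordered list of processes, and run() passes a single Doc through each of them and returns it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Type


@dataclass
class Word:
    """Mock token; the real CLTK Word has many more attributes."""
    string: str
    index_token: int
    stop: Optional[bool] = None  # filled in later by a stopword process


@dataclass
class Doc:
    """Mock document object handed from process to process."""
    raw: str
    words: List[Word] = field(default_factory=list)


class Process:
    """Base class: a process takes a Doc and returns an enriched Doc."""
    def run(self, input_doc: Doc) -> Doc:
        raise NotImplementedError


class ToyTokenizeProcess(Process):
    """Hypothetical stand-in for a tokenizing process: splits on whitespace."""
    def run(self, input_doc: Doc) -> Doc:
        input_doc.words = [
            Word(string=tok, index_token=i)
            for i, tok in enumerate(input_doc.raw.split())
        ]
        return input_doc


class ToyStopsProcess(Process):
    """Hypothetical stand-in for StopsProcess, using a made-up stopword list."""
    STOPS = {"et", "in", "cum"}

    def run(self, input_doc: Doc) -> Doc:
        for word in input_doc.words:
            word.stop = word.string.lower() in self.STOPS
        return input_doc


class Pipeline:
    """Base class: subclasses only declare which processes run, in order."""
    processes: List[Type[Process]] = []

    def run(self, text: str) -> Doc:
        doc = Doc(raw=text)
        # Each process receives the Doc produced by the previous one.
        for process_class in self.processes:
            doc = process_class().run(doc)
        return doc


class ToyLatinPipeline(Pipeline):
    """Hypothetical analogue of LatinPipeline with a shorter process list."""
    processes = [ToyTokenizeProcess, ToyStopsProcess]


doc = ToyLatinPipeline().run("Gallia est omnis divisa in partes tres")
print([(w.string, w.stop) for w in doc.words])
# [('Gallia', False), ('est', False), ('omnis', False), ('divisa', False), ('in', True), ('partes', False), ('tres', False)]
```

Because every process receives and returns the same Doc, adding a step (for instance a named-entity process) only means appending its class to a subclass's processes list; the base run() stays unchanged.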