8.1.9. cltk.lexicon package

8.1.9.1. Submodules

8.1.9.2. cltk.lexicon.lat module

Code for querying Latin language dictionaries/lexicons.

class cltk.lexicon.lat.LatinLewisLexicon(interactive=True)[source]

Bases: object

Access a digital form of Charlton T. Lewis’s An Elementary Latin Dictionary (1890).

lookup(lemma)[source]

Match a lemma against the dictionary headwords. If more than one headword matches, return the concatenated entries. For example:

>>> from cltk.lexicon.lat import LatinLewisLexicon
>>> lll = LatinLewisLexicon(interactive=False)
>>> lll.lookup("clemens")[:50]
'clēmēns entis (abl. -tī; rarely -te, L.), adj. wit'
>>> all(word in lll.lookup("levis") for word in ["levis","lēvis"]) # Test for concatenated entries
True
>>> lll.lookup("omnia")
''
>>> lll.lookup(".")
''
>>> lll.lookup("123")
''
>>> lll.lookup("175.")
''
>>> lll.lookup("(") # Test for regex special character
''

Return type:

str
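The doctests above suggest two properties of the matching: macrons are ignored ("clemens" finds clēmēns, and "levis" also returns the lēvis entry), and inputs containing regex metacharacters such as "(" or "." simply return ''. The sketch below reproduces those two properties over a toy in-memory dictionary. It is an illustration under those assumptions, not CLTK's implementation: fold, lookup, the entries dict and the newline separator are all hypothetical.

```python
import unicodedata


def fold(text: str) -> str:
    """Drop combining marks (e.g. macrons) so that 'lēvis' folds to 'levis'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def lookup(entries: dict[str, str], lemma: str) -> str:
    """Hypothetical sketch: return every entry whose folded headword equals the folded lemma."""
    target = fold(lemma)
    # Plain string equality instead of a compiled regex, so "(" or "175." cannot raise re.error.
    matches = [text for headword, text in entries.items() if fold(headword) == target]
    return "\n".join(matches)  # empty string when nothing matches


# Toy data, not real dictionary content.
entries = {
    "clēmēns": "clēmēns, adj. (toy entry)",
    "levis": "levis, adj. (toy entry)",
    "lēvis": "lēvis, adj. (toy entry)",
}

print(lookup(entries, "clemens"))  # clēmēns, adj. (toy entry)
print(repr(lookup(entries, "(")))  # ''
```

Comparing folded strings directly keeps the sketch simple; if a real implementation builds a regular expression from the lemma instead, escaping it (for example with re.escape) is what keeps inputs like "(" from being parsed as pattern syntax.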

_load_entries()[source]

Read the YAML file containing the lexicon.

8.1.9.3. cltk.lexicon.non module

Code for querying Old Norse language dictionaries/lexicons.

class cltk.lexicon.non.OldNorseZoegaLexicon(interactive=True)[source]

Bases: object

Access a digital form of Zoëga’s dictionary.

lookup(lemma)[source]

Match a lemma against the dictionary headwords. Matching is case sensitive. If more than one headword matches, return the concatenated entries. For example:

>>> from cltk.lexicon.non import OldNorseZoegaLexicon
>>> onzl = OldNorseZoegaLexicon(interactive=False)
>>> onzl.lookup("sonr")
'(gen. sonar, dat. syni and søni; pl. synir, sønir; ace. sonu and syni), m. son.'

Return type:

str
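Because this lookup is case sensitive, a capitalized token (for example, one at the start of a sentence) may return nothing even when its lowercase form is a headword. One caller-side workaround is to retry with the lowercased string. In the sketch below, lookup_with_fallback is a hypothetical helper, it assumes lookup returns an empty string when nothing matches (as the Latin doctests show), and FakeLexicon is a stand-in so the example runs without CLTK.

```python
from typing import Protocol


class Lexicon(Protocol):
    """Anything with a lookup(lemma) -> str method."""

    def lookup(self, lemma: str) -> str: ...


def lookup_with_fallback(lexicon: Lexicon, word: str) -> str:
    """Hypothetical helper: try the exact form first, then the lowercased form."""
    entry = lexicon.lookup(word)
    if not entry and word != word.lower():
        entry = lexicon.lookup(word.lower())
    return entry


class FakeLexicon:
    """Stand-in for OldNorseZoegaLexicon: exact, case-sensitive headword match."""

    def __init__(self, entries: dict[str, str]) -> None:
        self.entries = entries

    def lookup(self, lemma: str) -> str:
        return self.entries.get(lemma, "")


lex = FakeLexicon({"sonr": "m. son."})
print(lookup_with_fallback(lex, "Sonr"))  # m. son.
```

With CLTK installed, the same helper could be called as lookup_with_fallback(OldNorseZoegaLexicon(interactive=False), "Sonr").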

_load_entries()[source]

Read the YAML file containing the lexicon.

8.1.9.4. cltk.lexicon.processes module

Processes for dictionary lookup.

class cltk.lexicon.processes.LexiconProcess(language=None)[source]

Bases: Process

Base class, to be subclassed by each language’s dictionary lookup process.

Example: LexiconProcess -> LatinLexiconProcess

>>> from cltk.lexicon.processes import LexiconProcess
>>> from cltk.core.data_types import Process
>>> issubclass(LexiconProcess, Process)
True

language: str = None

algorithm

run(input_doc)[source]
Return type:

Doc
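The LexiconProcess -> LatinLexiconProcess relationship can be pictured with the self-contained sketch below. Everything in it is a hypothetical stand-in (Word, Doc, BaseLexiconProcess, ToyLatinLexiconProcess and their fields); it does not import CLTK and does not reproduce CLTK's source. It only illustrates one plausible reading of the pattern: the base class holds a shared run loop that writes word.definition, and each language subclass pins language and supplies its own lookup algorithm.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Word:
    string: str
    lemma: Optional[str] = None
    definition: str = ""


@dataclass
class Doc:
    words: list[Word] = field(default_factory=list)


class BaseLexiconProcess:
    """Hypothetical base: look up each word's lemma and store the result on the word."""

    language: Optional[str] = None

    @property
    def algorithm(self) -> Callable[[str], str]:
        raise NotImplementedError("each language subclass supplies its own lookup")

    def run(self, input_doc: Doc) -> Doc:
        lookup = self.algorithm
        for word in input_doc.words:
            # In this sketch, words without a lemma get an empty definition.
            word.definition = lookup(word.lemma) if word.lemma else ""
        return input_doc


class ToyLatinLexiconProcess(BaseLexiconProcess):
    language = "lat"
    description = "Toy dictionary lookup process for Latin"

    @property
    def algorithm(self) -> Callable[[str], str]:
        toy_entries = {"sum": "sum, esse, to be (toy entry)"}
        return lambda lemma: toy_entries.get(lemma, "")


doc = Doc([Word("est", lemma="sum"), Word("Gallia")])
ToyLatinLexiconProcess().run(doc)
print([w.definition for w in doc.words])  # ['sum, esse, to be (toy entry)', '']
```

Keeping the loop in the base class means a new language only has to declare its language code and lookup function, which matches the docstring's intent that the class be inherited per language.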

class cltk.lexicon.processes.LatinLexiconProcess(language=None)[source]

Bases: LexiconProcess

The default Latin dictionary lookup algorithm.

>>> from cltk.lexicon.processes import LatinLexiconProcess
>>> from cltk.core.data_types import Process, Pipeline
>>> from cltk.tokenizers import LatinTokenizationProcess
>>> from cltk.lemmatize.processes import LatinLemmatizationProcess
>>> from cltk.languages.utils import get_lang
>>> from cltk.languages.example_texts import get_example_text
>>> from cltk.nlp import NLP
>>> pipe = Pipeline(description="A custom Latin pipeline",
...                 processes=[LatinTokenizationProcess, LatinLemmatizationProcess, LatinLexiconProcess],
...                 language=get_lang("lat"))
>>> nlp = NLP(language='lat', custom_pipeline=pipe, suppress_banner=True)
>>> cltk_doc = nlp.analyze(text=get_example_text("lat"))
>>> [word.definition[:10] for word in cltk_doc.words][:5]
['', 'est\n\n\n see', 'omnis e (o', '', 'in  old in']

description = 'Dictionary lookup process for Latin'
language: str = 'lat'
algorithm
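The definitions attached to cltk_doc.words are raw entry text: they can contain runs of line breaks (for example 'est\n\n\n see' above) and are empty when nothing matched. A small post-processing step can make them easier to scan. first_line below is a hypothetical helper that works on plain strings, so it runs without CLTK.

```python
def first_line(definition: str) -> str:
    """Return the first non-blank line of a definition, or '' if there is none."""
    for line in definition.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return ""


# Sample strings shaped like the truncated definitions in the doctest above.
samples = ["", "est\n\n\n see", "omnis e (o"]
print([first_line(d) for d in samples])  # ['', 'est', 'omnis e (o']
```

With CLTK, it would be applied as [first_line(word.definition) for word in cltk_doc.words].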

class cltk.lexicon.processes.OldNorseLexiconProcess(language=None)[source]

Bases: LexiconProcess

The default Old Norse dictionary lookup algorithm.

>>> from cltk.lexicon.processes import OldNorseLexiconProcess
>>> from cltk.core.data_types import Process, Pipeline
>>> from cltk.tokenizers import OldNorseTokenizationProcess
>>> from cltk.languages.utils import get_lang
>>> from cltk.languages.example_texts import get_example_text
>>> from cltk.nlp import NLP
>>> pipe = Pipeline(description="A custom Old Norse pipeline",
...                 processes=[OldNorseTokenizationProcess, OldNorseLexiconProcess],
...                 language=get_lang("non"))
>>> nlp = NLP(language='non', custom_pipeline=pipe, suppress_banner=True)
>>> cltk_doc = nlp.analyze(text=get_example_text("non"))

#>>> [word.definition[:10] for word in cltk_doc.words][:5]  # TODO check this
#['', '(-s, -ar),', '', 'adv.\n\n1. th', '']

description = 'Dictionary lookup process for Old Norse'
language: str = 'non'
algorithm