8.1.7. cltk.languages package

Init for cltk.languages.

8.1.7.1. Submodules

8.1.7.2. cltk.languages.example_texts module

Example paragraphs of text to be reused within the codebase for testing or demonstrating code.

TODO: Get longer Akkadian text

>>> from cltk.languages.example_texts import get_example_text
>>> get_example_text("grc")[:66]
'ὅτι μὲν ὑμεῖς, ὦ ἄνδρες Ἀθηναῖοι, πεπόνθατε ὑπὸ τῶν ἐμῶν κατηγόρων'
>>> get_example_text("lat")[:67]
'Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae'
>>> get_example_text("non")[:50]
'Gylfi konungr réð þar löndum er nú heitir Svíþjóð.'

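cltk.languages.example_texts.get_example_text(iso_code)

Take in an ISO 639-3 language code and return the example text for that language. An unknown code raises UnknownLanguageError; a known code with no example text raises UnimplementedAlgorithmError.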

>>> from cltk.languages.example_texts import get_example_text
>>> get_example_text("got")[:25]
'swa liuhtjai liuhaþ izwar'
>>> get_example_text("zkz")
Traceback (most recent call last):
  ...
cltk.core.exceptions.UnimplementedAlgorithmError: Example text unavailable for ISO 639-3 code 'zkz'.
>>> get_example_text("xxx")
Traceback (most recent call last):
  ...
cltk.core.exceptions.UnknownLanguageError: Unknown ISO language code 'xxx'.

Return type: str
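
The two failure modes in the doctest above can be sketched without the CLTK installed. This is a minimal illustration, not the library's implementation: the exception classes are local stand-ins for those in cltk.core.exceptions, and `KNOWN_ISO_CODES`, `EXAMPLE_TEXTS`, and `lookup_example_text` are hypothetical names introduced here.

```python
class UnknownLanguageError(Exception):
    """Local stand-in for cltk.core.exceptions.UnknownLanguageError."""


class UnimplementedAlgorithmError(Exception):
    """Local stand-in for cltk.core.exceptions.UnimplementedAlgorithmError."""


# Hypothetical set of recognized ISO 639-3 codes (assumption for this sketch).
KNOWN_ISO_CODES = {"grc", "lat", "non", "got", "zkz"}

# Hypothetical text store, holding only excerpts quoted in the doctests above.
EXAMPLE_TEXTS = {
    "lat": "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae",
    "got": "swa liuhtjai liuhaþ izwar",
}


def lookup_example_text(iso_code: str) -> str:
    """Return the example text for ``iso_code`` or raise a specific error."""
    # First check: is the code a recognized language at all?
    if iso_code not in KNOWN_ISO_CODES:
        raise UnknownLanguageError(f"Unknown ISO language code '{iso_code}'.")
    # Second check: the language is known, but a text may not exist yet.
    if iso_code not in EXAMPLE_TEXTS:
        raise UnimplementedAlgorithmError(
            f"Example text unavailable for ISO 639-3 code '{iso_code}'."
        )
    return EXAMPLE_TEXTS[iso_code]
```

Keeping the two checks separate lets a caller distinguish a mistyped code from a language that is recognized but not yet covered.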

8.1.7.3. cltk.languages.glottolog module

Module for mapping ISO 639-3 codes to Glottolog languages and language names. Each key is an ISO code and each value is a Language object containing information from both the Glottolog and ISO data sets. The contents of this module were generated by scripts/make_glottolog_languages.py.

ISO 639-3 is an international standard for language codes with an aim to cover all known natural languages. The extended language coverage was based primarily on the language codes published by SIL International, which is now the registration authority for ISO 639-3. About: https://iso639-3.sil.org/.

Glottolog is a project run by the Max Planck Institute for the Science of Human History. The website contains codes for languages as well as reconstructions of language families. About: http://glottolog.org/. Data of Glottolog 4.0 is published under the following license: https://creativecommons.org/licenses/by/4.0/.

Haspelmath, Martin & Forkel, Robert & Hammarström, Harald. 2019. Glottolog 4.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://glottolog.org, Accessed on 2019-10-02.)

>>> from cltk.languages.utils import get_lang
>>> akkadian = get_lang("akk")
>>> akkadian
Language(name='Akkadian', glottolog_id='akka1240', latitude=33.1, longitude=44.1, family_id='afro1255', parent_id='east2678', level='language', iso_639_3_code='akk', type='a', dates=[])
>>> akkadian.name
'Akkadian'
>>> akkadian.glottolog_id
'akka1240'
>>> akkadian.latitude
33.1
>>> akkadian.longitude
44.1
>>> akkadian.family_id
'afro1255'
>>> akkadian.parent_id
'east2678'
>>> from cltk.languages.glottolog import LANGUAGES
>>> len(LANGUAGES)
219
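
The shape of the mapping can be sketched with a plain dataclass. This is an illustration under stated assumptions, not the CLTK's own definition: the field names and values are copied from the Akkadian repr above, while the type annotations and the one-entry `LANGUAGES` dictionary are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Language:
    """Illustrative stand-in; field names follow the repr shown above."""

    name: str
    glottolog_id: str
    latitude: float
    longitude: float
    family_id: str
    parent_id: str
    level: str
    iso_639_3_code: str
    type: str
    # Element type of ``dates`` is not shown in the source, so it is left open.
    dates: list = field(default_factory=list)


# One-entry mapping from ISO 639-3 code to Language (the real one is generated).
LANGUAGES: Dict[str, Language] = {
    "akk": Language(
        name="Akkadian",
        glottolog_id="akka1240",
        latitude=33.1,
        longitude=44.1,
        family_id="afro1255",
        parent_id="east2678",
        level="language",
        iso_639_3_code="akk",
        type="a",
    ),
}

akkadian = LANGUAGES["akk"]
print(akkadian.name, akkadian.glottolog_id)
```

Because the key repeats the value's `iso_639_3_code`, a lookup by code is a single dictionary access.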

cltk.languages.glottolog._resort_languages_list(languages_list)

Take the LANGUAGES global and return it sorted alphabetically by each language’s common name.

>>> from cltk.languages.glottolog import LANGUAGES, _resort_languages_list
>>> iso_dict_keys = _resort_languages_list(LANGUAGES)
>>> list(iso_dict_keys)[:10]
['xae', 'xag', 'akk', 'xln', 'grc', 'hbo', 'xlg', 'xmk', 'xna', 'xzp']

Return type: OrderedDict[str, Language]
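
Sorting a code-keyed mapping by language name could look like the following. It is a sketch of the general technique only; the library's actual implementation may differ. `Lang` and `resort_by_name` are hypothetical names, and the three sample entries are chosen for illustration.

```python
from collections import OrderedDict
from typing import Dict, NamedTuple


class Lang(NamedTuple):
    """Minimal stand-in carrying only the common name."""

    name: str


def resort_by_name(languages: Dict[str, Lang]) -> "OrderedDict[str, Lang]":
    """Return a new mapping whose keys are ordered by each value's name."""
    # Sort the (code, language) pairs on the language name, not on the code.
    pairs = sorted(languages.items(), key=lambda pair: pair[1].name)
    return OrderedDict(pairs)


sample = {
    "lat": Lang("Latin"),
    "grc": Lang("Ancient Greek"),
    "akk": Lang("Akkadian"),
}
print(list(resort_by_name(sample)))  # ['akk', 'grc', 'lat']
```

This ordering explains why the keys in the doctest above are not alphabetical themselves: 'grc' sorts under "Ancient Greek", not under "g".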

8.1.7.4. cltk.languages.pipelines module

Default processing pipelines for languages. The purpose of these dataclasses is to represent:

the types of NLP processes that the CLTK can perform
the order in which processes are to be executed
the downstream features that a particular implemented process requires

class cltk.languages.pipelines.AkkadianPipeline(description='Pipeline for the Akkadian language.', processes=<factory>, language=Language(name='Akkadian', glottolog_id='akka1240', latitude=33.1, longitude=44.1, family_id='afro1255', parent_id='east2678', level='language', iso_639_3_code='akk', type='a', dates=[]))
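
The idea of a pipeline dataclass holding an ordered list of processes can be sketched as follows. Everything here is hypothetical and only mirrors the three fields visible in the signature above (description, processes, language): `ToyPipeline`, its `run` method, and the two toy processes are assumptions, and the language is reduced to an ISO code string instead of a Language object.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List


def tokenize(text: str) -> List[str]:
    """Toy process: split a string into whitespace-delimited tokens."""
    return text.split()


def lowercase(tokens: List[str]) -> List[str]:
    """Toy process: requires token output from an earlier process."""
    return [token.lower() for token in tokens]


@dataclass
class ToyPipeline:
    """Hypothetical pipeline: an ordered list of processes for one language."""

    description: str
    # default_factory gives each instance its own list, matching <factory>.
    processes: List[Callable[[Any], Any]] = field(default_factory=list)
    language: str = "lat"  # assumption: ISO code in place of a Language object

    def run(self, data: Any) -> Any:
        """Apply each process in list order, feeding output to the next."""
        for process in self.processes:
            data = process(data)
        return data


pipeline = ToyPipeline(
    description="Toy pipeline for Latin.",
    processes=[tokenize, lowercase],
)
result = pipeline.run("Gallia est omnis divisa")
print(result)  # ['gallia', 'est', 'omnis', 'divisa']
```

Here `lowercase` depends on `tokenize` having run first, which is the kind of ordering constraint the list of processes is meant to make explicit.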

8.1.7.5. cltk.languages.utils module