8.1.7. cltk.languages package
Init for cltk.languages.
8.1.7.1. Submodules
8.1.7.2. cltk.languages.example_texts module
Example paragraphs of text to be reused within the codebase for testing or demonstrating code.
TODO: Get longer Akkadian text
>>> from cltk.languages.example_texts import get_example_text
>>> get_example_text("grc")[:66]
'ὅτι μὲν ὑμεῖς, ὦ ἄνδρες Ἀθηναῖοι, πεπόνθατε ὑπὸ τῶν ἐμῶν κατηγόρων'
>>> get_example_text("lat")[:67]
'Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae'
>>> get_example_text("non")[:50]
'Gylfi konungr réð þar löndum er nú heitir Svíþjóð.'
- cltk.languages.example_texts.get_example_text(iso_code)
Take an ISO 639-3 code and return the example text for that language.
>>> from cltk.languages.example_texts import get_example_text
>>> get_example_text("got")[:25]
'swa liuhtjai liuhaþ izwar'
>>> get_example_text("zkz")
Traceback (most recent call last):
  ...
cltk.core.exceptions.UnimplementedAlgorithmError: Example text unavailable for ISO 639-3 code 'zkz'.
>>> get_example_text("xxx")
Traceback (most recent call last):
  ...
cltk.core.exceptions.UnknownLanguageError: Unknown ISO language code 'xxx'.
- Return type:
str
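The two tracebacks in the example suggest that a recognized ISO 639-3 code with no bundled text is reported differently from a code that is not recognized at all. A minimal sketch of that lookup pattern follows; the data tables, the stand-in exception classes, and the name `lookup_example_text` are hypothetical and are not CLTK's actual implementation.

```python
class UnknownLanguageError(Exception):
    """Stand-in for the error raised when an ISO code is not recognized."""


class UnimplementedAlgorithmError(Exception):
    """Stand-in for the error raised when a known language has no example text."""


# Illustrative data only; the real module ships its own language table and texts.
KNOWN_ISO_CODES = {"got", "lat", "zkz"}
EXAMPLE_TEXTS = {
    "lat": "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae",
}


def lookup_example_text(iso_code: str) -> str:
    """Return an example text, keeping the two failure modes distinct."""
    # First failure mode: the code itself is unknown.
    if iso_code not in KNOWN_ISO_CODES:
        raise UnknownLanguageError(f"Unknown ISO language code '{iso_code}'.")
    # Second failure mode: the language is known, but no text is available.
    try:
        return EXAMPLE_TEXTS[iso_code]
    except KeyError:
        raise UnimplementedAlgorithmError(
            f"Example text unavailable for ISO 639-3 code '{iso_code}'."
        ) from None
```

Keeping the two exceptions separate lets a caller tell a typo in the code apart from a language that simply lacks sample data.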
8.1.7.3. cltk.languages.glottolog module
Module for mapping ISO 639-3 to Glottolog languages and language names.
The key is the ISO code, and the value, a Language object, contains information from both the Glottolog and ISO data sets. The contents of this module were generated by scripts/make_glottolog_languages.py.
ISO 639-3 is an international standard for language codes that aims to cover all known natural languages. Its extended language coverage was based primarily on the language codes published by SIL International, which is now the registration authority for ISO 639-3. About: https://iso639-3.sil.org/.
Glottolog is a project run by the Max Planck Institute for the Science of Human History. The website contains codes for languages as well as reconstructions of language families. About: http://glottolog.org/. Data of Glottolog 4.0 is published under the following license: https://creativecommons.org/licenses/by/4.0/.
Haspelmath, Martin & Forkel, Robert & Hammarström, Harald. 2019. Glottolog 4.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://glottolog.org, Accessed on 2019-10-02.)
>>> from cltk.languages.utils import get_lang
>>> akkadian = get_lang("akk")
>>> akkadian
Language(name='Akkadian', glottolog_id='akka1240', latitude=33.1, longitude=44.1, family_id='afro1255', parent_id='east2678', level='language', iso_639_3_code='akk', type='a', dates=[])
>>> akkadian.name
'Akkadian'
>>> akkadian.glottolog_id
'akka1240'
>>> akkadian.latitude
33.1
>>> akkadian.longitude
44.1
>>> akkadian.family_id
'afro1255'
>>> akkadian.parent_id
'east2678'
>>> from cltk.languages.glottolog import LANGUAGES
>>> len(LANGUAGES)
219
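The repr in the example lists the fields that each Language record carries. As a rough illustration of a mapping from ISO code to such a record, the sketch below defines a hypothetical dataclass with the same field names; the field types (in particular the element type of `dates`) and the name `LANGUAGES_SKETCH` are assumptions, not the library's definitions.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Language:
    """Hypothetical mirror of the fields shown in the repr; types are guesses."""

    name: str
    glottolog_id: str
    latitude: float
    longitude: float
    family_id: str
    parent_id: str
    level: str
    iso_639_3_code: str
    type: str
    # A mutable default must come from a factory so instances do not share it.
    dates: list = field(default_factory=list)


# The key is the ISO 639-3 code; the value combines Glottolog and ISO data.
LANGUAGES_SKETCH: Dict[str, Language] = {
    "akk": Language(
        name="Akkadian",
        glottolog_id="akka1240",
        latitude=33.1,
        longitude=44.1,
        family_id="afro1255",
        parent_id="east2678",
        level="language",
        iso_639_3_code="akk",
        type="a",
    ),
}

akkadian = LANGUAGES_SKETCH["akk"]
print(akkadian.name, akkadian.glottolog_id)
```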
- cltk.languages.glottolog._resort_languages_list(languages_list)
Pick up the LANGUAGES global and return it alphabetized by each language’s common name.
>>> from cltk.languages.glottolog import LANGUAGES, _resort_languages_list
>>> iso_dict_keys = _resort_languages_list(LANGUAGES)
>>> list(iso_dict_keys)[:10]
['xae', 'xag', 'akk', 'xln', 'grc', 'hbo', 'xlg', 'xmk', 'xna', 'xzp']
- Return type:
OrderedDict[str, Language]
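Sorting a code-keyed mapping by common name, rather than by key, can be sketched as follows. This is only an illustration under simplifying assumptions: plain name strings stand in for Language objects, and `resort_by_name` and `NAMES_BY_CODE` are hypothetical names.

```python
from collections import OrderedDict

# Illustrative subset: ISO 639-3 code -> common name.
NAMES_BY_CODE = {
    "grc": "Ancient Greek",
    "akk": "Akkadian",
    "lat": "Latin",
    "hbo": "Ancient Hebrew",
}


def resort_by_name(names_by_code):
    """Return an OrderedDict whose entries are ordered by name, not by code."""
    # Each item is a (code, name) pair; sort on the name.
    ordered_items = sorted(names_by_code.items(), key=lambda item: item[1])
    return OrderedDict(ordered_items)


print(list(resort_by_name(NAMES_BY_CODE)))  # ['akk', 'grc', 'hbo', 'lat']
```

The keys stay ISO codes, which is why a list such as the one in the doctest does not look alphabetical at first glance.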
8.1.7.4. cltk.languages.pipelines module
Default processing pipelines for languages. The purpose of these dataclasses is to represent:
- the types of NLP processes that the CLTK can do
- the order in which processes are to be executed
- the downstream features that a particular implemented process requires
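To make the first two points concrete, here is a minimal sketch of a pipeline expressed as a dataclass that holds an ordered list of process classes. Every name in it (`Process`, `TokenizationProcess`, `StopsProcess`, `PipelineSketch`, `ToyPipeline`) is a hypothetical stand-in, and the sketch does not model the `language` field or feature requirements.

```python
from dataclasses import dataclass, field
from typing import List, Type


class Process:
    """Hypothetical base class for one NLP step."""


class TokenizationProcess(Process):
    """Hypothetical step that would split text into tokens."""


class StopsProcess(Process):
    """Hypothetical step that would flag stopwords; expects tokens to exist."""


@dataclass
class PipelineSketch:
    """A description plus the process classes to run, in order."""

    description: str
    processes: List[Type[Process]] = field(default_factory=list)


@dataclass
class ToyPipeline(PipelineSketch):
    """Defaults for an imaginary language: tokenize first, then mark stops."""

    description: str = "Pipeline for a toy language."
    processes: List[Type[Process]] = field(
        default_factory=lambda: [TokenizationProcess, StopsProcess]
    )


a_pipeline = ToyPipeline()
print(a_pipeline.description)
print(a_pipeline.processes[0])
```

Using `default_factory` gives each pipeline instance its own list, so reordering the processes of one instance does not affect another.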
- class cltk.languages.pipelines.AkkadianPipeline(description='Pipeline for the Akkadian language.', processes=<factory>, language=Language(name='Akkadian', glottolog_id='akka1240', latitude=33.1, longitude=44.1, family_id='afro1255', parent_id='east2678', level='language', iso_639_3_code='akk', type='a', dates=[]))
Bases: Pipeline
Default Pipeline for Akkadian.
>>> from cltk.languages.pipelines import AkkadianPipeline
>>> a_pipeline = AkkadianPipeline()
>>> a_pipeline.description
'Pipeline for the Akkadian language.'
>>> a_pipeline.language
Language(name='Akkadian', glottolog_id='akka1240', latitude=33.1, longitude=44.1, family_id='afro1255', parent_id='east2678', level='language', iso_639_3_code='akk', type='a', dates=[])
>>> a_pipeline.language.name
'Akkadian'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.AkkadianTokenizationProcess'>
- description: str = 'Pipeline for the Akkadian language.'
- class cltk.languages.pipelines.ArabicPipeline(description='Pipeline for the Arabic language', processes=<factory>, language=Language(name='Standard Arabic', glottolog_id='stan1318', latitude=27.9625, longitude=43.8525, family_id='afro1255', parent_id='arab1395', level='language', iso_639_3_code='arb', type='', dates=[]))
Bases: Pipeline
Default Pipeline for Arabic.
>>> from cltk.languages.pipelines import ArabicPipeline
>>> a_pipeline = ArabicPipeline()
>>> a_pipeline.description
'Pipeline for the Arabic language'
>>> a_pipeline.language
Language(name='Standard Arabic', glottolog_id='stan1318', latitude=27.9625, longitude=43.8525, family_id='afro1255', parent_id='arab1395', level='language', iso_639_3_code='arb', type='', dates=[])
>>> a_pipeline.language.name
'Standard Arabic'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.ArabicTokenizationProcess'>
- description: str = 'Pipeline for the Arabic language'
- class cltk.languages.pipelines.AramaicPipeline(description='Pipeline for the Aramaic language', processes=<factory>, language=Language(name='Official Aramaic (700-300 BCE)', glottolog_id='', latitude=0.0, longitude=0.0, family_id='', parent_id='', level='', iso_639_3_code='arc', type='a', dates=[]))
Bases: Pipeline
Default Pipeline for Aramaic.
TODO: Confirm with specialist what encodings should be expected.
TODO: Replace ArabicTokenizationProcess with a multilingual one or an Aramaic-specific one.
>>> from cltk.languages.pipelines import AramaicPipeline
>>> a_pipeline = AramaicPipeline()
>>> a_pipeline.description
'Pipeline for the Aramaic language'
>>> a_pipeline.language
Language(name='Official Aramaic (700-300 BCE)', glottolog_id='', latitude=0.0, longitude=0.0, family_id='', parent_id='', level='', iso_639_3_code='arc', type='a', dates=[])
>>> a_pipeline.language.name
'Official Aramaic (700-300 BCE)'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.ArabicTokenizationProcess'>
- description: str = 'Pipeline for the Aramaic language'
- class cltk.languages.pipelines.ChinesePipeline(description='Pipeline for the Classical Chinese language', processes=<factory>, language=Language(name='Literary Chinese', glottolog_id='lite1248', latitude=0.0, longitude=0.0, family_id='sino1245', parent_id='clas1255', level='language', iso_639_3_code='lzh', type='h', dates=[]))
Bases: Pipeline
Default Pipeline for Classical Chinese.
>>> from cltk.languages.pipelines import ChinesePipeline
>>> a_pipeline = ChinesePipeline()
>>> a_pipeline.description
'Pipeline for the Classical Chinese language'
>>> a_pipeline.language
Language(name='Literary Chinese', glottolog_id='lite1248', latitude=0.0, longitude=0.0, family_id='sino1245', parent_id='clas1255', level='language', iso_639_3_code='lzh', type='h', dates=[])
>>> a_pipeline.language.name
'Literary Chinese'
>>> a_pipeline.processes[0]
<class 'cltk.dependency.processes.ChineseStanzaProcess'>
- description: str = 'Pipeline for the Classical Chinese language'
- class cltk.languages.pipelines.CopticPipeline(description='Pipeline for the Coptic language', processes=<factory>, language=Language(name='Coptic', glottolog_id='copt1239', latitude=29.472, longitude=31.2053, family_id='afro1255', parent_id='egyp1245', level='language', iso_639_3_code='cop', type='', dates=[]))
Bases: Pipeline
Default Pipeline for Coptic.
>>> from cltk.languages.pipelines import CopticPipeline
>>> a_pipeline = CopticPipeline()
>>> a_pipeline.description
'Pipeline for the Coptic language'
>>> a_pipeline.language
Language(name='Coptic', glottolog_id='copt1239', latitude=29.472, longitude=31.2053, family_id='afro1255', parent_id='egyp1245', level='language', iso_639_3_code='cop', type='', dates=[])
>>> a_pipeline.language.name
'Coptic'
>>> a_pipeline.processes[0]
<class 'cltk.dependency.processes.CopticStanzaProcess'>
- description: str = 'Pipeline for the Coptic language'
- class cltk.languages.pipelines.GothicPipeline(description='Pipeline for the Gothic language', processes=<factory>, language=Language(name='Gothic', glottolog_id='goth1244', latitude=46.9304, longitude=29.9786, family_id='indo1319', parent_id='east2805', level='language', iso_639_3_code='got', type='a', dates=[]))
Bases: Pipeline
Default Pipeline for Gothic.
>>> from cltk.languages.pipelines import GothicPipeline
>>> a_pipeline = GothicPipeline()
>>> a_pipeline.description
'Pipeline for the Gothic language'
>>> a_pipeline.language
Language(name='Gothic', glottolog_id='goth1244', latitude=46.9304, longitude=29.9786, family_id='indo1319', parent_id='east2805', level='language', iso_639_3_code='got', type='a', dates=[])
>>> a_pipeline.language.name
'Gothic'
>>> a_pipeline.processes[0]
<class 'cltk.dependency.processes.GothicStanzaProcess'>
>>> a_pipeline.processes[1]
<class 'cltk.embeddings.processes.GothicEmbeddingsProcess'>
- description: str = 'Pipeline for the Gothic language'
- class cltk.languages.pipelines.GreekPipeline(description='Pipeline for the Greek language', processes=<factory>, language=Language(name='Ancient Greek', glottolog_id='anci1242', latitude=39.8155, longitude=21.9129, family_id='indo1319', parent_id='east2798', level='language', iso_639_3_code='grc', type='h', dates=[]))
Bases: Pipeline
Default Pipeline for Ancient Greek.
>>> from cltk.languages.pipelines import GreekPipeline
>>> a_pipeline = GreekPipeline()
>>> a_pipeline.description
'Pipeline for the Greek language'
>>> a_pipeline.language
Language(name='Ancient Greek', glottolog_id='anci1242', latitude=39.8155, longitude=21.9129, family_id='indo1319', parent_id='east2798', level='language', iso_639_3_code='grc', type='h', dates=[])
>>> a_pipeline.language.name
'Ancient Greek'
>>> a_pipeline.processes[0]
<class 'cltk.alphabet.processes.GreekNormalizeProcess'>
- description: str = 'Pipeline for the Greek language'
- class cltk.languages.pipelines.HindiPipeline(description='Pipeline for the Hindi language.', processes=<factory>, language=Language(name='Hindi', glottolog_id='hind1269', latitude=25.0, longitude=77.0, family_id='indo1319', parent_id='hind1270', level='language', iso_639_3_code='hin', type='', dates=[]))
Bases: Pipeline
Default Pipeline for Hindi.
>>> from cltk.languages.pipelines import HindiPipeline
>>> a_pipeline = HindiPipeline()
>>> a_pipeline.description
'Pipeline for the Hindi language.'
>>> a_pipeline.language
Language(name='Hindi', glottolog_id='hind1269', latitude=25.0, longitude=77.0, family_id='indo1319', parent_id='hind1270', level='language', iso_639_3_code='hin', type='', dates=[])
>>> a_pipeline.language.name
'Hindi'
>>> a_pipeline.processes[1]
<class 'cltk.stops.processes.StopsProcess'>
- description: str = 'Pipeline for the Hindi language.'
- class cltk.languages.pipelines.LatinPipeline(description='Pipeline for the Latin language', processes=<factory>, language=Language(name='Latin', glottolog_id='lati1261', latitude=41.9026, longitude=12.4502, family_id='indo1319', parent_id='impe1234', level='language', iso_639_3_code='lat', type='a', dates=[]))
Bases: Pipeline
Default Pipeline for Latin.
TODO: Add stopword annotation for all relevant pipelines.
>>> from cltk.languages.pipelines import LatinPipeline
>>> a_pipeline = LatinPipeline()
>>> a_pipeline.description
'Pipeline for the Latin language'
>>> a_pipeline.language
Language(name='Latin', glottolog_id='lati1261', latitude=41.9026, longitude=12.4502, family_id='indo1319', parent_id='impe1234', level='language', iso_639_3_code='lat', type='a', dates=[])
>>> a_pipeline.language.name
'Latin'
>>> a_pipeline.processes[0]
<class 'cltk.alphabet.processes.LatinNormalizeProcess'>
- description: str = 'Pipeline for the Latin language'
- class cltk.languages.pipelines.MiddleHighGermanPipeline(description='Pipeline for the Middle High German language.', processes=<factory>, language=Language(name='Middle High German', glottolog_id='midd1343', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='midd1349', level='language', iso_639_3_code='gmh', type='h', dates=[]))
Bases: Pipeline
Default Pipeline for Middle High German.
>>> from cltk.languages.pipelines import MiddleHighGermanPipeline
>>> a_pipeline = MiddleHighGermanPipeline()
>>> a_pipeline.description
'Pipeline for the Middle High German language.'
>>> a_pipeline.language
Language(name='Middle High German', glottolog_id='midd1343', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='midd1349', level='language', iso_639_3_code='gmh', type='h', dates=[])
>>> a_pipeline.language.name
'Middle High German'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.MiddleHighGermanTokenizationProcess'>
- description: str = 'Pipeline for the Middle High German language.'
- class cltk.languages.pipelines.MiddleEnglishPipeline(description='Pipeline for the Middle English language', processes=<factory>, language=Language(name='Middle English', glottolog_id='midd1317', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='merc1242', level='language', iso_639_3_code='enm', type='h', dates=[]))
Bases: Pipeline
Default Pipeline for Middle English.
TODO: Figure out whether the dedicated tokenizer is good enough or necessary; we have stanza for Old English, which might be able to tokenize fine.
>>> from cltk.languages.pipelines import MiddleEnglishPipeline
>>> a_pipeline = MiddleEnglishPipeline()
>>> a_pipeline.description
'Pipeline for the Middle English language'
>>> a_pipeline.language
Language(name='Middle English', glottolog_id='midd1317', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='merc1242', level='language', iso_639_3_code='enm', type='h', dates=[])
>>> a_pipeline.language.name
'Middle English'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.MiddleEnglishTokenizationProcess'>
>>> from cltk import NLP
>>> middle_english_nlp = NLP(language="enm", suppress_banner=True)
>>> from cltk.languages.example_texts import get_example_text
>>> doc = middle_english_nlp.analyze(get_example_text("enm"))
>>> doc[2].embedding.shape
(50,)
- description: str = 'Pipeline for the Middle English language'
- class cltk.languages.pipelines.MiddleFrenchPipeline(description='Pipeline for the Middle French language', processes=<factory>, language=Language(name='Middle French', glottolog_id='midd1316', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='stan1290', level='dialect', iso_639_3_code='frm', type='h', dates=[]))
Bases: Pipeline
Default Pipeline for Middle French.
TODO: Figure out whether the dedicated tokenizer is good enough or necessary; we have stanza for Old French, which might be able to tokenize fine.
>>> from cltk.languages.pipelines import MiddleFrenchPipeline
>>> a_pipeline = MiddleFrenchPipeline()
>>> a_pipeline.description
'Pipeline for the Middle French language'
>>> a_pipeline.language
Language(name='Middle French', glottolog_id='midd1316', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='stan1290', level='dialect', iso_639_3_code='frm', type='h', dates=[])
>>> a_pipeline.language.name
'Middle French'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.MiddleFrenchTokenizationProcess'>
- description: str = 'Pipeline for the Middle French language'
- class cltk.languages.pipelines.OCSPipeline(description='Pipeline for the Old Church Slavonic language', processes=<factory>, language=Language(name='Church Slavic', glottolog_id='chur1257', latitude=43.7171, longitude=22.8442, family_id='indo1319', parent_id='east2269', level='language', iso_639_3_code='chu', type='a', dates=[]))
Bases: Pipeline
Default Pipeline for Old Church Slavonic.
>>> from cltk.languages.pipelines import OCSPipeline
>>> a_pipeline = OCSPipeline()
>>> a_pipeline.description
'Pipeline for the Old Church Slavonic language'
>>> a_pipeline.language
Language(name='Church Slavic', glottolog_id='chur1257', latitude=43.7171, longitude=22.8442, family_id='indo1319', parent_id='east2269', level='language', iso_639_3_code='chu', type='a', dates=[])
>>> a_pipeline.language.name
'Church Slavic'
>>> a_pipeline.processes[0]
<class 'cltk.dependency.processes.OCSStanzaProcess'>
- description: str = 'Pipeline for the Old Church Slavonic language'
- class cltk.languages.pipelines.OldEnglishPipeline(description='Pipeline for the Old English language', processes=<factory>, language=Language(name='Old English (ca. 450-1100)', glottolog_id='olde1238', latitude=51.06, longitude=-1.31, family_id='indo1319', parent_id='angl1265', level='language', iso_639_3_code='ang', type='h', dates=[]))
Bases: Pipeline
Default Pipeline for Old English.
>>> from cltk.languages.pipelines import OldEnglishPipeline
>>> a_pipeline = OldEnglishPipeline()
>>> a_pipeline.description
'Pipeline for the Old English language'
>>> a_pipeline.language
Language(name='Old English (ca. 450-1100)', glottolog_id='olde1238', latitude=51.06, longitude=-1.31, family_id='indo1319', parent_id='angl1265', level='language', iso_639_3_code='ang', type='h', dates=[])
>>> a_pipeline.language.name
'Old English (ca. 450-1100)'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.MultilingualTokenizationProcess'>
- description: str = 'Pipeline for the Old English language'
- class cltk.languages.pipelines.OldFrenchPipeline(description='Pipeline for the Old French language', processes=<factory>, language=Language(name='Old French (842-ca. 1400)', glottolog_id='oldf1239', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='oila1234', level='language', iso_639_3_code='fro', type='h', dates=[]))
Bases: Pipeline
Default Pipeline for Old French.
>>> from cltk.languages.pipelines import OldFrenchPipeline
>>> a_pipeline = OldFrenchPipeline()
>>> a_pipeline.description
'Pipeline for the Old French language'
>>> a_pipeline.language
Language(name='Old French (842-ca. 1400)', glottolog_id='oldf1239', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='oila1234', level='language', iso_639_3_code='fro', type='h', dates=[])
>>> a_pipeline.language.name
'Old French (842-ca. 1400)'
>>> a_pipeline.processes[0]
<class 'cltk.dependency.processes.OldFrenchStanzaProcess'>
- description: str = 'Pipeline for the Old French language'
- class cltk.languages.pipelines.OldNorsePipeline(description='Pipeline for the Old Norse language', processes=<factory>, language=Language(name='Old Norse', glottolog_id='oldn1244', latitude=63.42, longitude=10.38, family_id='indo1319', parent_id='west2805', level='language', iso_639_3_code='non', type='h', dates=[]))
Bases: Pipeline
Default Pipeline for Old Norse.
>>> from cltk.languages.pipelines import OldNorsePipeline
>>> a_pipeline = OldNorsePipeline()
>>> a_pipeline.description
'Pipeline for the Old Norse language'
>>> a_pipeline.language
Language(name='Old Norse', glottolog_id='oldn1244', latitude=63.42, longitude=10.38, family_id='indo1319', parent_id='west2805', level='language', iso_639_3_code='non', type='h', dates=[])
>>> a_pipeline.language.name
'Old Norse'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.OldNorseTokenizationProcess'>
- description: str = 'Pipeline for the Old Norse language'
- class cltk.languages.pipelines.PaliPipeline(description='Pipeline for the Pali language', processes=<factory>, language=Language(name='Pali', glottolog_id='pali1273', latitude=24.5271, longitude=82.251, family_id='indo1319', parent_id='biha1245', level='language', iso_639_3_code='pli', type='a', dates=[]))
Bases: Pipeline
Default Pipeline for Pali.
TODO: Make better tokenizer for Pali.
>>> from cltk.languages.pipelines import PaliPipeline
>>> a_pipeline = PaliPipeline()
>>> a_pipeline.description
'Pipeline for the Pali language'
>>> a_pipeline.language
Language(name='Pali', glottolog_id='pali1273', latitude=24.5271, longitude=82.251, family_id='indo1319', parent_id='biha1245', level='language', iso_639_3_code='pli', type='a', dates=[])
>>> a_pipeline.language.name
'Pali'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.MultilingualTokenizationProcess'>
- description: str = 'Pipeline for the Pali language'
- class cltk.languages.pipelines.PanjabiPipeline(description='Pipeline for the Panjabi language.', processes=<factory>, language=Language(name='Eastern Panjabi', glottolog_id='panj125', latitude=30.0368, longitude=75.6702, family_id='indo1319', parent_id='east2727', level='language', iso_639_3_code='pan', type='', dates=[]))
Bases: Pipeline
Default Pipeline for Panjabi.
>>> from cltk.languages.pipelines import PanjabiPipeline
>>> a_pipeline = PanjabiPipeline()
>>> a_pipeline.description
'Pipeline for the Panjabi language.'
>>> a_pipeline.language
Language(name='Eastern Panjabi', glottolog_id='panj125', latitude=30.0368, longitude=75.6702, family_id='indo1319', parent_id='east2727', level='language', iso_639_3_code='pan', type='', dates=[])
>>> a_pipeline.language.name
'Eastern Panjabi'
>>> a_pipeline.processes[1]
<class 'cltk.stops.processes.StopsProcess'>
- description: str = 'Pipeline for the Panjabi language.'
- class cltk.languages.pipelines.SanskritPipeline(description='Pipeline for the Sanskrit language.', processes=<factory>, language=Language(name='Sanskrit', glottolog_id='sans1269', latitude=20.0, longitude=77.0, family_id='indo1319', parent_id='indo1321', level='language', iso_639_3_code='san', type='a', dates=[]))
Bases: Pipeline
Default Pipeline for Sanskrit.
TODO: Make better tokenizer for Sanskrit.
>>> from cltk.languages.pipelines import SanskritPipeline
>>> a_pipeline = SanskritPipeline()
>>> a_pipeline.description
'Pipeline for the Sanskrit language.'
>>> a_pipeline.language
Language(name='Sanskrit', glottolog_id='sans1269', latitude=20.0, longitude=77.0, family_id='indo1319', parent_id='indo1321', level='language', iso_639_3_code='san', type='a', dates=[])
>>> a_pipeline.language.name
'Sanskrit'
>>> a_pipeline.processes[1]
<class 'cltk.embeddings.processes.SanskritEmbeddingsProcess'>
- description: str = 'Pipeline for the Sanskrit language.'
8.1.7.5. cltk.languages.utils module
- cltk.languages.utils.get_lang(iso_code)
Take an ISO 639-3 code and return the Language object for that language.
TODO: Split this into another fn, check_language(), which is how it is usually used now.
>>> from cltk.languages.utils import get_lang
>>> get_lang("akk")
Language(name='Akkadian', glottolog_id='akka1240', latitude=33.1, longitude=44.1, family_id='afro1255', parent_id='east2678', level='language', iso_639_3_code='akk', type='a', dates=[])
>>> from cltk.core.exceptions import UnknownLanguageError
>>> get_lang("xxx")
Traceback (most recent call last):
  ...
cltk.core.exceptions.UnknownLanguageError: Unknown ISO language code 'xxx'.
- Return type:
Language
- cltk.languages.utils.find_iso_name(common_name)
Find the ISO 639-3 language code (e.g., lat) by inputting the common name (Latin). This function does simple substring matching, with some normalization of case, on the name field of the Language object.
>>> from cltk.languages.utils import find_iso_name
>>> find_iso_name(common_name="Latin")
['lat']
>>> find_iso_name(common_name="lat")
['xga', 'lat']
>>> find_iso_name(common_name="slav")
['chu']
>>> find_iso_name(common_name="xxx")
[]
- Return type:
List[str]
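Case-insensitive substring matching over language names can be sketched as below. This is an illustration only: the three-entry table and the name `find_codes_by_name` are hypothetical, and the real function searches the name field of the library's own Language objects.

```python
from typing import Dict, List

# Illustrative subset: ISO 639-3 code -> common name.
NAMES_BY_CODE: Dict[str, str] = {
    "xga": "Galatian",
    "lat": "Latin",
    "chu": "Church Slavic",
}


def find_codes_by_name(
    common_name: str, names_by_code: Dict[str, str] = NAMES_BY_CODE
) -> List[str]:
    """Return every code whose name contains the search term, ignoring case."""
    needle = common_name.lower()
    return [
        code
        for code, name in names_by_code.items()
        if needle in name.lower()
    ]


print(find_codes_by_name("Latin"))  # ['lat']
print(find_codes_by_name("lat"))    # ['xga', 'lat']
```

Because the match is on substrings, a short term such as "lat" also hits "Galatian", which is why the function returns a list rather than a single code.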