8.1.7. cltk.languages package
Init for cltk.languages.
8.1.7.1. Submodules
8.1.7.2. cltk.languages.example_texts module
Example paragraphs of text to be reused within the codebase for testing or demonstrating code.
TODO: Get longer Akkadian text
>>> from cltk.languages.example_texts import get_example_text
>>> get_example_text("grc")[:66]
'ὅτι μὲν ὑμεῖς, ὦ ἄνδρες Ἀθηναῖοι, πεπόνθατε ὑπὸ τῶν ἐμῶν κατηγόρων'
>>> get_example_text("lat")[:67]
'Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae'
>>> get_example_text("non")[:50]
'Gylfi konungr réð þar löndum er nú heitir Svíþjóð.'
- cltk.languages.example_texts.get_example_text(iso_code)[source]¶
Take an ISO 639-3 language code and return the example text for that language.
>>> from cltk.languages.example_texts import get_example_text
>>> get_example_text("got")[:25]
'swa liuhtjai liuhaþ izwar'
>>> get_example_text("zkz")
Traceback (most recent call last):
  ...
cltk.core.exceptions.UnimplementedAlgorithmError: Example text unavailable for ISO 639-3 code 'zkz'.
>>> get_example_text("xxx")
Traceback (most recent call last):
  ...
cltk.core.exceptions.UnknownLanguageError: Unknown ISO language code 'xxx'.
- Return type:
str
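The two failure modes in the doctest above (a code that is not a known language, versus a known language with no example text yet) can be sketched with plain dictionaries. This is a minimal stand-alone sketch, not CLTK's implementation: `KNOWN_CODES`, `EXAMPLE_TEXTS`, `lookup_example_text`, and the two exception classes defined here are hypothetical stand-ins that only mirror the names shown above.

```python
# Hypothetical stand-ins; the real data and exceptions live inside the cltk package.
class UnknownLanguageError(Exception):
    """Raised when the code is not a recognised language code."""


class UnimplementedAlgorithmError(Exception):
    """Raised when the language is known but no example text exists."""


# Illustrative data only: "zkz" is treated as known but without a text.
KNOWN_CODES = {"lat", "non", "zkz"}
EXAMPLE_TEXTS = {
    "lat": "Gallia est omnis divisa in partes tres, quarum unam incolunt Belgae",
    "non": "Gylfi konungr réð þar löndum er nú heitir Svíþjóð.",
}


def lookup_example_text(iso_code: str) -> str:
    """Return the example text for ``iso_code`` or raise a specific error."""
    # First check: is this a language code we know about at all?
    if iso_code not in KNOWN_CODES:
        raise UnknownLanguageError(f"Unknown ISO language code '{iso_code}'.")
    # Second check: the language is known, but a text may not exist yet.
    try:
        return EXAMPLE_TEXTS[iso_code]
    except KeyError:
        raise UnimplementedAlgorithmError(
            f"Example text unavailable for ISO 639-3 code '{iso_code}'."
        ) from None
```

Keeping the two checks separate lets callers distinguish a mistyped code from a language that simply has no sample text yet.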
8.1.7.3. cltk.languages.glottolog module
Module for mapping ISO 639-3 to Glottolog languages and language names.
The key is the ISO code and the value, being a Language object, contains
information from both the Glottolog and ISO data sets. The contents of this
module were generated by scripts/make_glottolog_languages.py.
ISO 639-3 is an international standard for language codes that aims to cover all known natural languages. The extended language coverage was based primarily on the language codes published by SIL International, which is now the registration authority for ISO 639-3. About: https://iso639-3.sil.org/.
Glottolog is a project run by the Max Planck Institute for the Science of Human History. The website contains codes for languages as well as reconstructions of language families. About: http://glottolog.org/. Data of Glottolog 4.0 is published under the following license: https://creativecommons.org/licenses/by/4.0/.
Haspelmath, Martin & Forkel, Robert & Hammarström, Harald. 2019. Glottolog 4.0. Jena: Max Planck Institute for the Science of Human History. (Available online at http://glottolog.org, Accessed on 2019-10-02.)
>>> from cltk.languages.utils import get_lang
>>> akkadian = get_lang("akk")
>>> akkadian
Language(name='Akkadian', glottolog_id='akka1240', latitude=33.1, longitude=44.1, family_id='afro1255', parent_id='east2678', level='language', iso_639_3_code='akk', type='a', dates=[])
>>> akkadian.name
'Akkadian'
>>> akkadian.glottolog_id
'akka1240'
>>> akkadian.latitude
33.1
>>> akkadian.longitude
44.1
>>> akkadian.family_id
'afro1255'
>>> akkadian.parent_id
'east2678'
>>> from cltk.languages.glottolog import LANGUAGES
>>> len(LANGUAGES)
219
- cltk.languages.glottolog._resort_languages_list(languages_list)[source]¶
Pick up the LANGUAGES global and return alphabetized according to a language’s common name.
>>> from cltk.languages.glottolog import LANGUAGES, _resort_languages_list
>>> iso_dict_keys = _resort_languages_list(LANGUAGES)
>>> list(iso_dict_keys)[:10]
['xae', 'xag', 'akk', 'xln', 'grc', 'hbo', 'xlg', 'xmk', 'xna', 'xzp']
- Return type:
OrderedDict[str,Language]
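Re-sorting a code-keyed mapping by each language's common name can be sketched as follows. This is an illustration under stated assumptions, not the library's code: `Lang` is a hypothetical, reduced stand-in for the `Language` object (only two of its fields), and `resort_by_name` and `SAMPLE` are made-up names.

```python
from collections import OrderedDict
from dataclasses import dataclass
from typing import Dict


@dataclass
class Lang:
    """Hypothetical, reduced stand-in for CLTK's ``Language`` object."""

    name: str
    iso_639_3_code: str


def resort_by_name(languages: Dict[str, Lang]) -> "OrderedDict[str, Lang]":
    """Return the mapping re-ordered alphabetically by each language's name."""
    # Sort the (code, Lang) pairs on the name field, not on the ISO code key.
    items = sorted(languages.items(), key=lambda pair: pair[1].name)
    return OrderedDict(items)


# Illustrative sample; the real LANGUAGES mapping is far larger.
SAMPLE = {
    "lat": Lang("Latin", "lat"),
    "akk": Lang("Akkadian", "akk"),
    "grc": Lang("Ancient Greek", "grc"),
}

ordered = resort_by_name(SAMPLE)
print(list(ordered))  # ['akk', 'grc', 'lat']
```

Sorting on the name rather than the key is why, in the doctest above, codes such as `grc` (Ancient Greek) appear among entries whose codes start with `x`.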
8.1.7.4. cltk.languages.pipelines module
Default processing pipelines for languages. The purpose of these dataclasses is to represent:
- the types of NLP processes that the CLTK can do
- the order in which processes are to be executed
- which downstream features a particular implemented process requires
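The idea of a dataclass that records an ordered list of processes can be sketched as below. This is a simplified illustration, not CLTK's `PipelineDefault`: `SketchPipeline`, `normalize`, `tokenize`, and `run` are hypothetical names, and plain functions stand in for the process classes that the real pipelines list. The `processes=<factory>` shown in the class signatures in this section is how a dataclass field declared with `default_factory` is displayed.

```python
from dataclasses import dataclass, field
from typing import Callable, List


def normalize(text: str) -> str:
    """Hypothetical first step: collapse runs of whitespace."""
    return " ".join(text.split())


def tokenize(text: str) -> List[str]:
    """Hypothetical second step: split on single spaces (expects normalized text)."""
    return text.split(" ")


@dataclass
class SketchPipeline:
    """Hypothetical pipeline: a description plus an ordered list of steps."""

    description: str
    # default_factory gives every instance its own fresh list.
    processes: List[Callable] = field(default_factory=list)

    def run(self, data):
        # Execute the steps in list order; each step consumes the previous output.
        for process in self.processes:
            data = process(data)
        return data


pipeline = SketchPipeline(
    description="Sketch pipeline: normalize, then tokenize.",
    processes=[normalize, tokenize],
)
print(pipeline.run("Gallia  est omnis   divisa"))  # ['Gallia', 'est', 'omnis', 'divisa']
```

Because `tokenize` assumes single spaces, it only works after `normalize`; the list order encodes that dependency.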
- class cltk.languages.pipelines.AkkadianPipeline(description='Pipeline for the Akkadian language.', processes=<factory>, language=Language(name='Akkadian', glottolog_id='akka1240', latitude=33.1, longitude=44.1, family_id='afro1255', parent_id='east2678', level='language', iso_639_3_code='akk', type='a', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Akkadian.
>>> from cltk.languages.pipelines import AkkadianPipeline
>>> a_pipeline = AkkadianPipeline()
>>> a_pipeline.description
'Pipeline for the Akkadian language.'
>>> a_pipeline.language
Language(name='Akkadian', glottolog_id='akka1240', latitude=33.1, longitude=44.1, family_id='afro1255', parent_id='east2678', level='language', iso_639_3_code='akk', type='a', dates=[])
>>> a_pipeline.language.name
'Akkadian'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.AkkadianTokenizationProcess'>
- description: str = 'Pipeline for the Akkadian language.'¶
- class cltk.languages.pipelines.ArabicPipeline(description='Pipeline for the Arabic language', processes=<factory>, language=Language(name='Standard Arabic', glottolog_id='stan1318', latitude=27.9625, longitude=43.8525, family_id='afro1255', parent_id='arab1395', level='language', iso_639_3_code='arb', type='', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Arabic.
>>> from cltk.languages.pipelines import ArabicPipeline
>>> a_pipeline = ArabicPipeline()
>>> a_pipeline.description
'Pipeline for the Arabic language'
>>> a_pipeline.language
Language(name='Standard Arabic', glottolog_id='stan1318', latitude=27.9625, longitude=43.8525, family_id='afro1255', parent_id='arab1395', level='language', iso_639_3_code='arb', type='', dates=[])
>>> a_pipeline.language.name
'Standard Arabic'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.ArabicTokenizationProcess'>
- description: str = 'Pipeline for the Arabic language'¶
- class cltk.languages.pipelines.AramaicPipeline(description='Pipeline for the Aramaic language', processes=<factory>, language=Language(name='Official Aramaic (700-300 BCE)', glottolog_id='', latitude=0.0, longitude=0.0, family_id='', parent_id='', level='', iso_639_3_code='arc', type='a', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Aramaic.
TODO: Confirm with specialist what encodings should be expected.
TODO: Replace ArabicTokenizationProcess with a multilingual one or an Aramaic-specific one.
>>> from cltk.languages.pipelines import AramaicPipeline
>>> a_pipeline = AramaicPipeline()
>>> a_pipeline.description
'Pipeline for the Aramaic language'
>>> a_pipeline.language
Language(name='Official Aramaic (700-300 BCE)', glottolog_id='', latitude=0.0, longitude=0.0, family_id='', parent_id='', level='', iso_639_3_code='arc', type='a', dates=[])
>>> a_pipeline.language.name
'Official Aramaic (700-300 BCE)'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.ArabicTokenizationProcess'>
- description: str = 'Pipeline for the Aramaic language'¶
- class cltk.languages.pipelines.ChinesePipeline(description='Pipeline for the Classical Chinese language', processes=<factory>, language=Language(name='Literary Chinese', glottolog_id='lite1248', latitude=0.0, longitude=0.0, family_id='sino1245', parent_id='clas1255', level='language', iso_639_3_code='lzh', type='h', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Classical Chinese.
>>> from cltk.languages.pipelines import ChinesePipeline
>>> a_pipeline = ChinesePipeline()
>>> a_pipeline.description
'Pipeline for the Classical Chinese language'
>>> a_pipeline.language
Language(name='Literary Chinese', glottolog_id='lite1248', latitude=0.0, longitude=0.0, family_id='sino1245', parent_id='clas1255', level='language', iso_639_3_code='lzh', type='h', dates=[])
>>> a_pipeline.language.name
'Literary Chinese'
>>> a_pipeline.processes[0]
<class 'cltk.dependency.processes.ChineseStanzaProcess'>
- description: str = 'Pipeline for the Classical Chinese language'¶
- class cltk.languages.pipelines.CopticPipeline(description='Pipeline for the Coptic language', processes=<factory>, language=Language(name='Coptic', glottolog_id='copt1239', latitude=29.472, longitude=31.2053, family_id='afro1255', parent_id='egyp1245', level='language', iso_639_3_code='cop', type='', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Coptic.
>>> from cltk.languages.pipelines import CopticPipeline
>>> a_pipeline = CopticPipeline()
>>> a_pipeline.description
'Pipeline for the Coptic language'
>>> a_pipeline.language
Language(name='Coptic', glottolog_id='copt1239', latitude=29.472, longitude=31.2053, family_id='afro1255', parent_id='egyp1245', level='language', iso_639_3_code='cop', type='', dates=[])
>>> a_pipeline.language.name
'Coptic'
>>> a_pipeline.processes[0]
<class 'cltk.dependency.processes.CopticStanzaProcess'>
- description: str = 'Pipeline for the Coptic language'¶
- class cltk.languages.pipelines.GothicPipeline(description='Pipeline for the Gothic language', processes=<factory>, language=Language(name='Gothic', glottolog_id='goth1244', latitude=46.9304, longitude=29.9786, family_id='indo1319', parent_id='east2805', level='language', iso_639_3_code='got', type='a', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Gothic.
>>> from cltk.languages.pipelines import GothicPipeline
>>> a_pipeline = GothicPipeline()
>>> a_pipeline.description
'Pipeline for the Gothic language'
>>> a_pipeline.language
Language(name='Gothic', glottolog_id='goth1244', latitude=46.9304, longitude=29.9786, family_id='indo1319', parent_id='east2805', level='language', iso_639_3_code='got', type='a', dates=[])
>>> a_pipeline.language.name
'Gothic'
>>> a_pipeline.processes[0]
<class 'cltk.dependency.processes.GothicStanzaProcess'>
>>> a_pipeline.processes[1]
<class 'cltk.embeddings.processes.GothicEmbeddingsProcess'>
- description: str = 'Pipeline for the Gothic language'¶
- class cltk.languages.pipelines.GreekPipeline(description='Pipeline for the Greek language', processes=<factory>, language=Language(name='Ancient Greek', glottolog_id='anci1242', latitude=39.8155, longitude=21.9129, family_id='indo1319', parent_id='east2798', level='language', iso_639_3_code='grc', type='h', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Ancient Greek.
>>> from cltk.languages.pipelines import GreekPipeline
>>> a_pipeline = GreekPipeline()
>>> a_pipeline.description
'Pipeline for the Greek language'
>>> a_pipeline.language
Language(name='Ancient Greek', glottolog_id='anci1242', latitude=39.8155, longitude=21.9129, family_id='indo1319', parent_id='east2798', level='language', iso_639_3_code='grc', type='h', dates=[])
>>> a_pipeline.language.name
'Ancient Greek'
>>> a_pipeline.processes[0]
<class 'cltk.alphabet.processes.GreekNormalizeProcess'>
- description: str = 'Pipeline for the Greek language'¶
- class cltk.languages.pipelines.HindiPipeline(description='Pipeline for the Hindi language.', processes=<factory>, language=Language(name='Hindi', glottolog_id='hind1269', latitude=25.0, longitude=77.0, family_id='indo1319', parent_id='hind1270', level='language', iso_639_3_code='hin', type='', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Hindi.
>>> from cltk.languages.pipelines import HindiPipeline
>>> a_pipeline = HindiPipeline()
>>> a_pipeline.description
'Pipeline for the Hindi language.'
>>> a_pipeline.language
Language(name='Hindi', glottolog_id='hind1269', latitude=25.0, longitude=77.0, family_id='indo1319', parent_id='hind1270', level='language', iso_639_3_code='hin', type='', dates=[])
>>> a_pipeline.language.name
'Hindi'
>>> a_pipeline.processes[1]
<class 'cltk.stops.processes.StopsProcess'>
- description: str = 'Pipeline for the Hindi language.'¶
- class cltk.languages.pipelines.LatinPipeline(description='Pipeline for the Latin language', processes=<factory>, language=Language(name='Latin', glottolog_id='lati1261', latitude=41.9026, longitude=12.4502, family_id='indo1319', parent_id='impe1234', level='language', iso_639_3_code='lat', type='a', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Latin.
TODO: Add stopword annotation for all relevant pipelines.
>>> from cltk.languages.pipelines import LatinPipeline
>>> a_pipeline = LatinPipeline()
>>> a_pipeline.description
'Pipeline for the Latin language'
>>> a_pipeline.language
Language(name='Latin', glottolog_id='lati1261', latitude=41.9026, longitude=12.4502, family_id='indo1319', parent_id='impe1234', level='language', iso_639_3_code='lat', type='a', dates=[])
>>> a_pipeline.language.name
'Latin'
>>> a_pipeline.processes[0]
<class 'cltk.alphabet.processes.LatinNormalizeProcess'>
- description: str = 'Pipeline for the Latin language'¶
- class cltk.languages.pipelines.MiddleHighGermanPipeline(description='Pipeline for the Middle High German language.', processes=<factory>, language=Language(name='Middle High German', glottolog_id='midd1343', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='midd1349', level='language', iso_639_3_code='gmh', type='h', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Middle High German.
>>> from cltk.languages.pipelines import MiddleHighGermanPipeline
>>> a_pipeline = MiddleHighGermanPipeline()
>>> a_pipeline.description
'Pipeline for the Middle High German language.'
>>> a_pipeline.language
Language(name='Middle High German', glottolog_id='midd1343', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='midd1349', level='language', iso_639_3_code='gmh', type='h', dates=[])
>>> a_pipeline.language.name
'Middle High German'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.MiddleHighGermanTokenizationProcess'>
- description: str = 'Pipeline for the Middle High German language.'¶
- class cltk.languages.pipelines.MiddleEnglishPipeline(description='Pipeline for the Middle English language', processes=<factory>, language=Language(name='Middle English', glottolog_id='midd1317', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='merc1242', level='language', iso_639_3_code='enm', type='h', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Middle English.
TODO: Figure out whether the dedicated tokenizer is good enough or necessary; we have Stanza for Old English, which might be able to tokenize fine.
>>> from cltk.languages.pipelines import MiddleEnglishPipeline
>>> a_pipeline = MiddleEnglishPipeline()
>>> a_pipeline.description
'Pipeline for the Middle English language'
>>> a_pipeline.language
Language(name='Middle English', glottolog_id='midd1317', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='merc1242', level='language', iso_639_3_code='enm', type='h', dates=[])
>>> a_pipeline.language.name
'Middle English'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.MiddleEnglishTokenizationProcess'>
>>> from cltk import NLP
>>> middle_english_nlp = NLP(language="enm", suppress_banner=True)
>>> from cltk.languages.example_texts import get_example_text
>>> doc = middle_english_nlp.analyze(get_example_text("enm"))
>>> doc[2].embedding.shape
(50,)
- description: str = 'Pipeline for the Middle English language'¶
- class cltk.languages.pipelines.MiddleFrenchPipeline(description='Pipeline for the Middle French language', processes=<factory>, language=Language(name='Middle French', glottolog_id='midd1316', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='stan1290', level='dialect', iso_639_3_code='frm', type='h', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Middle French.
TODO: Figure out whether the dedicated tokenizer is good enough or necessary; we have Stanza for Old French, which might be able to tokenize fine.
>>> from cltk.languages.pipelines import MiddleFrenchPipeline
>>> a_pipeline = MiddleFrenchPipeline()
>>> a_pipeline.description
'Pipeline for the Middle French language'
>>> a_pipeline.language
Language(name='Middle French', glottolog_id='midd1316', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='stan1290', level='dialect', iso_639_3_code='frm', type='h', dates=[])
>>> a_pipeline.language.name
'Middle French'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.MiddleFrenchTokenizationProcess'>
- description: str = 'Pipeline for the Middle French language'¶
- class cltk.languages.pipelines.OCSPipeline(description='Pipeline for the Old Church Slavonic language', processes=<factory>, language=Language(name='Church Slavic', glottolog_id='chur1257', latitude=43.7171, longitude=22.8442, family_id='indo1319', parent_id='east2269', level='language', iso_639_3_code='chu', type='a', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Old Church Slavonic.
>>> from cltk.languages.pipelines import OCSPipeline
>>> a_pipeline = OCSPipeline()
>>> a_pipeline.description
'Pipeline for the Old Church Slavonic language'
>>> a_pipeline.language
Language(name='Church Slavic', glottolog_id='chur1257', latitude=43.7171, longitude=22.8442, family_id='indo1319', parent_id='east2269', level='language', iso_639_3_code='chu', type='a', dates=[])
>>> a_pipeline.language.name
'Church Slavic'
>>> a_pipeline.processes[0]
<class 'cltk.dependency.processes.OCSStanzaProcess'>
- description: str = 'Pipeline for the Old Church Slavonic language'¶
- class cltk.languages.pipelines.OldEnglishPipeline(description='Pipeline for the Old English language', processes=<factory>, language=Language(name='Old English (ca. 450-1100)', glottolog_id='olde1238', latitude=51.06, longitude=-1.31, family_id='indo1319', parent_id='angl1265', level='language', iso_639_3_code='ang', type='h', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Old English.
>>> from cltk.languages.pipelines import OldEnglishPipeline
>>> a_pipeline = OldEnglishPipeline()
>>> a_pipeline.description
'Pipeline for the Old English language'
>>> a_pipeline.language
Language(name='Old English (ca. 450-1100)', glottolog_id='olde1238', latitude=51.06, longitude=-1.31, family_id='indo1319', parent_id='angl1265', level='language', iso_639_3_code='ang', type='h', dates=[])
>>> a_pipeline.language.name
'Old English (ca. 450-1100)'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.MultilingualTokenizationProcess'>
- description: str = 'Pipeline for the Old English language'¶
- class cltk.languages.pipelines.OldFrenchPipeline(description='Pipeline for the Old French language', processes=<factory>, language=Language(name='Old French (842-ca. 1400)', glottolog_id='oldf1239', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='oila1234', level='language', iso_639_3_code='fro', type='h', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Old French.
>>> from cltk.languages.pipelines import OldFrenchPipeline
>>> a_pipeline = OldFrenchPipeline()
>>> a_pipeline.description
'Pipeline for the Old French language'
>>> a_pipeline.language
Language(name='Old French (842-ca. 1400)', glottolog_id='oldf1239', latitude=0.0, longitude=0.0, family_id='indo1319', parent_id='oila1234', level='language', iso_639_3_code='fro', type='h', dates=[])
>>> a_pipeline.language.name
'Old French (842-ca. 1400)'
>>> a_pipeline.processes[0]
<class 'cltk.dependency.processes.OldFrenchStanzaProcess'>
- description: str = 'Pipeline for the Old French language'¶
- class cltk.languages.pipelines.OldNorsePipeline(description='Pipeline for the Old Norse language', processes=<factory>, language=Language(name='Old Norse', glottolog_id='oldn1244', latitude=63.42, longitude=10.38, family_id='indo1319', parent_id='west2805', level='language', iso_639_3_code='non', type='h', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Old Norse.
>>> from cltk.languages.pipelines import OldNorsePipeline
>>> a_pipeline = OldNorsePipeline()
>>> a_pipeline.description
'Pipeline for the Old Norse language'
>>> a_pipeline.language
Language(name='Old Norse', glottolog_id='oldn1244', latitude=63.42, longitude=10.38, family_id='indo1319', parent_id='west2805', level='language', iso_639_3_code='non', type='h', dates=[])
>>> a_pipeline.language.name
'Old Norse'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.OldNorseTokenizationProcess'>
- description: str = 'Pipeline for the Old Norse language'¶
- class cltk.languages.pipelines.PaliPipeline(description='Pipeline for the Pali language', processes=<factory>, language=Language(name='Pali', glottolog_id='pali1273', latitude=24.5271, longitude=82.251, family_id='indo1319', parent_id='biha1245', level='language', iso_639_3_code='pli', type='a', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Pali.
TODO: Make better tokenizer for Pali.
>>> from cltk.languages.pipelines import PaliPipeline
>>> a_pipeline = PaliPipeline()
>>> a_pipeline.description
'Pipeline for the Pali language'
>>> a_pipeline.language
Language(name='Pali', glottolog_id='pali1273', latitude=24.5271, longitude=82.251, family_id='indo1319', parent_id='biha1245', level='language', iso_639_3_code='pli', type='a', dates=[])
>>> a_pipeline.language.name
'Pali'
>>> a_pipeline.processes[0]
<class 'cltk.tokenizers.processes.MultilingualTokenizationProcess'>
- description: str = 'Pipeline for the Pali language'¶
- class cltk.languages.pipelines.PanjabiPipeline(description='Pipeline for the Panjabi language.', processes=<factory>, language=Language(name='Eastern Panjabi', glottolog_id='panj125', latitude=30.0368, longitude=75.6702, family_id='indo1319', parent_id='east2727', level='language', iso_639_3_code='pan', type='', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Panjabi.
>>> from cltk.languages.pipelines import PanjabiPipeline
>>> a_pipeline = PanjabiPipeline()
>>> a_pipeline.description
'Pipeline for the Panjabi language.'
>>> a_pipeline.language
Language(name='Eastern Panjabi', glottolog_id='panj125', latitude=30.0368, longitude=75.6702, family_id='indo1319', parent_id='east2727', level='language', iso_639_3_code='pan', type='', dates=[])
>>> a_pipeline.language.name
'Eastern Panjabi'
>>> a_pipeline.processes[1]
<class 'cltk.stops.processes.StopsProcess'>
- description: str = 'Pipeline for the Panjabi language.'¶
- class cltk.languages.pipelines.SanskritPipeline(description='Pipeline for the Sanskrit language.', processes=<factory>, language=Language(name='Sanskrit', glottolog_id='sans1269', latitude=20.0, longitude=77.0, family_id='indo1319', parent_id='indo1321', level='language', iso_639_3_code='san', type='a', dates=[]))[source]¶
Bases:
PipelineDefault
Pipeline for Sanskrit.
TODO: Make better tokenizer for Sanskrit.
>>> from cltk.languages.pipelines import SanskritPipeline
>>> a_pipeline = SanskritPipeline()
>>> a_pipeline.description
'Pipeline for the Sanskrit language.'
>>> a_pipeline.language
Language(name='Sanskrit', glottolog_id='sans1269', latitude=20.0, longitude=77.0, family_id='indo1319', parent_id='indo1321', level='language', iso_639_3_code='san', type='a', dates=[])
>>> a_pipeline.language.name
'Sanskrit'
>>> a_pipeline.processes[1]
<class 'cltk.embeddings.processes.SanskritEmbeddingsProcess'>
- description: str = 'Pipeline for the Sanskrit language.'¶
8.1.7.5. cltk.languages.utils module
- cltk.languages.utils.get_lang(iso_code)[source]¶
Take an ISO 639-3 code and return the Language object for that language.
TODO: Split this into another function, check_language(), which is how it is usually used now.
>>> from cltk.languages.utils import get_lang
>>> get_lang("akk")
Language(name='Akkadian', glottolog_id='akka1240', latitude=33.1, longitude=44.1, family_id='afro1255', parent_id='east2678', level='language', iso_639_3_code='akk', type='a', dates=[])
>>> from cltk.core.exceptions import UnknownLanguageError
>>> get_lang("xxx")
Traceback (most recent call last):
  ...
cltk.core.exceptions.UnknownLanguageError: Unknown ISO language code 'xxx'.
- Return type:
Language
- cltk.languages.utils.find_iso_name(common_name)[source]¶
Find the ISO 639-3 language code (e.g., lat) by inputting the common name (Latin). This function just does simple substring matching, with some normalization of case, on the name field of the Language object.
>>> from cltk.languages.utils import find_iso_name
>>> find_iso_name(common_name="Latin")
['lat']
>>> find_iso_name(common_name="lat")
['xga', 'lat']
>>> find_iso_name(common_name="slav")
['chu']
>>> find_iso_name(common_name="xxx")
[]
- Return type:
List[str]
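Case-insensitive substring matching on a name field, as described above, can be sketched as follows. This is an illustration, not the library's implementation: `NAMES` and `find_codes_by_name` are hypothetical, and the sample assumes that `xga` is named "Galatian", which would explain why searching for "lat" also returns it.

```python
from typing import Dict, List

# Illustrative subset of language names keyed by ISO 639-3 code (assumed values).
NAMES: Dict[str, str] = {
    "xga": "Galatian",
    "lat": "Latin",
    "chu": "Church Slavic",
}


def find_codes_by_name(common_name: str, names: Dict[str, str] = NAMES) -> List[str]:
    """Return every code whose name contains ``common_name``, ignoring case."""
    needle = common_name.lower()
    # A plain substring test: "lat" matches both "Latin" and "Galatian".
    return [code for code, name in names.items() if needle in name.lower()]


print(find_codes_by_name("Latin"))  # ['lat']
print(find_codes_by_name("lat"))    # ['xga', 'lat']
print(find_codes_by_name("slav"))   # ['chu']
print(find_codes_by_name("xxx"))    # []
```

Since a substring search can return several codes or none, callers should treat the result as a list of candidates rather than a single answer.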