8.1.18. cltk.text package
8.1.18.1. Submodules
8.1.18.2. cltk.text.akk module
cltk.text.akk._convert_consonant(sign)
Uses a dictionary to replace ATF consonant conventions with the corresponding Unicode characters.
>>> signs = ["as,", "S,ATU", "tet,", "T,et", "sza", "ASZ"]
>>> [_convert_consonant(s) for s in signs]
['aṣ', 'ṢATU', 'teṭ', 'Ṭet', 'ša', 'AŠ']
Return type: str
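A minimal sketch of dictionary-based consonant replacement. The table ATF_CONSONANTS and the function convert_consonants are hypothetical names; the mappings come from the list under ATFConverter, but CLTK's own dictionary and implementation may differ. With this table, the doctest inputs above give the same outputs.

```python
# Hypothetical ATF-to-Unicode table (not CLTK's internal dictionary).
# Escapes are used so the precomposed characters are unambiguous.
ATF_CONSONANTS = {
    "sz": "\u0161",  # š
    "SZ": "\u0160",  # Š
    "s,": "\u1e63",  # ṣ
    "S,": "\u1e62",  # Ṣ
    "t,": "\u1e6d",  # ṭ
    "T,": "\u1e6c",  # Ṭ
    "'": "\u02be",   # ʾ (modifier letter right half ring)
}


def convert_consonants(sign: str) -> str:
    """Replace every ATF consonant sequence in `sign` with its Unicode form."""
    for atf, unicode_char in ATF_CONSONANTS.items():
        sign = sign.replace(atf, unicode_char)
    return sign


print([convert_consonants(s) for s in ["as,", "S,ATU", "sza", "ASZ"]])
```

Mixed-case spellings such as "Sz" are not covered by this table; a fuller version would need extra entries or case folding.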
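cltk.text.akk._convert_number_to_subscript(num)
Converts a number into subscript digits. The example below shows the numbers that _get_number_from_sign extracts from signs, which are the kind of input this function receives:
>>> signs = ["a", "a1", "be2", "bad3", "buru14"]
>>> [_get_number_from_sign(s)[1] for s in signs]
[0, 1, 2, 3, 14]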
Return type: str
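One way to produce subscript digits, sketched under the assumption that the target is the Unicode subscript block (U+2080 to U+2089). The function name to_subscript is hypothetical and is not CLTK's implementation.

```python
def to_subscript(num: int) -> str:
    """Render a non-negative integer with Unicode subscript digits."""
    # SUBSCRIPT ZERO (U+2080) through SUBSCRIPT NINE (U+2089) are contiguous,
    # so each decimal digit maps to a fixed offset from U+2080.
    return "".join(chr(0x2080 + int(digit)) for digit in str(num))


print(to_subscript(14))  # ₁₄
```

Because the subscript digits are contiguous code points, no lookup table is needed; negative numbers are out of scope for this sketch.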
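cltk.text.akk._get_number_from_sign(sign)
Captures the number that follows a sign, for use by __convert_num__.
input = ["a", "a1", "be2", "bad3", "buru14"]
output = [0, 1, 2, 3, 14]
The output lists the integer part of each result; a sign without a trailing number yields 0.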
Parameters: sign (str) – a transliterated sign, optionally ending in a number
Return type: Tuple[str, int]
Returns: a string and the trailing number as an integer (0 if there is none)
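A sketch of trailing-number extraction with a regular expression. The name split_sign_number is hypothetical, and unlike the reference (which only documents "string, integer"), this version assumes the string part should be the sign with its number removed; CLTK's string component may differ.

```python
import re
from typing import Tuple

# One or more digits anchored at the end of the sign.
_TRAILING_NUMBER = re.compile(r"(\d+)$")


def split_sign_number(sign: str) -> Tuple[str, int]:
    """Split a sign such as 'buru14' into ('buru', 14); no number gives 0."""
    match = _TRAILING_NUMBER.search(sign)
    if match is None:
        return sign, 0
    return sign[: match.start()], int(match.group(1))


signs = ["a", "a1", "be2", "bad3", "buru14"]
print([split_sign_number(s)[1] for s in signs])  # [0, 1, 2, 3, 14]
```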
class cltk.text.akk.ATFConverter(two_three=True)
Bases: object
Converts tokens to Unicode.
Transliterates ATF data from CDLI into readable Unicode:
sz = š
s, = ṣ
t, = ṭ
' = ʾ
Sign values 2 and 3 take the acute and grave accent conventions respectively (e.g. ú, ù); otherwise, sign values are printed as subscripts.
For in-depth reading on ATF formatting for CDLI and ORACC:
Oracc ATF Primer = http://oracc.museum.upenn.edu/doc/help/editinginatf/primer/index.html
ATF Structure = http://oracc.museum.upenn.edu/doc/help/editinginatf/primer/structuretutorial/index.html
ATF Inline = http://oracc.museum.upenn.edu/doc/help/editinginatf/primer/inlinetutorial/index.html
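A sketch of the sign-value rule described for ATFConverter: index 2 gets an acute accent, index 3 a grave accent, and other indices become subscripts. The function render_sign is hypothetical, and two assumptions are labeled in the code: the accent goes on the sign's first vowel, and index 0 means "no number" (as in _get_number_from_sign). This is not CLTK's implementation.

```python
import unicodedata

ACUTE = "\u0301"  # COMBINING ACUTE ACCENT, used for index 2
GRAVE = "\u0300"  # COMBINING GRAVE ACCENT, used for index 3
VOWELS = "aeiuAEIU"


def render_sign(base: str, index: int, two_three: bool = True) -> str:
    """Attach a sign-value index to `base` as an accent or as subscripts."""
    if index == 0:
        return base  # assumption: 0 means the sign carries no number
    if two_three and index in (2, 3):
        mark = ACUTE if index == 2 else GRAVE
        for i, char in enumerate(base):
            if char in VOWELS:
                # Assumption: the accent sits on the first vowel. NFC composes
                # vowel + combining mark into one precomposed character.
                return unicodedata.normalize("NFC", base[: i + 1] + mark + base[i + 1 :])
    # Every other index (or a base with no vowel) falls back to subscripts.
    return base + "".join(chr(0x2080 + int(d)) for d in str(index))


print(render_sign("u", 2))                   # ú
print(render_sign("bad", 3))                 # bàd
print(render_sign("buru", 14))               # buru₁₄
print(render_sign("be", 2, two_three=False)) # be₂
```

Normalizing to NFC keeps the output comparable with text that already uses precomposed letters such as ú and ù.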
8.1.18.3. cltk.text.lat module
Functions for replacing j/J and v/V with i/I and u/U.
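A minimal sketch of the j/v normalization using a single translation table. The names JV_TABLE and replace_jv are hypothetical; the module's actual function names are not listed here.

```python
# Translation table: j -> i, J -> I, v -> u, V -> U.
JV_TABLE = str.maketrans("jJvV", "iIuU")


def replace_jv(text: str) -> str:
    """Normalize Latin j/v spellings to i/u, preserving case."""
    return text.translate(JV_TABLE)


print(replace_jv("Julius vivit"))  # Iulius uiuit
```

A translation table handles all four letters in one pass, so the replacements cannot interfere with one another.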
8.1.18.4. cltk.text.non module
Code for removing punctuation from Old Norse text.
8.1.18.5. cltk.text.processes module
class cltk.text.processes.PunctuationRemovalProcess(language: str = None)
Bases: cltk.core.data_types.Process
class cltk.text.processes.DefaultPunctuationRemovalProcess(language: str = None)
Bases: cltk.text.processes.PunctuationRemovalProcess
description = 'Default punctuation removal algorithm'
algorithm
class cltk.text.processes.OldNorsePunctuationRemovalProcess(language: str = None)
Bases: cltk.text.processes.PunctuationRemovalProcess
description = 'Default Old Norse punctuation removal algorithm'
algorithm
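The three classes above share one shape: a language argument, a description string, and an algorithm attribute that each subclass chooses. The sketch below mirrors only that shape with hypothetical classes; it does not use cltk.core.data_types.Process, whose interface is not shown here, and the Old Norse mark set is an assumption.

```python
import string
from typing import Callable, Optional


def _strip_marks(text: str, marks: str) -> str:
    """Delete every character in `marks` from `text`."""
    return text.translate(str.maketrans("", "", marks))


class PunctuationRemovalSketch:
    """Hypothetical stand-in for the base class; not CLTK's Process API."""

    description = "Base punctuation removal"

    def __init__(self, language: Optional[str] = None) -> None:
        self.language = language

    @property
    def algorithm(self) -> Callable[[str], str]:
        raise NotImplementedError("subclasses choose an algorithm")

    def run(self, text: str) -> str:
        # The base class only dispatches; subclasses decide what to strip.
        return self.algorithm(text)


class DefaultPunctuationRemovalSketch(PunctuationRemovalSketch):
    description = "Default punctuation removal algorithm"

    @property
    def algorithm(self) -> Callable[[str], str]:
        # ASCII punctuation only.
        return lambda text: _strip_marks(text, string.punctuation)


class OldNorsePunctuationRemovalSketch(PunctuationRemovalSketch):
    description = "Default Old Norse punctuation removal algorithm"
    # Assumed mark set: ASCII punctuation plus guillemets and curly quotes.
    MARKS = string.punctuation + "\u00ab\u00bb\u201c\u201d"

    @property
    def algorithm(self) -> Callable[[str], str]:
        return lambda text: _strip_marks(text, self.MARKS)


print(DefaultPunctuationRemovalSketch().run("Hello, world!"))  # Hello world
```

Keeping the algorithm behind an attribute lets a pipeline swap the language-specific behavior without changing how the process is run.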