8.1.18. cltk.text package
8.1.18.1. Submodules
8.1.18.2. cltk.text.akk module
- cltk.text.akk._convert_consonant(sign)
Uses a dictionary to replace ATF conventions with Unicode characters.
>>> signs = ["as,", "S,ATU", "tet,", "T,et", "sza", "ASZ"]
>>> [_convert_consonant(s) for s in signs]
['aṣ', 'ṢATU', 'teṭ', 'Ṭet', 'ša', 'AŠ']
- Return type:
str
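The dictionary-based replacement described above can be sketched as follows. This is a minimal illustration, not the module's own code: the table `ATF_CONSONANTS` and the function `convert_consonants` are hypothetical names, and the mapping is assumed from the doctest output shown above.

```python
# Hypothetical mapping, assumed from the doctest above; the module's
# real table may contain more entries or be organised differently.
ATF_CONSONANTS = {
    "sz": "š", "SZ": "Š",
    "s,": "ṣ", "S,": "Ṣ",
    "t,": "ṭ", "T,": "Ṭ",
}


def convert_consonants(sign: str) -> str:
    """Replace each ATF digraph in `sign` with its Unicode character."""
    for atf, uni in ATF_CONSONANTS.items():
        # str.replace is case-sensitive, so "sz" and "SZ" need separate entries.
        sign = sign.replace(atf, uni)
    return sign


signs = ["as,", "S,ATU", "tet,", "T,et", "sza", "ASZ"]
converted = [convert_consonants(s) for s in signs]
```

A plain lookup table keeps the conventions in one place, so adding a new digraph only requires a new dictionary entry.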
- cltk.text.akk._convert_number_to_subscript(num)
Converts a number into subscript digits.
>>> signs = ["a", "a1", "be2", "bad3", "buru14"]
>>> [_get_number_from_sign(s)[1] for s in signs]
[0, 1, 2, 3, 14]
- Return type:
str
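One way to turn a number into subscript digits is to rely on the contiguous Unicode block of subscript digits starting at U+2080. The sketch below assumes a non-negative integer, and `to_subscript` is a hypothetical name, not the library function.

```python
def to_subscript(num: int) -> str:
    """Render a non-negative integer with Unicode subscript digits."""
    # U+2080 is SUBSCRIPT ZERO; U+2081..U+2089 follow in order,
    # so each decimal digit maps to 0x2080 plus its value.
    return "".join(chr(0x2080 + int(digit)) for digit in str(num))


subscripts = [to_subscript(n) for n in (2, 3, 14)]
```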
- cltk.text.akk._get_number_from_sign(sign)
Captures the number that follows a sign, for use by __convert_num__.
input = ["a", "a1", "be2", "bad3", "buru14"]
output = [0, 1, 2, 3, 14]
- Parameters:
sign (str) – string
- Return type:
Tuple[str, int]
- Returns:
string, integer
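Splitting a sign into its reading and trailing index could look like the sketch below. `split_sign_index` is a hypothetical name. Returning 0 for a sign without digits matches the documented output above, but returning the sign with its digits stripped as the first element is an assumption about the first element of the tuple.

```python
import re
from typing import Tuple


def split_sign_index(sign: str) -> Tuple[str, int]:
    """Return (sign without trailing digits, trailing number or 0)."""
    match = re.search(r"\d+$", sign)
    if match is None:
        # No index written on the sign: report 0, as in the listed output.
        return sign, 0
    return sign[: match.start()], int(match.group())


signs = ["a", "a1", "be2", "bad3", "buru14"]
indices = [split_sign_index(s)[1] for s in signs]
```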
- class cltk.text.akk.ATFConverter(two_three=True)
Bases:
object
Class to convert tokens to Unicode.
- Transliterates ATF data from CDLI into readable Unicode.
sz = š
s, = ṣ
t, = ṭ
' = ʾ
Sign values 2 and 3 follow the acute-accent and grave-accent conventions; all other sign values are printed as subscripts.
- For in-depth reading on ATF formatting for CDLI and ORACC:
Oracc ATF Primer = http://oracc.museum.upenn.edu/doc/help/editinginatf/primer/index.html
ATF Structure = http://oracc.museum.upenn.edu/doc/help/editinginatf/primer/structuretutorial/index.html
ATF Inline = http://oracc.museum.upenn.edu/doc/help/editinginatf/primer/inlinetutorial/index.html
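The accent-versus-subscript rule for sign values can be illustrated with a small standalone sketch. It does not call ATFConverter. `render_index` is a hypothetical helper, and two points are assumptions: that the accent goes on the first vowel of the sign, and that the `two_three` flag switches between the accent and subscript styles.

```python
import re

# Vowel tables for the two accent conventions (index 2 = acute, 3 = grave).
ACUTE = {"a": "á", "e": "é", "i": "í", "u": "ú",
         "A": "Á", "E": "É", "I": "Í", "U": "Ú"}
GRAVE = {"a": "à", "e": "è", "i": "ì", "u": "ù",
         "A": "À", "E": "È", "I": "Ì", "U": "Ù"}


def render_index(sign: str, two_three: bool = True) -> str:
    """Render a trailing sign index as an accent (2-3) or as a subscript."""
    match = re.search(r"\d+$", sign)
    if match is None:
        return sign  # no index to render
    base, index = sign[: match.start()], int(match.group())
    if two_three and index in (2, 3):
        table = ACUTE if index == 2 else GRAVE
        for pos, char in enumerate(base):
            if char in table:
                # Assumption: the accent is placed on the first vowel.
                return base[:pos] + table[char] + base[pos + 1:]
    # Every other case: append the index as Unicode subscript digits.
    return base + "".join(chr(0x2080 + int(d)) for d in str(index))


accented = [render_index(s) for s in ["a", "geme2", "bad3", "buru14"]]
subscripted = [render_index(s, two_three=False) for s in ["a2", "bad3"]]
```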
8.1.18.3. cltk.text.lat module
Functions for replacing j/J and v/V with i/I and u/U.
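The j/v normalisation amounts to a four-character substitution. A minimal sketch using Python's built-in `str.translate` is shown here; `replace_jv` and `JV_TABLE` are hypothetical names and are not the functions this module exports.

```python
# Translation table for the four characters named above.
JV_TABLE = str.maketrans({"j": "i", "J": "I", "v": "u", "V": "U"})


def replace_jv(text: str) -> str:
    """Replace j/J with i/I and v/V with u/U in a Latin string."""
    return text.translate(JV_TABLE)


normalized = replace_jv("Julius vivit")
```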
8.1.18.4. cltk.text.non module
Code for punctuation removal: Old Norse
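As a rough illustration of what punctuation removal involves, the sketch below drops every character whose Unicode category is punctuation and then collapses whitespace. It is a generic approach and not necessarily the algorithm this module uses for Old Norse; `remove_punctuation` is a hypothetical name.

```python
import unicodedata


def remove_punctuation(text: str) -> str:
    """Replace punctuation characters with spaces, then collapse whitespace."""
    # Unicode general categories starting with "P" cover all punctuation.
    spaced = "".join(
        " " if unicodedata.category(char).startswith("P") else char
        for char in text
    )
    # split() with no argument drops leading, trailing, and repeated whitespace.
    return " ".join(spaced.split())


cleaned = remove_punctuation("Gylfi konungr, réð þar löndum.")
```

Replacing punctuation with a space rather than deleting it keeps words apart when no space follows the mark.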
8.1.18.5. cltk.text.processes module
- class cltk.text.processes.DefaultPunctuationRemovalProcess(language=None)
Bases:
PunctuationRemovalProcess
- description = 'Default punctuation removal algorithm'
- algorithm
- class cltk.text.processes.OldNorsePunctuationRemovalProcess(language=None)
Bases:
PunctuationRemovalProcess
- description = 'Default Old Norse punctuation removal algorithm'
- algorithm