8.1.20. cltk.utils package
Init for cltk.utils.
8.1.20.1. Submodules
8.1.20.2. cltk.utils.feature_extraction module
Helper functions for extracting features from CLTK data structures, especially for the purpose of preparing data for machine learning.
cltk.utils.feature_extraction.cltk_doc_to_features_table(cltk_doc)
Take a CLTK Doc and return a list of lists ready for machine learning.
This expects the default features available for Greek and Latin (word embeddings, morphology, syntax, lemmata).
TODO: Fail gracefully when fewer features are available in the input Doc.
- Return type:
Tuple[List[str], List[List[Union[str, int, float, None]]]]
8.1.20.3. cltk.utils.file_operations module
Miscellaneous file operations used by various parts of the CLTK.
cltk.utils.file_operations.make_cltk_path(*fp_list)
Take an arbitrary number of str arguments (not a list) and return the expanded, absolute path to a user’s (or user-defined) cltk_data dir.
Example:
In [8]: make_cltk_path('greek', 'model', 'greek_models_cltk')
Out[8]: '/Users/kyle/cltk_data/greek/model/greek_models_cltk'
- Parameters:
fp_list (str) – Tokens to join together, beginning from the cltk_root folder.
- Return type:
str
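The path joining described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `make_path_sketch` and its `root` parameter are hypothetical names, and the default root of `~/cltk_data` is assumed from the example output above.

```python
import os


def make_path_sketch(*fp_list: str, root: str = "~/cltk_data") -> str:
    """Hypothetical stand-in: join path tokens beneath an expanded data root."""
    # Expand "~" and normalize so the result is an absolute path.
    base = os.path.abspath(os.path.expanduser(root))
    # Each positional argument becomes one path component under the root.
    return os.path.join(base, *fp_list)


path = make_path_sketch("greek", "model", "greek_models_cltk")
print(path)
```

Taking the tokens as separate positional arguments, rather than as one list, lets `os.path.join` choose the right separator for the platform.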
cltk.utils.file_operations.open_pickle(path)
Open a pickle file and return the loaded object.
- Parameters:
path (str) – File path of the pickle file to be opened.
- Return type:
Any
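A minimal sketch of loading a pickle, using only the standard library. The name `open_pickle_sketch` is hypothetical, and the round trip through a temporary file is only there to make the example self-contained.

```python
import os
import pickle
import tempfile
from typing import Any


def open_pickle_sketch(path: str) -> Any:
    """Hypothetical stand-in: load and return the object stored in a pickle file."""
    # Pickle files are binary, so open in "rb" mode.
    with open(path, "rb") as file_open:
        return pickle.load(file_open)


# Write a small object to a temporary file, then read it back.
data = {"lang": "lat", "tokens": ["arma", "virumque", "cano"]}
with tempfile.NamedTemporaryFile(suffix=".pickle", delete=False) as tmp:
    pickle.dump(data, tmp)
loaded = open_pickle_sketch(tmp.name)
os.unlink(tmp.name)
print(loaded)
```

Unpickling can execute arbitrary code, so a helper like this should only be pointed at files from a trusted source.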
cltk.utils.file_operations.md5(filename)
Given a filename, produce an MD5 hash of the file’s contents.
>>> import tempfile, os
>>> f = tempfile.NamedTemporaryFile(delete=False)
>>> f.write(b'Hello Wirld!')
12
>>> f.close()
>>> md5(f.name)
'997c62b6afe9712cad3baffb49cb8c8a'
>>> os.unlink(f.name)
- Return type:
str
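One common way to hash a file's contents is to read it in fixed-size chunks so that large files are never loaded into memory at once. The sketch below assumes that approach; `md5_sketch` and `chunk_size` are hypothetical names, and the library's function may be implemented differently.

```python
import hashlib
import os
import tempfile


def md5_sketch(filename: str, chunk_size: int = 4096) -> str:
    """Hypothetical stand-in: MD5 hex digest of a file, read chunk by chunk."""
    hash_md5 = hashlib.md5()
    with open(filename, "rb") as file_open:
        # iter() with a sentinel keeps reading until read() returns b"" (end of file).
        for chunk in iter(lambda: file_open.read(chunk_size), b""):
            hash_md5.update(chunk)
    return hash_md5.hexdigest()


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"Hello Wirld!")
digest = md5_sketch(tmp.name)
os.unlink(tmp.name)
print(digest)
```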
8.1.20.4. cltk.utils.utils module
Module for commonly reused classes and functions.
cltk.utils.utils.file_exists(file_path, is_dir=False)
Expand ~/ in the path and check whether a file or dir exists. Optionally check whether it is a dir.
>>> file_exists('~/fake_file')
False
>>> file_exists('~/', is_dir=True)
True
- Return type:
bool
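The check described above could look roughly like this. The sketch assumes that the non-directory case tests for a regular file; `file_exists_sketch` is a hypothetical name, and the temporary directory exists only to make the example reproducible.

```python
import os
import tempfile


def file_exists_sketch(file_path: str, is_dir: bool = False) -> bool:
    """Hypothetical stand-in: expand "~" and test for a file or a directory."""
    expanded = os.path.expanduser(file_path)
    if is_dir:
        return os.path.isdir(expanded)
    # Assumption: without is_dir, only regular files count as existing.
    return os.path.isfile(expanded)


with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = os.path.join(tmp_dir, "example.txt")
    missing = file_exists_sketch(file_path)  # file not created yet
    open(file_path, "w").close()
    found = file_exists_sketch(file_path)
    dir_found = file_exists_sketch(tmp_dir, is_dir=True)
    file_as_dir = file_exists_sketch(file_path, is_dir=True)
print(missing, found, dir_found, file_as_dir)
```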
cltk.utils.utils.reverse_dict(input_dict, ignore_keys=None)
Take a dict and swap its keys and values. The optional ignore_keys parameter lists keys to skip.
>>> ids_lang = dict(anci1242='Ancient Greek', lati1261='Latin', unlabeled=['Ottoman'])
>>> reverse_dict(ids_lang, ignore_keys=['unlabeled'])
{'Ancient Greek': 'anci1242', 'Latin': 'lati1261'}
>>> reverse_dict(dict(anci1242='Ancient Greek', lati1261='Latin'))
{'Ancient Greek': 'anci1242', 'Latin': 'lati1261'}
>>> reverse_dict(ids_lang)
Traceback (most recent call last):
  ...
TypeError: This function can only convert type str value to a key. Received value type `<class 'list'>` for key `unlabeled` instead. Consider using `ignore_keys` for this key-value pair to be skipped.
>>> reverse_dict(ids_lang, ignore_keys='unlabeled')
Traceback (most recent call last):
  ...
TypeError: The `ignore_key` parameter must be either types None or list. Received type `<class 'str'>` instead.
>>> reverse_dict(ids_lang, ignore_keys=['UNUSED-KEY'])
Traceback (most recent call last):
  ...
TypeError: This function can only convert type str value to a key. Received value type `<class 'list'>` for key `unlabeled` instead. Consider using `ignore_keys` for this key-value pair to be skipped.
- Return type:
Dict[str, str]
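The behavior shown in the doctests above can be approximated with a short loop. This is a sketch under the assumption that validation happens in the order the doctests suggest; `reverse_dict_sketch` is a hypothetical name and its error messages are shortened, not the library's.

```python
from typing import Any, Dict, List, Optional


def reverse_dict_sketch(
    input_dict: Dict[str, Any], ignore_keys: Optional[List[str]] = None
) -> Dict[str, str]:
    """Hypothetical stand-in: swap keys and values, skipping any ignored keys."""
    if ignore_keys is not None and not isinstance(ignore_keys, list):
        raise TypeError("ignore_keys must be None or a list")
    skip = set(ignore_keys or [])
    output: Dict[str, str] = {}
    for key, val in input_dict.items():
        if key in skip:
            continue
        # Only str values can serve as keys in the reversed dict here.
        if not isinstance(val, str):
            raise TypeError(f"value for key `{key}` is {type(val)}, expected str")
        output[val] = key
    return output


ids_lang = dict(anci1242="Ancient Greek", lati1261="Latin", unlabeled=["Ottoman"])
reversed_ids = reverse_dict_sketch(ids_lang, ignore_keys=["unlabeled"])
print(reversed_ids)

try:
    reverse_dict_sketch(ids_lang)  # the list value is not ignored, so this raises
except TypeError as error:
    error_message = str(error)
    print(error_message)
```

If two keys share the same value, the later key overwrites the earlier one in this sketch.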
cltk.utils.utils.suppress_stdout()
Use this as a context manager (in a with block) to suppress printing to the screen by the code run inside it.
Source: https://thesmithfam.org/blog/2012/10/25/temporarily-suppress-console-output-in-python/.
>>> print("You can see this")
You can see this
>>> with suppress_stdout():
...     print("YY")
>>> print("And you can see this again")
And you can see this again
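A context manager of this kind can be sketched by temporarily pointing `sys.stdout` at the null device. The name `suppress_stdout_sketch` is hypothetical, and the `io.StringIO` buffer is used only to show which lines get through.

```python
import contextlib
import io
import os
import sys


@contextlib.contextmanager
def suppress_stdout_sketch():
    """Hypothetical stand-in: discard anything printed inside the with block."""
    with open(os.devnull, "w") as devnull:
        old_stdout = sys.stdout
        sys.stdout = devnull
        try:
            yield
        finally:
            # Restore the previous stream even if the block raises.
            sys.stdout = old_stdout


# Capture output in a buffer to see which prints reach stdout.
buffer = io.StringIO()
with contextlib.redirect_stdout(buffer):
    print("visible")
    with suppress_stdout_sketch():
        print("hidden")
    print("visible again")
captured = buffer.getvalue()
sys.stdout.write(captured)
```

Replacing `sys.stdout` only affects Python-level writes; output written directly to the underlying file descriptor (for example by a C extension) would not be suppressed by this approach.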
cltk.utils.utils.get_cltk_data_dir()
Define where to look for the cltk_data dir. By default, this is located in a user’s home directory and the directory is created there (~/cltk_data). However, a user may customize where this goes with the OS environment variable $CLTK_DATA. If the variable is found, then its value is used.
>>> from cltk.utils import CLTK_DATA_DIR
>>> import os
>>> os.environ["CLTK_DATA"] = os.path.expanduser("~/cltk_data")
>>> cltk_data_dir = get_cltk_data_dir()
>>> os.path.split(cltk_data_dir)[1]
'cltk_data'
>>> del os.environ["CLTK_DATA"]
>>> os.environ["CLTK_DATA"] = os.path.expanduser("~/custom_dir")
>>> cltk_data_dir = os.environ.get("CLTK_DATA")
>>> os.path.split(cltk_data_dir)[1]
'custom_dir'
>>> del os.environ["CLTK_DATA"]
- Return type:
str
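The lookup order described above (environment variable first, home directory otherwise) can be sketched as below. `get_data_dir_sketch` and its parameters are hypothetical names. Unlike the documented function, this sketch only computes the path and does not create the directory.

```python
import os


def get_data_dir_sketch(env_var: str = "CLTK_DATA", default: str = "~/cltk_data") -> str:
    """Hypothetical stand-in: prefer the environment variable, else the home default."""
    custom = os.environ.get(env_var)
    if custom:
        return os.path.expanduser(custom)
    return os.path.expanduser(default)


# Remember any existing setting so the demonstration leaves the environment unchanged.
previous = os.environ.pop("CLTK_DATA", None)

os.environ["CLTK_DATA"] = os.path.join(os.path.expanduser("~"), "custom_dir")
custom_dir = get_data_dir_sketch()
del os.environ["CLTK_DATA"]
default_dir = get_data_dir_sketch()

if previous is not None:
    os.environ["CLTK_DATA"] = previous
print(custom_dir, default_dir)
```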
cltk.utils.utils.query_yes_no(question, default='yes')
Ask a yes/no question via input() and return True/False.
Source: https://stackoverflow.com/a/3041990.
- Parameters:
question (str) – Question string presented to the user.
default (Optional[str]) – Presumed answer if the user just hits <Enter>. It must be “yes” (the default), “no”, or None (meaning an answer is required of the user).
- Return type:
bool
- Returns:
True for “yes” or False for “no”.
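A prompt loop matching the parameter description above might look like the following sketch. `query_yes_no_sketch` is a hypothetical name, and the `ask` parameter is an addition for illustration only: it lets the example run with scripted answers instead of waiting on `input()`. The accepted spellings ("y", "n") are also an assumption.

```python
from typing import Callable, Optional


def query_yes_no_sketch(
    question: str,
    default: Optional[str] = "yes",
    ask: Callable[[str], str] = input,  # hypothetical hook, not in the documented signature
) -> bool:
    """Hypothetical stand-in: keep asking until the reply is a recognizable yes or no."""
    valid = {"yes": True, "y": True, "no": False, "n": False}
    prompts = {None: " [y/n] ", "yes": " [Y/n] ", "no": " [y/N] "}
    if default not in prompts:
        raise ValueError(f"Invalid default answer: '{default}'")
    while True:
        choice = ask(question + prompts[default]).strip().lower()
        # An empty reply selects the default, if one was given.
        if choice == "" and default is not None:
            return valid[default]
        if choice in valid:
            return valid[choice]
        print("Please respond with 'yes' or 'no' (or 'y' or 'n').")


# Scripted replies: "" falls back to the default; "maybe" is rejected, then "n" is accepted.
answers = iter(["", "maybe", "n"])
first = query_yes_no_sketch("Download model?", ask=lambda prompt: next(answers))
second = query_yes_no_sketch("Download model?", default=None, ask=lambda prompt: next(answers))
print(first, second)
```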
cltk.utils.utils.mk_dirs_for_file(file_path)
Make all dirs specified for the final file. If a dir already exists, then silently continue.
- Parameters:
file_path (str) – Path of the file whose dirs are to be created (i.e., mkdir -p).
- Return type:
None
- Returns:
None
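The `mkdir -p` behavior described above maps onto `os.makedirs` with `exist_ok=True`. The sketch below assumes the directories are taken from the file path's parent; `mk_dirs_for_file_sketch` is a hypothetical name.

```python
import os
import tempfile


def mk_dirs_for_file_sketch(file_path: str) -> None:
    """Hypothetical stand-in: create every parent directory of file_path."""
    dirs = os.path.dirname(file_path)
    if dirs:
        # exist_ok=True makes repeated calls a silent no-op, like mkdir -p.
        os.makedirs(dirs, exist_ok=True)


with tempfile.TemporaryDirectory() as tmp_dir:
    target = os.path.join(tmp_dir, "lat", "model", "vectors.bin")
    mk_dirs_for_file_sketch(target)
    mk_dirs_for_file_sketch(target)  # second call finds the dirs and continues
    dirs_created = os.path.isdir(os.path.dirname(target))
    file_created = os.path.exists(target)
print(dirs_created, file_created)
```

Only the directories are created; the file itself is left for the caller to write.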
cltk.utils.utils.get_file_with_progress_bar(model_url, file_path)
Download a file with a progress bar.
Source: https://stackoverflow.com/a/37573701
- Parameters:
model_url (str) – URL from which to download the file.
file_path (str) – Location at which to save the file.
- Raises:
IOError – If the size of the downloaded file differs from that in the remote’s content-length header.
- Return type:
None
- Returns:
None
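The size check behind the IOError above can be illustrated without a network connection. In this sketch, `save_chunks_checked` is a hypothetical helper: `chunks` stands in for the streamed response body and `expected_size` for the value of the content-length header. The progress bar and the HTTP request itself are deliberately left out.

```python
import os
import tempfile
from typing import Iterable


def save_chunks_checked(chunks: Iterable[bytes], expected_size: int, file_path: str) -> None:
    """Hypothetical helper: write chunks to disk, then compare byte counts."""
    written = 0
    with open(file_path, "wb") as file_open:
        for chunk in chunks:
            file_open.write(chunk)
            written += len(chunk)  # a progress bar would be advanced here
    if written != expected_size:
        raise IOError(f"Expected {expected_size} bytes but wrote {written}.")


with tempfile.TemporaryDirectory() as tmp_dir:
    out_path = os.path.join(tmp_dir, "model.bin")
    # Sizes agree: 3 + 2 bytes written, 5 expected.
    save_chunks_checked([b"abc", b"de"], expected_size=5, file_path=out_path)
    saved_size = os.path.getsize(out_path)
    # Sizes disagree: a truncated transfer is reported as an IOError.
    try:
        save_chunks_checked([b"abc"], expected_size=5, file_path=out_path)
        mismatch_raised = False
    except IOError:
        mismatch_raised = True
print(saved_size, mismatch_raised)
```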