8.1.20. cltk.utils package

Init for cltk.utils.

8.1.20.1. Submodules

8.1.20.2. cltk.utils.feature_extraction module

Helper functions for extracting features from CLTK data structures, especially for the purpose of preparing data for machine learning.

cltk.utils.feature_extraction.cltk_doc_to_features_table(cltk_doc)

Take a CLTK Doc and return a features table ready for machine learning: a tuple of a list of feature names (the header) and a list of rows, each row being a list of feature values.

This expects the default features available for Greek and Latin (word embeddings, morphology, syntax, lemmata). It should be improved to fail gracefully when fewer features are available in the input Doc.

TODO: Fail gracefully when missing info in Doc.

Return type:

Tuple[List[str], List[List[Union[str, int, float, None]]]]
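
To illustrate the shape of the returned table, here is a minimal sketch that builds a header and rows from plain per-token dicts. It does not read a CLTK Doc; the helper name tokens_to_features_table and the dict-per-token input are hypothetical. Missing features are filled with None, which is one possible way to handle the "fail gracefully" TODO, not necessarily how CLTK will do it.

```python
from typing import Dict, List, Tuple, Union

Cell = Union[str, int, float, None]


def tokens_to_features_table(
    tokens: List[Dict[str, Cell]],
) -> Tuple[List[str], List[List[Cell]]]:
    """Build (header, rows) from per-token feature dicts.

    Hypothetical helper: each token is a dict mapping feature name -> value.
    Features absent from a token become None instead of raising an error.
    """
    # Collect every feature name seen, preserving first-seen order.
    header: List[str] = []
    for token in tokens:
        for name in token:
            if name not in header:
                header.append(name)
    # One row per token, aligned to the header; missing features become None.
    rows = [[token.get(name) for name in header] for token in tokens]
    return header, rows
```

For example, the tokens {"string": "arma", "lemma": "arma"} and {"string": "virum", "pos": "NOUN"} yield the header ["string", "lemma", "pos"] and the rows [["arma", "arma", None], ["virum", None, "NOUN"]].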

8.1.20.3. cltk.utils.file_operations module

Miscellaneous file operations used by various parts of the CLTK.

cltk.utils.file_operations.make_cltk_path(*fp_list)

Take an arbitrary number of str arguments (not a list) and return the expanded, absolute path to them inside a user’s (or user-defined) cltk_data dir.

Example:

>>> make_cltk_path('greek', 'model', 'greek_models_cltk')
'/Users/kyle/cltk_data/greek/model/greek_models_cltk'

Parameters:

fp_list (str) – Path components to join together, beginning from the cltk_data root folder.

Return type:

str
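
A minimal sketch of the same idea with the standard library, assuming the root is $CLTK_DATA when set and ~/cltk_data otherwise (as described for get_cltk_data_dir below). The name make_data_path is hypothetical and this is not the CLTK implementation.

```python
import os


def make_data_path(*fp_list: str) -> str:
    """Join path components under the cltk_data root (hypothetical helper).

    Assumption: the root is $CLTK_DATA if set and non-empty, else ~/cltk_data.
    """
    root = os.environ.get("CLTK_DATA") or os.path.join("~", "cltk_data")
    # Expand "~" and make the root absolute before joining the components.
    root = os.path.abspath(os.path.expanduser(root))
    return os.path.join(root, *fp_list)
```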

cltk.utils.file_operations.open_pickle(path)

Open a pickle file and return the loaded object.

Parameters:

path (str) – File path to the pickle file to be opened.

Return type:

Any
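
A minimal sketch of loading a pickle with the standard library; load_pickle is a hypothetical name, not the CLTK function. Note that unpickling can execute arbitrary code, so only load files from trusted sources.

```python
import pickle
from typing import Any


def load_pickle(path: str) -> Any:
    """Open a pickle file and return the unpickled object (sketch)."""
    # Pickle data is binary, so the file must be opened in "rb" mode.
    with open(path, "rb") as f:
        return pickle.load(f)
```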

cltk.utils.file_operations.md5(filename)

Given a filename, produce an MD5 hash of the file’s contents.

>>> import tempfile, os
>>> f = tempfile.NamedTemporaryFile(delete=False)
>>> f.write(b'Hello Wirld!')
12
>>> f.close()
>>> md5(f.name)
'997c62b6afe9712cad3baffb49cb8c8a'
>>> os.unlink(f.name)

Return type:

str
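
A minimal sketch of hashing a file with hashlib, reading it in chunks so large files need not fit in memory. The name md5_of_file and the chunk_size parameter are hypothetical; the CLTK function may read the file differently.

```python
import hashlib


def md5_of_file(filename: str, chunk_size: int = 65536) -> str:
    """Return the hex MD5 digest of a file's contents (sketch)."""
    digest = hashlib.md5()
    with open(filename, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Chunking does not change the result: the digest is the same as hashing the whole contents at once.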

8.1.20.4. cltk.utils.utils module

Module for commonly reused classes and functions.

class cltk.utils.utils.CLTKEnumMeta(cls, bases, classdict, *, boundary=None, _simple=False, **kwds)

Bases: EnumType

class cltk.utils.utils.CLTKEnum(value, names=None, *, module=None, qualname=None, type=None, start=1, boundary=None)

Bases: IntEnum

cltk.utils.utils.file_exists(file_path, is_dir=False)

Expand ~/ and check whether a file or dir exists. Optionally, check that it is a dir.

>>> file_exists('~/fake_file')
False
>>> file_exists('~/', is_dir=True)
True

Return type:

bool
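
A minimal sketch of the described behavior with os.path; path_exists is a hypothetical name. It assumes that without is_dir any existing path (file or dir) counts, and that is_dir=True additionally requires a directory.

```python
import os


def path_exists(file_path: str, is_dir: bool = False) -> bool:
    """Expand ~ and check whether a path exists (sketch)."""
    expanded = os.path.expanduser(file_path)
    if is_dir:
        # Only a directory satisfies the check when is_dir is requested.
        return os.path.isdir(expanded)
    # Otherwise accept either a file or a directory.
    return os.path.exists(expanded)
```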

cltk.utils.utils.reverse_dict(input_dict, ignore_keys=None)

Take a dict and swap its keys and values. Optionally, skip the keys listed in ignore_keys.

>>> ids_lang = dict(anci1242='Ancient Greek', lati1261='Latin', unlabeled=['Ottoman'])
>>> reverse_dict(ids_lang, ignore_keys=['unlabeled'])
{'Ancient Greek': 'anci1242', 'Latin': 'lati1261'}
>>> reverse_dict(dict(anci1242='Ancient Greek', lati1261='Latin'))
{'Ancient Greek': 'anci1242', 'Latin': 'lati1261'}
>>> reverse_dict(ids_lang)
Traceback (most recent call last):
  ...
TypeError: This function can only convert type str value to a key. Received value type `<class 'list'>` for key `unlabeled` instead. Consider using `ignore_keys` for this key-value pair to be skipped.
>>> reverse_dict(ids_lang, ignore_keys='unlabeled')
Traceback (most recent call last):
  ...
TypeError: The `ignore_key` parameter must be either types None or list. Received type `<class 'str'>` instead.
>>> reverse_dict(ids_lang, ignore_keys=['UNUSED-KEY'])
Traceback (most recent call last):
  ...
TypeError: This function can only convert type str value to a key. Received value type `<class 'list'>` for key `unlabeled` instead. Consider using `ignore_keys` for this key-value pair to be skipped.

Return type:

Dict[str, str]
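
A minimal sketch reproducing the documented behavior (skip ignored keys, reject non-list ignore_keys, reject non-str values). invert_dict is a hypothetical name and its error messages differ from CLTK’s. In this sketch, if two keys share a value, the later key wins.

```python
from typing import Dict, List, Optional


def invert_dict(
    input_dict: Dict[str, object], ignore_keys: Optional[List[str]] = None
) -> Dict[str, str]:
    """Swap keys and values, skipping keys in ignore_keys (sketch)."""
    if ignore_keys is None:
        ignore_keys = []
    elif not isinstance(ignore_keys, list):
        raise TypeError(f"`ignore_keys` must be None or a list; got {type(ignore_keys)}.")
    output: Dict[str, str] = {}
    for key, value in input_dict.items():
        if key in ignore_keys:
            continue
        # Only str values are accepted as the new keys.
        if not isinstance(value, str):
            raise TypeError(f"Value for key `{key}` has type {type(value)}; expected str.")
        output[value] = key
    return output
```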

cltk.utils.utils.suppress_stdout()

Use this as a context manager to suppress anything printed to the screen (stdout) inside its block.

Source: https://thesmithfam.org/blog/2012/10/25/temporarily-suppress-console-output-in-python/.

>>> print("You can see this")
You can see this
>>> with suppress_stdout():
...     print("YY")
>>> print("And you can see this again")
And you can see this again
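
A minimal alternative sketch using the standard library’s contextlib.redirect_stdout, which swaps sys.stdout for a throwaway buffer. This is a different technique from the linked blog post, and quiet_stdout is a hypothetical name. Like any sys.stdout redirection, it does not silence output written directly to file descriptor 1 by C extensions or subprocesses.

```python
import contextlib
import io
from typing import Iterator


@contextlib.contextmanager
def quiet_stdout() -> Iterator[None]:
    """Discard anything printed to sys.stdout inside the block (sketch)."""
    # Prints go into an in-memory buffer that is simply dropped; the previous
    # sys.stdout is restored when the block exits, even if it raises.
    with contextlib.redirect_stdout(io.StringIO()):
        yield
```
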

cltk.utils.utils.get_cltk_data_dir()

Define where to look for the cltk_data dir. By default, it is located in a user’s home directory and is created there (~/cltk_data). However, a user may customize its location with the OS environment variable $CLTK_DATA. If the variable is set, its value is used.

>>> from cltk.utils import CLTK_DATA_DIR
>>> import os
>>> os.environ["CLTK_DATA"] = os.path.expanduser("~/cltk_data")
>>> cltk_data_dir = get_cltk_data_dir()
>>> os.path.split(cltk_data_dir)[1]
'cltk_data'
>>> del os.environ["CLTK_DATA"]
>>> os.environ["CLTK_DATA"] = os.path.expanduser("~/custom_dir")
>>> cltk_data_dir = os.environ.get("CLTK_DATA")
>>> os.path.split(cltk_data_dir)[1]
'custom_dir'
>>> del os.environ["CLTK_DATA"]

Return type:

str
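
A minimal sketch of the lookup order described above ($CLTK_DATA first, then ~/cltk_data). The name resolve_cltk_data_dir and the create flag are hypothetical; the flag is added so the sketch can resolve the path without touching the filesystem.

```python
import os


def resolve_cltk_data_dir(create: bool = False) -> str:
    """Return $CLTK_DATA if set, else ~/cltk_data (sketch)."""
    path = os.environ.get("CLTK_DATA") or os.path.join("~", "cltk_data")
    path = os.path.abspath(os.path.expanduser(path))
    if create:
        # exist_ok=True means an existing directory is not an error.
        os.makedirs(path, exist_ok=True)
    return path
```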

cltk.utils.utils.str_to_bool(string, truths=None)

Convert a string into a boolean (case insensitively).

Parameters:
  • string (str) – String to convert.

  • truths (Optional[List[str]]) – List of strings that count as Truthy; defaults to “yes” and “y”.

Return type:

bool

Returns:

True if string is in truths; otherwise, False. All strings are compared in lowercase, so the comparison is case-insensitive.
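
A minimal sketch of the documented contract; to_bool is a hypothetical name, not the CLTK function.

```python
from typing import List, Optional


def to_bool(string: str, truths: Optional[List[str]] = None) -> bool:
    """Case-insensitive string-to-bool conversion (sketch)."""
    if truths is None:
        truths = ["yes", "y"]
    # Lowercase both sides so "YES", "Yes", and "yes" all match.
    return string.lower() in [t.lower() for t in truths]
```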

cltk.utils.utils.query_yes_no(question, default='yes')

Ask a yes/no question via input() and return True/False.

Source: https://stackoverflow.com/a/3041990.

Parameters:
  • question (str) – Question string presented to the user.

  • default (Optional[str]) – Presumed answer if the user just hits <Enter>. It must be “yes” (the default), “no”, or None (meaning an answer is required of the user).

Return type:

bool

Returns:

True for “yes” or “y”; False for “no” or “n”.
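
A minimal sketch of the prompt loop described above, modeled loosely on the linked Stack Overflow answer. The name ask_yes_no and the input_fn parameter are hypothetical; input_fn is injected so the function can be tested without a terminal.

```python
from typing import Callable, Optional


def ask_yes_no(
    question: str,
    default: Optional[str] = "yes",
    input_fn: Callable[[str], str] = input,
) -> bool:
    """Prompt until a yes/no answer is given (sketch)."""
    valid = {"yes": True, "y": True, "no": False, "n": False}
    prompts = {None: " [y/n] ", "yes": " [Y/n] ", "no": " [y/N] "}
    if default not in prompts:
        raise ValueError(f"Invalid default answer: {default!r}")
    while True:
        choice = input_fn(question + prompts[default]).strip().lower()
        if default is not None and choice == "":
            # A bare <Enter> accepts the default answer.
            return valid[default]
        if choice in valid:
            return valid[choice]
        print("Please respond with 'yes' or 'no' (or 'y' or 'n').")
```

With default=None, an empty answer is rejected and the question is asked again.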

cltk.utils.utils.mk_dirs_for_file(file_path)

Make all dirs needed for the final file. If a dir already exists, silently continue.

Parameters:

file_path (str) – Path of the file whose dirs are to be created (like mkdir -p).

Return type:

None

Returns:

None
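
A minimal sketch of the mkdir -p behavior with os.makedirs; make_parent_dirs is a hypothetical name and assumes the last path component is the file itself, which is not created.

```python
import os


def make_parent_dirs(file_path: str) -> None:
    """Create every missing parent directory of file_path (sketch)."""
    parent = os.path.dirname(file_path)
    # A bare filename has no parent component, so there is nothing to create.
    if parent:
        # exist_ok=True mirrors `mkdir -p`: existing dirs are not an error.
        os.makedirs(parent, exist_ok=True)
```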

cltk.utils.utils.get_file_with_progress_bar(model_url, file_path)

Download file with a progress bar.

Source: https://stackoverflow.com/a/37573701

Parameters:
  • model_url (str) – URL from which to download the file.

  • file_path (str) – Location at which to save file.

Raises:

IOError – If the size of the downloaded file differs from the remote’s Content-Length header.

Return type:

None

Returns:

None
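
A minimal sketch of a streamed download with a size check, using only urllib.request and a plain-text percentage instead of a progress-bar library. The name download_with_progress and the chunk_size parameter are hypothetical; the CLTK function may use different libraries.

```python
import sys
import urllib.request


def download_with_progress(url: str, file_path: str, chunk_size: int = 8192) -> None:
    """Download url to file_path, printing percent progress (sketch)."""
    with urllib.request.urlopen(url) as response, open(file_path, "wb") as out:
        header = response.headers.get("Content-Length")
        expected = int(header) if header is not None else None
        received = 0
        while True:
            chunk = response.read(chunk_size)
            if not chunk:
                break
            out.write(chunk)
            received += len(chunk)
            if expected:
                # "\r" rewrites the same terminal line with the new percentage.
                sys.stdout.write(f"\r{100 * received // expected:3d}%")
                sys.stdout.flush()
    if expected:
        sys.stdout.write("\n")
    # Mirror the documented check: a size mismatch raises IOError.
    if expected is not None and received != expected:
        raise IOError(f"Expected {expected} bytes but received {received}.")
```

When the server sends no Content-Length header, this sketch skips both the percentage and the size check.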