Full API documentation
Once a Linguistica object is initialized (such as lxa_object below, created from the Brown corpus), various methods and attributes are available for automatic linguistic analysis:
>>> import linguistica as lxa
>>> lxa_object = lxa.read_corpus('path/to/english-brown.txt')
>>> words = lxa_object.wordlist() # using wordlist()
Basic information
number_of_word_tokens() | Return the number of word tokens.
number_of_word_types() | Return the number of word types.
Word ngrams
Parameter: max_word_tokens
wordlist() | Return a wordlist sorted by word frequency in descending order.
word_unigram_counter() | Return a dict of words with their counts.
word_bigram_counter() | Return a dict of word bigrams with their counts.
word_trigram_counter() | Return a dict of word trigrams with their counts.
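The return shapes of the word ngram methods can be illustrated with a minimal, self-contained sketch. It does not call Linguistica and is not the library's implementation; the helper `ngram_counts` and the toy token list are hypothetical, and the alphabetical tie-breaking in the sorted word list is an assumption made only for a deterministic result.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count n-grams: unigrams are keyed by str, longer n-grams by tuple(str)."""
    if n == 1:
        return Counter(tokens)
    # zip over n shifted copies of the token list yields consecutive n-tuples
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Toy corpus (hypothetical), already tokenized
tokens = "the cat saw the dog and the cat ran".split()

unigrams = ngram_counts(tokens, 1)   # shaped like dict(str: int)
bigrams = ngram_counts(tokens, 2)    # shaped like dict(tuple(str): int)
trigrams = ngram_counts(tokens, 3)   # shaped like dict(tuple(str): int)

# Word list sorted by descending frequency; ties broken alphabetically (assumption)
wordlist = sorted(unigrams, key=lambda w: (-unigrams[w], w))
```

Here `wordlist` starts with the most frequent word, mirroring the ordering described for wordlist().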
Morphological signatures
Parameters: min_stem_length, max_affix_length, min_sig_count, suffixing
signatures() | Return a set of morphological signatures.
stems() | Return a set of stems.
affixes() | Return a set of affixes.
signatures_to_stems() | Return a dict of morphological signatures to stems.
signatures_to_words() | Return a dict of morphological signatures to words.
affixes_to_signatures() | Return a dict of affixes to morphological signatures.
stems_to_signatures() | Return a dict of stems to morphological signatures.
stems_to_words() | Return a dict of stems to words.
words_in_signatures() | Return a set of words that are in at least one morphological signature.
words_to_signatures() | Return a dict of words to morphological signatures.
words_to_sigtransforms() | Return a dict of words to signature transforms.
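How these signature mappings relate to one another can be sketched with toy data. The block below is only an illustration under stated assumptions: the signatures and stems are made up, suffixing is assumed (a word is formed as stem + affix), and the derivation is not Linguistica's own algorithm, whose actual results on a corpus may differ.

```python
from collections import defaultdict

# Toy data (hypothetical): signatures are tuples of affixes
signatures_to_stems = {
    ("ed", "ing", "s"): {"jump", "walk"},
    ("er", "est"): {"tall"},
}

signatures = set(signatures_to_stems)
stems = set().union(*signatures_to_stems.values())
affixes = {affix for sig in signatures for affix in sig}

stems_to_signatures = defaultdict(set)
affixes_to_signatures = defaultdict(set)
signatures_to_words = defaultdict(set)
words_to_signatures = defaultdict(set)

for sig, sig_stems in signatures_to_stems.items():
    for affix in sig:
        affixes_to_signatures[affix].add(sig)
    for stem in sig_stems:
        stems_to_signatures[stem].add(sig)
        for affix in sig:
            word = stem + affix  # suffixing assumed
            signatures_to_words[sig].add(word)
            words_to_signatures[word].add(sig)

# Words that occur in at least one signature
words_in_signatures = set(words_to_signatures)
```

Each derived dict has the same shape as the corresponding return type documented for the Lexicon class, for example dict(str: set(tuple(str))) for the stem-to-signature mapping.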
Word manifolds and syntactic word neighborhood
Parameters: max_word_types, min_context_count, n_neighbors, n_eigenvectors
words_to_neighbors() | Return a dict of words to syntactic neighbors.
neighbor_graph() | Return the syntactic word neighborhood graph.
words_to_contexts() | Return a dict of words to contexts with counts.
contexts_to_words() | Return a dict of contexts to words with counts.
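The word-to-context and context-to-word mappings are two views of the same counts. The sketch below shows that relationship on toy data; the context tuples (with "_" marking the word's position) and the helper `invert` are hypothetical and chosen only for illustration, not taken from Linguistica.

```python
from collections import defaultdict

# Toy data (hypothetical), shaped like dict(str: dict(tuple(str): int))
words_to_contexts = {
    "cat": {("the", "_", "sat"): 2, ("a", "_", "ran"): 1},
    "dog": {("the", "_", "sat"): 1},
}

def invert(nested):
    """Swap the outer and inner keys of a two-level dict of counts."""
    inverted = defaultdict(dict)
    for outer, inner in nested.items():
        for key, count in inner.items():
            inverted[key][outer] = count
    return dict(inverted)

# Shaped like dict(tuple(str): dict(str: int))
contexts_to_words = invert(words_to_contexts)
```

Inverting twice recovers the original mapping, which is a quick consistency check when working with both views.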
Phonology
phone_unigram_counter() | Return a dict of phone unigrams with counts.
phone_bigram_counter() | Return a dict of phone bigrams with counts.
phone_trigram_counter() | Return a dict of phone trigrams with counts.
Tries
Parameter: min_stem_length
broken_words_left_to_right() | Return a dict of words to their left-to-right broken form.
broken_words_right_to_left() | Return a dict of words to their right-to-left broken form.
successors() | Return a dict of word (sub)strings to their successors.
predecessors() | Return a dict of word (sub)strings to their predecessors.
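One plausible reading of successors and predecessors is sketched below: each proper prefix maps to the set of letters that can follow it, and each proper suffix maps to the set of letters that can precede it. This is an assumption for illustration; the helpers `successors_of` and `predecessors_of` are hypothetical, and Linguistica's exact definition (including any effect of min_stem_length) may differ.

```python
from collections import defaultdict

def successors_of(words):
    """Map each proper prefix of each word to the letters that follow it."""
    result = defaultdict(set)
    for word in words:
        for i in range(1, len(word)):
            result[word[:i]].add(word[i])
    return dict(result)

def predecessors_of(words):
    """Map each proper suffix of each word to the letters that precede it."""
    result = defaultdict(set)
    for word in words:
        for i in range(1, len(word)):
            result[word[i:]].add(word[i - 1])
    return dict(result)

# Toy word list (hypothetical)
words = ["walk", "walks", "walked", "wall"]
successors = successors_of(words)       # shaped like dict(str: set(str))
predecessors = predecessors_of(words)   # shaped like dict(str: set(str))
```

A prefix with several successors (here "walk", followed by "s" or "e") is a natural candidate for a break point between stem and suffix.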
Other methods and attributes
parameters() | Return the parameter dict.
change_parameters() | Change parameters specified by kwargs.
use_default_parameters() | Reset parameters to their default values.
reset() | Reset the Linguistica object.
- class linguistica.lexicon.Lexicon(file_path=None, wordlist_file=False, corpus_object=None, wordlist_object=None, encoding='utf8', **kwargs)
A class for a Linguistica object.
- affixes()
Return a set of affixes.
- Return type:
set(str)
- affixes_to_signatures()
Return a dict of affixes to morphological signatures.
- Return type:
dict(str: set(tuple(str)))
- biphone_dict()
Return a dict of phone bigrams to Biphone objects. A Biphone instance has the methods spelling(), count(), frequency(), MI(), and weighted_MI().
- Return type:
dict((str, str): Biphone instance)
- broken_words_left_to_right()
Return a dict of words to their left-to-right broken form.
- Return type:
dict(str: list(str))
- broken_words_right_to_left()
Return a dict of words to their right-to-left broken form.
- Return type:
dict(str: list(str))
- change_parameters(**kwargs)
Change parameters specified by kwargs.
- Parameters:
kwargs – keyword arguments for parameters and their new values
- contexts_to_words()
Return a dict of contexts to words with counts.
- Return type:
dict(tuple(str): dict(str: int))
- neighbor_graph()
Return the syntactic word neighborhood graph.
- Return type:
networkx undirected graph
- number_of_word_tokens()
Return the number of word tokens.
- Return type:
int
- number_of_word_types()
Return the number of word types.
- Return type:
int
- output_all_results(directory=None, verbose=False, test=False)
Output all Linguistica results to directory.
- Parameters:
directory – output directory. If not specified, it defaults to the current directory given by os.getcwd().
- parameters()
Return the parameter dict.
- Return type:
dict(str: int)
- phone_bigram_counter()
Return a dict of phone bigrams with counts.
- Return type:
dict(tuple(str): int)
- phone_dict()
Return a dict of phone unigrams to Phone objects. A Phone instance has the methods spelling(), count(), frequency(), and plog().
- Return type:
dict(str: Phone instance)
- phone_trigram_counter()
Return a dict of phone trigrams with counts.
- Return type:
dict(tuple(str): int)
- phone_unigram_counter()
Return a dict of phone unigrams with counts.
- Return type:
dict(str: int)
- predecessors()
Return a dict of word (sub)strings to their predecessors.
- Return type:
dict(str: set(str))
- reset()
Reset the Linguistica object. While the file path information is retained, all computed objects (ngrams, signatures, word neighbors, etc.) are reset to NULL; if they are called again, they are re-computed.
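The reset-then-recompute behavior described for reset() follows a common lazy-caching pattern, sketched below with a toy class. `LazyCounts` is a hypothetical stand-in, not the Lexicon class, and the use of `None` as the "not yet computed" marker is an assumption made for this illustration.

```python
from collections import Counter

class LazyCounts:
    """Toy stand-in (hypothetical) that computes a result on first request."""

    def __init__(self, tokens):
        self.tokens = tokens      # input is retained across reset()
        self.compute_calls = 0    # counts how often the result is computed
        self._unigrams = None     # None marks "not computed yet" (assumption)

    def unigram_counter(self):
        # Compute only if there is no cached result
        if self._unigrams is None:
            self.compute_calls += 1
            self._unigrams = Counter(self.tokens)
        return self._unigrams

    def reset(self):
        # Drop the cached result but keep the input
        self._unigrams = None

lazy = LazyCounts("a b a".split())
lazy.unigram_counter()   # computed
lazy.unigram_counter()   # served from the cache
lazy.reset()
lazy.unigram_counter()   # computed again after reset()
```

In this sketch the counts are computed twice in total: once on first access and once after reset().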
- run_all_modules(verbose=False)
Run all modules.
- run_manifold_module(verbose=False)
Run the manifold module.
- run_ngram_module(verbose=False)
Run the ngram module.
- run_phon_module(verbose=False)
Run the phon module.
- run_signature_module(verbose=False)
Run the signature module.
- run_trie_module(verbose=False)
Run the trie module.
- signatures()
Return a set of morphological signatures.
- Return type:
set(tuple(str))
- signatures_to_stems()
Return a dict of morphological signatures to stems.
- Return type:
dict(tuple(str): set(str))
- signatures_to_words()
Return a dict of morphological signatures to words.
- Return type:
dict(tuple(str): set(str))
- stems()
Return a set of stems.
- Return type:
set(str)
- stems_to_signatures()
Return a dict of stems to morphological signatures.
- Return type:
dict(str: set(tuple(str)))
- stems_to_words()
Return a dict of stems to words.
- Return type:
dict(str: set(str))
- successors()
Return a dict of word (sub)strings to their successors.
- Return type:
dict(str: set(str))
- use_default_parameters()
Reset parameters to their default values.
- word_bigram_counter()
Return a dict of word bigrams with their counts.
- Return type:
dict(tuple(str): int)
- word_phonology_dict()
Return a dict of words to Word objects. A Word instance has the methods spelling(), phones(), count(), frequency(), unigram_plog(), avg_unigram_plog(), bigram_plog(), and avg_bigram_plog().
- Return type:
dict(str: Word instance)
- word_trigram_counter()
Return a dict of word trigrams with their counts.
- Return type:
dict(tuple(str): int)
- word_unigram_counter()
Return a dict of words with their counts.
- Return type:
dict(str: int)
- wordlist()
Return a wordlist sorted by word frequency in descending order. (So “the” will most likely be the first word for written English.)
- Return type:
list(str)
- words_in_signatures()
Return a set of words that are in at least one morphological signature.
- Return type:
set(str)
- words_to_contexts()
Return a dict of words to contexts with counts.
- Return type:
dict(str: dict(tuple(str): int))
- words_to_neighbors()
Return a dict of words to syntactic neighbors.
- Return type:
dict(str: list(str))
- words_to_phones()
Return a dict of words with their phones.
- Return type:
dict(str: list(str))
- words_to_signatures()
Return a dict of words to morphological signatures.
- Return type:
dict(str: set(tuple(str)))
- words_to_sigtransforms()
Return a dict of words to signature transforms.
- Return type:
dict(str: set(tuple(tuple(str), str)))