Full API documentation

Once a Linguistica object is initialized (such as lxa_object below, created from the Brown corpus), various methods and attributes are available for automatic linguistic analysis:

>>> import linguistica as lxa
>>> lxa_object = lxa.read_corpus('path/to/english-brown.txt')
>>> words = lxa_object.wordlist()  # using wordlist()

Basic information

`number_of_word_tokens`()	Return the number of word tokens.
`number_of_word_types`()	Return the number of word types.
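The sketch below illustrates the difference between the two counts in plain Python. It is minimal and assumes whitespace tokenization and lowercasing, which may not match how Linguistica tokenizes a corpus.

```python
def count_tokens_and_types(text):
    """Return (number of word tokens, number of word types) for a toy text.

    Assumption: tokens are whitespace-separated and lowercased; Linguistica's
    own tokenization may differ.
    """
    tokens = text.lower().split()
    # Tokens count every occurrence; types count each distinct word once.
    return len(tokens), len(set(tokens))


n_tokens, n_types = count_tokens_and_types("the cat saw the dog")
print(n_tokens, n_types)  # 5 4
```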

Word ngrams

Parameter: max_word_tokens

`wordlist`()	Return a wordlist sorted by word frequency in descending order.
`word_unigram_counter`()	Return a dict of words with their counts.
`word_bigram_counter`()	Return a dict of word bigrams with their counts.
`word_trigram_counter`()	Return a dict of word trigrams with their counts.
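The three counters differ only in window size. The following minimal sketch reproduces the documented return shapes with collections.Counter. The function name word_ngram_counter is hypothetical, and the sketch ignores sentence boundaries, which Linguistica may handle differently.

```python
from collections import Counter


def word_ngram_counter(tokens, n):
    """Count word n-grams (n >= 2) as tuple keys, e.g. ('of', 'the') for n=2.

    Sketch of the documented return shape dict(tuple(str): int); sentence
    boundaries are ignored here, which may not match Linguistica.
    """
    # zip over n staggered views of the list yields consecutive n-word windows.
    return Counter(zip(*(tokens[i:] for i in range(n))))


tokens = "the cat saw the cat".split()
unigrams = Counter(tokens)                # str keys, like word_unigram_counter()
bigrams = word_ngram_counter(tokens, 2)   # like word_bigram_counter()
trigrams = word_ngram_counter(tokens, 3)  # like word_trigram_counter()
print(bigrams[("the", "cat")])  # 2
```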

Morphological signatures

Parameters: min_stem_length, max_affix_length, min_sig_count, suffixing

`signatures`()	Return a set of morphological signatures.
`stems`()	Return a set of stems.
`affixes`()	Return a set of affixes.
`signatures_to_stems`()	Return a dict of morphological signatures to stems.
`signatures_to_words`()	Return a dict of morphological signatures to words.
`affixes_to_signatures`()	Return a dict of affixes to morphological signatures.
`stems_to_signatures`()	Return a dict of stems to morphological signatures.
`stems_to_words`()	Return a dict of stems to words.
`words_in_signatures`()	Return a set of words that are in at least one morphological signature.
`words_to_signatures`()	Return a dict of words to morphological signatures.
`words_to_sigtransforms`()	Return a dict of words to signature transforms.
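The signature dicts are different views of one underlying stem-and-affix analysis. The sketch below starts from a hand-made stems-to-words mapping and derives three of those views.

It is hypothetical throughout:

- It does not induce stems. The real module does, governed by parameters such as min_stem_length and min_sig_count.
- It handles suffixes only.
- It writes the empty suffix as "NULL", which is a notational assumption.

Pairing a signature with the word's own suffix as its "signature transform" is inferred from the documented type set(tuple(tuple(str), str)).

```python
from collections import defaultdict


def build_signature_maps(stems_to_words):
    """Derive signature dicts from a toy stem -> words mapping (suffixing only).

    A signature is the sorted tuple of suffixes a stem takes; the empty suffix
    is written "NULL" here (a notational assumption). This illustrates the
    data structures, not Linguistica's signature-induction algorithm.
    """
    stems_to_signatures = {}
    signatures_to_stems = defaultdict(set)
    words_to_sigtransforms = defaultdict(set)
    for stem, words in stems_to_words.items():
        suffixes = {word[len(stem):] or "NULL" for word in words}
        sig = tuple(sorted(suffixes))
        stems_to_signatures[stem] = {sig}
        signatures_to_stems[sig].add(stem)
        for word in words:
            # Assumed transform: the signature paired with this word's suffix.
            words_to_sigtransforms[word].add((sig, word[len(stem):] or "NULL"))
    return stems_to_signatures, dict(signatures_to_stems), dict(words_to_sigtransforms)


toy = {"jump": {"jump", "jumped", "jumps"}, "walk": {"walk", "walked", "walks"}}
s2sig, sig2s, w2st = build_signature_maps(toy)
print(sorted(sig2s[("NULL", "ed", "s")]))  # ['jump', 'walk']
```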

Word manifolds and syntactic word neighborhood

Parameters: max_word_types, min_context_count, n_neighbors, n_eigenvectors

`words_to_neighbors`()	Return a dict of words to syntactic neighbors.
`neighbor_graph`()	Return the syntactic word neighborhood graph.
`words_to_contexts`()	Return a dict of words to contexts with counts.
`contexts_to_words`()	Return a dict of contexts to words with counts.
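words_to_contexts() and contexts_to_words() hold the same counts, indexed in opposite directions. In the sketch below, a context is taken to be the pair (left word, right word), with "#" padding at the corpus edges. That definition is an assumption for illustration; only the nested-dict shapes follow the documented return types.

```python
from collections import defaultdict


def context_maps(tokens):
    """Build words->contexts and contexts->words count dicts from a token list.

    Hypothetical context definition: the (left word, right word) pair around
    each token, with "#" padding at the edges. Linguistica's own context
    definition may differ; only the nested-dict shapes follow the docs.
    """
    padded = ["#"] + tokens + ["#"]
    words_to_contexts = defaultdict(lambda: defaultdict(int))
    contexts_to_words = defaultdict(lambda: defaultdict(int))
    for i in range(1, len(padded) - 1):
        word, context = padded[i], (padded[i - 1], padded[i + 1])
        # The same observation is recorded once in each direction.
        words_to_contexts[word][context] += 1
        contexts_to_words[context][word] += 1
    return words_to_contexts, contexts_to_words


w2c, c2w = context_maps("the cat sat and the dog sat".split())
print(dict(c2w[("the", "sat")]))  # {'cat': 1, 'dog': 1}
```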

Phonology

`phone_unigram_counter`()	Return a dict of phone unigrams with counts.
`phone_bigram_counter`()	Return a dict of phone bigrams with counts.
`phone_trigram_counter`()	Return a dict of phone trigrams with counts.
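The sketch below shows how the three phone counters relate. It assumes each word is already a list of phones, in the shape that words_to_phones() returns. It also assumes that each word type is counted once and that n-grams do not cross word boundaries. Using spelling as a stand-in transcription in the example is a further assumption.

```python
from collections import Counter


def phone_ngram_counters(words_to_phones):
    """Count phone unigrams, bigrams, and trigrams within each word.

    Assumptions: each word type contributes once (no weighting by word
    frequency), and n-grams do not cross word boundaries. Keys follow the
    documented shapes: str for unigrams, tuples for bigrams and trigrams.
    """
    unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
    for phones in words_to_phones.values():
        unigrams.update(phones)
        bigrams.update(zip(phones, phones[1:]))
        trigrams.update(zip(phones, phones[1:], phones[2:]))
    return unigrams, bigrams, trigrams


# Hypothetical input: spelling used as a stand-in for a phonemic transcription.
toy = {"cat": list("cat"), "cab": list("cab")}
uni, bi, tri = phone_ngram_counters(toy)
print(bi[("c", "a")])  # 2
```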

Tries

Parameter: min_stem_length

`broken_words_left_to_right`()	Return a dict of words to their left-to-right broken form.
`broken_words_right_to_left`()	Return a dict of words to their right-to-left broken form.
`successors`()	Return a dict of word (sub)strings to their successors.
`predecessors`()	Return a dict of word (sub)strings to their predecessors.
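One common reading of "successors" and "predecessors" comes from successor-variety analysis:

- For each proper prefix of a word, collect the letters that can follow it.
- For each proper suffix, collect the letters that can precede it.

The sketch below follows that reading. This page does not say whether Linguistica stores single letters or longer continuations, or how min_stem_length restricts the keys, so treat both as open questions.

```python
from collections import defaultdict


def successors_and_predecessors(wordlist):
    """Map each proper prefix to the letters that follow it, and each proper
    suffix to the letters that precede it, across a wordlist.

    This is the classic successor/predecessor-variety idea; whether
    Linguistica's successors() and predecessors() store single letters or
    longer continuations is an assumption here, not documented above.
    """
    successors, predecessors = defaultdict(set), defaultdict(set)
    for word in wordlist:
        for i in range(1, len(word)):
            successors[word[:i]].add(word[i])
            predecessors[word[i:]].add(word[i - 1])
    return dict(successors), dict(predecessors)


succ, pred = successors_and_predecessors(["jumps", "jumped", "jumping"])
print(sorted(succ["jump"]))  # ['e', 'i', 's']
```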

Other methods and attributes

`parameters`()	Return the parameter dict.
`change_parameters`(**kwargs)	Change parameters specified by kwargs.
`use_default_parameters`()	Reset parameters to their default values.
`reset`()	Reset the Linguistica object.

class linguistica.lexicon.Lexicon(file_path=None, wordlist_file=False, corpus_object=None, wordlist_object=None, encoding='utf8', **kwargs)

A class for a Linguistica object.

affixes()

Return a set of affixes.

Return type:: set(str)

affixes_to_signatures()

Return a dict of affixes to morphological signatures.

Return type:: dict(str: set(tuple(str)))

biphone_dict()

Return a dict of phone bigrams to Biphone objects. A Biphone instance has the methods spelling(), count(), frequency(), MI(), and weighted_MI().

Return type:: dict((str, str): Biphone instance)
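Biphone.MI() and weighted_MI() are not defined on this page. Assuming they follow the textbook pointwise mutual information, the computation looks like the sketch below. The function biphone_mi is hypothetical, and Linguistica may estimate the probabilities differently (for example, with word-boundary symbols).

```python
import math


def biphone_mi(bigram_counts, unigram_counts):
    """Return {(a, b): (MI, weighted MI)} for each phone bigram.

    Assumes the textbook pointwise mutual information
        MI(a, b) = log2( p(ab) / (p(a) * p(b)) )
    with relative frequencies as probabilities, and weighted MI = p(ab) * MI.
    Linguistica's Biphone.MI() and weighted_MI() may normalize differently.
    """
    total_bi = sum(bigram_counts.values())
    total_uni = sum(unigram_counts.values())
    result = {}
    for (a, b), count in bigram_counts.items():
        p_ab = count / total_bi
        p_a = unigram_counts[a] / total_uni
        p_b = unigram_counts[b] / total_uni
        mi = math.log2(p_ab / (p_a * p_b))
        result[(a, b)] = (mi, p_ab * mi)
    return result


unigram_counts = {"a": 2, "b": 2}
bigram_counts = {("a", "b"): 2}
print(biphone_mi(bigram_counts, unigram_counts))  # {('a', 'b'): (2.0, 2.0)}
```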

broken_words_left_to_right()

Return a dict of words to their left-to-right broken form.

Return type:: dict(str: list(str))

broken_words_right_to_left()

Return a dict of words to their right-to-left broken form.

Return type:: dict(str: list(str))
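This page does not say where the trie module breaks a word. The sketch below is a hypothetical illustration of the dict(str: list(str)) output, in the spirit of Harris-style successor variety. It breaks a word, left to right, after any prefix that meets two conditions:

- The prefix is at least min_len characters long.
- The prefix has two or more distinct continuations in the wordlist.

A right-to-left version would mirror this, using the letters that precede each suffix. None of this is claimed to be Linguistica's actual rule.

```python
def break_left_to_right(wordlist, min_len=3):
    """Split each word wherever its prefix has 2+ distinct continuations.

    Hypothetical heuristic (Harris-style successor variety), with min_len
    standing in for a minimum stem length; "#" marks end-of-word. It only
    illustrates the dict(str: list(str)) output shape, not Linguistica's rule.
    """
    continuations = {}
    for word in wordlist:
        for i in range(1, len(word) + 1):
            nxt = word[i] if i < len(word) else "#"
            continuations.setdefault(word[:i], set()).add(nxt)
    broken = {}
    for word in wordlist:
        pieces, start = [], 0
        for i in range(min_len, len(word)):
            if len(continuations[word[:i]]) >= 2:
                pieces.append(word[start:i])
                start = i
        pieces.append(word[start:])
        broken[word] = pieces
    return broken


words = ["jump", "jumps", "jumped", "jumping"]
print(break_left_to_right(words)["jumping"])  # ['jump', 'ing']
```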

change_parameters(**kwargs)

Change parameters specified by kwargs.

Parameters:: kwargs – keyword arguments for parameters and their new values

contexts_to_words()

Return a dict of contexts to words with counts.

Return type:: dict(tuple(str): dict(str: int))

neighbor_graph()

Return the syntactic word neighborhood graph.

Return type:: networkx undirected graph
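neighbor_graph() returns a networkx graph built from the word neighborhoods. The plain-Python sketch below shows the symmetrizing step that such an undirected graph implies. It assumes an edge links two words whenever either word lists the other in words_to_neighbors(). Edge weights, if the library uses any, are not modeled.

```python
def neighbor_edges(words_to_neighbors):
    """Collect undirected edges {word, neighbor} from a words-to-neighbors dict.

    A plain-Python stand-in for the networkx graph returned by
    neighbor_graph(); it assumes an edge exists whenever either word lists
    the other as a neighbor. Edge weights, if any, are not modeled.
    """
    edges = set()
    for word, neighbors in words_to_neighbors.items():
        for other in neighbors:
            if other != word:
                # frozenset makes (a, b) and (b, a) the same undirected edge.
                edges.add(frozenset((word, other)))
    return edges


toy = {"cat": ["dog", "cow"], "dog": ["cat"], "cow": ["cat"]}
print(len(neighbor_edges(toy)))  # 2
```

If networkx is installed, passing `[tuple(e) for e in edges]` to `networkx.Graph` yields an undirected graph with the same edges.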

number_of_word_tokens()

Return the number of word tokens.

Return type:: int

number_of_word_types()

Return the number of word types.

Return type:: int

output_all_results(directory=None, verbose=False, test=False)

Output all Linguistica results to directory.

Parameters:: directory – output directory. If not specified, it defaults to the current directory given by os.getcwd().

parameters()

Return the parameter dict.

Return type:: dict(str: int)

phone_bigram_counter()

Return a dict of phone bigrams with counts.

Return type:: dict(tuple(str): int)

phone_dict()

Return a dict of phone unigrams to Phone objects. A Phone instance has the methods spelling(), count(), frequency(), and plog().

Return type:: dict(str: Phone instance)
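Phone.frequency() and Phone.plog() are not defined on this page. The sketch below assumes that frequency is a phone's relative frequency. It also assumes that plog is the "positive log" -log2(frequency), the usual sense of "plog" in Linguistica-related work. The helper phone_stats is hypothetical.

```python
import math


def phone_stats(phone_counts):
    """Return {phone: (count, frequency, plog)} from raw phone counts.

    Assumes frequency is the relative frequency count / total and that
    plog ("positive log") is -log2(frequency); check Phone.plog() for the
    exact definition used by the library.
    """
    total = sum(phone_counts.values())
    return {
        phone: (count, count / total, -math.log2(count / total))
        for phone, count in phone_counts.items()
    }


stats = phone_stats({"a": 2, "b": 1, "c": 1})
print(stats["b"])  # (1, 0.25, 2.0)
```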

phone_trigram_counter()

Return a dict of phone trigrams with counts.

Return type:: dict(tuple(str): int)

phone_unigram_counter()

Return a dict of phone unigrams with counts.

Return type:: dict(str: int)

predecessors()

Return a dict of word (sub)strings to their predecessors.

Return type:: dict(str: set(str))

reset()

Reset the Linguistica object. The file path information is retained, but all computed objects (ngrams, signatures, word neighbors, etc.) are cleared; if they are requested again, they are recomputed.

run_all_modules(verbose=False)

Run all modules.

run_manifold_module(verbose=False)

Run the manifold module.

run_ngram_module(verbose=False)

Run the ngram module.

run_phon_module(verbose=False)

Run the phon module.

run_signature_module(verbose=False)

Run the signature module.

run_trie_module(verbose=False)

Run the trie module.

signatures()

Return a set of morphological signatures.

Return type:: set(tuple(str))

signatures_to_stems()

Return a dict of morphological signatures to stems.

Return type:: dict(tuple(str): set(str))

signatures_to_words()

Return a dict of morphological signatures to words.

Return type:: dict(tuple(str): set(str))

stems()

Return a set of stems.

Return type:: set(str)

stems_to_signatures()

Return a dict of stems to morphological signatures.

Return type:: dict(str: set(tuple(str)))

stems_to_words()

Return a dict of stems to words.

Return type:: dict(str: set(str))

successors()

Return a dict of word (sub)strings to their successors.

Return type:: dict(str: set(str))

use_default_parameters()

Reset parameters to their default values.

word_bigram_counter()

Return a dict of word bigrams with their counts.

Return type:: dict(tuple(str): int)

word_phonology_dict()

Return a dict of words to Word objects. A Word instance has the methods spelling(), phones(), count(), frequency(), unigram_plog(), avg_unigram_plog(), bigram_plog(), and avg_bigram_plog().

Return type:: dict(str: Word instance)
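Under a simple unigram phone model, a word's plog would be the sum of its phones' plogs, and the average would divide that sum by the number of phones. The sketch below computes exactly that. It is an assumption about unigram_plog() and avg_unigram_plog(); the actual methods may, for instance, include word-boundary symbols. The helper word_unigram_plogs is hypothetical.

```python
import math


def word_unigram_plogs(phones, phone_freq):
    """Return (unigram_plog, avg_unigram_plog) for one word.

    Assumes a unigram phone model: the word's plog is the sum of
    -log2(frequency) over its phones, and the average divides by the number
    of phones. Linguistica's Word methods may differ in detail.
    """
    total = sum(-math.log2(phone_freq[p]) for p in phones)
    return total, total / len(phones)


freqs = {"a": 0.5, "b": 0.25, "c": 0.25}
print(word_unigram_plogs(["c", "a", "b"], freqs))  # (5.0, 1.6666666666666667)
```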

word_trigram_counter()

Return a dict of word trigrams with their counts.

Return type:: dict(tuple(str): int)

word_unigram_counter()

Return a dict of words with their counts.

Return type:: dict(str: int)

wordlist()

Return a wordlist sorted by word frequency in descending order. For written English, “the” is most likely the first word.

Return type:: list(str)
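The sketch below reproduces the documented ordering: count the tokens, then sort word types by descending count. The page does not say how ties are ordered, so the alphabetical tie-break here is an assumption.

```python
from collections import Counter


def frequency_sorted_wordlist(tokens):
    """Return word types sorted by descending frequency.

    Ties are broken alphabetically here; how wordlist() orders words with
    equal counts is not specified, so treat that as an assumption.
    """
    counts = Counter(tokens)
    # Negating the count sorts high counts first; the word breaks ties.
    return sorted(counts, key=lambda w: (-counts[w], w))


print(frequency_sorted_wordlist("the cat and the dog and the bird".split()))
# ['the', 'and', 'bird', 'cat', 'dog']
```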

words_in_signatures()

Return a set of words that are in at least one morphological signature.

Return type:: set(str)

words_to_contexts()

Return a dict of words to contexts with counts.

Return type:: dict(str: dict(tuple(str): int))

words_to_neighbors()

Return a dict of words to syntactic neighbors.

Return type:: dict(str: list(str))

words_to_phones()

Return a dict of words with their phones.

Return type:: dict(str: list(str))

words_to_signatures()

Return a dict of words to morphological signatures.

Return type:: dict(str: set(tuple(str)))

words_to_sigtransforms()

Return a dict of words to signature transforms.

Return type:: dict(str: set(tuple(tuple(str), str)))