Full API documentation

Once a Linguistica object (such as lxa_object below with the Brown corpus) is initialized, various methods and attributes are available for automatic linguistic analysis:

>>> import linguistica as lxa
>>> lxa_object = lxa.read_corpus('path/to/english-brown.txt')
>>> words = lxa_object.wordlist()  # words sorted by frequency, most frequent first

Basic information

number_of_word_tokens()

Return the number of word tokens.

number_of_word_types()

Return the number of word types.

Word ngrams

Parameter: max_word_tokens

wordlist()

Return a wordlist sorted by word frequency in descending order.

word_unigram_counter()

Return a dict of words with their counts.

word_bigram_counter()

Return a dict of word bigrams with their counts.

word_trigram_counter()

Return a dict of word trigrams with their counts.
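
The shapes of these return values can be illustrated without the library. The following is a minimal plain-Python sketch, not Linguistica's implementation: the whitespace tokenization, the helper names ngram_counter and frequency_sorted_wordlist, and the alphabetical tie-breaking are all assumptions made for illustration.

```python
from collections import Counter


def ngram_counter(tokens, n):
    """Count n-grams in a token list.

    Unigrams are keyed by str; longer n-grams are keyed by tuple(str).
    """
    if n == 1:
        return Counter(tokens)
    # zip over n staggered views of the token list yields consecutive n-grams.
    return Counter(zip(*(tokens[i:] for i in range(n))))


def frequency_sorted_wordlist(unigram_counts):
    """Words sorted by count, highest first.

    Ties are broken alphabetically (an assumption of this sketch).
    """
    return sorted(unigram_counts, key=lambda w: (-unigram_counts[w], w))


# Toy corpus; whitespace tokenization is an assumption of this sketch.
tokens = "the cat saw the dog and the cat ran".split()

unigrams = ngram_counter(tokens, 1)   # dict(str: int)
bigrams = ngram_counter(tokens, 2)    # dict(tuple(str): int)
trigrams = ngram_counter(tokens, 3)   # dict(tuple(str): int)
words = frequency_sorted_wordlist(unigrams)
```

In this toy corpus "the" is the most frequent word, so it comes first in words, mirroring the ordering that wordlist() is documented to return.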

Morphological signatures

Parameters: min_stem_length, max_affix_length, min_sig_count, suffixing

signatures()

Return a set of morphological signatures.

stems()

Return a set of stems.

affixes()

Return a set of affixes.

signatures_to_stems()

Return a dict of morphological signatures to stems.

signatures_to_words()

Return a dict of morphological signatures to words.

affixes_to_signatures()

Return a dict of affixes to morphological signatures.

stems_to_signatures()

Return a dict of stems to morphological signatures.

stems_to_words()

Return a dict of stems to words.

words_in_signatures()

Return a set of words that are in at least one morphological signature.

words_to_signatures()

Return a dict of words to morphological signatures.

words_to_sigtransforms()

Return a dict of words to signature transforms.
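
Several of the mappings above are different views of the same analysis. As a hedged illustration, the sketch below starts from toy data shaped like the documented return type of signatures_to_stems() and derives the other views from it. The toy signatures, the helper attach, the use of 'NULL' for the empty affix, and the rule that a word is stem plus suffix are assumptions of this sketch and are not taken from Linguistica's source.

```python
from collections import defaultdict

# Toy data shaped like signatures_to_stems(): dict(tuple(str): set(str)).
# 'NULL' standing for the empty affix is an assumption of this sketch.
signatures_to_stems = {
    ("NULL", "ed", "ing"): {"jump", "walk"},
    ("NULL", "s"): {"cat", "walk"},
}


def attach(stem, affix, suffixing=True):
    """Build a word from a stem and an affix (hypothetical helper)."""
    affix = "" if affix == "NULL" else affix
    return stem + affix if suffixing else affix + stem


stems_to_signatures = defaultdict(set)    # dict(str: set(tuple(str)))
affixes_to_signatures = defaultdict(set)  # dict(str: set(tuple(str)))
stems_to_words = defaultdict(set)         # dict(str: set(str))

for sig, stems in signatures_to_stems.items():
    for affix in sig:
        affixes_to_signatures[affix].add(sig)
    for stem in stems:
        stems_to_signatures[stem].add(sig)
        stems_to_words[stem].update(attach(stem, a) for a in sig)

# Every word that belongs to at least one signature.
words_in_signatures = set().union(*stems_to_words.values())
```

Here the stem "walk" appears in both toy signatures, so stems_to_signatures["walk"] holds two tuples and stems_to_words["walk"] holds four words.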

Word manifolds and syntactic word neighborhood

Parameters: max_word_types, min_context_count, n_neighbors, n_eigenvectors

words_to_neighbors()

Return a dict of words to syntactic neighbors.

neighbor_graph()

Return the syntactic word neighborhood graph.

words_to_contexts()

Return a dict of words to contexts with counts.

contexts_to_words()

Return a dict of contexts to words with counts.
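
The two context mappings hold the same counts keyed in opposite directions. The sketch below shows that relationship on toy data shaped like the documented return types. The contexts themselves, the '_' placeholder for the target word's slot, and the helper invert are assumptions made for illustration.

```python
from collections import defaultdict

# Toy data shaped like words_to_contexts(): dict(str: dict(tuple(str): int)).
# The '_' placeholder marking the target word's slot is an assumption.
words_to_contexts = {
    "cat": {("the", "_"): 3, ("_", "sat"): 1},
    "dog": {("the", "_"): 2},
}


def invert(nested):
    """Swap the outer and inner keys of a two-level count dict."""
    inverted = defaultdict(dict)
    for outer, inner_counts in nested.items():
        for inner, count in inner_counts.items():
            inverted[inner][outer] = count
    return dict(inverted)


# Shaped like contexts_to_words(): dict(tuple(str): dict(str: int)).
contexts_to_words = invert(words_to_contexts)
```

Words that share many contexts, such as "cat" and "dog" under ("the", "_") here, are the kind of evidence a syntactic neighborhood analysis can draw on.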

Phonology

phone_unigram_counter()

Return a dict of phone unigrams with counts.

phone_bigram_counter()

Return a dict of phone bigrams with counts.

phone_trigram_counter()

Return a dict of phone trigrams with counts.

Tries

Parameter: min_stem_length

broken_words_left_to_right()

Return a dict of words to their left-to-right broken form.

broken_words_right_to_left()

Return a dict of words to their right-to-left broken form.

successors()

Return a dict of word (sub)strings to their successors.

predecessors()

Return a dict of word (sub)strings to their predecessors.
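
The documentation above does not spell out what counts as a successor or predecessor. One plausible reading, sketched below, is that each word-initial substring maps to the set of characters that can follow it, and each word-final substring maps to the set of characters that can precede it. This reading and the helper names toy_successors and toy_predecessors are assumptions; the sketch only matches the documented shape dict(str: set(str)).

```python
from collections import defaultdict


def toy_successors(words):
    """Map each proper prefix of each word to the characters that follow it."""
    result = defaultdict(set)
    for word in words:
        for i in range(1, len(word)):
            result[word[:i]].add(word[i])
    return dict(result)


def toy_predecessors(words):
    """Map each proper suffix of each word to the characters that precede it."""
    result = defaultdict(set)
    for word in words:
        for i in range(1, len(word)):
            result[word[i:]].add(word[i - 1])
    return dict(result)


succ = toy_successors(["jump", "jumps", "jumped", "jury"])
pred = toy_predecessors(["walked", "jumped", "red"])
```

Under this reading, a substring with several successors (such as "jump", which can be followed by "s" or "e" in the toy list) is a candidate point for breaking a word.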

Other methods and attributes

parameters()

Return the parameter dict.

change_parameters(**kwargs)

Change parameters specified by kwargs.

use_default_parameters()

Reset parameters to their default values.
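
The interplay of parameters(), change_parameters(**kwargs), and use_default_parameters() can be pictured with a small stand-in class. This is a sketch of the general pattern only: the class ParameterStore is hypothetical, the default values are illustrative rather than Linguistica's, and rejecting unknown parameter names is a design choice of the sketch.

```python
class ParameterStore:
    """Hypothetical stand-in for an object with a resettable parameter dict."""

    # Illustrative defaults only; these values are not taken from Linguistica.
    _DEFAULTS = {"min_stem_length": 4, "max_affix_length": 4, "suffixing": 1}

    def __init__(self):
        self._parameters = dict(self._DEFAULTS)

    def parameters(self):
        """Return a copy of the current parameter dict."""
        return dict(self._parameters)

    def change_parameters(self, **kwargs):
        """Update the parameters named by the keyword arguments."""
        unknown = set(kwargs) - set(self._DEFAULTS)
        if unknown:
            # Rejecting unknown names is a choice made in this sketch.
            raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
        self._parameters.update(kwargs)

    def use_default_parameters(self):
        """Restore every parameter to its default value."""
        self._parameters = dict(self._DEFAULTS)


store = ParameterStore()
store.change_parameters(min_stem_length=3)
changed = store.parameters()
store.use_default_parameters()
restored = store.parameters()
```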

reset()

Reset the Linguistica object.

class linguistica.lexicon.Lexicon(file_path=None, wordlist_file=False, corpus_object=None, wordlist_object=None, encoding='utf8', **kwargs)

A class for a Linguistica object.

affixes()

Return a set of affixes.

Return type:

set(str)

affixes_to_signatures()

Return a dict of affixes to morphological signatures.

Return type:

dict(str: set(tuple(str)))

biphone_dict()

Return a dict of phone bigrams to Biphone objects. A Biphone instance has the methods spelling(), count(), frequency(), MI(), and weighted_MI().

Return type:

dict((str, str): Biphone instance)

broken_words_left_to_right()

Return a dict of words to their left-to-right broken form.

Return type:

dict(str: list(str))

broken_words_right_to_left()

Return a dict of words to their right-to-left broken form.

Return type:

dict(str: list(str))

change_parameters(**kwargs)

Change parameters specified by kwargs.

Parameters:

kwargs – keyword arguments for parameters and their new values

contexts_to_words()

Return a dict of contexts to words with counts.

Return type:

dict(tuple(str): dict(str: int))

neighbor_graph()

Return the syntactic word neighborhood graph.

Return type:

networkx undirected graph

number_of_word_tokens()

Return the number of word tokens.

Return type:

int

number_of_word_types()

Return the number of word types.

Return type:

int

output_all_results(directory=None, verbose=False, test=False)

Output all Linguistica results to directory.

Parameters:

directory – output directory. If not specified, it defaults to the current directory given by os.getcwd().

parameters()

Return the parameter dict.

Return type:

dict(str: int)

phone_bigram_counter()

Return a dict of phone bigrams with counts.

Return type:

dict(tuple(str): int)

phone_dict()

Return a dict of phone unigrams to Phone objects. A Phone instance has the methods spelling(), count(), frequency(), and plog().

Return type:

dict(str: Phone instance)

phone_trigram_counter()

Return a dict of phone trigrams with counts.

Return type:

dict(tuple(str): int)

phone_unigram_counter()

Return a dict of phone unigrams with counts.

Return type:

dict(str: int)

predecessors()

Return a dict of word (sub)strings to their predecessors.

Return type:

dict(str: set(str))

reset()

Reset the Linguistica object. The file path information is retained, but all computed objects (ngrams, signatures, word neighbors, etc.) are reset to NULL; if they are requested again, they are re-computed.
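
The compute-on-demand behaviour described for reset() follows a common caching pattern. The sketch below is a minimal illustration of that pattern and not Linguistica's code: the class LazyCorpus, its attributes, and the use of None to mark a cleared result are assumptions.

```python
from collections import Counter


class LazyCorpus:
    """Hypothetical object that computes a result on first use and caches it."""

    def __init__(self, tokens):
        self._tokens = list(tokens)  # the input is kept across resets
        self._unigrams = None        # None marks "not computed yet"
        self.compute_calls = 0       # counts real computations, for illustration

    def word_unigram_counter(self):
        """Return the cached counts, computing them only when missing."""
        if self._unigrams is None:
            self.compute_calls += 1
            self._unigrams = Counter(self._tokens)
        return self._unigrams

    def reset(self):
        """Discard cached results; the input tokens are untouched."""
        self._unigrams = None


corpus = LazyCorpus("a b a".split())
corpus.word_unigram_counter()
corpus.word_unigram_counter()  # served from the cache
calls_before_reset = corpus.compute_calls
corpus.reset()
counts_after_reset = corpus.word_unigram_counter()  # recomputed after reset
calls_after_reset = corpus.compute_calls
```

A second call before the reset does no new work, while the first call after the reset triggers a fresh computation from the retained input.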

run_all_modules(verbose=False)

Run all modules.

run_manifold_module(verbose=False)

Run the manifold module.

run_ngram_module(verbose=False)

Run the ngram module.

run_phon_module(verbose=False)

Run the phon module.

run_signature_module(verbose=False)

Run the signature module.

run_trie_module(verbose=False)

Run the trie module.

signatures()

Return a set of morphological signatures.

Return type:

set(tuple(str))

signatures_to_stems()

Return a dict of morphological signatures to stems.

Return type:

dict(tuple(str): set(str))

signatures_to_words()

Return a dict of morphological signatures to words.

Return type:

dict(tuple(str): set(str))

stems()

Return a set of stems.

Return type:

set(str)

stems_to_signatures()

Return a dict of stems to morphological signatures.

Return type:

dict(str: set(tuple(str)))

stems_to_words()

Return a dict of stems to words.

Return type:

dict(str: set(str))

successors()

Return a dict of word (sub)strings to their successors.

Return type:

dict(str: set(str))

use_default_parameters()

Reset parameters to their default values.

word_bigram_counter()

Return a dict of word bigrams with their counts.

Return type:

dict(tuple(str): int)

word_phonology_dict()

Return a dict of words to Word objects. A Word instance has the methods spelling(), phones(), count(), frequency(), unigram_plog(), avg_unigram_plog(), bigram_plog(), and avg_bigram_plog().

Return type:

dict(str: Word instance)

word_trigram_counter()

Return a dict of word trigrams with their counts.

Return type:

dict(tuple(str): int)

word_unigram_counter()

Return a dict of words with their counts.

Return type:

dict(str: int)

wordlist()

Return a wordlist sorted by word frequency in descending order. (So “the” will most likely be the first word for written English.)

Return type:

list(str)

words_in_signatures()

Return a set of words that are in at least one morphological signature.

Return type:

set(str)

words_to_contexts()

Return a dict of words to contexts with counts.

Return type:

dict(str: dict(tuple(str): int))

words_to_neighbors()

Return a dict of words to syntactic neighbors.

Return type:

dict(str: list(str))

words_to_phones()

Return a dict of words with their phones.

Return type:

dict(str: list(str))

words_to_signatures()

Return a dict of words to morphological signatures.

Return type:

dict(str: set(tuple(str)))

words_to_sigtransforms()

Return a dict of words to signature transforms.

Return type:

dict(str: set(tuple(tuple(str), str)))