8.1.21. cltk.wordnet package¶
8.1.21.1. Submodules¶
8.1.21.2. cltk.wordnet.processes module¶
Process to wrap WordNet.
8.1.21.3. cltk.wordnet.wordnet module¶
A CLTK interface for Sanskrit, Greek and Latin WordNets, built on the NLTK WordNet API.
The Sanskrit, Greek and Latin WordNets are lexico-semantic databases for the classical languages inspired by the Princeton WordNet for English. Most directly, these WordNets build on the framework of the Fondazione Bruno Kessler’s MultiWordNet Project. The CLTK WordNet API offers a nearly complete interface to the RESTful API exposed by these services, giving access to the rich lexical and, especially, semantic information they contain. The WordNets share a common set of semantic descriptors (synsets) for defining the senses of words, as well as language-specific ones.
The WordNetCorpusReader class is the main entry point for getting information about lemmas, synsets, and various lexical and semantic (conceptual) relationships such as hypernymy, hyponymy, synonymy, antonymy etc. It is also possible to compute semantic similarities using several different algorithms.
>>> from cltk.wordnet.wordnet import WordNetCorpusReader
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> uirtus = LWN.lemma('uirtus')
>>> list(uirtus[0].synsets())
[Synset(pos='n', offset='05595229', gloss='feeling no fear'), Synset(pos='n', offset='04504076', gloss='a characteristic property that defines the apparent individual nature of something'), Synset(pos='n', offset='04349777', gloss='possession of the qualities (especially mental qualities) required to do something or get something done; "danger heightened his powers of discrimination"'), Synset(pos='n', offset='04549901', gloss='an ideal of personal excellence toward which a person strives'), Synset(pos='n', offset='03800378', gloss='moral excellence or admirableness'), Synset(pos='n', offset='03800842', gloss='morality with respect to sexual relations'), Synset(pos='n', offset='03805961', gloss='a quality of spirit that enables you to face danger of pain without showing fear'), Synset(pos='n', offset='03929156', gloss='strength of mind that enables one to endure adversity with courage'), Synset(pos='n', offset='03678310', gloss='the trait of being manly; having the characteristics of an adult male'), Synset(pos='n', offset='03806773', gloss='resolute courageousness'), Synset(pos='n', offset='04505328', gloss='something in which something or some one excels'), Synset(pos='n', offset='03806965', gloss='the trait of having a courageous spirit'), Synset(pos='n', offset='03655289', gloss='courageous high-spiritedness'), Synset(pos='n', offset='03808136', gloss='the trait of showing courage and determination in spite of possible loss or injury'), Synset(pos='n', offset='04003047', gloss='the quality that renders something desirable or valuable or useful'), Synset(pos='n', offset='03717355', gloss='a degree or grade of excellence or worth'), Synset(pos='n', offset='04003707', gloss='any admirable quality or attribute'), Synset(pos='n', offset='03798920', gloss='the quality of doing what is right and avoiding what is wrong'), Synset(pos='n', offset='03799068', gloss='a particular moral excellence')]
>>> LWN.synset('n#03457380')
Synset(pos='n', offset='03457380', gloss='a cutting or thrusting weapon with a long blade')
>>> from cltk.wordnet.wordnet import Synset
>>> s1 = Synset(LWN, 'lat', pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s2 = Synset(LWN, 'lat', pos='n', offset='03457380', gloss='a cutting or thrusting weapon with a long blade')
>>> s1.lowest_common_hypernyms(s2)
[Synset(pos='n', offset='03601056', gloss='weaponry used in fighting or hunting')]
>>> s1.shortest_path_distance(s2)
3
>>> s1.wup_similarity(s2)
0.8
- cltk.wordnet.wordnet.nesteddict()¶
- cltk.wordnet.wordnet._INF = 1e+300¶
Positive infinity (for similarity functions)
- exception cltk.wordnet.wordnet.WordNetError[source]¶
Bases: Exception
An exception class for WordNet-related errors.
- class cltk.wordnet.wordnet._WordNetObject[source]¶
Bases: object
A common base class for lemmas and synsets.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> sub = Lemma(LWN, lemma='sub', pos='r', morpho='rp--------', uri='37096')
>>> 'super' in [lemma.lemma() for lemma in sub.antonyms()]
True

>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s1.hypernyms()
[Synset(pos='n', offset='02893681', gloss='a weapon with a handle and blade with a sharp point')]

>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s1.hyponyms()
[Synset(pos='n', offset='02575932', gloss='(Scottish) a long straight-bladed dagger'), Synset(pos='n', offset='03155758', gloss='a dagger with a slender blade'), Synset(pos='n', offset='03413564', gloss='a small dagger with a tapered blade')]

>>> s1 = LWN.synset_from_pos_and_offset('n', '00510771')
>>> s1.member_meronyms()
[Synset(pos='n', offset='07260585', gloss='a supporter of feminism')]

>>> s1 = LWN.synset_from_pos_and_offset('n', '02335723')
>>> s1.substance_meronyms()
[Synset(pos='n', offset='10626993', gloss='soil that is plastic when moist but hard when fired')]

>>> s1 = LWN.synset_from_pos_and_offset('n', '00541686')
>>> s1.attributes()
[Synset(pos='a', offset='01151057', gloss='sexually attracted to members of the opposite sex'), Synset(pos='a', offset='01151299', gloss='sexually attracted to members of your own sex')]

>>> s1 = LWN.synset_from_pos_and_offset('n', '00077986')
>>> s1.part_meronyms()
[Synset(pos='n', offset='00078772', gloss='preparation for the delivery of shellfire on a target')]

>>> s1 = LWN.synset_from_pos_and_offset('v', '00107243')
>>> s1.also_sees()
[Synset(pos='v', offset='00293275', gloss='become looser or slack')]

>>> s1 = LWN.synset_from_pos_and_offset('v', '00001740')
>>> s1.entailments()
[Synset(pos='v', offset='00003142', gloss='expel air'), Synset(pos='v', offset='00003763', gloss='draw in air')]

>>> s1 = LWN.synset_from_pos_and_offset('v', '00014590')
>>> s1.causes()
[Synset(pos='v', offset='00009805', gloss='be asleep')]

>>> s1 = LWN.synset_from_pos_and_offset('v', '00051515')
>>> s1.verb_groups()
[Synset(pos='v', offset='00050470', gloss='eliminate urine')]

>>> s1 = LWN.synset_from_pos_and_offset('n', 'L9083855')
>>> s1.nearest()
[Synset(pos='n', offset='03543592', gloss='ship for transporting troops')]
- class cltk.wordnet.wordnet.Lemma(wordnet_corpus_reader, lemma, pos, morpho, uri, **kwargs)[source]¶
Bases: _WordNetObject
The lexical entry for a single morphological form of a sense-disambiguated word.
Create a Lemma from lemma, pos, and morpho, or uri parameters where:
- <lemma> is the morphological form identifying the lemma
- <pos> is one of the module attributes ‘n’, ‘v’, ‘a’ or ‘r’
- <morpho> is the morphological descriptor
- <uri> is the URI
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> animus = Lemma(LWN, lemma='animus', pos='n', morpho='n-s---mn2-', uri='a2046')
>>> print(animus)
Lemma(lemma='animus', pos='n', morpho='n-s---mn2-', uri='a2046')
>>> virtus = Lemma(LWN, lemma='uirtus', pos='n', morpho='n-s---fn3-', uri='u0800')
>>> print(virtus)
Lemma(lemma='uirtus', pos='n', morpho='n-s---fn3-', uri='u0800')
Lemma attributes, accessible via methods with the same name:
- lemma: The canonical form of this lemma
- synsets: The synsets that this lemma belongs to
- literal: The synsets that this lemma belongs to in virtue of its literal senses
- metonymic: The synsets that this lemma belongs to in virtue of its metonymic senses
- metaphoric: The synsets that this lemma belongs to in virtue of its metaphoric senses
- count: The frequency of this lemma in the WordNet, i.e., the number of synsets (literal, metonymic, or metaphoric) to which it belongs
Lemma methods:
Lemmas have the following methods for retrieving related Lemmas. They correspond to the names for the pointer symbols defined here: https://wordnet.princeton.edu/documentation/wninput5wn
These methods all return lists of Lemmas:
- antonyms
- hypernyms
- hyponyms
- member_holonyms, substance_holonyms, part_holonyms
- member_meronyms, substance_meronyms, part_meronyms
- attributes
- derivationally_related_forms
- entailments
- causes
- also_sees
- verb_groups
- similar_tos
- pertainyms
- uri()[source]¶
Return the URI of the lemma.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> metus = Lemma(LWN, lemma='metus', pos='n', morpho='n-s---mn4-', uri='m0918')
>>> metus.uri()
'm0918'
- lemma()[source]¶
Return the canonical form of the lemma.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> metus = Lemma(LWN, lemma='metus', pos='n', morpho='n-s---mn4-', uri='m0918')
>>> metus.lemma()
'metus'
- pos()[source]¶
Return the part of speech of the lemma.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> metus = Lemma(LWN, lemma='metus', pos='n', morpho='n-s---mn4-', uri='m0918')
>>> metus.pos()
'n'
- morpho()[source]¶
Return the morphological descriptor of the lemma.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> metus = Lemma(LWN, lemma='metus', pos='n', morpho='n-s---mn4-', uri='m0918')
>>> metus.morpho()
'n-s---mn4-'
- synsets()[source]¶
Retrieve all synsets for the lemma.
- Returns:
A generator of Synset objects.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> virtus = LWN.lemmas_from_uri('u0800')[0]
>>> synset = list(virtus.synsets())[0]
>>> print(synset.gloss())
feeling no fear
- literal()[source]¶
Retrieve all literal senses of the lemma.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> virtus = LWN.lemmas_from_uri('u0800')[0]
>>> list(virtus.literal())
[Synset(pos='n', offset='05595229', gloss='feeling no fear'), Synset(pos='n', offset='04504076', gloss='a characteristic property that defines the apparent individual nature of something'), Synset(pos='n', offset='04349777', gloss='possession of the qualities (especially mental qualities) required to do something or get something done; "danger heightened his powers of discrimination"'), Synset(pos='n', offset='04549901', gloss='an ideal of personal excellence toward which a person strives'), Synset(pos='n', offset='03800378', gloss='moral excellence or admirableness'), Synset(pos='n', offset='03800842', gloss='morality with respect to sexual relations'), Synset(pos='n', offset='03805961', gloss='a quality of spirit that enables you to face danger of pain without showing fear'), Synset(pos='n', offset='03929156', gloss='strength of mind that enables one to endure adversity with courage'), Synset(pos='n', offset='03678310', gloss='the trait of being manly; having the characteristics of an adult male'), Synset(pos='n', offset='03806773', gloss='resolute courageousness'), Synset(pos='n', offset='04505328', gloss='something in which something or some one excels'), Synset(pos='n', offset='03806965', gloss='the trait of having a courageous spirit'), Synset(pos='n', offset='03655289', gloss='courageous high-spiritedness'), Synset(pos='n', offset='03808136', gloss='the trait of showing courage and determination in spite of possible loss or injury'), Synset(pos='n', offset='04003047', gloss='the quality that renders something desirable or valuable or useful'), Synset(pos='n', offset='03717355', gloss='a degree or grade of excellence or worth'), Synset(pos='n', offset='04003707', gloss='any admirable quality or attribute'), Synset(pos='n', offset='03798920', gloss='the quality of doing what is right and avoiding what is wrong'), Synset(pos='n', offset='03799068', gloss='a particular moral excellence')]
- metonymic()[source]¶
Retrieve all metonymic senses of the lemma.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> baculum = LWN.lemma('baculum', 'n', 'n-s---nn2-')
>>> list(baculum[0].metonymic())
[Synset(pos='n', offset='02327416', gloss='a support that steadies or strengthens something else'), Synset(pos='n', offset='02531456', gloss='used as a weapon'), Synset(pos='n', offset='03444976', gloss='any device that bears the weight of another thing')]
- metaphoric()[source]¶
Retrieve all metaphoric senses of the lemma.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> baculum = LWN.lemma('baculum', 'n', 'n-s---nn2-')
>>> list(baculum[0].metaphoric())
[Synset(pos='n', offset='04399253', gloss='something providing immaterial support or assistance to a person or cause or interest')]
Retrieve lemmas having the given relation type to this lemma.
- Parameters:
relation_symbol – Symbol for the lexical or semantic relation
- Returns:
A list of Lemma objects
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> baculum = LWN.lemma('baculum', 'n', 'n-s---nn2-')
>>> list(baculum[0].related('/'))
[Lemma(lemma='bacillum', pos='n', morpho='n-s---nn2-', uri='b0028'), Lemma(lemma='imbecillus', pos='a', morpho='aps---mn1-', uri='i0301')]
derivationally_related_forms.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> abalienatio = LWN.lemma('abalienatio', 'n', 'n-s---fn3-')
>>> abalienatio
[Lemma(lemma='abalienatio', pos='n', morpho='n-s---fn3-', uri='a0014')]
>>> sorted(abalienatio[0].derivationally_related_forms())
[Lemma(lemma='abalienatus', pos='a', morpho='aps---mn1-', uri='53399'), Lemma(lemma='abalieno', pos='v', morpho='v1spia--1-', uri='a0015')]
- pertainyms()[source]¶
Pertainyms.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> abalienatio = LWN.lemma('abalienatio', 'n', 'n-s---fn3-')
>>> abalienatio
[Lemma(lemma='abalienatio', pos='n', morpho='n-s---fn3-', uri='a0014')]
>>> list(abalienatio[0].pertainyms())
[Lemma(lemma='abalieno', pos='v', morpho='v1spia--1-', uri='a0015'), Lemma(lemma='ab', pos='p', morpho='p---------', uri='a0001')]
- composed_of()[source]¶
Composed of.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> evoco = LWN.lemma('euoco', 'v', 'v1spia--1-')
>>> list(evoco[0].composed_of())
[Lemma(lemma='uoco', pos='v', morpho='v1spia--1-', uri='u1152'), Lemma(lemma='ex', pos='p', morpho='p---------', uri='e1167')]
- composes()[source]¶
Composes.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> voco = LWN.lemma('uoco', 'v', 'v1spia--1-')
>>> list(voco[0].composes())
[Lemma(lemma='euoco', pos='v', morpho='v1spia--1-', uri='e1117'), Lemma(lemma='conuoco', pos='v', morpho='v1spia--1-', uri='c3931'), Lemma(lemma='prouoco', pos='v', morpho='v1spia--1-', uri='p4232'), Lemma(lemma='inuoco', pos='v', morpho='v1spia--1-', uri='i2733'), Lemma(lemma='reuoco', pos='v', morpho='v1spia--1-', uri='r1447')]
- class cltk.wordnet.wordnet.Semfield(wordnet_corpus_reader, code, english=None)[source]¶
Bases: object
Create a Semfield from code and english parameters where:
- <code> is the semfield’s DDCS code
- <english> is the semfield’s DDCS descriptor
A semfield (semantic field) defines a broad conceptual domain that includes many synsets. The Latin WordNet uses the Dewey Decimal Classification System as a topic index and hierarchy.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> anatomy = Semfield(LWN, '611', "Human Anatomy, Cytology & Histology")
- synsets()[source]¶
Retrieve all synsets of the semfield.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> anatomy = Semfield(LWN, '611', "Human anatomy, cytology & histology")
>>> fat = LWN.synset('n#04089143')
>>> print(fat in list(anatomy.synsets()))
True
- lemmas()[source]¶
Retrieve all lemmas for all synsets of the semfield.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> anatomy = Semfield(LWN, '611', "Human anatomy, cytology & histology")
>>> list(anatomy.lemmas())[0]
Lemma(lemma='autopsia', pos='n', morpho='n-s---fn1-', uri='50882')
- hypers()[source]¶
Retrieve all superordinate semfields of the semfield.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> anatomy = Semfield(LWN, '611', "Human anatomy, cytology & histology")
>>> print(list(anatomy.hypers()))
[Semfield(code='610', english='Medicine & Health')]
- hypons()[source]¶
Retrieve all subordinate semfields of the semfield.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> medicine = Semfield(LWN, '610', "Medicine & Health")
>>> print(list(medicine.hypons()))
[Semfield(code='610', english='Medicine & health'), Semfield(code='611', english='Human anatomy, cytology & histology'), Semfield(code='612', english='Human Physiology'), Semfield(code='613', english='Personal Health & Safety'), Semfield(code='614', english='Incidence & prevention of disease'), Semfield(code='615', english='Pharmacology & therapeutics'), Semfield(code='616', english='Diseases'), Semfield(code='617', english='Surgery & Related Medical Specialties'), Semfield(code='618', english='Gynecology, Obstetrics, Pediatrics & Geriatrics')]
- class cltk.wordnet.wordnet.Synset(wordnet_corpus_reader, language, pos, offset, gloss, semfield=None)[source]¶
Bases: _WordNetObject
Create a Synset from pos, offset and gloss parameters where:
- <pos> is the synset’s part of speech
- <offset> is the offset ID of the synset
- <gloss> is the synset’s gloss
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> print(s1.id())
n#02542418
Synset attributes, accessible via methods with the same name:
- pos: The synset’s part of speech, ‘n’, ‘v’, ‘a’, or ‘r’
- offset: The unique offset ID of the synset
- lemmas: A list of the Lemma objects for this synset
- gloss: The gloss for this synset
Synset methods:
Synsets have the following methods for retrieving related Synsets. They correspond to the names for the pointer symbols defined here: https://wordnet.princeton.edu/documentation/wninput5wn
These methods all return lists of Synsets:
- hypernyms
- hyponyms
- member_holonyms, substance_holonyms, part_holonyms
- member_meronyms, substance_meronyms, part_meronyms
- attributes
- entailments
- causes
- also_sees
- verb_groups
- similar_tos
- nearest
Additionally, Synsets support the following methods specific to the hypernym relation:
- root_hypernyms
- common_hypernyms
- lowest_common_hypernyms
Note that Synsets do not support the following relations because these are defined by WordNet as lexical relations:
- derivationally_related_forms
- pertainyms
- composed_of
- composes
- participle
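The identifier printed by id() in the example above joins the part of speech and the offset with a '#', which is also the form passed to LWN.synset('n#03457380') in the module overview. As a minimal sketch, such a string can be taken apart without contacting the WordNet service. The helper name split_synset_id is hypothetical and not part of CLTK.

```python
def split_synset_id(synset_id: str) -> tuple:
    """Split an identifier such as 'n#02542418' into (pos, offset).

    Hypothetical helper for illustration only; the offset is kept as a
    string because offsets like 'L9083855' are not purely numeric.
    """
    pos, sep, offset = synset_id.partition("#")
    if not sep or not pos or not offset:
        raise ValueError(f"not a 'pos#offset' identifier: {synset_id!r}")
    return pos, offset


print(split_synset_id("n#02542418"))
```

The two parts returned here have the same shape as the arguments shown for LWN.synset_from_pos_and_offset elsewhere on this page.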
- semfields()[source]¶
Retrieve the synset’s semfields.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset('n', 'L6992236')
>>> list(s1.semfields())
[Semfield(code='150', english='Psychology')]
- sentiment()[source]¶
Retrieve sentiment scores for the synset.
- Returns:
A dict including the synset’s positivity, negativity, and objectivity scores (-1 to 1).
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset('v', '01215448')
>>> s1.sentiment()
{'positivity': 0.0, 'negativity': 0.625, 'objectivity': 0.375}
- positivity()[source]¶
Positivity.
- Returns:
A float value representing the synset’s positivity score.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset('v', '01215448')
>>> s1.positivity()
0.0
- negativity()[source]¶
Negativity.
- Returns:
A float value representing the synset’s negativity score.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset('v', '01215448')
>>> s1.negativity()
0.625
- objectivity()[source]¶
Objectivity.
- Returns:
A float value representing the synset’s objectivity score.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset('v', '01215448')
>>> s1.objectivity()
0.375
- examples()[source]¶
Retrieve examples of any lemma instantiating this synset.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset('n', '04399253')
>>> print(s1.examples()[0])
{'lemma': {'lemma': 'baculum', 'pos': 'n', 'morpho': 'n-s---nn2-', 'uri': 'b0034', 'prosody': 'baculum'}, 'author_abbr': 'Vulg', 'work_abbr': 'Tob', 'reference': '10.4', 'text': 'baculum senectutis nostrae'}
- lemmas()[source]¶
Return all the Lemma objects associated with the synset.
- Returns:
A generator of Lemma objects.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> for lemma in sorted(set(s1.lemmas())):
...     print(lemma.lemma())
clunaculum
gladiolus
parazonium
pugio
pugiunculus
sica
sicula
- root_hypernyms()[source]¶
Get the topmost hypernyms of this synset in WordNet.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s1.root_hypernyms()
[Synset(pos='n', offset='00001740', gloss='anything having existence (living or nonliving)')]
- max_depth()[source]¶
Get the length of the longest hypernym path from this synset to the root.
- Returns:
An integer value representing the maximum path length to the root.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s1.max_depth()
7
- min_depth()[source]¶
Get min depth.
- Returns:
The length of the shortest hypernym path from this synset to the root.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s1.min_depth()
7
- closure(rel, depth=-1)[source]¶
Return the transitive closure of the synset under the rel relationship, breadth-first.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> hypers = lambda s: s.hypernyms()
>>> list(s1.closure(hypers))
[Synset(pos='n', offset='02893681', gloss='a weapon with a handle and blade with a sharp point'), Synset(pos='n', offset='03601056', gloss='weaponry used in fighting or hunting'), Synset(pos='n', offset='03601456', gloss='weapons considered collectively'), Synset(pos='n', offset='02859872', gloss='an artifact (or system of artifacts) that is instrumental in accomplishing some end'), Synset(pos='n', offset='00011937', gloss='a man-made object'), Synset(pos='n', offset='00009457', gloss='a physical (tangible and visible) entity'), Synset(pos='n', offset='00001740', gloss='anything having existence (living or nonliving)')]
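The breadth-first traversal behind closure can be illustrated on a small in-memory graph. The following is a minimal sketch and not CLTK's implementation: the HYPERNYMS dict is a hypothetical stand-in for the WordNet service, filled with the offsets from the output above, and treating depth as a cap on the number of relation steps is an assumption.

```python
from collections import deque

# Offsets copied from the closure() output above, ordered from the
# synset itself up to the root. The dict is a toy stand-in for the API.
CHAIN = ["02542418", "02893681", "03601056", "03601456",
         "02859872", "00011937", "00009457", "00001740"]
HYPERNYMS = {child: [parent] for child, parent in zip(CHAIN, CHAIN[1:])}
HYPERNYMS[CHAIN[-1]] = []  # the root has no hypernyms


def closure(start, rel, depth=-1):
    """Yield every node reachable from start via rel, breadth-first.

    A negative depth means no limit (assumed semantics); each node is
    yielded at most once.
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, steps = queue.popleft()
        if depth >= 0 and steps >= depth:
            continue  # do not expand beyond the requested depth
        for nxt in rel(node):
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, steps + 1))
                yield nxt


hypers = lambda offset: HYPERNYMS[offset]
print(list(closure("02542418", hypers)))
print(list(closure("02542418", hypers, depth=2)))
```

Passing the relation as a function, as the hypers lambda does in the example above, lets the same traversal serve hyponyms, meronyms, or any other relation.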
- hypernym_paths()[source]¶
Get the path(s) from this synset to the root, where each path is a list of the synset nodes traversed on the way to the root.
- Returns:
A list of lists, where each list gives the node sequence connecting the initial Synset node and a root node.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s1.hypernym_paths()
[[Synset(pos='n', offset='00001740', gloss='anything having existence (living or nonliving)'), Synset(pos='n', offset='00009457', gloss='a physical (tangible and visible) entity'), Synset(pos='n', offset='00011937', gloss='a man-made object'), Synset(pos='n', offset='02859872', gloss='an artifact (or system of artifacts) that is instrumental in accomplishing some end'), Synset(pos='n', offset='03601456', gloss='weapons considered collectively'), Synset(pos='n', offset='03601056', gloss='weaponry used in fighting or hunting'), Synset(pos='n', offset='02893681', gloss='a weapon with a handle and blade with a sharp point'), Synset(pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')]]
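When a synset has more than one hypernym, there is one path per route to a root, and the longest and shortest of these paths give the values reported by max_depth and min_depth. The sketch below shows this on a toy graph. The node labels and the TOY dict are hypothetical, chosen only so that two paths of different length exist; it is not CLTK's code.

```python
# Hypothetical toy taxonomy: "dagger" has two hypernyms, so two
# root-to-node paths of different length exist.
TOY = {
    "dagger": ["knife", "weapon"],
    "knife": ["tool"],
    "tool": ["root"],
    "weapon": ["root"],
    "root": [],
}


def hypernym_paths(node, hypernyms):
    """Return every path from a root down to node, root first."""
    parents = hypernyms[node]
    if not parents:
        return [[node]]
    return [path + [node]
            for parent in parents
            for path in hypernym_paths(parent, hypernyms)]


paths = hypernym_paths("dagger", TOY)
# Depth is counted in edges, so it is one less than the path length.
longest = max(len(path) for path in paths) - 1
shortest = min(len(path) for path in paths) - 1
print(paths, longest, shortest)
```

Counting edges rather than nodes matches the documented example, where a single path of eight synsets corresponds to a depth of 7.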
- common_hypernyms(other)[source]¶
Find all synsets that are hypernyms of this synset and the other synset.
- Parameters:
other (Synset) – other input synset.
- Returns:
The synsets that are hypernyms of both synsets.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s2 = Synset(LWN, None, pos='n', offset='03457380', gloss='a cutting or thrusting weapon with a long blade')
>>> sorted(s1.common_hypernyms(s2))
[Synset(pos='n', offset='00001740', gloss='anything having existence (living or nonliving)'), Synset(pos='n', offset='00009457', gloss='a physical (tangible and visible) entity'), Synset(pos='n', offset='00011937', gloss='a man-made object'), Synset(pos='n', offset='02859872', gloss='an artifact (or system of artifacts) that is instrumental in accomplishing some end'), Synset(pos='n', offset='03601056', gloss='weaponry used in fighting or hunting'), Synset(pos='n', offset='03601456', gloss='weapons considered collectively')]
- lowest_common_hypernyms(other, simulate_root=False, use_min_depth=False)[source]¶
Get a list of lowest synset(s) that both synsets have as a hypernym. When use_min_depth == False, this means that the synset which appears as a hypernym of both self and other with the lowest maximum depth is returned; if there are multiple such synsets at the same depth, they are all returned. However, if use_min_depth == True, then the synset(s) which has/have the lowest minimum depth and appear(s) in both paths is/are returned.
- Parameters:
other (Synset) – other input synset
simulate_root (bool) – The various verb taxonomies do not share a single root, which disallows this metric from working for synsets that are not connected. This flag (False by default) creates a fake root that connects all the taxonomies. Set it to True to enable this behavior. For the noun taxonomy, there is usually a default root except for WordNet version 1.6. If you are using WordNet 1.6, a fake root will need to be added for nouns as well.
use_min_depth (bool) – This setting mimics older (v2) behavior of NLTK WordNet. If True, will use the min_depth function to calculate the lowest common hypernyms. This is known to give strange results for some synset pairs (e.g. ‘chef.n.01’, ‘fireman.n.01’) but is retained for backwards compatibility.
- Returns:
The synsets that are the lowest common hypernyms of both synsets
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s2 = Synset(LWN, None, pos='n', offset='03457380', gloss='a cutting or thrusting weapon with a long blade')
>>> s1.lowest_common_hypernyms(s2)
[Synset(pos='n', offset='03601056', gloss='weaponry used in fighting or hunting')]
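The selection rule for the default case (use_min_depth == False) can be sketched as: intersect the hypernym sets of both synsets, then keep the common hypernyms that lie deepest in the taxonomy. This is an illustration under assumptions, not CLTK's code. HYPERNYMS is a toy graph built from the offsets on this page, and the direct link from '03457380' to '03601056' is an assumption chosen to be consistent with the documented shortest path distance of 3.

```python
CHAIN = ["02542418", "02893681", "03601056", "03601456",
         "02859872", "00011937", "00009457", "00001740"]
HYPERNYMS = {child: [parent] for child, parent in zip(CHAIN, CHAIN[1:])}
HYPERNYMS["00001740"] = []            # root
HYPERNYMS["03457380"] = ["03601056"]  # assumed link for the second synset


def ancestors(node):
    """Return all hypernyms reachable from node (node itself excluded)."""
    found, stack = set(), list(HYPERNYMS[node])
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(HYPERNYMS[current])
    return found


def max_depth(node):
    """Length in edges of the longest path from node up to a root."""
    parents = HYPERNYMS[node]
    return 0 if not parents else 1 + max(max_depth(p) for p in parents)


def common_hypernyms(a, b):
    return sorted(ancestors(a) & ancestors(b))


def lowest_common_hypernyms(a, b):
    common = common_hypernyms(a, b)
    if not common:
        return []
    deepest = max(max_depth(n) for n in common)
    return [n for n in common if max_depth(n) == deepest]


print(common_hypernyms("02542418", "03457380"))
print(lowest_common_hypernyms("02542418", "03457380"))
```

On this toy graph the result is the single offset 03601056, the same synset returned in the example above.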
- hypernym_distances(distance=0, simulate_root=False)[source]¶
Get the path(s) from this synset to the root, counting the distance of each node from the initial node on the way. A set of (synset, distance) tuples is returned.
- Parameters:
distance (int) – the distance (number of edges) from this hypernym to the original hypernym Synset on which this method was called.
- Returns:
A set of (Synset, int) tuples where each Synset is a hypernym of the first Synset.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> sorted(s1.hypernym_distances())
[(Synset(pos='n', offset='00001740', gloss='anything having existence (living or nonliving)'), 7), (Synset(pos='n', offset='00009457', gloss='a physical (tangible and visible) entity'), 6), (Synset(pos='n', offset='00011937', gloss='a man-made object'), 5), (Synset(pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade'), 0), (Synset(pos='n', offset='02859872', gloss='an artifact (or system of artifacts) that is instrumental in accomplishing some end'), 4), (Synset(pos='n', offset='02893681', gloss='a weapon with a handle and blade with a sharp point'), 1), (Synset(pos='n', offset='03601056', gloss='weaponry used in fighting or hunting'), 2), (Synset(pos='n', offset='03601456', gloss='weapons considered collectively'), 3)]
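The role of the distance parameter becomes clearer in a recursive sketch: each call records its own node at the current distance and passes distance + 1 on to its hypernyms. The code below is a minimal illustration on a toy graph (HYPERNYMS is a hypothetical stand-in built from the offsets in the output above) and does not model simulate_root.

```python
CHAIN = ["02542418", "02893681", "03601056", "03601456",
         "02859872", "00011937", "00009457", "00001740"]
HYPERNYMS = {child: [parent] for child, parent in zip(CHAIN, CHAIN[1:])}
HYPERNYMS[CHAIN[-1]] = []  # the root has no hypernyms


def hypernym_distances(node, distance=0):
    """Return a set of (node, distance) pairs for node and its hypernyms."""
    pairs = {(node, distance)}
    for parent in HYPERNYMS[node]:
        # Each hypernym is one edge further from the starting node.
        pairs |= hypernym_distances(parent, distance + 1)
    return pairs


print(sorted(hypernym_distances("02542418")))
```

Because the result is a set of pairs, a hypernym reachable over two routes of different length appears once per distinct distance.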
- shortest_path_distance(other, simulate_root=False)[source]¶
Returns the distance of the shortest path linking the two synsets (if one exists). For each synset, all the ancestor nodes and their distances are recorded and compared. The ancestor node common to both synsets that can be reached with the minimum number of traversals is used. If no ancestor nodes are common, None is returned. If a node is compared with itself, 0 is returned.
- Parameters:
other (Synset) – The Synset to which the shortest path will be found.
- Returns:
The number of edges in the shortest path connecting the two nodes, or None if no path exists.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s2 = Synset(LWN, None, pos='n', offset='03457380', gloss='a cutting or thrusting weapon with a long blade')
>>> s1.shortest_path_distance(s2)
3
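The procedure described above (record each synset's ancestors with their distances, then take the shared ancestor with the smallest combined distance) can be sketched as follows. This is an illustration, not CLTK's implementation. HYPERNYMS is a toy graph; the link from '03457380' to '03601056' is an assumption consistent with the documented result of 3, and the node named 'isolated' is hypothetical.

```python
CHAIN = ["02542418", "02893681", "03601056", "03601456",
         "02859872", "00011937", "00009457", "00001740"]
HYPERNYMS = {child: [parent] for child, parent in zip(CHAIN, CHAIN[1:])}
HYPERNYMS["00001740"] = []            # root
HYPERNYMS["03457380"] = ["03601056"]  # assumed link for the second synset
HYPERNYMS["isolated"] = []            # hypothetical unconnected node


def distances(node):
    """Map node and each of its hypernyms to the minimum edge count."""
    dist = {node: 0}
    frontier = [node]
    while frontier:
        following = []
        for current in frontier:
            for parent in HYPERNYMS[current]:
                if parent not in dist:
                    dist[parent] = dist[current] + 1
                    following.append(parent)
        frontier = following
    return dist


def shortest_path_distance(a, b):
    """Return the edge count of the shortest connecting path, or None."""
    dist_a, dist_b = distances(a), distances(b)
    shared = dist_a.keys() & dist_b.keys()
    return min((dist_a[n] + dist_b[n] for n in shared), default=None)


print(shortest_path_distance("02542418", "03457380"))
print(shortest_path_distance("02542418", "02542418"))
print(shortest_path_distance("02542418", "isolated"))
```

Including each node in its own distance map at distance 0 is what makes a self-comparison return 0 and lets one synset be a direct ancestor of the other.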
- tree(rel, depth=-1, cut_mark=None)[source]¶
Generate a tree-like list structure for rel relationship of this synset.
- Parameters:
rel – A function returning the relations of a certain kind of this synset.
depth
cut_mark – An object used to indicate where a branch has been truncated.
- Returns:
A list of lists.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset(pos='n', offset='01595188')
>>> hypers = lambda s: s.hypernyms()
>>> s1.tree(hypers)
[Synset(pos='n', offset='01595188', gloss='a member of the genus Canis (probably descended from the common wolf) that has been domesticated by man since prehistoric times; occurs in many breeds; "the dog barked all night"'), [Synset(pos='n', offset='01594481', gloss='any of various fissiped mammals with nonretractile claws and typically long muzzles'), [Synset(pos='n', offset='01586585', gloss='terrestrial or aquatic flesh-eating mammal; terrestrial carnivores have four or five clawed digits on each limb'), [Synset(pos='n', offset='01402712', gloss='mammals having a placenta; all mammals except monotremes and marsupials'), [Synset(pos='n', offset='01378363', gloss='any warm-blooded vertebrate having the skin more or less covered with hair; young are born alive except for the small subclass of monotremes and nourished with milk'), [Synset(pos='n', offset='00995974', gloss='animals having a bony or cartilaginous skeleton with a segmented spinal column and a large brain enclosed in a skull or cranium'), [Synset(pos='n', offset='00990770', gloss='any animal of the phylum Chordata having a notochord or spinal column'), [Synset(pos='n', offset='00008019', gloss='a living organism characterized by voluntary movement'), [Synset(pos='n', offset='00002086', gloss='any living entity'), [Synset(pos='n', offset='00001740', gloss='anything having existence (living or nonliving)')]]]]]]]]]]
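The nested-list shape of the result, and the effect of depth and cut_mark, can be seen in a small recursive sketch. It is an illustration under assumptions rather than CLTK's code: HYPERNYMS is a toy graph holding only the first three offsets from the output above, and treating depth as a counter that stops expansion at 0 is an assumption, since the parameter is not described here.

```python
# Toy graph with the first three offsets from the output above; the
# real chain continues further up the taxonomy.
HYPERNYMS = {
    "01595188": ["01594481"],
    "01594481": ["01586585"],
    "01586585": [],
}


def tree(node, rel, depth=-1, cut_mark=None):
    """Return [node, subtree, subtree, ...] for the rel relation.

    A negative depth never reaches 0, so the whole graph is expanded;
    at depth 0 the branch stops and cut_mark, if given, is appended.
    """
    branch = [node]
    if depth != 0:
        branch += [tree(nxt, rel, depth - 1, cut_mark) for nxt in rel(node)]
    elif cut_mark is not None:
        branch.append(cut_mark)
    return branch


hypers = lambda offset: HYPERNYMS[offset]
print(tree("01595188", hypers))
print(tree("01595188", hypers, depth=1, cut_mark="..."))
```

Each level of nesting corresponds to one step along the relation, which is why the documented output ends in a run of closing brackets.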
- path_similarity(other, verbose=False, simulate_root=True)[source]¶
Path Distance Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses in the is-a (hypernym/hyponym) taxonomy. The score is in the range 0 to 1, except in those cases where a path cannot be found (will only be true for verbs as there are many distinct verb taxonomies), in which case None is returned. A score of 1 represents identity, i.e. comparing a sense with itself will return 1.
- Parameters:
other (Synset) – The Synset that this Synset is being compared to.
simulate_root (bool) – The various verb taxonomies do not share a single root, which disallows this metric from working for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies. Set it to False to disable this behavior. For the noun taxonomy, there is usually a default root except for WordNet version 1.6. If you are using WordNet 1.6, a fake root will be added for nouns as well.
- Returns:
A score denoting the similarity of the two Synset objects, normally between 0 and 1. None is returned if no connecting path could be found. 1 is returned if a Synset is compared with itself.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s2 = Synset(LWN, None, pos='n', offset='03457380', gloss='a cutting or thrusting weapon with a long blade')
>>> s1.path_similarity(s2)
0.25
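The documented values for this pair of synsets (a shortest path distance of 3 and a path similarity of 0.25) are consistent with the formula 1 / (distance + 1) used by NLTK's path similarity. The sketch below assumes that formula and takes the distance as an input, so it needs no WordNet access. The function name is hypothetical.

```python
def path_similarity_from_distance(distance):
    """Convert a shortest-path distance (in edges) into a 0..1 score.

    Assumes the 1 / (distance + 1) formula; returns None when no
    connecting path exists (distance is None).
    """
    if distance is None:
        return None
    if distance < 0:
        raise ValueError("distance must be non-negative")
    return 1.0 / (distance + 1)


print(path_similarity_from_distance(3))
print(path_similarity_from_distance(0))
```

A distance of 0 gives a score of 1, matching the statement that a Synset compared with itself returns 1.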
- _lcs_ic(other, icreader, verbose=False)[source]¶
Get the information content of the least common subsumer that has the highest information content value. If two nodes have no explicit common subsumer, assume that they share an artificial root node that is the hypernym of all explicit roots.
- Parameters:
synset1 (Synset) – First input synset.
synset2 (Synset) – Second input synset. Must be the same part of speech as the first synset.
ic (WordNetICCorpusReader) – an information content reader object
- Returns:
The information content of the two synsets and their most informative subsumer
- lch_similarity(other, verbose=False, simulate_root=True)[source]¶
Leacock Chodorow Similarity: Return a score denoting how similar two word senses are, based on the shortest path that connects the senses (as above) and the maximum depth of the taxonomy in which the senses occur. The relationship is given as -log(p/2d) where p is the shortest path length and d is the taxonomy depth. Because this metric must compute the max depth of the entire synset taxonomy, it can be very slow!
:type other: Synset
:param other: The Synset that this Synset is being compared to.
:type simulate_root: bool
:param simulate_root: The various verb taxonomies do not share a single root, which prevents this metric from working for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies. Set it to False to disable this behavior.
:return: A score denoting the similarity of the two Synset objects, normally greater than 0. None is returned if no connecting path could be found. If a Synset is compared with itself, the maximum score is returned, which varies depending on the taxonomy depth.
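The relationship -log(p/2d) can be evaluated directly, as in the sketch below. The function name and the sample values are hypothetical. The sketch also assumes that p counts nodes rather than edges, so that a synset compared with itself has p = 1; this is an assumption made so that identity yields the maximum score, not something the description above states.

```python
import math


def lch_score(path_length, taxonomy_depth):
    """Evaluate -log(p / (2 * d)) for path length p and taxonomy depth d.

    Assumes p >= 1 (identity counted as a path of one node) and d >= 1.
    """
    if path_length < 1 or taxonomy_depth < 1:
        raise ValueError("path_length and taxonomy_depth must be at least 1")
    return -math.log(path_length / (2.0 * taxonomy_depth))


# Hypothetical taxonomy depth of 10: the score shrinks as the path grows.
for p in (1, 4, 20):
    print(p, lch_score(p, 10))
```

With a fixed depth, the maximum score is reached at p = 1 and equals log(2d), which is why the self-comparison score depends on the taxonomy depth.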
- wup_similarity(other, verbose=False, simulate_root=True)[source]¶
Wu-Palmer Similarity: Return a score denoting how similar two word senses are, based on the depth of the two senses in the taxonomy and that of their Least Common Subsumer (most specific ancestor node). Previously, the scores computed by this implementation did not always agree with those given by Pedersen’s Perl implementation of WordNet Similarity. However, with the addition of the simulate_root flag (see below), the scores for verbs now almost always agree, although those for nouns do not always.
The LCS does not necessarily feature in the shortest path connecting the two senses, as it is by definition the common ancestor deepest in the taxonomy, not closest to the two senses. Typically, however, it will so feature. Where multiple candidates for the LCS exist, that whose shortest path to the root node is the longest will be selected. Where the LCS has multiple paths to the root, the longer path is used for the purposes of the calculation.
:type other: Synset
:param other: The Synset that this Synset is being compared to.
:type simulate_root: bool
:param simulate_root: The various verb taxonomies do not share a single root, which prevents this metric from working for synsets that are not connected. This flag (True by default) creates a fake root that connects all the taxonomies. Set it to False to disable this behavior.
:return: A float score denoting the similarity of the two Synset objects, normally greater than zero. If no connecting path between the two senses can be found, None is returned.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = Synset(LWN, None, pos='n', offset='02542418', gloss='a short stabbing weapon with a pointed blade')
>>> s2 = Synset(LWN, None, pos='n', offset='03457380', gloss='a cutting or thrusting weapon with a long blade')
>>> s1.wup_similarity(s2)
0.8
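A depth-based score of this kind can be sketched on a toy tree. The snippet is an illustration under stated assumptions, not the CLTK code: the `PARENT` table and the helper names are hypothetical, each node has a single hypernym (so the multiple-path cases described above do not arise), depth is counted in nodes with the root at depth 1, and the usual Wu-Palmer formula `2 * depth(lcs) / (depth(a) + depth(b))` is assumed.

```python
# Hypothetical single-parent taxonomy (child -> parent); the root maps to None.
PARENT = {
    "entity": None,
    "artifact": "entity",
    "instrument": "artifact",
    "weapon": "instrument",
    "dagger": "weapon",
    "sword": "weapon",
}


def path_to_root(node, parent):
    """Return the chain of nodes from the given node up to the root."""
    path = [node]
    while parent[path[-1]] is not None:
        path.append(parent[path[-1]])
    return path


def depth(node, parent):
    """Depth counted in nodes, so the root has depth 1."""
    return len(path_to_root(node, parent))


def wup_score(a, b, parent=PARENT):
    """Return 2 * depth(lcs) / (depth(a) + depth(b)), or None without a common ancestor."""
    ancestors_b = set(path_to_root(b, parent))
    # Walking up from a, the first node shared with b is the deepest common ancestor.
    lcs = next((n for n in path_to_root(a, parent) if n in ancestors_b), None)
    if lcs is None:
        return None
    return 2.0 * depth(lcs, parent) / (depth(a, parent) + depth(b, parent))


print(wup_score("dagger", "sword"))     # LCS is "weapon" (depth 4), both senses at depth 5
print(wup_score("dagger", "artifact"))  # LCS is "artifact" itself (depth 2)
```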
- res_similarity(other, icreader, verbose=False)[source]¶
Resnik Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node).
:type other: Synset
:param other: The Synset that this Synset is being compared to.
:type ic: WordNetICCorpusReader
:param ic: an information content reader
:return: A float score denoting the similarity of the two Synset objects. Synsets whose LCS is the root node of the taxonomy will have a score of 0 (e.g. N[‘dog’][0] and N[‘table’][0]).
>>> from cltk.wordnet.wordnet import WordNetCorpusReader, WordNetICCorpusReader
>>> LASLA_IC = WordNetICCorpusReader(iso_code="lat", fileids=['ic-lasla.dat'])
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset('n', '02542418')
>>> s2 = LWN.synset_from_pos_and_offset('n', '03457380')
>>> s1.res_similarity(s2, LASLA_IC)
6.056495670686355
- jcn_similarity(other, icreader, verbose=False)[source]¶
Jiang-Conrath Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The relationship is given by the equation 1 / (IC(s1) + IC(s2) - 2 * IC(lcs)).
:type other: Synset
:param other: The Synset that this Synset is being compared to.
:type ic: WordNetICCorpusReader
:param ic: an information content reader
:return: A float score denoting the similarity of the two Synset objects.
>>> from cltk.wordnet.wordnet import WordNetCorpusReader, WordNetICCorpusReader
>>> LASLA_IC = WordNetICCorpusReader(iso_code='lat', fileids=['ic-lasla.dat'])
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset('n', '02542418')
>>> s2 = LWN.synset_from_pos_and_offset('n', '03457380')
>>> s1.jcn_similarity(s2, LASLA_IC)
0.23789011550933925
- lin_similarity(other, icreader, verbose=False)[source]¶
Lin Similarity: Return a score denoting how similar two word senses are, based on the Information Content (IC) of the Least Common Subsumer (most specific ancestor node) and that of the two input Synsets. The relationship is given by the equation 2 * IC(lcs) / (IC(s1) + IC(s2)).
:type other: Synset
:param other: The Synset that this Synset is being compared to.
:type ic: WordNetICCorpusReader
:param ic: an information content reader
:return: A float score denoting the similarity of the two Synset objects, in the range 0 to 1.
>>> from cltk.wordnet.wordnet import WordNetCorpusReader, WordNetICCorpusReader
>>> LASLA_IC = WordNetICCorpusReader(iso_code="lat", fileids=['ic-lasla.dat'])
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset('n', '02542418')
>>> s2 = LWN.synset_from_pos_and_offset('n', '03457380')
>>> s1.lin_similarity(s2, LASLA_IC)
0.7423716841366877
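The three IC-based measures differ only in how they combine the same three numbers: IC(s1), IC(s2) and IC(lcs). The sketch below evaluates the equations given above on hypothetical IC values; the function names are illustrative and are not part of the CLTK API, and returning infinity when the Jiang-Conrath denominator is zero is a choice made for this sketch only.

```python
import math


def resnik(ic_lcs):
    """Resnik: the IC of the least common subsumer itself."""
    return ic_lcs


def jiang_conrath(ic_s1, ic_s2, ic_lcs):
    """Jiang-Conrath: 1 / (IC(s1) + IC(s2) - 2 * IC(lcs))."""
    difference = ic_s1 + ic_s2 - 2.0 * ic_lcs
    if difference == 0:
        return math.inf  # sketch-only convention for a zero denominator
    return 1.0 / difference


def lin(ic_s1, ic_s2, ic_lcs):
    """Lin: 2 * IC(lcs) / (IC(s1) + IC(s2))."""
    return 2.0 * ic_lcs / (ic_s1 + ic_s2)


# Hypothetical information content values, chosen for easy arithmetic.
ic_s1, ic_s2, ic_lcs = 9.0, 7.0, 6.0
print(resnik(ic_lcs))                        # 6.0
print(jiang_conrath(ic_s1, ic_s2, ic_lcs))   # 1 / (16 - 12) = 0.25
print(lin(ic_s1, ic_s2, ic_lcs))             # 12 / 16 = 0.75
```

When the LCS is the root node, IC(lcs) is 0, so the Resnik and Lin scores both fall to 0.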
- _iter_hypernym_lists()[source]¶
Get hypernyms.
:return: An iterator over Synset objects that are either proper hypernyms or instance hypernyms of the synset.
Get related.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> s1 = LWN.synset_from_pos_and_offset('v', '01215448')
>>> s1.related('~')
[Synset(pos='v', offset='01217265', gloss='feel panic')]
- class cltk.wordnet.wordnet.WordNetCorpusReader(iso_code, ignore_errors=False)[source]¶
Bases: CorpusReader
A corpus reader used to access a WordNet.
:param iso_code: The ISO code for one of the languages providing a WordNet API
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> animus = LWN.lemma('animus', 'n', 'n-s---mn2-')
>>> print(animus)
[Lemma(lemma='animus', pos='n', morpho='n-s---mn2-', uri='a2046')]
>>> dico = LWN.lemmas('dico', 'v')
>>> print(sorted(list(dico), key=lambda x: x.uri()))
[Lemma(lemma='dico', pos='v', morpho='v1spia--1-', uri='d1349'), Lemma(lemma='dico', pos='v', morpho='v1spia--3-', uri='d1350')]
>>> virtus = LWN.lemmas_from_uri('u0800')
>>> print(virtus)
[Lemma(lemma='uirtus', pos='n', morpho='n-s---fn3-', uri='u0800')]
>>> courage = LWN.synset('n#03805961')
>>> print(courage)
Synset(pos='n', offset='03805961', gloss='a quality of spirit that enables you to face danger of pain without showing fear')
>>> adverbs = LWN.synsets('r')
>>> print(len(list(adverbs)) > 3600)
True
- _compute_max_depth(pos, simulate_root)[source]¶
Compute the max depth for the given part of speech. This is used by the lch similarity metric.
- lemma(lemma, pos='', morpho='', return_ambiguous=True)[source]¶
Takes lemma and finds the matching headword.
If pos or morpho is provided, the results found through lemma alone are filtered. pos tags are in the form n for noun, v for verb, a for adjective, r for adverb.
If return_ambiguous is False, only the first matching lemma is returned as a single-element list. If True (default), all the matching lemmas will be returned.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> LWN.lemma('baculum')
[Lemma(lemma='baculum', pos='n', morpho='n-s---nn2-', uri='b0034')]
- lemma_from_uri(uri)[source]¶
Get lemma from URI.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> LWN.lemma_from_uri('b0034')
Lemma(lemma='baculum', pos='n', morpho='n-s---nn2-', uri='b0034')
- semfield(code, english)[source]¶
Semfield.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> LWN.semfield('910', 'Geography & travel')
Semfield(code='910', english='Geography & travel')
- synset(id)[source]¶
Get synset.
- Parameters:
id – Synset id, consisting of POS and offset separated by ‘#’
- Returns:
Synset object
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> LWN.synset('r#L2556264')
Synset(pos='r', offset='L2556264', gloss='in the manner of a woman')
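The synset id is simply the two arguments of synset_from_pos_and_offset joined by '#'. The helper below is a hypothetical illustration of that relationship and is not part of CLTK; it only splits the string and does not contact any WordNet service.

```python
def split_synset_id(synset_id):
    """Split an id such as 'n#03805961' into its (pos, offset) parts."""
    pos, separator, offset = synset_id.partition("#")
    if not separator or not pos or not offset:
        raise ValueError(f"expected '<pos>#<offset>', got {synset_id!r}")
    return pos, offset


# Ids taken from the examples in this section.
print(split_synset_id("n#03805961"))  # ('n', '03805961')
print(split_synset_id("r#L2556264"))  # ('r', 'L2556264')
```

Note that offsets are kept as strings: some, such as 'L2556264', are not purely numeric, and numeric ones carry leading zeros.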
- synset_from_pos_and_offset(pos, offset)[source]¶
Get synset from POS and offset.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> LWN.synset_from_pos_and_offset('r', 'L2556264')
Synset(pos='r', offset='L2556264', gloss='in the manner of a woman')
- lemmas(lemma=None, pos=None, morpho=None)[source]¶
Return all Lemma objects with a name matching the specified lemma name, part of speech tag or morphological descriptor.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> sorted(list(LWN.lemmas('dico', 'v')), key=lambda x: x.uri())
[Lemma(lemma='dico', pos='v', morpho='v1spia--1-', uri='d1349'), Lemma(lemma='dico', pos='v', morpho='v1spia--3-', uri='d1350')]
- lemmas_from_uri(uri)[source]¶
Get lemmas from URI.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> list(sorted(LWN.lemmas_from_uri('f1052')))
[Lemma(lemma='frumentaria', pos='n', morpho='n-s---fn1-', uri='f1052'), Lemma(lemma='frumentarius', pos='n', morpho='n-s---mn2-', uri='f1052'), Lemma(lemma='frumentarius', pos='a', morpho='aps---mn1-', uri='f1052')]
- synsets(pos=None)[source]¶
Load all synsets for a given part of speech, if specified.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> len(list(LWN.synsets('r'))) > 3000
True
- semfields(code=None)[source]¶
Load all semfields for a given code, if specified.
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> list(LWN.semfields('300'))
[Semfield(code='300', english='Social Sciences'), Semfield(code='300', english='Social Sciences, Sociology & Anthropology'), Semfield(code='300', english='Social sciences')]
- lemmatize(form, morpho=None)[source]¶
Lemmatizes a word form.
:type form: str
:param form: The form to lemmatize, as a string
:type morpho: str
:param morpho: Optional 10-place morphological descriptor, used as a filter
:return: A list of matching Lemma objects
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> print(list(LWN.lemmatize('pumice')))
[Lemma(lemma='pumex', pos='n', morpho='n-s---cn3-', uri='p4512')]
- translate(language, form, pos='*')[source]¶
Translates an English, French, Spanish, or Italian word into Latin.
:type language: str
:param language: ‘en’, ‘fr’, ‘es’, ‘it’ indicating the source language
:type form: str
:param form: The word to translate
:type pos: str
:param pos: Optionally, a part-of-speech (‘n’, ‘v’, ‘a’, ‘r’) indicator used as a filter
:return: A list of Lemma objects
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> offspring_translations = list(LWN.translate('en', 'offspring'))
>>> print('pusio' in [lemma.lemma() for lemma in offspring_translations])
True
- class cltk.wordnet.wordnet.WordNetICCorpusReader(iso_code, root=None, fileids=None)[source]¶
Bases: CorpusReader
A corpus reader for the WordNet information content corpus.
:param root: The root directory where the information content file is stored.
:param fileids: A list of file names, relative to the root directory, in this case a single file containing information content for a corpus.
>>> from cltk.wordnet.wordnet import WordNetICCorpusReader
>>> LWNIC = WordNetICCorpusReader(iso_code='lat', fileids=['ic-lasla.dat'])
- create_ic(iso_code, corpus, weight_senses_equally=False, smoothing=1.0)[source]¶
Creates an information content lookup dictionary from a corpus.
:type corpus: CorpusReader
:param corpus: The corpus from which we create an information content dictionary.
:type weight_senses_equally: bool
:param weight_senses_equally: If this is True, gives all possible senses equal weight rather than dividing by the number of possible senses. (If a word has 3 senses, each sense gets 0.3333 per appearance when this is False, 1.0 when it is True.)
:type smoothing: float
:param smoothing: The amount by which synset counts are smoothed (default is 1.0)
:return: An information content dictionary
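The effect of weight_senses_equally can be sketched with a small counting routine. This is an illustration only, not the CLTK implementation: the function name, the `senses_of` mapping and the sense labels are hypothetical, and treating `smoothing` as the starting count of every known synset is an assumption.

```python
def count_senses(tokens, senses_of, weight_senses_equally=False, smoothing=1.0):
    """Accumulate a weighted count per sense for each token occurrence."""
    # Assumption: every known sense starts from the smoothing value.
    counts = {s: smoothing for senses in senses_of.values() for s in senses}
    for token in tokens:
        senses = senses_of.get(token, [])
        if not senses:
            continue  # word not covered by the (toy) sense inventory
        # Either a full count per sense, or one count shared among the senses.
        weight = 1.0 if weight_senses_equally else 1.0 / len(senses)
        for sense in senses:
            counts[sense] += weight
    return counts


# Hypothetical inventory: one word with three senses, seen twice in the corpus.
senses_of = {"uirtus": ["sense_a", "sense_b", "sense_c"]}
tokens = ["uirtus", "uirtus"]

print(count_senses(tokens, senses_of))                              # 1.0 + 2 * (1/3) each
print(count_senses(tokens, senses_of, weight_senses_equally=True))  # 1.0 + 2 * 1.0 each
```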
- load_ic(icfile=None)[source]¶
Load an information content file and return a dictionary whose keys are POS types and whose values are dictionaries that map from synsets to information content values.
:type icfile: str
:param icfile: The name of the wordnet_ic file (e.g. “ic-latin_library.dat”)
:return: An information content dictionary
>>> from cltk.wordnet.wordnet import WordNetICCorpusReader
>>> LWNIC = WordNetICCorpusReader(iso_code="lat")
>>> LWNIC.load_ic('ic-lasla.dat')
- information_content(synset)[source]¶
Retrieve the information content score for a synset.
>>> from cltk.wordnet.wordnet import WordNetCorpusReader, WordNetICCorpusReader
>>> LWN = WordNetCorpusReader(iso_code="lat")
>>> LWNIC = WordNetICCorpusReader(iso_code="lat", fileids=['ic-lasla.dat'])
>>> s = LWN.synset_from_pos_and_offset('n', '02542418')
>>> LWNIC.information_content(s)
9.256474058450094