WoNeF documentation
Papers
Quentin Pradet, Gaël de Chalendar and Jeanne Baguenier-Desormeaux. January 2014. WoNeF, an improved, expanded and evaluated automatic French translation of WordNet. GWC 2014, Tartu, Estonia. [bib] [pdf] [html]
Quentin Pradet, Jeanne Baguenier-Desormeaux, Gaël de Chalendar and Laurence Danlos. June 2013. WoNeF : amélioration, extension et évaluation d’une traduction française automatique de WordNet. TALN 2013, Les Sables d'Olonne, France. [bib] [pdf]
Versions
- The high precision version is small but can serve as a test set or training set for other systems.
- The high F-score version is the version you should choose by default.
- The high coverage version is more complete but also noisier: use it if you can reduce that noise first.
Evaluation
Our gold standard contains 300 synsets for each evaluated part of speech (nouns, verbs and adjectives). Two annotators produced the annotations, then resolved their disagreements to arrive at the final gold standard.
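To make the scores behind the version names concrete, here is a minimal sketch of how precision, recall and F-score can be computed against a gold standard. It is a generic illustration, not necessarily the exact protocol used in the papers. The function name `evaluate`, the (synset id, literal) pair representation and the sample data are all assumptions made for this example.

```python
def evaluate(candidates, gold):
    """Return (precision, recall, f_score) for two collections of
    (synset_id, literal) pairs. Hypothetical helper, not part of WoNeF."""
    candidates, gold = set(candidates), set(gold)
    correct = len(candidates & gold)
    precision = correct / len(candidates) if candidates else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F-score: harmonic mean of precision and recall.
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Made-up data, for illustration only.
gold = {("s1", "chien"), ("s1", "toutou"), ("s2", "chat")}
candidates = {("s1", "chien"), ("s2", "chat"), ("s2", "matou"), ("s3", "oiseau")}

precision, recall, f_score = evaluate(candidates, gold)
print(precision, recall, f_score)
```

In this toy case two of the four candidate pairs are correct and two of the three gold pairs are found, so precision is lower than recall.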
Our gold standard is included in WoNeF: it consists of all literals marked "wonef-gold" in the "lnote" XML field.
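As a rough sketch of how the gold literals could be pulled out of the XML, the snippet below filters on "lnote". Only the "lnote" field and the "wonef-gold" value come from this page. The element names (`WN`, `SYNSET`, `ID`, `SYNONYM`, `LITERAL`), the choice of "lnote" as an attribute, the sample identifiers and the `gold_literals` helper are assumptions, so check them against the actual files before relying on this.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: element names and ids are assumed, not taken from WoNeF.
SAMPLE = """
<WN>
  <SYNSET>
    <ID>example-0001-n</ID>
    <SYNONYM>
      <LITERAL lnote="wonef-gold">chien</LITERAL>
      <LITERAL lnote="other-source">toutou</LITERAL>
    </SYNONYM>
  </SYNSET>
  <SYNSET>
    <ID>example-0002-v</ID>
    <SYNONYM>
      <LITERAL lnote="wonef-gold">courir</LITERAL>
    </SYNONYM>
  </SYNSET>
</WN>
"""


def gold_literals(root):
    """Collect (synset id, literal) pairs whose lnote mentions "wonef-gold"."""
    pairs = []
    for synset in root.iter("SYNSET"):
        synset_id = synset.findtext("ID")
        for literal in synset.iter("LITERAL"):
            # Substring test, in case lnote lists several sources (assumption).
            if "wonef-gold" in (literal.get("lnote") or ""):
                pairs.append((synset_id, literal.text))
    return pairs


root = ET.fromstring(SAMPLE.strip())
gold_pairs = gold_literals(root)
print(gold_pairs)
```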
Rules
The improved edit distance used a few rules to shorten the distance between words that share the same origin in French and English:
- que$ -> k (banque -> bank, casque -> cask, disque -> disk)
- aire$ -> ary (tertiaire -> tertiary)
- eur$ -> or (chercheur -> cherchor)
- ie$ -> y (cajolerie -> cajolery)
- té$ -> ty (extremité -> extremity)
- re$ -> er (ordre -> order, tigre -> tiger)
- ais$ -> ese, ois$ -> ese (libanais -> lebanese, chinois -> chinese)
- ant$ -> ing (changeant -> changeing)
- er$ -> "" (documenter -> document)
- ose$ -> osis (osmose -> osmosis)
- ment$ -> ly (confortablement -> confortably)
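The rules above can be sketched as a suffix rewrite applied to the French word before computing a standard edit distance. This is a minimal illustration, not the WoNeF implementation. Several choices here are assumptions: the function names, applying only the first matching rule, using a plain Levenshtein distance, and keeping the smaller of the raw and rewritten distances. The osmose rule is written in the French-to-English direction (ose -> osis), like the other rules.

```python
import re

# (French suffix pattern, English-like replacement). Order matters because
# only the first matching rule is applied (assumption): "aire$" must be
# tried before "re$".
SUFFIX_RULES = [
    (r"que$", "k"),
    (r"aire$", "ary"),
    (r"eur$", "or"),
    (r"ie$", "y"),
    (r"té$", "ty"),
    (r"re$", "er"),
    (r"ais$", "ese"),
    (r"ois$", "ese"),
    (r"ant$", "ing"),
    (r"er$", ""),
    (r"ose$", "osis"),  # assumed direction: French "ose" -> English "osis"
    (r"ment$", "ly"),
]


def normalize(french_word):
    """Rewrite the suffix of a French word using the first matching rule."""
    for pattern, replacement in SUFFIX_RULES:
        rewritten, count = re.subn(pattern, replacement, french_word)
        if count:
            return rewritten
    return french_word


def levenshtein(a, b):
    """Plain Levenshtein distance, computed row by row."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def improved_distance(french_word, english_word):
    """Smaller of the raw distance and the distance after suffix rewriting."""
    return min(levenshtein(french_word, english_word),
               levenshtein(normalize(french_word), english_word))


for french, english in [("banque", "bank"), ("ordre", "order"), ("tertiaire", "tertiary")]:
    print(french, english, levenshtein(french, english), improved_distance(french, english))
```

Taking the minimum means a rule can only shorten the distance, never lengthen it, for words where the rewrite happens to move away from the English form.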
Glossary
- Synset
- A synset (synonym set) is a sense, given by a definition together with all the literals that can carry that sense. A polysemous word appears in several synsets. Synsets are linked to one another by relations such as hyponymy (parrot is a hyponym of bird).
- Literal
- The lemma of a noun, verb, adjective or adverb in WoNeF.
- Part meronymy
- A relation between two synsets where one denotes a part of the other: a porcupine's quill, a bike's pedal.
- wordnet
- A "wordnet" is a lexical database that follows the principles of Princeton WordNet: synsets linked together.