Abstract
We investigate language-agnostic algorithms for the construction of unsupervised distributional semantic models using web-harvested corpora. Specifically, a corpus is created from web document snippets, and the relevant semantic similarity statistics are encoded in a semantic network. We propose the notion of semantic neighborhoods that are defined using co-occurrence or context similarity features. Three neighborhood-based similarity metrics are proposed, motivated by the hypotheses of attributional and maximum sense similarity. The proposed metrics are evaluated against human similarity ratings, achieving state-of-the-art results.
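As a rough illustration of the pipeline the abstract outlines, the sketch below builds context vectors from a handful of text snippets, derives a semantic neighborhood for each word from context similarity, and scores a word pair by the strongest link between one word and the other word's neighbors. This is a toy under stated assumptions, not the paper's actual metrics: the function names, the window size, the use of cosine similarity, and the max-based score are all hypothetical choices made here for illustration.

```python
from collections import Counter, defaultdict
from math import sqrt


def context_vectors(snippets, window=1):
    """Count the words seen within `window` positions of each word.

    Assumption: whitespace tokenization and a symmetric window; the
    paper's own feature extraction is not specified in the abstract.
    """
    vectors = defaultdict(Counter)
    for snippet in snippets:
        tokens = snippet.lower().split()
        for i, word in enumerate(tokens):
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return dict(vectors)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm = sqrt(sum(x * x for x in u.values())) * sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def neighborhood(word, vectors, size=3):
    """Return the `size` words whose context vectors are closest to `word`'s."""
    target = vectors.get(word, Counter())
    scored = [(cosine(target, vec), other) for other, vec in vectors.items() if other != word]
    scored.sort(key=lambda pair: (-pair[0], pair[1]))  # best first, ties by name
    return [other for _, other in scored[:size]]


def max_neighborhood_similarity(w1, w2, vectors, size=3):
    """Hypothetical neighborhood-based score, symmetric in w1 and w2.

    Takes the highest similarity between one word and the neighbors of the
    other, skipping the word itself so the score is not trivially 1.
    """
    def one_way(a, b):
        va = vectors.get(a, Counter())
        sims = [cosine(va, vectors[n]) for n in neighborhood(b, vectors, size) if n != a]
        return max(sims, default=0.0)

    return max(one_way(w1, w2), one_way(w2, w1))
```

With a few snippets such as "the cat chased the mouse" and "the dog chased the ball", `context_vectors` yields the count vectors, and `max_neighborhood_similarity("cat", "dog", vectors)` returns a score between 0 and 1. Taking the maximum over neighbors is one loose reading of the maximum sense similarity hypothesis named in the abstract; other aggregations are equally plausible.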
| Original language | English |
|---|---|
| Pages (from-to) | 49-79 |
| Number of pages | 31 |
| Journal | Natural Language Engineering |
| Volume | 21 |
| Issue number | 1 |
| DOIs | |
| Publication status | Published - 23 Jan 2015 |
| Externally published | Yes |
| Title | Similarity computation using semantic networks created from web-harvested data |