DeepPurple: Estimating sentence semantic similarity using N-gram regression models and Web Snippets

Nikos Malandrakis, Elias Iosif, Alexandros Potamianos

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

We estimate the semantic similarity between two sentences using regression models with three types of features: 1) n-gram hit rates (lexical matches) between sentences, 2) lexical semantic similarity between non-matching words, and 3) sentence length. Lexical semantic similarity is computed from co-occurrence counts on a corpus harvested from the web, using a modified mutual information metric. State-of-the-art results are obtained for semantic similarity computation at the word level; however, the fusion of this information at the sentence level provides only moderate improvement on Task 6 of SemEval'12. Despite the simple features used, regression models provide good performance, especially for shorter sentences, reaching a correlation of 0.62 on the SemEval test set.
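
As a rough illustration of the first and third feature types named in the abstract, the sketch below computes n-gram hit rates and a length feature for a sentence pair. It is not the authors' implementation: the function names (`ngrams`, `hit_rate`, `features`), the whitespace tokenization, the clipped-count definition of a "hit", and the symmetric averaging are all assumptions made for this example. The lexical semantic similarity feature for non-matching words is omitted because it depends on web-harvested co-occurrence counts.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token sequence."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def hit_rate(a, b, n):
    """Fraction of the n-grams in `a` that also occur in `b`.

    Counts are clipped, so a repeated n-gram in `a` is matched at most
    as many times as it appears in `b`. (Assumed definition.)
    """
    grams_a = Counter(ngrams(a, n))
    grams_b = Counter(ngrams(b, n))
    total = sum(grams_a.values())
    if total == 0:
        return 0.0
    hits = sum(min(count, grams_b[gram]) for gram, count in grams_a.items())
    return hits / total


def features(s1, s2, max_n=2):
    """Feature vector for a sentence pair: one symmetric hit rate per
    n-gram order up to `max_n`, followed by the mean sentence length."""
    a, b = s1.lower().split(), s2.lower().split()
    feats = [0.5 * (hit_rate(a, b, n) + hit_rate(b, a, n))
             for n in range(1, max_n + 1)]
    feats.append((len(a) + len(b)) / 2.0)
    return feats
```

Vectors like these, computed for each training pair, could then be passed to any standard regression routine together with the human similarity ratings as targets.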

Original language: English
Title of host publication: Proceedings of the 6th International Workshop on Semantic Evaluation, SemEval 2012
Publisher: Association for Computational Linguistics (ACL)
Pages: 565-570
Number of pages: 6
ISBN (Electronic): 9781937284220
Publication status: Published - 2012
Externally published: Yes
Event: 1st Joint Conference on Lexical and Computational Semantics, *SEM 2012 - Montreal, Canada
Duration: 7 Jun 2012 → 8 Jun 2012

Publication series

Name: *SEM 2012 - 1st Joint Conference on Lexical and Computational Semantics
Volume: 2

Conference

Conference: 1st Joint Conference on Lexical and Computational Semantics, *SEM 2012
Country/Territory: Canada
City: Montreal
Period: 7/06/12 → 8/06/12
