Combining statistical similarity measures for automatic induction of semantic classes

Apostolos Pangos, Elias Iosif, Alexandros Potamianos, Eric Fosler-Lussier

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, an unsupervised semantic class induction algorithm is proposed, based on the principle that similarity of context implies similarity of meaning. Two semantic similarity metrics that are variations of the Vector Product distance are used to measure the semantic distance between words and to automatically generate semantic classes. The first metric computes "wide-context" similarity between words using a "bag-of-words" model, while the second metric computes "narrow-context" similarity using a bigram language model. A hybrid metric, defined as the linear combination of the wide- and narrow-context metrics, is also proposed and evaluated. To cluster words into semantic classes, an iterative clustering algorithm is used. The semantic metrics are evaluated on two corpora: a semantically heterogeneous web news domain (HR-Net) and an application-specific travel reservation corpus (ATIS). For the hybrid metric, semantic class member precision of 85% is achieved at 17% recall for the HR-Net task, and precision of 85% is achieved at 55% recall for the ATIS task.
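
The hybrid metric described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes plain cosine similarity as a stand-in for the paper's Vector Product variants, a fixed symmetric window for the "wide" bag-of-words context, and position-tagged immediate neighbours as a rough proxy for the bigram-based "narrow" context. The function names, the weight `lam`, the window size, and the toy corpus are all hypothetical.

```python
from collections import Counter
from math import sqrt


def wide_context(sentences, word, window=5):
    """Bag-of-words counts of tokens within `window` positions of `word`."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
            counts.update(tokens[j] for j in range(lo, hi) if j != i)
    return counts


def narrow_context(sentences, word):
    """Counts of the immediate left and right neighbours, tagged by side."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            if i > 0:
                counts[("L", tokens[i - 1])] += 1
            if i + 1 < len(tokens):
                counts[("R", tokens[i + 1])] += 1
    return counts


def cosine(a, b):
    """Cosine similarity of two sparse count vectors (assumed stand-in metric)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def hybrid_similarity(sentences, w1, w2, lam=0.5, window=5):
    """Linear combination of wide- and narrow-context similarity.

    `lam` is a hypothetical mixing weight: 1.0 uses only the wide context,
    0.0 only the narrow context.
    """
    wide = cosine(wide_context(sentences, w1, window),
                  wide_context(sentences, w2, window))
    narrow = cosine(narrow_context(sentences, w1),
                    narrow_context(sentences, w2))
    return lam * wide + (1.0 - lam) * narrow


# Toy travel-style corpus, invented for illustration only.
toy_corpus = [
    "i want a flight to boston on monday".split(),
    "i want a flight to denver on friday".split(),
    "show me fares to boston".split(),
]

print(hybrid_similarity(toy_corpus, "boston", "denver"))
print(hybrid_similarity(toy_corpus, "boston", "flight"))
```

On this toy corpus the two city names score higher than a city paired with "flight", mainly because the narrow-context term rewards words that share the same left and right neighbours. A clustering step would then merge the highest-scoring pairs into a class and recompute similarities, which is one plausible reading of the iterative procedure the abstract mentions.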

Original language: English
Title of host publication: Proceedings of ASRU 2005
Subtitle of host publication: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Publisher: IEEE Computer Society
Pages: 278-283
Number of pages: 6
ISBN (Print): 0780394798, 9780780394797
Publication status: Published - 2005
Externally published: Yes
Event: ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop - Cancun, Mexico
Duration: 27 Nov 2005 to 1 Dec 2005

Publication series

Name: Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Volume: 2005

Conference

Conference: ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Country/Territory: Mexico
City: Cancun
Period: 27/11/05 to 1/12/05
