TY - GEN
T1 - Combining statistical similarity measures for automatic induction of semantic classes
AU - Pangos, Apostolos
AU - Iosif, Elias
AU - Potamianos, Alexandros
AU - Fosler-Lussier, Eric
PY - 2005
Y1 - 2005
N2 - In this paper, an unsupervised semantic class induction algorithm is proposed that is based on the principle that similarity of context implies similarity of meaning. Two semantic similarity metrics that are variations of the Vector Product distance are used in order to measure the semantic distance between words and to automatically generate semantic classes. The first metric computes "wide-context" similarity between words using a "bag-of-words" model, while the second metric computes "narrow-context" similarity using a bigram language model. A hybrid metric that is defined as the linear combination of the wide- and narrow-context metrics is also proposed and evaluated. To cluster words into semantic classes, an iterative clustering algorithm is used. The semantic metrics are evaluated on two corpora: a semantically heterogeneous web news domain (HR-Net) and an application-specific travel reservation corpus (ATIS). For the hybrid metric, semantic class member precision of 85% is achieved at 17% recall for the HR-Net task and precision of 85% is achieved at 55% recall for the ATIS task.
UR - http://www.scopus.com/inward/record.url?scp=33846230604&partnerID=8YFLogxK
U2 - 10.1109/ASRU.2005.1566510
DO - 10.1109/ASRU.2005.1566510
M3 - Conference contribution
AN - SCOPUS:33846230604
SN - 0780394798
SN - 9780780394797
T3 - Proceedings of ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
SP - 278
EP - 283
BT - Proceedings of ASRU 2005
PB - IEEE Computer Society
T2 - ASRU 2005: 2005 IEEE Automatic Speech Recognition and Understanding Workshop
Y2 - 27 November 2005 through 1 December 2005
ER -