A genetic algorithm approach for topic clustering: A centroid-based encoding scheme

Dionisios N. Sotiropoulos, Demitrios E. Pournarakis, George M. Giaglis

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

Abstract

This paper addresses the problem of topic clustering with a novel genetic algorithm approach that introduces a centroid-based encoding scheme and is highly scalable on large volumes of textual data. The proposed topic clustering method is anchored on the Latent Dirichlet Allocation (LDA) probabilistic topic modeling framework and aims to identify cluster formations that are optimal in terms of semantic coherence. Our work focuses on reformulating the clustering problem as a discrete optimization problem within the n-dimensional standard simplex, since all the LDA-based data patterns correspond to n-valued probability distribution vectors. The novelty of our proposed genetic algorithm approach lies primarily in the adaptation of the centroid-based encoding scheme, in the sense that cluster assignments are implicitly extracted by assigning each data point to the nearest cluster center. Experimentation was conducted on a large corpus of Twitter posts relating to the UBER transportation network. The obtained topic clustering results indicate significant improvement in extracting semantically focused groups of documents when compared against traditional clustering algorithms such as k-means. The clustering superiority of our proposed genetic algorithm is further supported by measurements of the intra- and inter-cluster semantic distances of the obtained cluster formations.
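The centroid-based encoding described in the abstract can be illustrated with a minimal sketch. It assumes a chromosome is a flat vector of k candidate centroids, that documents are LDA topic-distribution vectors, and that Euclidean distance stands in for the semantic distance; the abstract specifies none of these details. The function names, the clip-and-renormalize repair step, and the synthetic Dirichlet data are hypothetical and are not taken from the paper.

```python
import numpy as np

def decode_assignments(chromosome, docs, k):
    """Decode a flat chromosome of k centroids into per-document cluster labels."""
    n_topics = docs.shape[1]
    centroids = chromosome.reshape(k, n_topics)
    # Assumption: repair centroids onto the simplex by clipping and renormalizing.
    centroids = np.clip(centroids, 1e-12, None)
    centroids = centroids / centroids.sum(axis=1, keepdims=True)
    # Distance from every document to every centroid, shape (n_docs, k).
    dists = np.linalg.norm(docs[:, None, :] - centroids[None, :, :], axis=2)
    # Cluster membership is implicit: each document joins its nearest centroid.
    return dists.argmin(axis=1), centroids

def intra_inter_distances(docs, labels, centroids):
    """Mean document-to-own-centroid distance and mean centroid-to-centroid distance."""
    intra = np.linalg.norm(docs - centroids[labels], axis=1).mean()
    k = centroids.shape[0]
    pairwise = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    inter = pairwise[np.triu_indices(k, 1)].mean()
    return intra, inter

# Synthetic stand-in for LDA output: 200 documents over 5 topics.
rng = np.random.default_rng(0)
docs = rng.dirichlet(np.ones(5), size=200)
# One candidate solution: 3 centroids flattened into a single vector.
chromosome = rng.dirichlet(np.ones(5), size=3).ravel()

labels, centroids = decode_assignments(chromosome, docs, k=3)
intra, inter = intra_inter_distances(docs, labels, centroids)
```

Under these assumptions the chromosome length depends only on the number of clusters and topics, not on the number of documents, which is one plausible reading of the scalability claim. A genetic algorithm could then score each chromosome with a fitness built from `intra` and `inter`, although the paper's actual fitness function is not given in the abstract.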

Original language: English
Title of host publication: IISA 2016 - 7th International Conference on Information, Intelligence, Systems and Applications
Publisher: Institute of Electrical and Electronics Engineers Inc.
ISBN (Electronic): 9781509034291
DOIs: https://doi.org/10.1109/IISA.2016.7785378
Publication status: Published - 14 Dec 2016
Event: 7th International Conference on Information, Intelligence, Systems and Applications, IISA 2016 - Chalkidiki, Greece
Duration: 13 Jul 2016 to 15 Jul 2016

  • Cite this

    Sotiropoulos, D. N., Pournarakis, D. E., & Giaglis, G. M. (2016). A genetic algorithm approach for topic clustering: A centroid-based encoding scheme. In IISA 2016 - 7th International Conference on Information, Intelligence, Systems and Applications [7785378]. Institute of Electrical and Electronics Engineers Inc. https://doi.org/10.1109/IISA.2016.7785378