A system for automatic classification of twitter messages into categories

Alexandros Theodotou, Athena Stassopoulou

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Twitter is a widely used online social networking site where users post short messages limited to 140 characters. The short length of these messages makes classifying them into categories challenging. In this paper we propose a system that automatically classifies Twitter messages into a set of predefined categories. The system takes into account not only the tweet text, but also external features such as words from linked URLs, mentioned user profiles, and Wikipedia articles. The system is evaluated using various combinations of feature sets. According to our results, the highest accuracy, 90.8%, is achieved when the original tweet terms are combined with user profile terms and terms extracted from linked URLs. Including terms from Wikipedia pages, found specifically for each tweet, is shown to decrease accuracy on the original test set; however, accuracy increased on a subset of the original test set containing only tweets without URLs.

Original language: English
Title of host publication: Modeling and Using Context - 9th International and Interdisciplinary Conference, CONTEXT 2015, Proceedings
Publisher: Springer Verlag
Pages: 532-537
Number of pages: 6
Volume: 9405
ISBN (Print): 9783319255903
Publication status: Published - 2015
Event: 9th International and Interdisciplinary Conference on Modeling and Using Context, CONTEXT 2015 - Larnaca, Cyprus
Duration: 2 Nov 2015 - 6 Nov 2015

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 9405
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 9th International and Interdisciplinary Conference on Modeling and Using Context, CONTEXT 2015
Country/Territory: Cyprus
City: Larnaca
Period: 2/11/15 - 6/11/15
