TY - GEN
T1 - A system for automatic classification of Twitter messages into categories
AU - Theodotou, Alexandros
AU - Stassopoulou, Athena
PY - 2015
Y1 - 2015
N2 - Twitter is a widely used online social networking site where users post short messages limited to 140 characters. The short length of these messages makes classifying them into categories a challenge. In this paper we propose a system that automatically classifies Twitter messages into a set of predefined categories. The system takes into account not only the tweet text, but also external features such as words from linked URLs, mentioned user profiles, and Wikipedia articles. The system is evaluated using various combinations of feature sets. According to our results, the highest accuracy, 90.8%, is achieved when the original tweet terms are combined with user profile terms and terms extracted from linked URLs. Including terms from Wikipedia pages, found specifically for each tweet, is shown to decrease accuracy on the original test set; however, accuracy is shown to increase on a subset of the original test set containing only tweets without URLs.
AB - Twitter is a widely used online social networking site where users post short messages limited to 140 characters. The short length of these messages makes classifying them into categories a challenge. In this paper we propose a system that automatically classifies Twitter messages into a set of predefined categories. The system takes into account not only the tweet text, but also external features such as words from linked URLs, mentioned user profiles, and Wikipedia articles. The system is evaluated using various combinations of feature sets. According to our results, the highest accuracy, 90.8%, is achieved when the original tweet terms are combined with user profile terms and terms extracted from linked URLs. Including terms from Wikipedia pages, found specifically for each tweet, is shown to decrease accuracy on the original test set; however, accuracy is shown to increase on a subset of the original test set containing only tweets without URLs.
UR - http://www.scopus.com/inward/record.url?scp=84952342725&partnerID=8YFLogxK
U2 - 10.1007/978-3-319-25591-0_44
DO - 10.1007/978-3-319-25591-0_44
M3 - Conference contribution
AN - SCOPUS:84952342725
SN - 9783319255903
VL - 9405
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 532
EP - 537
BT - Modeling and Using Context - 9th International and Interdisciplinary Conference, CONTEXT 2015, Proceedings
PB - Springer Verlag
T2 - 9th International and Interdisciplinary Conference on Modeling and Using Context, CONTEXT 2015
Y2 - 2 November 2015 through 6 November 2015
ER -