TY - GEN
T1 - On the utility of incremental feature selection for the classification of textual data streams
AU - Katakis, Ioannis
AU - Tsoumakas, Grigorios
AU - Vlahavas, Ioannis
PY - 2005/12/1
Y1 - 2005/12/1
N2 - In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications the distribution of data and the description of the classification concept change over time. We propose the coupling of an incremental feature ranking method and an incremental learning algorithm that can consider different subsets of the feature vector during prediction (what we call a feature-based classifier), in order to deal with the above problem. Experimental results with a longitudinal database of real spam and legitimate emails show that our approach can adapt to the changing nature of streaming data and works much better than classical incremental learning algorithms.
AB - In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications the distribution of data and the description of the classification concept change over time. We propose the coupling of an incremental feature ranking method and an incremental learning algorithm that can consider different subsets of the feature vector during prediction (what we call a feature-based classifier), in order to deal with the above problem. Experimental results with a longitudinal database of real spam and legitimate emails show that our approach can adapt to the changing nature of streaming data and works much better than classical incremental learning algorithms.
UR - http://www.scopus.com/inward/record.url?scp=33646504407&partnerID=8YFLogxK
U2 - 10.1007/11573036_32
DO - 10.1007/11573036_32
M3 - Conference contribution
AN - SCOPUS:33646504407
SN - 3540296735
SN - 9783540296737
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 338
EP - 348
BT - Advances in Informatics - 10th Panhellenic Conference on Informatics, PCI 2005, Proceedings
T2 - 10th Panhellenic Conference on Informatics, PCI 2005
Y2 - 11 November 2005 through 13 November 2005
ER -