On the utility of incremental feature selection for the classification of textual data streams

Ioannis Katakis, Grigorios Tsoumakas, Ioannis Vlahavas

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

29 Citations (Scopus)

Abstract

In this paper we argue that incrementally updating the features that a text classification algorithm considers is very important for real-world textual data streams, because in most applications the distribution of data and the description of the classification concept change over time. To deal with this problem, we propose coupling an incremental feature ranking method with an incremental learning algorithm that can consider different subsets of the feature vector during prediction (what we call a feature-based classifier). Experimental results with a longitudinal database of real spam and legitimate emails show that our approach can adapt to the changing nature of streaming data and works much better than classical incremental learning algorithms.
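
The coupling described in the abstract can be illustrated with a minimal sketch. The abstract does not name the ranking method or the learner, so the choices here are assumptions made purely for illustration: chi-squared scores over word presence for the ranking, and a Bernoulli-style Naive Bayes with add-one smoothing for the learner. The class name `FeatureBasedNaiveBayes` and the parameter `top_k` are hypothetical. The point of the sketch is only that counts are kept for every word seen, while each prediction uses whichever words currently rank highest.

```python
import math
from collections import Counter, defaultdict


class FeatureBasedNaiveBayes:
    """Toy incremental classifier that predicts with a changing feature subset.

    Assumptions (not taken from the paper): chi-squared ranking and a
    Bernoulli-style Naive Bayes with add-one smoothing.
    """

    def __init__(self, classes, top_k):
        self.classes = list(classes)
        self.top_k = top_k
        self.docs_per_class = Counter()             # class -> number of documents
        self.docs_with_word = defaultdict(Counter)  # word -> {class: documents containing it}

    def update(self, words, label):
        """Fold one labelled document into the running counts."""
        self.docs_per_class[label] += 1
        for w in set(words):
            self.docs_with_word[w][label] += 1

    def chi2(self, word):
        """Chi-squared score of word presence versus class, from current counts."""
        n = sum(self.docs_per_class.values())
        with_word = sum(self.docs_with_word[word].values())
        score = 0.0
        for c in self.classes:
            n_c = self.docs_per_class[c]
            in_class = self.docs_with_word[word][c]
            # Two cells per class: documents with the word, documents without it.
            for row_total, observed in ((with_word, in_class),
                                        (n - with_word, n_c - in_class)):
                expected = row_total * n_c / n if n else 0.0
                if expected > 0:
                    score += (observed - expected) ** 2 / expected
        return score

    def selected_features(self):
        """Re-rank every word seen so far and keep the top_k."""
        ranked = sorted(self.docs_with_word, key=self.chi2, reverse=True)
        return set(ranked[: self.top_k])

    def predict(self, words):
        """Classify using only the currently selected features."""
        features = self.selected_features()
        present = set(words)
        n = sum(self.docs_per_class.values())
        best, best_score = None, -math.inf
        for c in self.classes:
            n_c = self.docs_per_class[c]
            score = math.log((n_c + 1) / (n + len(self.classes)))
            for w in features:
                p = (self.docs_with_word[w][c] + 1) / (n_c + 2)
                score += math.log(p if w in present else 1 - p)
            if score > best_score:
                best, best_score = c, score
        return best
```

Because the ranking is recomputed from the running counts at prediction time, a word that only starts appearing later in the stream can enter the selected subset without retraining from scratch. Re-ranking the full vocabulary on every prediction is wasteful and is done here only to keep the sketch short.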

Original language: English
Title of host publication: Advances in Informatics - 10th Panhellenic Conference on Informatics, PCI 2005, Proceedings
Pages: 338-348
Number of pages: 11
DOIs: https://doi.org/10.1007/11573036_32
Publication status: Published - 1 Dec 2005
Event: 10th Panhellenic Conference on Informatics, PCI 2005 - Volos, Greece
Duration: 11 Nov 2005 to 13 Nov 2005

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 3746 LNCS
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Conference

Conference: 10th Panhellenic Conference on Informatics, PCI 2005
Country: Greece
City: Volos
Period: 11/11/05 to 13/11/05


Cite this

    Katakis, I., Tsoumakas, G., & Vlahavas, I. (2005). On the utility of incremental feature selection for the classification of textual data streams. In Advances in Informatics - 10th Panhellenic Conference on Informatics, PCI 2005, Proceedings (pp. 338-348). (Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics); Vol. 3746 LNCS). https://doi.org/10.1007/11573036_32