Tracking recurring contexts using ensemble classifiers: An application to email filtering

Ioannis Katakis, Grigorios Tsoumakas, Ioannis Vlahavas

Research output: Contribution to journal › Article › peer-review

141 Citations (Scopus)

Abstract

Concept drift is a challenging problem for the machine learning and data mining community and appears frequently in real-world stream classification problems. It is usually defined as an unforeseeable change in the concept of the target variable in a prediction task. In this paper, we focus on recurring contexts, a special subtype of concept drift that has not yet received adequate attention from the research community. With recurring contexts, concepts may reappear in the future, so older classification models may be useful for future classifications. We propose a general framework for classifying data streams that exploits stream clustering to dynamically build and update an ensemble of incremental classifiers. To this end, we propose a transformation function that maps batches of examples into a new conceptual representation model. The clustering algorithm is then applied to group batches of examples into concepts and to identify recurring contexts. The ensemble is produced by creating and maintaining an incremental classifier for every concept discovered in the data stream. An experimental study is performed using (a) two new real-world concept-drifting datasets from the email domain, (b) an instantiation of the proposed framework, and (c) five methods for dealing with drifting concepts. Results indicate the effectiveness of the proposed representation and the suitability of the concept-specific classifiers for problems with recurring contexts.
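
The abstract does not fix the concrete components of the framework, so the following is only a minimal sketch of the general idea under several assumptions that are not taken from the paper: binary features, two classes (0 and 1), a conceptual vector made of per-class feature frequencies, a simple distance-threshold clustering step, and a small counting naive Bayes model as the incremental classifier. All names (`conceptual_vector`, `CountingNB`, `ConceptEnsemble`, `threshold`) are hypothetical.

```python
import math
from collections import defaultdict


def conceptual_vector(batch, n_features, classes=(0, 1)):
    """Map a labeled batch to per-class frequencies of x_i = 1 (assumed transformation)."""
    vec = []
    for c in classes:
        rows = [x for x, y in batch if y == c]
        for i in range(n_features):
            # Fall back to 0.5 when the class does not occur in the batch.
            vec.append(sum(r[i] for r in rows) / len(rows) if rows else 0.5)
    return vec


class CountingNB:
    """Tiny incremental Bernoulli naive Bayes with Laplace smoothing."""

    def __init__(self, n_features, classes=(0, 1)):
        self.classes = classes
        self.class_count = defaultdict(int)
        self.ones = {c: [0] * n_features for c in classes}

    def update(self, batch):
        # Incremental update: only counts are stored, never the examples.
        for x, y in batch:
            self.class_count[y] += 1
            for i, v in enumerate(x):
                self.ones[y][i] += v

    def predict(self, x):
        total = sum(self.class_count.values())

        def score(c):
            s = math.log((self.class_count[c] + 1) / (total + len(self.classes)))
            for i, v in enumerate(x):
                p = (self.ones[c][i] + 1) / (self.class_count[c] + 2)
                s += math.log(p if v else 1 - p)
            return s

        return max(self.classes, key=score)


class ConceptEnsemble:
    """One incremental classifier per cluster of conceptual vectors."""

    def __init__(self, n_features, threshold):
        self.n, self.threshold = n_features, threshold
        self.centroids, self.sizes, self.models = [], [], []
        self.active = None  # index of the concept of the most recent batch

    def predict(self, x):
        # Before any batch has been seen, return an arbitrary default label.
        if self.active is None:
            return 0
        return self.models[self.active].predict(x)

    def learn_batch(self, batch):
        v = conceptual_vector(batch, self.n)
        dists = [math.dist(v, c) for c in self.centroids]
        k = min(range(len(dists)), key=dists.__getitem__) if dists else None
        if k is None or dists[k] > self.threshold:
            # No known concept is close enough: open a new one.
            self.centroids.append(list(v))
            self.sizes.append(0)
            self.models.append(CountingNB(self.n))
            k = len(self.models) - 1
        # Move the centroid toward the new vector (running mean).
        self.sizes[k] += 1
        self.centroids[k] = [c + (a - c) / self.sizes[k]
                             for c, a in zip(self.centroids[k], v)]
        self.models[k].update(batch)
        self.active = k
        return k


# Two toy concepts with opposite labelings of the first feature.
A = [((1, 0), 1), ((0, 1), 0), ((1, 1), 1), ((0, 0), 0)]
B = [((1, 0), 0), ((0, 1), 1), ((1, 1), 0), ((0, 0), 1)]
ens = ConceptEnsemble(n_features=2, threshold=0.5)
concept_ids = [ens.learn_batch(b) for b in (A, B, A)]
```

In this toy run the third batch is mapped back to the cluster created for the first one, so the classifier trained on that concept is reused rather than rebuilt. The distance threshold is the main tuning knob in this sketch: too small and every batch opens a new concept, too large and distinct concepts are merged into one classifier.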

Original language: English
Pages (from-to): 371-391
Number of pages: 21
Journal: Knowledge and Information Systems
Volume: 22
Issue number: 3
Publication status: Published - 1 Mar 2010
