An investigation of web crawler behavior: Characterization and metrics

Marios D. Dikaiakos, Athena Stassopoulou, Loizos Papageorgiou

Research output: Contribution to journal, Article, peer-review

Abstract

In this paper, we present a characterization study of search-engine crawlers. For the purposes of our work, we use Web-server access logs from five academic sites in three different countries. Based on these logs, we analyze the activity of different crawlers that belong to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristics of general World-Wide Web traffic and to the findings of general characterization studies. We analyze crawler requests to derive insights into the behavior and strategy of crawlers. We propose a set of simple metrics that describe qualitative characteristics of crawler behavior, vis-à-vis a crawler's preference for resources of a particular format, its frequency of visits to a Web site, and the pervasiveness of its visits to a particular site. To the best of our knowledge, this is the first extensive and in-depth characterization of search-engine crawlers. Our results and observations provide useful insights into crawler behavior and serve as the basis of our ongoing work on the automatic detection of Web crawlers.
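The abstract names three kinds of metric (format preference, visit frequency, pervasiveness) without giving their definitions. The following is a minimal Python sketch, under our own assumptions, of how quantities of this kind might be computed from a Web-server access log. It is not the authors' definition of the metrics. Assumptions: the log uses the Apache/NCSA Combined Log Format; crawlers are recognized by the hypothetical user-agent tokens in `CRAWLER_TOKENS`; a 30-minute idle gap (`SESSION_GAP`, hypothetical) separates two visits; and the site's resources are approximated by the distinct paths seen in the log. The helper names `parse`, `resource_type` and `crawler_metrics` are illustrative.

```python
import math
import re
from collections import Counter, defaultdict
from datetime import datetime, timedelta

# Apache/NCSA Combined Log Format (assumed log layout):
# host ident user [time] "METHOD path PROTO" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)

# Hypothetical user-agent tokens; real crawler detection needs more than this.
CRAWLER_TOKENS = {"Googlebot": "Google", "Slurp": "Inktomi", "Scooter": "AltaVista"}

# Assumed idle gap that separates two "visits" by the same crawler.
SESSION_GAP = timedelta(minutes=30)


def parse(line):
    """Return (timestamp, path, user_agent), or None for unparseable lines."""
    m = LOG_RE.match(line)
    if m is None:
        return None
    ts = datetime.strptime(m["time"], "%d/%b/%Y:%H:%M:%S %z")
    return ts, m["path"], m["agent"]


def resource_type(path):
    """Coarse resource format from the file extension (assumed categories)."""
    last = path.split("?", 1)[0].rsplit("/", 1)[-1]
    ext = last.rsplit(".", 1)[-1].lower() if "." in last else ""
    if ext in ("", "html", "htm"):
        return "html"
    if ext in ("gif", "jpg", "jpeg", "png"):
        return "image"
    if ext in ("pdf", "ps"):
        return "document"
    return "other"


def crawler_metrics(lines):
    """Per-crawler format preference, visit frequency and pervasiveness."""
    site_resources = set()           # every distinct path seen in the log
    per_crawler = defaultdict(list)  # crawler name -> [(timestamp, path)]
    stamps = []                      # timestamps of all parsed requests
    for line in lines:
        rec = parse(line)
        if rec is None:
            continue
        ts, path, agent = rec
        stamps.append(ts)
        site_resources.add(path)
        for token, name in CRAWLER_TOKENS.items():
            if token in agent:
                per_crawler[name].append((ts, path))
                break

    if not stamps:
        return {}
    # Length of the observation window in days (a partial day counts as one).
    span = (max(stamps) - min(stamps)).total_seconds()
    days = max(1, math.ceil(span / 86400))

    results = {}
    for name, reqs in per_crawler.items():
        reqs.sort()
        types = Counter(resource_type(p) for _, p in reqs)
        # A new visit starts whenever the gap since the previous request is large.
        visits = 1 + sum(
            1 for (a, _), (b, _) in zip(reqs, reqs[1:]) if b - a > SESSION_GAP
        )
        results[name] = {
            "format_share": {t: c / len(reqs) for t, c in types.items()},
            "visits_per_day": visits / days,
            "pervasiveness": len({p for _, p in reqs}) / len(site_resources),
        }
    return results


# Made-up sample lines for illustration only (not data from the study).
SAMPLE_LOG = [
    '1.2.3.4 - - [10/Oct/2003:13:55:36 +0000] "GET /index.html HTTP/1.0" 200 2326 "-" "Googlebot/2.1"',
    '1.2.3.4 - - [10/Oct/2003:15:00:00 +0000] "GET /paper.pdf HTTP/1.0" 200 9000 "-" "Googlebot/2.1"',
    '5.6.7.8 - - [10/Oct/2003:15:01:00 +0000] "GET /logo.gif HTTP/1.0" 200 500 "-" "Mozilla/4.0"',
]

if __name__ == "__main__":
    for name, metrics in crawler_metrics(SAMPLE_LOG).items():
        print(name, metrics)
```

On the made-up sample, Googlebot's two requests are about an hour apart, so they count as two visits within a one-day window, and it fetched two of the three distinct paths, giving a pervasiveness of 2/3. Defining pervasiveness against paths observed in the log, rather than against a full site inventory, is a simplification chosen only to keep the sketch self-contained.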

Original language: English
Pages (from-to): 880-897
Number of pages: 18
Journal: Computer Communications
Volume: 28
Issue number: 8
Publication status: Published, 16 May 2005

Keywords

  • Crawlers
  • Web characterization
