Characterizing crawler behavior from web server access logs

Marios Dikaiakos, Athena Stassopoulou, Loizos Papageorgiou

Research output: Chapter in Book/Report/Conference proceeding > Conference contribution

12 Citations (Scopus)

Abstract

In this paper, we present a study of crawler behavior based on Web-server access logs. To this end, we use logs from five different academic sites in three countries. Based on these logs, we analyze the activity of different crawlers that belong to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristics of general World-Wide Web traffic, and to general characterization studies based on Web-server access logs. We analyze crawler requests to derive insights into the behavior and strategy of crawlers. Our results and observations provide useful insights into crawler behavior and serve as a basis for our ongoing work on the automatic detection of WWW robots.

Original language: English
Title of host publication: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Publisher: Springer Verlag
Pages: 369-378
Number of pages: 10
Volume: 2738
ISBN (Print): 3540408088, 9783540408086
Publication status: Published - 2003
Event: 4th International Conference on E-Commerce and Web Technology, EC-Web 2003 - Prague, Czech Republic
Duration: 2 Sep 2003 to 5 Sep 2003

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 2738
ISSN (Print): 0302-9743
ISSN (Electronic): 1611-3349

Other

Other: 4th International Conference on E-Commerce and Web Technology, EC-Web 2003
Country: Czech Republic
City: Prague
Period: 2/09/03 to 5/09/03

  • Cite this

    Dikaiakos, M., Stassopoulou, A., & Papageorgiou, L. (2003). Characterizing crawler behavior from web server access logs. In Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics) (Vol. 2738, pp. 369-378). Springer Verlag. https://doi.org/10.1007/b11826