TY - GEN
T1 - Characterizing crawler behavior from web server access logs
AU - Dikaiakos, Marios
AU - Stassopoulou, Athena
AU - Papageorgiou, Loizos
PY - 2003
Y1 - 2003
N2 - In this paper, we present a study of crawler behavior based on Web-server access logs. To this end, we use logs from five different academic sites in three countries. Based on these logs, we analyze the activity of different crawlers that belong to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristics of general World-Wide Web traffic, and to general characterization studies based on Web-server access logs. We analyze crawler requests to derive insights into the behavior and strategy of crawlers. Our results and observations provide useful insights into crawler behavior and serve as a basis for our ongoing work on the automatic detection of WWW robots.
AB - In this paper, we present a study of crawler behavior based on Web-server access logs. To this end, we use logs from five different academic sites in three countries. Based on these logs, we analyze the activity of different crawlers that belong to five search engines: Google, AltaVista, Inktomi, FastSearch and CiteSeer. We compare crawler behavior to the characteristics of general World-Wide Web traffic, and to general characterization studies based on Web-server access logs. We analyze crawler requests to derive insights into the behavior and strategy of crawlers. Our results and observations provide useful insights into crawler behavior and serve as a basis for our ongoing work on the automatic detection of WWW robots.
UR - http://www.scopus.com/inward/record.url?scp=33947181188&partnerID=8YFLogxK
U2 - 10.1007/b11826
DO - 10.1007/b11826
M3 - Conference contribution
AN - SCOPUS:33947181188
SN - 3540408088
SN - 9783540408086
VL - 2738
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 369
EP - 378
BT - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
PB - Springer Verlag
T2 - 4th International Conference on E-Commerce and Web Technology, EC-Web 2003
Y2 - 2 September 2003 through 5 September 2003
ER -