TY - GEN
T1 - A probabilistic reasoning approach for discovering web crawler sessions
AU - Stassopoulou, Athena
AU - Dikaiakos, Marios D.
PY - 2007
Y1 - 2007
AB - In this paper we introduce a probabilistic-reasoning approach for distinguishing Web robots (crawlers) from human visitors of Web sites. Our approach employs a Naive Bayes network to classify the HTTP sessions of a Web-server access log as crawler- or human-induced. The Bayesian network combines several pieces of evidence that have been shown to distinguish between crawler and human HTTP traffic. The parameters of the Bayesian network are determined with machine learning techniques, and each session is assigned to the class with the maximum posterior probability given the available evidence. Applied to real Web logs, our method achieves a classification accuracy of 95%. This high accuracy in detecting crawler sessions demonstrates the robustness and effectiveness of the proposed methodology.
UR - http://www.scopus.com/inward/record.url?scp=38049025300&partnerID=8YFLogxK
M3 - Conference contribution
AN - SCOPUS:38049025300
SN - 9783540724834
VL - 4505 LNCS
T3 - Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
SP - 265
EP - 272
BT - Advances in Data and Web Management - Joint 9th Asia-Pacific Web Conference, APWeb 2007 and 8th International Conference on Web-Age Information Management, WAIM 2007, Proceedings
T2 - Joint 9th Asia-Pacific Web Conference on Advances in Data and Web Management, APWeb 2007 and 8th International Conference on Web-Age Information Management, WAIM 2007
Y2 - 16 June 2007 through 18 June 2007
ER -