TY - JOUR
T1 - Web robot detection
T2 - A probabilistic reasoning approach
AU - Stassopoulou, Athena
AU - Dikaiakos, Marios D.
PY - 2009/2/27
Y1 - 2009/2/27
N2 - In this paper, we introduce a probabilistic modeling approach to the problem of Web robot detection from Web-server access logs. More specifically, we construct a Bayesian network that automatically classifies access-log sessions as crawler- or human-induced by combining various pieces of evidence proven to characterize crawler and human behavior. Our approach uses an adaptive-threshold technique to extract Web sessions from access logs. We then apply machine learning techniques to determine the parameters of the probabilistic model. Each session is assigned to the class with the maximum posterior probability given the available evidence. We apply our method to real Web-server logs and obtain results that demonstrate the robustness and effectiveness of probabilistic reasoning for crawler detection.
AB - In this paper, we introduce a probabilistic modeling approach to the problem of Web robot detection from Web-server access logs. More specifically, we construct a Bayesian network that automatically classifies access-log sessions as crawler- or human-induced by combining various pieces of evidence proven to characterize crawler and human behavior. Our approach uses an adaptive-threshold technique to extract Web sessions from access logs. We then apply machine learning techniques to determine the parameters of the probabilistic model. Each session is assigned to the class with the maximum posterior probability given the available evidence. We apply our method to real Web-server logs and obtain results that demonstrate the robustness and effectiveness of probabilistic reasoning for crawler detection.
KW - Bayesian classifiers
KW - Probabilistic reasoning
KW - Web crawler detection
UR - http://www.scopus.com/inward/record.url?scp=58549116778&partnerID=8YFLogxK
U2 - 10.1016/j.comnet.2008.09.021
DO - 10.1016/j.comnet.2008.09.021
M3 - Article
AN - SCOPUS:58549116778
SN - 1389-1286
VL - 53
SP - 265
EP - 278
JO - Computer Networks
JF - Computer Networks
IS - 3
ER -