TY - GEN
T1 - An experimental comparison of machine learning classification algorithms for breast cancer diagnosis
AU - Kaklamanis, Markos Marios
AU - Filippakis, Michael
AU - Touloupos, Marios
AU - Christodoulou, Klitos
PY - 2020/1/1
Y1 - 2020/1/1
AB - In this paper, four machine learning algorithms are compared in order to predict whether a cell nucleus is benign or malignant, using the Breast Cancer Wisconsin (Diagnostic) Data Set. The algorithms are K-Nearest Neighbours, Classification and Regression Trees (CART), Naïve Bayes and Support Vector Machines with a Radial Basis Function kernel. Data visualization and pre-processing using PCA help in understanding and preparing the dataset for the training phase, while parameter tuning determines the optimal parameters for every model, using R as the programming language. Also, 10-fold Cross Validation is used as the resampling method after comparing it with Bootstrapping, as it is the more efficient of the two. In the end, our comparison shows that the model with the highest accuracy is the one trained using K-Nearest Neighbours. Nowadays, one of the most common forms of cancer among women is breast cancer, with more than one million cases and nearly 600,000 deaths occurring worldwide annually [1]. It is the second leading cause of death among women and thus it must be detected at an early stage in order not to become fatal [2]. Thus, diagnosing whether a biopsied cell is benign or malignant is vital. However, this process is quite complicated, as it involves several stages of gathering and analysing samples with many variables, making the final diagnosis a demanding and time-consuming procedure. The rapid growth of Artificial Intelligence and Machine Learning and their implementation in medicine give us a new perspective on the way we process and analyse medical data. Medical experts can use data mining techniques to improve their decision making by extracting useful information from massive amounts of data.
KW - Breast cancer
KW - Classification
KW - Data mining
KW - Diagnosis
KW - Machine learning
UR - http://www.scopus.com/inward/record.url?scp=85083983184&partnerID=8YFLogxK
U2 - 10.1007/978-3-030-44322-1_2
DO - 10.1007/978-3-030-44322-1_2
M3 - Conference contribution
AN - SCOPUS:85083983184
SN - 9783030443214
T3 - Lecture Notes in Business Information Processing
SP - 18
EP - 30
BT - Information Systems - 16th European, Mediterranean, and Middle Eastern Conference, EMCIS 2019, Proceedings
A2 - Themistocleous, Marinos
A2 - Papadaki, Maria
PB - Springer India
T2 - 16th European, Mediterranean, and Middle Eastern Conference on Information Systems, EMCIS 2019
Y2 - 9 December 2019 through 10 December 2019
ER -