TY - GEN
T1 - Adverse Drug Reaction Classification in Social Media
T2 - 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2023
AU - Durand, Julie
AU - Stassopoulou, Athena
AU - Katakis, Ioannis
N1 - Publisher Copyright: © 2023 IEEE.
PY - 2023
Y1 - 2023
N2 - Many patients readily share experiences about their medical conditions and treatments on online social media, which makes these platforms a potentially valuable source of information on adverse drug reactions (ADRs). In this work, the detection of mentions of ADRs in Reddit posts is approached as a multi-label classification problem. A dataset of 537 annotated posts was created by supplementing a publicly available dataset with freshly collected and annotated posts. The labels were mapped to the Medical Dictionary for Regulatory Activities (MedDRA) and their distribution within each MedDRA level guided the creation of 12 data subsets. On each data subset, we applied 4 different multi-label learning methods - Binary Relevance (BR), Classifier Chains (CC), Label Powerset (LP) and random k-labelsets (RAkEL), each associated with 4 different base classifiers: Decision Trees (DT), Naive Bayes (NB), Random Forest (RF) and Support Vector Machine (SVM). The best F-scores were with DT on the data subset based on the 20 most frequent labels at MedDRA Preferred Term (PT) level. The best hamming loss was with the data subset based on all labels at PT level. The type of multi-label learning method did not appear to influence performance significantly. Our results show a promising direction in the use of multi-label classification of ADRs from social media posts for pharmacovigilance purposes.
AB - Many patients readily share experiences about their medical conditions and treatments on online social media, which makes these platforms a potentially valuable source of information on adverse drug reactions (ADRs). In this work, the detection of mentions of ADRs in Reddit posts is approached as a multi-label classification problem. A dataset of 537 annotated posts was created by supplementing a publicly available dataset with freshly collected and annotated posts. The labels were mapped to the Medical Dictionary for Regulatory Activities (MedDRA) and their distribution within each MedDRA level guided the creation of 12 data subsets. On each data subset, we applied 4 different multi-label learning methods - Binary Relevance (BR), Classifier Chains (CC), Label Powerset (LP) and random k-labelsets (RAkEL), each associated with 4 different base classifiers: Decision Trees (DT), Naive Bayes (NB), Random Forest (RF) and Support Vector Machine (SVM). The best F-scores were with DT on the data subset based on the 20 most frequent labels at MedDRA Preferred Term (PT) level. The best hamming loss was with the data subset based on all labels at PT level. The type of multi-label learning method did not appear to influence performance significantly. Our results show a promising direction in the use of multi-label classification of ADRs from social media posts for pharmacovigilance purposes.
KW - adverse drug reaction
KW - multi-label classification
KW - social media
UR - http://www.scopus.com/inward/record.url?scp=85182522963&partnerID=8YFLogxK
U2 - 10.1109/WI-IAT59888.2023.00045
DO - 10.1109/WI-IAT59888.2023.00045
M3 - Conference contribution
AN - SCOPUS:85182522963
T3 - Proceedings - 2023 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2023
SP - 280
EP - 286
BT - Proceedings - 2023 22nd IEEE/WIC International Conference on Web Intelligence and Intelligent Agent Technology, WI-IAT 2023
PB - Institute of Electrical and Electronics Engineers Inc.
Y2 - 26 October 2023 through 29 October 2023
ER -