Advancing Image Spam Detection: Evaluating Machine Learning Models Through Comparative Analysis

dc.authorscopusid 59206867000
dc.authorscopusid 59938845100
dc.authorscopusid 55225971200
dc.authorscopusid 37010805100
dc.authorscopusid 6602924425
dc.contributor.author Jamil, Mahnoor
dc.contributor.author Trpcheska, Hristina Mihajloska
dc.contributor.author Popovska-Mitrovikj, Aleksandra
dc.contributor.author Dimitrova, Vesna
dc.contributor.author Creutzburg, Reiner
dc.date.accessioned 2025-07-15T18:46:01Z
dc.date.available 2025-07-15T18:46:01Z
dc.date.issued 2025
dc.department Kadir Has University en_US
dc.department-temp [Jamil, Mahnoor; Trpcheska, Hristina Mihajloska; Popovska-Mitrovikj, Aleksandra; Dimitrova, Vesna] Ss Cyril & Methodius Univ, Fac Comp Sci & Engn, Skopje 1000, North Macedonia; [Jamil, Mahnoor] Kadir Has Univ, Sch Grad Studies, TR-34083 Istanbul, Turkiye; [Creutzburg, Reiner] SRH Univ Appl Sci Heidelberg, Sch Technol & Architecture, D-12059 Berlin, Germany; [Creutzburg, Reiner] TH Brandenburg, Fachbereich Informat & Medien, D-14770 Brandenburg, Germany en_US
dc.description.abstract Image-based spam poses a significant challenge for traditional text-based filters, as malicious content is often embedded within images to bypass keyword detection techniques. This study investigates and compares the performance of six machine learning models (ResNet50, XGBoost, Logistic Regression, LightGBM, Support Vector Machine (SVM), and VGG16) using a curated dataset containing 678 legitimate (ham) and 520 spam images. The novelty of this research lies in its comprehensive side-by-side evaluation of diverse models on the same dataset, using standardized dataset preprocessing, balanced data splits, and validation techniques. Model performance was assessed using evaluation metrics such as accuracy, receiver operating characteristic (ROC) curve, precision, recall, and area under the curve (AUC). The results indicate that ResNet50 achieved the highest classification performance, followed closely by XGBoost and Logistic Regression. This work provides practical insights into the strengths and limitations of traditional, ensemble-based, and deep learning models for image-based spam detection. The findings can support the development of more effective and generalizable spam filtering solutions in multimedia-rich communication platforms. en_US
dc.description.sponsorship European Union [101082683]; Faculty of Computer Science and Engineering at Ss. Cyril and Methodius University in Skopje en_US
dc.description.sponsorship This work was supported partially by the European Union in the framework of ERASMUS MUNDUS, Project CyberMACS #101082683 and Faculty of Computer Science and Engineering at Ss. Cyril and Methodius University in Skopje en_US
dc.description.woscitationindex Science Citation Index Expanded
dc.identifier.doi 10.3390/app15116158
dc.identifier.issn 2076-3417
dc.identifier.issue 11 en_US
dc.identifier.scopus 2-s2.0-105007702913
dc.identifier.scopusquality Q3
dc.identifier.uri https://doi.org/10.3390/app15116158
dc.identifier.uri https://hdl.handle.net/20.500.12469/7387
dc.identifier.volume 15 en_US
dc.identifier.wos WOS:001505753600001
dc.identifier.wosquality Q2
dc.language.iso en en_US
dc.publisher MDPI en_US
dc.relation.publicationcategory Article - International Refereed Journal - Institutional Faculty Member en_US
dc.rights info:eu-repo/semantics/closedAccess en_US
dc.subject Spam Detection en_US
dc.subject Image Spam en_US
dc.subject Machine Learning en_US
dc.subject Support Vector Machine en_US
dc.subject XGBoost en_US
dc.subject Logistic Regression en_US
dc.subject ResNet50 en_US
dc.subject LightGBM en_US
dc.subject VGG16 en_US
dc.title Advancing Image Spam Detection: Evaluating Machine Learning Models Through Comparative Analysis en_US
dc.type Article en_US
dspace.entity.type Publication
