dc.contributor.advisor: DAĞ, HASAN [en_US]
dc.contributor.author: Demirkıran, Ferhat
dc.date.accessioned: 2023-07-25T07:39:39Z
dc.date.available: 2023-07-25T07:39:39Z
dc.date.issued: 2022-01
dc.identifier.uri: https://hdl.handle.net/20.500.12469/4361
dc.description.abstract: Classification of malware families is crucial for a comprehensive understanding of how they can infect devices, computers, or systems. Hence, malware identification enables security researchers and incident responders to take precautions against malware and accelerate mitigation. API call sequences made by malware are widely utilized features by machine and deep learning models for malware classification, as these sequences represent the behavior of malware. However, traditional machine and deep learning models remain incapable of capturing sequence relationships among API calls. Unlike traditional machine and deep learning models, transformer-based models process the sequences as a whole and learn relationships among API calls through multi-head attention mechanisms and positional embeddings. Our experiments demonstrate that the transformer model with one transformer block layer surpasses the performance of the widely used base architecture, LSTM. Moreover, the pre-trained transformer models BERT and CANINE perform better in classifying highly imbalanced malware families according to the evaluation metrics F1-score and AUC score. Furthermore, our proposed bagging-based random transformer forest (RTF) model, an ensemble of BERT or CANINE, reaches state-of-the-art evaluation scores on three out of four datasets; specifically, it achieves a state-of-the-art F1-score of 0.6149 on one of the commonly used benchmark datasets. [en_US]
dc.language.iso: eng [en_US]
dc.publisher: Kadir Has Üniversitesi [en_US]
dc.rights: info:eu-repo/semantics/openAccess [en_US]
dc.subject: Transformer [en_US]
dc.subject: Tokenization-Free [en_US]
dc.subject: API Calls [en_US]
dc.subject: Imbalanced [en_US]
dc.subject: Multiclass [en_US]
dc.subject: BERT [en_US]
dc.subject: CANINE [en_US]
dc.subject: Ensemble [en_US]
dc.subject: Malware Classification [en_US]
dc.title: An ensemble of pre-trained transformer models for imbalanced multiclass malware classification [en_US]
dc.type: masterThesis [en_US]
dc.department: Enstitüler, Lisansüstü Eğitim Enstitüsü, İşletme Ana Bilim Dalı [en_US]
dc.identifier.wos: WOS:000881541300005 [en_US]
dc.identifier.scopus: 2-s2.0-85136643921 [en_US]
dc.relation.publicationcategory: Tez [en_US]
dc.identifier.yoktezid: 718678 [en_US]
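
The abstract describes a bagging-based ensemble of pre-trained transformer classifiers over API call sequences, combined at prediction time. The sketch below illustrates that idea only in outline: the checkpoint name (bert-base-uncased), the toy API-call sequences, the ensemble size, and the soft-voting step are assumptions for illustration, not the thesis's actual RTF implementation, and per-learner fine-tuning on each bootstrap sample is omitted for brevity.

# Minimal sketch, assuming a bagging ensemble of pre-trained transformer
# classifiers over API-call sequences with soft voting; illustrative only,
# not the thesis's exact RTF setup.
import numpy as np
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

MODEL_NAME = "bert-base-uncased"   # assumed checkpoint; the thesis also uses CANINE
NUM_FAMILIES = 4                   # assumed number of malware family labels
N_ESTIMATORS = 3                   # assumed ensemble size

# Toy API-call sequences (space-separated call names) with family labels.
sequences = [
    "CreateFileW WriteFile CloseHandle",
    "RegOpenKeyExW RegSetValueExW RegCloseKey",
    "InternetOpenA InternetConnectA HttpSendRequestA",
    "VirtualAlloc WriteProcessMemory CreateRemoteThread",
]
labels = np.array([0, 1, 2, 3])

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
rng = np.random.default_rng(42)

models = []
for _ in range(N_ESTIMATORS):
    # Bagging step: each base learner gets its own bootstrap sample of the
    # training data; fine-tuning on (sample_texts, sample_labels) would go
    # here but is omitted to keep the sketch short.
    idx = rng.integers(0, len(sequences), size=len(sequences))
    sample_texts = [sequences[i] for i in idx]
    sample_labels = labels[idx]
    model = AutoModelForSequenceClassification.from_pretrained(
        MODEL_NAME, num_labels=NUM_FAMILIES
    )
    model.eval()
    models.append(model)

def predict_proba(texts):
    # Soft voting: average the class probabilities of all base learners.
    enc = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
    probs = []
    with torch.no_grad():
        for m in models:
            probs.append(torch.softmax(m(**enc).logits, dim=-1))
    return torch.stack(probs).mean(dim=0)

print(predict_proba(sequences).argmax(dim=-1))

In the thesis's setting, each base learner would presumably be fine-tuned on its bootstrap sample before voting, and the CANINE variant operates on characters rather than wordpiece tokens, so the tokenization step would differ accordingly.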

