An Ensemble of Pre-Trained Transformer Models for Imbalanced Multiclass Malware Classification

Dağ, Hasan; Demirkıran, Ferhat

dc.contributor.advisor	Dağ, Hasan	en_US
dc.contributor.author	Dağ, Hasan
dc.contributor.author	Demirkıran, Ferhat
dc.contributor.other	Management Information Systems
dc.date	2022-01
dc.date.accessioned	2023-07-25T07:39:39Z
dc.date.available	2023-07-25T07:39:39Z
dc.date.issued	2022
dc.department	Institutes, School of Graduate Studies, Department of Business Administration	en_US
dc.description.abstract	Classification of malware families is crucial for a comprehensive understanding of how they can infect devices, computers, or systems. Malware identification therefore enables security researchers and incident responders to take precautions against malware and accelerate mitigation. API call sequences made by malware are widely used as features by machine and deep learning models for malware classification, because these sequences represent the behavior of malware. However, traditional machine and deep learning models remain incapable of capturing sequence relationships among API calls. Unlike these models, transformer-based models process the sequences as a whole and learn relationships among API calls through multi-head attention mechanisms and positional embeddings. Our experiments demonstrate that a transformer model with one transformer block layer surpasses the performance of the widely used base architecture, LSTM. Moreover, the pre-trained transformer models BERT and CANINE perform even better in classifying highly imbalanced malware families according to the evaluation metrics F1-score and AUC score. Furthermore, our proposed bagging-based random transformer forest (RTF) model, an ensemble of BERT or CANINE, reaches state-of-the-art evaluation scores on three out of four datasets; specifically, it achieves a state-of-the-art F1-score of 0.6149 on one of the commonly used benchmark datasets.	en_US
dc.identifier.citationcount	16
dc.identifier.scopus	2-s2.0-85136643921	en_US
dc.identifier.uri	https://hdl.handle.net/20.500.12469/4361
dc.identifier.wos	WOS:000881541300005	en_US
dc.identifier.yoktezid	718678	en_US
dc.language.iso	en	en_US
dc.publisher	Kadir Has Üniversitesi	en_US
dc.relation.publicationcategory	Thesis	en_US
dc.rights	info:eu-repo/semantics/openAccess	en_US
dc.scopus.citedbyCount	43
dc.subject	Transformer	en_US
dc.subject	Tokenization-Free	en_US
dc.subject	API Calls	en_US
dc.subject	Imbalanced	en_US
dc.subject	Multiclass	en_US
dc.subject	BERT	en_US
dc.subject	CANINE	en_US
dc.subject	Ensemble	en_US
dc.subject	Malware Classification	en_US
dc.title	An Ensemble of Pre-Trained Transformer Models for Imbalanced Multiclass Malware Classification	en_US
dc.type	Master Thesis	en_US
dc.wos.citedbyCount	26
dspace.entity.type	Publication
relation.isAuthorOfPublication	e02bc683-b72e-4da4-a5db-ddebeb21e8e7
relation.isAuthorOfPublication	695a8adc-2330-4d32-ab37-8b781716d609
relation.isAuthorOfPublication.latestForDiscovery	e02bc683-b72e-4da4-a5db-ddebeb21e8e7
relation.isOrgUnitOfPublication	ff62e329-217b-4857-88f0-1dae00646b8c
relation.isOrgUnitOfPublication.latestForDiscovery	ff62e329-217b-4857-88f0-1dae00646b8c

Files

Original bundle

Name: Ferhat_Demirkıran.pdf
Size: 800.23 KB
Format: Adobe Portable Document Format
Description: An Ensemble of Pre-Trained Transformer Models for Imbalanced Multiclass Malware Classification

Collections

Thesis Collection
Scopus Indexed Publications Collection
WoS Indexed Publications Collection