An Ensemble of Pre-Trained Transformer Models for Imbalanced Multiclass Malware Classification

Dağ, Hasan; Demirkıran, Ferhat


Files

Ferhat_Demirkıran.pdf (800.23 KB)

Date

2022

Authors

Dağ, Hasan

Demirkıran, Ferhat

Publisher

Kadir Has Üniversitesi

Organizational Units

Management Information Systems

Abstract

Classification of malware families is crucial for a comprehensive understanding of how malware can infect devices, computers, or systems. Hence, malware identification enables security researchers and incident responders to take precautions against malware and accelerate mitigation. API call sequences made by malware are widely used as features by machine and deep learning models for malware classification, since these sequences represent the behavior of malware. However, traditional machine and deep learning models remain incapable of capturing sequence relationships among API calls. Unlike traditional machine and deep learning models, transformer-based models process sequences as a whole and learn relationships among API calls through multi-head attention mechanisms and positional embeddings. Our experiments demonstrate that a transformer model with a single transformer block layer surpasses the performance of the widely used base architecture, LSTM. Moreover, the pre-trained transformer models BERT and CANINE outperform the other models in classifying highly imbalanced malware families according to two evaluation metrics: F1-score and AUC score. Furthermore, our proposed bagging-based random transformer forest (RTF) model, an ensemble of BERT or CANINE models, reaches state-of-the-art evaluation scores on three of the four datasets; specifically, it achieves a state-of-the-art F1-score of 0.6149 on one of the commonly used benchmark datasets.
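The abstract describes RTF only as a bagging-based ensemble of BERT or CANINE models. The Python sketch below illustrates the general bagging idea under several assumptions that are not confirmed by the record: each base learner is trained on a bootstrap sample of the training set, the ensemble averages class probabilities (soft voting), and F1 is macro-averaged over classes. Fine-tuning BERT or CANINE is out of scope here, so `train_stand_in` is a hypothetical placeholder that scores classes from API-call occurrence counts and, unlike a transformer, ignores call order. The names `fit_rtf`, `predict`, `macro_f1`, and the toy API-call data are illustrative only.

```python
import random
from collections import Counter
from typing import Callable, List, Sequence, Tuple

# A base learner maps an API-call sequence to a list of class probabilities.
Model = Callable[[Sequence[str]], List[float]]
Example = Tuple[List[str], int]


def bootstrap_sample(data: List[Example], rng: random.Random) -> List[Example]:
    """Bagging step: draw len(data) examples with replacement."""
    return [rng.choice(data) for _ in range(len(data))]


def train_stand_in(sample: List[Example], num_classes: int) -> Model:
    """Hypothetical placeholder for fine-tuning BERT/CANINE on one bootstrap sample.
    It counts how often each API call co-occurs with each class and ignores order."""
    counts = {}  # API call -> per-class occurrence counts
    for seq, label in sample:
        for call in set(seq):
            counts.setdefault(call, [0] * num_classes)[label] += 1

    def predict_proba(seq: Sequence[str]) -> List[float]:
        scores = [1.0] * num_classes  # add-one smoothing so unseen calls are safe
        for call in set(seq):
            for c, n in enumerate(counts.get(call, [0] * num_classes)):
                scores[c] += n
        total = sum(scores)
        return [s / total for s in scores]

    return predict_proba


def fit_rtf(data: List[Example], num_classes: int,
            n_estimators: int = 5, seed: int = 0) -> List[Model]:
    """Train one base learner per bootstrap sample (assumed RTF-style bagging)."""
    rng = random.Random(seed)
    return [train_stand_in(bootstrap_sample(data, rng), num_classes)
            for _ in range(n_estimators)]


def predict(ensemble: List[Model], seq: Sequence[str]) -> int:
    """Soft voting (an assumption): average class probabilities, take the argmax."""
    n = len(ensemble)
    avg = [sum(p) / n for p in zip(*(m(seq) for m in ensemble))]
    return max(range(len(avg)), key=avg.__getitem__)


def macro_f1(y_true: List[int], y_pred: List[int], num_classes: int) -> float:
    """Unweighted mean of per-class F1 = 2TP / (2TP + FP + FN)."""
    f1s = []
    for c in range(num_classes):
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn
        f1s.append(2 * tp / denom if denom else 0.0)
    return sum(f1s) / num_classes


# Toy usage with made-up API-call sequences and three hypothetical families.
data = [
    (["NtCreateFile", "NtWriteFile", "RegSetValueExW"], 0),
    (["NtCreateFile", "NtWriteFile"], 0),
    (["connect", "send", "recv"], 1),
    (["connect", "send"], 1),
    (["CreateRemoteThread", "WriteProcessMemory"], 2),
]
ensemble = fit_rtf(data, num_classes=3, n_estimators=5)
preds = [predict(ensemble, seq) for seq, _ in data]
print(preds, macro_f1([y for _, y in data], preds, 3))
```

Macro-averaging weights every malware family equally, which is why it is a common choice for imbalanced multiclass problems: a classifier that ignores rare families is penalized even when its overall accuracy is high. In a faithful reproduction, `train_stand_in` would be replaced by fine-tuning a pre-trained BERT or CANINE model on each bootstrap sample.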

Keywords

Transformer, Tokenization-Free, API Calls, Imbalanced, Multiclass, BERT, CANINE, Ensemble, Malware Classification

URI

https://hdl.handle.net/20.500.12469/4361

Collections

Thesis Collection
Scopus-Indexed Publications Collection
WoS-Indexed Publications Collection
