Enhancing Malware Classification: A Comparative Study of Feature Selection Models With Parameter Optimization
Date
2024
Publisher
Institute of Electrical and Electronics Engineers Inc.
Abstract
This study assesses the impact of seven feature selection algorithms (Minimum Redundancy Maximum Relevance (MRMR), Mutual Information (MI), Chi-Square (Chi), Leave One Feature Out (LOFO), Feature Relevance-based Unsupervised Feature Selection (FRUFS), A General Framework for Auto-Weighted Feature Selection via Global Redundancy Minimization (AGRM), and BoostARoota) on two malware datasets (Microsoft and API call sequences) using three machine learning models: Extreme Gradient Boosting (XGBoost), Random Forest, and Histogram-Based Gradient Boosting (Hist Gradient Boosting). The analysis shows that no feature selection algorithm uniformly outperforms the others; their effectiveness depends on the characteristics of the dataset and the model. Specifically, BoostARoota was particularly well suited to the Microsoft dataset, especially after parameter optimization, whereas its performance on the API call sequences dataset varied, suggesting the need for customized parameter selection. This study highlights the need for tailored feature selection approaches and parameter adjustments to optimize machine learning model performance across different datasets. © 2024 IEEE.
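Two of the filter-style selectors named above, Mutual Information and Chi-Square, score each feature against the class label independently of any model. As a minimal sketch, assuming discrete features (for example, a binary flag for whether a sample calls a given API), the standard-library Python below computes both scores and keeps the top-k features. The feature names and toy labels are hypothetical illustrations, not data from the study, and this is not the authors' implementation. The Chi-Square function uses the full contingency-table form of Pearson's test; library versions such as scikit-learn's `chi2` use a count-based variant, so their scores can differ.

```python
import math
from collections import Counter


def mutual_information(feature, labels):
    """I(X;Y) in nats between one discrete feature column and the class labels."""
    n = len(labels)
    joint = Counter(zip(feature, labels))  # counts of (feature value, label) pairs
    px = Counter(feature)                  # marginal counts of feature values
    py = Counter(labels)                   # marginal counts of class labels
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    n = len(labels)
    joint = Counter(zip(feature, labels))
    px = Counter(feature)
    py = Counter(labels)
    stat = 0.0
    for x in px:
        for y in py:
            expected = px[x] * py[y] / n     # count expected under independence
            observed = joint.get((x, y), 0)  # cells never observed count as zero
            stat += (observed - expected) ** 2 / expected
    return stat


def select_top_k(columns, labels, score, k):
    """Rank feature names by score (highest first) and keep the top k."""
    scores = {name: score(col, labels) for name, col in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical toy data: 1 = malware, 0 = benign; each column marks whether
# a sample calls the named API (1) or not (0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "calls_CreateRemoteThread": [1, 1, 1, 0, 0, 0, 0, 0],
    "calls_ReadFile":           [1, 0, 1, 0, 1, 0, 1, 0],
    "calls_RegSetValue":        [1, 1, 0, 0, 0, 0, 0, 0],
}

print(select_top_k(columns, labels, mutual_information, k=2))
# ['calls_CreateRemoteThread', 'calls_RegSetValue']
print(select_top_k(columns, labels, chi_square, k=2))
# ['calls_CreateRemoteThread', 'calls_RegSetValue']
```

On this toy data both scores produce the same ranking, and `calls_ReadFile`, which is independent of the label, scores zero under both. On real datasets the selectors can disagree, which fits the study's observation that no single method wins everywhere. Wrapper and embedded methods such as LOFO and BoostARoota depend on training a model and are not covered by this sketch.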
Keywords
Feature selection, Machine learning, Malware classification, Parameter optimization
Source
2024 Systems and Information Engineering Design Symposium, SIEDS 2024 -- 3 May 2024 -- Charlottesville -- 199691
Start Page
511
End Page
516