Ensemble of feature selection models for malware datasets
Date
2022
Authors
Publisher
Kadir Has Üniversitesi
Abstract
While the development of technology has made our lives easier, it has also increased our dependence on it. Cybercriminals develop various types of malware to exploit this dependence, so malware classification is essential for security researchers and incident response teams to act against threats and accelerate mitigation. In this study, we selected seven feature selection methods based on their popularity, effectiveness, and complexity: LOFO Importance (Leave One Feature Out), FRUFS (Feature Relevance based Unsupervised Feature Selection), AGRM (A General Framework for Auto-Weighted Feature Selection with Global Redundancy Minimization), MI (Mutual Information), the Chi-square test, mRMR (Minimum Redundancy and Maximum Relevance), and BoostARoota. We performed all experiments with the XGBoost (Extreme Gradient Boosting), RF (Random Forest), and HGB (Histogram-Based Gradient Boosting) classifiers, evaluated with accuracy, F1-score, and AUC (Area Under the ROC Curve).

We first measured the parameter sensitivity of the feature selection methods with adjustable parameters on two high-dimensional datasets, the Microsoft Malware Prediction dataset and the API Call Sequences dataset. These methods and their parameters are FRUFS (model-c, random-state), BoostARoota (clf, iters), and LOFO (model). Among the adjustable parameters, only the 'model' parameter of LOFO significantly affects the accuracy and F1-score results. We then compared the seven feature selection algorithms on two high-dimensional malware datasets, the Microsoft Malware Prediction dataset and the API Import dataset. Overall, AGRM obtained better metric results than the other feature selection methods; behind AGRM, FRUFS, LOFO, MI, and mRMR achieved the best results on different metrics. Compared to MI and mRMR, LOFO is much less used in the malware domain, while FRUFS has not been used in it before. Since AGRM performs better, and FRUFS and LOFO are newer than the other algorithms, we continued our work with these three feature selection methods.

Finally, we combined the three selected feature selection methods, LOFO Importance, FRUFS, and AGRM, to find the most important features and reduce dimensionality so that we could work with fewer features. We trained the three feature subsets produced by these methods with three models, the XGBoost, RF, and HGB classifiers, in a stacking ensemble on the Microsoft Malware Prediction dataset and the API Import dataset. From the nine resulting sets of prediction probabilities, we eliminated those carrying the same information by applying a threshold to their correlation matrix, and fed the remaining prediction probabilities to an SVM (Support Vector Machine) meta-classifier. Our model obtained, on average, 1.2% better classification accuracy than the three selected feature selection methods on one of the well-known malware datasets, the Microsoft Malware Prediction dataset. On the API Import dataset, our model obtained, on average, 8% better classification accuracy than the LOFO and FRUFS feature selection algorithms; AGRM could not be included in that comparison due to insufficient RAM. Our proposed model was therefore trained with fewer features while achieving better results.
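The evaluation setup described above can be illustrated with a short sketch. This is not the thesis code: it uses scikit-learn's Mutual Information selector as a stand-in for the seven methods (LOFO, FRUFS, AGRM, mRMR, the Chi-square test, and BoostARoota each ship their own APIs), a synthetic dataset in place of the malware data, and an arbitrary choice of 50 retained features; only the three classifiers and three metrics follow the abstract.

# Minimal sketch, not the thesis code: one feature-selection step (MI as a
# stand-in) followed by the three classifiers and metrics named above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.feature_selection import SelectKBest, mutual_info_classif
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from sklearn.metrics import accuracy_score, f1_score, roc_auc_score
from xgboost import XGBClassifier

# Synthetic stand-in for a high-dimensional malware dataset.
X, y = make_classification(n_samples=2000, n_features=200, n_informative=30,
                           random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42)

# Keep the top-50 features by Mutual Information, fitted on the training split only.
selector = SelectKBest(mutual_info_classif, k=50).fit(X_tr, y_tr)
X_tr_sel, X_te_sel = selector.transform(X_tr), selector.transform(X_te)

classifiers = {
    "XGBoost": XGBClassifier(eval_metric="logloss"),
    "RF": RandomForestClassifier(random_state=42),
    "HGB": HistGradientBoostingClassifier(random_state=42),
}
for name, clf in classifiers.items():
    clf.fit(X_tr_sel, y_tr)
    pred = clf.predict(X_te_sel)
    proba = clf.predict_proba(X_te_sel)[:, 1]
    print(name,
          "acc:", accuracy_score(y_te, pred),
          "f1:", f1_score(y_te, pred),
          "auc:", roc_auc_score(y_te, proba))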
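A minimal sketch of the stacking step follows, under explicit assumptions: the three feature subsets are random stand-ins for the real LOFO/FRUFS/AGRM selections, the 0.95 correlation threshold is illustrative rather than the value used in the thesis, and a fuller implementation would build the meta-features from out-of-fold predictions to avoid leakage.

# Minimal stacking sketch: 3 feature subsets x 3 base models -> 9 probability
# columns, correlation-threshold pruning, then an SVM meta-classifier.
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, HistGradientBoostingClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score
from xgboost import XGBClassifier

X, y = make_classification(n_samples=2000, n_features=200, n_informative=30,
                           random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Hypothetical feature subsets standing in for the LOFO / FRUFS / AGRM selections.
rng = np.random.default_rng(0)
subsets = {name: rng.choice(X.shape[1], size=50, replace=False)
           for name in ("LOFO", "FRUFS", "AGRM")}

base_models = {"XGBoost": XGBClassifier(eval_metric="logloss"),
               "RF": RandomForestClassifier(random_state=0),
               "HGB": HistGradientBoostingClassifier(random_state=0)}

# Collect the nine prediction-probability columns.
# (A fuller version would use out-of-fold predictions for the training part.)
train_meta, test_meta = {}, {}
for s_name, cols in subsets.items():
    for m_name, model in base_models.items():
        model.fit(X_tr[:, cols], y_tr)
        key = f"{s_name}_{m_name}"
        train_meta[key] = model.predict_proba(X_tr[:, cols])[:, 1]
        test_meta[key] = model.predict_proba(X_te[:, cols])[:, 1]
train_meta, test_meta = pd.DataFrame(train_meta), pd.DataFrame(test_meta)

# Drop probability columns that carry nearly the same information,
# i.e. whose pairwise correlation exceeds the (illustrative) threshold.
corr = train_meta.corr().abs()
upper = corr.where(np.triu(np.ones(corr.shape, dtype=bool), k=1))
drop = [c for c in upper.columns if (upper[c] > 0.95).any()]
train_meta, test_meta = train_meta.drop(columns=drop), test_meta.drop(columns=drop)

# SVM meta-classifier on the remaining stacked probabilities.
meta = SVC().fit(train_meta, y_tr)
print("stacked accuracy:", accuracy_score(y_te, meta.predict(test_meta)))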
Keywords
Feature Selection, Ensemble, FRUFS, AGRM, LOFO, Malware Classification