Browsing by Author "Cayir, Aykut"

Citation - WoS: 26
Citation - Scopus: 41
An ensemble of pre-trained transformer models for imbalanced multiclass malware classification
(Elsevier Advanced Technology, 2022) Dağ, Hasan; Demirkıran, Ferhat; Unal, Ugur; Dag, Hasan; Management Information Systems
Classification of malware families is crucial for a comprehensive understanding of how they can infect devices, computers, or systems. Hence, malware identification enables security researchers and incident responders to take precautions against malware and accelerate mitigation. API call sequences made by malware are widely used as features by machine and deep learning models for malware classification, as these sequences represent the behavior of malware. However, traditional machine and deep learning models remain incapable of capturing sequence relationships among API calls. Unlike these models, transformer-based models process the sequences as a whole and learn relationships among API calls through multi-head attention mechanisms and positional embeddings. Our experiments demonstrate that the Transformer model with one transformer block layer surpasses the performance of the widely used base architecture, LSTM. Moreover, the pre-trained transformer models BERT and CANINE outperform the other models in classifying highly imbalanced malware families according to the evaluation metrics F1-score and AUC score. Furthermore, our proposed bagging-based random transformer forest (RTF) model, an ensemble of BERT or CANINE, reaches state-of-the-art evaluation scores on three out of four datasets; specifically, it achieves a state-of-the-art F1-score of 0.6149 on one of the commonly used benchmark datasets. (C) 2022 Elsevier Ltd. All rights reserved.
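The bagging idea mentioned in the abstract above can be illustrated with a minimal sketch. It assumes each ensemble member is trained on a bootstrap resample and that member outputs are combined by averaging class probabilities (soft voting); the abstract does not state how RTF combines its members, so the combination rule and the names `bootstrap_indices` and `soft_vote` are hypothetical.

```python
import random
from typing import List, Sequence


def bootstrap_indices(n_samples: int, rng: random.Random) -> List[int]:
    """Draw n_samples indices with replacement (one bagging resample)."""
    return [rng.randrange(n_samples) for _ in range(n_samples)]


def soft_vote(member_probs: Sequence[Sequence[float]]) -> int:
    """Average per-class probabilities over ensemble members, return the argmax class.

    member_probs[m][c] is the probability member m assigns to class c.
    Soft voting is an assumption here, not the paper's confirmed rule.
    """
    n_members = len(member_probs)
    n_classes = len(member_probs[0])
    mean = [sum(p[c] for p in member_probs) / n_members for c in range(n_classes)]
    return max(range(n_classes), key=mean.__getitem__)


if __name__ == "__main__":
    rng = random.Random(0)
    # Each (hypothetical) base model would be fine-tuned on its own resample.
    resamples = [bootstrap_indices(10, rng) for _ in range(3)]
    print(resamples)
    # Probabilities from three members for a two-class problem.
    print(soft_vote([[0.6, 0.4], [0.1, 0.9], [0.55, 0.45]]))
```

Averaging probabilities lets one confident member outweigh two marginal ones, which can matter for rare classes; a majority vote over hard labels would be an equally plausible alternative.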
Citation - WoS: 6
Citation - Scopus: 12
Network Intrusion Detection System by Learning Jointly From Tabular and Text-Based Features
(Wiley, 2024) Duzgun, Berkant; Dağ, Hasan; Cayir, Aykut; Unal, Ugur; Dag, Hasan; Management Information Systems
Network intrusion detection systems (NIDS) play a critical role in maintaining the security and integrity of computer networks. These systems are designed to detect and respond to anomalous activities that may indicate malicious intent or unauthorized access. The need for robust NIDS solutions has never been more pressing in today's digital landscape, characterized by constantly evolving cyber threats. Deploying effective NIDS can be challenging, particularly in accurately identifying network anomalies amid increasingly sophisticated and difficult-to-detect cyber threats. The motivation for our research stems from the recognition that, while NIDS studies have made significant strides, there remains a crucial need for more effective and accurate methods to detect network anomalies. Commonly used features in NIDS studies include network logs, with some studies exploring text-based features such as payload. However, traditional machine and deep learning models may fall short when learning jointly from tabular and text-based features. Here, we present a new approach that integrates both tabular and text-based features to improve the performance of NIDS. Our research aims to address the existing limitations of NIDS and contribute to the development of more reliable and efficient network security solutions by introducing more effective and accurate detection methods. Our internal experiments reveal that the deep learning approach using tabular features produces favourable results, whereas the pre-trained transformer approach alone does not perform sufficiently well. Hence, our proposed approach, which integrates both feature types using deep learning and pre-trained transformer approaches, achieves superior performance and can significantly improve the accuracy of network anomaly detection. Moreover, our proposed approach outperforms the state-of-the-art methods in terms of accuracy, F1-score, and recall on the commonly used NIDS datasets ISCX-IDS2012, UNSW-NB15, and CIC-IDS2017, with F1-scores of 99.80%, 92.37%, and 99.69%, respectively, indicating its effectiveness in detecting network anomalies.
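One simple way to learn jointly from both feature types is to scale the tabular fields, turn the payload text into a fixed-length vector, and concatenate the two before classification. The sketch below assumes that fusion strategy; the abstract does not describe the authors' architecture, and the hashed bag-of-tokens vector is only a stand-in for a pre-trained transformer embedding. All function names are hypothetical.

```python
import zlib
from typing import List, Sequence


def hash_text_features(payload: str, dim: int = 8) -> List[float]:
    """Stand-in for a text embedding: hashed bag of whitespace tokens, L1-normalised."""
    vec = [0.0] * dim
    for token in payload.split():
        # crc32 is deterministic across runs, unlike the built-in hash() for str.
        vec[zlib.crc32(token.encode("utf-8")) % dim] += 1.0
    total = sum(vec)
    return [v / total for v in vec] if total else vec


def min_max_scale(row: Sequence[float], lows: Sequence[float],
                  highs: Sequence[float]) -> List[float]:
    """Scale each tabular field to [0, 1] using per-field minima and maxima."""
    return [(x - lo) / (hi - lo) if hi > lo else 0.0
            for x, lo, hi in zip(row, lows, highs)]


def fuse(tabular_scaled: Sequence[float], text_vec: Sequence[float]) -> List[float]:
    """Concatenate scaled tabular features and the text vector into one input."""
    return list(tabular_scaled) + list(text_vec)


if __name__ == "__main__":
    # Hypothetical flow record: duration (s) and packet count, plus a payload string.
    tabular = min_max_scale([50.0, 2.0], lows=[0.0, 0.0], highs=[100.0, 4.0])
    text = hash_text_features("GET /index.html HTTP/1.1")
    print(fuse(tabular, text))
```

The fused vector would then be passed to a single classifier, so that decisions can depend on both the flow statistics and the payload content.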
Performance Comparison of Locality Sensitive Hashing and Random Forest Algorithms for Handwritten Digits Recognition
(Kadir Has Üniversitesi, 2014) Cayir, Aykut; Arsan, Taner; Arsan, Taner; Computer Engineering
The significant increase in the amount of data created has given rise to a new concept called big data. In addition, multidimensional data instances in big data sets have many new features. Therefore, some problems become much more critical for data analysis in big data sets. One of these important problems is the classification of multidimensional data instances in big data sets in a reasonable time. Classification is also related to the K-Nearest Neighbors problem in machine learning and data mining. A perfect example of the classification problem is object or pattern recognition for images in real-world applications. Pattern or object recognition can be reduced to a similarity search problem. In this work, we focused on the similarity search problem in large-scale databases. First, we implemented two popular machine learning algorithms, Locality Sensitive Hashing (LSH) and Random Forest (RF), in the Python programming language. Then we compared these two parameter-dependent algorithms on two different handwritten digit and character datasets: MNIST and notMNIST. In the experiments, we examined the algorithms' performance in terms of recognition accuracy and CPU time for various algorithm-specific parameters. Finally, we observed that LSH and RF each exhibit advantages and disadvantages depending on their parameters, and we concluded that LSH is more useful for time-critical applications and RF is more favorable for accuracy-critical applications. (From the abstract.)
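The similarity-search use of LSH described above can be sketched as follows. This assumes the random-hyperplane (sign of random projection) hash family; the abstract does not say which LSH family the thesis used, and the function names are hypothetical. Vectors with the same bit signature land in the same bucket, so a query only compares against its own bucket instead of the whole database.

```python
import random
from collections import defaultdict
from typing import Dict, List, Optional, Sequence, Tuple

Signature = Tuple[int, ...]


def make_hyperplanes(n_bits: int, dim: int, rng: random.Random) -> List[List[float]]:
    """Sample n_bits random hyperplane normals with Gaussian components."""
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec: Sequence[float], planes: Sequence[Sequence[float]]) -> Signature:
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return tuple(1 if sum(w * x for w, x in zip(p, vec)) >= 0 else 0 for p in planes)


def build_index(vectors: Sequence[Sequence[float]],
                planes: Sequence[Sequence[float]]) -> Dict[Signature, List[int]]:
    """Group vector indices into buckets keyed by their signature."""
    buckets: Dict[Signature, List[int]] = defaultdict(list)
    for i, v in enumerate(vectors):
        buckets[signature(v, planes)].append(i)
    return buckets


def query(vec: Sequence[float], vectors: Sequence[Sequence[float]],
          buckets: Dict[Signature, List[int]],
          planes: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the index of the closest vector in the query's bucket, or None if it is empty."""
    candidates = buckets.get(signature(vec, planes), [])
    if not candidates:
        return None
    return min(candidates,
               key=lambda i: sum((a - b) ** 2 for a, b in zip(vectors[i], vec)))
```

More bits per signature give smaller buckets and faster queries but raise the chance that the true nearest neighbor falls in a different bucket, which is one form of the time versus accuracy trade-off the abstract reports.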