Feature Selection with Random Forest for Ransomware Detection

Authors

Qingzhong Liu, Sam Houston State University, USA

Abstract

Ransomware remains a critical cybersecurity challenge, necessitating advanced detection methods to mitigate its impact. This study investigates the efficacy of Random Forest (RF)-based feature selection for ransomware detection. Two datasets were analyzed: a large-scale Android ransomware dataset from Kaggle and the Ransomware Dataset 2024, which includes diverse malware families such as Cerber, REvil, and WannaCry. Using RF's feature importance ranking, we conducted classification experiments across binary, multi-class (5 categories), and granular (27 families) tasks. For the Kaggle dataset, a refined feature subset preserved classification accuracy while eliminating redundancy. Feature Set 1 achieved peak accuracy, surpassing earlier RF-based benchmarks, while Feature Set 5 balanced accuracy and stability, demonstrating diminishing returns as more features are added. For the 2024 dataset, binary classification peaked at 99.45% accuracy, multi-class at 95.91%, and family-level classification at 91.02%, highlighting feature selection's role in optimizing detection at each level of granularity. These results are consistent with RF's established strength in ransomware detection, particularly when it is also used for feature selection.
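The general workflow described in the abstract (rank features by RF importance, then classify on progressively larger top-ranked subsets) can be illustrated with a minimal sketch. This is not the study's code: it assumes scikit-learn, uses synthetic data from `make_classification` in place of the ransomware datasets, and the function names, subset sizes, and hyperparameters are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def rank_features(X, y, n_trees=100, seed=0):
    """Return feature indices sorted from most to least important."""
    forest = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
    forest.fit(X, y)
    # feature_importances_ holds impurity-based importances that sum to 1.
    return np.argsort(forest.feature_importances_)[::-1]


def accuracy_by_subset_size(X, y, ranking, sizes, n_trees=100, seed=0):
    """Mean 5-fold cross-validated accuracy using only the top-k features."""
    results = {}
    for k in sizes:
        top_k = ranking[:k]
        clf = RandomForestClassifier(n_estimators=n_trees, random_state=seed)
        scores = cross_val_score(clf, X[:, top_k], y, cv=5, scoring="accuracy")
        results[k] = scores.mean()
    return results


# Synthetic stand-in data (assumption): 20 features, only some informative.
X, y = make_classification(
    n_samples=400,
    n_features=20,
    n_informative=5,
    n_redundant=5,
    random_state=0,
)

ranking = rank_features(X, y)
results = accuracy_by_subset_size(X, y, ranking, sizes=[2, 5, 10, 20])

for k, acc in results.items():
    print(f"top {k:2d} features: mean CV accuracy = {acc:.3f}")
```

Comparing the accuracies across subset sizes is one way to see whether adding lower-ranked features still helps. Note that this sketch ranks features on the full dataset before cross-validation for brevity; a stricter evaluation would compute the ranking inside each training fold so that the held-out data does not influence feature selection.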

Keywords

Random Forest, Ransomware Detection, Feature Selection, Feature Importance

Full Text  Volume 15, Number 10