Robust Android Malware Detection Competition

The Robust Android Malware Detection Competition, linked to the Cybersecurity Use Case of the ELSA EU project, aims to evaluate machine learning methods when they are used as a first line of defense against malicious software (malware) targeting the Android Operating System. On this task, machine learning usually performs well, learning common patterns from data and enabling the detection of potentially never-before-seen malware samples. However, it has been shown that these detectors (i) tend to exhibit a rapid performance decay over time, due to the natural evolution of samples, and (ii) can be bypassed by slightly manipulating malware samples in an adversarial manner. The practical consequence of these two issues is that current learning-based malware detectors need constant updates and retraining on newly collected and labeled data.

We propose a threefold benchmark that provides tools for comparing AI-based Android malware detectors in a realistic setting. It challenges the research community to go beyond simplistic assumptions and ultimately design more robust AI models that can be maintained and updated more efficiently, saving human labor and effort. The competition runs in periodic evaluation rounds and is organized into three separate tracks:

  • Track 1: Adversarial Robustness to Feature-space Attacks.
    In this scenario, we aim to measure how much the models' predictions change against increasing amounts of adversarial manipulations, assuming the attacker knows the features used and the model itself and has unrestricted access to it. A feature-space evasion attack will be performed on test applications, perturbing the feature vector with constraints to ensure that applying these manipulations to an APK preserves its malicious functionalities. The applied perturbation is bounded based on the number of modified features.
  • Track 2: Adversarial Robustness to Problem-space Attacks.
    The problem-space attack scenario consists of manipulating the APK files directly rather than only simulating the effect of the attack at the feature level. In this case, we assume the attacker knows neither the target model nor its features. An input-space evasion attack will be performed on the test applications, applying functionality-preserving manipulations to the APKs. The applied manipulations are bounded based on the size of the injected data.
  • Track 3: Temporal Robustness to Data Drift.
    In this setting, we will collect the performance evaluation of the given AI-based detectors with respect to (future) data collected over time.
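To make the Track 1 setting concrete, the following is a minimal sketch of a feature-space evasion attack. It assumes a hypothetical linear detector over binary features (the weights, the feature vector, and the addition-only constraint are illustrative, not the competition's actual model): the attacker, who knows the model, greedily adds absent features whose weights push the score toward the benign class, modifying at most `budget` features so the perturbation stays within the L0 bound.

```python
# Hypothetical linear detector over binary features (1 = feature present).
# Weights and bias are illustrative only.
W = [0.9, -0.4, 1.2, -1.5, 0.3, -1.8, 0.7, -0.2]
B = -0.5

def score(x):
    return sum(w * xi for w, xi in zip(W, x)) + B

def predict(x):
    # 1 = malware, 0 = benign
    return 1 if score(x) > 0 else 0

def feature_space_attack(x, budget):
    """Greedy L0 attack: only feature additions (0 -> 1) are allowed,
    modelling functionality-preserving manipulations, and at most
    `budget` features may be modified."""
    x_adv = list(x)
    # Absent features whose weight pushes the score toward benign,
    # most negative (most effective) first.
    candidates = sorted(
        (j for j in range(len(W)) if x_adv[j] == 0 and W[j] < 0),
        key=lambda j: W[j],
    )
    for j in candidates[:budget]:
        x_adv[j] = 1
    return x_adv

x = [1, 0, 1, 0, 1, 0, 1, 0]           # detected as malware (score = 2.6)
x_adv = feature_space_attack(x, budget=2)
modified = sum(a != b for a, b in zip(x, x_adv))
print(predict(x), predict(x_adv), modified)  # -> 1 0 2
```

Note that the addition-only constraint mirrors the intuition behind functionality preservation: adding unused code or resources to an APK does not break its malicious behavior, whereas removing existing features might.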
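The Track 3 evaluation can be sketched as a time-aware protocol: a detector is trained once on past data and then scored on batches of future applications, so that performance decay becomes visible. The monthly batches and prediction values below are fabricated for illustration; the trapezoidal Area Under Time (AUT) summary is one common way to aggregate time-aware results into a single number (1.0 means no decay over the test horizon).

```python
# Hypothetical per-month (true labels, predictions) for a detector trained
# once before 2023-01; 1 = malware. All values are illustrative.
batches = {
    "2023-01": ([1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 1, 0, 0]),
    "2023-02": ([1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 1, 0, 0, 0]),
    "2023-03": ([1, 1, 1, 1, 1, 0, 0], [1, 1, 1, 0, 0, 0, 0]),
    "2023-04": ([1, 1, 1, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0, 0]),
}

def detection_rate(y_true, y_pred):
    # Fraction of true malware samples flagged as malware (recall).
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    return tp / sum(y_true)

rates = [detection_rate(*batches[m]) for m in sorted(batches)]

# Area Under Time: trapezoidal average of the detection rate across
# consecutive test periods.
aut = sum((a + b) / 2 for a, b in zip(rates, rates[1:])) / (len(rates) - 1)
print(rates, round(aut, 2))  # -> [1.0, 0.8, 0.6, 0.4] 0.7
```

In this fabricated run the detection rate drops steadily month after month, which is exactly the kind of decay the track is designed to surface and that more temporally robust models should mitigate.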


Authors


Angelo Sotgiu (University of Cagliari, CINI)
Maura Pintor (University of Cagliari, CINI)
Ambra Demontis (University of Cagliari, CINI)
Battista Biggio (University of Cagliari, CINI)

Acknowledgments



ELSA