Authors
George Edward, Mahdi Eslamimehr, Quandary Peak Research, USA
Abstract
The weaponization of Large Language Models (LLMs) for automated malware generation poses a fundamental challenge to conventional detection paradigms. AI-generated malware exhibits polymorphic, metamorphic, and context-aware evasion capabilities that leave signature-based and shallow heuristic defenses largely ineffective. This paper introduces CogniCrypt, a hybrid analysis framework that combines concolic execution with LLM-augmented path prioritization and deep-learning-based vulnerability classification to detect zero-day AI-generated malware with formal guarantees. We formalize the detection problem in a first-order temporal logic over program execution traces, define a lattice-theoretic abstraction for path-constraint spaces, and prove the soundness and relative completeness of our detection algorithm under the assumption that the classifier is correct. The framework introduces three novel algorithms: (i) an LLM-guided concolic exploration strategy that reduces the average number of explored paths by 73.2% compared with depth-first search while maintaining equivalent malicious-path coverage; (ii) a transformer-based path-constraint classifier trained on symbolic execution traces; and (iii) a feedback loop that iteratively refines the LLM’s prioritization policy using reinforcement learning from detection outcomes. We provide an implementation built on angr 9.2, Z3 4.12, Hugging Face Transformers 4.38, and PyTorch 2.2, together with full configuration details for reproducibility. Experimental evaluation on the EMBER, Malimg, and SOREL-20M datasets and on a new AI-Gen-Malware benchmark of 2,500 LLM-synthesized samples shows that CogniCrypt achieves 98.7% accuracy on conventional malware and 97.5% accuracy on AI-generated threats, outperforming the ClamAV, YARA, MalConv, and EMBER-GBDT baselines by 8.4 to 52.2 percentage points on AI-generated samples.
Keywords
Concolic Execution, Large Language Models, AI-Generated Malware, Symbolic Execution, Vulnerability Discovery, Software Security, Formal Verification, Deep Learning, Zero-Day Detection, Secure Coding.