Academic Paper

CANAL - Cyber Activity News Alerting Language Model : Empirical Approach vs. Expensive LLMs
Document Type
Conference
Source
2024 IEEE 3rd International Conference on AI in Cybersecurity (ICAIC), pp. 1-12, Feb. 2024
Subject
Communication, Networking and Broadcast Technologies
Engineering Profession
General Topics for Engineers
Large Language Models (LLM)
BERT
Natural Language Processing (NLP)
Machine Learning
Generative AI (Gen AI)
Cyber Risk Modeling
Cyber Signal Discovery
Cyber News Alerts
Empirical Cost Analysis
Language
Abstract
In today’s digital landscape, where cyber attacks have become the norm, detecting cyber attacks and threats is imperative across diverse domains. Our research presents a new empirical framework for cyber threat modeling that parses and categorizes cyber-related information from news articles, enhancing real-time vigilance for market stakeholders. At the core of this framework is a fine-tuned BERT model, which we call CANAL (Cyber Activity News Alerting Language Model), tailored for cyber categorization using a novel silver labeling approach powered by Random Forest. We benchmark CANAL against larger, costlier LLMs, including GPT-4, LLaMA, and Zephyr, evaluating their zero- to few-shot learning in cyber news classification. CANAL outperforms all of these LLM counterparts in both accuracy and cost-effectiveness. Furthermore, we introduce the Cyber Signal Discovery module, a component designed to efficiently detect emerging cyber signals from news articles. Together, CANAL and the Cyber Signal Discovery module equip our framework to provide a robust and cost-effective solution for businesses that require agile responses to cyber intelligence.