Skip to the content.

🔐 Scanning LLM Models for Security Vulnerabilities

Scanning Large Language Models (LLMs) for security vulnerabilities involves a mix of static analysis, dynamic testing, adversarial input crafting, and monitoring anomalous behaviors. LLMs introduce unique threats like prompt injection, data leakage, and unauthorized tool use.


1. Identify Security Objectives and Threats

Use frameworks such as:


2. Use Specialized LLM Security Scanning Tools

Tool Use Case Description
Garak Fuzzing & probing Open-source scanner for prompt injection, jailbreaks, and leaking
PyRIT Red-teaming Framework from Microsoft for adversarial prompt simulation
SecGPT Attack/defense Evaluates models against security-relevant input
Aegis LLM firewall Detects and blocks malicious or jailbreak prompts
LLMGuard Prompt validation Validates and sanitizes LLM inputs/outputs
PromptBench Benchmarking Tests model robustness against adversarial inputs

3. Run Security Tests

✅ Input Sanitization Checks

✅ Output Leakage Tests

✅ Role Escalation & Plugin Abuse

✅ Adversarial Prompt Injection


4. Perform Red Teaming Exercises


5. Monitor and Audit Model Behavior


6. Secure the LLM Environment


7. Include Security in Model Evaluation Pipelines


📚 Additional Resources


references: Detect Malicious AI Models ML Model Security