Discovering and mitigating LLM failure modes
Research Focus
/ Core methodologies
01
Adaptive Stress Testing
Using reinforcement learning agents to systematically discover vulnerabilities and failure modes in LLMs
02
Deterministic Methods
Ensuring reproducibility through batch-invariant inference, so that a discovered failure can be reliably re-triggered and analyzed as a genuine model weakness rather than an artifact of nondeterministic inference
03
Failure Mode Analysis
Analyzing hidden-state representations and decoding adversarial triggers to identify recurring patterns, build failure taxonomies, and reveal the linguistic structures that cause failures
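Batch invariance (item 02) matters because floating-point addition is not associative. If an inference kernel chooses its reduction strategy based on batch size, the same prompt can yield slightly different numbers depending on what else shares the batch. The toy Python sketch below illustrates this with a single row sum. The chunking heuristic and all function names are hypothetical stand-ins for real GPU kernels and do not represent the team's implementation.

```python
def sequential_sum(values):
    """Plain left-to-right float accumulation (one fixed reduction order).
    Uses an explicit loop instead of the built-in sum(), which in recent
    Python versions uses a more accurate compensated algorithm for floats."""
    total = 0.0
    for v in values:
        total += v
    return total


def split_sum(values, num_chunks):
    """Split reduction: sum equal-sized chunks separately, then combine the
    partial sums. Same additions as sequential_sum, different order."""
    size = -(-len(values) // num_chunks)  # ceiling division
    partials = [sequential_sum(values[i:i + size])
                for i in range(0, len(values), size)]
    return sequential_sum(partials)


def row_sums_batch_variant(batch):
    # Hypothetical heuristic: small batches split each row's reduction into
    # more chunks to keep parallel units busy, so the addition order (and
    # hence the rounding) depends on how many rows are in the batch.
    num_chunks = 2 if len(batch) < 4 else 1
    return [split_sum(row, num_chunks) for row in batch]


def row_sums_batch_invariant(batch):
    # Same reduction strategy for every row, regardless of batch size.
    return [split_sum(row, 1) for row in batch]


# A row whose float sum is sensitive to addition order (exact sum is 6.0).
query = [1e16, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, -1e16]
filler = [[0.0] * 8 for _ in range(3)]  # stand-in rows for other requests

alone = row_sums_batch_variant([query])[0]             # batch of 1 -> 2 chunks
batched = row_sums_batch_variant([query] + filler)[0]  # batch of 4 -> 1 chunk
print(alone, batched)  # 4.0 0.0: same input, different output

alone_inv = row_sums_batch_invariant([query])[0]
batched_inv = row_sums_batch_invariant([query] + filler)[0]
print(alone_inv, batched_inv)  # 0.0 0.0: identical regardless of batch size
```

The batch-invariant path is not more accurate here (the exact sum is 6.0); it is reproducible, which is what allows a triggered failure to be replayed and analyzed. In a real model the same principle would have to hold for every reduction, such as matrix multiplications, normalization layers, and attention.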
Founding Team
/ Researchers from the Norwegian University of Science and Technology (NTNU) and UC Berkeley