The next decade's attack surface is a transformer.
Every chatbot, every agent, every multimodal system shipping to production is a new vector — for hallucination, jailbreak, prompt injection, autonomous misbehavior, and failure modes nobody has named yet.
defenders are outnumbered. attackers improvise in public. vendors race each other.
Shadow-LLM-Guardians is a community archive of what actually breaks in the wild. Reproducibly. Citably. Without NDAs.
─────────────────────────────────────────────────────────────────────
We are Shadow-LLM-Guardians — a working group of researchers, red teamers, and engineers cataloguing the failures of frontier AI systems. The archive is the first surface. The team is forming.
The Archive
Every documented failure case — hallucinations, jailbreaks, prompt injections, agent loops, destructive actions, over-refusals, sycophancy. Reproducibility, threat model, and provenance attached to every entry. Citable by paper, by analyst, by anyone.
Reproducers & Defenses
Open toolchains that re-execute submitted cases against current model versions. Regression dashboards. Defense recipes. A growing benchmark suite the next blue-team engineer can pull and run.
A Standing Red/Blue Team for the AI Age
Shadow-LLM-Guardians began as a domain registered in 2023. It will not stay an archive. The long game: a permanent, independent attack-defense capability for the systems the rest of the world depends on but rarely audits.
─────────────────────────────────────────────────────────────────────
// what we collect — hallucinations, jailbreaks, prompt injections, agent loops, destructive actions, over-refusals, sycophancy, alignment failures, tool misuse, multimodal failures, and the long tail of weird behavior that doesn't have a name yet.
// what we don't collect — attack tutorials with no defensive value, zero-day exploits before responsible disclosure, content that targets individuals, anything that would harm vulnerable people if amplified.