SYS: ONLINE | LAT: n/a | BUILD: a744dbb
cat /etc/motd

The next decade's attack surface is a transformer.

Every chatbot, every agent, every multimodal system shipping to production is a new vector — for hallucination, jailbreak, prompt injection, autonomous misbehavior, and failure modes nobody has named yet.

defenders are outnumbered. attackers improvise in public. vendors race each other.

Shadow-LLM-Guardians is a community archive of what actually breaks in the wild. Reproducibly. Citably. Without NDAs.

cases.indexed: 001
cases.active: 001
auth.required: github
archive.policy: open
mission_statement.md
─────────────────────────────────────────────────────────────────────

We are Shadow-LLM-Guardians — a working group of researchers, red teamers, and engineers cataloguing the failures of frontier AI systems. The archive is the first surface. The team is forming.

[NOW]

The Archive

Every documented failure case — hallucinations, jailbreaks, prompt injections, agent loops, destructive actions, over-refusals, sycophancy. Reproducibility, threat model, and provenance attached to every entry. Citable by paper, by analyst, by anyone.

[NEXT]

Reproducers & Defenses

Open toolchains that re-execute submitted cases against current model versions. Regression dashboards. Defense recipes. A growing benchmark suite the next blue-team engineer can pull and run.

[LATER]

A Standing Red/Blue Team for the AI Age

Shadow-LLM-Guardians began as a domain registered in 2023. It will not stay an archive. The long game: a permanent, independent attack-defense capability for the systems the rest of the world depends on but rarely audits.

─────────────────────────────────────────────────────────────────────

// what we collect — hallucinations, jailbreaks, prompt injections, agent loops, destructive actions, over-refusals, sycophancy, alignment failures, tool misuse, multimodal failures, and the long tail of weird behavior that doesn't have a name yet.

// what we don't collect — attack tutorials with no defensive value, zero-day exploits before responsible disclosure, content that targets individuals, anything that would harm vulnerable people if amplified.

join_us