LLM Agents
hard
~35 hours
Guardrailed AI Chat with Defense-in-Depth
Build a domain-specific chatbot with layered safety: input classification, output filtering, and structured generation constraints. Evaluate against adversarial prompt injection attacks.
Skills Demonstrated
Prompt injection defense
Input/output classifiers
Structured generation (JSON mode, grammar constraints)
Red-teaming methodology
Implementation Steps
- Build base chatbot with FastAPI + Anthropic SDK streaming
- Add input classifier (fine-tuned DistilBERT) for intent detection
- Implement output filter with regex + semantic similarity checks
- Add structured generation mode using JSON schemas
- Create red-team evaluation suite with 50+ adversarial prompts
- Build dashboard showing blocked attempts and safety metrics
Interview Relevance
Why this project matters for interviews
Safety and guardrails are the #1 concern for production LLM deployments. This project shows you understand defense-in-depth — critical for roles at Anthropic, Google, Meta AI Safety.