Guardrailed AI Chat with Defense-in-Depth

LLM Agents hard ~35 hours

Build a domain-specific chatbot with layered safety: input classification, output filtering, and structured generation constraints. Evaluate against adversarial prompt injection attacks.

Skills Demonstrated

Prompt injection defense Input/output classifiers Structured generation (JSON mode, grammar constraints) Red-teaming methodology

Implementation Steps

Build base chatbot with FastAPI + Anthropic SDK streaming
Add input classifier (fine-tuned DistilBERT) for intent detection
Implement output filter with regex + semantic similarity checks
Add structured generation mode using JSON schemas
Create red-team evaluation suite with 50+ adversarial prompts
Build dashboard showing blocked attempts and safety metrics

Interview Relevance

Why this project matters for interviews Safety and guardrails are the #1 concern for production LLM deployments. This project shows you understand defense-in-depth — critical for roles at Anthropic, Google, Meta AI Safety.