Showcase Projects — AI Interview Prep

LLM Agents · hard

Multi-Tool RAG Agent with Context Management

Build an LLM agent that orchestrates web search, code execution, and database queries. Implement context compaction to prevent quality degradation over long conversations.

ReAct loop implementation · Tool orchestration · Context window management · Streaming responses
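Context compaction can be sketched roughly as follows, assuming the history is a list of role/content messages whose first entry is the system prompt. `count_tokens` and `summarize` are hypothetical placeholders: a real agent would use the model's tokenizer and an LLM summarization call instead.

```python
from dataclasses import dataclass


@dataclass
class Message:
    role: str      # "system", "user", "assistant", or "tool"
    content: str


def count_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def summarize(messages: list[Message]) -> str:
    # Hypothetical placeholder: a real agent would ask the LLM to summarize these turns.
    # Here we simply keep the first 8 words of each message.
    return " | ".join(f"{m.role}: {' '.join(m.content.split()[:8])}" for m in messages)


def compact(history: list[Message], budget: int, keep_recent: int = 4) -> list[Message]:
    """Fold older turns into one summary message once the history exceeds `budget` tokens.

    Assumes history[0] is the system prompt and keep_recent >= 1.
    """
    total = sum(count_tokens(m.content) for m in history)
    if total <= budget or len(history) <= keep_recent + 1:
        return history  # under budget, or nothing old enough to fold
    system, rest = history[0], history[1:]
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    summary = Message("system", "Summary of earlier conversation: " + summarize(old))
    return [system, summary, *recent]
```

The most recent turns stay verbatim because they usually matter most for the next tool call; note that the compacted history can still exceed the budget if those recent turns are long.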

LLM Agents · hard

Guardrailed AI Chat with Defense-in-Depth

Build a domain-specific chatbot with layered safety: input classification, output filtering, and structured generation constraints. Evaluate against adversarial prompt injection attacks.

Prompt injection defense · Input/output classifiers · Structured generation (JSON mode, grammar constraints) · Red-teaming methodology
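A minimal sketch of the input and output layers, assuming simple regex heuristics stand in for trained classifiers. The patterns below are hypothetical and easy to evade, which is exactly what red-teaming should demonstrate. The structured-output check validates JSON after generation; grammar-constrained decoding would instead enforce the schema during generation.

```python
import json
import re
from typing import Optional

# Hypothetical, deliberately small pattern lists; a production system would use trained classifiers.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"reveal (your )?(system prompt|instructions)",
]
BLOCKED_OUTPUT = [r"\b\d{3}-\d{2}-\d{4}\b"]  # e.g. strings shaped like US SSNs


def check_input(user_text: str) -> bool:
    """Layer 1: return False for inputs that match known injection phrasings."""
    lowered = user_text.lower()
    return not any(re.search(p, lowered) for p in INJECTION_PATTERNS)


def check_output(model_text: str, required_keys: set[str]) -> Optional[dict]:
    """Layers 2 and 3: the output must be a JSON object with the expected keys and no blocked content."""
    try:
        data = json.loads(model_text)
    except json.JSONDecodeError:
        return None  # structurally invalid: regenerate or refuse
    if not isinstance(data, dict) or not required_keys.issubset(data):
        return None
    if any(re.search(p, json.dumps(data)) for p in BLOCKED_OUTPUT):
        return None  # content filter tripped
    return data
```

Each layer fails closed (returns `False` or `None`), so a bypass of one layer still has to get past the others.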

Reinforcement Learning · expert

RLHF Pipeline with Process Rewards

Implement a complete RLHF pipeline: supervised fine-tuning, reward model training with per-step process rewards, and PPO optimization. Compare outcome-based and process-based reward models.

RLHF implementation · Reward model training · PPO with KL penalty · Process vs. outcome rewards
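One common way to wire the rewards into PPO (assumed here, not prescribed by the project): each generated token receives a KL penalty of `-beta * (log pi_policy - log pi_ref)`, and the reward model's score is added either once at the final token (outcome reward) or at the last token of each reasoning step (process reward). The clipped surrogate loss is shown for a single token; a full implementation would batch this and use advantages from GAE.

```python
import math


def shaped_rewards(logp_policy, logp_ref, beta, outcome_score=None, step_scores=None, step_ends=None):
    """Per-token rewards for PPO with a KL penalty against the frozen reference (SFT) model.

    Outcome reward: one scalar from the reward model, added at the final token.
    Process reward: one score per step, added at the token index that ends that step
    (pass step_scores and step_ends together).
    """
    rewards = [-beta * (lp - lr) for lp, lr in zip(logp_policy, logp_ref)]
    if outcome_score is not None:
        rewards[-1] += outcome_score
    if step_scores is not None:
        for score, end in zip(step_scores, step_ends):
            rewards[end] += score
    return rewards


def ppo_clip_loss(logp_new, logp_old, advantage, eps=0.2):
    """Clipped surrogate objective for one token, negated so it can be minimized."""
    ratio = math.exp(logp_new - logp_old)
    clipped = max(min(ratio, 1 + eps), 1 - eps)
    return -min(ratio * advantage, clipped * advantage)
```

The only difference between the two reward models in this sketch is where the score lands: process rewards give credit assignment at every step, while outcome rewards leave PPO to propagate a single terminal signal.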

Reinforcement Learning · hard

Sim-to-Real Robot Control with Domain Randomization

Train an RL agent in simulation (MuJoCo/Isaac Gym) with domain randomization, then demonstrate transfer to a real or more realistic environment. Track sim-to-real gap metrics.

Sim-to-real transfer · Domain randomization · SAC/PPO implementation · Robotics control
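Domain randomization and one possible sim-to-real gap metric can be sketched as below. The parameter names and ranges are hypothetical, and how they are applied to the simulator (for example, by editing model fields in MuJoCo or Isaac Gym) is environment-specific and omitted here.

```python
import random

# Hypothetical parameter ranges; real ranges depend on the robot and the simulator.
RANDOMIZATION_RANGES = {
    "friction": (0.5, 1.5),
    "mass_scale": (0.8, 1.2),
    "motor_gain": (0.9, 1.1),
    "action_latency_steps": (0, 3),  # integer range: delay actions by 0 to 3 control steps
}


def sample_physics(rng: random.Random) -> dict:
    """Draw one set of physics parameters; call this at the start of every training episode."""
    params = {}
    for name, (lo, hi) in RANDOMIZATION_RANGES.items():
        if isinstance(lo, int) and isinstance(hi, int):
            params[name] = rng.randint(lo, hi)  # inclusive on both ends
        else:
            params[name] = rng.uniform(lo, hi)
    return params


def sim_to_real_gap(sim_returns: list[float], real_returns: list[float]) -> float:
    """Relative drop in mean episode return from simulation to the target environment."""
    sim_mean = sum(sim_returns) / len(sim_returns)
    real_mean = sum(real_returns) / len(real_returns)
    return (sim_mean - real_mean) / abs(sim_mean)
```

Tracking this gap for policies trained with and without randomization is one direct way to show that randomization is what closes it.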

Computer Vision · hard

Multi-Scale Defect Detection with SAHI

Build an industrial defect detection system that handles tiny defects using SAHI (tiled inference), FPN-based detection, and a custom evaluation pipeline that reports detection performance separately at each object scale.

Object detection (YOLOv8/RT-DETR) · SAHI tiled inference · Multi-scale evaluation · Production deployment with TensorRT
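The idea behind tiled inference can be sketched in plain Python; this is not the SAHI library's API. The image is cut into overlapping windows, the detector runs on each window, boxes are shifted back into full-image coordinates, and duplicates from overlapping tiles are merged. Plain greedy NMS is used for merging here as an assumption; SAHI itself offers its own postprocessing options.

```python
def tile_coords(width, height, tile=640, overlap=0.2):
    """Return (x0, y0, x1, y1) windows covering the image with the given fractional overlap."""
    stride = max(1, int(tile * (1 - overlap)))

    def starts(size):
        if size <= tile:
            return [0]
        s = list(range(0, size - tile, stride))
        s.append(size - tile)  # last window sits flush with the border
        return s

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def to_global(boxes, window):
    """Shift (x0, y0, x1, y1, score) boxes from tile coordinates to full-image coordinates."""
    ox, oy = window[0], window[1]
    return [(bx0 + ox, by0 + oy, bx1 + ox, by1 + oy, s) for bx0, by0, bx1, by1, s in boxes]


def iou(a, b):
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix1 - ix0) * max(0, iy1 - iy0)

    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])

    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, thresh=0.5):
    """Greedy NMS: keep the highest-scoring box, drop any box overlapping a kept box by >= thresh."""
    kept = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < thresh for k in kept):
            kept.append(box)
    return kept
```

Because each tile is resized to the detector's input size on its own, a defect a few pixels wide occupies far more of the network's input than it would in a downscaled full image.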

Computer Vision · expert

Vision-Language Model Fine-Tuning Pipeline

Fine-tune a vision-language model (LLaVA-style) for a specific domain (e.g., medical imaging, satellite imagery). Build the full pipeline from data curation to evaluation.

Multimodal model fine-tuning · Vision-language alignment · Domain-specific data curation · Evaluation methodology
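A detail that matters when curating instruction data for LLaVA-style fine-tuning: the loss is usually computed only on the assistant's answer tokens, with image placeholder and prompt tokens masked out. The sketch below builds such labels, assuming `-100` as the ignore value (the default `ignore_index` of PyTorch's `CrossEntropyLoss`) and a model that shifts labels internally, as Hugging Face causal-LM classes do. The token IDs in the usage check are hypothetical.

```python
IGNORE_INDEX = -100  # default ignore_index of torch.nn.CrossEntropyLoss


def build_labels(segments):
    """segments: list of (role, token_ids), where role is e.g. "image", "user", or "assistant".

    Returns (input_ids, labels); only assistant tokens keep their IDs in `labels`,
    so the model is trained to produce answers, not to reproduce prompts or image tokens.
    """
    input_ids, labels = [], []
    for role, ids in segments:
        input_ids.extend(ids)
        labels.extend(ids if role == "assistant" else [IGNORE_INDEX] * len(ids))
    return input_ids, labels
```

Getting this masking wrong is a common silent failure: training still converges, but the model partly learns to imitate user prompts.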

NLP · hard

Production RAG System with Evaluation Pipeline

Build a RAG system with re-ranking, hybrid search (dense + sparse), and a comprehensive evaluation pipeline measuring faithfulness, relevance, and answer correctness.

Hybrid retrieval (BM25 + dense) · Cross-encoder re-ranking · RAG evaluation (RAGAS framework) · Production chunking strategies
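The project does not specify how sparse and dense results are combined; reciprocal rank fusion (RRF) is one common choice and is assumed here. Each document scores the sum of `1 / (k + rank)` over the ranked lists it appears in, with `k = 60` as a frequently used constant. The fused top candidates would then go to the cross-encoder re-ranker.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of doc IDs (best first) into one ranking, best first."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda d: scores[d], reverse=True)


bm25_hits = ["a", "b", "c"]   # hypothetical BM25 ranking
dense_hits = ["b", "c", "d"]  # hypothetical dense-retriever ranking
fused = reciprocal_rank_fusion([bm25_hits, dense_hits])
```

RRF uses only ranks, so it sidesteps the problem that BM25 scores and cosine similarities live on incomparable scales.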

NLP · hard

Multilingual NER with Zero-Shot Transfer

Build a named entity recognition system that trains on English data and transfers to 5+ languages without target-language training data. Implement entity-level evaluation to measure true generalization.

Cross-lingual NLP · Zero-shot transfer · Entity-level evaluation methodology · Tokenizer-aware data processing
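Entity-level evaluation counts an entity as correct only if both its type and its exact span match, which is stricter than token accuracy. A minimal re-implementation of that idea for BIO tags is sketched below (libraries such as seqeval provide tested versions). In the usage check, the predicted PER entity is truncated by one token: token accuracy would be 3/4, but entity-level F1 is 0.5.

```python
def extract_spans(tags):
    """Convert BIO tags (e.g. B-PER, I-PER, O) into a set of (type, start, end) spans, end exclusive."""
    spans, start, etype = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        if tag.startswith("I-") and etype == tag[2:]:
            continue  # entity continues
        if etype is not None:
            spans.add((etype, start, i))
            start, etype = None, None
        if tag.startswith("B-") or tag.startswith("I-"):  # a stray I- also starts a new entity
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_seqs, pred_seqs):
    """Micro-averaged entity-level F1 over a corpus of tag sequences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_spans(gold), extract_spans(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

For zero-shot transfer, tags should be mapped back from subword tokens to words before this comparison, so that tokenizer differences across languages do not distort the score.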

ML System Design · hard

End-to-End ML Pipeline with Feature Store

Build a complete ML pipeline: feature engineering with a feature store, model training with experiment tracking, A/B test deployment, and monitoring with drift detection.

Feature store (Feast/Tecton) · Experiment tracking (MLflow/W&B) · Model serving with latency SLAs · Data drift detection
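One widely used drift statistic is the Population Stability Index (PSI), which compares a feature's binned distribution in production against its training distribution. The sketch below uses equal-width bins over the training range; the bin count and the small floor for empty bins are assumptions. A common rule of thumb reads PSI below 0.1 as stable, 0.1 to 0.25 as moderate shift, and above 0.25 as significant drift.

```python
import math


def psi(expected, actual, bins=10):
    """Population Stability Index of `actual` (production) against `expected` (training) values."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # guard against a constant feature

    def fractions(values):
        counts = [0] * bins
        for v in values:
            idx = int((v - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1  # clamp out-of-range values into the edge bins
        # a small floor avoids log(0) for empty bins
        return [max(c / len(values), 1e-6) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

In the pipeline, this would run per feature on a schedule, with alerts (or retraining triggers) when the value crosses the chosen threshold.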

ML System Design · expert

Distributed Training Dashboard with Profiling

Build a training orchestration tool that profiles GPU utilization, communication overhead, and model FLOPs utilization (MFU) across different parallelism strategies (data parallel, tensor parallel, pipeline parallel).

Distributed training · GPU profiling · Parallelism strategies · Performance optimization
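MFU is achieved model FLOPs per second divided by the hardware's peak FLOPs per second. For dense transformer training a common approximation is about `6 * N` FLOPs per token for a model with `N` parameters (forward plus backward, ignoring attention FLOPs). The sketch assumes that approximation and takes step and compute times from a profiler. The inputs in the usage check (a 7B-parameter model, 24,000 tokens/s on 8 GPUs, 312 TFLOPS peak, the A100's dense BF16 spec) are illustrative, not measurements.

```python
def mfu(n_params, tokens_per_sec, n_gpus, peak_flops_per_gpu):
    """Model FLOPs utilization using the ~6 * N FLOPs-per-token estimate for dense transformer training."""
    achieved = 6 * n_params * tokens_per_sec
    return achieved / (n_gpus * peak_flops_per_gpu)


def comm_overhead(step_time_s, compute_time_s):
    """Fraction of a training step not spent in compute kernels (communication plus idle time)."""
    return (step_time_s - compute_time_s) / step_time_s


example = mfu(7e9, 24_000, 8, 312e12)  # roughly 0.40 for these illustrative inputs
```

Because MFU counts only the FLOPs the model needs, it penalizes activation recomputation and pipeline bubbles, which makes it a fair yardstick for comparing parallelism strategies.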

AI Engineering · hard

AI-Powered Quiz Platform with Adaptive Difficulty

Build a production quiz platform that uses LLMs to generate questions, evaluate answers, and adapt difficulty in real time. Implement semantic caching and model routing to keep costs under $0.01/request.

LLM API integration with cost optimization · Semantic caching (embeddings + similarity search) · Adaptive difficulty algorithms · FastAPI + PostgreSQL production backend
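A semantic cache returns a stored LLM response when a new request is close enough in embedding space to one already answered, which avoids a paid API call. In this sketch, `embed` is a hypothetical stand-in (a bag-of-words count vector) for a real embedding model, the 0.9 default threshold is an assumption to tune, and the linear scan would be replaced by a vector index in production.

```python
import math
import re
from collections import Counter


def embed(text):
    """Hypothetical stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)  # Counter returns 0 for missing terms
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    def __init__(self, threshold=0.9):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response)

    def get(self, query):
        """Return the cached response of the most similar past query, or None on a miss."""
        q = embed(query)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, response):
        self.entries.append((embed(query), response))
```

The threshold trades cost against correctness: set it too low and users get answers to questions they did not ask.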

AI Engineering · medium

Real Estate Price Predictor with Full ML Pipeline

End-to-end ML project: data collection, feature engineering, model selection (linear regression through gradient boosting), hyperparameter tuning, and deployment with a REST API.

Feature engineering and selection · Model comparison (linear, tree-based, ensemble) · Cross-validation and hyperparameter tuning · Model deployment with FastAPI
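The cross-validation loop behind model comparison can be sketched in pure Python for the simplest candidate, a one-feature linear regression. The same fold loop applies unchanged to tree-based or ensemble models; in practice scikit-learn's `KFold` and `cross_val_score` do this work. The single-feature setup is a simplifying assumption.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_idx, val_idx) pairs for k roughly equal, shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def fit_line(xs, ys):
    """Ordinary least squares for y = a * x + b with one feature."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    var = sum((x - mx) ** 2 for x in xs)
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / var if var else 0.0
    return a, my - a * mx


def cv_rmse(xs, ys, k=5):
    """Mean validation RMSE of the linear model across k folds."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        mse = sum((a * xs[i] + b - ys[i]) ** 2 for i in val) / len(val)
        scores.append(mse ** 0.5)
    return sum(scores) / k
```

Each model is scored only on data it never saw during fitting, so comparing mean validation RMSE across candidates is fair in a way that training error is not.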

AI Agents · hard

Enterprise Customer Support Agent with Memory

Build a multi-turn customer support agent with tiered memory (working, short-term, long-term), tool integration (order lookup, refund processing), and conversation handoff to humans when confidence is low.

Tiered memory architecture · Tool integration with error handling · Human-in-the-loop escalation · Conversation analytics dashboard
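A minimal sketch of the three memory tiers and the escalation rule, assuming working memory holds the last few turns verbatim, short-term memory keeps turns evicted during the session (a real agent would summarize them), and long-term memory is a key-value store of durable customer facts. The tier sizes, the 0.7 threshold, and how confidence is produced are hypothetical.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class TieredMemory:
    working_size: int = 6                              # hypothetical: turns kept verbatim in the prompt
    working: deque = field(default_factory=deque)      # most recent turns
    short_term: list = field(default_factory=list)     # turns evicted from working memory this session
    long_term: dict = field(default_factory=dict)      # durable facts, e.g. order IDs or preferences

    def add_turn(self, turn: str) -> None:
        self.working.append(turn)
        while len(self.working) > self.working_size:
            self.short_term.append(self.working.popleft())

    def remember(self, key: str, value: str) -> None:
        self.long_term[key] = value

    def context(self) -> str:
        """Assemble the prompt context: facts first, then a note on older turns, then recent turns."""
        facts = "; ".join(f"{k}={v}" for k, v in sorted(self.long_term.items()))
        return f"Known facts: {facts}\nEarlier turns: {len(self.short_term)}\n" + "\n".join(self.working)


def route(confidence: float, threshold: float = 0.7) -> str:
    """Hand the conversation to a human when the agent's confidence falls below the threshold."""
    return "agent" if confidence >= threshold else "human"
```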

AI Agents · expert

Self-Healing Multi-Agent Pipeline with Observability

Build a multi-agent system (researcher, writer, reviewer) with self-healing: automatic retry, fallback strategies, anomaly detection, and a real-time observability dashboard showing agent health and cost.

Multi-agent orchestration · Self-healing with circuit breakers · Cost tracking and budget governance · Distributed tracing and observability
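A circuit breaker around each agent call is one way to implement the self-healing behavior: after repeated failures the breaker "opens" and routes calls straight to a fallback (for example, a cheaper model or a cached result), then lets one trial call through after a cooldown. This is a minimal sketch; the thresholds are hypothetical, and the injectable `clock` exists so the state machine can be tested without sleeping.

```python
import time


class CircuitBreaker:
    """Closed -> open after `max_failures` consecutive errors; half-open once `cooldown` seconds pass."""

    def __init__(self, max_failures=3, cooldown=30.0, clock=time.monotonic):
        self.max_failures = max_failures
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    @property
    def state(self):
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.cooldown:
            return "half-open"  # allow one trial call through
        return "open"

    def call(self, fn, fallback):
        if self.state == "open":
            return fallback()  # skip the failing agent entirely
        try:
            result = fn()
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # (re)open the circuit
            return fallback()
        self.failures = 0  # any success closes the circuit
        self.opened_at = None
        return result
```

Exposing `state` per agent gives the observability dashboard a direct health signal, and counting fallback invocations feeds naturally into cost tracking.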