LLM Agents
hard
Multi-Tool RAG Agent with Context Management
Build an LLM agent that orchestrates web search, code execution, and database queries. Implement context compaction to prevent quality degradation over long conversations.
ReAct loop implementation
Tool orchestration
Context window management
Streaming responses
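Context compaction can be sketched as follows. This assumes the first message is the system prompt and approximates tokens by whitespace-split words (a real agent would use the model's tokenizer). `compact_history` and the `summarize` callback are hypothetical names; in practice `summarize` would itself be an LLM call.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str
    content: str


def count_tokens(text: str) -> int:
    # Rough proxy: whitespace-separated words. Replace with the model's tokenizer.
    return len(text.split())


def compact_history(
    messages: List[Message],
    budget: int,
    summarize: Callable[[List[Message]], str],
    keep_recent: int = 4,
) -> List[Message]:
    """Fold older turns into one summary message when the history exceeds `budget`.

    Assumes messages[0] is the system prompt; the last `keep_recent` turns
    are always kept verbatim.
    """
    if keep_recent < 1:
        raise ValueError("keep_recent must be at least 1")
    total = sum(count_tokens(m.content) for m in messages)
    if total <= budget or len(messages) <= keep_recent + 1:
        return messages  # under budget, or nothing old enough to compact
    system, rest = messages[0], messages[1:]
    old, recent = rest[:-keep_recent], rest[-keep_recent:]
    summary = Message("system", "Summary of earlier conversation: " + summarize(old))
    return [system, summary, *recent]
```

Keeping recent turns verbatim preserves the exact tool outputs the next ReAct step is most likely to need, while older turns survive only as a summary. The result can still exceed the budget if the recent turns alone are large, so a fuller version would also truncate long tool outputs.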
LLM Agents
hard
Guardrailed AI Chat with Defense-in-Depth
Build a domain-specific chatbot with layered safety: input classification, output filtering, and structured generation constraints. Evaluate against adversarial prompt injection attacks.
Prompt injection defense
Input/output classifiers
Structured generation (JSON mode, grammar constraints)
Red-teaming methodology
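Two of the layers can be sketched in a few lines, assuming a chatbot that is instructed to reply in a fixed JSON shape. The regex screen is a deliberately weak stand-in for a trained input classifier, and the `answer`/`in_scope` schema is hypothetical.

```python
import json
import re
from typing import Optional

# Layer 1: cheap heuristic screen on user input. Illustrative patterns only;
# a production system would use a trained injection classifier here.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"reveal (your|the) system prompt", re.IGNORECASE),
]


def screen_input(user_text: str) -> bool:
    """Return True if the input passes the heuristic screen."""
    return not any(p.search(user_text) for p in INJECTION_PATTERNS)


# Structured-generation check on model output. Hypothetical reply schema:
# {"answer": str, "in_scope": bool}
REQUIRED_FIELDS = {"answer": str, "in_scope": bool}


def validate_output(raw: str) -> Optional[dict]:
    """Parse the reply as JSON and enforce field presence and types; None means reject."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(data, dict):
        return None
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(data.get(field), expected_type):
            return None
    return data
```

Neither check is sufficient on its own, which is the reason for layering: red-teaming should measure how many adversarial prompts get past each layer individually and past the whole stack.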
Reinforcement Learning
expert
RLHF Pipeline with Process Rewards
Implement a complete RLHF pipeline: supervised fine-tuning, reward model training with process (per-step) rewards, and policy optimization with PPO. Compare outcome-based and process-based reward models.
RLHF implementation
Reward model training
PPO with KL penalty
Process vs outcome rewards
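The KL penalty is usually folded into the per-token reward that PPO maximizes. The sketch below assumes log-probabilities of the sampled response under the current policy and under the frozen SFT reference are already computed, and shows where outcome and process rewards enter. `shaped_rewards` and its arguments are hypothetical names.

```python
from typing import Dict, List, Optional


def shaped_rewards(
    logprobs: List[float],
    ref_logprobs: List[float],
    beta: float,
    outcome_reward: Optional[float] = None,
    step_rewards: Optional[Dict[int, float]] = None,
) -> List[float]:
    """Per-token rewards for PPO in RLHF (hypothetical helper).

    Each token pays -beta * (log pi - log pi_ref), a per-sample estimate of the
    KL divergence from the frozen reference policy.
    Outcome RM: one score added at the final token.
    Process RM: step_rewards maps the index of the token that ends each
    reasoning step to that step's score.
    """
    rewards = [-beta * (lp - rlp) for lp, rlp in zip(logprobs, ref_logprobs)]
    if outcome_reward is not None:
        rewards[-1] += outcome_reward
    for token_idx, score in (step_rewards or {}).items():
        rewards[token_idx] += score
    return rewards
```

With an outcome reward model, credit for the whole response arrives at the last token and PPO's advantage estimation has to propagate it backward; a process reward model places signal at each reasoning step, which is the difference the outcome vs process comparison should quantify.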
Reinforcement Learning
hard
Sim-to-Real Robot Control with Domain Randomization
Train an RL agent in simulation (MuJoCo/Isaac Gym) with domain randomization, then demonstrate transfer to a real or more realistic environment. Track sim-to-real gap metrics.
Sim-to-real transfer
Domain randomization
SAC/PPO implementation
Robotics control
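Domain randomization amounts to resampling the simulator's physical parameters at every episode reset so the policy cannot overfit to one set of dynamics. A minimal sketch, with hypothetical parameter names and ranges; writing the values into MuJoCo or Isaac Gym is engine-specific and omitted.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class PhysicsParams:
    friction: float
    mass_scale: float
    motor_gain: float
    action_latency_steps: int
    obs_noise_std: float


# Hypothetical ranges; they should bracket the uncertainty about the real robot.
RANGES = {
    "friction": (0.5, 1.5),
    "mass_scale": (0.8, 1.2),
    "motor_gain": (0.9, 1.1),
    "obs_noise_std": (0.0, 0.02),
}


def sample_params(rng: random.Random) -> PhysicsParams:
    """Draw one set of physics parameters; call once per episode reset."""
    u = {name: rng.uniform(lo, hi) for name, (lo, hi) in RANGES.items()}
    return PhysicsParams(
        friction=u["friction"],
        mass_scale=u["mass_scale"],
        motor_gain=u["motor_gain"],
        action_latency_steps=rng.randint(0, 3),  # simulated control delay
        obs_noise_std=u["obs_noise_std"],
    )


def add_obs_noise(obs: List[float], std: float, rng: random.Random) -> List[float]:
    """Gaussian sensor noise applied to each observation component."""
    return [x + rng.gauss(0.0, std) for x in obs]
```

Ranges that are too narrow leave the real robot outside the training distribution, while ranges that are too wide can make the task unlearnable, so the sim-to-real gap metrics are a natural signal for tuning them.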
Computer Vision
hard
Multi-Scale Defect Detection with SAHI
Build an industrial defect detection system that handles tiny defects using SAHI (Slicing Aided Hyper Inference, a tiled-inference technique), FPN-based detection, and a custom evaluation pipeline that tracks detection performance at each object scale.
Object detection (YOLOv8/RT-DETR)
SAHI tiled inference
Multi-scale evaluation
Production deployment with TensorRT
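The core of tiled (sliced) inference is generating overlapping windows, running the detector on each, shifting boxes back into full-image coordinates, and merging duplicates across tile seams. The sketch below implements those steps in plain Python with a greedy NMS merge, assuming a single defect class and boxes as `(x0, y0, x1, y1, score)` tuples. It is not the SAHI library's API; the tile size and overlap are assumptions, and the detector call itself is left out.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float, float]  # x0, y0, x1, y1, score


def tile_coords(width: int, height: int, tile: int = 640, overlap: float = 0.2):
    """Overlapping (x0, y0, x1, y1) windows covering the whole image."""
    stride = max(1, int(tile * (1 - overlap)))

    def starts(size: int) -> List[int]:
        if size <= tile:
            return [0]
        s = list(range(0, size - tile, stride))
        s.append(size - tile)  # final window sits flush with the border
        return s

    return [(x, y, min(x + tile, width), min(y + tile, height))
            for y in starts(height) for x in starts(width)]


def to_full_image(boxes: List[Box], x0: int, y0: int) -> List[Box]:
    """Shift tile-local boxes back into full-image coordinates."""
    return [(a + x0, b + y0, c + x0, d + y0, s) for a, b, c, d, s in boxes]


def iou(p: Box, q: Box) -> float:
    iw = max(0.0, min(p[2], q[2]) - max(p[0], q[0]))
    ih = max(0.0, min(p[3], q[3]) - max(p[1], q[1]))
    inter = iw * ih
    union = (p[2] - p[0]) * (p[3] - p[1]) + (q[2] - q[0]) * (q[3] - q[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes: List[Box], thresh: float = 0.5) -> List[Box]:
    """Greedy NMS to merge duplicate detections from overlapping tiles."""
    kept: List[Box] = []
    for box in sorted(boxes, key=lambda b: b[4], reverse=True):
        if all(iou(box, k) < thresh for k in kept):
            kept.append(box)
    return kept
```

Overlap matters because a small defect cut by a tile boundary can be missed in both tiles. Running the detector once on the full image as well, and merging those boxes in, helps with large defects that span several tiles.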
Computer Vision
expert
Vision-Language Model Fine-Tuning Pipeline
Fine-tune a vision-language model (LLaVA-style) for a specific domain (e.g., medical imaging, satellite imagery). Build the full pipeline from data curation to evaluation.
Multimodal model fine-tuning
Vision-language alignment
Domain-specific data curation
Evaluation methodology
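For domain datasets such as medical or satellite imagery, a frequent evaluation pitfall is leakage: images of the same patient or the same geographic tile landing in both train and test. One way to avoid it, sketched here under the assumption that every record carries a group ID, is to assign whole groups to splits by hashing the ID. `split_for`, `assign_splits`, and the `group_id` field are hypothetical.

```python
import hashlib
from typing import Dict, List


def split_for(group_id: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Deterministically assign an entire group (e.g. patient or tile ID) to one split."""
    digest = hashlib.sha256(group_id.encode("utf-8")).hexdigest()
    u = int(digest, 16) % 10_000 / 10_000  # pseudo-uniform value in [0, 1)
    if u < test_frac:
        return "test"
    if u < test_frac + val_frac:
        return "val"
    return "train"


def assign_splits(records: List[Dict], group_key: str = "group_id") -> List[Dict]:
    """Attach a split label to every record based on its group."""
    return [{**r, "split": split_for(r[group_key])} for r in records]
```

Hashing makes the assignment deterministic across data refreshes, so a group never migrates between splits when new images are added.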
NLP
hard
Production RAG System with Evaluation Pipeline
Build a RAG system with re-ranking, hybrid search (dense + sparse), and a comprehensive evaluation pipeline measuring faithfulness, relevance, and answer correctness.
Hybrid retrieval (BM25 + dense)
Cross-encoder re-ranking
RAG evaluation (RAGAS framework)
Production chunking strategies
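Dense and BM25 scores live on different scales, so a common way to fuse them is reciprocal rank fusion (RRF), which uses only ranks. A minimal sketch, assuming each retriever returns an ordered list of document IDs; k = 60 is the value commonly used in the literature, not something this project prescribes.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[List[str]], k: int = 60) -> List[str]:
    """Fuse ranked lists of doc IDs: score(d) = sum over lists of 1 / (k + rank of d)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)
```

For example, fusing a BM25 list `["a", "b", "c"]` with a dense list `["b", "c", "d"]` ranks `b` first because it scores well in both. The fused top candidates would then go to the cross-encoder, which scores each query-passage pair jointly and is too slow to run over the whole corpus.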
NLP
hard
Multilingual NER with Zero-Shot Transfer
Build a named entity recognition system that trains on English data and transfers to 5+ languages without target-language training data. Implement entity-level evaluation to measure true generalization.
Cross-lingual NLP
Zero-shot transfer
Entity-level evaluation methodology
Tokenizer-aware data processing
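Entity-level evaluation counts an entity as correct only if both its type and its exact span match, which is stricter than token accuracy. A minimal sketch for BIO tags, assuming gold and predicted tags are aligned word by word (not subword by subword); libraries such as seqeval provide a fuller implementation.

```python
from typing import List, Set, Tuple

Span = Tuple[str, int, int]  # (entity type, start, end exclusive)


def extract_entities(tags: List[str]) -> Set[Span]:
    """Convert a BIO tag sequence into a set of typed spans."""
    spans: Set[Span] = set()
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" flushes the last entity
        if tag.startswith("I-") and tag[2:] == etype:
            continue  # current entity continues
        if etype is not None:
            spans.add((etype, start, i))
        if tag == "O":
            start, etype = None, None
        else:
            # B- tag, or an I- tag with no matching open entity (treated as a start)
            start, etype = i, tag[2:]
    return spans


def entity_f1(gold_seqs: List[List[str]], pred_seqs: List[List[str]]) -> float:
    """Micro-averaged entity-level F1 over all sentences."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_seqs, pred_seqs):
        g, p = extract_entities(gold), extract_entities(pred)
        tp += len(g & p)
        fp += len(p - g)
        fn += len(g - p)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

For gold `B-PER I-PER O B-LOC` and prediction `B-PER O O B-LOC`, three of four tokens are right but only one of two entities, giving an entity F1 of 0.5. Computing this per target language is what shows whether the transfer actually generalizes.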
ML System Design
hard
End-to-End ML Pipeline with Feature Store
Build a complete ML pipeline: feature engineering with a feature store, model training with experiment tracking, A/B test deployment, and monitoring with drift detection.
Feature store (Feast/Tecton)
Experiment tracking (MLflow/W&B)
Model serving with latency SLAs
Data drift detection
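One widely used drift statistic is the Population Stability Index (PSI), which compares the binned distribution of a feature in live traffic against the training data. A minimal sketch, assuming a numeric feature, non-empty samples, and bin edges taken from the reference sample's quantiles; the small `eps` floor avoids division by zero for empty bins.

```python
import bisect
import math
from typing import List, Sequence


def psi(expected: Sequence[float], actual: Sequence[float], bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `actual` (live) against `expected` (reference).

    Bin edges are the reference sample's quantiles, so each reference bin holds
    roughly 1/bins of the data.
    """
    ref = sorted(expected)
    edges = [ref[int(len(ref) * i / bins)] for i in range(1, bins)]

    def fractions(sample: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            counts[bisect.bisect_right(edges, x)] += 1
        # Floor empty bins at eps so the log term stays finite.
        return [max(c / len(sample), eps) for c in counts]

    e, a = fractions(expected), fractions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))
```

A common rule of thumb treats PSI below 0.1 as stable and above 0.25 as significant drift, but thresholds should be calibrated per feature before they trigger alerts or retraining.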
ML System Design
expert
Distributed Training Dashboard with Profiling
Build a training orchestration tool that profiles GPU utilization, communication overhead, and model FLOPs utilization (MFU) across different parallelism strategies (data, tensor, and pipeline parallelism).
Distributed training
GPU profiling
Parallelism strategies
Performance optimization
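Model FLOPs utilization (MFU) compares the FLOPs a training run usefully performs with the hardware's peak. A minimal sketch, assuming a dense transformer and the common approximation of about 6 FLOPs per parameter per training token (forward plus backward), which ignores attention FLOPs.

```python
def mfu(n_params: float, tokens_per_sec: float, n_gpus: int, peak_flops_per_gpu: float) -> float:
    """Model FLOPs utilization for dense-transformer training.

    Uses the ~6 * N FLOPs-per-token estimate (about 2N forward + 4N backward)
    and ignores attention FLOPs. tokens_per_sec is the global throughput
    across all GPUs.
    """
    achieved_flops = 6 * n_params * tokens_per_sec
    return achieved_flops / (n_gpus * peak_flops_per_gpu)
```

With hypothetical figures of a 7B-parameter model processing 24,000 tokens/s on 8 GPUs whose BF16 dense peak is 312 TFLOPS (the A100 figure), MFU comes out at about 0.40. Because the formula counts only model FLOPs, activation recomputation does not raise it, which makes MFU a fair basis for comparing data, tensor, and pipeline parallel layouts of the same model.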
AI Engineering
hard
AI-Powered Quiz Platform with Adaptive Difficulty
Build a production quiz platform that uses LLMs to generate questions, evaluate answers, and adapt difficulty in real time. Implement semantic caching and model routing to keep costs under $0.01 per request.
LLM API integration with cost optimization
Semantic caching (embeddings + similarity search)
Adaptive difficulty algorithms
FastAPI + PostgreSQL production backend
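Semantic caching returns a stored response when a new prompt is close in embedding space to one already answered, skipping the LLM call entirely. A minimal sketch, assuming a hypothetical `embed` function and a linear scan; with PostgreSQL already in the stack, an extension such as pgvector could replace the scan. The 0.92 threshold is an assumption to tune against false hits.

```python
import math
from typing import Callable, List, Optional, Tuple

Vector = List[float]


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


class SemanticCache:
    """Serve a cached response when a prompt's embedding is close to a stored one."""

    def __init__(self, embed: Callable[[str], Vector], threshold: float = 0.92):
        self.embed = embed  # hypothetical embedding function: str -> vector
        self.threshold = threshold
        self.entries: List[Tuple[Vector, str]] = []

    def get(self, prompt: str) -> Optional[str]:
        q = self.embed(prompt)
        best = max(self.entries, key=lambda e: cosine(q, e[0]), default=None)
        if best is not None and cosine(q, best[0]) >= self.threshold:
            return best[1]
        return None  # cache miss: caller invokes the LLM, then calls put()

    def put(self, prompt: str, response: str) -> None:
        self.entries.append((self.embed(prompt), response))
```

A false hit serves the wrong answer, so for grading a learner's answer (as opposed to generating questions) a stricter threshold, or no caching at all, may be the safer choice.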
AI Engineering
medium
Real Estate Price Predictor with Full ML Pipeline
End-to-end ML project: data collection, feature engineering, model selection (linear regression through gradient boosting), hyperparameter tuning, and deployment with a REST API.
Feature engineering and selection
Model comparison (linear, tree-based, ensemble)
Cross-validation and hyperparameter tuning
Model deployment with FastAPI
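The cross-validation step can be sketched in plain Python: shuffle row indices, split them into k folds, and report the mean and spread of validation RMSE. `kfold_indices`, `cv_rmse`, and the mean-price baseline are hypothetical helpers; in practice scikit-learn's `KFold` and `cross_val_score` cover this.

```python
import random
import statistics
from typing import Callable, Iterator, List, Sequence, Tuple


def kfold_indices(n: int, k: int = 5, seed: int = 0) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_indices, val_indices) for each of k shuffled folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, folds[i]


def cv_rmse(fit_predict: Callable, X: Sequence, y: Sequence[float], k: int = 5) -> Tuple[float, float]:
    """Mean and standard deviation of validation RMSE across folds (k >= 2)."""
    scores = []
    for train, val in kfold_indices(len(y), k):
        preds = fit_predict([X[i] for i in train], [y[i] for i in train], [X[i] for i in val])
        mse = statistics.fmean((p - y[i]) ** 2 for p, i in zip(preds, val))
        scores.append(mse ** 0.5)
    return statistics.fmean(scores), statistics.stdev(scores)


def mean_baseline(X_train, y_train, X_val):
    """Predict the training-set mean price for every validation row."""
    m = statistics.fmean(y_train)
    return [m] * len(X_val)
```

Starting from the mean-price baseline gives every later model, from linear regression to gradient boosting, a floor to beat. If listings are ordered in time, a time-based split is a more honest test of performance on future sales.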
AI Agents
hard
Enterprise Customer Support Agent with Memory
Build a multi-turn customer support agent with tiered memory (working, short-term, long-term), tool integration (order lookup, refund processing), and conversation handoff to humans when confidence is low.
Tiered memory architecture
Tool integration with error handling
Human-in-the-loop escalation
Conversation analytics dashboard
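The tiered memory and the escalation rule can be sketched as follows. The tier boundaries, `working_size`, and the thresholds are assumptions, and the confidence score would have to come from a concrete source (for example a separate classifier or the model's own self-assessment).

```python
from collections import deque
from typing import Any, Dict, List


class TieredMemory:
    """Hypothetical three-tier memory for a support agent.

    working:    last few turns, passed verbatim to the model.
    short_term: turns evicted from working memory this session (to summarize).
    long_term:  durable per-customer facts persisted across sessions.
    """

    def __init__(self, working_size: int = 6):
        self.working: deque = deque(maxlen=working_size)
        self.short_term: List[str] = []
        self.long_term: Dict[str, Dict[str, Any]] = {}

    def add_turn(self, turn: str) -> None:
        if len(self.working) == self.working.maxlen:
            self.short_term.append(self.working[0])  # oldest turn is about to be evicted
        self.working.append(turn)

    def remember(self, customer_id: str, key: str, value: Any) -> None:
        self.long_term.setdefault(customer_id, {})[key] = value


def should_escalate(confidence: float, failed_tool_calls: int,
                    threshold: float = 0.6, max_failures: int = 2) -> bool:
    """Hand off to a human when the agent is unsure or tools keep failing."""
    return confidence < threshold or failed_tool_calls >= max_failures
```

Counting repeated tool failures alongside confidence catches a common failure mode in which the model sounds confident while the order lookup or refund API is down.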
AI Agents
expert
Self-Healing Multi-Agent Pipeline with Observability
Build a multi-agent system (researcher, writer, reviewer) with self-healing: automatic retry, fallback strategies, anomaly detection, and a real-time observability dashboard showing agent health and cost.
Multi-agent orchestration
Self-healing with circuit breakers
Cost tracking and budget governance
Distributed tracing and observability
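A circuit breaker stops calling an agent or API that keeps failing and routes to a fallback instead, then lets a trial call through after a cool-down. A minimal single-threaded sketch; the thresholds are assumptions, and the injectable `clock` exists only to make the behavior testable.

```python
import time
from typing import Callable, Optional


class CircuitBreaker:
    """Closed -> open after `max_failures` consecutive failures. After
    `reset_timeout` seconds the breaker is half-open: the next call is a trial,
    and success closes the circuit while failure re-opens it."""

    def __init__(self, max_failures: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.max_failures = max_failures
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at >= self.reset_timeout:
            return "half_open"
        return "open"

    def call(self, fn: Callable, *args, fallback: Optional[Callable] = None, **kwargs):
        if self.state == "open":
            if fallback is None:
                raise RuntimeError("circuit open")
            return fallback()
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half_open" or self.failures >= self.max_failures:
                self.opened_at = self.clock()  # trip (or re-trip) the breaker
            if fallback is None:
                raise
            return fallback()
        self.failures = 0
        self.opened_at = None
        return result
```

In the researcher/writer/reviewer pipeline, the fallback might be a cheaper model or a cached result, and each state transition is worth emitting as a trace event so the dashboard can show agent health over time.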