LLM Agents hard
Multi-Tool RAG Agent with Context Management
Build an LLM agent that orchestrates web search, code execution, and database queries. Implement context compaction to prevent quality degradation over long conversations.
ReAct loop implementation Tool orchestration Context window management Streaming responses
LLM Agents hard
Guardrailed AI Chat with Defense-in-Depth
Build a domain-specific chatbot with layered safety: input classification, output filtering, and structured generation constraints. Evaluate against adversarial prompt injection attacks.
Prompt injection defense Input/output classifiers Structured generation (JSON mode, grammar constraints) Red-teaming methodology
Reinforcement Learning expert
RLHF Pipeline with Process Rewards
Implement a complete RLHF pipeline: supervised fine-tuning, reward model training with process rewards (per-step), and PPO optimization. Compare outcome-based vs process-based reward models.
RLHF implementation Reward model training PPO with KL penalty Process vs outcome rewards
Reinforcement Learning hard
Sim-to-Real Robot Control with Domain Randomization
Train an RL agent in simulation (MuJoCo/Isaac Gym) with domain randomization, then demonstrate transfer to a real or more realistic environment. Track sim-to-real gap metrics.
Sim-to-real transfer Domain randomization SAC/PPO implementation Robotics control
Computer Vision hard
Multi-Scale Defect Detection with SAHI
Build an industrial defect detection system that handles tiny defects using SAHI (tiled inference), FPN-based detection, and a custom evaluation pipeline that tracks detection at each scale.
Object detection (YOLOv8/RT-DETR) SAHI tiled inference Multi-scale evaluation Production deployment with TensorRT
Computer Vision expert
Vision-Language Model Fine-Tuning Pipeline
Fine-tune a vision-language model (LLaVA-style) for a specific domain (e.g., medical imaging, satellite imagery). Build the full pipeline from data curation to evaluation.
Multimodal model fine-tuning Vision-language alignment Domain-specific data curation Evaluation methodology
NLP hard
Production RAG System with Evaluation Pipeline
Build a RAG system with re-ranking, hybrid search (dense + sparse), and a comprehensive evaluation pipeline measuring faithfulness, relevance, and answer correctness.
Hybrid retrieval (BM25 + dense) Cross-encoder re-ranking RAG evaluation (RAGAS framework) Production chunking strategies
NLP hard
Multilingual NER with Zero-Shot Transfer
Build a named entity recognition system that trains on English data and transfers to 5+ languages without target-language training data. Implement entity-level evaluation to measure true generalization.
Cross-lingual NLP Zero-shot transfer Entity-level evaluation methodology Tokenizer-aware data processing
ML System Design hard
End-to-End ML Pipeline with Feature Store
Build a complete ML pipeline: feature engineering with a feature store, model training with experiment tracking, A/B test deployment, and monitoring with drift detection.
Feature store (Feast/Tecton) Experiment tracking (MLflow/W&B) Model serving with latency SLAs Data drift detection
ML System Design expert
Distributed Training Dashboard with Profiling
Build a training orchestration tool that profiles GPU utilization, communication overhead, and MFU across different parallelism strategies (data parallel, tensor parallel, pipeline parallel).
Distributed training GPU profiling Parallelism strategies Performance optimization
AI Engineering hard
AI-Powered Quiz Platform with Adaptive Difficulty
Build a production quiz platform that uses LLMs to generate questions, evaluate answers, and adapt difficulty in real-time. Implement semantic caching and model routing to keep costs under $0.01/request.
LLM API integration with cost optimization Semantic caching (embeddings + similarity search) Adaptive difficulty algorithms FastAPI + PostgreSQL production backend
AI Engineering medium
Real Estate Price Predictor with Full ML Pipeline
End-to-end ML project: data collection, feature engineering, model selection (linear regression through gradient boosting), hyperparameter tuning, and deployment with a REST API.
Feature engineering and selection Model comparison (linear, tree-based, ensemble) Cross-validation and hyperparameter tuning Model deployment with FastAPI
AI Agents hard
Enterprise Customer Support Agent with Memory
Build a multi-turn customer support agent with tiered memory (working, short-term, long-term), tool integration (order lookup, refund processing), and conversation handoff to humans when confidence is low.
Tiered memory architecture Tool integration with error handling Human-in-the-loop escalation Conversation analytics dashboard
AI Agents expert
Self-Healing Multi-Agent Pipeline with Observability
Build a multi-agent system (researcher, writer, reviewer) with self-healing: automatic retry, fallback strategies, anomaly detection, and a real-time observability dashboard showing agent health and cost.
Multi-agent orchestration Self-healing with circuit breakers Cost tracking and budget governance Distributed tracing and observability