Unscaled features with vastly different ranges (e.g., age 0-100 vs salary 10K-500K) create elongated contours in the loss landscape. Gradient descent oscillates along the steep dimension and crawls along the flat one.
Apply feature scaling: StandardScaler (zero mean, unit variance) for roughly normally distributed features, MinMaxScaler for features with known bounds, RobustScaler for outlier-heavy data. Always fit the scaler on the training data only, then transform both train and test sets with those statistics -- fitting on test data leaks information. For tree-based models, scaling is unnecessary -- they split on thresholds, not distances.
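A minimal sklearn sketch of the fit-on-train, transform-on-test pattern (the toy age/salary numbers are illustrative):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data with vastly different ranges: age (0-100) vs salary (10K-500K).
X_train = np.array([[25, 40_000], [47, 120_000], [62, 300_000], [33, 65_000]], dtype=float)
X_test = np.array([[29, 55_000]], dtype=float)

scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)  # fit ONLY on training data
X_test_scaled = scaler.transform(X_test)        # reuse the train statistics

print(X_train_scaled.mean(axis=0))  # ~[0, 0]
print(X_train_scaled.std(axis=0))   # ~[1, 1]
```

Note that `X_test_scaled` need not have zero mean or unit variance; it is standardized with the training set's mean and variance, which is exactly the point.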
Prompt engineering has diminishing returns for structured tasks with clear labels. Complex prompts add latency, cost, and brittleness. Fine-tuning on even 500 labeled examples usually outperforms prompting for classification.
Decision framework: Use prompting for exploratory tasks, few-shot prototyping, and open-ended generation. Fine-tune when you have labeled data (>200 examples), need consistent structured output, or cost/latency matters. Consider distillation: use a large model to label data, then fine-tune a small model on those labels.
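The decision framework can be roughly encoded as a helper function; the `choose_approach` name, its parameters, and the branch ordering are hypothetical, with only the >200-example threshold taken from the text:

```python
def choose_approach(n_labeled: int, needs_structured_output: bool = False,
                    cost_or_latency_sensitive: bool = False) -> str:
    """Hypothetical helper encoding the framework above (illustrative only)."""
    if n_labeled > 200 and (needs_structured_output or cost_or_latency_sensitive):
        return "fine-tune"
    if cost_or_latency_sensitive:
        # Not enough labels: have a large model label data, then fine-tune small.
        return "distill"
    return "prompt"

print(choose_approach(500, needs_structured_output=True))   # fine-tune
print(choose_approach(50, cost_or_latency_sensitive=True))  # distill
print(choose_approach(50))                                  # prompt
```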
With 0.3% fraud rate, a model that predicts 'not fraud' for everything achieves 99.7% accuracy. Accuracy is a meaningless metric for imbalanced classification.
Switch metrics: use precision-recall AUC, F1-score, or the Matthews correlation coefficient. Apply class rebalancing: SMOTE for oversampling, or class weights in the loss function. Use anomaly-detection approaches (e.g., Isolation Forest) as a complement. Set business-relevant thresholds by weighing the cost of a false negative against that of a false positive.
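The accuracy trap and the metric fix can be seen in a few lines with sklearn (synthetic labels matching the 0.3% fraud rate):

```python
from sklearn.metrics import accuracy_score, f1_score, matthews_corrcoef

# 0.3% fraud rate: 3 fraud cases in 1,000 transactions.
y_true = [0] * 997 + [1] * 3
y_pred = [0] * 1000          # degenerate model: always predicts "not fraud"

acc = accuracy_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred, zero_division=0)
mcc = matthews_corrcoef(y_true, y_pred)
print(acc)   # 0.997 -- looks excellent
print(f1)    # 0.0   -- reveals the model catches zero fraud
print(mcc)   # 0.0
```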
Deep decision trees memorize training data by creating hyper-specific rules for individual examples. Increasing depth makes this worse -- it's adding complexity to an already overfit model.
Constrain the tree: limit max_depth (3-10 typical), set min_samples_leaf (5-20), use min_samples_split. Better yet, use ensemble methods: Random Forest (reduces variance via bagging) or Gradient Boosting (reduces bias via sequential learning). Use cross-validation to tune hyperparameters.
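A short sklearn sketch comparing an unconstrained tree, a constrained tree, and a bagged ensemble under cross-validation (the synthetic dataset and specific hyperparameter values are illustrative):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, n_features=20, random_state=0)

models = {
    "deep tree": DecisionTreeClassifier(random_state=0),  # grows until it memorizes
    "constrained tree": DecisionTreeClassifier(max_depth=5, min_samples_leaf=10,
                                               random_state=0),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=0),
}

results = {}
for name, model in models.items():
    results[name] = cross_val_score(model, X, y, cv=5).mean()
    print(f"{name}: {results[name]:.3f}")
```

On most runs the constrained tree and the forest generalize better than the unconstrained tree, illustrating the variance reduction the text describes.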
Naively switching to a cheaper model drops quality. The real issue is that most requests don't need the full power of GPT-4 -- many are cacheable, classifiable by difficulty tier, or can be handled by a fine-tuned smaller model.
Layer the solution: (1) Semantic cache -- identical/similar questions return cached responses (50-70% cache hit rate typical). (2) Model routing -- easy questions to GPT-3.5/Haiku, hard ones to GPT-4/Opus. (3) Fine-tune a small model on your specific domain using GPT-4 outputs as training data. (4) Batch non-urgent requests. Target roughly $0.01/request, which works out to about $3K/month at 10K requests/day.
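The cache layer can be sketched with a toy stand-in: real semantic caches compare embedding vectors, but here stdlib string similarity (`difflib`) plays that role, and the `SimilarityCache` class, its threshold, and the sample Q&A are all illustrative assumptions:

```python
from __future__ import annotations

from difflib import SequenceMatcher


class SimilarityCache:
    """Toy semantic-cache sketch: difflib similarity stands in for
    embedding cosine similarity (an assumption for illustration)."""

    def __init__(self, threshold: float = 0.9):
        self.threshold = threshold
        self._store: dict[str, str] = {}

    def get(self, question: str) -> str | None:
        q = question.strip().lower()
        for cached_q, answer in self._store.items():
            if SequenceMatcher(None, q, cached_q).ratio() >= self.threshold:
                return answer  # cache hit: skip the expensive model call
        return None  # cache miss: route to a model tier

    def put(self, question: str, answer: str) -> None:
        self._store[question.strip().lower()] = answer


cache = SimilarityCache()
cache.put("What is your refund policy?", "Refunds within 30 days.")
print(cache.get("what is your refund policy"))    # hit despite case/punct drift
print(cache.get("How do I reset my password?"))   # miss -> fall through to routing
```

On a miss, the caller would fall through to step (2), the difficulty-tier router, and store the model's answer back into the cache.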
AI Engineering — 5 questions