The Data Scientist Wannabe

"Jupyter notebooks as far as the eye can see."

Vibe: Talks about "the model" like it's sentient. Has never deployed one.

Python, pandas, and a folder full of .ipynb files that tell the story of three abandoned Kaggle competitions and a correlation you swear is significant. The model is never quite prod-ready.

Typical stack

Python, Jupyter, pandas, scikit-learn, matplotlib, Kaggle

Known examples

Early fast.ai students: Jeremy Howard made this archetype legitimate, and the students who pushed past their notebooks became real ML engineers

The Kaggle leaderboard grinder: top 1% on toy datasets, 0 production deployments

Signature traits

→ GitHub littered with .ipynb files that only run locally
→ Has trained at least one model that "achieves 94% accuracy" on the training set
→ README includes a confusion matrix screenshot
→ Knows scikit-learn better than the Python standard library

Strengths

✓ Comfortable with data manipulation and statistical thinking
✓ Can explore and visualize datasets quickly
✓ Understands the ML pipeline end-to-end in theory

Watch out for

⚑ Notebooks ≠ software — nothing is production-deployable
⚑ Data leakage and overfitting are politely ignored
⚑ Software engineering fundamentals often lacking
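
To see why a training-set score proves so little, here is a minimal sketch using only the standard library. The dataset is pure noise and the "model" is a 1-nearest-neighbour memorizer; both are illustrative assumptions, not anything from a real project. Training accuracy is perfect by construction, while held-out accuracy should hover around chance.

```python
import random


def make_noise_dataset(n, n_features, rng):
    """Random features with random labels: there is nothing real to learn."""
    X = [[rng.random() for _ in range(n_features)] for _ in range(n)]
    y = [rng.randint(0, 1) for _ in range(n)]
    return X, y


def predict_1nn(train_X, train_y, point):
    """1-nearest-neighbour: return the label of the closest training row."""
    def sq_dist(row):
        return sum((a - b) ** 2 for a, b in zip(row, point))

    best = min(range(len(train_X)), key=lambda i: sq_dist(train_X[i]))
    return train_y[best]


def accuracy(train_X, train_y, eval_X, eval_y):
    """Fraction of eval rows whose predicted label matches the true label."""
    hits = sum(
        predict_1nn(train_X, train_y, x) == t for x, t in zip(eval_X, eval_y)
    )
    return hits / len(eval_y)


rng = random.Random(0)
X, y = make_noise_dataset(400, 5, rng)

# Split BEFORE evaluating: the last 200 rows are never seen during "training".
train_X, train_y = X[:200], y[:200]
test_X, test_y = X[200:], y[200:]

train_acc = accuracy(train_X, train_y, train_X, train_y)
test_acc = accuracy(train_X, train_y, test_X, test_y)

print(f"train accuracy:    {train_acc:.2f}")  # memorized, so it looks great
print(f"held-out accuracy: {test_acc:.2f}")   # the number that actually matters
```

The gap between the two printed numbers is the part the confusion matrix screenshot tends to leave out.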

How to level up

Deploy something. Take your best notebook, rewrite it as a proper Python module with tests, and serve it via a simple API. The gap between "notebook works" and "model in production" is where real ML engineers live.
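
As a rough sketch of what that rewrite might look like, the module below wraps a prediction function, a test, and a tiny JSON endpoint using only the standard library. Everything named here is hypothetical: `WEIGHTS` and `BIAS` stand in for whatever the notebook trained, and `predict`, `handle_predict`, `PredictHandler`, and `serve` are illustrative names. A real project would more likely load a saved model and use a proper web framework.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical parameters, standing in for the model exported from the notebook.
WEIGHTS = [0.8, -0.5]
BIAS = 0.1


def predict(features):
    """Return 1 if the linear score is positive, else 0."""
    if len(features) != len(WEIGHTS):
        raise ValueError(f"expected {len(WEIGHTS)} features, got {len(features)}")
    score = BIAS + sum(w * x for w, x in zip(WEIGHTS, features))
    return 1 if score > 0 else 0


def handle_predict(body):
    """Pure request logic: JSON bytes in, (status code, JSON bytes) out."""
    try:
        features = json.loads(body)["features"]
        payload = {"prediction": predict([float(x) for x in features])}
        return 200, json.dumps(payload).encode()
    except (ValueError, KeyError, TypeError) as exc:
        # Malformed JSON, missing key, or wrong feature count: client error.
        return 400, json.dumps({"error": str(exc)}).encode()


class PredictHandler(BaseHTTPRequestHandler):
    """Thin HTTP wrapper; all real logic lives in handle_predict."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        status, out = handle_predict(self.rfile.read(length))
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.end_headers()
        self.wfile.write(out)


def serve(port=8000):
    """Blocks forever; call it explicitly when you want the server running."""
    HTTPServer(("127.0.0.1", port), PredictHandler).serve_forever()


def test_predict():
    assert predict([1.0, 0.0]) == 1  # score 0.1 + 0.8 = 0.9
    assert predict([0.0, 1.0]) == 0  # score 0.1 - 0.5 = -0.4
    assert handle_predict(b"not json")[0] == 400
```

Keeping `handle_predict` free of any HTTP machinery is a deliberate choice: the test can exercise the request logic without starting a server, which is exactly the kind of thing a notebook cell never forces you to think about. Calling `serve()` and POSTing `{"features": [1, 0]}` to port 8000 would then exercise the full path.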
