Pipeline
ISSUE #214 IS LIVE

The AutoML stack, benchmarked
and delivered every Tuesday.

From neural architecture search to no-code deployment, Pipeline dissects what moved, what shipped, and what's worth your attention before the standup. For engineers who build with data for a living.

Read This Week's Issue · View Benchmarks
pipeline-dashboard — kernel connected
live
ISSUE #214 · FEB 25, 2026

AutoKeras 2.1 vs. Optuna 3.4: The Latency Gap That Matters

The headline numbers look close. AutoKeras 2.1 clocks in at 2.3s median inference on the Kaggle tabular benchmark suite; Optuna-tuned XGBoost lands at 2.1s. But the distribution tells a different story — AutoKeras' 95th-percentile latency spikes to 11.7s, nearly 3× Optuna's 4.2s. If you're serving real-time predictions, that tail is your SLA breach waiting to happen.
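
The median-versus-tail comparison above can be checked on your own latency logs. Here is a minimal sketch using only the Python standard library; the function names `latency_summary` and `breaches_sla` and the P95 budget parameter are illustrative assumptions, not part of any benchmarked framework or of Pipeline's harness.

```python
import statistics


def latency_summary(samples_s):
    """Return the median (P50) and 95th percentile (P95) of latency samples in seconds."""
    if len(samples_s) < 2:
        raise ValueError("need at least two latency samples")
    # quantiles(n=100) returns the 99 cut points P1..P99, so index 94 is P95.
    cuts = statistics.quantiles(samples_s, n=100, method="inclusive")
    return {"p50": statistics.median(samples_s), "p95": cuts[94]}


def breaches_sla(samples_s, p95_budget_s):
    """True when the observed P95 exceeds the (hypothetical) P95 latency budget."""
    return latency_summary(samples_s)["p95"] > p95_budget_s
```

Two services can report nearly identical `p50` values while only one of them passes `breaches_sla`, which is why the tail is worth tracking separately from the median.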

Accuracy Delta: +1.4% (AutoKeras vs baseline)
P95 Latency: 11.7s (AutoKeras spike)
Setup Time: 12 min (vs 4.5h hand-tuning)
Memory: 2.3× (overhead vs XGBoost)
TABULAR CLASSIFICATION · KAGGLE SUITE · FEB 2026

AutoML Platform Rankings

Updated weekly
#   Framework      F1 Score ↓  P50 Latency  Deploy Complexity  Adoption %
01  H2O AutoML     0.924       1.8s         Medium             68%
02  AutoKeras 2.1  0.918       2.3s         Low                74%
03  FLAML          0.912       1.4s         High               51%
04  AutoGluon 1.2  0.907       3.1s         Low                62%
05  Optuna 3.4     0.899       2.1s         High               83%
TOP DISCUSSIONS · THIS WEEK

Community Pulse

1,247 active

Is FLAML actually production-ready or still a research toy?

We've been running FLAML in staging for 6 weeks. The good: setup is genuinely 15 minutes. The bad: it silently ignores feature interaction constraints you set in the config...

Priya Venkataraman · ML Lead · Stripe · 34 replies
FLAML · Production

AutoGluon 1.2 multimodal — anyone running it on < 16GB VRAM?

The new multimodal pipeline is impressive on paper. Tested on an A10G and it fits, but only if you disable the ensemble stacking layer. Anyone found a workaround that...

Marcus Okonkwo · Senior DS · Cohere · 21 replies
AutoGluon · GPU

NAS vs. HPO in 2026 — are they converging or diverging?

After the DARTS v3 paper last month, I'm not sure where the line is anymore. The search space overlaps significantly with HPO when you account for...

Sofia Lindqvist · Research Eng · HuggingFace · 58 replies
NAS · HPO · Research
AutoKeras 2.1 released · P95 latency regression flagged
H2O 3.46 · AutoML accuracy record on Kaggle tabular suite
FLAML 2.3 · LightGBM backend updated · 12% throughput gain
DARTS v3 paper · NAS convergence in 4 GPU-hours
AutoGluon 1.2 multimodal · requires CUDA 12.1+
Optuna 3.4 · TPE sampler rewrite · 23% faster wall-clock
PyCaret 3.3 · Polars backend (beta) · 3× faster preprocessing
TPOT 0.12 · maintenance mode · no new features planned
02 / FRAMEWORK BENCHMARKS

The table your team will screenshot.

Kaggle tabular classification suite · 15 datasets · median of 5 runs each. Click any column header to re-rank. Updated weekly.

Last updated Feb 25, 2026
pipeline_benchmark_v2.4.csv — 7 frameworks · 8 metrics
#   Framework         F1 Score  P50 Lat.  P95 Lat.  Memory ×  Setup   Deploy  Adoption  Verdict
01  H2O AutoML v3.46  0.924     1.8s      5.2s      1.4×      8 min   Medium  68%       Best accuracy
02  AutoKeras v2.1    0.918     2.3s      11.7s     2.3×      12 min  Low     74%       Watch P95
03  FLAML v2.3        0.912     1.4s      3.9s      1.1×      15 min  High    51%       Fastest
04  AutoGluon v1.2    0.907     3.1s      8.4s      3.1×      6 min   Low     62%       Best multimodal
05  Optuna v3.4       0.899     2.1s      4.2s      0.9×      45 min  High    83%       Most adopted
06  TPOT v0.12        0.891     4.7s      14.1s     1.8×      20 min  Medium  39%       Declining
07  PyCaret v3.3      0.884     2.8s      6.3s      1.6×      10 min  Low     71%       Best DX
Methodology: 15 Kaggle tabular datasets · 5 runs each · AWS c5.2xlarge · Python 3.12
Get Full Report in Your Inbox →
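
The aggregation in the methodology (a median over the runs for each dataset, then one number per framework) could look roughly like the sketch below. How the per-dataset medians are combined is not stated in the methodology; averaging them is an assumption made here for illustration, and `aggregate_scores` and `rank` are hypothetical helper names.

```python
from statistics import mean, median


def aggregate_scores(runs):
    """Collapse {framework: {dataset: [score per run]}} to {framework: score}.

    Takes the median of the runs for each dataset, then averages the
    per-dataset medians (the averaging step is an assumption).
    """
    result = {}
    for framework, per_dataset in runs.items():
        per_dataset_medians = [median(scores) for scores in per_dataset.values()]
        result[framework] = mean(per_dataset_medians)
    return result


def rank(scores, descending=True):
    """Return framework names ordered by score, best first by default."""
    return sorted(scores, key=scores.get, reverse=descending)
```

Using the median per dataset keeps a single unlucky run from moving a framework's position. For lower-is-better columns such as latency, pass `descending=False` to `rank`.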
03 / PAST EDITIONS

Depth is the only thing we don't compress.

Every issue goes past the announcement. We run the code, check the math, and tell you what the benchmark paper didn't.

Browse all 214 issues
#211 · 12 min
NAS · DARTS · Deep Learning · Feb 4, 2026

Neural Architecture Search Is Finally Boring (And That's Good)

DARTS v3 landed quietly last month. No blog post, no Twitter thread — just a commit and a paper. We ran it against EfficientNetV2-S on ImageNet and the story is more nuanced than the abstract suggests.

Read issue
#208 · 9 min
Deployment · MLOps · Benchmarks · Jan 14, 2026

The No-Code Model Deployment Trap

Three platforms promise one-click deployment. We deployed the same XGBoost model to all three and measured what "one click" actually costs — in latency, in money, and in the debugging hours you'll never get back.

Read issue
#204 · 15 min
AutoGluon · Production · Case Study · Dec 17, 2025

AutoGluon 1.2's Multimodal Pipeline: 6-Week Production Report

We shipped it. Here's what broke, what held, and the one config flag that makes or breaks inference performance at scale. Spoiler: it's not the one in the docs.

Read issue
#199 · 11 min
H2O · Enterprise · Procurement · Nov 12, 2025

H2O vs. AutoGluon: The Enterprise Procurement Reality

Forget the benchmarks for a second. When you're buying for 200 data scientists, the question isn't F1 score — it's support SLAs, SSO, audit logging, and whether the vendor picks up the phone.

Read issue
FREE · NO ACCOUNT REQUIRED

Read this week's issue before you subscribe.

Issue #214 is live. AutoKeras 2.1 latency breakdown, the FLAML production report, and three NAS papers worth your time. No paywall, no signup.

Read This Week's Issue
04 / COMMUNITY PROOF

Read by the people who build the stack.

Not a general data newsletter. A specific one — for the engineers who know the difference between AutoML and automated ML, and care about which one it is.

28.4K
Subscribers
67%
Open Rate
4.1 min
Avg. Read Time
214
Issues Published
READ BY ENGINEERS AT
Databricks
Scale AI
Hugging Face
Cohere
Mistral AI
Weights & Biases
Replit
Snowflake
Palantir
Modal Labs
Together AI
Anyscale
WHAT READERS SAY
Pipeline is the only newsletter I read the same morning it lands. The benchmark methodology is rigorous enough that I've cited it in internal RFC docs without embarrassment.
Arjun Krishnaswamy
Staff ML Engineer · Databricks
EVERY TUESDAY · FREE FOREVER

The kernel is connected. Are you?

Join 28,400 ML engineers and data science leads. One email, every Tuesday. Unsubscribe in one click, no questions asked.