ISSUE #214 IS LIVE

Published Feb 25, 2026

The AutoML stack, benchmarked
and delivered every Tuesday.

Neural architecture search to no-code deployment — Pipeline dissects what moved, what shipped, and what's worth your attention before the standup. For engineers who build with data for a living.

28,400 engineers already subscribed

ISSUE #214 · FEB 25, 2026

AutoKeras 2.1 vs. Optuna 3.4: The Latency Gap That Matters

10 min read

The headline numbers look close. AutoKeras 2.1 clocks a median inference latency of 2.3s on the Kaggle tabular benchmark suite; Optuna-tuned XGBoost lands at 2.1s. But the distribution tells a different story: AutoKeras' 95th-percentile latency spikes to 11.7s, roughly 2.8× Optuna's 4.2s. If you're serving real-time predictions, that tail is your SLA breach waiting to happen.
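
To make the median-versus-tail point concrete, here is a minimal sketch of a P50/P95 check against a latency budget. The nearest-rank percentile method, the helper names, the sample latencies, and the 5-second SLA are all illustrative assumptions, not part of the benchmark.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must be non-empty")
    ordered = sorted(samples)
    # Integer-friendly arithmetic avoids float surprises such as 0.95 * 20 != 19.
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def breaches_sla(samples, sla_seconds, pct=95):
    """True when the chosen percentile exceeds the latency budget."""
    return percentile(samples, pct) > sla_seconds


# Hypothetical request latencies in seconds (made up, not benchmark data):
# most requests are fast, a few are very slow.
latencies = [2.0] * 18 + [9.0, 12.0]

p50 = percentile(latencies, 50)
p95 = percentile(latencies, 95)
print(f"P50={p50}s P95={p95}s breach={breaches_sla(latencies, 5.0)}")
```

With these made-up samples the median sits comfortably inside a 5-second budget while the P95 does not, which is the same shape of gap the AutoKeras numbers describe.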

Accuracy Delta: +1.4% (AutoKeras vs baseline)
P95 Latency: 11.7s (AutoKeras spike)
Setup Time: 12 min (vs 4.5h hand-tuning)
Memory: 2.3× (overhead vs XGBoost)

TABULAR CLASSIFICATION · KAGGLE SUITE · FEB 2026

AutoML Platform Rankings

Updated weekly

#	Framework	F1 Score ↓	P50 Latency	Deploy Complexity	Adoption %
01	H2O AutoML	0.924	1.8s	Medium	68%
02	AutoKeras 2.1	0.918	2.3s	Low	74%
03	FLAML	0.912	1.4s	High	51%
04	AutoGluon 1.2	0.907	3.1s	Low	62%
05	Optuna 3.4	0.899	2.1s	High	83%

TOP DISCUSSIONS · THIS WEEK

Community Pulse

1,247 active

Is FLAML actually production-ready or still a research toy?

We've been running FLAML in staging for 6 weeks. The good: setup is genuinely 15 minutes. The bad: it silently ignores feature interaction constraints you set in the config...

Priya Venkataraman · ML Lead · Stripe · 34 replies

FLAML · Production

AutoGluon 1.2 multimodal — anyone running it on < 16GB VRAM?

The new multimodal pipeline is impressive on paper. Tested on an A10G and it fits, but only if you disable the ensemble stacking layer. Anyone found a workaround that...

Marcus Okonkwo · Senior DS · Cohere · 21 replies

AutoGluon · GPU

NAS vs. HPO in 2026 — are they converging or diverging?

After the DARTS v3 paper last month, I'm not sure where the line is anymore. The search space overlaps significantly with HPO when you account for...

Sofia Lindqvist · Research Eng · HuggingFace · 58 replies

NAS · HPO · Research

AutoKeras 2.1 released · P95 latency regression flagged
H2O 3.46 · AutoML accuracy record on Kaggle tabular suite
FLAML 2.3 · LightGBM backend updated · 12% throughput gain
DARTS v3 paper · NAS convergence in 4 GPU-hours
AutoGluon 1.2 multimodal · requires CUDA 12.1+
Optuna 3.4 · TPE sampler rewrite · 23% faster wall-clock
PyCaret 3.3 · Polars backend (beta) · 3× faster preprocessing
TPOT 0.12 · maintenance mode · no new features planned

01 / MACRO SIGNALS

The numbers your team will cite in next quarter's planning doc.

Pipeline tracks the metrics that move roadmaps — not the press-release numbers, but the ones that show up in benchmark repos and internal postmortems.

Enterprise AutoML Adoption: 73% of enterprises tested ≥1 AutoML tool in 2025 (+18 pts YoY)
NAS Speedup Over Manual Search: 4.2×, median across DARTS-variant benchmarks, Q1 2026 (vs 2.8× in 2024)
AutoML Market Size 2025: $2.4B, projected $9.7B by 2030 at 32% CAGR
Teams Skipping HPO Entirely: 61% now relying on AutoML defaults for initial deployment (↑ from 38% in 2024)
Median Time-to-First-Model: 14 min, AutoGluon on standard tabular benchmark, 2026 baseline (↓ from 4.5h manual)
Accuracy vs. Hand-Tuned: 89%; AutoML matches or exceeds expert baselines on structured data (11-pt gap on unstructured)
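
The market-size figure above can be sanity-checked with the standard compound-growth formula. This is a small sketch, assuming five compounding years from 2025 to 2030; the function names are illustrative.

```python
def project(value, rate, years):
    """Compound a starting value forward at a fixed annual growth rate."""
    return value * (1 + rate) ** years


def implied_cagr(start, end, years):
    """Annual growth rate that takes start to end over the given number of years."""
    return (end / start) ** (1 / years) - 1


# Figures from the stat above, in billions of dollars.
# Assumption: 2025 -> 2030 is treated as five compounding periods.
projected_2030 = project(2.4, 0.32, 5)
rate_for_9_7 = implied_cagr(2.4, 9.7, 5)

print(f"$2.4B at 32% for 5 years: ${projected_2030:.2f}B")
print(f"CAGR implied by $2.4B -> $9.7B: {rate_for_9_7:.1%}")
```

Under that assumption, $2.4B compounding at 32% lands at roughly $9.6B, and the quoted $9.7B implies a rate just above 32%, so the two published numbers agree to within rounding.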

02 / FRAMEWORK BENCHMARKS

The table your team will screenshot.

Kaggle tabular classification suite · 15 datasets · median of 5 runs each. Updated weekly.

Last updated Feb 25, 2026

pipeline_benchmark_v2.4.csv — 7 frameworks · 8 metrics

#	Framework	F1 Score ↓	P50 Lat.	P95 Lat.	Memory ×	Setup	Deploy	Adoption	Verdict
01	H2O AutoML v3.46	0.924	1.8s	5.2s	1.4×	8 min	Medium	68%	Best accuracy
02	AutoKeras v2.1	0.918	2.3s	11.7s	2.3×	12 min	Low	74%	Watch P95
03	FLAML v2.3	0.912	1.4s	3.9s	1.1×	15 min	High	51%	Fastest
04	AutoGluon v1.2	0.907	3.1s	8.4s	3.1×	6 min	Low	62%	Best multimodal
05	Optuna v3.4	0.899	2.1s	4.2s	0.9×	45 min	High	83%	Most adopted
06	TPOT v0.12	0.891	4.7s	14.1s	1.8×	20 min	Medium	39%	Declining
07	PyCaret v3.3	0.884	2.8s	6.3s	1.6×	10 min	Low	71%	Best DX

Methodology: 15 Kaggle tabular datasets · 5 runs each · AWS c5.2xlarge · Python 3.12
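
The benchmark reports a median of 5 runs per dataset. A minimal sketch of that aggregation follows, assuming the per-dataset medians are then averaged across datasets (the methodology does not say how datasets are combined). The dataset names and scores are made up for illustration.

```python
from statistics import mean, median


def aggregate(runs_by_dataset):
    """Take the median of each dataset's runs, then average those medians.

    Averaging across datasets is an assumption; the published methodology
    only specifies the per-dataset median.
    """
    per_dataset = {name: median(runs) for name, runs in runs_by_dataset.items()}
    return per_dataset, mean(per_dataset.values())


# Hypothetical F1 scores: two datasets, five runs each (not benchmark data).
f1_runs = {
    "dataset_a": [0.90, 0.92, 0.91, 0.95, 0.89],
    "dataset_b": [0.80, 0.82, 0.81, 0.79, 0.85],
}

per_dataset_f1, overall_f1 = aggregate(f1_runs)
print(per_dataset_f1, round(overall_f1, 3))
```

Taking the median per dataset keeps a single outlier run (such as the 0.95 in the first made-up dataset) from shifting the reported score.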

03 / PAST EDITIONS

Depth is the only thing we don't compress.

Every issue goes past the announcement. We run the code, check the math, and tell you what the benchmark paper didn't.

Browse all 214 issues

#211 · 12 min

NAS · DARTS · Deep Learning · Feb 4, 2026

Neural Architecture Search Is Finally Boring (And That's Good)

DARTS v3 landed quietly last month. No blog post, no Twitter thread — just a commit and a paper. We ran it against EfficientNetV2-S on ImageNet and the story is more nuanced than the abstract suggests.

#208 · 9 min

Deployment · MLOps · Benchmarks · Jan 14, 2026

The No-Code Model Deployment Trap

Three platforms promise one-click deployment. We deployed the same XGBoost model to all three and measured what "one click" actually costs — in latency, in money, and in the debugging hours you'll never get back.

#204 · 15 min

AutoGluon · Production · Case Study · Dec 17, 2025

AutoGluon 1.2's Multimodal Pipeline: 6-Week Production Report

We shipped it. Here's what broke, what held, and the one config flag that makes or breaks inference performance at scale. Spoiler: it's not the one in the docs.

#199 · 11 min

H2O · Enterprise · Procurement · Nov 12, 2025

H2O vs. AutoGluon: The Enterprise Procurement Reality

Forget the benchmarks for a second. When you're buying for 200 data scientists, the question isn't F1 score — it's support SLAs, SSO, audit logging, and whether the vendor picks up the phone.

FREE · NO ACCOUNT REQUIRED

Read this week's issue before you subscribe.

Issue #214 is live. AutoKeras 2.1 latency breakdown, the FLAML production report, and three NAS papers worth your time. No paywall, no signup.

04 / COMMUNITY PROOF

Read by the people who build the stack.

Not a general data newsletter. A specific one — for the engineers who know the difference between AutoML and automated ML, and care about which one it is.

Subscribers: 28.4K
Open Rate: 67%
Avg. Read Time: 4.1 min
Issues Published: 214

READ BY ENGINEERS AT

Databricks

Scale AI

Hugging Face

Cohere

Mistral AI

Weights & Biases

Replit

Snowflake

Palantir

Modal Labs

Together AI

Anyscale

WHAT READERS SAY

“Pipeline is the only newsletter I read the same morning it lands. The benchmark methodology is rigorous enough that I've cited it in internal RFC docs without embarrassment.”

Arjun Krishnaswamy

Staff ML Engineer · Databricks

EVERY TUESDAY · FREE FOREVER

The kernel is connected. Are you?

Join 28,400 ML engineers and data science leads. One email, every Tuesday. Unsubscribe in one click, no questions asked.
