The Problem With "One-Shot" AI in Marketing

Most AI-powered marketing tools work the same way:

  1. A marketer provides a brief.
  2. The AI generates one version of the asset — a keyword list, an ad, a strategy brief.
  3. The marketer edits it and ships it.

This is useful, but it leaves enormous value on the table. A human marketer would never ship the first draft of a campaign; they'd run variants, test tone, hunt for long-tail keywords the baseline missed. But "try 100 versions and pick the best" isn't how most AI products are structured — and it certainly isn't how they're priced.

Meanwhile, at the frontier of ML research, there's a completely different pattern emerging.

Karpathy's AutoResearch: Machine-Speed Iteration

In March 2026, Andrej Karpathy released autoresearch — an open-source tool that lets an AI agent autonomously run ML training experiments overnight. The idea is simple and elegant:

  1. Propose — the agent modifies code (train.py).
  2. Train — run a fixed 5-minute experiment.
  3. Evaluate — score the result on a single metric (validation bits-per-byte).
  4. Ratchet — if the score improved, keep the change. Otherwise, revert via git reset.
  5. Repeat — ~12 experiments per hour, ~100 overnight.
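The five steps above reduce to a compact loop. Here is a minimal sketch in TypeScript; `mutate` and `score` are hypothetical stand-ins for the real steps (a code edit and a 5-minute training run scored on validation bits-per-byte):

```typescript
// Minimal sketch of the propose → train → evaluate → ratchet loop.
// `mutate` and `score` are stand-ins for the real steps; higher score = better here.
type State = { config: string; score: number };

function ratchet(
  initial: State,
  mutate: (config: string, step: number) => string,
  score: (config: string) => number,
  steps: number
): State {
  let best = initial;
  for (let i = 0; i < steps; i++) {
    const candidate = mutate(best.config, i); // Propose
    const s = score(candidate);               // Train + Evaluate
    if (s > best.score) {
      best = { config: candidate, score: s }; // Ratchet: keep the improvement
    }                                         // else: discard (the "git reset")
  }
  return best;
}

// Toy demo: "config" is a number encoded as a string; score rewards closeness to 7.
const demo = ratchet(
  { config: "0", score: -49 },
  (c, i) => String(Number(c) + (i % 2 === 0 ? 1 : -1)),
  (c) => -((Number(c) - 7) ** 2),
  40
);
```

The point of the skeleton is that nothing in it is ML-specific: swap in a different `mutate` and `score` and the same loop optimizes keywords, ad copy, or prompt templates.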

Karpathy's own runs yielded 126 experiments in a single night and ~700 over two days, with ~20 improvements that transferred perfectly to larger models. Shopify's CEO Tobi Lütke adapted the pattern for an internal query-expansion model and got a 19% validation score improvement from 37 experiments on a 0.8B parameter model.

The key insight isn't about ML training. It's about a general pattern:

Separate research execution from research judgment. The human defines what to optimize and how to measure success; the AI handles methodical iteration at machine speed.

Any domain with (a) a clear automated scoring function and (b) fast iteration cycles can apply it. Healthcare marketing has both.

Bringing AutoResearch to Healthcare Marketing

At Luma Health, we help health systems fill open appointment slots through AI-driven marketing. Our platform scrapes live scheduling availability from MyChart, generates targeted Google Ads campaigns, and provides market intelligence reports to help administrators make data-driven decisions.

The entire pipeline was built on one-shot AI generation — useful, but every generation started from scratch. Past wins weren't compounding into future campaigns.

So we implemented three autoresearch loops.

1. Keyword Portfolio Discovery

The ratchet metric: a composite score combining monthly search volume, competition level, patient intent (High / Research / Wasteful), cost-per-click, visit-type coverage, and budget efficiency.

Mutation strategies: expand (generate related long-tail), prune (drop the lowest-scoring term), swap (replace weakest with a variation of the strongest), niche-dive (generate long-tail off the top performer), and geo-variant (swap "near me" for city-specific modifiers).
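To make the ratchet concrete for keywords, here is a sketch of a composite score and the "prune" mutation. The field names and weights are illustrative, not our production values:

```typescript
// Sketch of a composite keyword score and a "prune" mutation.
// Weights and fields are illustrative, not production values.
interface Keyword {
  term: string;
  monthlyVolume: number;        // searches per month
  competition: number;          // 0..1, lower is better
  intent: "High" | "Research" | "Wasteful";
  cpc: number;                  // dollars
}

const INTENT_WEIGHT = { High: 1.0, Research: 0.5, Wasteful: 0.0 };

function keywordScore(k: Keyword): number {
  // Volume and intent push the score up; competition and CPC pull it down.
  return (
    Math.log10(1 + k.monthlyVolume) * INTENT_WEIGHT[k.intent] -
    0.5 * k.competition -
    0.05 * k.cpc
  );
}

function portfolioScore(ks: Keyword[]): number {
  return ks.reduce((sum, k) => sum + keywordScore(k), 0) / ks.length;
}

// "Prune" mutation: drop the lowest-scoring term from the portfolio.
function prune(ks: Keyword[]): Keyword[] {
  const worst = ks.reduce((a, b) => (keywordScore(a) <= keywordScore(b) ? a : b));
  return ks.filter((k) => k !== worst);
}
```

Because the portfolio score is a per-keyword average, pruning a wasteful term is guaranteed to raise it, which is why prune is a safe default mutation when the other strategies stall.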

First Run · UAMS Orthopedics, Little Rock AR
110 keywords, 35K+ monthly searches targeted
Duration: 84s · Experiments: 20 · Winners retained: 13 · Score lift: +4.1%

The run surfaced long-tail terms the human-seeded baseline had missed: "orthopedic doctor accepting new patients near me", "sports injury specialist Little Rock AR", "hip and knee pain treatment Little Rock."

2. Ad Creative Optimization

The ratchet metric: keyword density, CTA strength, character-limit compliance, readability, plus five LLM-as-judge dimensions: patient intent alignment, medical accuracy, differentiation, urgency calibration, and overall effectiveness.

Mutation strategies: ten strategic angles — urgency, convenience, provider expertise, patient outcomes, question-lead, stat-lead, conversational tone, clinical tone, CTA-strengthening, and direct keyword injection.
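A sketch of how the automated checks and LLM-judge dimensions combine into one ratchet score. In practice the judge scores come from a GPT-4o call; here they are plain inputs, and the weights are illustrative:

```typescript
// Sketch of the ad-creative composite: automated checks plus LLM-judge dimensions.
// Judge scores would come from an LLM call in practice; here they are inputs.
interface AdCopy { headline: string; description: string }

const HEADLINE_LIMIT = 30;    // Google Ads headline character limit
const DESCRIPTION_LIMIT = 90; // Google Ads description character limit

function automatedScore(ad: AdCopy, targetKeywords: string[]): number {
  const text = `${ad.headline} ${ad.description}`.toLowerCase();
  // Keyword density: fraction of target keywords that appear in the copy.
  const hits = targetKeywords.filter((k) => text.includes(k.toLowerCase())).length;
  const density = hits / targetKeywords.length;
  // Character-limit compliance, half credit per field.
  const compliant =
    (ad.headline.length <= HEADLINE_LIMIT ? 0.5 : 0) +
    (ad.description.length <= DESCRIPTION_LIMIT ? 0.5 : 0);
  return 0.6 * density + 0.4 * compliant; // illustrative weights
}

function compositeScore(ad: AdCopy, keywords: string[], judge: number[]): number {
  // judge: five 0..1 dimensions (intent alignment, accuracy,
  // differentiation, urgency calibration, overall effectiveness)
  const judgeAvg = judge.reduce((a, b) => a + b, 0) / judge.length;
  return 0.5 * automatedScore(ad, keywords) + 0.5 * judgeAvg;
}
```

Keeping the cheap automated checks separate from the expensive judge call also lets the loop reject non-compliant copy before spending an LLM call on it.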

Result on a UAMS sports medicine campaign: 24 seconds, 7 experiments, +4.7% lift. The winning strategy was keyword_injection.

Before · baseline
  Headlines: "UAMS Sports Medicine" / "Hamilton Newhart, MD · Schedule Today"
  Descriptions: "Board-certified sports medicine doctor. Treatment for knee, shoulder, and sports injuries." / "Same-week appointments available at UAMS Health in Little Rock."
  CTA: Book Appointment

After · +4.7% lift
  Headlines: "Sports Medicine Doctor" / "Knee Pain Specialist · Orthopedic Doctor Visit"
  Descriptions: "Non-surgical solutions for knee pain & injuries. Little Rock appointments." / "Dr. Newhart in Little Rock offers 12 slots. Book your sports medicine visit."
  CTA: Schedule Now

The AI recognized that "knee pain" and "sports medicine" were high-value keywords but weren't appearing in the headlines — exactly the kind of observation a seasoned PPC analyst would make.

3. Market Intelligence Report Quality

This is the highest-leverage loop because improvements transfer across every organization on the platform. Instead of optimizing a single asset, it optimizes the prompt template used to generate all future reports.

The ratchet metric: automated scoring of data citation density, specificity (ratio of concrete terms to vague platitudes), recommendation count, completeness (references to all five data sources: CDC, Google Trends, HRSA, CMS, Census), and structural compliance — plus LLM-as-judge scoring on actionability, data grounding, insight novelty, and strategic coherence.

Mutation strategies: twelve prompt mutations — specificity injection, data source emphasis, output format switches, reasoning chain ("first identify the top 3 signals, then..."), role priming, and constraint additions.
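A sketch of how prompt mutations are applied. The mutation names mirror the strategies above, but the exact wording of each mutation is illustrative:

```typescript
// Sketch of prompt-template mutations; the wording of each is illustrative.
type PromptMutation = (template: string) => string;

const mutations: Record<string, PromptMutation> = {
  // Role priming: prepend a persona framing (the kind of change that won our run).
  role_analyst: (t) =>
    "You are a healthcare data analyst focused on converting market data " +
    "into actionable campaign recommendations.\n\n" + t,
  // Reasoning chain: force an explicit ordering of analysis steps.
  reasoning_chain: (t) =>
    t + "\n\nFirst identify the top 3 signals in the data, then derive " +
    "recommendations from those signals only.",
  // Specificity injection: ban vague filler.
  specificity_injection: (t) =>
    t + "\n\nEvery claim must cite a specific number from the provided data.",
};

function applyMutation(template: string, name: keyof typeof mutations): string {
  return mutations[name](template);
}
```

Mutations are pure string transforms over the template, so the loop driver can apply, score, and revert them with no state beyond the current best template.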

First Run · Market Intelligence Report Generation
A single framing change: +51.8% report quality across all dimensions
Duration: 72s · Experiments: 6 · Score lift: +51.8% · Winning mutation: role_analyst

Reframing the system prompt to position the AI as "a healthcare data analyst focused on converting market data into actionable campaign recommendations" produced a 51.8% improvement in report quality across all scoring dimensions. That optimized template now generates every market intelligence report on the platform.

The Compounding Effect

The non-obvious part, and the part we think matters most, is what happens after a run of Performance Lab, our name for these ratchet loops.

Every winning artifact is saved to a winners store. When a user later generates a new campaign, the campaign builder automatically checks the store and uses past winners as reference exemplars in the LLM prompt. Not copied verbatim — used as high-quality inspiration to seed the next generation.

Concretely:

  1. Week 1: Run Performance Lab on your orthopedics campaign → AI finds a strong keyword list and ad copy. You ship them.
  2. Week 2: You create a new orthopedics campaign for a different provider. The Campaign Designer automatically uses Week 1's winners as its starting point — not a blank slate.
  3. Week 3: You run Performance Lab on the Week 2 campaign. It iterates forward from the already-optimized state and finds more lift.

Each run permanently raises the baseline. The service gets smarter every time anyone uses it.

The optimized report template is global — one winning template improves reports for every organization on the platform. The keyword and creative winners are org- and specialty-specific, so gains for orthopedics don't leak into dermatology.
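The scoping rule above can be sketched as a small store. In production winners persist to disk as JSON; a `Map` and the class name `WinnersStore` are stand-ins here:

```typescript
// Sketch of the winners store: keyword and creative winners are scoped per
// org + specialty; the report template is a single global entry.
// Production persists JSON to disk; a Map stands in here.
interface Winner { kind: "keywords" | "creative" | "report_template"; payload: string }

class WinnersStore {
  private byScope = new Map<string, Winner[]>();

  private key(kind: Winner["kind"], org?: string, specialty?: string): string {
    // Report templates are global; everything else is scoped.
    return kind === "report_template" ? "global" : `${org}:${specialty}`;
  }

  save(w: Winner, org?: string, specialty?: string): void {
    const k = this.key(w.kind, org, specialty);
    this.byScope.set(k, [...(this.byScope.get(k) ?? []), w]);
  }

  // Past winners of a given kind, injected as exemplars into the next prompt.
  exemplars(kind: Winner["kind"], org?: string, specialty?: string): string[] {
    return (this.byScope.get(this.key(kind, org, specialty)) ?? [])
      .filter((w) => w.kind === kind)
      .map((w) => w.payload);
  }
}
```

The scope key is what keeps orthopedics winners from leaking into dermatology: a lookup for a different specialty simply hits an empty bucket.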

Why This Matters for Healthcare

Healthcare marketing has unusual constraints. Google Ads enforces strict compliance policies for medical categories. Patients search in specific, high-intent ways ("orthopedic surgeon accepting new patients near me") that differ from general consumer search. Small differences in keyword selection can mean the difference between a $3.78 CPC and a $13.00 CPC on the same query intent.

These constraints make healthcare marketing exactly the kind of domain where systematic iteration pays off disproportionately. A 4% lift on a $30,000/month ad spend is $1,200/month. A 50% improvement in report quality means marketing managers spend their time acting on specific recommendations instead of translating generic suggestions.

And the pattern itself has a track record: Karpathy demonstrated it on nanochat. Shopify validated it on query expansion. We're showing it works for patient acquisition.

The Engineering in One Paragraph

Three ratchet loops, each with four files: a scoring function (composite of automated metrics + LLM-as-judge), an experiment runner (applies one mutation), a loop driver (the ratchet with plateau detection and adaptive strategy weighting), and a runner script for operators. Winners persist to disk as JSON. The campaign generator reads them at generation time and injects them as prompt references. 95+ unit tests covering scoring, mutations, strategy selection, and plateau behavior. Built on Node.js, OpenAI GPT-4o, and our existing Supabase / Google Ads stack.
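Two pieces of the loop driver mentioned above, plateau detection and adaptive strategy weighting, sketch out like this. The window size, thresholds, and update rule are illustrative:

```typescript
// Sketch of plateau detection and adaptive strategy weighting.
// Window, thresholds, and the update rule are illustrative.
function plateaued(scores: number[], window = 5, minLift = 0.001): boolean {
  // Stop when the last `window` experiments produced no meaningful lift
  // over the score that preceded them.
  if (scores.length < window + 1) return false;
  const recent = scores.slice(-window);
  const before = scores[scores.length - window - 1];
  return Math.max(...recent) - before < minLift;
}

// Strategies that produce winners get picked more often on later steps.
class StrategyWeights {
  private w = new Map<string, number>();
  constructor(names: string[]) { names.forEach((n) => this.w.set(n, 1)); }

  record(name: string, won: boolean): void {
    const cur = this.w.get(name) ?? 1;
    // Reward winners, gently decay losers, never drop below a floor
    // so no strategy is starved out entirely.
    this.w.set(name, Math.max(0.1, cur + (won ? 0.5 : -0.1)));
  }

  // Normalized probability of picking each strategy next.
  probabilities(): Map<string, number> {
    const total = [...this.w.values()].reduce((a, b) => a + b, 0);
    return new Map([...this.w].map(([n, v]) => [n, v / total]));
  }
}
```

The floor in `record` is the design choice worth noting: a strategy that has been cold can still win later (niche-dive often does), so it keeps a small nonzero chance of being sampled.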

What's Next

We're planning three extensions, all following the same pattern: clear score, tight mutation strategies, adaptive weighting, plateau detection.

  1. Decision tree path optimization — autoresearch for MyChart scraping strategies. Ratchet on "appointment slots discovered per scrape minute."
  2. Landing page conversion optimization — ratchet on a composite of above-fold CTA placement, mobile responsiveness, trust signals, and LLM-judged clarity.
  3. Scraping reliability — ratchet on success rate across a fixed test set of orgs. Tune timeouts, selectors, and wait strategies autonomously.

Credit Where It's Due

This work sits on the shoulders of Andrej Karpathy's autoresearch release and the broader "agentic engineering" shift he's described. The pattern is his; the healthcare marketing application is ours. If you're building any kind of AI-powered generation tool — from code to copy to curriculum — the autoresearch pattern is probably the single highest-leverage adaptation available to you right now.

Try it on your own domain. Define a scoring function. Write a mutation step. Ratchet overnight. Wake up to compounding gains.