Yesterday in AI: 25 April 2026 — 186 Real Deals: Claude Agents Left Haiku Users Behind
Anthropic revealed smarter Claude agents silently outperform weaker ones in real commerce; Hugging Face shipped an AI that trains other AIs; 12,000 developers are running Claude Code without an Anthropic API key.
By OMC Editorial on 2026-04-26
Anthropic ran a real-money employee marketplace and proved what many suspected: the model tier assigned to your AI agent shapes your economic outcomes — and users with weaker agents never realized they were losing.
Project Deal: 186 Real Trades, One Hidden Disadvantage
In December 2025, Anthropic gave 69 employees a $100 budget each and assigned a Claude agent to negotiate on their behalf in a Craigslist-style internal marketplace. Participants didn't know which model they got (the flagship Claude Opus 4.5 or the lightweight Claude Haiku 4.5) until the experiment ended.
The agents struck 186 deals totaling $4,086. On average, Opus-backed sellers earned $2.68 more per item and Opus-backed buyers saved $2.45 per item; Opus agents also completed roughly 2.07 more transactions each. Employees with Haiku agents received objectively worse outcomes yet reported similar satisfaction: they couldn't tell. One Haiku agent pitched ping-pong balls as "perfectly spherical orbs of possibility" and still closed the sale; another recognized a buyer's snowboard preference from a prior conversation and matched the exact model wanted. The finding: in multi-agent commerce, model quality is a silent tax users cannot see.
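To put the per-item gaps in proportion, here is a rough back-of-the-envelope sketch in Python. It uses only the figures quoted above and assumes, purely for illustration, that each deal covered a single item, which the published numbers do not confirm. The variable names are our own.

```python
# Figures quoted in the article.
TOTAL_DEALS = 186
TOTAL_VALUE_USD = 4086.00
SELLER_GAIN_USD = 2.68   # extra earned per item by Opus-backed sellers
BUYER_SAVING_USD = 2.45  # saved per item by Opus-backed buyers

# Average value of one deal across the whole marketplace.
avg_deal_value = TOTAL_VALUE_USD / TOTAL_DEALS

# ASSUMPTION (not stated in the source): one item per deal, so the
# per-item gaps can be compared directly with the average deal value.
seller_gain_share = SELLER_GAIN_USD / avg_deal_value
buyer_saving_share = BUYER_SAVING_USD / avg_deal_value

print(f"Average deal value: ${avg_deal_value:.2f}")
print(f"Seller gain as share of average deal: {seller_gain_share:.1%}")
print(f"Buyer saving as share of average deal: {buyer_saving_share:.1%}")
```

Under that one-item-per-deal assumption, both gaps come out at a little over a tenth of a typical deal, which helps explain why a participant looking at any single trade would not notice the difference.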
Results published April 25, 2026. TechCrunch: https://techcrunch.com/2026/04/25/anthropic-created-a-test-marketplace-for-agent-on-agent-commerce/ | Anthropic feature page: https://www.anthropic.com/features/project-deal
Hugging Face ml-intern: An AI That Trains Other AIs
Hugging Face released ml-intern, an open-source agent that acts as a junior ML engineer: it autonomously reads arXiv papers, selects datasets from the Hugging Face Hub, writes training scripts, runs fine-tuning jobs, monitors evaluations, and iterates, all without human supervision. It is built on the smolagents framework.
In the launch demo, ml-intern took Qwen3-1.7B from 10% to 32% on GPQA, a graduate-level scientific reasoning benchmark, in under 10 hours. It identified NVIDIA's NemoTron-CrossThink dataset by traversing citatio