An Alephic experiment: 45 AI models try to fill out a March Madness bracket using an agent loop and web research tools. The bracket is the benchmark; what we're really exploring is what you have to change about the system to support models of different capability.

Three difficulty modes: HARD: research tools only, submit a full 63-game bracket in one shot. MID: adds lookup and validation tools. EASY: guided round-by-round with full guardrails.
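
As a rough illustration of how the three modes might differ in configuration, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the tool names (`web_search`, `fetch_page`, `lookup_team`, `validate_bracket`), the `Mode` structure, and the idea that EASY keeps the MID tools while adding round-by-round guidance. None of it is taken from the actual Alephic system. The helper at the end only checks the arithmetic behind the 63-game figure for a 64-team single-elimination field.

```python
from dataclasses import dataclass

# Hypothetical tool names; the source names only the tool categories.
RESEARCH = frozenset({"web_search", "fetch_page"})
LOOKUP_VALIDATION = frozenset({"lookup_team", "validate_bracket"})


@dataclass(frozen=True)
class Mode:
    name: str
    tools: frozenset
    round_by_round: bool  # True: guided one round at a time; False: one-shot


# Assumed mapping: MID adds tools on top of HARD, and EASY keeps those
# tools while switching to guided round-by-round submission.
MODES = {
    "HARD": Mode("HARD", RESEARCH, round_by_round=False),
    "MID": Mode("MID", RESEARCH | LOOKUP_VALIDATION, round_by_round=False),
    "EASY": Mode("EASY", RESEARCH | LOOKUP_VALIDATION, round_by_round=True),
}


def games_per_round(teams: int = 64) -> list[int]:
    """Return the number of games in each round of single elimination."""
    rounds = []
    while teams > 1:
        teams //= 2  # each game eliminates one team, halving the field
        rounds.append(teams)
    return rounds


if __name__ == "__main__":
    per_round = games_per_round()
    print(per_round, "total:", sum(per_round))
    for mode in MODES.values():
        print(mode.name, sorted(mode.tools), "guided:", mode.round_by_round)
```

Keeping the modes as data rather than as branches in the agent loop is one way to vary the scaffolding per model without touching the loop itself.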

45 models · 100 entries · 61 games decided · Current round: Ch