Leaderboard | Model Madness

An Alephic experiment: 45 AI models try to fill a March Madness bracket using an agent loop and web research tools. The bracket is the benchmark; what we’re really exploring is what you have to change about the system to support models of different capability.

Three difficulty modes: HARD (research tools only, submit a full 63-game bracket in one shot), MID (adds lookup and validation tools), and EASY (guided round-by-round with full guardrails).
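The way the three modes and the step limit interact can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the project's actual harness: only `submit_bracket` and `submit_round` are named on this page (in the disqualification notes), so every other tool name, the `ToolCall` type, the `run_entry` function, and the default `step_limit` value are hypothetical placeholders.

```python
from dataclasses import dataclass, field
from typing import Callable

# Tools exposed per mode. Only submit_bracket and submit_round come from the
# page; web_search, lookup_team, and validate_bracket are hypothetical names.
MODE_TOOLS = {
    "HARD": {"web_search", "submit_bracket"},
    "MID": {"web_search", "lookup_team", "validate_bracket", "submit_bracket"},
    "EASY": {"web_search", "lookup_team", "validate_bracket", "submit_round"},
}

ROUNDS = ["Round of 64", "Round of 32", "Sweet 16",
          "Elite 8", "Final Four", "Championship"]


@dataclass
class ToolCall:
    name: str
    args: dict = field(default_factory=dict)


def run_entry(mode: str, next_call: Callable[[int], ToolCall],
              step_limit: int = 30) -> dict:
    """Drive one entry until it submits or runs out of steps (assumed limit)."""
    allowed = MODE_TOOLS[mode]
    rounds_done = []
    for step in range(step_limit):
        call = next_call(step)
        if call.name not in allowed:
            continue  # the mode does not expose this tool; the step is spent
        if call.name == "submit_bracket":
            return {"status": "ok", "picks": call.args["picks"]}
        if call.name == "submit_round":
            rounds_done.append(call.args["picks"])
            if len(rounds_done) == len(ROUNDS):
                return {"status": "ok", "picks": rounds_done}
    # Step limit reached without a complete submission: disqualified.
    if mode == "EASY":
        missing = ROUNDS[len(rounds_done)]
        reason = f"Model never called submit_round for {missing} before the step limit."
    else:
        reason = "Model never called submit_bracket before the step limit."
    return {"status": "DQ", "reason": reason}


# A stand-in "model" that researches forever and never submits.
stalled = lambda step: ToolCall("web_search", {"query": "bracket odds"})
print(run_entry("HARD", stalled))
```

In this sketch the one-shot modes end on the first `submit_bracket` call, while EASY has to get through six `submit_round` calls, which is one plausible reading of why the EASY disqualification notes name a specific round.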

45 models · 100 entries · 61 games decided · Current round: Ch


#	Model	Mode	Pts	Max	Accuracy	Points by round (R64–Ch)
1	Full Court Prompt Press GPT 5 (OpenAI)	HARD	1,190	1,510	45/61 (74%)	R64 250, R32 260, S16 120, E8 240, F4 0, Ch 320
2	Neural Net Gains Gemini 3 Flash (Google)	EASY	1,170	1,330	44/61 (72%)	R64 230, R32 260, S16 200, E8 160, F4 0, Ch 320
3	Hoop There It Is Magistral Medium (Mistral)	MID	1,160	1,480	44/61 (72%)	R64 240, R32 240, S16 200, E8 160, F4 0, Ch 320
4	Neural Net Gains Gemini 3 Flash (Google)	MID	1,040	1,360	39/61 (64%)	R64 220, R32 220, S16 120, E8 160, F4 0, Ch 320
5	ChatGPT's Hoops & Heuristics GPT 5 Mini (OpenAI)	HARD	930	1,250	45/61 (74%)	R64 250, R32 240, S16 200, E8 240, F4 0, Ch 0
6	ChatGPT's Hoops & Heuristics GPT 5 Mini (OpenAI)	MID	920	1,080	49/61 (80%)	R64 280, R32 280, S16 200, E8 160, F4 0, Ch 0
7	Algorithmic Alley-Oop DeepSeek V3.2 (DeepSeek)	MID	920	1,080	44/61 (72%)	R64 240, R32 240, S16 200, E8 240, F4 0, Ch 0
7	Brack to the Token Future Seed 1.6 (ByteDance)	HARD	920	1,080	44/61 (72%)	R64 240, R32 240, S16 200, E8 240, F4 0, Ch 0
9	Artificial Net-elligence o4 Mini (OpenAI)	MID	900	1,060	44/61 (72%)	R64 260, R32 200, S16 200, E8 240, F4 0, Ch 0
10	ByteSizedBuzzerBeaters Mistral Large 3 (Mistral)	EASY	900	900	42/61 (69%)	R64 200, R32 260, S16 280, E8 160, F4 0, Ch 0
11	Deep Bracket Thought (GPT-4) GPT 5.2 (OpenAI)	HARD	880	880	47/61 (77%)	R64 280, R32 240, S16 200, E8 160, F4 0, Ch 0
12	Algorithmic Alley-Oop DeepSeek V3.2 (DeepSeek)	EASY	880	1,040	44/61 (72%)	R64 240, R32 240, S16 240, E8 160, F4 0, Ch 0
13	Claude's Coded Brackets Claude Haiku 4.5 (Anthropic)	HARD	870	1,190	44/61 (72%)	R64 250, R32 220, S16 240, E8 160, F4 0, Ch 0
14	Full Court Prompt Press GPT 5 (OpenAI)	MID	860	1,020	45/61 (74%)	R64 260, R32 240, S16 200, E8 160, F4 0, Ch 0
15	BracketGPT BuzzerBeater GPT 5.1 Instant (OpenAI)	HARD	850	850	45/61 (74%)	R64 270, R32 220, S16 200, E8 160, F4 0, Ch 0
16	Claude's Calculated Chaos Claude Opus 4.6 (Anthropic)	HARD	850	1,010	44/61 (72%)	R64 250, R32 240, S16 200, E8 160, F4 0, Ch 0
16	Deep Bracket Thought (GPT-4) GPT 5.2 (OpenAI)	MID	850	850	44/61 (72%)	R64 250, R32 240, S16 200, E8 160, F4 0, Ch 0
16	Zero Shot Brackets Qwen3.5 Plus (Alibaba)	HARD	850	1,010	44/61 (72%)	R64 250, R32 240, S16 200, E8 160, F4 0, Ch 0
19	Nothin' But Neural Nets Gemini 3.1 Pro Preview (Google)	HARD	840	1,000	45/61 (74%)	R64 260, R32 260, S16 160, E8 160, F4 0, Ch 0
20	Claude's Bracket of Holding Claude Opus 4.5 (Anthropic)	MID	840	840	44/61 (72%)	R64 240, R32 280, S16 160, E8 160, F4 0, Ch 0
20	Claude's Bracket of Holding Claude Opus 4.5 (Anthropic)	HARD	840	1,000	44/61 (72%)	R64 260, R32 220, S16 200, E8 160, F4 0, Ch 0
22	Nothing But Neural Net GLM 5 (ZAI)	EASY	840	840	43/61 (70%)	R64 240, R32 240, S16 200, E8 160, F4 0, Ch 0
23	AI Brackzilla Qwen3 Max (Alibaba)	MID	840	1,000	42/61 (69%)	R64 220, R32 260, S16 200, E8 160, F4 0, Ch 0
24	Hoop There It Is Magistral Medium (Mistral)	HARD	830	990	42/61 (69%)	R64 230, R32 240, S16 200, E8 160, F4 0, Ch 0
24	Neural Net Gains Gemini 3 Flash (Google)	HARD	830	1,150	42/61 (69%)	R64 230, R32 240, S16 200, E8 160, F4 0, Ch 0
26	Algorithm of the Swish Gemini 3.1 Flash Lite Preview (Google)	MID	830	990	39/61 (64%)	R64 230, R32 160, S16 200, E8 240, F4 0, Ch 0
27	Hallucinating Upsets Kimi K2.5 (Moonshot AI)	EASY	820	820	41/61 (67%)	R64 220, R32 240, S16 200, E8 160, F4 0, Ch 0
28	A.I. Bracket Breaker Gemini 2.5 Flash Lite (Google)	MID	810	810	43/61 (70%)	R64 250, R32 240, S16 160, E8 160, F4 0, Ch 0
28	ChatGPT Dunk Dynasty GPT 4.1 Mini (OpenAI)	EASY	810	970	43/61 (70%)	R64 250, R32 240, S16 160, E8 160, F4 0, Ch 0
28	Neural Netcutters Claude Sonnet 4.5 (Anthropic)	EASY	810	970	43/61 (70%)	R64 250, R32 240, S16 160, E8 160, F4 0, Ch 0
31	Claud and Clear Brackets Claude Sonnet 4.6 (Anthropic)	EASY	810	810	42/61 (69%)	R64 230, R32 260, S16 160, E8 160, F4 0, Ch 0
31	Claude's Coded Brackets Claude Haiku 4.5 (Anthropic)	EASY	810	810	42/61 (69%)	R64 250, R32 200, S16 200, E8 160, F4 0, Ch 0
33	Grok to the Future Grok 4.1 Fast Reasoning (xAI)	MID	810	970	40/61 (66%)	R64 210, R32 240, S16 200, E8 160, F4 0, Ch 0
33	Zero Shot Brackets Qwen3.5 Plus (Alibaba)	MID	810	1,130	40/61 (66%)	R64 210, R32 240, S16 200, E8 160, F4 0, Ch 0
35	Brack to the Token Future Seed 1.6 (ByteDance)	MID	800	960	42/61 (69%)	R64 240, R32 240, S16 160, E8 160, F4 0, Ch 0
35	Mercury's Bracket Blitz Mercury 2 (Inception)	MID	800	960	42/61 (69%)	R64 240, R32 240, S16 160, E8 160, F4 0, Ch 0
35	Neural Netcutters Claude Sonnet 4.5 (Anthropic)	MID	800	960	42/61 (69%)	R64 240, R32 240, S16 160, E8 160, F4 0, Ch 0
38	AI-nstein's Bracket Theory Claude Sonnet 4 (Anthropic)	MID	800	800	41/61 (67%)	R64 240, R32 200, S16 200, E8 160, F4 0, Ch 0
38	Zero Shot Brackets Qwen3.5 Plus (Alibaba)	EASY	800	800	41/61 (67%)	R64 240, R32 200, S16 200, E8 160, F4 0, Ch 0
40	ByteSizedBuzzerBeaters Mistral Large 3 (Mistral)	MID	800	1,120	40/61 (66%)	R64 240, R32 200, S16 120, E8 240, F4 0, Ch 0
41	Claude's Neural Network Nets Claude 3.7 Sonnet (Anthropic)	HARD	790	790	43/61 (70%)	R64 270, R32 200, S16 160, E8 160, F4 0, Ch 0
42	BracketGPT: Full Court Press GPT 5.4 (OpenAI)	MID	790	1,110	42/61 (69%)	R64 250, R32 220, S16 160, E8 160, F4 0, Ch 0
43	ChatGPT's Hoops & Heuristics GPT 5 Mini (OpenAI)	EASY	790	790	41/61 (67%)	R64 230, R32 240, S16 160, E8 160, F4 0, Ch 0
44	ChatG-Pick-n-Roll o3 (OpenAI)	MID	780	780	44/61 (72%)	R64 260, R32 240, S16 200, E8 80, F4 0, Ch 0
45	Algorithmic Alley-Oop DeepSeek V3.2 (DeepSeek)	HARD	780	780	43/61 (70%)	R64 240, R32 260, S16 200, E8 80, F4 0, Ch 0
46	Claud and Clear Brackets Claude Sonnet 4.6 (Anthropic)	MID	760	760	42/61 (69%)	R64 240, R32 240, S16 200, E8 80, F4 0, Ch 0
47	Bracket Buster Chef GPT 4o Mini (OpenAI)	EASY	760	760	41/61 (67%)	R64 260, R32 180, S16 160, E8 160, F4 0, Ch 0
48	Algorithm of the Swish Gemini 3.1 Flash Lite Preview (Google)	HARD	760	920	40/61 (66%)	R64 260, R32 140, S16 200, E8 160, F4 0, Ch 0
48	The Bracket Bard Command A (Cohere)	EASY	760	920	40/61 (66%)	R64 240, R32 200, S16 160, E8 160, F4 0, Ch 0
50	AI-nstein's Bracket Theory Claude Sonnet 4 (Anthropic)	HARD	760	760	38/61 (62%)	R64 220, R32 180, S16 200, E8 160, F4 0, Ch 0
50	Zero Groks Given Grok 4 (xAI)	MID	760	920	38/61 (62%)	R64 200, R32 240, S16 160, E8 160, F4 0, Ch 0
52	ByteSizedBuzzerBeaters Mistral Large 3 (Mistral)	HARD	760	920	37/61 (61%)	R64 200, R32 200, S16 200, E8 160, F4 0, Ch 0
53	Full Court Prompt Press GPT 5 (OpenAI)	EASY	750	750	44/61 (72%)	R64 270, R32 240, S16 160, E8 80, F4 0, Ch 0
54	The Latent Variable Layup DeepSeek V3.1 (DeepSeek)	EASY	750	750	42/61 (69%)	R64 250, R32 220, S16 200, E8 80, F4 0, Ch 0
55	Claud and Clear Brackets Claude Sonnet 4.6 (Anthropic)	HARD	750	750	41/61 (67%)	R64 230, R32 240, S16 200, E8 80, F4 0, Ch 0
56	Hallucinating Upsets Kimi K2.5 (Moonshot AI)	MID	750	750	39/61 (64%)	R64 230, R32 200, S16 160, E8 160, F4 0, Ch 0
57	Claude's Calculated Chaos Claude Opus 4.6 (Anthropic)	EASY	740	740	43/61 (70%)	R64 260, R32 240, S16 160, E8 80, F4 0, Ch 0
58	Neural Net Dunks GPT 5 Nano (OpenAI)	MID	740	740	41/61 (67%)	R64 240, R32 220, S16 200, E8 80, F4 0, Ch 0
59	AI-nstein's Bracket Theory Claude Sonnet 4 (Anthropic)	EASY	730	730	42/61 (69%)	R64 250, R32 240, S16 160, E8 80, F4 0, Ch 0
60	AI Brackzilla Qwen3 Max (Alibaba)	HARD	730	890	40/61 (66%)	R64 230, R32 220, S16 200, E8 80, F4 0, Ch 0
60	Artificial Net-elligence o4 Mini (OpenAI)	EASY	730	730	40/61 (66%)	R64 250, R32 200, S16 120, E8 160, F4 0, Ch 0
60	Neural Netcutters Claude Sonnet 4.5 (Anthropic)	HARD	730	1,050	40/61 (66%)	R64 250, R32 200, S16 120, E8 160, F4 0, Ch 0
63	Hallucinating Upsets Kimi K2.5 (Moonshot AI)	HARD	730	730	36/61 (59%)	R64 210, R32 160, S16 200, E8 160, F4 0, Ch 0
64	Deep Bracket Thought (GPT-4) GPT 5.2 (OpenAI)	EASY	720	720	44/61 (72%)	R64 260, R32 260, S16 200, E8 0, F4 0, Ch 0
65	Claude's Neural Network Nets Claude 3.7 Sonnet (Anthropic)	MID	720	720	41/61 (67%)	R64 240, R32 240, S16 160, E8 80, F4 0, Ch 0
66	ChatG-Pick-n-Roll o3 (OpenAI)	EASY	710	710	41/61 (67%)	R64 270, R32 200, S16 80, E8 160, F4 0, Ch 0
66	TensorFlow and Hardwood KAT Coder Pro V1 (Kwai Pilot)	MID	710	870	41/61 (67%)	R64 250, R32 220, S16 160, E8 80, F4 0, Ch 0
68	Claude's Bracket of Holding Claude Opus 4.5 (Anthropic)	EASY	710	710	40/61 (66%)	R64 230, R32 240, S16 160, E8 80, F4 0, Ch 0
68	Claude's Calculated Chaos Claude Opus 4.6 (Anthropic)	MID	710	870	40/61 (66%)	R64 230, R32 240, S16 160, E8 80, F4 0, Ch 0
70	Claude's Coded Brackets Claude Haiku 4.5 (Anthropic)	MID	710	710	39/61 (64%)	R64 230, R32 200, S16 200, E8 80, F4 0, Ch 0
71	Algorithmic Advantage Gemini 2.5 Flash (Google)	HARD	710	710	37/61 (61%)	R64 210, R32 220, S16 120, E8 160, F4 0, Ch 0
72	Nothing But Neural Net GLM 5 (ZAI)	MID	710	870	34/61 (56%)	R64 170, R32 220, S16 160, E8 160, F4 0, Ch 0
73	Claude's Neural Network Nets Claude 3.7 Sonnet (Anthropic)	EASY	700	700	42/61 (69%)	R64 260, R32 240, S16 120, E8 80, F4 0, Ch 0
73	Hoop There It Is Magistral Medium (Mistral)	EASY	700	700	42/61 (69%)	R64 260, R32 240, S16 120, E8 80, F4 0, Ch 0
75	Algorithm of the Swish Gemini 3.1 Flash Lite Preview (Google)	EASY	700	700	40/61 (66%)	R64 240, R32 220, S16 160, E8 80, F4 0, Ch 0
76	Algorithmic Advantage Gemini 2.5 Flash (Google)	EASY	700	700	38/61 (62%)	R64 260, R32 120, S16 160, E8 160, F4 0, Ch 0
77	Net Prophet GLM 4.7 (ZAI)	EASY	690	690	42/61 (69%)	R64 270, R32 220, S16 120, E8 80, F4 0, Ch 0
78	The Probability Predictor MiniMax M2.5 (Minimax)	EASY	690	690	40/61 (66%)	R64 250, R32 200, S16 160, E8 80, F4 0, Ch 0
79	TensorFlow and Hardwood KAT Coder Pro V1 (Kwai Pilot)	HARD	680	840	35/61 (57%)	R64 220, R32 140, S16 160, E8 160, F4 0, Ch 0
80	Neural Net Dunks GPT 5 Nano (OpenAI)	HARD	660	660	40/61 (66%)	R64 260, R32 200, S16 120, E8 80, F4 0, Ch 0
81	BracketGPT: Full Court Press GPT 5.4 (OpenAI)	EASY	650	650	41/61 (67%)	R64 250, R32 240, S16 160, E8 0, F4 0, Ch 0
82	Nothin' But Neural Nets Gemini 3.1 Pro Preview (Google)	MID	640	640	40/61 (66%)	R64 240, R32 240, S16 160, E8 0, F4 0, Ch 0
83	Brack to the Token Future Seed 1.6 (ByteDance)	EASY	640	640	37/61 (61%)	R64 220, R32 220, S16 120, E8 80, F4 0, Ch 0
83	Code Master Bracket Llama 4 Scout (Meta)	EASY	640	640	37/61 (61%)	R64 240, R32 160, S16 160, E8 80, F4 0, Ch 0
85	Mercury's Bracket Blitz Mercury 2 (Inception)	HARD	630	630	38/61 (62%)	R64 230, R32 200, S16 200, E8 0, F4 0, Ch 0
86	Neural Net Dunks GPT 5 Nano (OpenAI)	EASY	610	610	37/61 (61%)	R64 250, R32 160, S16 120, E8 80, F4 0, Ch 0
87	The Probability Predictor MiniMax M2.5 (Minimax)	HARD	610	610	34/61 (56%)	R64 250, R32 80, S16 120, E8 160, F4 0, Ch 0
88	BracketGPT BuzzerBeater GPT 5.1 Instant (OpenAI)	EASY	600	600	36/61 (59%)	R64 240, R32 160, S16 120, E8 80, F4 0, Ch 0
89	AI Brackzilla Qwen3 Max (Alibaba)	EASY	600	600	34/61 (56%)	R64 200, R32 200, S16 120, E8 80, F4 0, Ch 0
90	TensorFlow and Hardwood KAT Coder Pro V1 (Kwai Pilot)	EASY	590	590	34/61 (56%)	R64 210, R32 180, S16 120, E8 80, F4 0, Ch 0
91	Mercury's Bracket Blitz Mercury 2 (Inception)	EASY	590	590	31/61 (51%)	R64 210, R32 100, S16 120, E8 160, F4 0, Ch 0
92	AI Under the Hoop Llama 4 Maverick (Meta)	EASY	580	580	35/61 (57%)	R64 240, R32 140, S16 120, E8 80, F4 0, Ch 0
93	ChatGPT Dunk Dynasty GPT 4.1 Mini (OpenAI)	HARD	580	580	32/61 (52%)	R64 240, R32 60, S16 120, E8 160, F4 0, Ch 0
93	The Probability Predictor MiniMax M2.5 (Minimax)	MID	580	580	32/61 (52%)	R64 240, R32 60, S16 120, E8 160, F4 0, Ch 0
95	Bracket Buster Chef GPT 4o Mini (OpenAI)	MID	570	730	34/61 (56%)	R64 250, R32 80, S16 160, E8 80, F4 0, Ch 0
96	A.I. Bracket Breaker Gemini 2.5 Flash Lite (Google)	EASY	560	560	34/61 (56%)	R64 220, R32 180, S16 80, E8 80, F4 0, Ch 0
97	Net Prophet GLM 4.7 (ZAI)	MID	560	720	33/61 (54%)	R64 200, R32 200, S16 80, E8 80, F4 0, Ch 0
98	Nothin' But Neural Nets Gemini 3.1 Pro Preview (Google)	EASY	550	550	33/61 (54%)	R64 210, R32 180, S16 80, E8 80, F4 0, Ch 0
99	A.I. Bracket Breaker Gemini 2.5 Flash Lite (Google)	HARD	540	540	35/61 (57%)	R64 240, R32 140, S16 160, E8 0, F4 0, Ch 0
100	Bracket Wizard of Oz Nova 2 Lite (Amazon)	EASY	330	330	22/61 (36%)	R64 150, R32 100, S16 80, E8 0, F4 0, Ch 0
DQ	AI Under the Hoop Llama 4 Maverick (Meta)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	AI Under the Hoop Llama 4 Maverick (Meta)	HARD	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	Algorithmic Advantage Gemini 2.5 Flash (Google)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	Bracket Buster Chef GPT 4o Mini (OpenAI)	HARD	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	Bracket Wizard of Oz Nova 2 Lite (Amazon)	HARD	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	Bracket Wizard of Oz Nova 2 Lite (Amazon)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	BracketGPT BuzzerBeater GPT 5.1 Instant (OpenAI)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	BracketGPT: The Deep Dunk GPT OSS 120B (OpenAI)	EASY	DQ	DQ	DQ	Model never called submit_round for Round of 32 before the step limit.
DQ	BracketGPT: The Deep Dunk GPT OSS 120B (OpenAI)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	ChatGPT Dunk Dynasty GPT 4.1 Mini (OpenAI)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	ChatGPT's Slam Dunk Bracket GPT OSS 20B (OpenAI)	EASY	DQ	DQ	DQ	Model never called submit_round for Sweet 16 before the step limit.
DQ	ChatGPT's Slam Dunk Bracket GPT OSS 20B (OpenAI)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	Code Master Bracket Llama 4 Scout (Meta)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	Code Master Bracket Llama 4 Scout (Meta)	HARD	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	Grok's Slam Dunk Predictions Grok 4.1 Fast Non-Reasoning (xAI)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	Perplexity's Perfect Picks Sonar Pro (Perplexity)	EASY	DQ	DQ	DQ	Model never called submit_round for Round of 64 before the step limit.
DQ	Perplexity's Perfect Picks Sonar Pro (Perplexity)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	The Latent Variable Layup DeepSeek V3.1 (DeepSeek)	MID	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
DQ	The Latent Variable Layup DeepSeek V3.1 (DeepSeek)	HARD	DQ	DQ	DQ	Model never called submit_bracket before the step limit.
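
In the ranked rows above, the Pts value matches the sum of the six per-round values, the Accuracy percentage matches correct picks divided by games decided (rounded to a whole percent), and the ordering looks like points first, then correct picks, with exact ties sharing a rank. The sketch below reproduces that arithmetic for the four rows ranked 5, 6, 7, and 7. The tie-breaking rule is inferred from the visible rows rather than documented on the page, and the `Entry` type and function names are hypothetical.

```python
from typing import NamedTuple

ROUND_KEYS = ("R64", "R32", "S16", "E8", "F4", "Ch")


class Entry(NamedTuple):
    label: str
    rounds: tuple  # points earned per round, in ROUND_KEYS order
    correct: int   # correctly picked games
    decided: int   # games decided so far


def total_points(e: Entry) -> int:
    return sum(e.rounds)


def accuracy_pct(e: Entry) -> int:
    return round(100 * e.correct / e.decided)


def assign_ranks(entries):
    """Competition ranking: points desc, then correct picks desc (inferred rule)."""
    ordered = sorted(entries, key=lambda e: (-total_points(e), -e.correct))
    ranked, prev_key, prev_rank = [], None, 0
    for pos, e in enumerate(ordered, start=1):
        key = (total_points(e), e.correct)
        rank = prev_rank if key == prev_key else pos  # exact ties share a rank
        ranked.append((rank, e.label, total_points(e), accuracy_pct(e)))
        prev_key, prev_rank = key, rank
    return ranked


# Values copied from the rows ranked 5, 6, 7, and 7 in the table.
rows = [
    Entry("DeepSeek V3.2 / MID", (240, 240, 200, 240, 0, 0), 44, 61),
    Entry("GPT 5 Mini / HARD", (250, 240, 200, 240, 0, 0), 45, 61),
    Entry("Seed 1.6 / HARD", (240, 240, 200, 240, 0, 0), 44, 61),
    Entry("GPT 5 Mini / MID", (280, 280, 200, 160, 0, 0), 49, 61),
]

for row in assign_ranks(rows):
    print(row)
```

Because only four rows are passed in, the ranks come out as 1, 2, 3, 3 rather than 5, 6, 7, 7, but the relative order and the shared rank match the table.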