port-sqlite-to-rust · best: 100% · GPT-5.4 xhigh · 5 hrs · $52

SQLite-compatible SQL engine

Build a SQLite-compatible SQL engine in Rust from scratch. Match the reference engine's behavior across 5.8 million sqllogictest records spanning DDL, joins, subqueries, and the indexed milestones that separate naive implementations from real query planners.

Target language: Rust (default)
Total tests: 5,806,415
Milestones: 18 ordered
Unlock threshold: 60% pass to unlock the next
Measurement window: 12 hours
Reference impl in workspace: Yes — full sqlite3.c amalgamation as study material

What the agent starts with

The agent starts with a scaffolded Cargo project: a working JSONL request/response driver, type-system stubs, an empty parser with all the AST types pre-declared, and an executor with "not implemented" stubs for every SQL statement variant. The full SQLite source amalgamation (~256,000 lines of C) sits in reference/ as study material.

The job is to fill in the parser, the executor, the storage layer, the scalar and aggregate functions, and an index-aware query planner — to the point where the engine matches real SQLite's behavior on the sqllogictest corpus. Linking against rusqlite or shelling out to sqlite3 is explicitly forbidden.

Test surface, in order

Milestones unlock sequentially. The next milestone is only attempted when the current one passes at least 60%. The first eight milestones are pure SQL semantics (~11k records total). The next seven add indexed operations (~2M records). The final three are the long tail: ~3.7M records of scalar expressions, aggregates, and scaled joins.

#	Milestone	Suite	Tests
1	DDL Foundations	ddl_foundations	69
2	UPDATE Mutations	update_mutations	27
3	IN / NOT IN with NULLs	membership_null_logic	267
4	SELECT: CASE and Subqueries	select_case_subqueries	1,031
5	SELECT: NULLs and COALESCE	select_nulls_coalesce	1,031
6	SELECT: EXISTS and Correlated	select_exists_correlated	3,351
7	SELECT: UNION / EXCEPT / INTERSECT	select_set_operations	3,857
8	SELECT: Multi-Table Joins	select_multitable_joins	1,436
9	Indexed DELETE and Filter	indexed_delete_filter	160,942
10	Indexed IN / NOT IN	indexed_membership	132,907
11	Indexed BETWEEN	indexed_range	124,618
12	Indexed ORDER BY	indexed_ordering	808,122
13	Indexed Predicate Commutativity	indexed_commutativity	514,321
14	Indexed Randomized Predicates	indexed_randomized	208,168
15	Indexed Views	indexed_views	109,229
16	Scalar Expressions	expressions_scalar	1,200,366
17	Aggregate Functions	aggregates	1,292,350
18	Multi-Table Joins (scaled)	joins_multitable	1,244,323
	Total		5,806,415

Per-run results

Every production sqlite run, sorted by peak milestone-completion percentage. "Wall" is when the last useful state was recorded — for runs that ended in a platform failure, this is when the failure happened, not the configured budget.

Model	Agent	Budget	Wall	Peak %	Tests	Cost	Tokens	Outcome
GPT-5.4 (xhigh)	codex	12 hrs	5 hrs	100%	5,806,415 / 5,806,415	$52	139 M	all tests pass
GPT-5.5 (xhigh)	codex	12 hrs	8 hrs 54 min	99.91%	5,803,177 / 5,806,415	$258	386 M	platform bug
GPT-5.4-mini	codex	2 hrs	2 hrs	68.75%	1,783,123 / 5,806,415	$9	64 M	time budget
GPT-5.4-mini	codex	12 hrs	7 hrs 42 min	58.17%	431,937 / 5,806,415	$20	162 M	model gave up
GPT-5.5 (xhigh)	codex	12 hrs	1 hr 24 min	44.17%	164,453 / 5,806,415	$33	48 M	platform crash
DeepSeek-v4-pro	opencode	12 hrs	4 hrs 54 min	36.59%	8,038 / 5,806,415	$4.61	155 M	model gave up
GPT-5.4-mini	codex	12 hrs	42 min	14.44%	325 / 5,806,415	$3	23 M	platform crash
GPT-5.4-mini	codex	2 hrs	18 min	0%	0 / 5,806,415	$2.39	22.7 M	sandbox lost

Tokens are total cumulative reported tokens: Codex input includes cached reads; OpenCode Go cache reads/writes are added from provider usage. Terminal labels come from platform state.