Test suites · world-cup-2026-v3 · all 10 phases scored · 182 plans
What TestSprite probes, line by line.
Every score on the leaderboard derives from this suite. Each plan is a structured natural-language test that the TestSprite agent reads and then executes against the deployed URL in a real headless Chromium browser. Each plan resolves to pass, fail, or blocked. The full plan JSONs are on GitHub and open to pull requests.
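As a rough illustration of the pass / fail / blocked outcome model, the sketch below tallies results over a list of plans. The `PlanResult` shape, its field names, and the `tally` helper are hypothetical; the actual plan JSON schema and the leaderboard's scoring formula are not described on this page and may differ.

```python
from collections import Counter
from dataclasses import dataclass

VALID_STATUSES = ("pass", "fail", "blocked")


# Hypothetical record shape; the real plan JSON schema may differ.
@dataclass(frozen=True)
class PlanResult:
    plan_id: str   # short hex id, e.g. "a1b2c3d4"
    priority: str  # "P0", "P1", or "P2"
    status: str    # "pass", "fail", or "blocked"


def tally(results):
    """Count outcomes per status, rejecting unknown status values."""
    counts = Counter({status: 0 for status in VALID_STATUSES})
    for result in results:
        if result.status not in VALID_STATUSES:
            raise ValueError(f"unknown status: {result.status!r}")
        counts[result.status] += 1
    return dict(counts)
```

This only counts outcomes; how those counts turn into a leaderboard score is left open here.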
Suite
Phase 1 · Landing page · Bracket UI, 12 group standings, FIFA-style hero
- a1b2c3d4 · Clicking a Round-of-16 cell navigates from homepage to a match detail page · P0
- b2c3d4e5 · Clicking the Groups navigation surfaces all twelve groups A through L · P1
- c3d4e5f6 · Clicking a Group A team row navigates to a page that names that team · P1
- d4e5f6a7 · Unknown match slug shows a 404 view and recovers via home link · P1
- e5f6a7b8 · Clicking the Final cell reaches a match detail page labeled Final · P0
- f6a7b8c9 · Clicking one cell from each bracket stage reaches four distinct match pages · P0
- a7b8c9d0 · Clicking a team row in Group A then in Group L lands on two distinct destinations · P1
- b8c9d0e1 · Three Round-of-16 cell clicks lead to three different match detail pages · P0
- c9d0e1f2 · Clicking a bracket cell reaches a match detail page that shows kickoff and venue · P1
Phase 2 · Match details · 78 /match/<id> SSR permalinks with teams, flags, kickoff, venue, round
Phase 3 · Predictions · Per-match winner + scoreline + probability bars + reasoning; KO tie resolution; champion locked at SIGSTART
- 065b46e6 · No prediction has a team predicted to play itself · P0
- 4879d4e3 · Predicted scoreline values stay within a sane 0 to 9 range · P0
- 55cf53a7 · Knockout-stage predictions never declare a draw winner · P1
- 890bb57e · Tied knockout predictions surface an extra-time or penalties resolution · P1
- fc0b0802 · Group-stage predictions allow draw as a valid winner · P2
- fcec9b10 · Prediction block includes a reasoning paragraph · P1
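The six Phase 3 plans above read as invariants over each prediction. TestSprite checks them through the browser, but the same rules can be sketched as a plain data check. Everything in this sketch is an assumption made for illustration: the `Prediction` fields, the `"draw"` sentinel, the stage and resolution labels, and the `violations` helper are not taken from the suite.

```python
from dataclasses import dataclass
from typing import Optional


# Hypothetical prediction record; field names are illustrative only.
@dataclass
class Prediction:
    home: str
    away: str
    home_goals: int
    away_goals: int
    winner: str                       # a team name, or "draw"
    stage: str                        # "group" or "knockout"
    resolution: Optional[str] = None  # "extra-time" or "penalties"
    reasoning: str = ""


def violations(p):
    """Return the list of Phase 3 style invariants that a prediction breaks."""
    problems = []
    if p.home == p.away:
        problems.append("team plays itself")
    if not (0 <= p.home_goals <= 9 and 0 <= p.away_goals <= 9):
        problems.append("scoreline out of range")
    if p.stage == "knockout":
        # Knockout matches must produce a winner.
        if p.winner == "draw":
            problems.append("knockout draw winner")
        # A level scoreline needs an extra-time or penalties resolution.
        tied = p.home_goals == p.away_goals
        if tied and p.resolution not in ("extra-time", "penalties"):
            problems.append("tied knockout lacks resolution")
    # Group-stage draws are valid, so no check is applied to them.
    if not p.reasoning.strip():
        problems.append("missing reasoning")
    return problems
```

A group-stage 1-1 draw with a reasoning paragraph yields an empty list, while the same prediction marked as a knockout match reports both knockout violations.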
Phase 4 · Lineups · Lineups tab: predicted XI (11 per team), formation label + pitch diagram, per-player injury/suspension notes
Phase 5 · Your analysis · Analysis tab: 3-5 paragraphs per match with inline citations resolving to a References panel; no boilerplate; 200-600 chars/paragraph
Phase 6 · Related news · News section: ≥3 fresh items per match with title, source, date, and HEAD-checked URL; source diversity across domains
Phase 7 · Betting odds · Odds tab: implied probabilities + consensus row + agent-vs-market consistency
Phase 8 · Multi-language i18n · en/es/pt locale routes, persisted switcher, real translation, no placeholder data
Phase 9 · Matchday polish · Design-token skin + light/dark + completeness, graceful 404, a11y, perf
Phase 10 · Final polish · release · Branded hero image + AA contrast, cross-surface consistency (champion/scoreline/groups/prose), no-TBD authenticity, editorial bracket, light-default theme, prior-phase regression
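The Phase 7 description mentions implied probabilities and a consensus row. One common way to derive these is sketched below, assuming decimal odds: take the reciprocal of each price, then divide by the total so the bookmaker margin (overround) is removed. The page does not say which odds format the apps use or how the consensus row is computed, so the averaging in `consensus` and both function names are assumptions.

```python
def implied_probabilities(decimal_odds):
    """Convert decimal odds to probabilities normalized to sum to 1."""
    if any(price <= 1.0 for price in decimal_odds.values()):
        raise ValueError("decimal odds must be greater than 1.0")
    raw = {outcome: 1.0 / price for outcome, price in decimal_odds.items()}
    overround = sum(raw.values())  # exceeds 1.0 when the book has a margin
    return {outcome: p / overround for outcome, p in raw.items()}


def consensus(books):
    """Average the normalized probabilities across bookmakers (assumed rule)."""
    if not books:
        raise ValueError("at least one bookmaker is required")
    per_book = [implied_probabilities(book) for book in books]
    outcomes = per_book[0].keys()
    return {o: sum(p[o] for p in per_book) / len(per_book) for o in outcomes}
```

For example, decimal odds of 2.0 / 4.0 / 4.0 for home / draw / away carry no margin and map directly to 0.5 / 0.25 / 0.25.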
The v3.2 spec ships 10 feature-themed phases: landing (phase 1) → match details (phase 2) → predictions (phase 3) → lineups (phase 4) → analysis (phase 5) → related news (phase 6) → betting odds (phase 7) → multi-language (phase 8) → light/dark polish (phase 9) → final polish (phase 10). All 10 are scored, for 182 plans in total, and each phase's plans are re-run cumulatively against the deployed app at every later phase. Cohorts 1 (v1) and 2 (v2) have been retired as dry runs.
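The cumulative re-run can be pictured as follows: at a given phase, the plans for that phase and every earlier phase are selected. This is a minimal sketch under that reading; the `plans_by_phase` mapping and the `plans_for_checkpoint` helper are hypothetical and do not reflect how TestSprite actually schedules runs.

```python
def plans_for_checkpoint(plans_by_phase, current_phase):
    """Return plan ids for the current phase and all earlier phases, in phase order."""
    selected = []
    for phase in sorted(plans_by_phase):
        if phase <= current_phase:
            selected.extend(plans_by_phase[phase])
    return selected
```

Under this reading, a regression introduced in a late phase still shows up as a failure on an early-phase plan, because those plans keep running.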