SERIES
AI Models
Reviews, tests, and analysis of specific AI models. What each model actually does well, where it breaks down, and what the release means for builders.
3 pieces · 19m total read · Last updated Apr 2026
Review
Claude Opus 4.7: +11 SWE-Bench, Cyber Nerfed, and It Pushes Back
Better at coding. Deliberately worse at hacking. And it argues when you're wrong. Anthropic's Opus 4.7 is the first frontier model to ship a regression on purpose — and the reasoning matters.
Apr 14 · 7 min
Review
Gemma 4: 8 Real-World Tests (JSON, Code, Vision, Reasoning)
We pushed Google's Gemma 4 through 8 real-world execution tests covering structured JSON extraction, advanced vision QA, and live code evaluations. Here is what we found and the architectural lessons learned while evaluating it.
Apr 13 · 6 min
Explainer
The Claude Mythos Leak: Why the Capybara Tier Was Withheld
A large CMS leak confirmed the existence of Claude Mythos (the 'Capybara' tier). At 10 trillion parameters, the model is so capable at automating elite-level cybersecurity exploits that Anthropic locked it behind Project Glasswing.
Apr 9 · 6 min