SERIES

AI Models

Reviews, tests, and analysis of specific AI models. What each model actually does well, where it breaks down, and what the release means for builders.

3 pieces, 19m total read, last updated Apr 2026

Claude Opus 4.7: +11 SWE-Bench, Cyber Nerfed, and It Pushes Back

Better at coding. Deliberately worse at hacking. And it argues when you're wrong. Anthropic's Opus 4.7 is the first frontier model to ship a regression on purpose — and the reasoning matters.

Gemma 4: 8 Real-World Tests (JSON, Code, Vision, Reasoning)

We pushed Google's Gemma 4 through 8 real-world execution tests covering structured JSON extraction, advanced vision QA, and live code evaluations. Here is what we found and the architectural lessons learned while evaluating it.

The Claude Mythos Leak: Why the Capybara Tier Was Withheld

A massive CMS leak confirmed the existence of Claude Mythos (the 'Capybara' tier). Operating at 10 trillion parameters, the model is so advanced at automating elite-level cybersecurity exploits that Anthropic locked it behind Project Glasswing.
