I put GPT-5.5 through a 10-round test: It scored 93/100, losing points only for exuberance ...
Chinese AI labs Z.ai and Moonshot AI have released open-weight models that outperformed leading proprietary systems from OpenAI, Google, and Anthropic on key coding benchmarks. GLM-5.1 and Kimi K2.6 ...
Chinese open-weight AI models GLM-5.1 and Kimi K2.6 have overtaken leading closed-source rivals from OpenAI, Google, and Anthropic in coding benchmarks like SWE-Bench Pro. These models not only ...
What Cherny is describing, in engineering terms, is the operating principle behind test-driven development (TDD). TDD has ...
On Thursday, OpenAI announced the release of GPT-5.5, the latest update to its flagship model. It is exactly as much of an upgrade as the jump from 5.4 to 5.5 would suggest.
OpenAI is rolling out GPT-5.5 in Codex, with a 400K context window and higher coding benchmark scores than GPT-5.4.
Anthropic's new flagship model Claude Opus 4.7 beat every benchmark we threw at it, and eats tokens like a hungry teenager.
Tencent Holdings Ltd. revealed a major upgrade to its foundational model, marking the first high-stakes test for China’s most ...
Endor Labs, today announced the launch of the agentic code security benchmark, extending the existing SusVibes framework from leading academic researchers to evaluate how securely AI coding agents ...
The company is positioning its newest system as its strongest agentic coding model yet, as it faces pressure to keep pace ...
Researchers say a prompt injection bug in Google's Antigravity AI coding tool could have let attackers run commands, despite ...