Coding Test - Search News

1d on MSN

I put GPT-5.5 through a 10-round test: It scored 93/100, losing points only for exuberance

Hosted on MSN

Chinese open-weight AI models surpass major rivals in coding tests

Chinese AI labs Z.ai and Moonshot AI have released open-weight models that outperformed leading proprietary systems from OpenAI, Google, and Anthropic on key coding benchmarks. GLM-5.1 and Kimi K2.6 ...

Hosted on MSN

Chinese open-weight AI models surpass closed rivals in coding tests

Chinese open-weight AI models GLM-5.1 and Kimi K2.6 have overtaken leading closed-source rivals from OpenAI, Google, and Anthropic in coding benchmarks like SWE-Bench Pro. These models not only ...

10d

The Most Ignored Practice In AI Coding: Test-Driven Development

What Cherny is describing, in engineering terms, is the operating principle behind test-driven development (TDD). TDD has ...

I Am Officially 0.1% More Excited About the Future of ChatGPT

On Thursday, OpenAI announced the release of GPT-5.5, the latest update to its flagship model. It is exactly as much of an upgrade as the jump from 5.4 to 5.5 would suggest.

Developer Tech

OpenAI brings GPT-5.5 to Codex for coding tasks

OpenAI is rolling out GPT-5.5 in Codex, with a 400K context window and higher coding benchmark scores than GPT-5.4.

Decrypt

Claude Opus 4.7 Is Here: Anthropic’s Latest Model Delivers, But It’s a Token Eating Machine

Anthropic's new flagship model Claude Opus 4.7 beat every benchmark we threw at it, and eats tokens like a hungry teenager.

2d on MSN

Tencent Unveils AI Model in High-Stakes Test for OpenAI Hire

Tencent Holdings Ltd. revealed a major upgrade to its foundational model, marking the first high-stakes test for China’s most ...

10d

Endor Labs Launches Agentic Code Security Benchmark, Finds Top-Performing AI Coding Agents Pass Tests But Still Fail Security

Endor Labs today announced the launch of the agentic code security benchmark, extending the existing SusVibes framework from leading academic researchers to evaluate how securely AI coding agents ...

OpenAI releases GPT-5.5, a more powerful engine for coding, science, and general work

The company is positioning its newest system as its strongest agentic coding model yet, as it faces pressure to keep pace ...

Decrypt

Google Fixes AI Coding Tool Flaw That Let Attackers Execute Malicious Code: Report

Researchers say a prompt injection bug in Google's Antigravity AI coding tool could have let attackers run commands, despite ...
