Ai Benchmarks for Code

Quesma Releases OTelBench: Independent Benchmark Reveals Frontier LLMs Struggle with Real-World SRE Tasks

New benchmark shows top LLMs achieve only 29% pass rate on OpenTelemetry instrumentation, exposing the gap between ...

Explained: Sarvam AI that managed to beat Google Gemini and ChatGPT in key AI benchmarks

Bengaluru-based Sarvam AI has outperformed Google’s Gemini and OpenAI’s ChatGPT in Indian language benchmarks, showcasing locally trained models for documents, speech, and low-bandwidth use across ...

InfoWorld

Why benchmarks are key to AI progress

Researchers are racing to develop more challenging, interpretable, and fair assessments of AI models that reflect real-world use cases. The stakes are high. Benchmarks are often reduced to leaderboard ...

U.S. News & World Report

New AI Benchmarks Test Speed of Running AI Applications

SAN FRANCISCO (Reuters) - Artificial intelligence group MLCommons unveiled two new benchmarks that it said can help determine how quickly top-of-the-line hardware and software can run AI applications.

Detroit Free Press

First Benchmark for Legacy Code Comprehension Shows Specialized AI Approach Outperforms General-PurposeModels

LegacyCodeBench tests whether AI can understand COBOL well enough to document itaccurately not just generate plausible text NEW YORK, NY, UNITED STATES, January 13 ...

Some results have been hidden because they may be inaccessible to you

Show inaccessible results