News

The rStar2-Agent framework boosts a 14B model to outperform a 671B giant, offering a path to state-of-the-art AI without ...
The breakthrough of RhymeRL stems from an in-depth analysis of the Rollout phase in the reinforcement learning process. The research found that there is a significant “historical similarity” in the ...
In September 2025, the INFLY TECH team, in collaboration with Fudan University and Griffith University, released a groundbreaking study that delves into a puzzling phenomenon encountered during the ...
CoreWeave said it will acquire OpenPipe, a Bellevue, Wash.-based startup that helps developers train AI agents using ...
The newly released ERNIE X1.1 reasoning model is a significant upgrade that delivers major advancements across core ...
Reinforcement learning does NOT make the base model more intelligent; it limits the base model's world in exchange for better early-pass performance. Graphs show that after pass 1000 the reasoning ...
Baidu is back with another AI announcement, and this time they’re really swinging for the fences. The Chinese tech giant just ...
CoreWeave, Inc. (NASDAQ: CRWV), the AI Hyperscaler™, today announced a definitive agreement to acquire OpenPipe Inc, a ...
(Updated Monday, 1/27 8am) DeepSeek-R1’s ...
AI cheats not because it’s broken, but because it has learned our own bad habit—rewarding what feels good over what is true.
A U.S. Naval Research Laboratory (NRL) research team successfully conducted the first reinforcement learning (RL) control of ...