Close Menu

    Subscribe to Updates

    Get the latest Tech news from SynapseFlow

    What's Hot

    NASA Selects Finalists in Student Aircraft Maintenance Competition – NASA

    March 13, 2026

    D-Link D501 5G adapter review

    March 13, 2026

    Disney+ is rolling out its TikTok-like ‘Verts’ short-form video feed

    March 13, 2026
    Facebook X (Twitter) Instagram
    • Homepage
    • About Us
    • Contact Us
    • Privacy Policy
    Facebook X (Twitter) Instagram YouTube
    synapseflow.co.uksynapseflow.co.uk
    • AI News & Updates
    • Cybersecurity
    • Future Tech
    • Reviews
    • Software & Apps
    • Tech Gadgets
    synapseflow.co.uksynapseflow.co.uk
    Home»AI News & Updates»Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks
    Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks
    AI News & Updates

    Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks

    The Tech GuyBy The Tech GuyDecember 13, 2025No Comments3 Mins Read0 Views
    Share
    Facebook Twitter LinkedIn Pinterest Email
    Advertisement



    Ai2's new Olmo 3.1 extends reinforcement learning training for stronger reasoning benchmarks

    The Allen Institute for AI (Ai2) recently released what it calls its most powerful family of models yet, Olmo 3. But the company kept iterating on the models, expanding its reinforcement learning (RL) runs, to create Olmo 3.1.

    Advertisement

    The new Olmo 3.1 models focus on efficiency, transparency, and control for enterprises. 

    Ai2 updated two of the three versions of Olmo 2: Olmo 3.1 Think 32B, the flagship model optimized for advanced research, and Olmo 3.1 Instruct 32B, designed for instruction-following, multi-turn dialogue, and tool use. 

    Olmo 3 has a third version, Olmo 3-Base for programming, comprehension, and math. It also works well for continue fine-tuning. 

    Ai2 said that to upgrade Olmo 3 Think 32B to Olmo 3.1, its researchers extended its best RL run with a longer training schedule. 

    “After the original Olmo 3 launch, we resumed our RL training run for Olmo 3 32B Think, training for an additional 21 days on 224 GPUs with extra epochs over our Dolci-Think-RL dataset,” Ai2 said in a blog post. “This yielded Olmo 3.1 32B Think, which brings substantial gains across math, reasoning, and instruction-following benchmarks: improvements of 5+ points on AIME, 4+ points on ZebraLogic, 4+ points on IFEval, and 20+ points on IFBench, alongside stronger performance on coding and complex multi-step tasks.”

    To get to Olmo 3.1 Instruct, Ai2 said its researchers applied the recipe behind the smaller Instruct size, 7B, to the larger model.

    Olmo 3.1 Instruct 32B is "optimized for chat, tool use, & multi-turn dialogue—making it a much more performant sibling of Olmo 3 Instruct 7B and ready for real-world applications,” Ai2 said in a post on X. 

    For now, the new checkpoints are available on the Ai2 Playground or Hugging Face, with API access coming soon. 

    Better performance on benchmarks

    The Olmo 3.1 models performed well on benchmark tests, predictably beating the Olmo 3 models. 

    Olmo 3.1 Think outperformed Qwen 3 32B models in the AIME 2025 benchmark and performed close to Gemma 27B. 

    Olmo 3.1 Instruct performed strongly against its open-source peers, even beating models like Gemma 3 on the Math benchmark.

    “As for Olmo 3.1 32B Instruct, it’s a larger-scale instruction-tuned model built for chat, tool use, and multi-turn dialogue. Olmo 3.1 32B Instruct is our most capable fully open chat model to date and — in our evaluations — the strongest fully open 32B-scale instruct model,” the company said. 

    Ai2 also upgraded its RL-Zero 7B models for math and coding. The company said on X that both models benefited from longer and more stable training runs.

    Commitment to transparency and open source 

    Ai2 previously told VentureBeat that it designed the Olmo 3 family of models to offer enterprises and research labs more control and understanding of the data and training that went into the model. 

    Organizations could add to the model’s data mix and retrain it to also learn from what’s been added.  

    This has long been a commitment for Ai2, which also offers a tool called OlmoTrace that tracks how LLM outputs match its training data.  

    “Together, Olmo 3.1 Think 32B and Olmo 3.1 Instruct 32B show that openness and performance can advance together. By extending the same model flow, we continue to improve capabilities while retaining end-to-end transparency over data, code, and training decisions,” Ai2 said. 

    Advertisement
    Share. Facebook Twitter Pinterest LinkedIn Tumblr Email
    The Tech Guy
    • Website

    Related Posts

    Railway secures $100 million to challenge AWS with AI-native cloud infrastructure

    January 22, 2026

    Claude Code costs up to $200 a month. Goose does the same thing for free.

    January 20, 2026

    Listen Labs raises $69M after viral billboard hiring stunt to scale AI customer interviews

    January 16, 2026

    Salesforce rolls out new Slackbot AI agent as it battles Microsoft and Google in workplace AI

    January 13, 2026

    Converge Bio raises $25M, backed by Bessemer and execs from Meta, OpenAI, Wiz

    January 13, 2026

    Anthropic launches Cowork, a Claude Desktop agent that works in your files — no coding required

    January 13, 2026
    Leave A Reply Cancel Reply

    Advertisement
    Top Posts

    The iPad Air brand makes no sense – it needs a rethink

    October 12, 202516 Views

    ChatGPT Group Chats are here … but not for everyone (yet)

    November 14, 20258 Views

    Facebook updates its algorithm to give users more control over which videos they see

    October 8, 20258 Views
    Stay In Touch
    • Facebook
    • YouTube
    • TikTok
    • WhatsApp
    • Twitter
    • Instagram
    Advertisement
    About Us
    About Us

    SynapseFlow brings you the latest updates in Technology, AI, and Gadgets from innovations and reviews to future trends. Stay smart, stay updated with the tech world every day!

    Our Picks

    NASA Selects Finalists in Student Aircraft Maintenance Competition – NASA

    March 13, 2026

    D-Link D501 5G adapter review

    March 13, 2026

    Disney+ is rolling out its TikTok-like ‘Verts’ short-form video feed

    March 13, 2026
    categories
    • AI News & Updates
    • Cybersecurity
    • Future Tech
    • Reviews
    • Software & Apps
    • Tech Gadgets
    Facebook X (Twitter) Instagram Pinterest YouTube Dribbble
    • Homepage
    • About Us
    • Contact Us
    • Privacy Policy
    © 2026 SynapseFlow All Rights Reserved.

    Type above and press Enter to search. Press Esc to cancel.

    Ad Blocker Enabled!
    Ad Blocker Enabled!
    Our website is made possible by displaying online advertisements to our visitors. Please support us by disabling your Ad Blocker.