    New training method boosts AI multimodal reasoning with smaller, smarter datasets

By The Tech Guy | December 3, 2025 | 6 Mins Read
    Researchers at MiroMind AI and several Chinese universities have released OpenMMReasoner, a new training framework that improves the capabilities of language models in multimodal reasoning.

    The framework uses a two-stage process. It first refines a base model with a curated dataset in a supervised fine-tuning (SFT) stage. Then, a reinforcement learning (RL) stage guides the model to reason more effectively in tasks that involve both text and visual data. 
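The ordering of the two stages can be sketched as a short driver function. This is only an illustrative outline of the recipe's structure as described above; `sft_step` and `rl_step` are hypothetical placeholders for the actual training routines, which the framework implements in full.

```python
def openmm_reasoner_recipe(base_model, sft_dataset, rl_dataset,
                           sft_step, rl_step):
    """High-level sketch of the two-stage recipe: supervised
    fine-tuning on curated reasoning traces, then reinforcement
    learning to refine multimodal reasoning. Only the ordering is
    taken from the article; the step functions are placeholders."""
    model = sft_step(base_model, sft_dataset)  # Stage 1: SFT on curated data
    model = rl_step(model, rl_dataset)         # Stage 2: RL refinement
    return model
```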

    Experiments show that models trained with OpenMMReasoner outperform other leading visual reasoning models, often while being trained on a smaller, higher-quality dataset. The framework and all its assets, including a trained 7B model, are fully open source, providing a reliable foundation for building applications that require traceability and robustness.

    According to Kaichen Zhang, co-author of a research paper that outlines the new method, OpenMMReasoner offers significant benefits for businesses looking beyond large, closed systems. "A smaller open-source reasoning model has practical advantages: Enterprises can deploy it locally, reduce latency, lower token costs associated with long chains of thought, maintain full control over their data and [it is] fine-tunable to adapt to their specific downstream task," he told VentureBeat.

    The challenge of transparent multimodal reasoning

    Recent advances in reinforcement learning with verifiable rewards (RLVR) have significantly improved the reasoning abilities of large language models (LLMs). RLVR trains LLMs to generate chain-of-thought (CoT) tokens (which mimic the reasoning processes humans use) before generating the final answer. This improves the model’s capability to solve complex reasoning tasks such as math and coding. 
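The "verifiable" part of RLVR can be made concrete with a minimal reward function: the model's chain of thought is ignored for scoring, and only the final answer is checked against ground truth. The `Answer:` convention below is an illustrative assumption, not the exact format any particular paper uses.

```python
import re

def verifiable_reward(completion: str, ground_truth: str) -> float:
    """Minimal RLVR-style reward: 1.0 if the final answer matches
    the ground truth exactly, else 0.0. Assumes the model ends its
    chain of thought with 'Answer: <value>' (an illustrative
    convention)."""
    match = re.search(r"Answer:\s*(.+)\s*$", completion)
    if match is None:
        return 0.0  # no parsable final answer -> no reward
    return 1.0 if match.group(1).strip() == ground_truth.strip() else 0.0
```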

    Motivated by this success, researchers have applied similar RL-based methods to large multimodal models (LMMs), showing that the benefits can extend beyond text to improve visual understanding and problem-solving across different modalities.

    However, a lack of transparency in the training pipeline has been a major barrier. Many studies on multimodal reasoning do not provide detailed information about their data curation and training processes, making it difficult to reproduce their results or understand what makes these models work.

    “This lack of openness restricts reproducibility and obscures a deeper understanding of how reasoning-capable LMMs are actually built and how their training dynamics evolve,” the researchers note.

    The OpenMMReasoner recipe

    OpenMMReasoner addresses this gap with a fully transparent and scalable training recipe built on open-source LMMs. The researchers found that curating high-quality data was critical, and that while drawing on diverse data sources matters, increasing the diversity of correct answers to the same question proved an even more important axis for improvement.

    The first stage of the recipe is a three-step supervised fine-tuning (SFT) pipeline. It begins with data sourcing, where the team collected approximately 103,000 raw question-answer pairs from public datasets covering general visual Q&A and reasoning tasks. Next, they added a data distillation step, using a powerful model (Qwen3-VL-235B-Instruct) to generate new, high-quality reasoning traces for selected questions. (This distilled data is then used to train a smaller model.)

    To increase answer diversity, the team generated multiple verified reasoning traces for each question. This expanded the dataset to 583,000 samples. Finally, they implemented a “domain mixing” phase, adding data from mathematical reasoning domains to further generalize the model's capabilities, resulting in a final SFT dataset of 874,000 examples.
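The answer-diversity step described above can be sketched as a sampling-and-filtering loop: draw several reasoning traces per question from a teacher model and keep only the ones whose final answer verifies. `generate_trace` and `verify` are hypothetical stand-ins for the teacher model and answer checker, which the article does not specify in detail.

```python
def expand_with_diverse_traces(dataset, generate_trace, verify, k=4):
    """Illustrative sketch of answer-diversity expansion: for each
    question, sample up to k reasoning traces and keep only those
    that verify against the ground-truth answer."""
    expanded = []
    for item in dataset:
        for _ in range(k):
            trace = generate_trace(item["question"])
            if verify(trace, item["answer"]):  # keep verified traces only
                expanded.append({"question": item["question"],
                                 "trace": trace,
                                 "answer": item["answer"]})
    return expanded
```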

    The second stage is an RL recipe that uses a smaller, 74,000-sample dataset curated from domains like science, math and puzzles. The model is trained with a composite reward function that considers both the correctness of the final answer and the consistency of the output format. To improve efficiency, the process includes a penalty for "overthinking," discouraging the model from generating excessively long answers (a problem with many reasoning models trained through RL, which mistakenly learn to generate overly long reasoning sequences, resulting in excess cost and slower answers).
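A composite reward of this shape can be sketched as correctness plus a small format bonus minus a linear "overthinking" penalty for tokens beyond a budget. The weights, the budget, and the `Answer:` format check below are assumptions for illustration, not the paper's actual values.

```python
import re

def composite_reward(completion: str, ground_truth: str,
                     n_tokens: int, budget: int = 2048,
                     fmt_bonus: float = 0.1,
                     over_penalty: float = 0.0005) -> float:
    """Illustrative composite RL reward: answer correctness, a
    small bonus for following the expected output format, and a
    linear penalty for tokens beyond a reasoning budget."""
    m = re.search(r"Answer:\s*(.+?)\s*$", completion)
    correct = 1.0 if m and m.group(1) == ground_truth else 0.0
    formatted = fmt_bonus if m else 0.0       # format-consistency bonus
    overthink = over_penalty * max(0, n_tokens - budget)  # overthinking penalty
    return correct + formatted - overthink
```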

    This recipe can provide a blueprint for enterprises training their own models. "For companies with limited domain-specific data, a feasible strategy is to first increase answer diversity for their existing dataset, then use domain mixing to integrate this domain data into a general reasoning recipe like ours," Zhang explained. "This allows the model to acquire strong general-purpose reasoning skills while also adapting to industry-specific tasks, without needing millions of samples."

    A more efficient and capable reasoning model

    According to Zhang, the step-by-step process fundamentally changes the reliability of the model's outputs. "Traditional models often 'jump' directly to an answer, which means they explore only a narrow portion of the reasoning space," he said. "In contrast, a reasoning-first approach forces the model to explicitly examine multiple intermediate steps… [allowing it] to traverse much deeper paths and arrive at answers with far more internal consistency."

    The researchers used the OpenMMReasoner recipe to generate data to fine-tune the Qwen2.5-VL-7B-Instruct open-source vision-language model. The result is a highly capable LMM that consistently outperforms state-of-the-art methods, such as Open Vision Reasoner (OVR), across a wide range of multimodal reasoning benchmarks. The SFT stage alone creates a strong baseline model that achieves superior performance and data efficiency compared to other SFT approaches, despite using a significantly smaller training dataset.

    The subsequent RL phase further sharpens and stabilizes these abilities, leading to more consistent and improved performance. After RL, the final model achieves state-of-the-art results on several benchmarks, including WeMath, MathVerse and MathVista.

    One of the key findings was that, as the model improved at multimodal reasoning, it also showed a "gradual emergence of textual reasoning behaviors, suggesting a transfer of reasoning competence from multimodal to purely linguistic domains," the researchers note. This indicates that skills learned in one modality can strengthen performance in another. 

    "Our results show that strengthening multimodal reasoning can even improve text-only mathematical skills—evidence that core logical abilities can transfer across modalities," Zhang said. "Looking ahead, we do expect these methods to extend to video and audio."

    The researchers also found that token efficiency is crucial. While allowing a model to generate longer reasoning steps can improve performance, excessive tokens reduce efficiency. Their results show that setting a smaller "reasoning budget" can achieve comparable or even better accuracy, an important consideration for deploying cost-effective enterprise applications.
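One simple way to quantify the token-efficiency trade-off described above is tokens spent per correct answer; when two reasoning budgets give comparable accuracy, the one with the lower ratio is cheaper to deploy. This metric is my own illustration, not one the researchers report.

```python
def tokens_per_correct(results):
    """Efficiency metric: total reasoning tokens spent per correctly
    answered question. `results` is a list of (n_tokens, is_correct)
    pairs; lower is better at comparable accuracy."""
    total = sum(t for t, _ in results)
    correct = sum(1 for _, ok in results if ok)
    return float("inf") if correct == 0 else total / correct
```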

    By open-sourcing all components of their workflow, the researchers provide a reproducible view of the entire process. For enterprise teams, this transparency is invaluable. "For business leaders concerned about vendor lock-in, hidden biases or opaque data sources, this level of transparency is essential," Zhang stated. "It empowers teams to validate the data, customize the pipeline for new domains and maintain long-term independence from any single provider."
