    An AI Council Just Aced the US Medical Licensing Exam
    By The Tech Guy, October 11, 2025
    Despite their usefulness, large language models still have a reliability problem. A new study shows that a team of AIs working together can score up to 97 percent on US medical licensing exams, outperforming any single AI.

    While recent progress in large language models (LLMs) has led to systems capable of passing professional and academic tests, their performance remains inconsistent. They’re still prone to hallucinations (plausible-sounding but incorrect statements), which has limited their use in high-stakes areas like medicine and finance.

    Nonetheless, LLMs have scored impressive results on medical exams, suggesting the technology could be useful in this area if their inconsistencies can be controlled. Now, researchers have shown that getting a “council” of five AI models to deliberate over their answers rather than working alone can lead to record-breaking scores in the US Medical Licensing Examination (USMLE).

    “Our study shows that when multiple AIs deliberate together, they achieve the highest-ever performance on medical licensing exams,” Yahya Shaikh, from Johns Hopkins University, said in a press release. “This demonstrates the power of collaboration and dialogue between AI systems to reach more accurate and reliable answers.”

    The researchers’ approach takes advantage of a quirk in the models, rooted in the non-deterministic way they come up with responses. Ask the same model the same medical question twice, and it might produce two different answers—sometimes correct, sometimes not.

    In a paper in PLOS Medicine, the team describes how they harnessed this characteristic to create their AI “council.” They spun up five instances of OpenAI’s GPT-4 and prompted them to discuss answers to each question in a structured exchange overseen by a facilitator algorithm.

    When their responses diverged, the facilitator summarized the differing rationales and got the group to reconsider the answer, repeating the process until consensus emerged.
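    The loop described above can be sketched in a few lines of Python. This is a minimal illustration, not the researchers' implementation: the article does not give the prompts, the facilitator's summarization logic, or what happens if consensus never arrives. The names `deliberate`, `summarize`, and `scripted`, the round cap `max_rounds`, and the plurality fallback are all assumptions made for the example. The council members here are scripted stand-ins where real model calls would go.

```python
from collections import Counter
from typing import Callable, List, Optional, Tuple

# Hypothetical interface: a council member receives the question and the
# facilitator's latest summary (None in the first round) and returns an answer.
Member = Callable[[str, Optional[str]], str]


def summarize(answers: List[str]) -> str:
    """Stand-in for the facilitator's summary: a simple tally of answers.

    The study's facilitator summarized the differing rationales; a vote
    count is used here only to keep the sketch self-contained.
    """
    counts = Counter(answers)
    return ", ".join(f"{choice}: {n} vote(s)" for choice, n in sorted(counts.items()))


def deliberate(question: str, members: List[Member],
               max_rounds: int = 5) -> Tuple[str, int]:
    """Ask every member, and repeat with a summary until all answers agree."""
    summary: Optional[str] = None
    answers: List[str] = []
    for round_no in range(1, max_rounds + 1):
        answers = [member(question, summary) for member in members]
        if len(set(answers)) == 1:          # unanimous: consensus reached
            return answers[0], round_no
        summary = summarize(answers)        # feed the disagreement back
    # Assumption: if the cap is hit, fall back to the most common answer.
    return Counter(answers).most_common(1)[0][0], max_rounds


def scripted(replies: List[str]) -> Member:
    """Toy member that returns pre-set answers, one per round."""
    it = iter(replies)
    return lambda question, summary: next(it)


# Five stand-in members: they disagree in round 1 and agree in round 2.
council = [scripted(["A", "B"]), scripted(["B", "B"]), scripted(["B", "B"]),
           scripted(["C", "B"]), scripted(["B", "B"])]
answer, rounds = deliberate("Which drug is first-line?", council)
print(answer, rounds)  # B 2
```

    To try this with real models, each scripted member would be replaced by a function that sends the question and the summary to a model and parses the chosen option from the reply.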

    When tested on 325 publicly available questions from the three stages of the USMLE, the AI council achieved 97 percent, 93 percent, and 94 percent accuracy, respectively. These scores not only exceed the performance of any individual GPT-4 instance but also surpass the average passing thresholds for human test takers on the same exams.

    “Our work provides the first clear evidence that AI systems can self-correct through structured dialogue, with a performance of the collective better than the performance of any single AI,” says Shaikh.

    In a testament to the effectiveness of the approach, when the models initially disagreed, the deliberation process corrected more than half of their earlier errors. Overall, the council ultimately reached the correct conclusion 83 percent of the time when there wasn’t a unanimous initial answer.

    “This study isn’t about evaluating AI’s USMLE test-taking prowess,” co-author Zishan Siddiqui, also from Johns Hopkins, said in the press release. “We describe a method that improves accuracy by treating AI’s natural response variability as a strength. It allows the system to take a few tries, compare notes, and self-correct, and it should be built into future tools for education and, where appropriate, clinical care.”

    The team notes that their results come from controlled testing, not real-world clinical environments, so there’s a long way to go before the AI council could be deployed in practice. But they suggest that the approach could prove useful in other domains as well.

    It seems like the old adage that two heads are better than one remains true even when those heads aren’t human.
