
    AGI Needs World Models and State of World Models

By The Tech Guy | January 21, 2026 | 13 Mins Read


Demis Hassabis, CEO of Google DeepMind, just told the AI world that the path ChatGPT represents needs a world model. OpenAI, Google, xAI, and Anthropic are all using the large language model (LLM) approach. Google’s Genie 3 system, released last August, generates interactive 3D environments from text.


Here is my review of the state of the art in world models. There is also a need for continual learning and integrated memory. For two years, Tesla’s FSD has projected future frames, split second by split second, for each of its eight driving cameras. It does this for simulation and to anticipate what might happen while driving.
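Tesla’s internal system is unpublished, but the general pattern of split-second frame projection can be sketched: an autoregressive predictor whose output is fed back in as its next input, run independently per camera. Everything below (the linear predictor, the latent states, the names) is an illustrative stand-in, not Tesla’s architecture.

```python
def predict_next(latent, dt=0.05):
    """Stand-in one-step predictor: in a real system this is a learned
    neural network; here it just advances position by velocity."""
    pos, vel = latent
    return (pos + vel * dt, vel)

def rollout(latent, steps):
    """Autoregressively feed each prediction back in as the next input."""
    frames = []
    for _ in range(steps):
        latent = predict_next(latent)
        frames.append(latent)
    return frames

# Eight cameras, each with its own toy latent state (position, velocity).
cameras = {f"cam{i}": (0.0, 1.0) for i in range(8)}
horizon = {name: rollout(state, steps=20) for name, state in cameras.items()}
# 20 steps x 0.05 s = a one-second lookahead per camera.
```

The key property is that each predicted frame depends only on the previous prediction and the model, so the horizon can be rolled out faster than real time.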

Google DeepMind and Tesla/xAI have advanced world models and are working on integration.

Tesla and xAI have advanced them for the Optimus humanoid robot. Tesla has a massive AI patent portfolio for autonomous driving (hundreds filed or granted, many in 2025, focusing on neural networks, data pipelines, vision processing, and simulation). Some touch on generative/synthetic data or scene understanding that could relate to world models (patents on neural network adaptation, visual data processing, or simulation).

Optimus (Tesla’s humanoid robot) uses similar principles and leverages vision-based end-to-end models for manipulation, navigation, and task planning, potentially extending FSD-like world models to embodied robotics. It must predict physics in human-like environments, which requires split-second reaction times.

Humanoid robot competitors discuss world models openly (1X, with its “world model” for the Neo robot, reducing teleoperation reliance), but Tesla does not publish papers or share code.

Nvidia also has significant world-simulator work. It works with most of the other humanoid robot companies, providing them with simulators and tools.

Fei-Fei Li and World Labs have strong theory.

    CHAPTERS:

    05:26 Demis Hassabis on AI’s Current Capabilities and Future
    09:56 Demis Explains How AI Learns Physical Reality
    12:09 Demis Predicts AGI in 5-10 Years, Addresses Energy
    15:25 Societal Transformation, Disruption, and Personal Unwind
    22:06 Hosts Debate AI’s Future: Language, Physics and Robots
    26:28 Google’s Comeback in the Intense AI Race
    28:28 Demis on Market Exuberance and Google’s Financial Strength
    32:19 Geopolitics: China’s Rapid Catch-Up in AI
    34:27 Hosts Analyze China’s AI and Google’s Financial Edge
    38:08 How DeepMind Powers Google’s AI Products and Edge Devices
    41:17 Demis Reflects on Google’s Vision and NVIDIA Partnership
    45:05 Demis’s Vision for AI’s Golden Age of Discovery
    47:13 Hosts Discuss Google AI’s Advantage and OpenAI Pressure

AI agents inside simulated worlds can outperform regular agents by about 20-30% on reasoning tasks.


Hassabis shared his views on world models, their necessity for AGI, his definition and vision of AGI, and related concepts (limitations of current LLMs, needed breakthroughs, timelines).


Tesla’s FSD (especially v12 onward) relies heavily on end-to-end neural networks and what are often referred to, internally and externally, as world models — embodied AI systems that simulate and predict real-world physics, scenes, trajectories, and actions from vision data for driving planning and control.

Tesla has discussed world models in contexts like occupancy networks, synthetic data generation, and simulation for training (Ashok Elluswamy’s keynotes at CVPR 2023 and ICCV 2025 workshops), where Tesla explores world models for autonomous driving and potential robotics extensions.

    Demis Hassabis on AI’s Current Capabilities and Future

Hassabis said scaling laws still remain effective: more compute, more data, and larger models yield meaningful capability gains. The gains are slower than in the peak years but are still far from zero returns.

    Current AI systems show jagged intelligence—excelling in some areas (language, certain reasoning) but failing inconsistently on others (simple tasks if phrased differently).

Capabilities still missing for true generality: continual/online learning, true originality/innovation, consistent performance, long-term planning, and better reasoning.

    To reach AGI, scaling alone may not suffice. One or two major innovations (like AlphaGo-level breakthroughs) are likely still needed beyond current architectures.

    Demis Explains How AI Learns Physical Reality

    World models are a core passion and a likely missing piece. AI must build internal simulations of the world’s physics, causality (how one thing affects another), and higher-level domains (biology, economics) to understand reality deeply.

    LLMs/foundation models (Gemini) handle multimodal data (text, images, video, audio) but lack true understanding of physics, causality, long-term planning, or hypothesis testing via mental simulation.

    Humans (especially scientists) use intuitive physics and mental simulations to test ideas; current AI cannot generate novel hypotheses or new scientific conjectures independently.

    DeepMind’s work includes early/embryonic world models like Genie (interactive world generation) and video models (Veo), which imply understanding—if realistic generation is possible, the model grasps world dynamics.

    Vision: Future AGI will converge foundation models (like Gemini) with world model capabilities for integrated, powerful systems (not world models fully superseding LLMs, but enhancing them).

Demis Predicts AGI in 5-10 Years, Addresses Energy

His AGI definition and vision: a system exhibiting all human cognitive capabilities—true innovation/creativity, planning, reasoning, consistent/general performance across domains, continual learning, and the ability to understand and explain the world (new scientific theories via simulation and hypothesis testing). It goes beyond passive prediction to active understanding, invention, and autonomous action.

AGI is still 5–10 years away (consistent with DeepMind’s 2010 founding vision of a ~20-year mission). Progress has accelerated dramatically.

The bottlenecks are compute/chip shortages and energy constraints (intelligence is increasingly tied to energy availability).

AI itself aids the solutions: efficiency gains, better materials/solar, fusion plasma control (via collaborations like Commonwealth Fusion), and room-temperature superconductors.

Efficiency is improving: models like Gemini Flash use distillation (big models teach smaller ones) to deliver roughly 10x better performance per watt annually.
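The distillation idea mentioned here (a big teacher model training a smaller student) reduces to matching temperature-softened output distributions. A minimal sketch of the generic technique, with illustrative logit values; this is not Gemini Flash’s actual recipe:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-softened softmax; higher temperature flattens the
    distribution so the student sees more of the teacher's 'dark knowledge'."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions; this is the
    term the student minimizes during distillation."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

teacher = [4.0, 1.0, 0.2]        # illustrative teacher logits
close_student = [4.1, 0.9, 0.3]  # student that mimics the teacher
far_student = [0.0, 3.0, 1.0]    # student that disagrees
```

A student whose logits track the teacher’s (`close_student`) incurs a much smaller loss than one that disagrees (`far_student`), which is exactly the gradient signal that transfers the big model’s behavior into the small one.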

    Societal Transformation, Disruption, and Personal Unwind

    AGI impact will be transformative like the Industrial Revolution but 10x bigger and 10x faster—massive benefits (curing diseases via Isomorphic Labs/AlphaFold spinouts, energy breakthroughs, solving climate/poverty/water/health/aging).

The AGI risks are economic disruption (new models needed), bad actors misusing AI, and loss of control of autonomous/agentic systems (guardrails are essential).

Hassabis is cautiously optimistic. He believes in human ingenuity and a safety focus (DeepMind planned for powerful systems from 2010 and uses the scientific method to understand and deploy responsibly). He does not advocate a slowdown, given geopolitical and corporate race dynamics; instead, he prioritizes responsible frontier-pushing as a role model.

    Hosts Debate AI’s Future: Language, Physics and Robots

LLMs are strong on language but weak on physical-world understanding (causality, robotics needs).

World models are rising in importance for robotics and autonomous driving. Convergence with LLMs could enable true generality.

    Criticism of LLMs: Limitations in novelty/original ideas (echoing LeCun’s views). World models address this by enabling simulation-based hypothesis testing.

The robotics challenge is training agents (world models are key for autonomous operation beyond teleoperated puppets).

    Google’s Comeback in the Intense AI Race

Google has caught up and surpassed rivals in some areas (the Gemini series on leaderboards).
Google’s reorg integrated research (Google Brain + DeepMind under Hassabis), with scrappier commercialization and a tight loop with Sundar Pichai for roadmap alignment.

    Demis on Market Exuberance and Google’s Financial Strength

The AI bubble is not binary. There is some overvaluation and some seed rounds with little substance. However, the fundamentals are strong: it is transformative technology, like the internet and electricity.

Google has a strong balance sheet, cash flow, and user products (Gemini integration across the ecosystem) to weather any correction.

    China’s Rapid Catch-Up in AI

China is closer than many thought (months behind the frontier models). DeepSeek and Alibaba are leading in open source.

Innovation beyond the frontier (new architectures like the Transformer) is harder. The mentality and culture there favor scaling over exploratory breakthroughs.

China is fast-moving and deep in experts; chip restrictions may not halt its progress long-term. Google is resilient via cash and products.

    How DeepMind Powers Google’s AI Products and Edge Devices

Google DeepMind is the engine room: all AI research is diffused into products (fast Gemini shipping to Search).

There is interest in efficient models for phones and glasses (universal assistants), with partnerships (Samsung, Warby Parker).

    Demis Reflects on Google’s Vision and NVIDIA Partnership

He has no regrets about Google’s 2014 acquisition of DeepMind. Google’s backing enabled breakthroughs (AlphaGo, AlphaFold). It was a natural fit with the mission.

    Demis admires Jensen Huang. AI-for-science is important. Google uses both TPUs (internal scaling) and GPUs (exploration).

    Demis’s Vision for AI’s Golden Age of Discovery

He sees dozens of AlphaFold-like revolutions (the Nobel Prize-winning protein-folding breakthrough) in materials, physics, math, and weather.

2026 will see reliable agentic/autonomous systems and robotics advances (Gemini Robotics). There will be on-device AI and further world-model efficiency for planning and integration.

There will be a golden age of science if progress and safety are handled well.

    Yann LeCun Has Been Saying World Models Are Needed for AGI

Yann LeCun, Meta’s former Chief AI Scientist and a pioneer in deep learning, left Meta at the end of 2025 after more than a decade there. He founded and directed FAIR for five years and served as Chief AI Scientist for seven. It was a voluntary exit, driven by fundamental disagreements with Meta’s AI direction: Zuckerberg made a heavier bet on large language models (LLMs), which LeCun has long criticized as a “dead end” for achieving human-level or advanced AI.

Yann advocates for world models. Meta sidelined much of his work at FAIR in favor of LLM-focused efforts. Tensions escalated around issues like alleged benchmark manipulation on Llama 4.

    Zuckerberg reportedly lost confidence in parts of the GenAI org, leading to restructurings.

In 2026, LeCun is Executive Chairman at his new startup, Advanced Machine Intelligence Labs (AMI Labs). This venture directly continues his Advanced Machine Intelligence research program, focusing on world models and related architectures for systems that “understand the physical world, have persistent memory, can reason, and can plan complex action sequences.”

The startup hired Alex LeBrun (formerly CEO of Nabla) as CEO and was seeking a high valuation in the $3–5B+ range in late-2025 discussions. Recent updates indicate it will be based in Paris.

    LeCun’s specific world model pursuits (like JEPA architectures and energy-based models) lost priority and resources at Meta before his exit. Meta has expressed interest in partnering with his new firm, suggesting some ongoing collaboration rather than full severance.

    World models represent one of the most active frontiers in AI research, widely viewed as a critical missing ingredient for advancing toward AGI (as emphasized by Demis Hassabis at DeepMind and Yann LeCun at AMI Labs). These models aim to learn latent representations of the physical world—capturing intuitive physics, causality, object permanence, dynamics, spatial reasoning, and higher-level abstractions (e.g., biology/economics)—enabling simulation, prediction, planning, and embodied action far beyond text-prediction in LLMs.

Current world models are mostly video/video-generation-based or latent dynamics learners (e.g., autoregressive frame prediction, diffusion in latent space, or JEPA-style predictive embeddings). Implicit physics emerges from data (videos, robotics trajectories) rather than explicit rules. Truly integrated, persistent, multi-modal, long-horizon world models for general planning and embodiment remain nascent, with major labs racing toward convergence (foundation models + world models + agents).
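The JEPA-style “predictive embedding” approach named above can be sketched in a few lines: instead of predicting raw pixels, a predictor maps the embedding of a context view to the embedding of a target view, and the loss lives entirely in latent space. The toy encoder and predictor below are stand-ins chosen so the numbers work out, not a real architecture:

```python
def encode(x, w=0.5):
    """Stand-in encoder: maps an observation to a latent vector.
    In a real JEPA this is a large learned network."""
    return [w * v for v in x]

def predict(z, shift=0.1):
    """Stand-in predictor operating purely in latent space."""
    return [v + shift for v in z]

def latent_loss(z_pred, z_target):
    """Squared error in embedding space -- the JEPA training signal.
    Note: no pixel reconstruction anywhere."""
    return sum((a - b) ** 2 for a, b in zip(z_pred, z_target))

context = [1.0, 2.0, 3.0]   # e.g. frames up to time t
target = [1.2, 2.2, 3.2]    # e.g. frames at time t+1
loss = latent_loss(predict(encode(context)), encode(target))
```

Because the loss is computed between embeddings, the model is free to discard unpredictable pixel detail and keep only the dynamics that matter, which is the core argument for JEPA over pixel-level generation.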

    xAI Grok’s Position on Physics/World Models

xAI is actively pursuing world models as a strategic priority, explicitly to overcome LLM limitations in physical understanding. Reports from late 2025 indicate xAI hired ex-Nvidia specialists for this, with applications in gaming (AI-generated interactive 3D environments) and robotics (likely tied to Tesla’s ecosystem via shared data/compute). Grok models (Grok 4/5) incorporate multimodal data, including video and robotics trajectories, to build causal/physics awareness. Grok benefits indirectly from Tesla’s massive real-world data (FSD fleet videos, sensor streams), which trains implicit world simulations for physics and causality in driving and robotics.

    Elon Musk has claimed Grok could “discover new physics” by 2026, with Grok 5 (Jan 2026 release) positioned as potentially AGI-capable with strong real-world grounding.

    No public standalone Grok World Model yet (unlike DeepMind’s Genie), but xAI’s focus is on large-scale, physics-grounded multimodal systems for agents/robots/games.

    Tesla’s FSD and Optimus are leading in embodied physics. Tesla uses a unified neural world simulator (physics-real, general-purpose) generating synthetic data/videos for training both. It learns dynamics from fleet data, enabling transfer (Optimus navigation in simulated factories). This is state-of-the-art for real-world physics in robotics/autonomy—far ahead in deployment scale, though more narrow (vehicle/humanoid tasks) than general-purpose models.


    Google DeepMind — Leading in interactive/general world models. Genie 3 (2025) is the first real-time, action-controllable 3D foundation world model (autoregressive, learns physics from observation, consistent for minutes at 24fps/720p). Used for agent training (SIMA). Veo 3.1 adds audio/video consistency. Strongest in scalable, emergent physics simulation.
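A Genie-style interactive world model, at the interface level, is an action-conditioned autoregressive step function: given the last state and a user action, emit the next state, keeping the world consistent over time. A toy grid-world sketch of that interface (the real model is a learned neural network over latent video frames, not hand-written dynamics):

```python
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}

def world_model_step(state, action):
    """One autoregressive step: the next frame depends only on the
    previous frame plus the action, so the world stays consistent."""
    x, y = state
    dx, dy = ACTIONS[action]
    return (x + dx, y + dy)

def play(state, actions):
    """Roll the model forward under a sequence of user actions,
    the way an agent (or SIMA) would interact with a generated world."""
    trajectory = [state]
    for a in actions:
        state = world_model_step(state, a)
        trajectory.append(state)
    return trajectory

path = play((0, 0), ["right", "right", "up"])
```

The “action-controllable” part is simply that the user’s input enters every step; the hard research problem Genie addresses is keeping a learned, high-dimensional version of this loop coherent for minutes.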

    OpenAI — Pioneered video-as-world-simulation with Sora (2024 onward). Sora/Sora 2/3 treat scaled video generation as “general purpose simulators of the physical world.” Rumors of Genie-like interactive extensions; strong implicit physics but criticized for inconsistencies in complex dynamics.

    Anthropic — Lags in explicit world models/physics; Claude focuses on reasoning/safety in text/multimodal. Some vision/physics benchmarks improving (e.g., figure interpretation), but no dedicated world model push—more tool/LLM-centric.

    Fei-Fei Li’s World Labs — Commercial leader with Marble (Nov 2025 launch): Multimodal (text/image/video/3D inputs) generative world model for persistent, editable/downloadable 3D environments (Unity/Unreal compatible, VR support). Focuses on spatial intelligence for storytelling/creativity/robotics; positions as “first step” toward true spatial reasoning.

Others leading:
    • Meta (pre-LeCun exit): V-JEPA 2 (Jan 2026) excels in visual understanding/robotics (65-80% pick-and-place success with minimal data).
    • Runway: GWM-1 (Dec 2025) for explorable environments/robotics training.
    • NVIDIA: Cosmos/GR00T open models/datasets lead robotics downloads on Hugging Face; focus on physical AI.
    • Yann LeCun’s AMI Labs (post-Meta): JEPA-based, physics-grounded systems; early but high-potential for predictive world understanding.

    World Model Research

World models address core LLM flaws: lack of grounding, poor long-horizon causality, and no mental simulation for hypothesis testing and planning. SOTA (Jan 2026) shows emergent intuitive physics (gravity, collisions, permanence) from video/robot data, but explicit reasoning/planning over long horizons remains weak—models “understand” via prediction, not symbolic manipulation.
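The missing “mental simulation for hypothesis testing/planning” has a simple algorithmic skeleton: roll candidate action sequences through a learned model and keep the plan whose imagined outcome scores best (random-shooting model-predictive control). A toy sketch with one-dimensional stand-in dynamics; every name and number here is illustrative:

```python
import random

def model_step(state, action):
    """Toy dynamics stand-in for a learned world model."""
    return state + action

def imagined_return(state, plan, goal=2.0):
    """Score a plan entirely inside the model: roll it out and
    reward ending close to the goal. No real-world trials needed."""
    for action in plan:
        state = model_step(state, action)
    return -abs(goal - state)

def plan_by_simulation(state, horizon=5, candidates=200, seed=0):
    """Random-shooting planner: sample many hypothetical plans,
    simulate each, and keep the best-scoring one."""
    rng = random.Random(seed)
    best_plan, best_score = None, float("-inf")
    for _ in range(candidates):
        plan = [rng.uniform(-3, 3) for _ in range(horizon)]
        score = imagined_return(state, plan)
        if score > best_score:
            best_plan, best_score = plan, score
    return best_plan, best_score

plan, score = plan_by_simulation(state=0.0)
```

The point of the sketch is the division of labor: the world model supplies `model_step`, and planning becomes cheap search over imagined futures rather than expensive trial and error in the real world.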

    Scaling video + robotics data + latent architectures (autoregressive transformers, diffusion, JEPA) drives progress; interaction (Genie-style) is the leap for agents.

2026 Expected Progress

Rapid commercialization (Marble-like tools in gaming/AR/VR). Interactive horizons extend (5-10+ minutes of consistent worlds). Integration with agents (Gemini + Genie for embodied tasks). Robotics breakthroughs via sim-to-real transfer (Tesla leads deployment). Chinese open models (Qwen/DeepSeek multimodal) surge on Hugging Face.

    2027-2028

Convergence toward unified foundation world models (text + vision + action + persistent memory). Reliable long-horizon planning and simulation as steps toward AGI. Embodied AGI prototypes (humanoid-level manipulation in novel environments). If scaling plus new architectures (e.g., JEPA hybrids) succeed, physics/causality could reach near-human levels, enabling scientific discovery and robust autonomy.

Key Hugging Face/trending papers (2025-2026)

These include JEPA/V-JEPA evaluations, DreamerV3 (the Nature paper on agent imagination), IntPhys benchmarks (intuitive physics), and robotics-focused work (NVIDIA GR00T integrations). The field is shifting from LLM scaling to “embodied/world-grounded scaling.” DeepMind and Tesla lead deployment; World Labs and AMI innovate conceptually.

Brian Wang is a Futurist Thought Leader and a popular science blogger with 1 million readers per month. His blog Nextbigfuture.com is ranked the #1 Science News Blog. It covers many disruptive technologies and trends including Space, Robotics, Artificial Intelligence, Medicine, Anti-aging Biotechnology, and Nanotechnology.

    Known for identifying cutting edge technologies, he is currently a Co-Founder of a startup and fundraiser for high potential early-stage companies. He is the Head of Research for Allocations for deep technology investments and an Angel Investor at Space Angels.

    A frequent speaker at corporations, he has been a TEDx speaker, a Singularity University speaker and guest at numerous interviews for radio and podcasts.  He is open to public speaking and advising engagements.
