Yann LeCun hates LLMs

JEPA (Joint Embedding Predictive Architecture) represents a shift from generative, token-based AI models to models that predict in embedding space and learn abstract, reusable representations.
Techniques like Barlow Twins avoid representation collapse (the encoder mapping every input to the same embedding), which enables stable self-supervised learning without massive labeled datasets or complex contrastive setups.
Generative approaches excel in language but struggle in video and vision, where the output space is enormous and the future is uncertain; JEPA sidesteps this by predicting embeddings rather than raw pixels.
World models built on JEPA let machines predict the consequences of actions, enabling planning, control, and more general intelligence, a capability absent in current LLMs.
Self-supervised vision models have rapidly closed the gap with supervised models, with recent work reaching near state-of-the-art accuracy purely from unlabeled data.
The future of AI likely involves integrating joint embedding architectures with world models to build autonomous, agentic systems capable of reasoning, planning, and safely interacting with the real world.
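
To make the takeaway about representation collapse concrete, here is a minimal NumPy sketch of a Barlow Twins style objective. It is an illustration under stated assumptions, not the post's or any library's implementation: the function name `barlow_twins_loss`, the weight `lambda_offdiag`, and the random toy embeddings are all hypothetical. The idea is to push the cross-correlation matrix between the embeddings of two augmented views toward the identity, so each dimension agrees across views while different dimensions stay decorrelated.

```python
import numpy as np

def barlow_twins_loss(z_a, z_b, lambda_offdiag=5e-3, eps=1e-9):
    """Redundancy-reduction loss for two (batch, dim) embedding matrices.

    z_a and z_b are assumed to be embeddings of two augmented views of
    the same batch. lambda_offdiag is an illustrative default.
    """
    n = z_a.shape[0]
    # Standardize every embedding dimension across the batch.
    z_a = (z_a - z_a.mean(axis=0)) / (z_a.std(axis=0) + eps)
    z_b = (z_b - z_b.mean(axis=0)) / (z_b.std(axis=0) + eps)
    # Cross-correlation matrix between the two views, shape (dim, dim).
    c = z_a.T @ z_b / n
    diag = np.diag(c)
    # Invariance term: diagonal entries should be 1.
    on_diag = np.sum((diag - 1.0) ** 2)
    # Redundancy term: off-diagonal entries should be 0.
    off_diag = np.sum(c ** 2) - np.sum(diag ** 2)
    return on_diag + lambda_offdiag * off_diag

rng = np.random.default_rng(0)
z = rng.normal(size=(256, 8))

# Informative embeddings: two slightly different views of the same batch.
healthy_loss = barlow_twins_loss(z, z + 0.01 * rng.normal(size=z.shape))

# Partial collapse: every dimension carries the same signal.
z_shared = np.repeat(z[:, :1], 8, axis=1)
shared_dim_loss = barlow_twins_loss(z_shared, z_shared)

# Full collapse: every input maps to the same constant vector.
z_const = np.ones((256, 8))
constant_loss = barlow_twins_loss(z_const, z_const)
```

In this toy setup both collapsed cases score worse than the informative embeddings, which is why minimizing such a loss discourages collapse without needing negative pairs.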
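
The claim that a world model enables planning can be sketched as follows. This is a toy under loud assumptions: `predict_next` is a hypothetical stand-in for a learned predictor operating on embeddings (here simple additive dynamics), and random shooting is just one simple planner chosen for illustration, not a method named in the post. The planner imagines many candidate action sequences with the model, scores the predicted end states against a goal, and keeps the best sequence.

```python
import numpy as np

def predict_next(state, action):
    # Hypothetical stand-in for a learned latent predictor.
    # Toy assumption: the state simply shifts by the action.
    return state + action

def plan_random_shooting(state, goal, horizon=5, n_candidates=256, rng=None):
    """Pick the action sequence whose predicted end state is closest to goal."""
    if rng is None:
        rng = np.random.default_rng(0)
    dim = state.shape[0]
    # Sample candidate action sequences: (n_candidates, horizon, dim).
    actions = rng.uniform(-1.0, 1.0, size=(n_candidates, horizon, dim))
    # Roll every candidate forward through the model, no real environment needed.
    states = np.repeat(state[None, :], n_candidates, axis=0)
    for t in range(horizon):
        states = predict_next(states, actions[:, t])
    # Cost is the distance between the predicted end state and the goal.
    costs = np.linalg.norm(states - goal, axis=1)
    best = int(np.argmin(costs))
    return actions[best], float(costs[best])

start = np.zeros(2)
goal = np.array([2.0, -1.0])
best_actions, best_cost = plan_random_shooting(start, goal)
do_nothing_cost = float(np.linalg.norm(goal - start))
```

The design point is that all evaluation happens inside the model's predictions, so the agent can compare consequences of actions before taking any of them.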

Posted May 2, 2026 by Greg P
