Yann LeCun hates LLMs

  • JEPA represents a shift from generative, token-based AI models to embedding-based predictive models that focus on learning abstract, reusable representations.
  • Avoiding representation collapse through techniques like Barlow Twins enables stable self-supervised learning without massive labeled datasets or complex contrastive setups.
  • Generative approaches excel in language but face severe challenges in video and vision due to enormous output spaces and uncertainty; JEPA circumvents this by predicting embeddings rather than raw pixels.
  • World models empowered by JEPA allow machines to predict consequences of actions, enabling planning, control, and more general intelligence—a capability absent in current LLMs.
  • Self-supervised vision models have rapidly closed the gap with supervised models, with recent works achieving near state-of-the-art accuracy purely from unlabeled data.
  • The future of AI likely involves integrating joint embedding architectures with world models to build autonomous, agentic systems capable of reasoning, planning, and safely interacting with the real world.