Real-Time Learning: The Leap LLMs Must Take
It has been a few years since I last wrote purposefully on Artificial Intelligence (AI); that piece was titled "When AI Creates & Solves Proofs"... and a lot has changed since then. A little less than a year after OpenAI launched ChatGPT, I realized that we (human society) had changed. Not the technology of Large Language Models (LLMs) or Generative AI (GenAI) per se, but the human energy, prioritization, and dollars focused on developing LLM capabilities, as well as on using them.
When I started formally studying Computer Science in college during the 1990s, I wanted to focus on AI and its surrounding ecosystem. It was the era of Symbolic AI and, to be frank, it seemed miles and miles (and miles) away from anything we see today. When Deep Learning took off with AlexNet around 2012, I was busy creating adaptive algorithms and data structures at a startup called Deep Information Sciences, where the external marketing team wanted to call this work Machine Learning (ML). In other words, it was now okay (maybe even cool?) to say ML and not get laughed out of the boardroom.
Fast forward: when GenAI hype was at its height around 2023-2024, there was a big question about whether LLMs were the future architecture for achieving Artificial General Intelligence (AGI). As I explored these architectures and read the tsunami of research coming out every day, I came to a conclusion: yes, if certain problems were resolved and certain capabilities were added.
My own research led me to publish two academic papers earlier this year on new equations describing such a learning system, where:
L = Learning, K = Knowledge, I = IQ = Intelligence, and C = Consciousness.
Conceptual Adaptation Theory (CAT): Mind & Machine - https://doi.org/10.31219/osf.io/gbqdc_v1
Unified Theory of the Learning Universe (UTLU): Emergent Coherence - https://doi.org/10.5281/zenodo.15467528
I wrote these papers not only as a theory to solve for AGI, but to argue that LLMs are the substrate for future AGI architectures. Before talking about CAT/UTLU publicly, I began sharing the papers with colleagues in the field. Some could see (or maybe feel) the path forward, while others were more skeptical, looking to other architectures like Yann LeCun's JEPA.
I probably had 20+ one-on-one meetups: some folks could see the possibilities, while others needed key questions answered. The following are three points that help convert what I call the "non-believers" into "believers," or something close to that.
1. The Limitations of Today's Training Techniques
The dominant methods for improving LLMs after pre-training are fine-tuning, reinforcement learning, prompt engineering, larger context windows, and retrieval-augmented generation (RAG). They have been remarkable, but each is fundamentally constrained:
Pre-training provides vast static knowledge, but the moment training ends, the model's "worldview" is frozen.
Fine-tuning injects updates, but it's slow, expensive, and can overwrite prior capabilities (catastrophic forgetting).
Prompt engineering is clever, but it doesn't make the model learn; it's a user workaround, not an architectural upgrade.
Large context windows help until performance peaks and then deteriorates; without weight updates, knowledge fades and true understanding stalls.
Retrieval-Augmented Generation pulls in external facts, but the model itself remains unchanged: it never grows its own knowledge base, and it can still hallucinate when retrieval falls short.
These are useful tools for managing models, not for enabling them to grow. Without a new paradigm, LLMs risk becoming encyclopedias: massive, impressive, but static. This is why the real breakthrough will come from LLM architectures that embed continuous, real-time learning into their core.
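To make that concrete, here is a minimal sketch (a toy stand-in model of my own, not any particular LLM API): with RAG and prompting, retrieved context only changes the input at inference time, so the weights, and therefore the model's knowledge, stay exactly as they were.

```python
# Minimal sketch (toy stand-in model, not a real LLM API): RAG and prompting
# only change the *input* at inference time; the weights never move.
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(8, 8)                        # stand-in for a frozen LLM
weights_before = model.weight.clone()

prompt = torch.randn(1, 8)                     # the user's prompt
retrieved_context = torch.randn(1, 8)          # facts pulled in by retrieval (RAG)
augmented_input = prompt + retrieved_context   # context is merged into the input only

with torch.no_grad():                          # pure inference: no gradients, no learning
    _ = model(augmented_input)

# The model answered with extra context, but it learned nothing:
print(torch.equal(weights_before, model.weight))   # True -> parameters unchanged
```

Fine-tuning would move the weights, but as noted above it is slow and risks forgetting; a sketch of a continual alternative closes section 3.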
2. The "Language" in LLM is Just "One" Input/Output
The "Language" in Large Language Model isn't a limitation; it's an artifact from much earlier days. Transformers started with text because it was abundant and easy to tokenize, but in truth, "language" here just means "a sequence of symbols."
Seen this way, "Language" can be replaced with a variable "X," transforming the architecture into a Large Modality Model (LMM), as sketched after the list below:
Text is merely one manifestation of structured data the model can learn from.
Audio can be represented as token sequences derived from waveforms or spectrograms.
Video can be represented as token sequences of frame embeddings that capture visual patterns.
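As a rough illustration of that "X" idea (made-up tokenizers and sizes of my own, using only standard PyTorch), every modality can be reduced to a sequence of integer tokens in one shared id space, and the same sequence model consumes all of them:

```python
# Rough sketch with made-up tokenizers and sizes (standard PyTorch only):
# every modality becomes a sequence of integer tokens in one shared id space,
# and the same sequence model consumes all of them.
import torch
import torch.nn as nn

VOCAB = 1024  # shared "codebook" size across modalities (illustrative)

def text_tokens(text: str) -> torch.Tensor:
    # toy tokenizer: map characters into the shared id space
    return torch.tensor([ord(c) % VOCAB for c in text])

def audio_tokens(waveform: torch.Tensor) -> torch.Tensor:
    # toy quantizer: bucket fixed-size frames of the waveform into discrete ids
    frames = waveform.unfold(0, 64, 64).mean(dim=1)
    return (frames.abs() * VOCAB).long().clamp(0, VOCAB - 1)

def video_tokens(frames: torch.Tensor) -> torch.Tensor:
    # toy quantizer: one id per frame, derived from a pooled frame embedding
    return (frames.mean(dim=(1, 2, 3)).abs() * VOCAB).long().clamp(0, VOCAB - 1)

embed = nn.Embedding(VOCAB, 32)   # one shared token embedding for every modality
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=32, nhead=4, batch_first=True), num_layers=2
)

for tokens in (text_tokens("cat"),
               audio_tokens(torch.randn(640)),           # ~10 audio frames
               video_tokens(torch.randn(10, 3, 8, 8))):  # 10 tiny video frames
    out = encoder(embed(tokens).unsqueeze(0))            # same architecture, any modality
    print(tokens.shape, out.shape)
```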
What's powerful is not the ability to ingest multiple types of data; it's the ability to abstract and generalize across them. A model that sees an image of a cat, reads the word "cat," and hears a meow should not just memorize each form independently; it should merge them into a single, shared concept of "cat."
This is where emergent properties arise: the model moves from raw digital token inputs to conceptual representations that transcend any single modality. Once the concept exists in its generalized internal representation, the model can reason about "cat" regardless of whether the input was text, image, audio, or video... and generate output in any of those formats as well.
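One common way to realize such a shared concept space (a standard CLIP-style alignment idea, offered here as an illustrative sketch rather than the CAT/UTLU mechanism) is to give each modality its own encoder and train them all to project matching inputs to nearby points in one embedding space:

```python
# Sketch of a shared concept space (standard CLIP-style alignment, illustrative
# feature sizes of my own): each modality has its own encoder, and all of them
# project into one space where "cat" should land in roughly the same place.
import torch
import torch.nn as nn
import torch.nn.functional as F

DIM = 64
text_enc = nn.Linear(300, DIM)    # e.g. from pooled text features
image_enc = nn.Linear(512, DIM)   # e.g. from pooled image features
audio_enc = nn.Linear(128, DIM)   # e.g. from pooled spectrogram features

def concept(z: torch.Tensor) -> torch.Tensor:
    return F.normalize(z, dim=-1)                      # unit vector in the shared space

text_cat = concept(text_enc(torch.randn(1, 300)))      # the word "cat"
image_cat = concept(image_enc(torch.randn(1, 512)))    # a picture of a cat
audio_cat = concept(audio_enc(torch.randn(1, 128)))    # a meow

# Training pulls views of the same concept together (toy alignment objective):
similarity = (text_cat * image_cat).sum() + (text_cat * audio_cat).sum()
loss = 1.0 - similarity / 2                            # higher similarity -> lower loss
loss.backward()                                        # gradients reach all three encoders
```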
That shift (from memorizing modality-specific examples to generalizing abstract concepts) is what makes multi-modal architectures (LMM) not just broader, but smarter. Pair that with continuous learning, and the system evolves its conceptual map of the world in real time, unlocking one of the essential prerequisites for AGI.
3. Continuous Real-Time Learning: The Missing Piece
Today's LLMs can answer questions, generate text, and even solve complex problems, but they cannot learn new knowledge on the fly. Updating them still requires retraining or fine-tuning, akin to making a student return to school for months just to learn a new skill.
Humans don't work that way. We integrate new facts, skills, and feedback into our mental model as we experience them. Real-time learning in AI would:
Keep models current in fast-changing domains without massive retraining cycles.
Enable deep personalization, where the model evolves uniquely for each user.
Allow emergent reasoning to build on experience, not just pre-training.
Reduce hallucinations via active, feedback-driven updates.
In CAT/UTLU, this means embedding a learning loop that continually refines the model's parameters in response to interaction and environment, bridging the gap between static knowledge and growing intelligence. This is the missing piece that transforms LLMs from impressive tools into true Learning Agentic Systems (LAS)...
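As a closing illustration, here is a minimal sketch of the kind of loop that sentence describes; it is my own toy example, not the CAT/UTLU algorithm itself: each interaction triggers a small parameter update, mixed with replayed past examples to soften catastrophic forgetting.

```python
# Minimal sketch of a continuous learning loop (my own toy illustration, not
# the CAT/UTLU algorithm): each interaction triggers a small parameter update,
# mixed with replayed past examples to soften catastrophic forgetting.
import random
import torch
import torch.nn as nn
import torch.nn.functional as F

model = nn.Linear(16, 4)                                  # stand-in for an LLM
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)  # small, conservative steps
replay_buffer = []                                        # (input, feedback) pairs seen so far

def learn_from_interaction(x: torch.Tensor, feedback: torch.Tensor) -> None:
    """One real-time update: the new example plus a few replayed old ones."""
    replay_buffer.append((x, feedback))
    replay = random.sample(replay_buffer, min(3, len(replay_buffer)))
    xs = torch.stack([x] + [r[0] for r in replay])
    ys = torch.stack([feedback] + [r[1] for r in replay])
    optimizer.zero_grad()
    loss = F.cross_entropy(model(xs), ys)
    loss.backward()
    optimizer.step()                       # the "worldview" is no longer frozen

# Simulated stream of interactions: the model keeps adapting as they arrive.
for _ in range(5):
    learn_from_interaction(torch.randn(16), torch.randint(0, 4, ()))
```

The replay mix is a deliberately simple hedge against forgetting; richer schemes exist, but the core point stands: parameters update during use, not only during training runs.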