Rajandran R Creator of OpenAlgo - OpenSource Algo Trading framework for Indian Traders. Building GenAI Applications. Telecom Engineer turned Full-time Derivative Trader. Mostly Trading Nifty, Banknifty, High Liquid Stock Derivatives. Trading the Markets Since 2006 onwards. Using Market Profile and Orderflow for more than a decade. Designed and published 100+ open source trading systems on various trading tools. Strongly believe that market understanding and robust trading frameworks are the key to the trading success. Building Algo Platforms, Writing about Markets, Trading System Design, Market Sentiment, Trading Softwares & Trading Nuances since 2007 onwards. Author of Marketcalls.in

Diffusion-Based LLMs: A New Era in Language Modeling


The field of large language models (LLMs) is undergoing a significant transformation with the introduction of diffusion-based models, which break away from the autoregressive architecture that has dominated the space. One of the most exciting developments in this regard comes from Inception Labs, which has unveiled Mercury, the first commercial-scale diffusion LLM. Unlike traditional autoregressive models, Mercury uses a diffusion process to generate text, enabling parallel token generation and significantly faster output.

The Need for a New Approach

Most existing LLMs, including OpenAI’s GPT-4o and Google’s Gemini models, rely on an autoregressive framework. This means they generate text sequentially, predicting one token at a time from left to right. While this approach has been successful, it imposes an inherent speed limit: each token must wait for all the tokens before it, so generation cannot be parallelized across the sequence. Diffusion-based models, on the other hand, take a completely different approach—they start with noise and iteratively refine it into coherent text, much like how image-generation models (e.g., DALL·E, Stable Diffusion) work.
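The sequential bottleneck can be sketched in a few lines of Python. This is a toy illustration only: the `next_token_distribution` function below is a hypothetical stand-in for a real model's forward pass, not any actual API.

```python
import random

random.seed(0)
VOCAB = ["the", "market", "moves", "fast", "<eos>"]

def next_token_distribution(prefix):
    # A real LLM would run a Transformer forward pass over `prefix` here;
    # we return uniform weights as a placeholder.
    return [1.0 / len(VOCAB)] * len(VOCAB)

def generate(max_tokens=10):
    tokens = []
    # One full forward pass per token: step N cannot start until
    # step N-1 has produced its token. This is the autoregressive bottleneck.
    for _ in range(max_tokens):
        probs = next_token_distribution(tokens)
        tok = random.choices(VOCAB, weights=probs)[0]
        if tok == "<eos>":
            break
        tokens.append(tok)
    return tokens

print(generate())
```

However fast each forward pass is, the loop above runs once per token, which is exactly the latency cost diffusion models try to avoid.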

How Diffusion-Based LLMs Work

Rather than sequentially generating tokens, a diffusion LLM like Mercury processes text holistically, denoising an initial noisy representation into a meaningful sequence. This process runs in parallel, allowing for substantial speed improvements. According to Inception Labs, Mercury can generate up to 1,000 tokens per second on existing Nvidia hardware—making it up to 10x faster than frontier autoregressive models.
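The parallel-refinement idea can be illustrated with a toy masked-diffusion loop. Everything here is an illustrative assumption, not Mercury's actual algorithm: a real model uses a learned denoiser, whereas this sketch fills masked positions at random just to show the shape of the process.

```python
import random

random.seed(0)
VOCAB = ["nifty", "trades", "higher", "today", "again"]
MASK = "[MASK]"

def denoise_step(seq):
    # A learned denoiser would predict tokens for every masked position in
    # a single pass; we fill about half of the remaining masks per step
    # as a stand-in for that parallel prediction.
    masked = [i for i, t in enumerate(seq) if t == MASK]
    for i in random.sample(masked, max(1, len(masked) // 2)):
        seq[i] = random.choice(VOCAB)
    return seq

def generate(length=8):
    seq = [MASK] * length            # pure "noise": every position masked
    steps = 0
    while MASK in seq:               # a handful of parallel refinement steps
        seq = denoise_step(seq)
        steps += 1
    return seq, steps

seq, steps = generate()
print(steps, seq)
```

Because many positions are resolved per step, the number of refinement steps is far smaller than the number of tokens — that gap is the source of the claimed speed advantage.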

Andrej Karpathy, a leading AI researcher, has also noted the uniqueness of this approach. Historically, while images and videos have successfully leveraged diffusion models, text and audio have remained resistant. The ability of Inception Labs to make diffusion work for text is a significant breakthrough that could redefine the landscape of LLMs.

Benchmark Performance and Comparisons

Mercury comes in two variants: Mercury Coder Mini and Mercury Coder Small. These models have been benchmarked against industry leaders, including:

  • Gemini 2.0 Flash-Lite
  • GPT-4o Mini
  • Claude 3.5 Sonnet
  • Qwen 2.5 Coder
  • DeepSeek Coder V2 Lite

The results indicate that the Mercury models match these systems in output quality while far surpassing them in speed.

Practical Demonstrations

Several real-world tests have showcased Mercury’s capabilities in coding-related tasks. A user-generated example involved creating a simple webpage with a button that, upon clicking, selects a random joke, changes the background color, and adds an animation. While early diffusion-based models struggled with such interactive tasks, Mercury managed to generate functional code on its first attempt.

Another test involved generating an animation of falling letters in JavaScript with realistic physics, collision detection, and adaptive screen size. Mercury efficiently produced the required output, demonstrating its effectiveness in handling complex coding tasks. The model iteratively refined its outputs in real-time, offering an innovative way to generate and optimize code.

Limitations and Future Prospects

Despite its impressive speed and efficiency, Mercury still faces limitations:

  • Rate Limits: Currently, users are limited to 10 requests per hour, likely due to resource constraints.
  • Early Stage Development: As a new approach, diffusion-based LLMs will require further fine-tuning to match the reasoning capabilities of mature autoregressive models.
  • Limited Variants: Currently, only coding-focused models are available, but future versions will likely extend to general text generation and multimodal capabilities (text + image + video).

The Bigger Picture: The Future of AI Architectures

The introduction of diffusion-based LLMs is part of a broader trend of exploring alternative architectures. Another notable effort comes from Liquid AI, which has introduced Liquid Foundation Models (LFM), an alternative to Transformers. However, real-world performance tests of LFMs have not been as promising as diffusion-based LLMs like Mercury.

As Inception Labs moves towards API releases, the adoption of Mercury could set the stage for a new era of ultra-fast, parallelized language models. If successful, diffusion-based LLMs may reshape how AI interacts with humans—offering instantaneous responses, improved multi-modal capabilities, and new agentic workflows.

Conclusion

Mercury’s launch marks a pivotal moment in AI research. The ability to generate text through diffusion could revolutionize LLM architectures, making models significantly faster while maintaining competitive performance. While challenges remain, this innovation opens the door to a future where AI-generated content is more efficient and scalable than ever before.

For those eager to experiment, Mercury is currently available for testing at chat.inceptionlabs.ai. You can also read more about the official announcement on Inception Labs’ website. Additionally, for those looking for an open-source alternative, Large Language Diffusion Models are worth exploring.

As AI researchers and engineers continue to push boundaries, diffusion-based LLMs could be a game-changer in the AI landscape.

