Oct 6, 2025
IBM is launching Granite 4, the next generation of IBM language models—featuring a new hybrid Mamba/transformer architecture that greatly reduces memory requirements without sacrificing performance.
According to IBM, these models can run on significantly cheaper GPUs and at much lower cost than conventional LLMs.
The new Granite 4.0 offerings, open sourced under a standard Apache 2.0 license, are the world’s first open models to receive ISO 42001 certification and are cryptographically signed, confirming their adherence to internationally recognized best practices for security, governance, and transparency, said IBM.
Granite 4.0 models are available on IBM watsonx.ai, as well as through platform partners including Dell Technologies on Dell Pro AI Studio and Dell Enterprise Hub, Docker Hub, Hugging Face, Kaggle, LM Studio, NVIDIA NIM, Ollama, OPAQUE, and Replicate. Access through Amazon SageMaker JumpStart and Microsoft Azure AI Foundry is coming soon.
The launch of Granite 4.0 initiates a new era for IBM’s family of enterprise-ready large language models, leveraging novel architectural advancements to double down on small, efficient language models that provide competitive performance at reduced costs and latency.
The Granite 4.0 models were developed with a particular emphasis on essential tasks for agentic workflows, both in standalone deployments and as cost-efficient building blocks in complex systems alongside larger reasoning models, said IBM.
The Granite 4.0 collection comprises multiple model sizes and architecture styles to suit production deployment across a wide array of hardware constraints, including:
- Granite-4.0-H-Small, a hybrid mixture of experts (MoE) model with 32B total parameters (9B active)
- Granite-4.0-H-Tiny, a hybrid MoE with 7B total parameters (1B active)
- Granite-4.0-H-Micro, a dense hybrid model with 3B parameters
- This release also includes Granite-4.0-Micro, a 3B dense model with a conventional attention-driven transformer architecture, to accommodate platforms and communities that do not yet support hybrid architectures.
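Since the checkpoints are distributed through Hugging Face and Ollama (per the availability list above), a standard transformers workflow should be enough to try them locally. The sketch below is illustrative rather than IBM's documented quickstart: the model identifier is an assumption based on the naming in this release, and the hybrid variants will likely require a recent transformers version.

```python
# A minimal sketch of loading a Granite 4.0 model with Hugging Face
# transformers. The model ID below is an assumption based on IBM's naming
# in this announcement; check the ibm-granite organization on Hugging Face
# for the exact identifiers.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ibm-granite/granite-4.0-h-micro"  # assumed identifier

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # half precision to cut weight memory
    device_map="auto",           # place layers on available GPU(s)
)

# Granite models ship with a chat template; format a single user turn.
messages = [{"role": "user", "content": "What is a hybrid Mamba/transformer model?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(inputs, max_new_tokens=200)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```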
Granite 4.0 benchmark performance shows substantial improvements over prior generations—even the smallest Granite 4.0 models significantly outperform Granite 3.3 8B, despite being less than half its size—but the models' most notable strength is a remarkable increase in inference efficiency.
Relative to conventional LLMs, the hybrid Granite 4.0 models require significantly less RAM to run, especially for tasks involving long context lengths (such as ingesting a large codebase or extensive documentation) and multiple sessions at the same time (such as a customer service agent handling many detailed user inquiries simultaneously).
Most importantly, this dramatic reduction in Granite 4.0’s memory requirements entails a similarly dramatic reduction in the cost of hardware needed to run heavy workloads at high inference speeds.
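The savings are easiest to see in the attention KV cache, which in a conventional transformer grows linearly with both context length and the number of concurrent sessions; Mamba-style layers instead carry a small fixed-size state per session. A back-of-the-envelope sketch, using made-up dimensions for a hypothetical 8B-class conventional transformer (not Granite's actual configuration):

```python
# Rough KV-cache arithmetic for a conventional attention transformer.
# All dimensions are illustrative assumptions, not Granite 4.0's real config.

def kv_cache_gib(layers, kv_heads, head_dim, context_len, sessions, bytes_per_elem=2):
    """Keys + values (the factor of 2) per layer, per head, per token."""
    elems = 2 * layers * kv_heads * head_dim * context_len * sessions
    return elems * bytes_per_elem / 1024**3

# Hypothetical 8B-class model: 32 layers, 8 KV heads of dim 128 (GQA), fp16 cache.
for context_len, sessions in [(8_192, 1), (131_072, 1), (131_072, 8)]:
    gib = kv_cache_gib(32, 8, 128, context_len, sessions)
    print(f"context={context_len:>7,} sessions={sessions}: ~{gib:5.1f} GiB KV cache")
```

In this made-up configuration, a 128K-token context served to eight concurrent sessions would need roughly 128 GiB for the cache alone, more than most single GPUs hold; that is the pressure the hybrid layers are designed to relieve.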
IBM’s aim is to lower barriers to entry by providing enterprises and open-source developers alike with cost-effective access to highly competitive LLMs, the company said.
For more information about this news, visit www.ibm.com.