As language models become more advanced, their training and deployment demand increasingly large compute and infrastructure budgets. Large-scale models, while impressive in performance, are often out of reach for organizations facing infrastructure and cost constraints. This gap between capability and practical deployability poses a real challenge, particularly for businesses aiming to integrate language models into real-time systems or cost-sensitive environments.
Recently, small language models (SLMs) have gained attention as a feasible alternative, offering lower memory and compute requirements without greatly sacrificing performance. However, many SLMs struggle with consistency across tasks and are designed with compromises that can hurt generalization or usability.
ServiceNow AI Unveils Apriel-5B: Advancing Practical AI at Scale
To tackle these issues, ServiceNow AI has introduced Apriel-5B, a new series of small language models aimed at optimizing inference speed, training efficiency, and versatility across domains. With 4.8 billion parameters, Apriel-5B is compact enough for deployment on basic hardware while still holding its ground in a variety of instruction-following and reasoning tasks.
The Apriel models come in two versions:
- Apriel-5B-Base, a pretrained model suitable for further tuning or integration into workflows.
- Apriel-5B-Instruct, an instruction-tuned variant designed for chat, reasoning, and task execution.
Both models are available under the MIT license, encouraging open research and wider adoption in both academic and commercial applications.
Design Highlights and Technical Features
Apriel-5B was trained on a dataset exceeding 4.5 trillion tokens, designed to cover multiple task domains such as natural language understanding, reasoning, and multilingual text. The model uses a dense architecture aimed at efficient inference, incorporating key technical elements such as the following (a brief usage sketch follows the list):
- Rotary positional embeddings (RoPE) with an 8,192-token context window, facilitating long-sequence tasks.
- FlashAttention-2, for faster attention computation and improved memory efficiency.
- Grouped-query attention (GQA), reducing memory use during autoregressive decoding.
- Training in BFloat16, compatible with modern accelerators while ensuring numerical stability.
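As a concrete illustration, here is a minimal sketch of how a checkpoint with these properties could be loaded and queried through Hugging Face's transformers library. The repository ID, the presence of a chat template, and the generation settings are assumptions made for illustration rather than confirmed details of the Apriel release; FlashAttention-2 additionally requires the flash-attn package and a compatible GPU.

```python
# Minimal sketch (assumed repository ID and chat template), not an official example.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ServiceNow-AI/Apriel-5B-Instruct"  # hypothetical Hub ID for illustration

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,               # matches the BFloat16 training precision
    attn_implementation="flash_attention_2",  # needs flash-attn and a supported GPU
    device_map="auto",
)

# Instruction-tuned checkpoints usually ship a chat template; this assumes one exists.
messages = [{"role": "user", "content": "Explain grouped-query attention in two sentences."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

with torch.no_grad():
    output = model.generate(input_ids, max_new_tokens=256)

print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```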
These design choices let Apriel-5B maintain speed and responsiveness without specialized hardware or large-scale parallel infrastructure. The instruction-tuned variant was refined on curated datasets with supervised fine-tuning, strengthening its ability to follow instructions with minimal prompting.
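To make the grouped-query attention point concrete, the back-of-the-envelope calculation below compares KV-cache memory for standard multi-head attention against GQA at the full 8,192-token context. The layer and head counts are hypothetical placeholders, not Apriel-5B's published configuration; the point is only that sharing key/value heads shrinks the cache by the grouping factor.

```python
# KV-cache sizing sketch: multi-head attention (MHA) vs. grouped-query attention (GQA).
# Layer/head counts are hypothetical placeholders, not Apriel-5B's published config.

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_elem: int = 2) -> int:
    """Bytes needed to cache keys and values for one sequence (2 tensors per layer)."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_elem

layers, query_heads, head_dim, context = 32, 32, 128, 8192  # placeholder values
kv_groups = 8                                               # GQA: 8 shared KV heads

mha = kv_cache_bytes(layers, num_kv_heads=query_heads, head_dim=head_dim, seq_len=context)
gqa = kv_cache_bytes(layers, num_kv_heads=kv_groups, head_dim=head_dim, seq_len=context)

print(f"MHA KV cache: {mha / 2**30:.1f} GiB")   # 4.0 GiB with these placeholders
print(f"GQA KV cache: {gqa / 2**30:.1f} GiB")   # 1.0 GiB, i.e. 4x smaller
```

In BFloat16 (2 bytes per value), the cache shrinks in direct proportion to the ratio of query heads to key/value groups, which is why GQA helps most during long autoregressive decoding.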
Evaluation and Benchmark Performance
Apriel-5B-Instruct has been evaluated against popular models such as Meta's LLaMA-3.1-8B, Allen AI's OLMo-2-7B, and Mistral-Nemo-12B. Despite its smaller size, Apriel performs well across several benchmarks:
- Outperforms both OLMo-2-7B-Instruct and Mistral-Nemo-12B-Instruct on average across general-purpose tasks.
- Surpasses LLaMA-3.1-8B-Instruct on math-related tasks and on IFEval, a benchmark of instruction-following consistency.
- Required considerably less compute to train, roughly 2.3x fewer GPU hours than OLMo-2-7B, highlighting its training efficiency.
These results suggest that Apriel-5B strikes a productive balance between lightweight deployment and task versatility, especially in settings that demand real-time performance under tight resource constraints.
Conclusion: Enhancing the Model Landscape
Apriel-5B reflects a well-considered approach to small model development, focusing on balance over sheer scale. By concentrating on inference throughput, training efficiency, and instruction-following performance, ServiceNow AI has developed a model family that’s easy to deploy, adaptable for various uses, and openly accessible for integration.
Its strong performance on mathematics and reasoning benchmarks, combined with a permissive license and resource-efficient design, makes Apriel-5B an attractive option for teams integrating AI into products, digital agents, or workflows. In a field increasingly defined by accessibility and practical application, Apriel-5B represents a significant step forward.