Microsoft Introduces Multimodal SLMs Trained on NVIDIA GPUs

James Ding Feb 26, 2025 15:38

Microsoft unveils new Phi SLMs, Phi-4-multimodal and Phi-4-mini, trained on NVIDIA GPUs, enhancing AI capabilities with efficient resource usage.

Microsoft has announced the latest additions to its Phi family of small language models (SLMs): the new Phi-4-multimodal and Phi-4-mini models, both trained on NVIDIA GPUs. According to NVIDIA, the release marks a significant step in the evolution of language models, with a focus on efficiency and versatility.

Advancements in Small Language Models

SLMs have emerged as a practical solution to the challenges posed by large language models (LLMs), which, despite their capabilities, require substantial computational resources. SLMs are designed to operate efficiently within constrained environments, making them suitable for deployment on devices with limited memory and computational power.

Microsoft’s new Phi-4-multimodal model is particularly noteworthy for its ability to process multiple types of data, including text, audio, and images. This capability opens up new possibilities for applications such as automatic speech recognition, translation, and visual reasoning. The model was trained on 512 NVIDIA A100-80GB GPUs over 21 days, underscoring the computational effort required to achieve its capabilities.
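As a rough sketch of what multimodal inference might look like in practice, the snippet below sends an image and a text question to such a model through the Hugging Face transformers library. The model identifier, image URL, and prompt tags are illustrative assumptions; the published model card defines the exact names and prompt format.

```python
# Illustrative sketch of image + text inference with a multimodal Phi model
# via Hugging Face transformers. Identifiers and prompt tags are assumptions.
import requests
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-4-multimodal-instruct"  # assumed identifier
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, device_map="auto"
)

# Placeholder image URL; any PIL-loadable image works here.
image = Image.open(requests.get("https://example.com/chart.png", stream=True).raw)
# Assumed prompt template; check the model card for the real chat/image tags.
prompt = "<|user|><|image_1|>Describe what this chart shows.<|end|><|assistant|>"

inputs = processor(text=prompt, images=image, return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=128)
# Decode only the newly generated tokens, skipping the prompt.
print(processor.batch_decode(out[:, inputs["input_ids"].shape[1]:],
                             skip_special_tokens=True)[0])
```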

Phi-4-multimodal and Phi-4-mini

The Phi-4-multimodal model has 5.6 billion parameters and has demonstrated superior performance in automatic speech recognition, ranking first on the Hugging Face Open ASR leaderboard with a word error rate of 6.14%. This result highlights the model’s potential to improve speech recognition technologies.
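For context, word error rate (WER) measures the fraction of words an ASR system gets wrong: the number of substitutions, deletions, and insertions needed to turn the transcript into the reference, divided by the number of reference words. A minimal illustration:

```python
# Word error rate (WER), the metric behind the 6.14% figure above:
# WER = (substitutions + deletions + insertions) / words in the reference,
# computed here as word-level edit distance.

def wer(reference: str, hypothesis: str) -> float:
    ref, hyp = reference.split(), hypothesis.split()
    # d[i][j] = edit distance between the first i reference words
    # and the first j hypothesis words
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,         # deletion
                          d[i][j - 1] + 1,         # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / len(ref)

print(wer("the cat sat on the mat", "the cat sat on mat"))  # 1 error / 6 words ≈ 0.167
```

By this measure, a 6.14% WER corresponds to roughly six word-level errors per hundred reference words.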

Alongside Phi-4-multimodal, Microsoft also introduced Phi-4-mini, a text-only model optimized for chat applications. With 3.8 billion parameters and a context window of 128K tokens, Phi-4-mini is designed to handle long-form content efficiently. Its training ran on 1,024 NVIDIA A100 80GB GPUs over 14 days and emphasized high-quality educational data and code.
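A minimal sketch of how a chat-tuned model like Phi-4-mini could be run locally with Hugging Face transformers is shown below; the model identifier and message contents are assumptions, so the official model card remains the authority on naming, prompt formatting, and hardware requirements.

```python
# Minimal local chat sketch with a Phi-4-mini-style model via transformers.
# The model identifier below is an assumption; check the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/Phi-4-mini-instruct"  # assumed identifier
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "Summarize what a small language model is in one sentence."},
]
# apply_chat_template builds the prompt in the format the model was tuned on.
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=128)
# Decode only the tokens generated after the prompt.
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```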

Deployment and Accessibility

Both models are available on Microsoft’s Azure AI Foundry, providing a platform for designing, customizing, and managing AI applications. Users can also explore these models through the NVIDIA API Catalog, which offers a sandbox environment for testing and integrating these models into various applications.
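For those who prefer a hosted endpoint, the NVIDIA API Catalog typically exposes its models through an OpenAI-compatible API. The sketch below follows that pattern; the model name and environment variable are assumptions, and the exact values should be copied from the model’s catalog page.

```python
# Sketch of calling a Phi model through the NVIDIA API Catalog's
# OpenAI-compatible endpoint. Model name and env var are assumptions.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # NVIDIA API Catalog endpoint
    api_key=os.environ["NVIDIA_API_KEY"],            # key generated in the catalog UI
)

response = client.chat.completions.create(
    model="microsoft/phi-4-mini-instruct",  # assumed identifier; check the catalog listing
    messages=[{"role": "user", "content": "Give one use case for a small language model."}],
    max_tokens=128,
)
print(response.choices[0].message.content)
```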

NVIDIA’s collaboration with Microsoft extends beyond just training these models. The partnership includes optimizing software and models like Phi to promote AI transparency and support open-source projects. This collaboration aims to advance AI technology across industries, from healthcare to life sciences.

For more detailed information, visit the NVIDIA blog.

Image source: Shutterstock