Microsoft releases three AI models for speech, image generation, and transcription

Nisha
April 02, 2026

Microsoft has announced three new artificial intelligence models under its Microsoft AI (MAI) family, strengthening its push into multimodal AI capabilities for developers. The newly launched models, MAI-Transcribe-1, MAI-Voice-1, and MAI-Image-2, focus on transcription, speech generation, and image creation, respectively.

These models are now available through Microsoft Foundry and the MAI Playground. Foundry, previously known as Azure AI Studio, is the company’s unified platform for building, customizing, and scaling generative AI applications and agents. The MAI Playground serves as a public testing environment where users can experiment with features and provide feedback.

According to Microsoft AI CEO Mustafa Suleyman, the models were developed with a strong focus on safety and responsible AI practices. He said they have undergone rigorous testing and “red-teaming,” and that Foundry includes built-in guardrails, governance tools, and enterprise-grade controls to support secure and compliant deployment at scale.

MAI-Transcribe-1 is a speech-to-text model that supports transcription in 25 widely used languages, including Hindi. Microsoft claims the model achieves lower word error rates than competing systems such as Gemini and GPT-Transcribe. It can process batch transcriptions up to 2.5 times faster than the company’s earlier Azure Fast offering, with pricing starting at $0.36 per hour.

MAI-Voice-1 lets developers create custom voices from just a few seconds of input audio. The model can generate up to 60 seconds of audio in a single second, or up to 60 times faster than real time, which makes it suitable for real-time applications. Pricing begins at $22 per one million characters.

Meanwhile, MAI-Image-2, Microsoft’s latest image generation model, is now widely available after initially being introduced in the MAI Playground. The company says it delivers at least twice the generation speed of earlier versions while maintaining output quality. Pricing starts at $5 per one million text tokens and $33 per one million image tokens.
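To put the quoted starting prices in perspective, here is a rough cost estimate in Python. It is only a sketch: the function names are hypothetical, the figures are the starting prices reported above, the transcription rate is assumed to apply per hour of audio, and actual billing on Foundry may work differently.

```python
# Starting prices as quoted in the article (USD). Real billing may differ.
TRANSCRIBE_USD_PER_AUDIO_HOUR = 0.36   # assumption: "per hour" means per hour of audio
VOICE_USD_PER_MILLION_CHARS = 22.0
IMAGE_USD_PER_MILLION_TEXT_TOKENS = 5.0
IMAGE_USD_PER_MILLION_IMAGE_TOKENS = 33.0


def transcription_cost(audio_hours: float) -> float:
    """Estimated MAI-Transcribe-1 cost for a given number of audio hours."""
    return audio_hours * TRANSCRIBE_USD_PER_AUDIO_HOUR


def voice_cost(characters: int) -> float:
    """Estimated MAI-Voice-1 cost for a given number of input characters."""
    return characters / 1_000_000 * VOICE_USD_PER_MILLION_CHARS


def image_cost(text_tokens: int, image_tokens: int) -> float:
    """Estimated MAI-Image-2 cost from text and image token counts."""
    return (
        text_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_TEXT_TOKENS
        + image_tokens / 1_000_000 * IMAGE_USD_PER_MILLION_IMAGE_TOKENS
    )


if __name__ == "__main__":
    # Example workload: the volumes below are made up for illustration.
    print(f"100 hours of transcription: ${transcription_cost(100):.2f}")
    print(f"500,000 characters of speech: ${voice_cost(500_000):.2f}")
    print(f"1M text + 1M image tokens: ${image_cost(1_000_000, 1_000_000):.2f}")
```

At these rates, speech generation and image tokens would likely dominate the bill for most workloads, while transcription stays comparatively cheap.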

Microsoft is also integrating these models into its existing ecosystem, including products like Copilot, Bing, and PowerPoint. With enterprise adoption already underway, the launch highlights Microsoft’s continued investment in expanding AI capabilities across multiple formats, including text, voice, and images.
