A new AI assistant is set to challenge ChatGPT, having outpaced OpenAI in launching an AI voice assistant. Moshi, developed by Kyutai, an independent non-profit AI research lab in France, boasts a multimodal model that integrates seeing, hearing, and speaking capabilities with 70 distinct emotions and conversation styles.
Kyutai recently demonstrated Moshi in Paris, showcasing it as the world’s first public real-time generative voice AI. The team spent six months developing Moshi, which can offer advice on topics like climbing Mount Everest and recite poems with a French accent. Kyutai plans to release the model and its research in the coming weeks.
Moshi is positioned against OpenAI’s GPT-4o, another voice AI model capable of real-time inference and responses. However, GPT-4o’s full voice capabilities won’t be available until the fall. “We believe that Moshi has great potential to change the way we communicate with machines and through machines,” said Patrick Perez, CEO of Kyutai.
Despite expert warnings about AI dangers, numerous startups and tech giants like Anthropic, Cohere, and Google are racing to compete with OpenAI’s GPT-4. In May, OpenAI introduced GPT-4o with a voice mode for ChatGPT that offers image recognition and fast responses. The voice mode was originally slated for release to ChatGPT Plus subscribers within a few weeks but was delayed until the fall due to feature adjustments.
OpenAI faced backlash for using a voice resembling that of actress Scarlett Johansson in an AI demo. The company withdrew the voice after the actress’s lawyers contacted it.
Kyutai’s Perez announced plans to release Moshi’s models and research as open source, with the code freely available. He described Moshi as “the first public real-time voice AI assistant.” A statement from Kyutai on Wednesday reiterated the service’s experimental prototype status and promised the release of models and research soon.
Founded in November with €300 million in funding from notable figures like Xavier Niel, Rodolphe Saade, and former Google CEO Eric Schmidt, Kyutai has recruited researchers from Google’s DeepMind and Meta. Chief Science Officer Herve Jegou addressed security concerns, stating that the lab will use indexing and watermarking tools to track AI-generated audio.
This groundbreaking development positions Moshi as a significant player in the AI voice assistant market, promising to reshape human-machine interactions.