WellSaid Labs, a startup developing synthetic voice technology, announced today that it has raised $10 million in a Series A round led by Fuse, with participation from Voyager, Qualcomm Ventures, and GoodFriends. The oversubscribed round will support the company’s R&D and help it grow its team, according to CEO Matt Hocking.
Creating natural-sounding speech from text is considered a major challenge in the field of AI and has been a research goal for decades. Content creators and product designers have long faced tradeoffs between quality and scalability when choosing between text-to-speech tools and human voice recordings. But with AI, creators, product developers, and brands have the potential to power experiences with a wide variety of voice styles, accents, and languages at scale. Startups creating virtual beings, or artificial people powered by AI, have collectively raised more than $320 million in venture capital to date.
WellSaid was launched in 2018 as a research project at the Allen Institute for Artificial Intelligence, a laboratory started by Microsoft co-founder Paul Allen with a mission to conduct fundamental artificial intelligence research and engineering. The WellSaid team set out to create the most realistic synthetic voices, with CTO Michael Petrochuk leading R&D to build the core AI.
“What started as a research project … is now a growth stage startup with thousands of clients in media and advertising, technology, manufacturing, defense, pharmaceuticals, healthcare and education,” Hocking told VentureBeat via email. “In terms of the fundamentals of the business, [due to the pandemic] our medium and business clients [have] sped up and shifted a substantial amount of their voiceovers and media productions from in-person presence to remote locations. This added more moving parts and quality issues to their productions.”
Speech powered by AI
With WellSaid, companies can choose from a variety of voice avatars and create voiceovers directly from a script, with one or more voices depending on style, genre, and type of production. They can make edits to the copy, change the pause or use a different voice, and teach the platform to speak terms with unique spellings and pronunciations. WellSaid also allows users to share projects and files with team members and to build voice avatars for branded content from just a few hours of recordings of a real person’s voice.
Over two years, WellSaid gradually improved the naturalness of its synthetic voices, aiming for “human parity,” according to Hocking. In a July 2019 study, the company asked participants to listen to a set of random recordings created by WellSaid and human voice actors and rate them on a scale of 1 to 5, with 5 being the highest quality. The voice actors achieved an average rating of around 4.5, while the WellSaid voices scored close to their human counterparts (4.282).
Seattle, Washington-based WellSaid, which has 12 employees, is currently focused on improving the platform’s handling of different text lengths and styles, as well as speeding up speech generation. The company said it takes about 4 seconds to create a 10-second audio file.
“Companies use WellSaid Studio to create voiceovers for training and corporate content. They choose WellSaid to optimize their workflows due to the high-quality voices available and to gain profitability,” Hocking continued. “Product developers integrate [our] API into their experiences to enable voice in their user experience. They depend on voice quality, infrastructure scalability, and real-time rendering unmatched by other vendors. [As for] brands and creators, [they] use WellSaid to create their own exclusive AI voice avatars as per specification. We partner with them to design, build, host and implement their unique AI voices according to their needs and production specifications.”
WellSaid’s technology and comparable offerings from Microsoft, Amazon, Resemble AI, Synthesia, Deepdub, Papercup, and others have fueled concerns about misuse and deepfakes, or synthetic media used for nefarious purposes such as impersonating executives during earnings calls. But Hocking said WellSaid doesn’t create voice avatars without the actors’ permission and subscribes to the “Hippocratic Oath for AI” proposed by Microsoft executives Brad Smith and Harry Shum.
“With WellSaid, companies that might not have been ready to implement synthetic media can now invest in the technology, as it gives them the ability to continue producing and publishing mission-critical content without sacrificing quality,” said Hocking. “We are proud of what we have accomplished and grateful for the business we have built.”
This latest round brings WellSaid’s total raised to date to $12 million.