Will the future of the internet be voice-based? A proposal for a World Wide Voice Web

The World Wide Web (WWW) and the web browser have permeated our lives and revolutionized how we get information and entertainment, how we socialize, and how we do business.

Using new tools that make developing voice-based agents easy and inexpensive, researchers at Stanford University are now proposing to create the World Wide Voice Web (WWvW), a new version of the World Wide Web that people will be able to navigate entirely by voice.

About 90 million Americans already use smart speakers to stream music and news and to carry out tasks such as ordering groceries, scheduling appointments, and controlling lights. But two companies largely control these gateways to the voice web, at least in the US: Amazon, which pioneered Alexa, and Google, which developed Google Assistant. In effect, the two services are walled gardens. This oligopoly creates large imbalances that allow the technology owners to favor their own products over those of competing firms. The owners control what content is made available and what fees they charge for acting as intermediaries between companies and their customers. On top of all that, their smart speakers put privacy at risk, as they eavesdrop on conversations for as long as they are connected.

The Stanford team, led by computer science professor Monica Lam at the Stanford Open Virtual Assistant Laboratory (OVAL), has developed an open-source, privacy-preserving virtual assistant called Genie, along with cost-effective voice agent development tools, that can offer an alternative to proprietary platforms. The scientists also hosted a workshop on November 10 to discuss their work and propose the design of the World Wide Voice Web.

What is WWvW?

Just like the World Wide Web, the new WWvW is decentralized. Organizations post information about their voice agents on their own websites, where any virtual assistant can access it. Lam says WWvW's voice agents are like web pages, providing information about an organization's services and apps, and the virtual assistant is the browser. These voice agents can also be offered as chatbots or call center agents, making them available on the computer or over the phone as well.
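The decentralized lookup described above can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the article does not say how voice agents are published, so the descriptor path `/voice-agent.json`, the JSON fields, and the example domains are hypothetical, and the "web" is simulated with a dictionary rather than real HTTP requests. It is not the Genie or WWvW interface.

```python
import json

# Hypothetical descriptor path; the article does not specify one.
DESCRIPTOR_PATH = "/voice-agent.json"

# Stand-in for the open web: each organization hosts its own descriptor,
# so there is no central registry owned by a single company.
FAKE_WEB = {
    "https://pizza.example" + DESCRIPTOR_PATH: json.dumps({
        "name": "Example Pizza",
        "skills": ["order_pizza", "opening_hours"],
    }),
    "https://clinic.example" + DESCRIPTOR_PATH: json.dumps({
        "name": "Example Clinic",
        "skills": ["book_appointment"],
    }),
}


def fetch(url):
    """Simulated HTTP GET; returns None when nothing is published."""
    return FAKE_WEB.get(url)


def discover_agent(domain):
    """Look up a voice agent descriptor directly on the organization's site."""
    body = fetch("https://" + domain + DESCRIPTOR_PATH)
    return json.loads(body) if body is not None else None


def can_handle(domain, skill):
    """Return True if the organization's published agent lists the skill."""
    agent = discover_agent(domain)
    return agent is not None and skill in agent["skills"]
```

In this sketch, any assistant that knows an organization's domain can call `discover_agent` on it, much as any browser can load any web page, which is the property the browser analogy relies on.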

“WWvW has the potential to reach more people than WWW does, including those who are not technically savvy, who do not read and write well, or who speak a language that has no written form,” says Lam. For example, Stanford associate professor of computer science Chris Piech is working with graduate students Moussa Doumbouya and Lisa Einstein to develop voice technology for three African languages that could help bridge the gap between illiteracy and access to valuable resources, including agricultural information and medical care. “Unlike the commercial voice web led by Amazon and Google, which is only available in select markets and languages, the decentralized WWvW enables the community to provide voice information and services in every language and for every use, including education and other humanitarian causes,” Lam says.

Why were these tools not created before? Creating voice technology is very difficult, says the Stanford team. Amazon and Google have invested massive amounts of money and resources to give their assistants AI-based natural language processing, and they employ thousands of people to annotate training data. “The technology development process has been very expensive and labor intensive, creating a significant barrier to entry for anyone trying to offer commercial-grade intelligent voice assistants,” Lam says.

Unleash the Genie

Over the past six years, Lam has worked at OVAL with Stanford doctoral student Giovanni Campagna, computer science professor James Landay, and Christopher Manning, professor of computer science and linguistics, to develop a new methodology for building voice agents that is two orders of magnitude more sample-efficient than current solutions. The open-source Genie software, with its pre-trained agents, significantly reduces the cost and resources needed to develop voice agents in different languages.

Interoperability is key to ensuring that devices can interact with each other seamlessly, Lam points out. At the core of Genie's technology is ThingTalk, a distributed programming language the team created for virtual assistants. It enables interoperability among many virtual assistants, web services, and IoT devices. Stanford University is offering the first course on ThingTalk, Conversational Virtual Assistants Using Deep Learning, this fall.

As of today, Genie has pre-trained agents for the most popular voice skills, such as playing music, podcasts, and news, recommending restaurants, and setting reminders and timers, as well as support for more than 700 IoT devices. These agents are publicly available and can be adapted to other, similar services.

World Wide Voice Web Conference

The OVAL team presented these concepts at a workshop on the World Wide Voice Web on November 10.

The conference featured speakers from academia and industry with expertise in machine learning, natural language processing, human-computer interaction, and IoT devices. Panelists discussed building a voice system, pre-trained agents, and the social value of the voice web. The Stanford team also gave a live demonstration of Genie.

“We want other people to join us in building the World Wide Voice Web,” says Lam, who is also a faculty member at the Stanford Institute for Human-Centered Artificial Intelligence. “The original World Wide Web grew slowly at first, but once it caught on there was nothing stopping it. We hope to see the same with the World Wide Voice Web.”

Genie is an ongoing research project funded by the National Science Foundation, the Alfred P. Sloan Foundation, the Verdant Foundation, and Stanford HAI.