Nvidia has given us our first look at Omniverse Avatar, a technology platform for generating interactive AI avatars. We’ve become accustomed to thinking of avatars as digital representations of ourselves for playing dress-up, but in Nvidia’s world – make that Omniverse – they can see, speak, converse on a wide range of subjects, and understand naturally spoken intent.
Avatars created in the platform are interactive characters with ray-traced 3D graphics that could not only make for truly engaging non-player characters in a role-playing game, but also provide customer service in retail, tech support in engineering and more.
Building blocks
Omniverse Avatar connects the company’s technologies in speech AI, computer vision, natural language understanding, recommendation engines and simulation.
Technologies used:
- Speech recognition: Nvidia Riva
- Natural language understanding: Megatron 530B
- Recommendation engine: Nvidia Merlin
- Perception capabilities: Nvidia Metropolis
- Avatar animation: Nvidia Video2Face and Audio2Face
These technologies are composed into an application and processed in real time using the Nvidia Unified Compute Framework. Packaged as scalable, customisable microservices, the skills can be securely deployed, managed and orchestrated across multiple locations by Nvidia Fleet Command.
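To make that composition concrete, here is a minimal sketch in Python of how skills like these might be chained into a single real-time loop. Every class and method name below is invented for illustration – none of it corresponds to a real Nvidia API – and each stub simply stands in for the role one microservice plays (speech recognition, language understanding, recommendation); the perception and animation stages are omitted for brevity.

```python
# Hypothetical sketch: chaining avatar "skills" into one pipeline.
# All names are invented for illustration and do not map to Nvidia APIs.

from dataclasses import dataclass


@dataclass
class AvatarTurn:
    transcript: str   # what the user said (the speech-recognition stage's output)
    reply: str        # what the avatar answers (the language-model stage's output)
    suggestion: str   # a recommendation surfaced alongside the reply


class StubSpeechRecogniser:          # stands in for Riva's role
    def transcribe(self, audio: bytes) -> str:
        return "what soup do you recommend"


class StubLanguageModel:             # stands in for Megatron 530B's role
    def reply_to(self, text: str) -> str:
        return "The tomato basil soup is popular today."


class StubRecommender:               # stands in for Merlin's role
    def suggest(self, text: str) -> str:
        return "tomato basil soup"


class AvatarPipeline:
    """Runs one user utterance through each skill in turn."""

    def __init__(self, asr, llm, recommender):
        self.asr = asr
        self.llm = llm
        self.recommender = recommender

    def handle(self, audio: bytes) -> AvatarTurn:
        transcript = self.asr.transcribe(audio)            # speech -> text
        reply = self.llm.reply_to(transcript)              # text -> response
        suggestion = self.recommender.suggest(transcript)  # text -> recommendation
        return AvatarTurn(transcript, reply, suggestion)


if __name__ == "__main__":
    pipeline = AvatarPipeline(StubSpeechRecogniser(),
                              StubLanguageModel(),
                              StubRecommender())
    print(pipeline.handle(b"fake-audio-frame"))
```

In a real deployment the stages wouldn’t be in-process objects like these: Nvidia packages each skill as a separate microservice, which is what allows Fleet Command to deploy, scale and orchestrate them independently across locations.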
“The dawn of intelligent virtual assistants has arrived,” said Jensen Huang, founder and CEO of Nvidia. “Omniverse Avatar combines Nvidia’s foundational graphics, simulation and AI technologies to make some of the most complex real-time applications ever created. The use cases of collaborative robots and virtual assistants are incredible and far-reaching.”
Tokkio and Maxine
“In the near future, there will be billions of robots to help us do things. Some will be physical robots, most will be digital virtual robots. Some virtual robots will be fully autonomous. Others semi-autonomous, or even teleoperated,” said Huang in his keynote address at Nvidia GTC.
Arguably the most interesting of those ‘robots’ was Project Tokkio – pronounced ‘Tokyo’ – which showed Nvidia colleagues engaging in a real-time conversation with an avatar crafted as a toy replica of the CEO himself, conversing on topics such as biology and climate science.
Huang also updated the audience on Project Maxine, which combines tech previously shown by Nvidia to add state-of-the-art video and audio features to virtual collaboration and content creation applications.
“The fundamental technologies of Maxine are just becoming possible – computer vision, neurographics, animation, speech AI, dialogue manager, natural language understanding, recommenders. These are foundational technologies we’ve been talking about for some time. This was pretty much impossible five years ago and barely so today,” said Huang.
You can read more of Huang’s thoughts on Omniverse in general and Nvidia’s plans to create a digital twin of planet Earth in our earlier GTC article.