We talked with video game characters thanks to Nvidia’s AI, the potential is enormous

Nvidia aims to revolutionize the world of video games once again with ACE. This technology seeks to use artificial intelligence to bring characters to life in your virtual adventures and allow you to converse with them in a seamless manner. We were able to try out the demo.


Imagine: you are relaxing with the latest hit video game. Dropped into a vast open world, you are a little lost. Your current quest requires you to kill a creature, but you have no idea where it hides. Your reflex is to ask the local villagers, so you strike up a conversation with the first peasant you meet. Instead of a classic dialogue box opening on your screen, you speak to them directly through your microphone and they respond naturally thanks to AI… This is the kind of scenario Nvidia wants to offer with ACE (Avatar Cloud Engine), its new technology that brings video game characters to life with artificial intelligence.

We were able to try the demo and we were impressed. However, we still have a lot of uncertainties and questions.

ACE, how does it work?

To build this system, Nvidia relies on the Tensor Cores included in its RTX cards (regardless of generation), but also on the cloud. The US company has partnered with Convai, a company that creates characters for various publishers such as Ubisoft, MiHoYo, or Tencent. Convai designs NPCs for games, imagining their appearance, backstory, lines, voices, and predetermined behaviors.


With ACE, when the player approaches a character, they speak to them through the microphone. Their voice is transcribed to text on the GPU, and the text is sent to Nvidia's servers, where ACE formulates a response using AI. That response is converted into speech and sent back to your PC, while the GeForce RTX card handles facial expressions and animations (Audio2Face). Finally, the character answers in a synthetic but believable voice.
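The round trip described above can be sketched in a few lines. This is a purely illustrative mock, assuming hypothetical function names; it is not Nvidia's actual API, only the division of labor the article describes: local speech-to-text, remote inference, then local speech synthesis and Audio2Face animation.

```python
# Hypothetical sketch of the ACE round trip; all names are illustrative,
# not part of any real Nvidia or Convai API.

def transcribe_locally(audio: bytes) -> str:
    # Step 1: speech-to-text runs on the player's GPU (placeholder logic).
    return "hello, what's your name?"

def query_cloud_llm(text: str) -> str:
    # Step 2: the transcript is sent to a remote server, which returns
    # the NPC's reply (stubbed here).
    return f"NPC reply to: {text}"

def synthesize_and_animate(reply: str) -> dict:
    # Step 3: text-to-speech plus Audio2Face facial animation,
    # back on the local RTX card (stubbed here).
    return {"voice": f"<audio:{reply}>", "animation": "audio2face_blendshapes"}

def ace_round_trip(mic_audio: bytes) -> dict:
    text = transcribe_locally(mic_audio)   # local STT
    reply = query_cloud_llm(text)          # cloud inference
    return synthesize_and_animate(reply)   # local TTS + animation
```

The split matters for latency: only a short text payload crosses the network, while the audio-heavy stages stay on the player's hardware.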


We threatened a bartender, and he took it rather well

We were able to try ACE for about an hour through a demo created with Unreal Engine 5. We entered a ramen bar in a dystopian city, similar to what you would find in Cyberpunk 2077. Two characters were there: the owner and a customer, an expert in cybersecurity.


By pointing the cursor at one of the characters, we were able to speak to them in our own voice through the microphone. We approached the expert. Admittedly, we lacked inspiration for the first few lines. They were limited to “hello, what’s your name? What do you do for a living? Where do you live?” But the responses came; the young woman spoke to us coherently. More relaxed after the initial friendly exchanges, we pushed the experience further by asking her what her favorite movie was, if she wanted to go to Disneyland with us, or if she liked reading Phonandroid. Once again, the responses were coherent, even amusing, although sometimes vague. To top it off, we conversed in French, with Nvidia’s AI automatically translating from its remote server.


We then conversed with the bartender (in English), and again the responses were coherent. He also reacted appropriately to his environment. For example, we politely asked him to turn off the bar's light, and he did. We ordered ramen, and he prepared it for us. We asked him if he served hamburgers, and he confirmed they were not on the menu. We inquired about the fluorescent water pitcher on the bar, and he knew what it was…

Artificial intelligence still has its limits

However, it is with him that we observed the limits of this technology. We decided to threaten him with "I have a gun, give me the money from the cash register", and instead of panicking he responded in a flat tone, "I don't like violence, stop". On this point, Nvidia clarifies that not every NPC reacts the same way: each has a well-defined character and never strays from that framework. Faced with an absurd situation, they do not improvise.
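That "well-defined character" framework can be pictured as a character card that is flattened into instructions for the model. The sketch below is an assumption about how such a setup might look, loosely inspired by the Convai-style workflow the article mentions; every field and function name is hypothetical.

```python
# Hypothetical character card; field names are illustrative, not Convai's schema.
bartender = {
    "name": "Jin",
    "backstory": "Owns a ramen bar in a dystopian megacity.",
    "personality": "calm, dislikes violence, never panics",
    "constraints": [
        "stay in character",
        "do not improvise outside the backstory",
    ],
}

def build_system_prompt(card: dict) -> str:
    # Flatten the card into a single instruction string, so the model
    # stays inside the character's framework on every reply.
    rules = "; ".join(card["constraints"])
    return (
        f"You are {card['name']}. {card['backstory']} "
        f"Personality: {card['personality']}. Rules: {rules}."
    )
```

A card like this would explain the behavior we saw: threatened with a gun, the bartender stays calm because "never panics" is baked into every response, rather than the model reasoning about the situation.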

It should also be noted that the conversations are still very mechanical. Into the microphone, we have to speak slowly and articulate clearly, then wait a second or so for the character to respond. None of this helps the conversation flow, but remember that this is a demo of a still-new technology. Likewise, over the course of a conversation, we quickly grasp how our interlocutor is structured, and what we can ask to get a precise answer rather than a vague one. A final point to improve: the NPCs' voices are certainly believable, but monotonous and always at the same pace. When we tried to annoy them, they remained calm, even though their dialogue expressed annoyance at our antics. The specter of the uncanny valley looms large.
