
Speak, Memory: What Are the Implications of Conversational VR?

Last week I found myself on the other side of the camera, being interviewed for a piece that allows people to ask questions and hear answers in VR. While most of the project is still under wraps, it provided an interesting glimpse into a future of interactivity in the medium, one that offers amazing possibilities for immersion and education but is also rife with ethical implications.

In my case, the answers I gave will be presented as I stated them, and the viewer will be offered the same questions the director asked me. In this use case, the viewer is in a more immersive and intimate environment, but it’s clear that my side of the conversation was pre-recorded and that I’m not speaking in real time.

Viewing some of the other interviews that were produced, I did find the experience more absorbing than watching a flat interview. And even though I was not coming up with the questions on my own, I felt a greater sense of agency. There’s always going to be a human element in all of this as well. I found myself annoyed with some of the ditzier subjects, but I had the good fortune of being able to take off the headset and end the “conversation” without offending anyone; a big improvement over real life, as anyone who has been cornered by an idiot at a party knows.

But more than the current state of play, this experience and the accompanying technology offered a fascinating glimpse into what could come next. Only a few days before I sat before a green screen, news broke about a voice imitation algorithm called Lyrebird. The company claims to be able to “mimic the speech of a real person but shift its emotional cadence — and do all this with just a tiny snippet of real world audio.”

Imagine the implications for a moment. If a VR agency were to build a replica of a person in a game engine and use this API, it might be possible to have conversations with almost anyone that could seem totally real. On one hand, this could be a massive boon for education. Anyone whose voice has been recorded could be interviewed in real time. Presumably the system could draw on a person’s other recorded or written words to find patterns and predict how they would answer. In the not-too-distant future, a student learning about history wouldn’t need to read a book. They could just slip on a headset and chat with a historical figure.

The downside of this should be carefully considered, though.

It would be extraordinarily easy to manipulate this technology to spread misinformation. The Lyrebird release even included snippets of fake speeches from real political leaders. Allowing anyone to just program public figures’ statements would lead VR directly into the fake news fire consuming much of social media. Because VR is much more immersive, the consequences could be even more devastating.

Conversational VR is likely to be a major trend in content, and the possibilities for education and empathy are massive. But we need to keep a careful eye on how we use the technology, and make sure that it isn’t used to spread misinformation and fear.