For my final project, I wanted to explore the idea of disembodied voices in our current social environments: chats that amount to chatter through shared media and collective perspectives. The earliest computer voice I encountered was probably the answering machine on our landline, which guided us through various voicemail messages. In the future, I anticipate spending more and more time in a headset, so I wanted to find a bridge between “real” reality and virtual reality by bringing this historic voicemail interaction into a speculative environment. I also wanted to explore different ways in which we can create feelings of co-presence in virtual scenes.
What I ended up with is an interface for people to view my stream in real time and send me messages by commenting on my channel. Their messages generate message objects in my scene, which they can then see and hear in the same stream.
Twitch is a streaming platform predominantly used by gamers to follow each other’s gameplay and comment in real time. I decided to use it both as a streaming tool, to share the perspective from my headset on my own Twitch channel, and as the way for viewers to send me messages. Using the Twitch API with Socket IO for Unity, I set up my project to instantiate a falling sphere every time someone commented on my channel. I then used the IBM Watson Text to Speech SDK to say “new message” with each new sphere that appeared in my scene, and to read out the chat message when I collided with it. I used OBS to stream my game view (the headset display) directly to my Twitch channel.
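As a rough sketch of that flow, and not the actual Unity project (which used Socket IO and the Watson SDK), the TypeScript below parses a raw Twitch chat line, creates a message object, announces “new message”, and reads the text out when the object is touched. The names parseChatLine, MessageScene, and the speak callback are hypothetical, and the sketch assumes plain IRC-style chat lines without tags.

```typescript
interface ChatMessage {
  username: string;
  channel: string;
  text: string;
}

interface MessageObject {
  id: number;
  message: ChatMessage;
}

// Twitch chat is IRC-based; an untagged chat line looks like
// ":nick!nick@nick.tmi.twitch.tv PRIVMSG #channel :hello there"
const PRIVMSG = /^:([^!\s]+)!\S+ PRIVMSG #(\S+) :(.*)$/;

// Returns null for anything that is not a chat message (PING, JOIN, tagged lines).
function parseChatLine(line: string): ChatMessage | null {
  const match = PRIVMSG.exec(line.trim());
  if (!match) return null;
  return { username: match[1], channel: match[2], text: match[3] };
}

class MessageScene {
  private nextId = 1;
  private objects: { [id: number]: MessageObject } = {};
  private speak: (text: string) => void;

  // `speak` stands in for whatever text-to-speech call the scene uses (assumption).
  constructor(speak: (text: string) => void) {
    this.speak = speak;
  }

  // For each incoming chat line: create a message object and announce it.
  onChatLine(line: string): MessageObject | null {
    const message = parseChatLine(line);
    if (!message) return null;
    const obj: MessageObject = { id: this.nextId++, message };
    this.objects[obj.id] = obj;
    this.speak("new message");
    return obj;
  }

  // When the player touches an object: read its chat text aloud.
  onCollision(id: number): void {
    const obj = this.objects[id];
    if (!obj) return;
    this.speak(obj.message.text);
  }
}
```

In a real scene, onChatLine would be where the sphere is instantiated, and onCollision would be called from the engine’s collision callback.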
I ran into a few technical hurdles trying to get Socket IO for Unity to work with the Watson API without breaking my project. After some trial and error, I found a version of a Socket IO package that was compatible with the other integrations. There was also limited documentation on how to use the Text to Speech function in the Watson SDK for Unity, but I finally got it working with some help.
I definitely want to keep expanding on this project and interaction flow, now that I’ve set up the technical framework. There are so many different directions to go with this, for example using the username data from Twitch to assign voices, or having the content of the messages instantiate different objects. While testing out my project, I asked my sister, who’s in college 3 hours away, to send me messages while looking at my stream, and it was a really sweet way to connect with each other in real time. I look forward to more experiments!
During my first class at ITP, my physical computing professor asked us to form groups and come up with an idea for an ultimate “fantasy machine”, a machine that didn’t have to adhere to the laws of physics, or technical and monetary restrictions. My group devised a “Tea with Mum portal”, inspired by our longing to connect with our family overseas in a more meaningful way. It allowed two users to be scanned before entering their own booths, each equipped with scanned tea cups. They could then see each other and their cups as holograms and change their scenery to wherever they’d decide. The ability to see each other’s drink was key – beyond heightening the feeling of presence or enabling interactions like “cheersing”, it would set the tone for a communication based on a fundamental and cross-cultural experience of having tea, or coffee, or a drink of any kind with someone.
Based on this initial inspiration, for my culinary physics final, I want to explore social and cultural constructs around consuming food and beverages based on a shared assumption of reality. By taking a simple interaction of sharing a drink with another, I’m interested in shifting the context of reality by changing the setting to a virtual environment in order to see if any sense of social cohesion could carry over.
I want to explore concepts we touched upon with the Citizen app, where anyone can stream out their perspective and essentially become a camera. To take it a step further, users who are following someone else’s stream would be able to affect the streamer’s environment by sending a message, ideally by voicing their message in a localized space in a virtual environment. In order to do this, I plan on using Twitch to stream out a video feed of a VR headset, and using the Twitch API with Socket.io for Unity. The environment will be a simple space with a mirror in it, and the streaming user in the headset (most likely myself) will embody an avatar, seen through the mirror.
Key things I’m going for:
a sense of copresence virtually through voice
share someone’s virtual perspective synchronously and be able to affect that environment via text to speech
explore any behavioral shifts that might occur from having chat texts voiced out loud
For my avatar final, I would like to explore a pipeline that’s traditionally used for gamers to stream their experience, in order to allow people I know in my real social life to control my behavior in a virtual social space. I’m interested in fostering a collaborative and collective experience through one avatar, as well as what it means to be able to switch between experiences to follow.
I have been exploring some Twitch feeds of VRChat users and find the interaction between the streamer / user and Twitch “followers” fascinating. The Twitch follower can be both a passive observer and a potential active participant. The streamer is always notified of any new followers or comments they receive and can choose to acknowledge them.
I’d like to upload an avatar to either the High Fidelity or VRChat platform, and invite my friends to participate in a time-controlled event (or series of events) to experience my field of view and control my actions. Ideally I would like to have another friend in the virtual space to interact with, in order to heighten the level of immersion for the followers and, as a bonus, to switch between my feed and my friend’s. I will view any incoming Twitch messages on my HMD and either respond or react to them accordingly.
It’s important to me that this interaction takes place within my existing social circle, in order to contain the experience somewhat and hopefully make it more cohesive, even if this means far fewer followers (I’m ok with this experiment involving just one Twitch follower).
Despite multiple attempts, this was the best scan that I was able to get. The Structure Sensor kept having issues around the second turn at the back of my right leg, so some of the arm and leg are textured with the wooden floor. Unfortunately the ER didn’t have any LED lights available at the time, which I’m sure would’ve improved the results. Next time!
Below is a link to my rather basic voice chat running on Node.js on Digital Ocean. I took the example chat application and added Speech Synthesis with the Web Speech API to make the text input “talk”. I wanted to make the voice change randomly, but sadly couldn’t get it to work!
For my first assignment I tried to make a speaking chat thread, where users could type in their message and the text would speak for itself. I wanted to create a hypothetical space, or voice for The Difference where two unidentified text producers (human or machine) could communicate in real time to reveal a narrative. Though this particular work would be antithetical to the author’s intent, I thought it could be interesting to create the environment all the same, with the voice constantly changing.
I tried to have each chat entry voiced in a randomized voice, so as not to assign any particular voice or characteristics, but got stuck in the process. One of the issues was that when I tried to create a variable for the array from window.speechSynthesis.getVoices, nothing would return.
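One likely cause, offered as an assumption since the original code isn’t shown: in some browsers (Chrome in particular), getVoices() returns an empty array until the voice list has loaded and the voiceschanged event has fired. A minimal sketch of waiting for the voices and then picking one at random follows; whenVoicesReady, pickRandom, and SynthLike are hypothetical names, and SynthLike only describes the two parts of window.speechSynthesis that the sketch relies on.

```typescript
// The subset of the browser's speechSynthesis object used here.
interface SynthLike<V> {
  getVoices(): V[];
  addEventListener(
    type: "voiceschanged",
    listener: () => void,
    options?: { once?: boolean }
  ): void;
}

// Pick a uniformly random item; `rng` can be swapped out to make the choice repeatable.
function pickRandom<T>(items: readonly T[], rng: () => number = Math.random): T | undefined {
  if (items.length === 0) return undefined;
  return items[Math.floor(rng() * items.length)];
}

// Call `onReady` as soon as the voice list is non-empty:
// immediately if it is already loaded, otherwise after "voiceschanged".
function whenVoicesReady<V>(synth: SynthLike<V>, onReady: (voices: V[]) => void): void {
  const voices = synth.getVoices();
  if (voices.length > 0) {
    onReady(voices);
    return;
  }
  synth.addEventListener("voiceschanged", () => onReady(synth.getVoices()), { once: true });
}

// In the browser, each chat entry could then be spoken roughly like this:
//
// whenVoicesReady(window.speechSynthesis, (voices) => {
//   const utterance = new SpeechSynthesisUtterance(chatText); // chatText: the typed message
//   utterance.voice = pickRandom(voices) ?? null;
//   window.speechSynthesis.speak(utterance);
// });
```

Keeping the random pick separate from the browser call means a different voice can be chosen for every entry without reloading the list.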
Making my avatar with Fuse was an emotionally taxing process. First off, I started over multiple times because I was indecisive about whether to go for one of the more realistic models or the animated one. I finally settled on the animated one, and ended up spending far too much time trying to get the facial features right, only to realize I could actually rotate the avatar to look at it from the side, which of course looked crazy.
After trying to fix the face from all angles, I realized I was never going to get the eyes right, or the nose, or the lips for that matter. I thought about going for a more abstract representation, but it was past the point of no return. In hindsight, I wish I had gone with the more realistic model to see if it could produce a more accurate representation. As a cop-out, I changed my skin color, hair, and body metalness (which I realized I could change much later in the process) to push the eerie over to scary.
So here’s my Fuse avatar that sort of vaguely might resemble me but not quite:
My Bitmoji, on the other hand, was much easier. Having preset options, as opposed to a seemingly infinite combination of sliders, was much simpler. And in the end, I think the Bitmoji looks much more like me, perhaps because it’s so abstracted:
discussion thoughts based on readings:
What happens when someone’s preferred avatar is a representation that is traumatizing for someone else (e.g. a Hitler avatar)? And if that behavior is illegal in certain countries, how do we regulate a global community of players (and/or should we)?
Ethical issues around commenting on the physical appearance of avatars: do harassment laws apply?
Will gender fluidity in games influence cultural expressions of identity?
I downloaded and tried out an app called Citizen, which is a synchronous platform that shows its users real-time crime or emergency activity in any given neighborhood. Users can either search incidents based on their current location, or look up a neighborhood for a list of recent or even “trending” activity. The incidents seem to be generated by police reports and 911 calls; on their site they write, “Citizen monitors a variety of public data sources using proprietary technology, allowing us to provide real-time alerts for crime and other emergency incidents.”
There are a few interesting features on the app, including chats and video streaming capabilities. There are chats for specific incidents, which can add a level of user input and information (or judgemental commentary), as well as for neighborhoods in general, which can create a stronger sense of community (or animosity). The location-based video function feels similar to Snap Map, where Snapchat stories that have geotags are visually represented on an actual map; some of my friends have found functional applications for this in their research for upcoming vacations.
A few months back, I had the thought of how some time in the future we might be able to switch our perspectives between different people’s cameras when everyone wears a device that can capture 360 video and feed it into our eyes. I was sitting on a fire escape watching a row of fire trucks turn the corner and disappear, but I could hear them stop just up the street. It would be incredibly convenient, but also terrifying to be able to tune into the perspective of some pedestrian up the street in order to know what’s going on.
Anyhow, the video function on this emergency information app definitely seems like a step toward user-generated live media with a focus on our real physical community and surroundings, which is cool. I also like that the live video is used more as a tool with an outward emphasis, rather than the oftentimes insular Instagram stories, which rely heavily on the front-facing camera.