Note: This is a group project, and I am only responsible for a portion of the work. Citations have been removed to adjust for website rendering. The original PDF is at the end of this page.
ChatGPT’s potential for human-like communication is noteworthy, but the mental health implications of giving it a human-like identity remain understudied. To investigate these effects, this research introduces AvatarGPT, which adds a human-like avatar to the ChatGPT interface. A between-subjects study (N=10) compared users’ responses to AvatarGPT and ChatGPT, using the UCLA Loneliness Scale before and after a conversation to evaluate their effectiveness. Results show that neither using an avatar (p≈0.39) nor conversing without the avatar (p≈0.11) significantly improved loneliness scores. Additionally, using the avatar neither increased willingness to speak, as measured by word counts, nor produced a significantly larger percentage reduction in loneliness score (p≈0.59). We believe that a larger participant pool and a longer experimental period might reveal a more pronounced emotional change.
Introduction
In recent years, technological advances and research in robotics have led to a wide range of applications. One kind, Socially Assistive Robots (SAR), is widely used in health care and emotional support because such robots can provide emotional, cognitive, and social support. These robots have been used to combat cognitive decline and depression through methods such as making jokes and dancing. However, they provide emotional support only through basic reflective listening techniques and simple interactions, such as repeating or paraphrasing people’s words, and their conversational abilities leave considerable room for improvement.
Meanwhile, advances in natural language processing and AI have given rise to tools that add a new dimension to human-computer interaction. ChatGPT, a language model developed by OpenAI, can understand and generate natural language, enabling smooth conversational interactions with users. These capabilities suggest considerable potential for applications in emotional support and mental health.
Integrating an avatar interface with ChatGPT is motivated primarily by the limitations of text-only interaction in traditional chatbots. An avatar-based interface may enhance user engagement and enrich the overall experience. We expected this approach to encourage users to take more initiative when interacting with the chatbot and to make exchanges more intuitive and personable, improving user satisfaction.
Integrating large language models (LLMs) such as ChatGPT could represent a further leap in mental health care. LLMs offer more natural and deeper conversational interactions, which are important for personalized emotional support and therapy. Incorporating LLMs into SAR through virtual avatars may enrich the user experience, providing a sense of real social interaction and deeper emotional resonance, and opening new possibilities for personalized emotional support.
We therefore implemented AvatarGPT, a chat interface that gives ChatGPT a vivid avatar to ‘speak out’ its responses, with the aim of supporting mental health care. We then conducted a between-subjects experiment with 10 participants, in which some participants used AvatarGPT (the treatment group) and the others used the standard ChatGPT interface (the control group). We formulated our research question as follows: Will the presence of a human-like avatar in the ChatGPT interface increase the feeling of human connection during a conversation and, consequently, help reduce symptoms of depression, especially loneliness?
To address this research question, we considered the following hypotheses:
H1:
Conversing with ChatGPT with an avatar would reduce user loneliness
H2:
Using an avatar would increase users’ willingness to speak
H3:
Conversations with more words would be associated with a greater reduction in loneliness
Contribution
Contrary to our expectations, we found that neither ChatGPT nor AvatarGPT significantly reduced loneliness. In addition, compared with the ChatGPT condition, the AvatarGPT condition showed neither increased willingness to speak nor a significantly greater percentage reduction in loneliness.
Related work
In mental health care, the application of Socially Assistive Robots (SAR) is receiving increasing emphasis, especially with the development of AI. Mental health issues pose a significant burden on individuals and society: in the US, about 25% of the population meets the criteria for a diagnosable psychiatric condition each year. Against this backdrop, SAR shows considerable potential. These robots have already assumed various roles in mental health care, including companions, coaches, and play partners.
For example, a robot called Paro has been used as a companion providing emotional support and reducing loneliness, akin to trained therapy animals. Also, simple robots with basic conversational abilities have proven beneficial for otherwise healthy older adults facing social isolation and loneliness.
AI has also become a valuable tool in this field, aiding professionals and enhancing patient care. Some AI systems, such as Horyzons, provide accessible social therapy for young people with psychosis, allowing them to share and express emotions in a safe, moderated online environment. By enabling anonymous interaction, this platform reduces the intimidation associated with traditional group therapy. In addition, online mental health communities use AI-mediated communication to help people form relationships and communicate with one another.
Method
We conducted a between-subjects experiment to investigate how interactions with ChatGPT and AvatarGPT influence participants’ self-reported loneliness. Participants, recruited on campus, were randomly assigned to one of two groups. Each group completed interactive tasks with either ChatGPT or AvatarGPT (ChatGPT modified to include a speaking humanoid avatar in its interface). The experiment consisted of a pre-survey to establish baseline loneliness, two interactive tasks, and a post-survey for reassessment. Our study aims to determine how these interactions affect participants’ emotional states and feelings of loneliness.
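The text does not say how the random assignment was carried out. As a minimal sketch, assuming a simple shuffle-and-split of participant IDs (the function name, the ID format, and the seed are hypothetical), a reproducible assignment into the two conditions could look like this:

```python
import random
from typing import Optional


def assign_groups(participant_ids: list[str], n_avatar: int,
                  seed: Optional[int] = None) -> dict[str, list[str]]:
    """Randomly split participants into an AvatarGPT group and a ChatGPT group.

    Hypothetical helper: the first `n_avatar` shuffled IDs go to AvatarGPT,
    the rest to ChatGPT.
    """
    if not 0 <= n_avatar <= len(participant_ids):
        raise ValueError("n_avatar must be between 0 and the number of participants")
    rng = random.Random(seed)          # seeded generator makes the split reproducible
    shuffled = list(participant_ids)   # copy so the caller's list is not mutated
    rng.shuffle(shuffled)
    return {"AvatarGPT": shuffled[:n_avatar], "ChatGPT": shuffled[n_avatar:]}
```

For example, `assign_groups([f"P{i}" for i in range(1, 11)], n_avatar=4, seed=1)` yields the same 4/6 split sizes reported in the Result section; fixing the seed lets the allocation be audited later.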
Pre-Survey: Establish Baseline Level of Loneliness
Before the experiment began, participants completed the modified UCLA Loneliness Scale survey on their own, before any interaction with ChatGPT. Comparing these pre-survey responses with the later post-survey responses lets us measure changes in participants’ feelings of loneliness after their interactions with ChatGPT or AvatarGPT.
Task 1: Question-Answering Task
In this task, participants engaged in a conversation with either ChatGPT or AvatarGPT, answering questions about their personal life, hobbies, or daily challenges. The primary goal was to encourage participants to share their feelings and thoughts openly and to prompt them to ask additional questions. To initiate the task, ChatGPT or AvatarGPT was given the following prompt: “I need you to act like the interviewer, so speak like a human. I need you to ask me questions about life, hobbies, or daily challenges. Engage in a conversation with me. I need you to talk to me and encourage me to share my feelings and pose additional questions. Ask specific questions. Start off the conversation with ’Thank you for coming!’”
Participants were instructed to chat for 5 minutes, and the experimenter was not to intervene during this time. After the 5-minute conversation, the task ended and participants proceeded to Task 2.
Task 2: Scenario Generation Task
In this task, participants engaged in collaborative storytelling with either ChatGPT or AvatarGPT, focusing on themes of friendship and support. The objective was to jointly create a story plot embodying these themes while integrating personal experiences and emotional elements to add authenticity and depth. Participants were encouraged to build a story that flows seamlessly and feels interconnected. To initiate the task, participants were given the following instructions: “The goal of this task is to collaboratively create a story plot that revolves around themes of friendship and support in the next 5 minutes. We encourage you to infuse your personal experiences and emotions into the narrative to make it authentic and meaningful. Remember, this is a collaborative effort. Feel free to build on each other’s ideas, creating a narrative that flows seamlessly. Collaboration is key to making our story rich and interconnected. Let’s start with a simple prompt to kick things off. Imagine a setting or a situation where the characters need friendship and support. It could be anything – a challenging moment, a celebration, or even a quiet, reflective scene. Build upon this prompt in your contributions.” After the 5-minute storytelling session, the conversation stopped and participants proceeded to the post-survey.
Post-Survey: Reassessment of Participant Responses
After completing the experimental tasks, participants took the same survey they had completed as the pre-survey. The post-survey assesses whether participants’ responses changed after the interactive tasks with ChatGPT or AvatarGPT. Comparing pre-survey and post-survey responses lets us examine the potential impact of these interactions on participants’ feelings, experiences, and attitudes, and thus gain insight into the effectiveness of the interactions and their influence on participants’ well-being and emotional state.
Participants and recruitment
We recruited 10 participants by approaching students on campus. All participants self-identified as at least 18 years old.
Experiment description
In our experiment, we created AvatarGPT by adding an avatar to the bottom-right corner of the ChatGPT web interface in the browser window, as shown in Figure 1. The modified interface was presented to participants in the AvatarGPT group during Task 1 and Task 2.
Specifically, we used a JavaScript userscript, run through the Tampermonkey browser extension, to place a humanoid face in the corner of the ChatGPT interface. The face simulated speaking by moving its mouth in sync with the text responses generated by ChatGPT. The goal was to lead users to attribute the words produced by ChatGPT to the avatar, as if the avatar were speaking them.
A sample running environment for AvatarGPT
Data collection
In our study, we used a slightly modified UCLA Loneliness Scale to measure participants’ loneliness levels [bit.ly/cse216survey1]. The UCLA Loneliness Scale is a well-established questionnaire for assessing loneliness that uses a frequency-based response format. We kept the same questions but replaced the frequency-based responses with an agreement-based response scale. Participants indicated their level of agreement with 20 negatively phrased statements, which allowed us to gauge their current loneliness within the context of our short-term experiment. Lower scores indicate a more optimistic emotional state, and higher scores a more pessimistic one.
Each response is scored on a scale from 1 to 5. A score of 1 represents “Strongly Disagree,” meaning the participant completely disagrees with the negative statement, which suggests a more optimistic emotional state. A score of 5 represents “Strongly Agree,” meaning strong agreement with the negative statement, which may suggest a more pessimistic emotional state. After converting all answers from categories to scores, we use the average across all responses as the participant’s loneliness score.
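The scoring rule above (map each answer to 1-5, then average) can be sketched as follows. This is an illustration rather than the study's actual analysis code: the text names only the two endpoint labels, so the three middle labels ("Disagree", "Neutral", "Agree") and the function name are assumptions.

```python
from statistics import fmean

# Agreement-based options mapped to 1-5 scores. Only "Strongly Disagree" (1)
# and "Strongly Agree" (5) are named in the text; the middle labels are assumed.
LIKERT_SCORES = {
    "Strongly Disagree": 1,
    "Disagree": 2,
    "Neutral": 3,
    "Agree": 4,
    "Strongly Agree": 5,
}

NUM_ITEMS = 20  # the adapted scale keeps the 20 UCLA items


def loneliness_score(responses: list[str]) -> float:
    """Average the 1-5 item scores; a higher average means more measured loneliness."""
    if len(responses) != NUM_ITEMS:
        raise ValueError(f"expected {NUM_ITEMS} responses, got {len(responses)}")
    # Every statement is negatively phrased, so no item is reverse-scored here.
    return fmean(LIKERT_SCORES[r] for r in responses)
```

For instance, a participant who answers "Agree" to ten statements and "Disagree" to the other ten gets an average of 3.0. Using the mean rather than the sum keeps the score on the same 1-5 scale as the individual items.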
Result
The 10 participants were randomly assigned to either the AvatarGPT group or the ChatGPT group. Six participants (2 male, 3 female, 1 preferred not to say) were assigned to the ChatGPT group; their ages ranged from 19 to 25 (M=21.3, Median=21). Four participants (2 male, 1 female, 1 preferred not to say) were assigned to the AvatarGPT group; their ages ranged from 21 to 24 (M=21.7, Median=21.5).
We computed each participant’s score using the method described in Section 3.2 and analyzed the effect of the interactions on loneliness in each group, addressing each hypothesis in turn.
H1: Conversing with ChatGPT with an avatar would reduce user loneliness
We calculated the scores using the method described in Section 3.2; the results for both groups are shown in Figure 2. We then conducted a paired Student’s t-test for each condition to assess the impact of the interactions. In the ChatGPT group, where participants used the original ChatGPT interface, we did not observe a significant difference between scores before and after the chat (t≈2.2, p≈0.11). Similarly, in the AvatarGPT group, where participants interacted with the avatar version of ChatGPT, we found no significant difference between scores before and after the chat (t≈2.4, p≈0.39). Lower scores indicate lower levels of measured loneliness.
A paired Student’s t-test shows that neither ChatGPT nor AvatarGPT significantly reduced participants’ loneliness scores.
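The paired t-test compares each participant’s own pre- and post-survey scores. The sketch below computes Student’s paired t statistic and its degrees of freedom using only the standard library; the function name and the sample numbers are illustrative, not the study’s data. It does not compute a p-value: `scipy.stats.ttest_rel(pre, post)` performs the same test and also returns the two-sided p-value.

```python
import math
from statistics import fmean, stdev


def paired_t(pre: list[float], post: list[float]) -> tuple[float, int]:
    """Student's paired t statistic and degrees of freedom for pre/post scores."""
    if len(pre) != len(post) or len(pre) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    # Differences are pre minus post, so a positive t means scores dropped.
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; t is undefined")
    t = fmean(diffs) / (sd / math.sqrt(n))
    return t, n - 1
```

For example, `paired_t([3, 4, 5, 6], [2, 2, 4, 4])` returns t = 3√3 ≈ 5.196 with 3 degrees of freedom. Because the test works on within-person differences, each group's test has n − 1 degrees of freedom (5 for six participants, 3 for four).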
H2: Using an avatar would increase users’ willingness to speak
From the recorded conversations, we counted both the words written by participants and the words generated by ChatGPT or AvatarGPT. The results for each condition are shown in Figure 3.
We conducted a paired Student’s t-test to analyze the relationship between total word counts and avatar usage. The results indicated no significant difference in participants’ willingness to type between the avatar and non-avatar conditions (p≈0.83).
A paired Student’s t-test shows that using an avatar did not significantly increase willingness to speak.
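The text does not specify how words were counted. A minimal sketch, assuming a hypothetical transcript format of (speaker, message) pairs and counting whitespace-separated tokens as words, could tally words per speaker as follows:

```python
from collections import Counter


def word_counts(transcript: list[tuple[str, str]]) -> Counter:
    """Count whitespace-separated words per speaker in a recorded conversation.

    `transcript` is a hypothetical list of (speaker, message) pairs, for
    example ("participant", "...") or ("assistant", "...").
    """
    counts: Counter = Counter()
    for speaker, message in transcript:
        counts[speaker] += len(message.split())  # split() ignores repeated spaces
    return counts
```

Keeping participant and assistant counts separate matches the analysis above, which looks at participants’ own word counts as a proxy for willingness to speak while also recording what ChatGPT or AvatarGPT generated.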
H3: Conversations with more words would be associated with a greater reduction in loneliness
To quantify changes in feelings of loneliness, we used the percentage change in each participant’s score relative to the pre-survey. For each participant, we took the difference between the pre-survey and post-survey scores and divided it by the pre-survey score. The resulting value expresses the change in loneliness as a proportion of the initial score. The results for each condition are shown in Figure 4.
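The percentage change described above can be written as a one-line function. The sign convention here (pre minus post) is an assumption chosen so that, as noted in the ANOVA paragraph, a negative value means loneliness increased; the function name is hypothetical.

```python
def percent_change(pre: float, post: float) -> float:
    """Relative change in loneliness score, as a percentage of the pre-survey score.

    Computed as (pre - post) / pre * 100: a positive value means the score
    (and thus measured loneliness) dropped, a negative value means it rose.
    """
    if pre == 0:
        raise ValueError("pre-survey score must be non-zero")
    return (pre - post) / pre * 100
```

For example, a participant whose average score goes from 2.5 to 2.0 shows a 20% reduction, while one going from 2.0 to 2.5 shows a change of -25%. Normalizing by the pre-survey score lets participants with different baselines be compared on the same scale; since every item is scored from 1 to 5, the pre-survey average can never actually be zero.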
We conducted an ANOVA to examine the relationship between word counts and the percentage change in scores. The analysis revealed no significant effect (F with df = (1, 8), p≈0.59, η²≈0.04). Note that a negative percentage change indicates that participants reported heightened feelings of loneliness after the conversation.
An ANOVA shows that using an avatar did not significantly reduce loneliness as measured by the percentage change in score.
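The text reports an ANOVA with (1, 8) degrees of freedom and an effect size η². As a sketch, assuming a one-way ANOVA with condition (ChatGPT vs. AvatarGPT) as the factor, as the caption suggests, the F statistic and η² = SS_between / SS_total can be computed as below. Two groups totalling 10 participants give exactly the reported (1, 8) degrees of freedom. The function name is hypothetical, and no p-value is computed; `scipy.stats.f_oneway` returns the same F together with its p-value.

```python
from statistics import fmean


def one_way_anova(*groups: list[float]) -> tuple[float, int, int, float]:
    """Return (F, df_between, df_within, eta_squared) for a one-way ANOVA."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two groups and more observations than groups")
    grand_mean = fmean([x for g in groups for x in g])
    # Between-group sum of squares: spread of group means around the grand mean.
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: spread of observations around their own group mean.
    ss_within = sum((x - fmean(g)) ** 2 for g in groups for x in g)
    if ss_within == 0:
        raise ValueError("no within-group variance; F is undefined")
    df_between, df_within = k - 1, n - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    eta_squared = ss_between / (ss_between + ss_within)  # SS_between / SS_total
    return f_stat, df_between, df_within, eta_squared
```

With two groups, this F equals the square of the independent-samples Student’s t statistic, so the ANOVA and a two-sample t-test lead to the same conclusion.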
Limitation
One notable limitation of our study is that the modified version of the UCLA Loneliness Scale used in this research has not been validated. While the UCLA Loneliness Scale is well established for measuring loneliness, our adaptation was designed to assess transient, short-term loneliness as a mood rather than as a long-term emotional state. Because no widely recognized scale was available for measuring transient loneliness in the context of our study, this adaptation was necessary. However, the modification raises questions about the scale’s validity and whether it accurately captures the targeted construct. Future research should conduct validation studies to assess the reliability and validity of our adapted scale for measuring short-term loneliness.
Another limitation is the specific avatar chosen for the interactions. We selected an arbitrary black-and-white humanoid female avatar to match ChatGPT’s color scheme. Individual perceptions of avatars vary widely, and some participants described the chosen avatar as “scary” or “distracting”. Future studies should take a more systematic approach to avatar selection.
Future work
Future work could explore how personalized avatar designs and behaviors affect user engagement, since customization options may enhance interactions. Another direction is investigating the long-term effects of ChatGPT and AvatarGPT on user well-being and on users’ attachment to avatars. Comparative studies with other mental health support tools could clarify AvatarGPT’s advantages and limitations in different contexts, contributing to more effective and responsible AI-driven mental health care systems.
Conclusion
Although both robotics and large language models aim to simulate human behavior, our results suggest that a gap remains between them and real humans. In this study, neither using an avatar nor conversing without one significantly improved loneliness scores. Additionally, using the avatar neither increased willingness to speak nor significantly reduced loneliness as measured by percentage change in score. We believe that a larger participant pool and a longer experimental period might reveal a more pronounced emotional change.
The original PDF file