
Steerability and Bias in LLMs: Navigating Multifaceted Persona Representation

Jun 4, 2024

LLMs need to generate text that reflects the diverse views of multifaceted personas. Prior studies of bias in LLMs have focused on simplistic, one-dimensional personas or multiple-choice formats. However, many applications require LLMs to generate open-ended text based on complex personas. The ability to steer LLMs toward accurate representations of these multifaceted personas is critical to avoiding oversimplified or biased outputs. If LLMs fail to capture the nuanced views of complex personas, they risk perpetuating stereotypes and monolithic perspectives, especially when personas do not align with typical demographic views, and they could introduce new biases into simulations of individuals.

Carnegie Mellon University researchers define an incongruous persona as one in which a trait makes the persona's other traits less likely in human survey data, such as a political liberal who supports increased military spending. LLMs are 9.7% less steerable toward such personas than toward congruous ones, often reverting to stereotypical views instead. Models fine-tuned with RLHF are more steerable but express a narrower diversity of views. Steerability on multiple-choice tasks does not predict steerability in open-ended generation, and GPT-4's evaluations closely match human judgments. These findings highlight the need to improve LLMs' steerability toward diverse personas and their ability to generate nuanced human opinions.
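The incongruity definition can be made concrete with a small sketch. This is a minimal illustration, not the paper's procedure: the data layout, field names, and the comparison rule (a persona is incongruous when the demographic group agrees with the stance less often than people outside the group) are assumptions for exposition.

```python
# Hedged sketch: flagging incongruous (demographic, stance) personas from
# survey agreement rates. The schema and decision rule are illustrative.

from dataclasses import dataclass

@dataclass
class SurveyRow:
    demographic: str       # e.g., "political liberal"
    stance: str            # e.g., "supports increased military spending"
    agree_rate: float      # fraction of that demographic agreeing with the stance
    baseline_rate: float   # agreement rate among respondents outside the group

def is_incongruous(row: SurveyRow) -> bool:
    """A persona is incongruous when the demographic trait makes the
    stance *less* likely than it is for people outside that group."""
    return row.agree_rate < row.baseline_rate

rows = [
    SurveyRow("political liberal", "supports increased military spending", 0.22, 0.55),
    SurveyRow("political liberal", "supports stricter gun laws", 0.86, 0.41),
]

for r in rows:
    label = "incongruous" if is_incongruous(r) else "congruous"
    print(f"({r.demographic} | {r.stance}) -> {label}")
```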

Recent research on persona-steered generation has expanded on previous frameworks by focusing on the steerability and congruity of multifaceted personas in LLMs, considering the effects of model scale and fine-tuning. Studies have used LLMs to simulate human behavior and to evaluate model-generated statements, noting that RLHF can amplify political biases. Concerns about toxic outputs in persona-steered generation have also been raised. Evaluations of LLM biases show significant variance in model accuracy and alignment with human opinions, particularly on open-ended tasks. Recent work highlights the challenges of reliably simulating diverse personas and the importance of model alignment for downstream tasks.

To assess the steerability of LLMs toward various personas, the researchers created multifaceted personas combining a demographic and a stance, using data from the Pew Research Center. Incongruous personas were identified as those where the demographic trait decreases the likelihood of holding the stance. Models of different sizes and fine-tuning methods were tested by prompting them to generate statements aligned with these personas, and GPT-4 evaluated steerability by judging whether each generated statement expressed the given stance. Additional metrics, including individuation, exaggeration, entailment diversity, and semantic diversity, were measured to further analyze the characteristics and diversity of model-generated statements.
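The generate-then-judge pipeline might look like the sketch below. This is not the authors' code: the prompt templates, the `gpt-4` model identifiers, and the Yes/No parsing are assumptions, and the study evaluated several model families rather than a single generator.

```python
# Hedged sketch of the evaluation loop: prompt a model to write a statement
# as a multifaceted persona, then ask GPT-4 to judge whether the statement
# expresses the target stance. Prompts and parsing are illustrative.

from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def generate_statement(demographic: str, stance: str) -> str:
    prompt = (
        f"You are a {demographic} who believes that {stance}. "
        "Write a short first-person statement expressing your view."
    )
    resp = client.chat.completions.create(
        model="gpt-4",  # stand-in for whichever model is being evaluated
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

def judge_steerability(statement: str, stance: str) -> bool:
    prompt = (
        f"Statement: {statement}\n"
        f"Does this statement express agreement with the stance "
        f"'{stance}'? Answer Yes or No."
    )
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().lower().startswith("yes")

stance = "military spending should increase"
statement = generate_statement("political liberal", stance)
print(statement, judge_steerability(statement, stance))
```

Averaging the judge's verdicts over many sampled statements per persona would then yield a steerability score that can be compared across congruous and incongruous personas.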

GPT-4 aligns closely with human evaluations, showing a strong correlation on steerability assessments. Models fine-tuned with RLHF and DPO are generally more steerable, especially toward stances associated with women and political liberals. However, models struggle with incongruous personas, showing significantly lower steerability toward them. Steerability was better predicted by human survey response rates than by the congruous/incongruous distinction alone. Models are biased toward generating the most common stances for a demographic, leading to less diversity and more stereotyping. This can perpetuate social polarization and limit models' ability to represent complex social identities, potentially causing representational harm.
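To make the diversity finding concrete, semantic diversity can be quantified as the mean pairwise distance between embeddings of a persona's generated statements. A minimal sketch follows, assuming the sentence-transformers library and an off-the-shelf embedding model; the paper's own diversity metrics may be computed differently.

```python
# Hedged sketch: semantic diversity as mean pairwise cosine distance
# between sentence embeddings. Embedding model choice is an assumption.

import numpy as np
from sentence_transformers import SentenceTransformer

def semantic_diversity(statements: list[str]) -> float:
    model = SentenceTransformer("all-MiniLM-L6-v2")
    embs = model.encode(statements)
    embs = embs / np.linalg.norm(embs, axis=1, keepdims=True)  # unit-normalize
    sims = embs @ embs.T                    # cosine similarity matrix
    iu = np.triu_indices(len(statements), k=1)  # unique pairs only
    return float(np.mean(1.0 - sims[iu]))   # mean pairwise cosine distance

# Near-identical statements yield a score close to 0; varied ones score higher.
print(semantic_diversity([
    "I think military spending should rise to keep us safe.",
    "Defense budgets must grow given global threats.",
    "Our security depends on a stronger military.",
]))
```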

In conclusion, the study explores how effectively LLMs can be guided to generate persona-specific statements, revealing that models are more easily steered toward congruous personas across various stances on politics, race, and gender. Models fine-tuned with RLHF show higher steerability, particularly for stances linked to political liberals or women, though at the cost of diversity. The models' sensitivity to persona congruity suggests they may still propagate demographic stereotypes. Future research should investigate LLM behavior in more interactive settings and develop more complex, multifaceted persona representations to better understand and mitigate these biases.


Check out the Paper. All credit for this research goes to the researchers of this project.
