FOCI GenAI/LLM Users Group: "Bias in Bias Evaluation: The Need for Multifactor Bias Benchmarking of LLMs" (21 Feb)

FOCI LLM/GenAI Users Group

Hannah Powers

Ph.D. Student

Computer Science

Amos Eaton 214, Rensselaer Polytechnic Institute

Wed, February 21, 2024 at 6:00 PM

Pizza at 5:30p

WHAT: "The Need for Multifactor Bias Benchmarking of LLMs"
LEADER: Hannah Powers
VIDEO: TBD
EVENT PAGE: https://bit.ly/foci_llm_users_21feb2024
WHEN: 6p, 21 Feb
CONTACT: Aaron Green <greena12@rpi.edu>

DESCRIPTION: LLMs have shown a capacity for producing toxic and biased responses to even innocuous prompts. Bias benchmarks exist to evaluate models for trustworthiness and to identify at-risk subgroups from model responses. However, these benchmarks have gaps that themselves bias the analysis. Furthermore, existing analyses lack the statistical foundations needed to draw definitive conclusions about a model's biases. We propose a method for identifying gaps in existing benchmarks and a multifactor bias analysis of LLMs to identify the key factors behind model behavior.


Hannah Powers is a second-year PhD student in the Computer Science department at RPI. Her current research involves evaluating the trustworthiness of large language models. She is interested in ethical machine learning and artificial intelligence.
