FOCI GenAI/LLM Users Group: "LLMs for Audio Applications" (01 May 2024)

Posted April 26, 2024

6p Weds, 01 May 2024
Amos Eaton 214

WHAT: "LLMs for Audio Applications"
LEADER: Abraham Sanders
VIDEO: https://youtu.be/GcwauiJI_Ck
SLIDES: https://idea.rpi.edu/sites/default/files/2024-05/FOCI%20LLM%20May%202024.pptx.pdf
EVENT PAGE: https://bit.ly/foci_llm_users_01may2024
WHEN: 6p, 1 May
CONTACT: Aaron Green <greena12@rpi.edu>

DESCRIPTION: In this talk, we explore how Audio Language Models work and how to use them. Specifically, we look at how audio waveforms are converted into sequences of discrete tokens that an autoregressive language model can process, and how those tokens are converted back into raw audio. We then review how such audio language models can be applied to common audio tasks such as Automatic Speech Recognition (ASR), Text-To-Speech (TTS), and Speech-To-Speech Machine Translation. We conclude with a discussion of future-focused applications, including text-guided music generation and full-duplex spoken dialogue agents.
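
ILLUSTRATION: The waveform-to-token round trip mentioned in the description can be sketched in a few lines. This is only a toy under stated assumptions, not the method presented in the talk: audio language models typically use learned neural codecs to produce tokens, whereas this sketch swaps in mu-law companding, a much simpler classical quantization scheme, to show the general idea of mapping samples to discrete ids and back. All names here (MU, encode, decode, tokenize, detokenize) are hypothetical and chosen for illustration.

```python
import math

MU = 255  # assumption: 8-bit mu-law, giving 256 possible token ids (0..255)


def encode(sample: float) -> int:
    """Map one audio sample in [-1, 1] to a discrete token id in [0, MU]."""
    x = max(-1.0, min(1.0, sample))  # clip out-of-range samples
    # Logarithmic companding: finer resolution for quiet samples.
    compressed = math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)
    # Shift from [-1, 1] to [0, MU] and round to the nearest integer id.
    return int(round((compressed + 1) / 2 * MU))


def decode(token: int) -> float:
    """Map a token id in [0, MU] back to an approximate sample in [-1, 1]."""
    y = 2 * token / MU - 1  # back to the companded range [-1, 1]
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def tokenize(waveform):
    """Convert a sequence of samples into a sequence of token ids."""
    return [encode(s) for s in waveform]


def detokenize(tokens):
    """Convert a sequence of token ids back into approximate samples."""
    return [decode(t) for t in tokens]


# Toy input: a few samples of a 440 Hz sine wave at an 8 kHz sample rate.
SAMPLE_RATE = 8000
waveform = [0.5 * math.sin(2 * math.pi * 440 * n / SAMPLE_RATE) for n in range(16)]

tokens = tokenize(waveform)          # integers a language model could consume
reconstructed = detokenize(tokens)   # lossy approximation of the input
```

In this toy, the token sequence is what an autoregressive model would be trained to continue, and detokenize plays the role of the decoder that turns predicted tokens back into audio. The reconstruction is lossy because each sample is rounded to one of 256 levels.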

BIO: Abraham is a third-year PhD student in Cognitive Science at RPI, working in the LACAI lab with Dr. Tomek Strzalkowski. His research interests include open-domain and goal-oriented conversational agents, along with multimodal, natural spoken dialogue systems. Previously, he was a lead software engineer at Nextech Systems, where he worked on electronic health record system interoperability.

Slides and a recording of this talk are available at the VIDEO and SLIDES links above.
