Amos Eaton 214
WHAT: "LLMs for Audio Applications"
LEADER: Abraham Sanders
VIDEO: https://youtu.be/GcwauiJI_Ck
SLIDES: https://idea.rpi.edu/sites/default/files/2024-05/FOCI%20LLM%20May%202024.pptx.pdf
EVENT PAGE: https://bit.ly/foci_llm_users_01may2024
WHEN: 6 p.m., 1 May 2024
CONTACT: Aaron Green <greena12@rpi.edu>
DESCRIPTION: In this talk we explore how Audio Language Models work and how to use them. Specifically, we look at how audio waveforms are converted into sequences of discrete tokens that can be handled by an autoregressive language model and converted back again into raw audio. We then review how such audio language models can be applied to common audio tasks such as Automatic Speech Recognition (ASR), Text-To-Speech (TTS) and Speech-To-Speech Machine Translation. We conclude with a discussion of future-focused applications including text-guided music generation and full-duplex spoken dialogue agents.
BIO: Abraham is a third-year PhD student in Cognitive Science at RPI, working in the LACAI lab with Dr. Tomek Strzalkowski. His research interests include open-domain and goal-oriented conversational agents along with multimodal, natural spoken dialogue systems. Previously, he was a lead software engineer at Nextech Systems where he worked on electronic health record system interoperability.
Slides and a recording of this talk are available at the VIDEO and SLIDES links above.
Recordings of previous FOCI GenAI Users Group sessions:
- 20 Sep 2023: "Introduction to Large Language Models"
- 18 Oct 2023: "A Guide into Open-Source Large Language Models and Fine-Tuning Techniques"
- 15 Nov 2023: "Beyond Autocomplete: Instruction Following & CoT Reasoning in LLM Agents"
- 31 Jan 2024: "The Large Language Model for Mixed Reality (LLMR)"
- 21 Feb 2024: "The Need for Multifactor Bias Benchmarking of LLMs"
- 01 May 2024: "LLMs for Audio Applications"