Title: Learning from Unreliable Labels via Crowdsourcing
Georgios B. Giannakis, University of Minnesota
Abstract: Crowdsourcing, as the name suggests, harnesses the information provided by crowds of human annotators to perform learning tasks, such as word tagging in natural language processing, crowdsensing, and the human-feedback training behind conversational systems such as ChatGPT. Even though crowdsourcing can be efficient and relatively inexpensive, combining the noisy, scarce, and potentially adversarial responses provided by multiple annotators of unknown expertise can be challenging, especially in unsupervised setups, where no ground-truth data is available.
Focusing on the classification task, the first part of this talk will touch upon models and algorithms for label fusion, along with their performance. Approaches will also be discussed for data-aware crowdsourcing, and links will be outlined with deep, self-supervised, and meta-learning. Aiming to robustify crowdsourced classification against adversarial attacks, the last part will cover spectrum-based algorithms to flag spammers and mitigate their effect. If time allows, means of dealing with dependent annotators will be discussed briefly.
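For readers new to the area, the simplest label-fusion baseline that the models in this talk improve upon is plain majority voting, which treats every annotator as equally reliable. The sketch below is purely illustrative and is not one of the speaker's algorithms:

```python
from collections import Counter

def majority_vote(annotations):
    """Fuse noisy labels for a single item by majority vote.

    annotations: dict mapping annotator id -> reported label.
    Returns the most frequently reported label.
    """
    counts = Counter(annotations.values())
    label, _ = counts.most_common(1)[0]
    return label

# Three annotators label one item; two say "cat", one says "dog".
print(majority_vote({"ann1": "cat", "ann2": "dog", "ann3": "cat"}))  # cat
```

More refined fusion rules, such as Dawid-Skene-style models, generalize this baseline by estimating per-annotator reliability and weighting votes accordingly.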
BIO: Georgios B. Giannakis received his Diploma in Electrical Engineering (EE) from the National Technical University of Athens, Greece, in 1981. From 1982 to 1986 he was with the University of Southern California (USC), where he received his M.Sc. in EE (1983), M.Sc. in Mathematics (1986), and Ph.D. in EE (1986). He was with the University of Virginia from 1987 to 1998, and since 1999 he has been with the University of Minnesota (UMN), where he held an Endowed Chair in Telecommunications, served as director of the Digital Technology Center from 2008 to 2021, and since 2016 has held a UMN Presidential Chair in ECE.
His interests span the areas of statistical learning, communications, and networking, subjects on which he has published more than 495 journal papers, 805 conference papers, 26 book chapters, two edited books, and two research monographs. His current research focuses on data science with applications to the IoT and power networks with renewables. He is the (co-)inventor of 36 issued patents, and the (co-)recipient of 10 best journal paper awards from the IEEE Signal Processing (SP) and Communications Societies, including the G. Marconi Prize. He received the IEEE-SPS Norbert Wiener Society Award (2019); EURASIP's A. Papoulis Society Award (2020); Technical Achievement Awards from the IEEE-SPS (2000) and from EURASIP (2005); the IEEE ComSoc Education Award (2019); and the IEEE Fourier Technical Field Award (2015). He is a member of the Academia Europaea and of Greece's Academy of Athens; a Fellow of the National Academy of Inventors, the European Academy of Sciences, and the UK's Royal Academy of Engineering; a Life Fellow of the IEEE; and a Fellow of EURASIP. He has served the IEEE in several posts, including that of a Distinguished Lecturer for the IEEE-SPS.
Title: Score-based Diffusion Models: Data Generation and Inverse Problems
Yuejie Chi, Carnegie Mellon University
Abstract: Diffusion models, which convert noise into new data instances by learning to reverse a Markov diffusion process, have become a cornerstone in generative AI. While their practical power has now been widely recognized, the theoretical underpinnings remain far from mature. We first develop a suite of non-asymptotic theory towards understanding the data generation process of diffusion models in discrete time for both deterministic and stochastic samplers, highlighting fast convergence under mild data assumptions. Motivated by this theory, we then advocate diffusion models as an expressive data prior in solving ill-posed inverse problems, and introduce a plug-and-play method (DPnP) to perform posterior sampling. DPnP alternately calls two samplers: a proximal consistency sampler based solely on the forward model, and a denoising diffusion sampler based solely on the score functions of the data prior. Performance guarantees and numerical examples will be presented to illustrate the promise of DPnP.
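As a rough illustration of the alternation the abstract describes, the toy sketch below interleaves a data-consistency step (using only the forward model) with a score-driven denoising step. The step sizes, noise schedule, and Gaussian-prior score are placeholder choices; the actual DPnP samplers differ:

```python
import numpy as np

def dpnp_sketch(y, A, score, sigma=0.5, steps=50, rng=None):
    """Toy alternation of a proximal consistency step and a denoising
    step. Illustrative only; not the samplers or guarantees of DPnP.

    y: observations, A: forward operator (matrix),
    score: callable approximating the score of the data prior.
    """
    if rng is None:
        rng = np.random.default_rng(0)
    x = rng.standard_normal(A.shape[1])
    for _ in range(steps):
        # Proximal consistency step: pull x toward solutions of A x ~= y,
        # using only the forward model.
        x = x - 0.1 * A.T @ (A @ x - y)
        # Denoising step: move along the prior's score, with injected
        # noise as in Langevin-type samplers.
        x = x + 0.5 * sigma**2 * score(x) \
              + sigma * np.sqrt(0.1) * rng.standard_normal(x.shape)
    return x

# Toy setting: identity forward model, standard-normal prior (score = -x).
sample = dpnp_sketch(np.array([1.0, -1.0]), np.eye(2), lambda x: -x)
```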
Bio: Dr. Yuejie Chi is the Sense of Wonder Group Endowed Professor of Electrical and Computer Engineering in AI Systems at Carnegie Mellon University, with courtesy appointments in the Machine Learning department and CyLab. She received her Ph.D. and M.A. from Princeton University, and B. Eng. (Hon.) from Tsinghua University, all in Electrical Engineering. Her research interests lie in the theoretical and algorithmic foundations of data science, signal processing, machine learning and inverse problems, with applications in sensing, imaging, decision making, and AI systems. Among others, Dr. Chi received the Presidential Early Career Award for Scientists and Engineers (PECASE), SIAM Activity Group on Imaging Science Best Paper Prize, IEEE Signal Processing Society Young Author Best Paper Award, and the inaugural IEEE Signal Processing Society Early Career Technical Achievement Award for contributions to high-dimensional structured signal processing. She is an IEEE Fellow (Class of 2023) for contributions to statistical signal processing with low-dimensional structures.
Title: Out-of-Distribution Detection via Multiple Testing
Venu Veeravalli, University of Illinois at Urbana-Champaign
Abstract: Out-of-Distribution (OOD) detection in machine learning refers to the problem of detecting whether the machine learning model’s output can be trusted at inference time. This problem has been described qualitatively in the literature, and a number of ad hoc tests for OOD detection have been proposed. In this talk we outline a principled approach to the OOD detection problem, by first defining the problem through a hypothesis test that includes both the input distribution and the learning algorithm. Our definition provides insights for the construction of good tests for OOD detection. We then propose a multiple testing inspired procedure to systematically combine any number of different OOD test statistics using conformal p-values. Our approach allows us to provide strong guarantees on the probability of incorrectly classifying an in-distribution sample as OOD. In our experiments, we find that the tests proposed in prior work perform well in specific settings, but not uniformly well across different types of OOD instances. In contrast, our proposed method that combines multiple test statistics performs uniformly well across different datasets, neural networks and OOD instances.
Bio: Prof. Veeravalli received the Ph.D. degree in Electrical Engineering from the University of Illinois at Urbana-Champaign in 1992. He is currently the Henry Magnuski Professor in the Department of Electrical and Computer Engineering (ECE) at the University of Illinois at Urbana-Champaign, where he also holds appointments with the Coordinated Science Laboratory (CSL), the Department of Statistics, and the Discovery Partners Institute. He was on the faculty of the School of ECE at Cornell University before he joined Illinois in 2000. He served as a program director for communications research at the U.S. National Science Foundation in Arlington, VA during 2003-2005. His research interests span the theoretical areas of statistical inference, machine learning, and information theory, with applications to data science, wireless communications, and sensor networks. He is currently the Editor-in-Chief of the IEEE Transactions on Information Theory. He is a Fellow of the IEEE and a Fellow of the Institute of Mathematical Statistics (IMS). Among the awards he has received for research and teaching are the IEEE Browder J. Thompson Best Paper Award, the U.S. Presidential Early Career Award for Scientists and Engineers (PECASE), the Abraham Wald Prize in Sequential Analysis (twice), and the Fulbright-Nokia Chair in Information and Communication Technologies.
Title: Uncertainty Quantification for Detecting Hallucinations in Large Language Models
András György, Google DeepMind
Abstract: Detecting hallucinated answers is an important task to ensure factuality of large language models (LLMs). When no external information is available, hallucinations are often identified based on the uncertainty of the predictions. Uncertainty, however, can be epistemic or aleatoric: the former stems from a lack of knowledge about the ground truth (such as about facts or the language), while the latter stems from irreducible randomness (such as multiple possible answers), and only epistemic uncertainty is related to hallucinations. In this talk I will overview some methods to estimate uncertainty in general (in particular, accounting for the fact that the same thing can be expressed in natural language in multiple ways), and present some new results on uncertainty estimation with theoretical guarantees, with special attention to estimating whether the epistemic uncertainty is large. The latter approach allows detecting hallucinations for both single- and multi-answer queries, in contrast to many standard uncertainty quantification strategies, which cannot detect hallucinations in the multi-answer case.
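The epistemic/aleatoric split mentioned above is often illustrated with the classic mutual-information decomposition over an ensemble of predictive distributions. The sketch below is that textbook decomposition, not the talk's estimators (which, among other things, must account for paraphrases of the same answer):

```python
import numpy as np

def entropy(p):
    """Shannon entropy (nats) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def decompose_uncertainty(member_probs):
    """Mutual-information decomposition over an ensemble.

    member_probs: array with one predictive distribution per row.
    total     = entropy of the averaged prediction
    aleatoric = average per-member entropy
    epistemic = total - aleatoric (members disagreeing with each other)
    """
    member_probs = np.asarray(member_probs, dtype=float)
    total = entropy(member_probs.mean(axis=0))
    aleatoric = np.mean([entropy(p) for p in member_probs])
    return total, aleatoric, total - aleatoric

# Members confidently disagree: all uncertainty is epistemic.
t1, a1, e1 = decompose_uncertainty([[1.0, 0.0], [0.0, 1.0]])
# Members agree the answer is a coin flip: all uncertainty is aleatoric.
t2, a2, e2 = decompose_uncertainty([[0.5, 0.5], [0.5, 0.5]])
```

In the first case the epistemic term equals ln 2 and the aleatoric term is zero, which is the hallucination-relevant signal; in the second case the epistemic term vanishes even though the total uncertainty is identical.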
Bio: András György is a Senior Staff Research Scientist at Google DeepMind, London, UK. He received his Ph.D. from the Budapest University of Technology and Economics, Hungary. He held research positions at the Institute for Computer Science and Control (SZTAKI), Hungary, leading the Machine Learning Research Group, and at the University of Alberta, Canada. He was also a faculty member in the Department of Electrical and Electronic Engineering, Imperial College London, UK. His research interests include machine learning, statistical learning theory, online learning, optimization and, more recently, large language models. Among others, Dr. György received a best paper award at the 7th IEEE Global Conference on Signal and Information Processing (GlobalSIP 2019), a best paper runner-up award at the 34th Annual Conference on Learning Theory (COLT 2021), the Gyula Farkas Prize of the János Bolyai Mathematical Society in 2001, and the Academic Golden Ring of the President of the Republic of Hungary in 2003.
Title: The Performance of Meaning: Breathing Life into Text
Andrew Breen, Amazon
Abstract: Generative AI-based technologies such as Alexa, Amazon Q, Claude v3, GPT-4o, and Gemini v1.5 are revolutionizing how we interact with machines through advances in LLMs and Foundation Models. These AI systems, trained on massive amounts of text data, can now hold conversations, translate languages, write creatively, and answer questions in a natural way. Initially, such interactions were text-based, but increasingly, these systems are offering spoken language interfaces. Text, though powerful and natural, lacks the nuance and ease of spoken language, which has the power to move us deeply, conveying unspoken thoughts, desires, and passions. It is the very essence of human interaction, from everyday conversations to artistic expression. When combined with other modalities, spoken language offers an unparalleled medium for communication and creativity. A machine capable of replicating this is the holy grail of human-machine communication. By unlocking the full expressive power of spoken language, LLMs are fundamentally transforming how we interact with machines. Virtual assistants such as Alexa have pioneered natural spoken language interactions with machines since 2014. Such systems showcase direct interaction with the real world, grounding knowledge in reality, and taking actions (e.g., managing shopping lists, controlling smart home appliances), but remain limited in the complexity of human-machine discourse they can support. The advent of LLMs has broken through that “glass ceiling”, promising truly human-like discourse.
This talk will explore the history of speech generation and its transformation from a niche field into an everyday technology. We’ll delve into the groundbreaking impact of neural speech generation in 2016, which revolutionized what was thought possible just two years prior, and show how LLMs are poised to offer equally astonishing breakthroughs. The talk concludes with a glimpse into the exciting future of human-machine interaction powered by spoken language.
Bio: Andrew Breen has a B.Sc. (Hons) in Physics with Computing Physics from University College Swansea, an M.Sc. (Eng.) by research from Liverpool University, and a Ph.D. in Speech Science from University College London. He is a long-standing member of the Institution of Engineering and Technology (MIET). He was awarded the IEE J. Langham Thomson Premium in 1993, and has received business awards from BT, MCI, and Nuance. Andrew has been an industrial representative on two European funded projects. He was a founder of SSW (the Speech Synthesis Workshop), and has served on the organising committee for Interspeech. Andrew worked for a number of years at BT Labs, initially on Automatic Speech Recognition (ASR), and then led teams on Text-To-Speech (TTS), avatars, and multi-modal distributed systems. While at BT he invented the Laureate TTS system. In 1999 he joined the University of East Anglia as a Senior Lecturer, but two years later joined Nuance as founder of its TTS organisation. After Nuance’s acquisition by ScanSoft, he took on a number of roles, including head of TTS research and languages, head of TTS research and product, Director of embedded TTS for automotive, and Director of TTS Research and Product Development in India and China. He has extensive experience of managing remote teams, having led teams across the US, Europe, India, and China. In 2017 he joined Amazon as head of TTS research, managing the team which produced Amazon’s first Neural TTS system. He is currently research director of speech and audio generation in Amazon’s AGI organisation.
Title: Lossy Image Compression with Diffusion
Lucas Theis, Google DeepMind
Abstract: Lossy compression of image, audio, and video signals has traditionally been performed through transform coding. With the advent of deep learning, more powerful transforms have been proposed whose parameters are learned from data instead of being designed manually. However, these methods still follow the transform coding paradigm and can be understood from a decades-old rate-distortion perspective. More recently, new approaches have been proposed that no longer fit this framework, enabled by new theoretical insights into perceptual quality and new techniques in generative AI and information theory. In this presentation, I will discuss a conceptually simple approach based on diffusion and channel simulation, and how it can be used to achieve better realism-distortion trade-offs. We find that it works surprisingly well despite the lack of an analysis transform.
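For context, the transform-coding paradigm the abstract refers to fits in a few lines: transform the signal, quantize the coefficients (the lossy step), and invert the transform. Below is a minimal round trip with an orthonormal DCT-II, shown only to ground the paradigm that the newer diffusion-based approaches depart from:

```python
import numpy as np

def dct_matrix(n):
    """Orthonormal DCT-II matrix (rows are frequency basis vectors)."""
    k = np.arange(n)
    M = np.cos(np.pi * (2 * k[None, :] + 1) * k[:, None] / (2 * n))
    M[0] *= np.sqrt(1.0 / n)
    M[1:] *= np.sqrt(2.0 / n)
    return M

def transform_code(x, step=0.5):
    """Classic transform-coding round trip for a 1-D signal block:
    analysis transform -> uniform quantization (lossy) -> synthesis."""
    T = dct_matrix(len(x))
    coeffs = T @ x
    quantized = np.round(coeffs / step) * step  # the only lossy step
    return T.T @ quantized  # T is orthonormal, so T.T inverts it

x = np.linspace(-1.0, 1.0, 8)
xhat = transform_code(x, step=1e-6)  # tiny step: near-lossless round trip
```

Learned transforms replace the fixed DCT with neural networks, while the approach in the talk drops the analysis transform entirely.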
Bio: Lucas Theis is the founder of a startup currently in stealth. He was previously a Senior Research Scientist at Google DeepMind working on compression using neural networks. Before that, he worked at Twitter after the acquisition of a London-based startup in 2016, where he worked on video compression. Lucas received his PhD from the Max Planck Research School in Tübingen, where he worked on deep generative models in the lab of Matthias Bethge.