Weak Supervision for Diverse Datatypes - Fred Sala | Stanford MLSys #51
Published: January 27, 2022
Episode 51 of the Stanford MLSys Seminar Series!
Efficiently Constructing Datasets for Diverse Datatypes
Speaker: Fred Sala
Abstract:
Building large datasets for data-hungry models is a key challenge in modern machine learning. Weak supervision (WS) frameworks have become a popular way to bypass this bottleneck. These approaches synthesize multiple noisy but cheaply acquired estimates of labels into a set of high-quality pseudolabels for downstream training. In this talk, I introduce a technique that fuses weak supervision with structured prediction, enabling WS techniques to be applied to extremely diverse types of data. This approach allows for labels that can be continuous, manifold-valued (including, for example, points in hyperbolic space), rankings, sequences, graphs, and more. I will discuss theoretical guarantees for this universal weak supervision technique, connecting the consistency of weak supervision estimators to low-distortion embeddings of metric spaces. I will show experimental results in a variety of problems, including learning to rank, geodesic regression, and semantic dependency parsing. Finally, I will present and discuss future opportunities for automated dataset construction.
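To make the idea of "synthesizing noisy label estimates" concrete for non-binary labels, here is a minimal sketch assuming the labels live in a metric space and that each noisy source already has a trust weight. It aggregates votes with a weighted medoid, a simple stand-in chosen for illustration; it is not the estimator presented in the talk, which learns source accuracies without ground truth. The helpers `weighted_medoid` and `kendall_tau_distance` are hypothetical names introduced here.

```python
from itertools import combinations
from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def kendall_tau_distance(a: tuple, b: tuple) -> int:
    """Count item pairs ordered differently in rankings a and b."""
    pos_b = {item: i for i, item in enumerate(b)}
    # combinations(a, 2) yields pairs (x, y) with x ranked before y in a.
    return sum(1 for x, y in combinations(a, 2) if pos_b[x] > pos_b[y])


def weighted_medoid(
    votes: Sequence[T],
    weights: Sequence[float],
    dist: Callable[[T, T], float],
) -> T:
    """Pick the vote minimizing the weighted sum of squared distances to all votes.

    Hypothetical aggregation rule: works for any label type with a metric
    (numbers, rankings, sequences, ...). Weights are assumed to be given.
    """
    if not votes or len(votes) != len(weights):
        raise ValueError("need at least one vote and exactly one weight per vote")
    best, best_cost = votes[0], float("inf")
    for candidate in votes:
        cost = sum(w * dist(candidate, v) ** 2 for v, w in zip(votes, weights))
        if cost < best_cost:
            best, best_cost = candidate, cost
    return best


# Three noisy sources each propose a ranking of items a, b, c.
rankings = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
print(weighted_medoid(rankings, [1.0, 1.0, 1.0], kendall_tau_distance))
```

The same function handles continuous labels by passing `lambda x, y: abs(x - y)` as the metric, which is the point of framing aggregation in terms of a distance rather than a fixed label set.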
Bio:
Frederic Sala is an Assistant Professor in the Computer Sciences Department at the University of Wisconsin-Madison and a research scientist at Snorkel AI. His research studies the foundations of data-driven systems, with a focus on machine learning systems. Previously, he was a postdoctoral researcher in the Stanford CS department. He received his Ph.D. in Electrical Engineering from UCLA.
--
0:00 Presentation
30:00 Discussion
Stanford MLSys Seminar hosts: Dan Fu, Karan Goel, Fiodar Kazhamiaka, and Piero Molino
Executive Producers: Matei Zaharia, Chris Ré
Twitter:
https://twitter.com/realDanFu
https://twitter.com/krandiash
https://twitter.com/w4nderlus7
--
Check out our website for the schedule: http://mlsys.stanford.edu
Join our mailing list to get weekly updates: https://groups.google.com/forum/#!forum/stanford-mlsys-seminars/join
#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford #weaksupervision #snorkel #wisconsin #ucla #diversedata