Weak Supervision for Diverse Datatypes - Fred Sala | Stanford MLSys #51

Published: January 27, 2022

Episode 51 of the Stanford MLSys Seminar Series!

Efficiently Constructing Datasets for Diverse Datatypes
Speaker: Fred Sala

Abstract:
Building large datasets for data-hungry models is a key challenge in modern machine learning. Weak supervision (WS) frameworks have become a popular way to bypass this bottleneck. These approaches synthesize multiple noisy but cheaply acquired estimates of labels into a set of high-quality pseudolabels for downstream training. In this talk, I introduce a technique that fuses weak supervision with structured prediction, enabling WS techniques to be applied to extremely diverse types of data. This approach allows for labels that can be continuous, manifold-valued (including, for example, points in hyperbolic space), rankings, sequences, graphs, and more. I will discuss theoretical guarantees for this universal weak supervision technique, connecting the consistency of weak supervision estimators to low-distortion embeddings of metric spaces. I will show experimental results on a variety of problems, including learning to rank, geodesic regression, and semantic dependency parsing. Finally, I will present and discuss future opportunities for automated dataset construction.
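
To make the idea of synthesizing noisy label estimates concrete, here is a minimal sketch in Python. It is not the method presented in the talk: it assumes every weak source is equally reliable and simply picks the candidate label that minimizes the sum of squared distances to the sources' votes. The function names (`kendall_tau`, `aggregate`) and the toy votes are hypothetical, chosen only for illustration. With a 0/1 distance this reduces to majority vote; swapping in a ranking distance lets the same routine aggregate rankings, which hints at why a metric-space view can cover diverse label types.

```python
from itertools import combinations, permutations


def kendall_tau(a, b):
    """Count the item pairs that rankings a and b order differently."""
    pos_a = {item: i for i, item in enumerate(a)}
    pos_b = {item: i for i, item in enumerate(b)}
    return sum(
        1
        for x, y in combinations(a, 2)
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )


def aggregate(votes, candidates, distance):
    """Return the candidate with the smallest sum of squared distances to the votes.

    Assumption: all weak sources are weighted equally. A real weak supervision
    system would also estimate how accurate each source is.
    """
    return min(candidates, key=lambda c: sum(distance(c, v) ** 2 for v in votes))


# Binary labels: a 0/1 distance makes the aggregate a majority vote.
binary_votes = [1, 1, 0]
binary_label = aggregate(binary_votes, [0, 1], lambda a, b: int(a != b))

# Rankings: the same routine with a ranking distance.
ranking_votes = [("a", "b", "c"), ("a", "c", "b"), ("b", "a", "c")]
ranking_label = aggregate(ranking_votes, list(permutations("abc")), kendall_tau)

print(binary_label)
print(ranking_label)
```

Enumerating every permutation is only feasible for very short rankings; it is used here to keep the sketch self-contained.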

Bio:
Frederic Sala is an Assistant Professor in the Computer Sciences Department at the University of Wisconsin-Madison and a research scientist at Snorkel AI. His research studies the foundations of data-driven systems, with a focus on machine learning systems. Previously, he was a postdoctoral researcher in the Stanford CS department. He received his Ph.D. in Electrical Engineering from UCLA.

--

0:00 Presentation
30:00 Discussion

Stanford MLSys Seminar hosts: Dan Fu, Karan Goel, Fiodar Kazhamiaka, and Piero Molino
Executive Producers: Matei Zaharia, Chris Ré

Twitter:
https://twitter.com/realDanFu
https://twitter.com/krandiash
https://twitter.com/w4nderlus7

--

Check out our website for the schedule: http://mlsys.stanford.edu
Join our mailing list to get weekly updates: https://groups.google.com/forum/#!forum/stanford-mlsys-seminars/join

#machinelearning #ai #artificialintelligence #systems #mlsys #computerscience #stanford #weaksupervision #snorkel #wisconsin #ucla #diversedata