The Research and Applied AI Summit (RAAIS) is a community for entrepreneurs and researchers who accelerate the science and applications of AI technology. In the lead-up to our 5th annual event on June 28th 2019 in London, we’re running a series of speaker profiles to shed more light on what you can expect to learn on the day!
At this year’s RAAIS, we are excited to support a talented group of young AI researchers who we believe will drive the field forward in the years to come. Alex Ratner is a 5th year PhD student in Computer Science at Stanford University, where he works with Chris Ré in the DAWN, Info, and StatsML labs and leads the Snorkel project (snorkel.stanford.edu). You may remember this project from Chris Ré’s talk on software 2.0 at RAAIS 2018. Alex’s research focuses on applying data management and statistical learning principles to emerging machine learning workflows. These include creating and managing training data, and applying these techniques to real-world problems in medical imaging and monitoring, knowledge base construction, and more.
Alex’s work has received a number of awards, including VLDB 2018 “Best Of”, and his graduate work is supported by a Stanford Bio-X SIGF fellowship. Before beginning his PhD at Stanford, Alex earned an AB in Physics from Harvard College.
One of the key bottlenecks in building machine learning systems today is creating and managing large labelled training datasets. In his talk at RAAIS 2019, Alex will describe his group’s work on systems that support and accelerate higher-level, programmatic, but noisier ways of creating training data, an approach often referred to as weak supervision. This work is motivated by the observation that ML developers spend an increasing amount of their time on training data engineering (i.e. labelling, augmenting, reshaping, cleaning, and maintaining training datasets), and that these emerging workflows can be better supported with tools and principles from both data management and statistical learning. Alex will describe Snorkel, their open-source system for training data labelling (snorkel.stanford.edu), which can reduce training data creation time from months to days. He will also present recent work on data augmentation and multi-task supervision, as well as applications of this work in domains ranging from medical imaging to unstructured data extraction.
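To make "programmatic, but noisier" labelling a little more concrete, here is a minimal sketch in plain Python. It assumes a hypothetical spam-detection task: each labelling function encodes one heuristic and may abstain, and the functions' votes are combined per example. The task, labels, and function names are all illustrative assumptions, not Snorkel's API. Snorkel itself learns a statistical label model to weight and de-noise these votes; the simple majority vote below is a deliberately simpler stand-in.

```python
from collections import Counter

# Hypothetical label values for an illustrative binary spam task
ABSTAIN = -1
HAM = 0
SPAM = 1

# Each labelling function encodes one noisy heuristic and may abstain.
def lf_contains_link(text: str) -> int:
    return SPAM if "http" in text.lower() else ABSTAIN

def lf_mentions_prize(text: str) -> int:
    return SPAM if "prize" in text.lower() else ABSTAIN

def lf_short_reply(text: str) -> int:
    return HAM if len(text.split()) < 5 else ABSTAIN

LABELING_FUNCTIONS = [lf_contains_link, lf_mentions_prize, lf_short_reply]

def apply_lfs(texts):
    """Build a label matrix: one row per example, one column per function."""
    return [[lf(t) for lf in LABELING_FUNCTIONS] for t in texts]

def majority_vote(row):
    """Combine noisy votes (a stand-in for a learned label model).

    Rows where every function abstains, or where the top labels tie,
    are left unlabelled (ABSTAIN).
    """
    votes = Counter(v for v in row if v != ABSTAIN)
    if not votes:
        return ABSTAIN
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN
    return ranked[0][0]

if __name__ == "__main__":
    texts = [
        "Claim your prize at http://example.com now",
        "Thanks, see you",
        "Meeting moved to Thursday afternoon, agenda attached",
    ]
    labels = [majority_vote(row) for row in apply_lfs(texts)]
    print(labels)  # prints [1, 0, -1]
```

The point of the design is that a developer writes a handful of cheap heuristics instead of hand-labelling every example; the resulting noisy labels (with unlabelled rows dropped) would then be used to train an ordinary discriminative model.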