About

The DEEM workshop will be held on the 14th of May in Chicago, US, in conjunction with SIGMOD/PODS 2017. DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management, and systems research, with the goal of discussing the data management issues that arise in ML application scenarios.

The workshop solicits regular research papers describing preliminary and ongoing research results. In addition, the workshop encourages the submission of industrial experience reports of end-to-end ML deployments.

Follow us on Twitter @deem_workshop or contact us via email at info [at] deem-workshop [dot] org.

Schedule
Sunday, May 14th


 9:00 -  9:15
Welcome

 9:15 - 10:30
Democratizing Advanced Analytics Beyond Just Plumbing [Academic Keynote]
Arun Kumar (UC San Diego)

10:30 - 11:00
Coffee Break

11:00 - 11:30
Data Integration for Machine Learning & Machine Learning for Data Integration [Invited Talk]
Xin Luna Dong (Amazon)

11:30 - 12:00
Model-based Pricing: Do Not Pay for More than What You Learn! [Paper]
Lingjiao Chen, Paraschos Koutris (University of Wisconsin-Madison), Arun Kumar (UC San Diego)

12:00 - 12:30
Towards Automatically Setting Language Bias in Relational Learning [Paper]
Jose Picado, Arash Termehchy, Alan Fern, Sudhanshu Pathak (Oregon State University)

12:30 - 14:00
Lunch Break

14:00 - 15:00
Machine Learning for Recommender Systems at Twitter [Industry Keynote]
Quannan Li (Twitter)

15:00 - 15:30
EMT: End To End Model Training for MSR Machine Translation [Paper]
Vishal Chowdhary, Scott Greenwood (Microsoft Research)

15:30 - 16:00
Coffee Break

16:00 - 16:30
Snorkel: Creating Noisy Training Data to Overcome Machine Learning's Biggest Bottleneck [Invited Talk]
Stephen Bach (Stanford)

16:30 - 17:00
Versioning for end-to-end machine learning pipelines [Paper]
Tom van der Weide, Dimitris Papadopoulos, Oleg Smirnov, Michal Zielinski, Tim van Kasteren (Schibsted Media Group)

17:00 - 17:30
On Model Discovery For Hosted Data Science Projects [Paper]
Hui Miao, Ang Li, Larry Davis, Amol Deshpande (University of Maryland)

17:30 - 18:00
Using Word Embedding to Enable Semantic Queries in Relational Databases [Paper]
Rajesh Bordawekar (IBM Research), Oded Shmueli (Technion Haifa)

Accepted Papers

Vishal Chowdhary, Scott Greenwood (Microsoft Research)
EMT: End To End Model Training for MSR Machine Translation

Jose Picado, Arash Termehchy, Alan Fern, Sudhanshu Pathak (Oregon State University)
Towards Automatically Setting Language Bias in Relational Learning

Tom van der Weide, Dimitris Papadopoulos, Oleg Smirnov, Michal Zielinski, Tim van Kasteren (Schibsted Media Group)
Versioning for end-to-end machine learning pipelines

Lingjiao Chen, Paraschos Koutris (University of Wisconsin-Madison), Arun Kumar (University of California San Diego)
Model-based Pricing: Do Not Pay for More than What You Learn!

Rajesh Bordawekar (IBM Research), Oded Shmueli (Technion Haifa)
Using Word Embedding to Enable Semantic Queries in Relational Databases

Hui Miao, Ang Li, Larry Davis, Amol Deshpande (University of Maryland)
On Model Discovery For Hosted Data Science Projects

Invited Speakers
Academic Keynote: Arun Kumar

Arun Kumar is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. He obtained his PhD from the University of Wisconsin-Madison in 2016. His primary research interests are in data management, especially the intersection of data management and machine learning, with a focus on problems related to usability, developability, performance, and scalability. Systems and ideas based on his research have been released as part of the MADlib open-source library, shipped as part of products from EMC, Oracle, Cloudera, and IBM, and used internally by Facebook, LogicBlox, and Microsoft. A paper he co-authored received the Best Paper Award at ACM SIGMOD 2014. He received the 2016 Graduate Student Research Award for the best dissertation research in UW-Madison CS, and the Anthony C. Klug NCR Fellowship in Database Systems in 2015.


Industry Keynote: Quannan Li

Quannan Li is a staff software engineer at Twitter, building the real-time recommendation engine MagicRecs. He obtained his Ph.D. in Computer Vision and Machine Learning from UC Los Angeles in 2013 under the supervision of Professor Zhuowen Tu.

Invited Speaker: Xin Luna Dong

Xin Luna Dong has been a principal scientist at Amazon since July 2016, leading the efforts to build the Amazon Product Graph. Prior to joining Amazon, she worked for Google and AT&T Labs - Research. She received her Ph.D. in Computer Science and Engineering from the University of Washington. Before coming to the United States, she obtained an M.S. in Computer Science from Peking University and a B.S. in Computer Science from Nankai University in China. Her research interests include data integration, data cleaning, and knowledge management. She recently won the VLDB Early Career Research Contribution Award for "advancing the state of the art of knowledge fusion".


Invited Speaker: Stephen Bach

Stephen Bach is a postdoctoral scholar in the Department of Computer Science at Stanford University. His research focuses on weakly supervised machine learning, statistical relational learning, and information extraction. His goal is to design algorithms and systems that empower people to use machine learning with minimal intervention from computer scientists. He co-leads the development of the Snorkel framework for training data creation using generative models. He was recognized with the Larry S. Davis Doctoral Dissertation Award from the Department of Computer Science at the University of Maryland, College Park.


Important Dates
Submission Deadline: February 1, 2017, extended to Friday, February 17, 2017
Submission Website: https://cmt3.research.microsoft.com/DEEM2017/
Notification of Acceptance: March 17, 2017
Final papers due: March 31, 2017
Workshop: May 14, 2017

Call for Papers

Applying Machine Learning (ML) in real-world scenarios is a challenging task. In recent years, the main focus of the database community has been on creating systems and abstractions for the efficient training of ML models on large datasets. However, model training is only one of many steps in an end-to-end ML application, and a number of orthogonal data management problems arise from the large-scale use of ML, which require the attention of the data management community.

For example, data preprocessing and feature extraction workloads result in complex pipelines that often require the simultaneous execution of relational and linear algebraic operations. Next, a class of ML model must be chosen, which often means trying out a set of popular approaches, such as linear models, decision trees, and deep neural networks, on the problem at hand. The prediction quality of such ML models depends heavily on the choice of features and hyperparameters, which are typically selected in a costly offline evaluation process that offers substantial opportunities for parallelization and optimization. Afterwards, the resulting models must be deployed and integrated into existing business workflows in a way that enables fast and efficient predictions, while still allowing the lifecycle of models (which become stale over time) to be managed. As a further complication, the resulting systems need to take the target audience of ML applications into account. This audience is very heterogeneous, ranging from analysts without programming skills, who may prefer an easy-to-use cloud-based solution, to teams of data processing experts and statisticians who develop and deploy custom-tailored algorithms.

DEEM aims to bring together researchers and practitioners at the intersection of applied machine learning, data management, and systems research, with the goal of discussing the data management issues that arise in ML application scenarios. The workshop solicits regular research papers describing preliminary and ongoing research results. In addition, the workshop encourages the submission of industrial experience reports of end-to-end ML deployments.

Topics of Interest
Areas of particular interest for the workshop include (but are not limited to):

Submission

The workshop will have two tracks: one for regular research papers and one for industrial papers. Submissions can be short papers (4 pages) or long papers (up to 10 pages). Authors are requested to prepare submissions following the ACM proceedings format.

Submission Website: https://cmt3.research.microsoft.com/DEEM2017/

Organisation / People
Workshop Chairs:
Steering Committee:
Program Committee:

Supported By

Matroid: Bringing Machine Learning to Life