The DEEM workshop will be held on the 14th of May in Chicago, US in conjunction with SIGMOD/PODS 2017. DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues that arise in ML application scenarios.
The workshop solicits regular research papers describing preliminary and ongoing research results. In addition, the workshop encourages the submission of industrial experience reports of end-to-end ML deployments.
Vishal Chowdhary, Scott Greenwood (Microsoft Research)
EMT: End To End Model Training for MSR Machine Translation
Jose Picado, Arash Termehchy, Alan Fern, Sudhanshu Pathak (Oregon State University)
Towards Automatically Setting Language Bias in Relational Learning
Tom van der Weide, Dimitris Papadopoulos, Oleg Smirnov, Michal Zielinski, Tim van Kasteren (Schibsted Media Group)
Versioning for end-to-end machine learning pipelines
Lingjiao Chen, Paraschos Koutris (University of Wisconsin-Madison), Arun Kumar (University of California San Diego)
Model-based Pricing: Do Not Pay for More than What You Learn!
Rajesh Bordawekar (IBM Research), Oded Shmueli (Technion Haifa)
Using Word Embedding to Enable Semantic Queries in Relational Databases
Hui Miao, Ang Li, Larry Davis, Amol Deshpande (University of Maryland)
On Model Discovery For Hosted Data Science Projects
Arun Kumar is an Assistant Professor in the Department of Computer Science and Engineering at the University of California, San Diego. He obtained his PhD from the University of Wisconsin-Madison in 2016. His primary research interests are in data management, especially the intersection of data management and machine learning, with a focus on problems related to usability, developability, performance, and scalability. Systems and ideas based on his research have been released as part of the MADlib open-source library, shipped as part of products from EMC, Oracle, Cloudera, and IBM, and used internally by Facebook, LogicBlox, and Microsoft. A paper he co-authored was accorded the Best Paper Award at ACM SIGMOD 2014. He was awarded the 2016 Graduate Student Research Award for the best dissertation research in UW-Madison CS and the Anthony C. Klug NCR Fellowship in Database Systems in 2015.
Quannan Li is a staff software engineer at Twitter, building the realtime recommendation engine MagicRecs. He obtained his Ph.D. in Computer Vision and Machine Learning at UC Los Angeles in 2013 under the supervision of Professor Zhuowen Tu.
Xin Luna Dong has been a principal scientist at Amazon since July 2016, leading the efforts to build the Amazon Product Graph. Prior to joining Amazon, she worked for Google and AT&T Labs - Research. She received her Ph.D. in Computer Science and Engineering at the University of Washington. Before coming to the United States, she obtained an M.S. in Computer Science at Peking University, and a B.S. in Computer Science at Nankai University in China. Her research interests include data integration, data cleaning, and knowledge management. She recently won the VLDB Early Career Research Contribution Award for "advancing the state of the art of knowledge fusion".
Stephen Bach is a postdoctoral scholar in the Stanford computer science department. His research focuses on weakly supervised machine learning, statistical relational learning, and information extraction. His goal is to design algorithms and systems that empower people to use machine learning with minimal intervention from computer scientists. He co-leads the development of the Snorkel framework for training data creation using generative models. He was recognized with the Larry S. Davis Doctoral Dissertation Award from the University of Maryland, College Park department of computer science.
Applying Machine Learning (ML) in real-world scenarios is a challenging task. In recent years, the main focus of the database community has been on creating systems and abstractions for the efficient training of ML models on large datasets. However, model training is only one of many steps in an end-to-end ML application, and a number of orthogonal data management problems arise from the large-scale use of ML, which require the attention of the data management community.
For example, data preprocessing and feature extraction workloads result in complex pipelines that often require the simultaneous execution of relational and linear algebraic operations. Next, the class of ML model to use needs to be chosen, which often means trying out a set of popular approaches such as linear models, decision trees and deep neural networks on the problem at hand. The prediction quality of such ML models heavily depends on the choice of features and hyperparameters, which are typically selected in a costly offline evaluation process that offers huge opportunities for parallelization and optimization. Afterwards, the resulting models must be deployed and integrated into existing business workflows in a way that enables fast and efficient predictions, while still allowing the lifecycle of models (which become stale over time) to be managed. As a further complication, the resulting systems need to take the target audience of ML applications into account; this audience is very heterogeneous, ranging from analysts without programming skills, who may prefer an easy-to-use cloud-based solution, to teams of data processing experts and statisticians developing and deploying custom-tailored algorithms.
DEEM aims to bring together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues that arise in ML application scenarios. The workshop solicits regular research papers describing preliminary and ongoing research results. In addition, the workshop encourages the submission of industrial experience reports of end-to-end ML deployments.
The workshop will have two tracks: one for regular research papers and one for industrial papers. Submissions can be short papers (4 pages) or long papers (up to 10 pages). Authors are requested to prepare submissions following the ACM proceedings format.
Submission Website: https://cmt3.research.microsoft.com/DEEM2017/