The DEEM workshop will be held on Sunday, 14th of June in Portland, OR in conjunction with SIGMOD/PODS 2020. DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues that arise in ML application scenarios.
The workshop solicits regular research papers describing preliminary and ongoing research results. In addition, the workshop encourages the submission of industrial experience reports of end-to-end ML deployments. Submissions can be short papers (4 pages) or long papers (up to 10 pages) following the ACM proceedings format. Please use the latest ACM paper format (2017) and change the font size to 10 pts (analogous to SIGMOD).
Applying Machine Learning (ML) in real-world scenarios is a challenging task. In recent years, the main focus of the database community has been on creating systems and abstractions for the efficient training of ML models on large datasets. However, model training is only one of many steps in an end-to-end ML application, and the large-scale use of ML gives rise to a number of orthogonal data management problems that require the attention of the data management community.
For example, data preprocessing and feature extraction workloads result in complex pipelines that often require the simultaneous execution of relational and linear algebraic operations. Next, the class of ML model to use needs to be chosen; this often means trying out a set of popular approaches, such as linear models, decision trees and deep neural networks, on the problem at hand. The prediction quality of such ML models heavily depends on the choice of features and hyperparameters, which are typically selected in a costly offline evaluation process that offers huge opportunities for parallelization and optimization. Afterwards, the resulting models must be deployed and integrated into existing business workflows in a way that enables fast and efficient predictions, while still allowing the lifecycle of models (which become stale over time) to be managed. Managing this lifecycle requires careful bookkeeping of metadata and lineage (“which data was used to train this model?”, “which models are affected by changes in this feature?”) and involves methods for continuous analysis, validation, and monitoring of data and models in production. As a further complication, the resulting systems need to take the target audience of ML applications into account. This audience is very heterogeneous, ranging from analysts without programming skills, who may prefer an easy-to-use cloud-based solution, to teams of data processing experts and statisticians who develop and deploy custom-tailored algorithms.
Additionally, the importance of incorporating ethics and legal compliance into machine-assisted decision-making is being broadly recognized. Critical opportunities for improving data quality and representativeness, controlling for bias, and allowing humans to oversee and impact computational processes are missed if we do not consider the lifecycle stages upstream from model training and deployment. DEEM welcomes research on providing system-level support to data scientists who wish to develop and deploy responsible machine learning methods.
The workshop will have two tracks: one for regular research papers and one for industrial papers. Submissions can be short papers (4 pages) or long papers (up to 10 pages). Authors are requested to prepare submissions following the ACM proceedings format. Please use the latest ACM paper format (2017) and change the font size to 10 pts (analogous to SIGMOD). DEEM is a single-blind workshop; authors must include their names and affiliations on the manuscript cover page.
Submission Website: TBD
Bill Howe is an Associate Professor in the Information School and Adjunct Associate Professor in Computer Science & Engineering at the University of Washington. He leads an interdisciplinary group in Responsible Data Science with an emphasis on urban applications. His group's research aims to make the techniques and technologies of data science dramatically more accessible and reliable, particularly at scale. Their applied methods are rooted in database models and languages, though they sometimes work in machine learning, visualization, HCI, and high-performance computing. His group is an applied, systems-oriented group, frequently sourcing projects through collaborations in the physical, life, and social sciences.
Amit Sabne is a software engineer at Google Brain. He works on XLA, a high-performance compiler for Tensor Processing Units (TPUs). Before that, he was a software engineer at Microsoft, working on the Visual C++ compiler team. He researched and designed novel optimization techniques to improve program performance while lowering binary size and compilation time. He earned a PhD in Computer Engineering from the School of Electrical and Computer Engineering, Purdue. His PhD research area was High Performance Computing, with a focus on heterogeneous computing systems. His dissertation proposed and developed efficient programming models for accelerators, provided compiler and runtime support for these models, and formulated fast autotuning mechanisms for accelerator programs.
Manasi Vartak is the founder and CEO of Verta.AI, which is based on her PhD work at MIT CSAIL on software systems to streamline the process of data science and machine learning. Previously, she was a PhD student in the Database Group at MIT. She worked on systems for the analysis of large-scale data, specifically on making machine learning and visual analysis faster, more interactive, and more efficient. She worked and interned at Twitter, Google, Facebook and Microsoft, and is a recipient of the Facebook PhD Fellowship and Google Anita Borg Fellowship.
Matthias Boehm is a BMVIT-endowed professor for data management at Graz University of Technology, Austria, and a research area manager for data management at the colocated Know-Center GmbH, Austria. Prior to joining TU Graz in 2018, he was a research staff member at IBM Research - Almaden, CA, USA, with a major focus on compilation and runtime techniques for declarative, large-scale machine learning in Apache SystemML. Matthias received his Ph.D. from Dresden University of Technology, Germany in 2011 with a dissertation on cost-based optimization of integration flows. His previous research also includes systems support for time series forecasting as well as in-memory indexing and query processing. Matthias is a recipient of the 2016 VLDB Best Paper Award, a 2016 SIGMOD Research Highlight Award, and a 2016 IBM Pat Goldberg Memorial Best Paper Award.