
About

The DEEM workshop will be held online on Sunday, June 20th, in conjunction with SIGMOD/PODS 2021. DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues that arise in ML application scenarios.

The workshop solicits regular research papers describing preliminary and ongoing research results. In addition, the workshop encourages the submission of industrial experience reports of end-to-end ML deployments. Submissions can be short papers (4 pages) or long papers (up to 10 pages, plus unlimited references) following the ACM proceedings format (last update 11/2020).

Follow us on Twitter or contact us via email. We also provide archived websites of previous editions of the workshop: DEEM 2017, DEEM 2018, DEEM 2019, and DEEM 2020.

DEEM 2021 Proceedings: ACM DL Link

Schedule

Sunday, June 20th (all times are in EDT)

As in 2020, DEEM 2021 runs as an online event via Zoom. The meeting link for all DEEM sessions can be accessed through Whova: https://sigmod2021.events.whova.com/ or directly via Zoom Link.

8:00 - 9:00

Session 1 - Keynote 1 (Chair: Steven Whang)

8:00 - 8:10

Welcome

8:10 - 9:00

Connecting ML Applications to Real World Data [Academic Keynote]
Sebastian Schelter (University of Amsterdam)

9:30 - 10:30

Session 2 - ML Systems (Chair: Matthias Boehm)

9:30 - 10:00

Towards Understanding End-to-end Learning in the Context of Data: Machine Learning Dancing over Semirings and Codd's Table [Invited Academic Talk, Paper & Video]
Ce Zhang (ETH Zurich)

10:00 - 10:15

Machine Learning in SQL by Translation to TensorFlow [Paper & Video]
Nantia Makrynioti (CWI, Athens University of Economics and Business); Ruy Ley-Wild (Google, LogicBlox); Vasilis Vassalos (Athens University of Economics and Business)

10:15 - 10:30

Understanding and Optimizing Packed Neural Network Training for Hyper-Parameter Tuning [Paper & Video]
Rui Liu; Sanjay Krishnan; Aaron J. Elmore; Michael J. Franklin (University of Chicago)

10:35 - 11:25

Session 3 - Keynote 2 (Chair: Julia Stoyanovich)

10:35 - 11:25

Explainable Opinion Summarization [Industrial Keynote]
Wang-Chiew Tan (Facebook AI)

Abstract: Subjective data refers to data that contains opinions and experiences. Such data is ubiquitous in product reviews, tweets, and discussion forums on social media. We present an abstractive opinion summarization framework that does not rely on gold-standard summaries for training. The opinion summarizer extracts opinion phrases from reviews and trains a Transformer model to reconstruct the original reviews from these extractions. Automatic evaluation on Yelp data shows that our summarizer outperforms competitive baselines. Human studies on two corpora verify that our opinion summarizer produces informative summaries and shows promising customization capabilities. We show how the idea of reconstructing summaries from extracted opinions also allows us to generate explanations for the generated summaries. This is joint work with Yoshihiko Suhara, Xiaolan Wang, Stefanos Angelidis, Zhengjie Miao, and Yuliang Li.

11:30 - 12:30

Session 4 - Fairness and Applications (Chair: Julia Stoyanovich)

11:30 - 12:00

Semantic Enrichment of Data for AI Applications [Invited Industry Talk, Paper]
Fatma Ozcan (Google Cloud)

12:00 - 12:15

FairRover: Explorative Model Building for Fair and Responsible Machine Learning [Paper & Video]
Hantian Zhang (Georgia Institute of Technology); Nima Shahbazi (University of Illinois at Chicago); Xu Chu (Georgia Institute of Technology); Abolfazl Asudeh (University of Illinois at Chicago)

12:15 - 12:30

NNCompare: A framework for dataset selection, data augmentation and comparison of different neural networks for medical image analysis [Paper & Video]
Lena Wiese (Fraunhofer Institute for Toxicology and Experimental Medicine, Goethe University Frankfurt); Deborah Höltje (Fraunhofer Institute for Toxicology and Experimental Medicine)

12:30 - 12:30

Closing


Important Dates

Submission deadline: March 15, 2021, 5pm Pacific Time
Submission website: https://cmt3.research.microsoft.com/DEEM2021
Notification of acceptance: April 19, 2021
Final papers due: May 31, 2021 (extended from May 24, 2021)
Workshop: Sunday, June 20, 2021


Call for Papers

Applying Machine Learning (ML) in real-world scenarios is a challenging task. In recent years, the main focus of the database community has been on creating systems and abstractions for the efficient training of ML models on large datasets. However, model training is only one of many steps in an end-to-end ML application, and a number of orthogonal data management problems arise from the large-scale use of ML, which require the attention of the data management community.

For example, data preprocessing and feature extraction workloads may be complicated and require simultaneous execution of relational and linear algebraic operations. Next, model selection may involve searching many combinations of model architectures, features, and hyperparameters to find the best-performing model. After model training, the resulting model may have to be deployed and integrated into business workflows and require lifecycle management using metadata and lineage. As a further complication, the resulting system may have to take into account a heterogeneous audience, ranging from domain experts without programming skills to data engineers and statisticians who develop custom algorithms.

Additionally, the importance of incorporating ethics and legal compliance into machine-assisted decision-making is being broadly recognized. Critical opportunities for improving data quality and representativeness, controlling for bias, and allowing humans to oversee and impact computational processes are missed if we do not consider the lifecycle stages upstream from model training and deployment. DEEM welcomes research on providing system-level support to data scientists who wish to develop and deploy responsible machine learning methods.

DEEM aims to bring together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues that arise in ML application scenarios. The workshop solicits regular research papers describing preliminary and ongoing research results. In addition, the workshop encourages the submission of industrial experience reports of end-to-end ML deployments.


Topics of Interest

Areas of particular interest for the workshop include (but are not limited to):

Data Management in Machine Learning Applications
Definition, Execution and Optimization of Complex Machine Learning Pipelines
Systems for Managing the Lifecycle of Machine Learning Models
Systems for Efficient Hyperparameter Search and Feature Selection
Machine Learning Services in the Cloud
Modeling, Storage and Provenance of Machine Learning Artifacts
Integration of Machine Learning and Dataflow Systems
Integration of Machine Learning and ETL Processing
Definition and Execution of Complex Ensemble Predictors
Sourcing, Labeling, Integrating, and Cleaning Data for Machine Learning
Data Validation and Model Debugging Techniques
Privacy-preserving Machine Learning
Benchmarking of Machine Learning Applications
Responsible Data Management
Transparency and Accountability of Machine-Assisted Decision Making
Impact of Data Quality and Data Preprocessing on the Fairness of ML Predictions


Submission

Submissions can be short papers (4 pages) or long papers (up to 10 pages, plus unlimited references). Authors are requested to prepare submissions following the latest ACM proceedings format (last update 11/2020). DEEM is a single-blind workshop: authors must include their names and affiliations on the manuscript cover page.

Submission Website: https://cmt3.research.microsoft.com/DEEM2021
Inclusion and Diversity in Writing: http://2021.sigmod.org/calls_papers_inclusion_and_diversity.shtml


Organisation / People

Workshop Chairs:

Matthias Boehm
Graz University of Technology

Julia Stoyanovich
New York University

Steven Whang
Korea Advanced Institute of Science and Technology

Steering Committee:

Juliana Freire (New York University)
Bill Howe (University of Washington)
H.V. Jagadish (University of Michigan)
Volker Markl (TU Berlin)
Sebastian Schelter (University of Amsterdam)
Stefan Seufert (Amazon Research)
Markus Weimer (Microsoft AI)

Program Committee:

Alekh Jindal (Microsoft)
Alex Ratner (Stanford)
Andrey Gubichev (Google)
Arash Termehchy (Oregon State University)
Arun Kumar (University of California, San Diego)
Bolin Ding (Alibaba Group)
Carsten Binnig (TU Darmstadt)
Doris Xin (UC Berkeley)
Georgia Koutrika (Athena Research Center)
Guoliang Li (Tsinghua University)
Jae-Gil Lee (KAIST)
Ke Yang (New York University)
Maya Ramanath (IIT Delhi)
Meihui Zhang (Beijing Institute of Technology)
Neoklis Polyzotis (Google)
Nesime Tatbul (Intel Labs and MIT)
Rainer Gemulla (Universität Mannheim)
Rajesh Bordawekar (IBM T. J. Watson Research Center)
Srikanta Bedathur (IIT Delhi)
Sudip Roy (Google)
Uwe Roehm (The University of Sydney)
