About

The DEEM workshop will be held on Sunday, June 12th, in conjunction with SIGMOD/PODS 2022. The workshop will be held in hybrid (in-person and virtual) form. DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management, and systems research, with the goal of discussing the data management issues that arise in ML application scenarios.

The workshop solicits regular research papers (up to 10 pages, plus unlimited references) describing preliminary and ongoing research results, including industrial experience reports of end-to-end ML deployments, related to DEEM topics. In addition, DEEM 2022 establishes a new paper category for reports on applications and tools (4 pages) as a forum for sharing interesting use cases, problems, datasets, benchmarks, visionary ideas, system designs, and descriptions of system components and tools related to end-to-end ML pipelines. Submissions should follow the ACM proceedings format (last update 11/2020).

Follow us on Twitter or contact us via email. We also provide archived websites of previous editions of the workshop: DEEM 2017, DEEM 2018, DEEM 2019, DEEM 2020, and DEEM 2021.

DEEM 2022 Proceedings: ACM DL Link

Schedule

Sunday, June 12th (all times are in EDT)

1:00 - 2:00

Session 1 - Keynote (Chairs: Matthias Boehm, Paroma Varma, Doris Xin)

1:00pm

Welcome

1:05pm

Fairness in Ranking: from values to technical choices and back [Academic Keynote]
Julia Stoyanovich (New York University)

Abstract: Algorithmic rankers take a collection of candidates as input and produce a ranking (permutation) of the candidates as output. In the past few years, there has been much work on incorporating fairness requirements into rankers, with contributions from the data management, algorithms, information retrieval, and recommender systems communities. Many fair ranking methods establish their own understanding of fairness, yet they stop short of categorizing themselves with respect to the normative frameworks they embed. This makes it difficult to understand what values these methods encode, in what specific scenarios they are applicable, and how they relate to one another. Further, methods differ on which technical choices they make, including, notably, their data representation choices. These technical differences are not value-neutral, and have profound implications on the applicability of the methods. In my talk, I will give an overview of the field of fairness in ranking, offering a perspective that connects formalizations and algorithmic approaches across subfields. My perspective will be based on the interplay between the value frameworks that motivate specific fairness-enhancing interventions, and the technical choices that impact the properties of the methods and of their results. I will compare and contrast several representative fair ranking methods, and will provide concrete recommendations for those wishing to incorporate fairness objectives into algorithmic rankers. My talk will be based on a two-part survey that I co-authored with Meike Zehlike and Ke Yang. The survey was recently published in ACM Computing Surveys, and is available at these links: Part I and Part II.

2:00 - 3:00

Session 2 - Data Quality (Chair: Doris Xin)

2:00pm

How I stopped worrying about training data bugs and started complaining
Lampros Flokas (Columbia University), Weiyuan Wu (Simon Fraser University), Jiannan Wang (Simon Fraser University), Nakul Verma (Columbia University), Eugene Wu (Columbia University)

2:20pm

GouDa - Generation of universal Data Sets
Valerie Restat (University of Hagen), Gerrit Boerner (University of Hagen), André Conrad (University of Hagen), Uta Störl (University of Hagen)

2:40pm

Towards Data-Centric What-If Analysis for Native Machine Learning Pipelines
Stefan Grafberger (University of Amsterdam), Paul Groth (University of Amsterdam), Sebastian Schelter (University of Amsterdam)

3:00 - 3:30

Coffee Break

3:30 - 4:30

Session 3 - ML Systems (Chair: Matthias Boehm)

3:30pm

Evaluating Model Serving Strategies over Streaming Data
Sonia Horchidan (KTH Stockholm), Emmanouil Kritharakis (Boston University), Vasiliki Kalavri (Boston University), Paris Carbone (KTH Stockholm)

3:50pm

LLVM Code Optimisation for Automatic Differentiation
Maximilian E Schüle (TU Munich), Maximilian Springer (TU Munich), Alfons Kemper (TU Munich), Thomas Neumann (TU Munich)

4:10pm

Accelerating Container-based Deep Learning Hyperparameter Optimization Workloads
Rui Liu (University of Chicago), David Wong (DocuSign), Dave Lange (DocuSign), Patrik Larsson (DocuSign), Vinay Jethava (DocuSign), Qing Zheng (DocuSign)

4:30 - 5:30

Session 4 - ML Pipelines (Chair: Paroma Varma)

4:30pm

Minun: Evaluating Counterfactual Explanations for Entity Matching
Jin Wang (Megagon Labs), Yuliang Li (Megagon Labs)

4:50pm

Learning-to-learn efficiently with self-learning
Shruti Kunde (TCS Research - Mumbai), Sharod Choudhury (TCS Research - Mumbai), Amey Pandit (TCS Research - Mumbai), Rekha Singhal (TCS Research - Mumbai)

5:10pm

dcbench: A Benchmark for Data-Centric AI Systems
Sabri Eyuboglu (Stanford University), Bojan Karlaš (ETH Zurich), Ce Zhang (ETH Zurich), James Zou (Stanford University)

5:30

Closing

Important Dates

Submission deadline: March 14, 2022 (extended from March 01), 5pm Pacific Time
Submission website: https://cmt3.research.microsoft.com/DEEM2022
Notification of acceptance: April 15, 2022 (originally April 05)
Final papers due: May 10, 2022 (originally May 03)
Workshop: Sunday, June 12, 2022

Call for Papers

Applying Machine Learning (ML) in real-world scenarios is a challenging task. In recent years, the main focus of the data management community has been on creating systems and abstractions for the efficient training of ML models on large datasets. However, model training is only one of many steps in an end-to-end ML application, and a number of orthogonal data management problems arise from the large-scale use of ML.

For example, data preprocessing and feature extraction workloads may be complicated and require simultaneous execution of relational and linear algebraic operations. Next, model selection may involve searching many combinations of model architectures, features, and hyper-parameters to find the best-performing model. After model training, the resulting model may have to be deployed and integrated into business workflows and require lifecycle management using metadata and lineage. As a further complication, the resulting system may have to take into account a heterogeneous audience, ranging from domain experts without programming skills to data engineers and statisticians who develop custom algorithms.

Additionally, the importance of incorporating ethics and legal compliance into machine-assisted decision-making is being broadly recognized. Critical opportunities for improving data quality and representativeness, controlling for bias, and allowing humans to oversee and impact computational processes are missed if we do not consider the lifecycle stages upstream from model training and deployment. DEEM welcomes research on providing system-level support to data scientists who wish to develop and deploy responsible machine learning methods.

DEEM aims to bring together researchers and practitioners at the intersection of applied machine learning, data management, and systems research, with the goal of discussing the data management issues that arise in ML application scenarios.

Topics of Interest

Areas of particular interest for the workshop include (but are not limited to):

Data Management in Machine Learning Applications
Definition, Execution and Optimization of Complex Machine Learning Pipelines
Systems for Managing the Lifecycle of Machine Learning Models
Systems for Efficient Hyper-parameter Search and Feature Selection
Machine Learning Services in the Cloud
Modeling, Storage, and Provenance of Machine Learning Artifacts
Integration of Machine Learning and Dataflow Systems
Integration of Machine Learning and ETL Processing
Definition and Execution of Complex Ensemble Predictors
Sourcing, Labeling, Integrating, and Cleaning Data for Machine Learning
Data Validation and Model Debugging Techniques
Privacy-preserving Machine Learning
Benchmarking of Machine Learning Applications
Responsible Data Management
Transparency and Accountability of Machine-Assisted Decision Making
Impact of Data Quality and Data Preprocessing on the Fairness of ML Predictions

Submission

We invite submissions in the following two tracks:

Regular Papers (research and industrial papers; up to 10 pages, plus unlimited references)
Reports on Applications and Tools (interesting use cases, problems, datasets, benchmarks, visionary ideas, system designs, and descriptions of system components and tools; 4 pages)

Authors are requested to prepare submissions following the latest ACM proceedings format (last update 11/2020). DEEM is a single-blind workshop; authors must include their names and affiliations on the manuscript cover page.

Submission Website: https://cmt3.research.microsoft.com/DEEM2022
Inclusion and Diversity in Writing: http://2022.sigmod.org/calls_papers_inclusion_and_diversity.shtml

Invited Speakers

Academic Keynote: Julia Stoyanovich (New York University)

Julia Stoyanovich is an Institute Associate Professor of Computer Science & Engineering at the Tandon School of Engineering, Associate Professor of Data Science at the Center for Data Science, and Director of the Center for Responsible AI at New York University (NYU). Her research focuses on responsible data management and analysis: on operationalizing fairness, diversity, transparency, and data protection in all stages of the data science lifecycle. She established the "Data, Responsibly" consortium and served on the New York City Automated Decision Systems Task Force, by mayoral appointment. Julia developed and has been teaching courses on Responsible Data Science at NYU, and is a co-creator of an award-winning comic book series on this topic. In addition to data ethics, Julia works on the management and analysis of preference and voting data, and on querying large evolving graphs. She holds M.S. and Ph.D. degrees in Computer Science from Columbia University, and a B.S. in Computer Science and in Mathematics & Statistics from the University of Massachusetts at Amherst. She is a recipient of an NSF CAREER award and a Senior Member of the ACM.

Organization / People

Workshop Chairs:

Matthias Boehm
Graz University of Technology, Austria

Paroma Varma
Snorkel AI, USA

Doris Xin
UC Berkeley & Linea, USA

Steering Committee:

Juliana Freire (New York University)
Bill Howe (University of Washington)
H.V. Jagadish (University of Michigan)
Volker Markl (TU Berlin)
Stefan Seufert (Amazon Research)
Markus Weimer (Microsoft AI)

Program Committee:

Khaled Ammar (University of Waterloo, Thomson Reuters Labs)
Abolfazl Asudeh (University of Illinois at Chicago)
Srikanta Bedathur (IIT Delhi)
Renata Borovica-Gajic (University of Melbourne)
Patrick Damme (TU Graz, Know-Center GmbH)
Ahmed Elgohary (Microsoft)
Edward Gan (Stanford University)
Rainer Gemulla (University of Mannheim)
Chris Jermaine (Rice University)
Zoi Kaoudi (TU Berlin)
Sanjay Krishnan (University of Chicago)
Arun Kumar (UC San Diego)
Milos Nikolic (University of Edinburgh)
Tilmann Rabl (HPI, University of Potsdam)
Berthold Reinwald (IBM Research - Almaden)
Maximilian Schleich (University of Washington)
Christin Seifert (University of Duisburg-Essen)
Vraj Shah (UC San Diego)
Nesime Tatbul (Intel Labs and MIT)
Shirish Tatikonda (Target Corporation)
Ce Zhang (ETH Zurich)
