The DEEM workshop will be held on Friday, June 15, in Houston, TX, in conjunction with SIGMOD/PODS 2018. DEEM brings together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues that arise in ML application scenarios.
The workshop solicits regular research papers describing preliminary and ongoing research results. In addition, the workshop encourages the submission of industrial experience reports of end-to-end ML deployments. Submissions can be short papers (4 pages) or long papers (up to 10 pages) following the ACM proceedings format.
Jens Dittrich is a Full Professor of Computer Science in the area of Databases, Data Management, and Big Data at Saarland University, Germany. Previous affiliations include U Marburg, SAP AG, and ETH Zurich. He received an Outrageous Ideas and Vision Paper Award at CIDR 2011, a BMBF VIP Grant in 2011, a best paper award at VLDB 2014, two CS teaching awards in 2011 and 2013, as well as several presentation awards, including a qualification for the interdisciplinary German science slam finals in 2012 and three presentation awards at CIDR (2011, 2013, and 2015). He has been a PC member and area chair/group leader of prestigious international database conferences and journals such as PVLDB/VLDB, SIGMOD, ICDE, and VLDB Journal. He is on the scientific advisory board of Software AG. He gave a keynote at VLDB 2017, “Deep Learning (m)eats Databases”. At Saarland University he co-organizes the Data Science Summer School. His research focuses on fast access to big data, in particular data analytics on large datasets, scalability, main-memory databases, database indexing, reproducibility, and deep learning. He enjoys coding data science problems in Python, in particular using the Keras and TensorFlow libraries for deep learning. Since 2016 he has been working on a start-up at the intersection of deep learning and databases.
Martin Zinkevich is a Research Scientist at Google. He received his Ph.D. from Carnegie Mellon University and has conducted research at Brown University, the University of Alberta, and the Machine Learning Group at Yahoo Research. His work has been published at numerous conferences such as NIPS, ICML, KDD, WWW, CIKM, AAAI, and COLT, as well as in the Journal of the ACM and the Journal of Machine Learning Research. Additionally, Martin contributes to the discussion on data management and engineering aspects of ML with his online book, Rules of Machine Learning: Best Practices for ML Engineering, and a tutorial on Data Management Challenges in Production Machine Learning at SIGMOD 2017.
Joaquin Vanschoren is an assistant professor of Machine Learning at the Eindhoven University of Technology. His research focuses on the automation of machine learning and networked science. He founded OpenML.org, a collaborative machine learning platform where scientists can automatically log and share data, code, and experiments, and which automatically learns from all this data to help people perform machine learning better and more easily. His other passion is large-scale data analysis on all types of data (social, streams, geo-spatial, sensors, networks, text).
Applying Machine Learning (ML) in real-world scenarios is a challenging task. In recent years, the main focus of the database community has been on creating systems and abstractions for the efficient training of ML models on large datasets. However, model training is only one of many steps in an end-to-end ML application, and the large-scale use of ML raises a number of orthogonal data management problems that require the attention of the data management community.
For example, data preprocessing and feature extraction workloads result in complex pipelines that often require the simultaneous execution of relational and linear algebraic operations. Next, the class of ML model to use needs to be chosen; this often means trying out a set of popular approaches, such as linear models, decision trees and deep neural networks, on the problem at hand. The prediction quality of such ML models depends heavily on the choice of features and hyperparameters, which are typically selected in a costly offline evaluation process that offers huge opportunities for parallelization and optimization. Afterwards, the resulting models must be deployed and integrated into existing business workflows in a way that enables fast and efficient predictions, while still allowing the lifecycle of models (which become stale over time) to be managed. As a further complication, the resulting systems need to take the target audience of ML applications into account. This audience is very heterogeneous, ranging from analysts without programming skills, who may prefer an easy-to-use cloud-based solution, to teams of data processing experts and statisticians who develop and deploy custom-tailored algorithms.
DEEM aims to bring together researchers and practitioners at the intersection of applied machine learning, data management and systems research, with the goal of discussing the data management issues that arise in ML application scenarios. The workshop solicits regular research papers describing preliminary and ongoing research results. In addition, the workshop encourages the submission of industrial experience reports of end-to-end ML deployments.