The International Workshop on Distributed Cloud Computing (DCC) is interdisciplinary and touches on distributed systems and algorithms as well as networking and cloud computing. It is intended as a forum where people with different backgrounds can share their expertise. DCC 2020 is co-located with SIGMETRICS 2020. The workshop regularly attracts both researchers and practitioners working on the foundations of the distributed cloud.
The boundaries of application of Machine Learning (ML) and Artificial Intelligence (AI) are continuously extending. The scenario of a single, powerful cluster of machines storing all data and making inferences in batch is now being replaced by more agile solutions that distribute learning and inference to geographically distributed nodes, often resource-constrained devices. This will allow ML to be more pervasive and responsive and to offer stronger privacy guarantees.
The topics include:
Online workshop with asynchronous and synchronous talks on 8 June 2020.
We envision 20-minute talks that can be recorded and uploaded to our website. Later we will invite the community to submit questions related to the talks. Answers to the questions will be provided on the website as a transcribed interview.
Albert Bifet is a Professor at Telecom Paris and the University of Waikato. Previously he worked at Huawei Noah's Ark Lab in Hong Kong, Yahoo Labs in Barcelona, and UPC BarcelonaTech. He is the co-author of a book on Machine Learning from Data Streams published by MIT Press. He is one of the leaders of the MOA, scikit-multiflow and Apache SAMOA software environments for implementing algorithms and running experiments for online learning from evolving data streams. He served as Co-Chair of the Industrial track of IEEE MDM 2016 and ECML PKDD 2015, and as Co-Chair of KDD BigMine (2012-2019) and the ACM SAC Data Streams Track (2012-2020).
Talk title: Distributed Analytics for Data Streams
Abstract: Big Data and the Internet of Things (IoT) have the potential to fundamentally shift the way we interact with our surroundings. The challenge of deriving insights from the IoT has been recognized as one of the most exciting opportunities for both academia and industry. Advanced analytics of big data streams from sensors and devices is bound to become a key area of data mining research as the number of applications requiring such processing increases. Dealing with the evolution over time of such data streams, i.e., with concepts that drift or change completely, is one of the core issues in stream mining. In this talk, I will present an overview of data stream analytics, and I will introduce some popular open source tools for data stream analytics.
Aurélien Bellet is a researcher at Inria. He obtained his Ph.D. from the University of Saint-Etienne (France) in 2012 and, prior to joining Inria, he was a postdoctoral researcher at the University of Southern California (USA) and at Télécom ParisTech (France). His main line of research is statistical machine learning, with a particular interest in developing large-scale algorithms that allow good trade-offs between computational complexity (or other "resources", such as privacy or communication) and statistical performance. His recent work focuses on designing and analyzing decentralized and privacy-preserving machine learning algorithms. His work has been published in top machine learning venues such as ICML, NIPS and AISTATS.
Talk title: Fully Decentralized Joint Learning of Personalized Models and Collaboration Graphs
Abstract: We consider the fully decentralized machine learning scenario where many users with personal datasets collaborate to learn models through local peer-to-peer exchanges, without a central coordinator. We propose to train personalized models that leverage a collaboration graph describing the relationships between the users' personal tasks, which we learn jointly with the models. Our fully decentralized optimization procedure alternates between training nonlinear models given the graph in a greedy boosting manner, and updating the collaboration graph (with controlled sparsity) given the models. Throughout the process, users exchange messages only with a small number of peers (their direct neighbors in the graph and a few random users), ensuring that the procedure naturally scales to large numbers of users. We analyze the convergence rate, memory and communication complexity of our approach, and demonstrate its benefits compared to competing techniques on synthetic and real datasets.
Nguyen H. Tran received the BS degree from Ho Chi Minh City University of Technology and the Ph.D. degree from Kyung Hee University, both in electrical and computer engineering, in 2005 and 2011, respectively. He was an Assistant Professor with the Department of Computer Science and Engineering, Kyung Hee University, from 2012 to 2017. Since 2018, he has been with the School of Computer Science, The University of Sydney, where he is currently a Senior Lecturer. His research interests include distributed computing, machine learning, and networking.
Talk title: Distributed and Democratized Machine Learning at the Edge
Abstract: Distributed artificial intelligence (AI) has proliferated in recent years, driven by the potential of federated learning (FL) in cross-device distributed machine learning applications. FL leverages a collaborative model training approach where distributed learning agents build a common global model without sharing their raw datasets. In practice, FL often deals with large-scale, highly personalized, and unbalanced data. This makes it challenging for FL to improve personalized learning performance while retaining a high level of generalization capability for its agents. This article explores the democratized learning (Dem-AI) paradigm to design a decentralized control and aggregation methodology in regions by leveraging distributed multi-access edge computing (MEC) platforms in future large-scale distributed learning systems. The use of regional MEC servers to construct regional models, instead of doing so at the remote cloud, not only minimizes the overhead and communication loads, but also ensures the scalability of future distributed learning applications. Research opportunities and challenges are discussed, highlighting edge-assisted Dem-AI operations that address the shortcomings of FL for large-scale distributed learning systems.
Manish Parashar is Distinguished Professor of Computer Science at Rutgers University. He is also the founding Director of the Rutgers Discovery Informatics Institute (RDI2). His research interests are in the broad areas of Parallel and Distributed Computing and Computational and Data-Enabled Science and Engineering. Manish is the founding chair of the IEEE Technical Consortium on High Performance Computing (TCHPC) and Editor-in-Chief of the IEEE Transactions on Parallel and Distributed Systems. He is a Fellow of AAAS, a Fellow of IEEE/IEEE Computer Society, and an ACM Distinguished Scientist. For more information please visit http://parashar.rutgers.edu/.
Talk title: Harnessing the Computing Continuum for Urgent Science
Abstract: Urgent science describes time-critical, data-driven scientific workflows that can leverage distributed data sources in a timely way to facilitate important decision making. In spite of the exponential growth of available digital data sources and the ubiquity of non-trivial computational power for processing this data, realizing such urgent science workflows remains challenging: while our capacity for generating data is expanding dramatically, our ability to manage, analyze, and transform this data into knowledge in a timely manner has not kept pace. In this talk I will explore how the computing continuum, spanning resources at the edges, in the core and in-between, can be harnessed to support urgent science. Using an Early Earthquake Warning (EEW) workflow, which combines data streams from geo-distributed seismometers and GPS sensors to detect tsunamis, as a driver, I will explore a system stack that can enable the fluid integration of distributed analytics across a dynamic infrastructure spanning the computing continuum. I will describe recent research in programming abstractions that can express what data should be processed and when and where it should be processed, and in middleware services that automate the discovery of resources and the orchestration of computations across these resources. I will also discuss open research challenges.