An Information-Theoretic Approach Towards Communication-Efficient Distributed Machine Learning
Distributed computing systems for large-scale datasets have gained significant recent interest, as they enable the processing of data-intensive tasks for machine learning, model training, and data analysis over a large number of commodity machines and servers (e.g., Apache Spark and MapReduce). Generally speaking, a master node, which holds the entire dataset, sends data blocks to be processed at distributed worker nodes. The workers subsequently respond with locally computed functions, which the master aggregates for the desired data analysis. This enables the processing of many terabytes of data over thousands of distributed servers to provide speedup. However, intermediate communication across distributed machines emerges as one of the key bottlenecks in achieving ideal speedups. In this talk, I will present recent approaches that have shown how a novel application of codes can be used to reduce the communication footprint of distributed learning algorithms. I will discuss the fundamental information-theoretic tradeoffs arising in such problems, recent progress, and directions for future work.
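To make the master-worker idea concrete, here is a minimal sketch of one well-known coded-computing scheme, gradient coding, in the spirit of the approaches the abstract alludes to. The specific 3-worker, 1-straggler code below (workers' coding coefficients and the decoding table) is an illustrative assumption, not necessarily the speaker's construction: each worker computes partial gradients on two of three data partitions and sends a single coded combination, and the full gradient g1 + g2 + g3 is recoverable from any two workers' messages, so the master need not wait for a slow third worker.

```python
# Minimal gradient-coding sketch (illustrative scheme, not the talk's exact
# construction): 3 workers, 3 data partitions, tolerates 1 straggler.
# Plain Python lists stand in for gradient vectors.

def _add(u, v):
    return [a + b for a, b in zip(u, v)]

def _scale(c, u):
    return [c * a for a in u]

def worker_messages(g1, g2, g3):
    """Each worker sends ONE coded vector instead of its raw partial gradients."""
    m1 = _add(_scale(0.5, g1), g2)   # worker 1 holds partitions 1 and 2
    m2 = _add(g2, _scale(-1.0, g3))  # worker 2 holds partitions 2 and 3
    m3 = _add(_scale(0.5, g1), g3)   # worker 3 holds partitions 1 and 3
    return {1: m1, 2: m2, 3: m3}

# For each pair of surviving workers, the linear combination of their
# messages that recovers the full gradient g1 + g2 + g3.
DECODE = {(1, 2): (2.0, -1.0), (1, 3): (1.0, 1.0), (2, 3): (1.0, 2.0)}

def decode(messages, survivors):
    """Master recovers g1 + g2 + g3 from any two workers' coded messages."""
    i, j = sorted(survivors)
    a, b = DECODE[(i, j)]
    return _add(_scale(a, messages[i]), _scale(b, messages[j]))

if __name__ == "__main__":
    g1, g2, g3 = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]   # toy partial gradients
    msgs = worker_messages(g1, g2, g3)
    for pair in [(1, 2), (1, 3), (2, 3)]:
        print(pair, decode(msgs, pair))  # each pair recovers [9.0, 12.0]
```

Each worker communicates one vector rather than two, and any single straggler can be ignored; this tradeoff between redundant computation and reduced waiting/communication is the kind of information-theoretic tradeoff the talk examines.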