Mehdi Bennis, University of Oulu
Published: 15 Jul 2019
CTN Issue: July 2019
A note from the editor:
Federated Learning has become a hot topic at the intersection of Machine Learning and Wireless Networks. Simply put, ML in the cloud is not the ideal scenario for latency-sensitive wireless networks. The Big Data of wireless is inconveniently located at the users, which raises bandwidth, latency, and privacy concerns for a traditional cloud-based ML solution. But traditional ML is centralized. So what to do? This month we are delighted to have Mehdi Bennis from the University of Oulu paint us the simplest picture possible of this mathematically complex topic. Comments always welcome.
Trends and Challenges of Federated Learning in the 5G Network
Figure 1: Vision of edge ML where both ML inference and training processes are pushed down into the network edge (bottom), highlighting two research directions: (1) ML for communication (MLC, from left to right) and (2) communication for ML (CML, from right to left) [PSBD18].
While it is clear that 5G will continue to be the innovation platform for the next decade, 5G connectivity alone is insufficient to reap its full benefits. The proliferation of a new breed of autonomous devices that sense, communicate, and act within their environments poses unprecedented challenges in terms of the data generated at the network edge. This massive amount of data cannot be shipped back to the cloud for training and inference. To solve this scalability challenge while addressing privacy, latency, reliability, and bandwidth efficiency, intelligence must be pushed down onto the network edge.
Thus far, progress in machine learning (ML) has been fueled primarily by the availability of data and more computing power, whereby a central node (usually in a data center) has full access to a global dataset and a massive amount of storage and computing power. Currently, many deep learning algorithms reside in the cloud, enabled by popular toolkits such as Caffe and TensorFlow, as well as specialized hardware such as tensor processing units. Nonetheless, this centralized approach is ill-suited for applications that require low latency, such as flying a drone, controlling a self-driving car, or sending instructions to a robotic surgeon. To perform these mission-critical tasks, future wireless systems must make more decisions at the network edge, more quickly and more reliably than before, even when network connectivity is lost.
This has sparked a groundswell of interest in distributed machine learning, a new paradigm in which training data is stored across a large number of geographically dispersed nodes [KMYS16]. On-device machine learning is essentially about training a high-quality centralized model in a decentralized manner, whereby training data is unevenly distributed and every device has access to only a tiny fraction of it. There are clear advantages to doing it this way: unlike cloud-based artificial intelligence, on-device ML preserves privacy since training data is kept locally on each device. Training is also done locally, and updates are aggregated and shared with other nodes over wireless links or via an infrastructure (federating) server.
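To make the train-locally, aggregate-centrally loop concrete, here is a minimal sketch of one common aggregation rule, a sample-weighted average of locally updated models. The toy one-parameter regression task, the function names, and the learning rate are illustrative assumptions and are not taken from [KMYS16].

```python
import random

def local_update(w, data, lr=0.1, epochs=5):
    """Device side: a few gradient steps on the local least-squares loss for y ~ w * x."""
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        w -= lr * grad
    return w

def federated_round(w_global, device_data):
    """Server side: average the locally updated models, weighted by local sample count."""
    total = sum(len(d) for d in device_data)
    return sum(len(d) / total * local_update(w_global, d) for d in device_data)

def make_devices(true_w=3.0, sizes=(5, 20, 50), seed=0):
    """Hypothetical setup: unevenly sized local datasets that never leave their device."""
    rng = random.Random(seed)
    devices = []
    for n in sizes:
        data = []
        for _ in range(n):
            x = rng.uniform(-1, 1)
            data.append((x, true_w * x + rng.gauss(0, 0.05)))
        devices.append(data)
    return devices

def train(rounds=50):
    """Run several federated rounds starting from w = 0 and return the global model."""
    devices = make_devices()
    w = 0.0
    for _ in range(rounds):
        w = federated_round(w, devices)
    return w

if __name__ == "__main__":
    print(train())  # should land close to the true slope of 3.0
```

Only the scalar model `w` crosses the link in each round; the raw `(x, y)` pairs stay on the device, which is the privacy argument made above.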
On the flip side, there are several challenges that must be addressed within federated learning (FL). Among these: 1) a learning model may have a million parameters (e.g., for a self-driving vehicle), and hence a model update can be bandwidth consuming, especially when scaled to 1000X more devices; 2) straggler nodes can undermine the training process due to poor computing capabilities or a poor path loss to the federating server; 3) while vanilla FL is about training a global model, adapting to local dynamics and generalizing to other tasks is key, and this must hold with moving nodes and noisy, interference-prone links; 4) although appealing, vanilla FL is still a centralized solution with a single point of failure, calling for a fully distributed (serverless) approach. Moreover, since devices have limited resources, on-device machine learning must optimize the model running on the device (tweaking the number of layers, the number of neurons per layer, and other hyperparameters) and its power usage, while also considering prediction accuracy and privacy constraints.
To overcome the issue of model size, my group has proposed a new technique called federated distillation (FD), in which devices exchange their model outputs as opposed to the full model [JOKP18]. A helper then aggregates the stored local average logits and calculates the global average logit per label. Every device uses this locally to perform distillation, with the helper's (teacher's) logit acting as a regularizer. Compared to FL, FD yields a 26X smaller communication payload.
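The logit exchange in federated distillation can be sketched as follows. This is a simplified illustration and not the exact procedure of the cited paper: the function names, the plain averaging across devices, and the `gamma` regularization weight are assumptions made for the example.

```python
import math

def local_average_logits(logits, labels, num_labels):
    """Device side: mean output vector per ground-truth label (None if the label was never seen)."""
    sums = [[0.0] * num_labels for _ in range(num_labels)]
    counts = [0] * num_labels
    for vec, y in zip(logits, labels):
        counts[y] += 1
        for k, v in enumerate(vec):
            sums[y][k] += v
    return [[s / counts[y] for s in sums[y]] if counts[y] else None
            for y in range(num_labels)]

def global_average_logits(per_device, num_labels):
    """Helper side: average, label by label, over the devices that reported that label."""
    out = []
    for y in range(num_labels):
        rows = [d[y] for d in per_device if d[y] is not None]
        out.append([sum(col) / len(rows) for col in zip(*rows)] if rows else None)
    return out

def softmax(v):
    """Numerically stable softmax over a list of logits."""
    m = max(v)
    e = [math.exp(x - m) for x in v]
    s = sum(e)
    return [x / s for x in e]

def distillation_loss(student_logits, label, teacher_logits, gamma=1.0):
    """Cross-entropy on the true label plus a teacher cross-entropy term used as a regularizer."""
    p = softmax(student_logits)
    q = softmax(teacher_logits)
    ce_label = -math.log(p[label])
    ce_teacher = -sum(qk * math.log(pk) for qk, pk in zip(q, p))
    return ce_label + gamma * ce_teacher

if __name__ == "__main__":
    dev_a = local_average_logits([[2.0, 0.0], [0.0, 2.0]], [0, 1], 2)
    dev_b = local_average_logits([[4.0, 0.0]], [0], 2)
    teacher = global_average_logits([dev_a, dev_b], 2)
    print(teacher)
    print(distillation_loss([1.0, 0.0], 0, teacher[0]))
```

Each device uploads at most `num_labels * num_labels` numbers per exchange, regardless of how many parameters its model has, which is where the payload saving comes from.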
Another fundamental challenge for enabling on-device ML pertains to system design. While classical machine learning is centered on maximizing the average reward (or minimizing the average cost function) for every node, on-device ML is more prone to uncertainty and randomness due to limited access to training data (and hence a higher likelihood of overfitting), unreliable links between devices, and the latency that is added when a device offloads a task to the cloud or its peers. This calls for theoretical and algorithmic advances centered on characterizing the full distribution of the reward. This is referred to as distributional machine learning, and it bears a striking resemblance to ultra-reliable and low-latency communication (URLLC), currently a very hot topic in 3GPP Release-16 [BDP18].
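A small numerical illustration of why the average alone is not enough: the two hypothetical latency traces below have identical means, yet only one of them ever violates a deadline. The traces, the 50 ms deadline, and the helper names are made up for this example.

```python
def mean(xs):
    """Average of the samples."""
    return sum(xs) / len(xs)

def exceed_prob(xs, deadline):
    """Empirical probability that a sample exceeds the deadline (a reliability-style metric)."""
    return sum(1 for x in xs if x > deadline) / len(xs)

def tail_average(xs, k):
    """Average of the k largest samples, i.e. how bad the worst cases are."""
    return mean(sorted(xs)[-k:])

# Two hypothetical latency traces in milliseconds with the same average.
steady = [10] * 100
bursty = [9] * 99 + [109]

if __name__ == "__main__":
    for name, trace in (("steady", steady), ("bursty", bursty)):
        print(name, mean(trace), exceed_prob(trace, 50), tail_average(trace, 1))
```

An objective built only on `mean` cannot tell the two traces apart, whereas `exceed_prob` and `tail_average` separate them immediately.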
Finally, with the advent of control-centric applications (e.g., collaborative robots), autonomous control powered by AI will be the next frontier for both ML and wireless. This mandates a joint, end-to-end communication and control codesign that leverages machine learning to help tame the complexity of solving multi-dimensional non-linear control problems. My research group (ICON) is currently investigating distributed ML over wireless from a theoretical and algorithmic standpoint. Our preliminary results are encouraging, but we feel that we have only scratched the surface and that more fundamental work is needed, as evidenced by the flurry of recent works [AG19] [S17] [CCSYD17].
Example: federated learning for reliable V2V [SBSD18].
A direct application of federated learning as a URLLC enabler is depicted in the figure below, whereby vehicles need to learn the network latency distribution in a distributed and online manner over wireless so as to minimize their transmit power. Here, owing to the URLLC requirements, the vehicles need to model and learn the tail distribution of the network latency. Instead of fully relying on a centralized unit (the road-side unit, RSU), every vehicle builds a model locally from its own queue samples and sends it to the RSU, which serves as a federating server that aggregates the models and sends the aggregated model back to every vehicle. As shown in the results, the learning solution yields the same reliability as the centralized setting with much less communication bandwidth.
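The local-model-then-aggregate pattern described here can be sketched as below. As a deliberate simplification, the sketch assumes that queue-length excesses over a threshold follow an exponential distribution; this is an illustrative stand-in and not necessarily the tail model used in [SBSD18]. The threshold value and all names are hypothetical.

```python
import math

THRESHOLD = 10.0  # hypothetical queue-length threshold that defines "the tail"

def local_tail_model(queue_samples, threshold=THRESHOLD):
    """Vehicle side: summarize excesses over the threshold as (mean_excess, n_excess, n_total)."""
    excess = [q - threshold for q in queue_samples if q > threshold]
    mean_excess = sum(excess) / len(excess) if excess else 0.0
    return mean_excess, len(excess), len(queue_samples)

def aggregate(models):
    """RSU side: weight each vehicle's mean excess by its number of exceedances."""
    n_exc = sum(m[1] for m in models)
    n_tot = sum(m[2] for m in models)
    if n_exc == 0:
        raise ValueError("no vehicle observed a queue length above the threshold")
    mean_excess = sum(m[0] * m[1] for m in models) / n_exc
    return mean_excess, n_exc / n_tot  # (tail scale, P(Q > threshold))

def tail_probability(q, mean_excess, p_exceed, threshold=THRESHOLD):
    """P(Q > q) for q >= threshold under the assumed exponential tail."""
    return p_exceed * math.exp(-(q - threshold) / mean_excess)

if __name__ == "__main__":
    vehicles = [[8.0, 12.0, 14.0], [9.0, 16.0]]  # made-up local queue samples
    scale, p_exc = aggregate([local_tail_model(v) for v in vehicles])
    print(scale, p_exc, tail_probability(14.0, scale, p_exc))
```

Each vehicle uploads three numbers instead of its raw queue samples, and with this weighting the aggregated mean excess matches what a centralized unit would compute from the pooled samples.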
- [PSBD18] J. Park, S. Samarakoon, M. Bennis, M. Debbah, "Wireless Network Intelligence at the Edge," IEEE Proceedings, Dec. 2018. Available: http://arxiv.org/abs/1812.0285
- [KMYS16] J. Konečný, H. B. McMahan, F. X. Yu, P. Richtarik, A. T. Suresh, and D. Bacon, "Federated learning: Strategies for improving communication efficiency," in Proc. NIPS Workshop on Private Multi-Party Machine Learning, Dec. 2016.
- [BDP18] M. Bennis, M. Debbah and H. V. Poor, "Ultra-reliable and low-latency communication: Tail, Risk and Scale," in Proceedings of the IEEE, vol. 106, no. 10, pp. 1834-1853, Oct. 2018.
- [SBSD18] S. Samarakoon, M. Bennis, W. Saad, M. Debbah, "Federated Learning for Ultra-Reliable Low-Latency V2V Communication," in Proc. of the IEEE Globecom 2018.
- [JOKP18] E. Jeong, S. Oh, H. Kim, J. Park, M. Bennis, S.-L. Kim, "Communication-Efficient On-Device Machine Learning: Federated Distillation and Augmentation under Non-IID Private Data," NIPS Workshop, Montreal, Canada, 2018.
- [CCSYD17] M. Chen, U. Challita, W. Saad, C. Yin, and M. Debbah, "Machine Learning for Wireless Networks with Artificial Intelligence: A Tutorial on Neural Networks", arXiv:1710.02913, 2017.
- [S17] O. Simeone, "A brief introduction to machine learning for engineers," 2017.
- [AG19] M. M. Amiri, D. Gündüz, "Machine learning at the wireless edge: Distributed stochastic gradient descent over-the-air," CoRR, vol. abs/1901.00844, 2019. Available: http://arxiv.org/abs/1901.00844