Urbashi Mitra, Dhruva Kartik and Libin Liu, University of Southern California
Published: 13 Nov 2019
While neural networks and machine learning have markedly improved the performance of algorithms for challenging perceptual problems such as speech recognition and image classification, these methods are still in their infancy when it comes to solving wireless communications problems. Herein, two case studies of the use of neural networks in the design of dynamic wireless systems are considered: optimal policy design for transmission protocols and experiment selection for data-driven active decision making systems. In both cases, neural network architectures alone provided inferior performance. However, when coupled with important structural information, such as the form of the optimal policy or the use of new metrics informed by the computationally expensive optimal solution, performance gains were achievable. Key challenges of neural network and machine learning methods are sensitivity to hyperparameter tuning, limited design rules for architectures, and often significant training times. To this end, the use of structural side information and optimal-solution-specific metrics can mitigate these negative features.
Neural networks and machine learning have made a significant impact on the performance of many perceptual systems such as natural language processing and image and video classification. There has been a flurry of activity applying machine learning and neural network methods to wireless communications: detection under unknown channel models [1,2], communication systems as autoencoders [3], sequential code design [4], autoencoders for channel charting [17], and deep reinforcement learning for resource allocation [18] and cell-load approximation [19]. While modern machine learning strategies for image and speech processing tasks have exploited domain knowledge (speech patterns, geometric invariances, etc.), the classical approach to wireless communication systems has long benefited from well-accepted models, such as Rayleigh fading and multipath models for certain wireless channels, and from an understanding of the operational behavior of system components (amplifiers, oscillators, etc.). The above works have, to some extent, considered such side information, although it is not their predominant approach. Indeed, we initially studied neural network architectures for multiuser-detection receiver design for code-division multiple-access in the 1990s [20,21]. Herein we provide two case studies outlining some of the advantages and disadvantages of neural network and machine learning methods for determining optimal control structures for applications that admit a Markov decision process model. Modern wireless communication networks encompassing the Internet of Things and cyber-physical systems (CPS) operate in inherently dynamic environments. Common to these network frameworks is the need for environmental and mission semantics, perception/sensing, actuation/control, and communication/information exchange, all done in real time or allowing for adaptation.
To provide a more universal framework, we put forth a graphical model that interconnects environmental and physical conditions, actuation, communication, and sensing. We employ a state-based model that can capture wireless channels, mobility models, actuation by mobile nodes such as unmanned aerial vehicles (drones), and control schemes for wireless networks (see Fig. 1). Our algorithms optimize different metrics of interest (e.g., overall throughput of the communication system, power for actuation, memory), with an eye to optimizing the implementation as well. Many applications of interest can be described by combining multiple Markov chains describing the behavior of key subsystems, resulting in Markov decision process (MDP) or partially observable Markov decision process (POMDP) models. POMDPs are usually solved by transforming them into an MDP with the posterior belief as the state. Solving these belief-state problems is notoriously hard because of the curse of dimensionality and the curse of history. Machine learning methods are poised to cope with these challenges. As an example, consider a particular logical graph describing the behavior of a wireless network (Fig. 3): Automatic Re-transmission reQuest (ARQ) for packet re-transmission yields a finite-state machine model, as does the buffering of packets at a terminal [5]. The result is an interdependence between the two subsystems.
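As a toy sketch of how such subsystem chains compose, consider two Markov chains that happen to evolve independently: their joint transition matrix is the Kronecker product of the subsystem matrices. The interdependence described above (e.g., retransmissions affecting buffer occupancy) is precisely what breaks this product structure and couples the subsystems. The transition probabilities below are illustrative placeholders, not values from any system in this article.

```python
import numpy as np

# A 2-state ARQ-like chain (success / retransmit) and a 3-state
# buffer-occupancy chain; probabilities are invented for illustration.
P_arq = np.array([[0.9, 0.1],
                  [0.6, 0.4]])
P_buf = np.array([[0.7, 0.3, 0.0],
                  [0.2, 0.6, 0.2],
                  [0.0, 0.4, 0.6]])

# If the subsystems were independent, the joint chain over the
# 6 = 2 x 3 product states would be the Kronecker product.
P_joint = np.kron(P_arq, P_buf)

assert P_joint.shape == (6, 6)
# Rows still sum to one, so P_joint is a valid transition matrix.
assert np.allclose(P_joint.sum(axis=1), 1.0)
```

Once actions (e.g., transmit/idle decisions) influence these transitions, the coupled chain becomes a controlled Markov chain, i.e., an MDP.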
The two specific problem frameworks that we describe are: (1) machine learning methods for optimal wireless network control, with the novel use of graph signal processing methods to enable complexity reduction; and (2) the design and analysis of neural network architectures and learning methods for active hypothesis testing. A key connector between the two applications is Markov decision process modeling.
2.1 Optimal Control of Wireless Networks
We have taken a novel approach to the analysis and design of wireless networks, applying new methods in graph signal processing (GSP) [6] to these problems. GSP extends classical Fourier analysis for time-domain signals to the graph domain; in particular, one can determine graph Fourier bases to represent signals defined on a graph. As an example, one can associate a performance metric (throughput, probability of error) with each state of the previously described logical graphs; this collection of performance metrics is denoted the value function. GSP is commonly applied to undirected graphs, as found in images or sensor networks. In such applications, the graph Laplacian is symmetric, and its eigendecomposition yields the needed graph Fourier basis: each eigenvalue represents a graph frequency and the corresponding eigenvector the associated basis function. For directed graphs, one can employ Jordan forms of the asymmetric Laplacian [7]; however, such schemes have significant issues with respect to computational complexity. In [8], we were able to use symmetrization methods to apply diffusion wavelets (a graph basis) to learn value functions, combining graph signal processing ideas with sparse approximation. Dimension reduction was possible and, furthermore, value functions were learned an order of magnitude faster than with classical Q-learning strategies.
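The graph Fourier construction described above can be sketched in a few lines for an undirected toy graph; the path graph and per-state value function below are illustrative and not from our experiments.

```python
import numpy as np

# Toy undirected path graph on 5 states.
A = np.zeros((5, 5))
for i in range(4):
    A[i, i + 1] = A[i + 1, i] = 1.0

L = np.diag(A.sum(axis=1)) - A      # combinatorial graph Laplacian
freqs, U = np.linalg.eigh(L)        # graph frequencies and Fourier basis
                                    # (eigenvalues / eigenvectors)

value = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # hypothetical per-state metric
v_hat = U.T @ value                 # graph Fourier transform
recon = U @ v_hat                   # inverse transform recovers the signal
assert np.allclose(recon, value)
```

Sparse approximation then amounts to keeping only the largest-magnitude coefficients of `v_hat`, which is what enables the dimension reduction discussed above.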
While the graph signal, or value function, can be effectively learned via GSP methods, designing an optimal control policy to steer network behavior toward "good" system states is a much more formidable problem. As noted previously, applying GSP to directed graphs has practical limitations. To this end, we have examined strategies that (a) capture key features of directed wireless graphs and (b) exploit properties of optimal strategies [8]. Logical chains are inherent to decision making in this scenario, and in the adaptive learning and updating of a control scheme, the internal workings of the decision making are also important. As we have seen in our prior work [5,8], exploiting structural information is key to overcoming these barriers. We distinguish between structural information, such as convexity, monotonicity, smoothness, and optimality properties, and the hand-designing of features tailored to a single application.
A shortcoming of our prior approach [8] was that it required full knowledge of the graph Laplacian or adjacency matrix. To this end, we have obtained very promising results [9] based on a very different viewpoint of the problem, coupled with intelligent use of reinforcement learning (RL). While RL has been applied to the design of media access control (MAC) protocols [10], power control and rate adaptation [11], and resource allocation [12], this prior work ignored the attendant sample complexity of RL algorithms, which does not scale well with network size. Another challenge with Q-learning in RL is that dimension reduction, when considered, is typically achieved via hand-designed feature vectors [13], an approach not well matched to the Markov chains associated with large wireless networks. Thus, we have tailored RL to learn system features, constrained by structural properties, when the graph adjacency matrix is unknown, coupled with GSP for dimension reduction. In [9], the policy signal itself (versus the value or cost function on the network graph) is taken as a graph signal defined on an image-like graph, and we have developed vertex sampling and reconstruction algorithms for RL in this context. Here, the goal is to design a transmission policy (transmit or idle) as a function of the buffer state and channel state of a wireless user. Figure 2 shows that the proposed methods incur negligible performance loss with respect to classical Q-learning, which offers the best performance but with a run-time five times that of our proposed methods for a network of size 2,000. Thus, classical Q-learning does not scale.
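For concreteness, the classical Q-learning baseline for a transmit/idle policy over a small buffer-state x channel-state space might look as follows. The dynamics, costs, and learning parameters below are invented for illustration and are not those of the network model used in our experiments.

```python
import numpy as np

rng = np.random.default_rng(0)
B, C = 5, 2                     # buffer levels 0..4; channel bad (0) / good (1)
nA = 2                          # actions: 0 = idle, 1 = transmit
Q = np.zeros((B, C, nA))
alpha, gamma, eps = 0.1, 0.9, 0.1

def step(b, c, a):
    """Toy dynamics: a transmit drains one packet when the channel is good;
    a new packet arrives w.p. 0.5; the channel flips w.p. 0.2."""
    served = 1 if (a == 1 and c == 1) else 0
    arrival = 1 if rng.random() < 0.5 else 0
    b_next = min(B - 1, max(0, b - served + arrival))
    c_next = c if rng.random() < 0.8 else 1 - c
    cost = b + 0.5 * a          # buffer holding cost plus transmit power
    return b_next, c_next, cost

b, c = 0, 1
for _ in range(20000):
    # epsilon-greedy action selection (costs, so greedy = argmin)
    a = int(rng.integers(nA)) if rng.random() < eps else int(np.argmin(Q[b, c]))
    b2, c2, cost = step(b, c, a)
    Q[b, c, a] += alpha * (cost + gamma * Q[b2, c2].min() - Q[b, c, a])
    b, c = b2, c2

policy = Q.argmin(axis=2)       # greedy transmit/idle decision per state
```

The point of the comparison above is that this table grows with the state space; the GSP-based approach instead samples the policy graph signal at a subset of vertices and reconstructs the rest.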
It is natural to ask how well a neural network would have performed on our optimal control policy problem. Given the nature of our problem framework, we implemented a deep Q-network (DQN) [14] for our network graph model with 900 states (30 buffer states and 30 channel states). The neural network was a classical feed-forward, fully connected network with two input nodes (buffer and channel state); four hidden layers of 100 neurons each; and a final output layer with two nodes, whose values represent the Q-values of the two actions for the given input state. The performance of the DQN is also provided in Fig. 2. Our proposed strategy has a much smaller average policy error (2.8% versus 10%) and a much smaller error variance (7 x 10^-5 versus 1.5 x 10^-3). The DQN's inferior performance comes on top of unclear design strategies for the neural network architecture (how many layers, how many nodes), significant training time for each architecture, and extensive (and potentially not fully optimized) hyperparameter tuning.
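The DQN architecture described above (two inputs, four hidden layers of 100 neurons, two Q-value outputs) can be sketched as a plain forward pass. The ReLU activations and random initialization below are our assumptions for illustration; the article does not specify them, and in a real DQN the weights are trained with a temporal-difference loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Layer widths from the text: 2 inputs (buffer, channel state),
# four hidden layers of 100 neurons, 2 outputs (Q-values per action).
sizes = [2, 100, 100, 100, 100, 2]
weights = [rng.normal(0.0, np.sqrt(2.0 / m), size=(m, n))
           for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]

def q_values(state):
    """Forward pass producing one Q-value per action."""
    h = np.asarray(state, dtype=float)
    for W, b in zip(weights[:-1], biases[:-1]):
        h = np.maximum(W.T @ h + b, 0.0)       # ReLU hidden layers (assumed)
    return weights[-1].T @ h + biases[-1]      # linear output layer

q = q_values([12, 7])   # e.g., buffer level 12, channel state 7
assert q.shape == (2,)
```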
2.2 Neural Networks for Active Hypothesis Testing
Many application scenarios necessitate selecting among different choices (hypotheses) in a time-varying manner. If one can collect observations while making these choices among data sources (or experiments), one can adaptively select the more informative experiments to infer the true hypothesis. This leads to a joint control and inference problem commonly referred to as active hypothesis testing. Active hypothesis testing arises in a variety of problems within wireless communication networks, such as active beamforming and optimal location selection for operating a UAV as a mobile relay. We refer to the transformation that maps data acquired in the past to a sensing action as the sensing strategy. Such strategies are often designed to optimize performance subject to other constraints, such as minimizing the average time to decision, which often correlates with the energy expended to collect measurements. Herein, we distinguish between two scenarios: (i) the known observation model case, where the distribution of the observation under each sensing action/modality is known; and (ii) the unknown observation model case, where such distributions must be learned.
When the observation model is fully known and the only unknown is the underlying hypothesis (such as a user's location), the problem of designing sequential sensing strategies can be formulated as an MDP whose state is the posterior belief on the set of all possible hypotheses. We have proposed a useful metric, the expected confidence rate [15,16], which effectively quantifies the informativeness of a sensor and yields a high-performance sequential sensing strategy; the objective of our MDP formulation is to maximize the expected confidence rate. This MDP is difficult to solve because its state (posterior belief) space is infinite. Therefore, in [15] we used an approximate, neural-network-based dynamic programming approach, inspired by recent developments in stabilizing the Q-learning algorithm for reinforcement learning [14]. Additional structural properties, which stem from characterizing the rate at which the belief in the correct hypothesis converges to unity, suggested a normalization for the neural network implementation that further dramatically improved the stability of the DQN.
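A minimal sketch of the belief-MDP ingredients follows, assuming confidence in a hypothesis is measured by its log belief odds (one common choice in this literature; the precise confidence-rate metric is defined in [15,16]). The likelihood values are illustrative.

```python
import numpy as np

def belief_update(belief, likelihoods):
    """Bayes update of the posterior over hypotheses, given the
    likelihood of the received observation under each hypothesis."""
    post = belief * likelihoods
    return post / post.sum()

def confidence(belief, h):
    """Log belief odds of hypothesis h (assumed confidence measure)."""
    return np.log(belief[h] / (1.0 - belief[h]))

# Two hypotheses, uniform prior; the observation is three times more
# likely under hypothesis 0 (illustrative numbers).
rho = np.array([0.5, 0.5])
rho = belief_update(rho, np.array([0.6, 0.2]))
assert np.isclose(rho[0], 0.75)
assert confidence(rho, 0) > confidence(rho, 1)
```

The belief vector `rho` is the (continuous-valued) MDP state, which is why the state space is infinite and function approximation is needed.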
We next address the unknown observation model. Such active hypothesis testing problems can be modeled as POMDPs with unknown observation kernels. One might expect that sequential problems of this kind are well matched to recurrent neural networks (RNNs), since RNNs are often capable of learning implicit notions of a state and its update rule. However, direct application of RNNs is not straightforward in our scenario. We implemented an RNN to learn the optimal sensing strategy in the unknown observation case; however, the architecture was not end-to-end differentiable, and this led to stability issues. Figure 6 depicts the RNN structure studied in [15]. The observation y2 is generated by performing the sensing action u2; this generation involves a non-differentiable black-box operation, indicated by the dotted red line, which is the reason our model is not end-to-end differentiable.
In this article, we presented two case studies of the use of neural networks in the design of dynamic wireless systems: optimal policy design for transmission protocols and experiment selection for data-driven active decision making systems. In both cases, neural network architectures alone provided inferior performance. However, when coupled with important structural information, such as the form of the optimal policy or the use of new metrics informed by the computationally expensive optimal solution, performance gains were achievable. Neural network solutions can play an important role in benchmarking other strategies; however, they remain sensitive to hyperparameter tuning, hampered by limited design rules for architectures, and often require significant training times. To this end, the use of structural side information such as monotonicity, unimodality, thresholded structures, and known subspaces has the potential to reduce the impact of such negative features.
[1] N. Farsad and A. Goldsmith, “Neural network detection of data sequences in communication systems,” IEEE Trans. Signal Process., vol. 66, no. 21, pp. 5663–5678, 2018.
[2] C. Morin et al., “Transmitter classification with supervised deep learning,” arXiv preprint arXiv:1905.07923, 2019.
[3] T. O’Shea and J. Hoydis, “An introduction to deep learning for the physical layer,” IEEE Trans. Cognitive Commun. and Networking, vol. 3, no. 4, pp. 563–575, 2017.
[4] H. Kim et al., “Deepcode: Feedback codes via deep learning,” in Advances in Neural Information Processing Systems, pp. 9436–9446, 2018.
[5] M. Levorato, U. Mitra, and M. Zorzi, “Cognitive interference management in retransmission based wireless networks,” IEEE Trans. Inf. Theory, vol. 58, no. 5, pp. 3023–3046, 2012.
[6] A. Ortega et al., “Graph signal processing: overview, challenges, and applications,” Proc. IEEE, vol. 106, no. 5, pp. 808–828, 2018.
[7] A. Sandryhaila and J. M. F. Moura, “Discrete signal processing on graphs,” IEEE Trans. Signal Process., vol. 61, no. 7, pp. 1644–1656, 2013.
[8] L. Liu, A. Chattopadhyay, and U. Mitra, “On solving MDPs with large state space: exploitation of policy structures and spectral properties,” IEEE Trans. Commun., 2019.
[9] L. Liu and U. Mitra, “Policy sampling and interpolation for wireless networks: a graph signal processing approach,” in IEEE Global Communications Conference (GLOBECOM), 2019, accepted.
[10] Y. Yu, T. Wang, and S. C. Liew, “Deep reinforcement learning multiple access for heterogeneous wireless networks,” in IEEE International Conference on Communications (ICC), 2018.
[11] E. Ghadimi et al., “A reinforcement learning approach to power control and rate adaptation in cellular networks,” in IEEE International Conference on Communications (ICC), 2017.
[12] H. Ye and G. Y. Li, “Deep reinforcement learning for resource allocation in V2V communications,” in IEEE International Conference on Communications (ICC), 2018.
[13] R. Sutton and A. Barto, Reinforcement Learning: An Introduction. The MIT Press, 2017.
[14] V. Mnih et al., “Human-level control through deep reinforcement learning,” Nature, vol. 518, no. 7540, pp. 529–533, 2015.
[15] D. Kartik et al., “Policy design for active sequential hypothesis testing using deep learning,” in 56th Annual Allerton Conference on Communication, Control, and Computing (Allerton), 2018.
[16] D. Kartik, A. Nayyar, and U. Mitra, “Sequential experiment design for hypothesis verification,” in 52nd Asilomar Conference on Signals, Systems, and Computers, 2018.
[17] P. Huang et al., “Improving channel charting with representation-constrained autoencoders,” in IEEE 20th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), 2019.
[18] H. Ye and G. Y. Li, “Deep reinforcement learning for resource allocation in V2V communications,” in 2018 IEEE International Conference on Communications (ICC), 2018.
[19] D. A. Awan, R. L. G. Cavalcante, and S. Stanczak, “A robust machine learning method for cell-load approximation in wireless networks,” in IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), 2018.
[20] U. Mitra and H. V. Poor, “Neural network techniques for adaptive multiuser demodulation,” IEEE J. Sel. Areas Commun., vol. 12, no. 9, pp. 1460–1470, 1994.
[21] U. Mitra and H. V. Poor, “Adaptive receiver algorithms for near-far resistant CDMA,” IEEE Trans. Commun., vol. 43, no. 2/3/4, pp. 1713–1724, 1995.