Spectrum Management Using Deep Reinforcement Learning in Distributed Dynamic Spectrum Access

Introduction

Q-learning, a type of reinforcement learning, is utilized because its model-free nature could enable DSA users to carry out appropriate spectrum management, including spectrum access and power allocation, just by interacting with the environment, without depending on any training data [15]. Nonetheless, Q-learning cannot handle a large exploration space: when the number of states and actions becomes large, it is hard for Q-learning to converge [16]. For fast convergence, neural networks (NNs) are utilized to perform the Q-learning process, including approximating the expected cumulative reward and exploring optimal state-action pairs. This approach is the so-called deep Q-network (DQN), a type of deep reinforcement learning [17]. To the best of my knowledge, there is still no work on using Q-learning and DQN in DSA networks to enable spectrum management that includes both spectrum access and power allocation. The key contributions of this thesis are summarized as follows:
1) A spectrum management framework based on the deep Q-network is proposed, enabling DSA users to perform proper spectrum management individually and intelligently without relying on accurate channel estimation or centralized control. In the proposed framework, the current spectrum management strategies, including spectrum access and power allocation, are defined as states, while the adjustments to spectrum management are defined as actions, which are taken based on the reward obtained by interacting directly with the environment.
2) I provide a comprehensive investigation of the proper way to constitute the deep Q-network. The types of neural networks that are potentially suitable for distributed DSA networks are discussed. Through simulations and comparison, the optimal selection of neural networks is found, which can bring excellent performance in terms of achievable data rate, PU protection, and convergence behavior.
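As a point of reference for the Q-learning process mentioned above, the following is a minimal sketch of the standard tabular Q-learning update with epsilon-greedy exploration. The state and action labels, the reward values, and the hyperparameters `alpha`, `gamma`, and `epsilon` are illustrative assumptions, not the settings used in this thesis.

```python
import random
from collections import defaultdict


def make_q_table():
    """Q-table mapping (state, action) pairs to estimated cumulative reward."""
    return defaultdict(float)


def choose_action(q, state, actions, epsilon, rng=random):
    """Epsilon-greedy choice: explore with probability epsilon, else exploit."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def q_update(q, state, action, reward, next_state, actions, alpha=0.1, gamma=0.9):
    """Standard update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]
```

The table holds one entry per state-action pair, so it grows quickly as the numbers of states and actions increase. This is the scaling problem that motivates replacing the table with a neural network in a DQN.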

System Model

As shown in Fig. 2.1, a DSA network consisting of multiple DSA users and primary users (PUs) is considered, which is constructed in a distributed fashion without powerful infrastructure or centralized control support. Without loss of generality, assume that each DSA user comprises a transmitter (TX) and a receiver (RX), namely a DSA user pair. Each DSA user shares wireless channels with other DSA users and PUs, and opportunistically accesses wireless channels. For simplicity, a reasonable assumption is made that each PU uses only one wireless channel and that PUs occupy different channels to avoid interfering with each other. The main notations are as follows. $\mathcal{N} = \{n \mid n = 1, 2, \cdots, N\}^T$ stands for the set of DSA users. Under the assumption that each PU occupies only one unique channel, let $\mathcal{M} = \{m \mid m = 1, 2, \cdots, M\}^T$ be the set of both PUs and wireless channels; that is, the $m$-th channel is the dedicated channel of the $m$-th PU. $\Omega_n = \{m \mid m = 1, 2, \cdots, M_n\}^T$ and $\Phi_m = \{n \mid n = 1, 2, \cdots, N_m\}^T$ represent the set of channels allocated to DSA user $n$ and the set of users accessing channel $m$, respectively.
Due to the lack of centralized control in DSA networks, DSA users may suffer interference from both other DSA users and PUs. As shown in Fig. 2.2, the received signal of DSA user $n$ on channel $m$ is given by

$$y_n^m = x_n^m h_{nn}^m + x_m^m h_{mn}^m + \sum_{j \in \Phi_m,\, j \neq n} x_j^m h_{jn}^m + z_n^m, \tag{2.1}$$

where $x_n^m$ denotes the desired signal sent by DSA user $n$ on channel $m$, and $x_m^m$ and $x_j^m$ stand for the interference signals caused by PU $m$ and DSA user $j$, respectively. Accordingly, $h_{nn}^m$, $h_{mn}^m$, and $h_{jn}^m$ represent the channel gains of the links from the transmitter to the receiver of DSA user $n$, from PU $m$ to DSA user $n$, and from DSA user $j$ to DSA user $n$, respectively. $z_n^m$ is the received additive white Gaussian noise (AWGN).
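The signal model in (2.1) can be checked numerically. The sketch below assumes complex baseband samples and uses hypothetical function and argument names. It evaluates the received signal as the sum of the desired term, the PU interference, the interference from the other DSA users on the same channel, and the noise.

```python
def received_signal(x_n, h_nn, x_pu, h_pu_n, interferers, z_n=0j):
    """Evaluate (2.1) for DSA user n on one channel m.

    x_n, h_nn    : desired signal and gain of user n's own TX-RX link
    x_pu, h_pu_n : PU m's signal and the gain from PU m to user n
    interferers  : iterable of (x_j, h_jn) for other DSA users j on channel m
    z_n          : AWGN sample at the receiver
    """
    desired = x_n * h_nn
    pu_interference = x_pu * h_pu_n
    dsa_interference = sum((x_j * h_jn for x_j, h_jn in interferers), 0j)
    return desired + pu_interference + dsa_interference + z_n
```

For example, `received_signal(1, 0.5, 2, 0.1, [(1, 0.2)])` adds a desired term of 0.5, PU interference of 0.2, and DSA interference of 0.2 with no noise.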

System Design

Generally, no powerful infrastructure, such as base stations (BSs) or control centers, is deployed in DSA networks to provide centralized control, so DSA users have to carry out spectrum management individually. In such a network, a DSA user can obtain only very limited channel state information through channel detection, namely that of the link between its own transmitter and receiver, while the channel state information of other DSA users and PUs is unavailable. As a result, it is difficult for DSA users to perform spectrum management through resource allocation algorithms, which require accurate and sufficient channel state information. Thus, to protect PUs from harmful interference, PUs should at least provide basic feedback on received interference to DSA users, enabling DSA users to adjust their transmission parameters properly. However, DSA users and PUs may be operated by different mobile systems, and only limited information exchange can be achieved. Therefore, two possible interference information feedback methods for PUs are analyzed, and the corresponding system procedures are designed for viability.

System procedure of interference information feedback

The precondition for effective information exchange between DSA users and PUs is synchronization in the time and frequency domains. In other words, DSA users need to know the time-frequency resource blocks that carry interference feedback information. Therefore, a system procedure of interference information feedback is designed to make the interference information feedback process less complicated and easy to realize.
Fig. 2.3 describes the process of interference information feedback. To make DSA users aware of the configurations regarding DSA, PUs should add the corresponding information to their system information (SI) and broadcast it periodically. SI is a proper carrier for DSA configurations, since it carries the common control information that is fundamental and indispensable for all users to conduct wireless transmissions, and it is generally delivered on fixed wireless channels [18]. Thus, in my design, when a DSA user attempts to access a frequency band, it needs to receive the SI from the corresponding PUs to read the DSA configurations. According to the received DSA configurations, the DSA user carries out data transmissions. Then, PUs measure the received interference caused by DSA users and feed the corresponding interference information back to DSA users through the dedicated time-frequency resources indicated in the DSA configurations. Based on the interference information feedback, DSA users adjust their DSA parameters to improve their own performance and guarantee PU protection.
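The procedure above can be sketched as a simple loop for a single DSA user and a single PU. Everything specific in this sketch is an assumption made for illustration: the fields of `DsaConfig`, the `interference_limit` threshold, and in particular the fixed-step power adjustment, which is only a placeholder for however a DSA user actually adjusts its parameters.

```python
from dataclasses import dataclass


@dataclass
class DsaConfig:
    """DSA configuration carried in a PU's system information (fields hypothetical)."""
    channel: int
    feedback_resource: tuple   # (time slot, frequency block) used for feedback
    interference_limit: float  # assumed tolerable interference at the PU


def pu_measure_interference(tx_power, gain_to_pu):
    """PU side: received interference power from one DSA transmitter."""
    return tx_power * gain_to_pu


def dsa_adjust_power(tx_power, feedback, config, step=0.1, max_power=1.0):
    """DSA side: placeholder rule, back off if the PU reports too much interference."""
    if feedback > config.interference_limit:
        return max(0.0, tx_power - step)
    return min(max_power, tx_power + step)


def run_feedback_loop(config, gain_to_pu, tx_power=1.0, rounds=20):
    """Repeat: transmit, PU measures and feeds back, DSA user adjusts."""
    history = []
    for _ in range(rounds):
        feedback = pu_measure_interference(tx_power, gain_to_pu)
        history.append((tx_power, feedback))
        tx_power = dsa_adjust_power(tx_power, feedback, config)
    return history
```

With such a threshold rule the transmit power settles into a small oscillation around the level at which the reported interference meets the assumed limit.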

Interference information feedback method

DSA users can make more appropriate decisions on DSA parameter adjustments if they obtain more precise interference information feedback. However, the accuracy of the feedback is determined by the way PUs measure the interference from DSA users. Here, based on the designed system procedure of interference information feedback, I discuss two possible methods by which PUs can perform interference measurements, and the corresponding interference information that DSA users can obtain.
1) Method 1: In general, a PU is only able to measure the total received interference, which can be done by sensing blank time-frequency slots embedded in its occupied channel. The PU then broadcasts the measurement results to DSA users. The method is presented in Fig. 2.4. With this method, the overhead of interference information feedback is relatively small, but the interference information that DSA users can get is very limited: only the total interference level that the PUs are suffering.
2) Method 2: As shown in Fig. 2.5, PUs could identify and detect the interference caused by each individual DSA user, and feed the specific interference level back to each DSA user. Unfortunately, a PU can only receive the mixed interference signals of all DSA users sharing the same channel. To distinguish the interference signals of different DSA users, each DSA user needs to be configured with a user-specific pilot, by detecting which PUs can acquire the specific interference caused by different DSA users [19]. However, in DSA networks, there is no powerful infrastructure, such as BSs, to conduct centralized measurement configuration for DSA users and PUs. Therefore, a low-complexity and efficient user-specific pilot assignment method is proposed, which is described as follows. To avoid pilot contamination, the user-specific pilots of different DSA users should be transmitted on different time-frequency resource blocks. A PU includes in its SI the information on unused user-specific pilots and on the corresponding time-frequency resource blocks used to send them, and broadcasts it to all DSA users. If a DSA user attempts to access the channel occupied by this PU, it needs to receive and read the PU's SI first. Then, the DSA user randomly selects a user-specific pilot and sends the chosen pilot on the corresponding time-frequency resource blocks. The PU needs to keep monitoring the time-frequency resource blocks used for user-specific pilot transmissions. If the PU notices that a user-specific pilot is transmitted on the corresponding time-frequency resource blocks, the PU should remove that pilot from its SI and measure it to obtain interference information. In this way, interference measurements for each particular DSA user can be achieved without relying on centralized measurement configuration supported by powerful infrastructure.
Although this method is able to provide more precise interference information feedback to DSA users, it also incurs considerable overhead. Compared with Method 1, more time and frequency resources are consumed to perform user-specific pilot assignments, transmissions, and measurements.
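The user-specific pilot assignment of Method 2 can be sketched as follows. The class and function names, the pilot identifiers, and the representation of a time-frequency resource block as a tuple are hypothetical and serve only to illustrate the steps: the PU advertises unused pilots in its SI, a DSA user picks one at random, and the PU retires a pilot once it detects a transmission on the corresponding block.

```python
import random


class PrimaryUser:
    """PU side of the pilot assignment: advertise unused pilots, retire used ones."""

    def __init__(self, pilot_resources):
        # Map each pilot ID to its own time-frequency resource block.
        self.unused = dict(pilot_resources)
        self.assigned = {}

    def system_information(self):
        """SI content broadcast to DSA users: unused pilots and their blocks."""
        return dict(self.unused)

    def observe(self, resource_block):
        """Called when a pilot is detected on a monitored resource block."""
        for pilot, block in list(self.unused.items()):
            if block == resource_block:
                # Remove the pilot from future SI broadcasts.
                self.assigned[pilot] = self.unused.pop(pilot)
                return pilot
        return None


def dsa_select_pilot(si, rng=random):
    """DSA side: read the SI and pick one unused pilot uniformly at random."""
    if not si:
        return None  # no free pilot advertised
    pilot = rng.choice(sorted(si))
    return pilot, si[pilot]
```

This sketch does not handle two DSA users choosing the same pilot from the same SI broadcast; the text above does not describe how such collisions are resolved.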

Contents
List of Figures
List of Tables
1. Introduction
2. Spectrum Management Using Deep Reinforcement Learning in Distributed Dynamic Spectrum Access
2.1 Introduction
2.2 System Model
2.3 System Design
2.4 Reinforcement learning
2.5 Reinforcement Learning Based Spectrum Management
2.6 Deep Q-network Based Spectrum Management
2.7 Simulation Results and Analysis
Summary
Bibliography