I. Introduction
User-centric communication has drawn significant attention recently. To ensure better quality of service (QoS) and satisfy data-hungry users, more networking components are being shifted toward the network edge. Vehicular networking, meanwhile, has been evolving from a rudimentary phase into an intelligent transportation system (ITS) that guarantees public safety, lessens congestion, reduces travel time, and improves the QoS of vehicle users (VUs). An advanced ITS can save countless lives by assuring ubiquitous connectivity and timely, well-measured road hazard alerts, thus increasing the quality of experience of the VUs. Motivated by this, several governing bodies, such as the United States Department of Transportation [USDOT, USDOT2], are heavily investigating vehicle-to-everything (V2X) communication.
Novel technologies, such as dedicated short-range communications (DSRC) and cellular-V2X (C-V2X), are expected to be coupled together [ghafoor2019enabling] to provide an intelligent solution in this sector. Note that while DSRC is an earlier technology based on IEEE 802.11, C-V2X was developed by the 3rd Generation Partnership Project (3GPP) and first introduced in its Release 14 for basic safety messages in vehicular communication [zhou2020evolutionary]. Later 3GPP releases focus on a more evolved system design with advanced safety measures in addition to higher throughput, higher reliability, and much lower latency. VUs move rapidly on the highway, causing frequent handovers in vehicle-to-infrastructure (V2I) communication. Within a very short period, the received signal strength at a downlink VU can deteriorate severely under a traditional network-centric communication infrastructure. Therefore, V2I communication is notably problematic for connected transportation. A potential solution to these problems should ensure universal connectivity, reliability, high throughput, and low latency. In addition, energy efficiency (EE) should also be considered, as green communication has shown its urgency lately [jain2019energy, huang2020energy].
In the literature, several works address diverse aspects of vehicular networks from traditional network-centric perspectives. A downlink multicasting scenario for close-proximity vehicles was considered in [sahin2018virtual], where a group of vehicles acting as hot spots simultaneously transmitted multiple data streams. A vehicle-to-vehicle (V2V) radio resource management (RRM) method was proposed by Ye et al. in [ye2019deep]. The authors used deep reinforcement learning (DRL) to study how uplink frequency resources can be utilized effectively for V2V communication. A similar approach was considered by Liang et al. in [8792382] for V2V applications using multi-agent RL (MARL). Ding et al. considered RRM for vehicular networks in [8756652]. Although the authors considered virtual cell member association and RRM in their problem formulation, they did not consider user mobility. Note that we consider highly mobile VUs for connected transportation, where each parameter needs to be chosen optimally in every transmission time interval (TTI). Therefore, our work is fundamentally different from the single snapshot-based static user-centric approach of [8756652].
Gao et al. proposed a joint admission control and resource management scheme for both static and vehicular users in [gao2019joint]. Using the Lyapunov optimization technique, the authors showed how to increase network throughput over the traditional network-centric approach. Guleng et al. considered a learning-based solution for V2V communication in [guleng2020edge]. They used a two-staged learning process to minimize overall latency and maximize network throughput. In our previous work [Pervej_WSR], we considered a throughput-optimal vehicular edge network for highway transportation, in which we achieved a maximum weighted sum rate using RL. While the studies in [ye2019deep, 8792382, 8756652, gao2019joint, guleng2020edge, Pervej_WSR] aimed at optimizing network throughput, they did not address energy consumption in vehicular edge networks for smart and connected transportation.
Different from the existing studies, this paper focuses on an energy-efficient solution for user-centric, reliable vehicular edge networks in connected transportation. Particularly, we formulate a joint virtual cell formation and power allocation problem for highly mobile VUs in a sophisticated software-defined (SD) environment. In a freeway road environment, we deploy various edge servers to meet the users' demands by serving a VU from multiple low-powered access points (APs), as presented in Fig. 1. Although such a distribution reduces end-to-end latency and increases the reliability of the network, it also increases system complexity. Furthermore, as multiple APs serve each VU, it is essential to optimally form the virtual cells for the users and allocate the transmission powers of the APs. While our joint formulation addresses these aspects, it is a hard combinatorial optimization problem. Therefore, we use a model-free decentralized MARL (D-MARL) solution that can effectively form the virtual cells and slice the resources.
To the best of our knowledge, this is the first work to consider a reliable, energy-efficient, user-centric software-defined vehicular edge network for connected transportation. The rest of the paper is organized as follows: our software-defined network model and problem formulation are presented in Section II. An efficient RL solution for resource slicing is presented in Section III. Section IV presents the results and findings. Finally, Section V concludes the paper.
II. Software-Defined Vehicular Edge Networks
In this section, we present our software-defined vehicular edge network model, followed by the problem formulation.
II-A. Software-Defined System Model
Following the freeway case of 3GPP [3gpp36_885], a three-lane one-way road structure is considered as the region of interest (ROI). In this paper, we are interested in establishing a communication framework for vehicular edge networks; however, our modeling can readily be extended to more practical environments. Highly mobile autonomous VUs move on the road. Besides, several low-powered APs are deployed along the roadside to maintain ubiquitous connectivity. In addition, several edge servers, each controlled by its anchor node (AN), are deployed at fixed, known geographic positions. The APs are physically mesh-connected to each of these edge servers. Furthermore, the edge servers are connected to a centralized cloud server and have limited radio resources (in hertz). We consider an open-loop communication system where the ANs have perfect channel state information (CSI). Moreover, our software-defined system model is based on [lin2018e2e], where the beamforming weights can be formed and scheduled by the ANs based on the users' requirements.
To guarantee network reliability, we create a virtual cell for each scheduled user. In each virtual cell, multiple APs are activated to serve the VU, as shown by the dotted ellipses in Fig. 1. The VU-AP associations are captured by the following two indicator functions:
(1) 
(2) 
Therefore, the set of APs that a VU is connected to constitutes the virtual cell of that VU.
II-B. SD V2I Communication Model
In this paper, we consider a multiple-input-single-output communication model where the VUs are equipped with a single antenna and the APs are equipped with multiple antennas. (Note that while we consider omnidirectional antennas in this paper, the proposed framework can easily be extended with directional antennas and beam patterns to further improve the SINR at the vehicle receivers.) The wireless channel is considered to be quasi-static flat fading during a basic time block. The channel between a VU and an AP comprises the large-scale fading, log-normal shadowing, and fast-fading components. Furthermore, each AP designs a beamforming vector for every VU it serves and transmits a unit-power data symbol weighted by that beamformer. As such, at a given time, the downlink received signal at a VU is calculated as follows:

(3a)

(3b)

where the received noise is circularly symmetric complex Gaussian distributed with zero mean.

II-C. User-Centric Dynamic Cell Formation
We consider that the vehicular edge network operates in time division duplex mode. Thus, the achievable rate for a VU at a given time is calculated as follows:
(4) 
where a pre-log factor captures the spectral efficiency loss due to signaling at the APs, and the rate depends on the received SINR. Moreover, as multiple APs are scheduled to transmit to a VU, the backhaul link consumption of the VU is calculated as follows [6831362]:
(5) 
where the operator counts the total number of nonzero elements in a vector, commonly known as the ℓ0 norm. If a user is scheduled in a transmission time slot, the precoding vectors from all of the APs in its virtual cell are nonzero, leading to a nonzero achievable data rate.
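The ℓ0-norm link count above can be sketched in a few lines of code. This is an illustrative reading of Equation (5), not the paper's implementation; the function name and tolerance are ours:

```python
import numpy as np

def backhaul_links(precoders, tol=1e-12):
    # Count active VU-AP links as the number of nonzero per-AP
    # precoding vectors for one VU (the l0 "norm" in Eq. (5)).
    # `precoders` is a list of per-AP beamforming vectors.
    return sum(1 for w in precoders if np.linalg.norm(w) > tol)
```

A VU served by two of three candidate APs thus consumes two backhaul links, regardless of how much power each active precoder carries.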
Note that we presume to serve all active users in a transmission time slot by forming a virtual cell for each user and dynamically selecting the transmission powers of the APs. As such, our objective is to find the optimal user-centric cell formation and beamforming weights for the APs. The first question we try to answer is: what is the maximum throughput in our SD-controlled, highly mobile vehicular network? A naive approach would be to serve a user from as many APs as possible with the maximum transmission powers of the APs. However, this degrades both user fairness and EE. Therefore, it is essential to balance the user data rate with EE. To avoid cross-domain nomenclature, let us define what we refer to as EE: the ratio of the total user sum rate to the total power consumption of the network. At a given time slot, we calculate EE as follows:
(6) 
where the backhaul consumption term is calculated in Equation (5).
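The EE definition above reduces to a simple ratio. The following sketch is a simplified reading of Equation (6); the paper's exact power model may also charge backhaul or circuit power, and the function name is ours:

```python
def energy_efficiency(user_rates, tx_powers):
    # EE = total user sum rate / total transmit power consumption,
    # e.g. (bits/s/Hz) per Watt. Simplified reading of Eq. (6).
    return sum(user_rates) / sum(tx_powers)
```

This makes the fairness tension concrete: blasting maximum power from every AP raises the numerator slightly but inflates the denominator, lowering EE.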
Therefore, in this paper, we address the following question: what user-centric associations and power allocations guarantee the reliability, programmability, and EE of the entire network? To this end, we formulate a joint optimization problem as follows:
Find:  
Maximize  (7a)  
Subject to  (7b)  
(7c)  
(7d)  
(7e) 
where the minimum SINR requirement for reliable communication is reflected in the reliability constraint of Equation (7c). The maximum allowable transmit power of each AP is controlled via Equation (7d). Equation (7b) ensures that each virtual cell contains more than one AP. Moreover, Equation (7e) indicates the feasible solution space.
Note that the ℓ0 norm precludes gradient-based solutions. Besides, the formulated problem is a hard combinatorial problem, which is extremely difficult to solve within a short period. Moreover, for each AP, at each time slot, there is an exponential number of possible VU-AP association combinations alone. For each of these associations, the AP furthermore needs to choose the optimal power level for each scheduled user. In this paper, instead of a continuous power level, we divide each AP's transmission power into multiple discrete levels. As our SD-controlled ANs know the perfect CSI, we model the beamforming vector as follows:
(8) 
where the beamformer of an AP for a VU is formed from the wireless channel between them and the transmission power the AP allocates to that VU. If a centralized decision has to be made, the centralized agent needs to decide all of the AP-VU associations and their power-level selections jointly. In that case, the size of the action space grows exponentially with the numbers of APs, VUs, and discrete power levels. Thus, traditional optimization methods may take an enormous amount of time to solve such an intricate problem. As such, in the next section, we use a model-free learning approach to solve the optimization problem efficiently.
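To see why the centralized action space explodes, consider a hypothetical counting in which each AP picks, per VU, either "off" or one of several discrete transmit powers. The enumeration below is our illustration of the combinatorics, not the paper's exact count, which prunes infeasible actions:

```python
def centralized_action_space(num_aps, num_vus, power_levels):
    # Each AP chooses, per VU, one of (power_levels + 1) options
    # ("off" plus the discrete levels), so a central agent faces
    # ((power_levels + 1) ** num_vus) ** num_aps joint actions
    # before any feasibility pruning.
    per_ap = (power_levels + 1) ** num_vus
    return per_ap ** num_aps
```

Even a toy network of 5 APs, 5 VUs, and 3 power levels already yields on the order of 10^15 joint actions, motivating the decentralized treatment that follows.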
III. Energy-Efficient Resource Slicing at Edges: A Reinforcement Learning Approach
As we assume the CSI is known, our state space contains all of the CSIs, the locations of the VUs, and the locations of the APs. The action space, on the other hand, contains the VU-AP associations and the beamforming vectors for the chosen associations. Choosing an action is thus a two-step process: first, the RL agent chooses a possible association; then it designs the beamforming vectors. Moreover, we consider the EE of Equation (6) as the reward function of the RL agent. However, to ensure fairness among the users' achievable rates, at each time slot we employ the following restriction:
(9) 
III-A. Single-Agent Reinforcement Learning (SARL)
Taking states and actions into account, a Q-learning-based RL framework can effectively solve hard optimization problems. Note that Q-learning is a model-free learning [watkins1992q, lin2016qos] process where, in each state, the agent takes an action, receives a reward for the chosen action, and the environment transits to the next state. The governing equation of Q-learning is as follows:
(10) 
where the update is governed by the learning rate and the discount factor. Although SARL is a good baseline scheme, if the number of states and actions is too large, it may become impracticable to handle. For example, with the network sizes considered in our evaluation, the baseline centralized SARL has an enormous action space. (Note that this counts the total action space; the number of valid actions is smaller due to the maximum allowable transmission power constraint of the AP.) This is commonly known as the curse of dimensionality. As an alternative, a D-MARL solution is proposed in what follows.
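The tabular update of Equation (10) can be sketched as follows; the dictionary-based Q-table and the function interface are our illustration:

```python
from collections import defaultdict

def q_update(Q, s, a, r, s_next, actions, alpha=0.1, gamma=0.9):
    # One tabular Q-learning step, following Eq. (10):
    # Q(s,a) <- (1 - alpha) * Q(s,a)
    #           + alpha * (r + gamma * max_{a'} Q(s', a'))
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] = (1 - alpha) * Q[(s, a)] + alpha * (r + gamma * best_next)
    return Q[(s, a)]
```

A `defaultdict(float)` initializes unseen state-action pairs to zero, which is the usual choice for tabular Q-learning.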
III-B. Decentralized Multi-Agent RL (D-MARL)
In traditional MARL, multiple agents take independent decisions, which can lead to optimal network performance. The action space of each agent is very small compared to that of the centralized SARL. For the same example as the centralized SARL case, if we consider each AP as an independent agent in the MARL scheme, the action space of an agent shrinks dramatically. However, whether MARL will attain the optimal solution on our platform is uncertain. Therefore, we use the concept of MARL differently: instead of multiple agents taking independent actions from shrunken action spaces, we use a distributed learning process in which distributed agents take decisions from a partitioned version of the original SARL action space. In other words, the original SARL action space is subdivided into multiple groups, and each agent takes its decision from its assigned smaller group.
Each such agent maintains a Q-table over the full state space and its assigned slice of the action space, so the per-agent action space is a fraction of the centralized one. Furthermore, we assume there is a centralized vector that stores the global best action for every state. We update this global best action using the following equation:
(11) 
Therefore, our proposed D-MARL algorithm can distributively learn to take the optimal central action. On the other hand, traditional MARL [liu2019trajectory] may not achieve the optimal solution, as independent agents take autonomous actions in shrunken action spaces; the joint actions of these agents may not be centrally optimal and can lead to a suboptimal solution. Algorithm 1 summarizes the proposed D-MARL algorithm.
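The partition-and-share idea behind the global-best update of Equation (11) can be sketched as follows. The names, the `reward_fn` interface, and the exhaustive per-slice search are our illustration of one selection step, not the paper's Algorithm 1:

```python
def dmarl_select(agent_slices, state, reward_fn, global_best):
    # The centralized action set is partitioned into `agent_slices`;
    # each agent evaluates only its own slice, and a shared record
    # keeps the best (action, reward) seen per state, mimicking the
    # global-best update of Eq. (11).
    for actions in agent_slices:
        for a in actions:
            r = reward_fn(state, a)
            if state not in global_best or r > global_best[state][1]:
                global_best[state] = (a, r)
    return global_best[state][0]
```

Because the slices jointly cover the original centralized action space, the shared record converges to the centrally optimal action, unlike independent-agent MARL where the joint action is never evaluated as a whole.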
IV. Performance Evaluation
Our simulation parameters include the ROI size (in meters), the VU velocity (in km/h), the number of antennas per AP, the noise power (in dBm/Hz), and the TTI (in milliseconds). The channels, path loss, and shadowing are modeled following [3gpp36_885]. For ease of simulation, we consider a full-buffer network model where all APs serve all VUs simultaneously. We consider the following association rule:
(12) 
Note that our proposed solution can work with other scheduling algorithms as well. While the VUs are dropped uniformly in each lane, the APs are placed at equally spaced, fixed locations. For a tractable state space, we consider that, at a given time step, the VUs appear in the same set of candidate locations, while each individual VU occupies a different location.
TABLE I: Performance comparison of the considered schemes in terms of training episodes, test episodes, average EE [bits/Hz/J], and deviation from the benchmark: Brute Force (Benchmark), D-MARL (Proposed), SARL [lin2016qos], MARL [liu2019trajectory], Equal Power, and Random Power.
IV-A. Performance Comparisons
First, we show the effectiveness of our proposed solution by comparing our results with the following schemes:

Brute Force (Benchmark): This is the optimal solution. In this case, at each state, we search for the optimal action that provides the maximum reward.

SARL [lin2016qos]: This is the baseline RL scheme. We adopt the learning process of [lin2016qos] for this case.

MARL [liu2019trajectory]: We use the MARL learning process proposed by Liu et al. in [liu2019trajectory].

Equal Power Allocation: In this case, each AP divides its transmission power equally among the VUs it serves. Essentially, this is a centralized case: the central power allocation decision is chosen such that each AP transmits to its scheduled users with equal power.

Random Power Allocation: Each AP chooses a random transmission power from the discrete power levels to serve a VU. This is also a centralized case: in each state and time slot, we pick a random central decision from all possible actions in the centralized action space.
We use each AP as an independent agent for the MARL algorithm [liu2019trajectory], where each AP takes its association and power allocation decisions independently. For our proposed D-MARL algorithm, we use multiple distributed agents. The SARL and MARL models are trained for four times as many episodes as the D-MARL model. Besides, the learning rate and the exploration probability are both decayed linearly in each episode.
From the test episodes, the performance comparisons of our proposed algorithm with the other schemes are listed in Table I. Note that a fixed SINR threshold and AP coverage radius are used for this comparison. Clearly, the machine learning solutions achieve much higher performance than the two baseline schemes (equal power allocation and random power allocation). Furthermore, notice that the centralized baseline SARL solution and the proposed D-MARL solution deliver nearly identical performance to that of the brute-force optimum. Thanks to RL, the agents learn to take optimal actions from the training episodes and deliver near-optimal performance. The performance of MARL [liu2019trajectory] is also very close to the optimal solution. However, recall that the total number of training episodes for SARL and MARL is 4 times that of our proposed D-MARL algorithm. Our proposed D-MARL also achieves substantial performance gains over the equal power allocation and random power allocation schemes.

IV-B. Impact of the Reliability Constraint
The reliability constraint has a significant impact on the overall network performance. If we increase the reliability constraint, we force the RL agents to find solutions that maximize the EE without violating it. Therefore, as this constraint increases, the number of failed events also increases. We first calculate the probability of successfully satisfying the reliability constraint as follows:
(13) 
where the numerator counts, via an indicator function, the time steps in which every scheduled VU satisfies the SINR threshold, normalized by the total number of time steps. The probability of delivering the minimum required SINR is shown in Fig. 2. The RL algorithms perform better than the baseline schemes. Furthermore, as the SINR threshold increases, the successful transmission events decay. Our proposed D-MARL delivers a near-optimal success probability under these varying reliability requirements. On the other hand, the performance gap between MARL [liu2019trajectory] and D-MARL is quite evident from this result. Moreover, increasing the reliability constraint may necessitate the APs transmitting to the VUs with more power to attain the SINR threshold. However, this immediately downgrades the EE, which is also reflected in our simulation results in Fig. 3. As the performances of the two baseline schemes (equal power allocation and random power allocation) are very poor compared to the RL schemes, hereinafter we only compare the performance of our proposed algorithm with the brute force (benchmark) and the other two RL schemes.
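The empirical success probability of Equation (13) can be estimated directly from a simulated SINR trace. The function name and trace layout below are our illustration:

```python
def success_probability(sinr_trace, gamma_th):
    # Empirical version of Eq. (13): the fraction of time steps in
    # which every served VU meets the SINR threshold gamma_th.
    # `sinr_trace` is a list of per-step lists of VU SINRs (linear scale).
    hits = sum(1 for step in sinr_trace if all(s >= gamma_th for s in step))
    return hits / len(sinr_trace)
```

A step counts as a failure if even a single VU falls below the threshold, which is why the success probability decays as the reliability requirement tightens.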
IV-C. Impact of the Coverage Radius
Now, we analyze the impact of the AP's coverage radius. To do so, we keep the reliability constraint fixed and vary the coverage radius. Note that as the reliability constraint is fixed, the probability of success, as defined in Equation (13), should not fluctuate much as the coverage radius varies; this is reflected in Fig. 4. Besides, as the coverage radius of an AP increases, more VUs can be served by each AP. Although the SINR constraint is fixed, from our association rule in Equation (12) and the rate calculation in Equation (5), it is clear that increasing the coverage radius increases the total number of links for a VU, thereby improving the user sum rate. On the other hand, if a VU is far away from an AP, the AP might need to transmit to it with more power. However, the RL agents find power allocations that increase the user sum rate with appropriate power levels, ultimately increasing the EE of Equation (6). This trend is reflected in our simulation results in Fig. 5: as the coverage radius increases, the D-MARL algorithm finds associations and power allocations leading to an improved network EE.
IV-D. User Fairness
Furthermore, a reliable and efficient network should ensure fairness while serving its associated users. A fair system delivers a nearly equal data rate to all users, and our reward function is designed to serve this purpose. User fairness is also ensured by our proposed D-MARL algorithm. From the test episodes, user fairness while conserving the maximized EE is presented in Fig. 6. Our proposed D-MARL delivers a Jain's fairness index [jain1999throughput] on par with those of the optimal scheme, SARL [lin2016qos], and MARL [liu2019trajectory]. Note that this fairness index varies between 1/n (for n users) and 1: the lower bound corresponds to no fairness whatsoever, and fairness among users increases with the index.
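Jain's fairness index is a standard formula and is easy to compute from the per-user rates; the function name below is ours:

```python
def jains_index(rates):
    # Jain's fairness index: (sum x)^2 / (n * sum x^2),
    # ranging from 1/n (one user gets everything) to 1 (equal rates).
    n = len(rates)
    total = sum(rates)
    return total * total / (n * sum(x * x for x in rates))
```

For example, four users with perfectly equal rates score 1.0, while one user monopolizing the rate among four scores 0.25.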
V. Conclusion
In this paper, we have jointly optimized virtual cell formation and power allocation to assure ubiquitous connectivity and reliability in vehicular edge networks for connected transportation. Thanks to RL's powerful problem-solving ability, the hard combinatorial joint optimization problem is solved efficiently; particularly, we have used a sophisticated D-MARL solution for the eco-vehicular edge network in connected transportation. Our proposed algorithm attains near-optimal benchmark performance within a nominal number of training episodes.