I Introduction
Network Slice Placement is a critical research problem in the context of Network Slicing [4]. It can be formulated as a multi-objective Integer Linear Programming (ILP) optimization problem, which is well known in the literature to be hard [3] and to suffer from very long convergence times. Heuristic approaches have therefore been developed (an extensive list is given in [1]). From an operational perspective, heuristics are more suitable than ILP as they yield placement results faster; however, the solutions they produce are suboptimal. To overcome these limitations, Machine Learning (ML) and Reinforcement Learning (RL) techniques have been introduced, as they are more accurate than heuristics [5]. In practice, however, ensuring that a Deep Reinforcement Learning (DRL) algorithm converges to an optimal policy is a challenge, since the algorithm acts as a self-controlled black box. In addition, such an algorithm relies on a large number of hyperparameters that must be fine-tuned to balance the exploration of the solution space against the exploitation of knowledge acquired via training.
We describe the Heuristically Assisted DRL (HA-DRL) approach introduced in [2], which specifically addresses the challenge of obtaining better and more explainable behavior of DRL agents. The proposed solution is innovative: i) it is a hybrid approach combining the strength of heuristics with the flexibility of DRL, and ii) its performance results show that it accelerates and stabilizes the convergence of state-of-the-art DRL techniques when applied to Network Slice Placement. In the rest of the paper, we present the problem statement in Section II; the main aspects of the proposed HA-DRL approach are introduced in Section III.
II Network Slice Placement Optimization Problem
The compact formulation for the proposed Network Slice Placement Optimization problem is as follows:

Given: a Network Slice Placement Request (NSPR) graph and a Physical Substrate Network (PSN) graph,

Find: a mapping of each VNF of the NSPR onto a node of the PSN and of each virtual link of the NSPR onto a path of the PSN,

Subject to: the VNF CPU requirements, the VNF RAM requirements, the Virtual Link (VL) bandwidth requirements, the node CPU available capacities, the node RAM available capacities, and the physical link bandwidth available capacities.

Optimization Objectives: 1) maximize the network slice placement request acceptance ratio; 2) minimize the total resource consumption; and 3) maximize load balancing.
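To make the compact formulation concrete, the capacity constraints can be checked for a candidate VNF-to-node mapping with a few lines of Python. This is a minimal sketch with illustrative data structures (plain dictionaries keyed by hypothetical VNF and node names), not the model used in [1] and [2]; link and bandwidth constraints are omitted for brevity.

```python
# Sketch: checking the "Subject to" CPU and RAM capacity constraints for a
# candidate placement. Data structures and names are illustrative only.

def placement_is_feasible(nspr, psn, mapping):
    """Return True if `mapping` (VNF name -> PSN node name) respects the
    CPU and RAM capacities of every node; VL bandwidth checks omitted."""
    used_cpu = {n: 0 for n in psn}
    used_ram = {n: 0 for n in psn}
    for vnf, node in mapping.items():
        used_cpu[node] += nspr[vnf]["cpu"]
        used_ram[node] += nspr[vnf]["ram"]
    return all(used_cpu[n] <= psn[n]["cpu"] and used_ram[n] <= psn[n]["ram"]
               for n in psn)

# Toy example: two VNFs, two nodes with identical capacities.
nspr = {"vnf1": {"cpu": 2, "ram": 4}, "vnf2": {"cpu": 1, "ram": 2}}
psn = {"a": {"cpu": 2, "ram": 4}, "b": {"cpu": 2, "ram": 4}}

print(placement_is_feasible(nspr, psn, {"vnf1": "a", "vnf2": "b"}))  # feasible
print(placement_is_feasible(nspr, psn, {"vnf1": "a", "vnf2": "a"}))  # node a over CPU
```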
A complete model description and problem formulation can be found in [1] and [2]. This is a multi-objective optimization problem whose objective function, a weighted sum of the three optimization objectives described previously, has the form

maximize  β1 · (acceptance) − β2 · (total resource consumption) + β3 · (load balancing),   (1)

for some weight coefficients β1, β2, β3 > 0, where the load-balancing term rewards placements that leave large available CPU and RAM capacities on the nodes. We use the following decision variables:

x(v, s), defined for each VNF v of the NSPR and each node s of the PSN, is equal to 1 if VNF v is placed onto node s and 0 otherwise,

y(e, l), defined for each virtual link e of the NSPR and each physical link l of the PSN, is equal to 1 if VL e is mapped onto physical link l and 0 otherwise,

z, which is equal to 1 if all the VNFs of the NSPR are placed and 0 otherwise.
The values assumed by the above variables are controlled by the problem constraints identified in [1] and [2].
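The weighted-sum structure of the objective can be sketched in a few lines. The weights and the exact form of each term below are illustrative assumptions, not the formulation of [1] and [2]: acceptance is rewarded, total resource consumption is penalized, and residual (free) node capacity is rewarded as a proxy for load balancing.

```python
# Sketch of a multi-objective weighted sum in the spirit of Eq. (1).
# Weights and terms are illustrative stand-ins for the exact formulation.

def objective(accepted, cpu_used, cpu_free, beta=(1.0, 0.01, 0.01)):
    """Weighted sum: reward acceptance, penalize total CPU consumption,
    reward residual node capacity (load balancing)."""
    b1, b2, b3 = beta
    acceptance = 1.0 if accepted else 0.0        # objective 1: acceptance
    consumption = sum(cpu_used.values())         # objective 2: resource use
    balancing = sum(cpu_free.values())           # objective 3: free capacity
    return b1 * acceptance - b2 * consumption + b3 * balancing
```

An accepted, lightly loaded placement thus scores higher than a rejected one or one that exhausts a node, which is exactly the trade-off the weight coefficients encode.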
Optimally solving this problem using ILP-based techniques would require a proper definition of the values of the three objective function coefficients, as well as coping with the long convergence times of ILP on hard problems. We propose instead, in Section III, a faster and more scalable solution based on DRL controlled by a heuristic method.
III Heuristically Assisted DRL for Network Slice Placement
We transform the Network Slice Placement Optimization problem described in Section II into a Partially Observable Markov Decision Process (POMDP) in which: i) the placement of one NSPR is computed iteratively (i.e., VNFs are placed sequentially, one by one, starting from the first VNF, along with their associated VLs); ii) the state contains the features of the PSN (i.e., the CPU, RAM, and bandwidth capacities available on each node, together with a placement mask giving the number of VNFs of the current NSPR placed on each PSN node) and of the NSPR (i.e., the CPU, RAM, and bandwidth requirements of each VNF of the NSPR, together with a counter of the VNFs remaining to be placed at each iteration); iii) the action computed by the policy is a valid placement (i.e., a placement respecting the problem constraints); and iv) the reward is a function that measures how good the computed action is with respect to the optimization objectives described in Section II. We apply the proposed HA-DRL concept to this problem.
Fig. 1 presents the structure of the proposed HA-DRL algorithm. It is an extension of the Asynchronous Advantage Actor-Critic (A3C) algorithm introduced in [5] for the VNE problem, referred to in the present paper as DRL. DRL uses two Deep Neural Networks (DNNs), called the Actor and Critic Networks, to learn the optimal policy and the optimal state-value function, respectively. We propose to modify the Actor Network by adding a Heuristic Function layer that enhances the exploration process and accelerates the convergence of the algorithm by influencing the choice of actions by the policy. This layer benefits from external information provided by an efficient heuristic for slice placement that we proposed in [1], referred to as HEU. A detailed description of the proposed HA-DRL algorithm can be found in [2].
IV Implementation & Evaluation Results
We developed a simulator in Python to implement and evaluate the proposed DRL and HA-DRL algorithms, using the PyTorch framework to implement the DNNs. As a benchmark in the performance evaluation experiments, we consider an implementation of the HEU algorithm [1] in Julia. Experiments were run on a 2×6-core @ 2.95 GHz machine with 96 GB of RAM. We consider NSPRs with the Enhanced Mobile Broadband (eMBB) settings described in [1], with 5 to 20 VNFs, and a PSN that could reflect the infrastructure of an operator such as Orange [1], with 126 to 1008 placement nodes. The DRL and HA-DRL agents are trained for 24 hours with separate learning rates for the Actor and Critic networks. We consider four versions of the HA-DRL algorithm, each with a different value of the parameter β of the Heuristic Function (β = 0.1, 0.5, 1.0, 2.0). The value of β controls how strongly the HEU method influences the policy: the higher the value of β, the more likely the HA-DRL algorithm is to select the action computed by the HEU method; see [2].
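The role of β can be illustrated with a toy version of the Heuristic Function layer: a β-scaled bonus is added to the logit of the action suggested by HEU before the softmax, so larger β values push the policy toward that action. The shape of the bonus and the numbers below are assumptions for illustration; the actual Heuristic Function is defined in [2].

```python
import math

def policy_probs(logits, heu_action, beta):
    """Softmax over actor logits after adding a beta-scaled bonus to the
    HEU-suggested action (toy version of the Heuristic Function layer)."""
    biased = [l + (beta if i == heu_action else 0.0)
              for i, l in enumerate(logits)]
    m = max(biased)                              # subtract max for stability
    exps = [math.exp(b - m) for b in biased]
    total = sum(exps)
    return [e / total for e in exps]

# Actor's raw preferences over 3 candidate placement actions (illustrative).
logits = [0.2, 0.1, 0.0]
for beta in (0.0, 0.5, 2.0):
    p = policy_probs(logits, heu_action=2, beta=beta)
    print(beta, round(p[2], 3))
# The probability of the HEU-suggested action grows with beta.
```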
IV-A Acceptance Ratio Evaluation
Fig. 2 and Tables I and II present the Acceptance Ratios obtained with the HA-DRL, DRL, and HEU algorithms in the underloaded (50%) and critical (100%) load regimes. HA-DRL with β = 2.0 shows the best performance, with fast convergence in both scenarios. This happens because, with β = 2.0, the Heuristic Function calculated on the basis of the HEU algorithm has a strong influence on the actions chosen by the algorithm; since HEU often indicates a good action, this helps the algorithm become stable more quickly.
Fig. 2(a) and Table I reveal that, in the underloaded scenario, DRL and the HA-DRL algorithms with β < 2.0 converge within an interval of 200 to 300 training phases, and that all algorithms except HA-DRL with β = 1.0 reach an Acceptance Ratio higher than 94% in the last training phase.
Fig. 2(b) and Table II show that, in the critical load scenario, HA-DRL with β = 2.0 performs significantly better than the other algorithms at the end of the training: it accepts 67.2% of the NSPR arrivals in the last training phase, which is 8.34% more than HEU, 10.21% more than DRL, 14.7% more than HA-DRL with β = 1.0, 22.6% more than HA-DRL with β = 0.5, and 29.3% more than HA-DRL with β = 0.1. Fig. 2(b) also shows that the performance of DRL and of HA-DRL with β < 2.0 is still not stable at the end of the training process in the critical load scenario.
TABLE I
Acceptance Ratio at different Training Phases (%)

Algorithm       |   25   |  100   |  200   |  300   |  400
HA-DRL, β=0.1   |  57.90 |  80.20 |  92.70 |  93.00 |  96.20
HA-DRL, β=0.5   |  58.50 |  83.00 |  90.20 |  94.80 |  96.30
HA-DRL, β=1.0   |  58.80 |  86.20 |  86.80 |  85.50 |  85.80
HA-DRL, β=2.0   |  93.50 |  90.30 |  93.10 |  92.20 |  94.80
DRL             |  57.10 |  83.60 |  91.20 |  94.80 |  95.70
HEU             |  93.50 |  94.00 | 94.00* |  94.00 | 94.00*

* Performance of the HEU algorithm in the steady state.
TABLE II
Acceptance Ratio at different Training Phases (%)

Algorithm       |   25   |  100   |  200   |  300   |  400   |  480
HA-DRL, β=0.1   |  30.10 |  49.70 |  45.30 |  45.00 |  41.00 |  37.90
HA-DRL, β=0.5   |  30.80 |  45.90 |  50.80 |  44.50 |  38.90 |  44.60
HA-DRL, β=1.0   |  27.40 |  52.00 |  55.50 |  55.60 |  49.00 |  52.50
HA-DRL, β=2.0   |  67.60 |  67.60 |  70.80 |  69.60 |  66.10 |  67.20
DRL             |  29.30 |  49.10 |  46.60 |  50.10 |  53.40 |  56.99
HEU             |  60.70 |  58.86 | 58.86* | 58.86* | 58.86* | 58.86*

* Performance of the HEU algorithm in the steady state.
IV-B Execution Time Evaluation
Figs. 3(a) and 3(b) present the average execution time of the HEU, DRL, and HA-DRL algorithms as a function of the number of VNFs in the NSPR and of the number of servers in the PSN, respectively. The evaluation results show that the average execution time increases faster for the heuristic than for the pure DRL approach.
However, both the HEU and DRL strategies have low execution times (less than 0.6 s in the largest scenarios). The average execution time of HA-DRL is approximately the sum of the HEU and DRL execution times; since both are small, the average execution time of HA-DRL is also small.
V Conclusion
We have briefly described the Heuristically Assisted DRL (HA-DRL) approach to Network Slice Placement Optimization introduced in [2]. The main contributions of the proposed solution are: i) enhanced scalability compared with classical ILP and heuristic approaches; ii) the adoption of multiple optimization criteria in a simple way; and iii) accelerated learning that exploits external information provided by an efficient heuristic. Evaluation results show that the proposed HA-DRL approach yields good placement solutions in nearly real time, converges significantly faster than pure DRL approaches, and achieves a better acceptance ratio than state-of-the-art heuristics and DRL algorithms used alone. As part of our future work, we plan to explore distributed and parallel computing techniques, such as multi-agent or federated learning, to solve the considered multi-objective optimization problem and to assess its scalability with respect to the number of network slices.
Acknowledgment
This work has been performed in the framework of the 5G-PPP MonB5G project (www.monb5g.eu).
References
 [1] "Heuristic for edge-enabled network slicing optimization using the 'power of two choices'," in Proc. 2020 IEEE 16th Int. Conf. on Network and Service Management (CNSM), 2020, pp. 1–9.
 [2] "A heuristically assisted deep reinforcement learning approach for network slice placement," arXiv preprint arXiv:2105.06741, 2021.
 [3] "On the computational complexity of the virtual network embedding problem," Electron. Notes Discrete Math., vol. 52, pp. 213–220, Jun. 2016.
 [4] "A survey on the placement of virtual resources and virtual network functions," IEEE Commun. Surveys Tuts., vol. 21, no. 2, pp. 1409–1434, 2nd Quart. 2019.
 [5] "Automatic virtual network embedding: a deep reinforcement learning approach with graph convolutional networks," IEEE J. Sel. Areas Commun., vol. 38, no. 6, pp. 1040–1057, Jun. 2020.