Internet Engineering Task Force I. Ullah Internet-Draft Y-H. Han Intended status: Informational KOREATECH Expires: 23 October 2022 TY. Kim ETRI 21 April 2022 Reinforcement Learning-Based Virtual Network Embedding: Problem Statement draft-ihsan-nmrg-rl-vne-ps-02 Abstract In Network virtualization (NV) technology, Virtual Network Embedding (VNE) is an algorithm used to map a virtual network to the substrate network. VNE is the core orientation of NV which has a great impact on the performance of virtual network and resource utilization of the substrate network. An efficient embedding algorithm can maximize the acceptance ratio of virtual networks to increase the revenue for Internet service provider. Several works have been appeared on the design of VNE solutions, however, it has becomes a challenging issues for researchers. To solved the VNE problem, we believe that reinforcement learning (RL) can play a vital role to make the VNE algorithm more intelligent and efficient. Moreover, RL has been merged with deep learning techniques to develop adaptive models with effective strategies for various complex problems. In RL, agents can learn desired behaviors (e.g, optimal VNE strategies), and after learning and completing training, it can embed the virtual network to the subtract network very quickly and efficiently. RL can reduce the complexity of the VNE algorithm, however, it is too difficult to apply RL techniques directly to VNE problems and need more research study. In this document, we presenting a problem statement to motivate the researchers toward the VNE problem using deep reinforcement learning. Status of This Memo This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet- Drafts is at https://datatracker.ietf.org/drafts/current/. Ullah, et al. Expires 23 October 2022 [Page 1] Internet-Draft ML-based Virtual Network Embedding April 2022 Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." This Internet-Draft will expire on 23 October 2022. Copyright Notice Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved. This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/ license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License. Table of Contents 1. Introduction and Scope . . . . . . . . . . . . . . . . . . . 2 2. Reinforcement Learning-based VNE Solutions . . . . . . . . . 5 3. Terminology . . . . . . . . . . . . . . . . . . . . . . . . . 8 4. Problem Space . . . . . . . . . . . . . . . . . . . . . . . . 9 4.1. State Representation . . . . . . . . . . . . . . . . . . 9 4.2. Action Space . . . . . . . . . . . . . . . . . . . . . . 9 4.3. Reward Description . . . . . . . . . . . . . . . . . . . 10 4.4. Policy and RL Algorithms . . . . . . . . . . . . . . . . 11 4.5. Training Environment . . . . . . . . . . . . . . . . . . 12 4.6. Sim2Real Gap . . . . . . . . . . . . . . . . . . . . . . 13 4.7. Generalization . . . . . . . . . . . . . . . . . . . . . 14 5. IANA Considerations . . . . . . . . . . . . . . . . . . . . . 14 6. Security Considerations . . . . . . . . . . . . . . . . . . . 14 7. Informative References . . . . . . . . . . . . . . . . . . . 14 Authors' Addresses . . . . . . . . . . . . . . . . . . . . . . . 18 1. Introduction and Scope Recently, Network virtualization (NV) technology has received a lot of attention from academics and industry. It allows multiple heterogeneous virtual networks to share resources on the same substrate network (SN) [RFC7364], [ASNVT2020]. The current large- size fixed substrate network architecture is no longer efficient and not extendable due to network ossification. To overcome this limitations, traditional Internet Service Providers (ISPs) are Ullah, et al. Expires 23 October 2022 [Page 2] Internet-Draft ML-based Virtual Network Embedding April 2022 divided into two independent parts which work together. One is the Service Providers (SPs) who create and own the different number of the VNs, and the other one is the Infrastructure Providers (InPs) who own the SN devices and links as underlying resources. SPs generate and construct the customized Virtual Network Requests (VNRs), and lease the resources from InPs based on that requests. In addition, two types of mediators can enter into the industry domain for better coordination of SPs and InPs. One is the Virtual Network Providers (VNPs) who assemble and coordinate diverse virtual resources from one or more InPs, the other one is the Virtual Network Operators (VNOs) who create, manage, and operate the VN according to the demand of the SPs. VNPs and VNOs could enable efficient use of the physical network and increase the commercial revenue of both SPs and InPs. NV can increase network agility, flexibility and scalability while creating significant cost savings. Greater network workload mobility, increased availability of network resources with good performance, and automated operations, are all the benefits of NV. Virtual Network Embedding (VNE) [VNESURV2013] is one of the main technique and strategy which used to map a virtual network to the substrate network. VNE algorithm has two main parts, Node embedding: where virtual nodes of VN have to be mapped to the SN nodes, and Link embedding: where virtual links between the VNs have to be mapped to the physical paths in the substrate network. It has been proven to be NP-Hard, and both node and link embeddings have become challenging for the researchers. A virtual node and link should be efficiently embedded into a given SN, so that more VNR can be accepted with minimum cost. The distance of the virtual nodes from each other in a given SN is a big contribution to the link failures and causes the rejection of VNRs. Hence, an efficient and intelligent technique is required for VNE problem to reduce VNRs rejection [ENViNE2021]. In the perspective of the InPs, the efficient VNE performs better mostly in terms of revenue, acceptance ratio, and revenue-to-cost ratio. Figure 1 shows the the example of two virtual network request VNR1 and VNR2 to embed them in the given substrate network. VNR1 contain three virtual nodes (a, b, and c) with cpu demands (15, 30, and 10) respectively, and the link between virtual the nodes a-b,b-c, and c-a with bandwidth demands 15,20, and 35 respectively. Similarly, VNR2 contains virtual nodes and links with cpu and bandwidth demand respectively. The purpose of the VNE algorithm to map the virtual nodes and links of the VNRs to the physical nodes and links of the given substrate as shown in Figure 1. [ENViNE2021]. Ullah, et al. Expires 23 October 2022 [Page 3] Internet-Draft ML-based Virtual Network Embedding April 2022 +----+ +----+ +----+ +----+ | a | | d | | e | | f | | 15 | | 25 |__ _25___| 30 |__ _35_ __| 45 | +----+ +----+ +----+ +----+ / \ \ / 15 35 30 20 / \ \ / +----+ +----+ +----+ +----+ | b | | c | | g | | h | | 30 |__ _20_ __ _| 10 | | 15 |__ _ __10__ __ __| 35 | +----+ +----+ +----+ +----+ (VNR1) (VNR2) || Embedding || Embedding VV VV +----+ +----+ +----+ +----+ .......| a |......35......| c | | d |........25........| e | : _____| 15 | | 10 |_______| 25 | ________| 30 | : | +----+ +----+ +----+ | +----+ : | A | | : B | : | C | : : | 50 |__ ___50__ __ __| : 60 |_:_ __30 _ _| 40 | : : +__________+ +_:_________+ : +__________+ : : | : | : | : 15 | : | : | 35 : 40 20 60 : 50 : : | : | 30 | : : | _:_____|_ : | : +----:..............20........|.: | : | +----+ | b | | +----+.....30......|.........|....: | | f | | 30 |_|___| g | | +----+ __|___| 45 | +----+ | 15 |.....10......|.......| h |........20.....|......+----+ | D +____+ | E | 35 | | F | | 50 |__ __ __ 70 _____| 40 +____+ ___ __ 50_ ___| 60 | +__________+ +_________+ +__________+ Figure 1: Substrate network with embedded virtual network, VNR1 and VNR2 Recently, artificial intelligence and machine learning technologies have been widely used to solve networking problems [SUR2018], [MLCNM2018], [MVNNML2021]. There has been a surge in research efforts,specially,reinforcement learning (RL) which has been contributed much more in the many complex tasks, e.g. video games and auto-driving etc. The main goal of an RL to learn better policies for sequential decision making problems (e.g., VNE) and solve them very efficiently. Ullah, et al. Expires 23 October 2022 [Page 4] Internet-Draft ML-based Virtual Network Embedding April 2022 Problems such as node classification, pattern matching, and network feature extraction, can be simplified by graph-related theories and techniques. Graph neural network (GNN) is a new type of ML model architecture that can aggregate graph features (degrees, distance to specific nodes, node connectivity, etc.) on nodes [DVNEGCN2021]. Graph convolution neural network (GCNN) is a natural generalization form of GNN which is used to automatically extract the features of underlying network, which optimizes the selection of VNE decision. The model can be used to cluster nodes and links according to the physical nodes and physical links attribute characteristics (CPU, storage, bandwidth, delay, etc.), and it is highly suitable for graph structures of any topological form. Hence, GNN is useful to find the best VNE strategy by intelligent agent training, and the organic combination of VNE and GCN has a good prerequisite. Designing and applying RL techniques directly into VNE problems is not yet trivial, but may face several challenges. Several works have been appeared on the design of VNE solutions using RL, which focuses on how to interact with the environment to achieve maximum cumulative return [VNEQS2021], [NRRL2020], [MVNE2020], [CDVNE2020], [PPRL2020], [RLVNEWSN2020], [QLDC2019], [VNFFG2020], [VNEGCN2020], [NFVDeep2019], [DeepViNE2019], [VNETD2019], [RDAM2018], [MOQL2018], [ZTORCH2018], [NeuroViNE2018], [QVNE2020]. This document outlines the problems encountered when designing and applying RL-based VNE solutions. Section 2 describes how to design RL-based VNE solutions. Section 3 gives terminology, and Section 4 describes the problem space details. 2. Reinforcement Learning-based VNE Solutions As we discussed that RL has been studied in various fields (such as game, control system, operation research, information theory, multi- agent system, network system, etc.) and shows better performance than humans. Unlike deep learning, RL trains a policy model by receiving rewards through interaction with the environment without training label data. Ullah, et al. Expires 23 October 2022 [Page 5] Internet-Draft ML-based Virtual Network Embedding April 2022 Recently, there have been several attempts to solve VNE problems using RL. When applying RL-based algorithms to solve VNE problems, the RL agent automatically learns through the environment without human intervention. Once the agent completed the learning process, it can generate the most appropriate embeddings decision (action) based on the his knowledge and network state. For single embedding or action at each time step the agent get reward from the environments to adaptively train its policy for future action. The RL agent gets the most optimized model based on the reward function defined according to each objective (revenue, cost, revenue to cost ratio and acceptance ratio). The optimal RL policy model provides the VNE strategy appropriately according to the objective of the network operator. Figure 2. shows the virtual network embedding solution based on RL algorithm. The RL strategy is divided into two main parts training process and an inference process. In the training process, state information is composed of various substrate networks and VNRs (Environment), which are used as suitable inputs for RL models through feature extraction. After that, the RL model is updated by model updater using a feature extracted state and reward. In the inference process, using the trained RL model, the embedding result is provided to the operating network in real time. The following figure shows the detail about RL method based virtual networks embedding solutions. Ullah, et al. Expires 23 October 2022 [Page 6] Internet-Draft ML-based Virtual Network Embedding April 2022 RL Model Training Process +--------------------------------------------------------------------+ | Training Environment | | +-------------------+ RL-based VNE Agent | | | +---------+ | +----------------------------------+ | | | | +---------+ | | Action | | | | | | +----------+ |<----------------------------------+ | | | | + | | Substrate| | | | | | | | | | Networks | | | +----------+ +----------+ | | | | + +----------+ | State | | Feature | | RL | | | | | |----------->|Extraction|----->| Model | | | | | +--------+ | | +----------+ | (Policy) | | | | | | +---------+ | | | +----------+ | | | | + | +---------+ | | | +---------+ A | | | | + | VNRs | | Reward | +-->| Model | | | | | | +---------+ |-------------------->| Updater |-----+ | | | +-------------------+ | +---------+ | | | +----------------------------------+ | +--------------------------------------------------------------------+ | Inference Process | +---------------------------------V----------------------------------+ | + - - - - - - - + | | Operating Network | RL Model | Trained RL Model | | (Inference Environment) | Training |------------------+ | | +-------------------+ | Process | | | | | +-----------+ | + - - - - - - - + | | | | | | | RL-based VNE Agent | | | | | Substrate | | +----------------------------|-----+ | | | | Network | | | Action | | | | | | | |<--------------------------------+ | | | | | +-----------+ | | | V | | | | +---------+ | | +------------+ +---------+ | | | | | +---------+ | State | | Feature | | Trained | | | | | + | +----------+ |----------->| Extraction |---->| RL | | | | | + | VNRs | | | +------------+ | Model | | | | | +----------+ | | +---------+ | | | +-------------------+ +----------------------------------+ | +--------------------------------------------------------------------+ Figure 2: Two processes for RL method based VNE Ullah, et al. Expires 23 October 2022 [Page 7] Internet-Draft ML-based Virtual Network Embedding April 2022 3. Terminology Network Virtualization Network virtualization is the process of combining hardware and software network resources and network functionality into a single, software-based administrative entity, a virtual network [RFC7364]. Virtual Network Embedding (VNE) Virtual Network Embedding (VNE) [VNESURV2013] is one of the main techniques used to map a virtual network to the substrate network. Substrate Network (SN) The underlying physical network which contains the resources such as CPU and bandwidth for virtual networks is called substrate network. Virtual Network Request (VNR) Virtual Network Request is a complete single Virtual network containing virtual nodes and virtual links. Agent In RL, an agent is the component that makes the decision abd take action (i.e., embedding decision). State State is a representation (e.g., remaining SN capacity and requested VN resource) of the current environment, and it tells the agent what situation it is in currently. Action Actions (i.e., node and link embedding) are behavior an RL agent can do to change the states of the environment. Policy A policy defines an agent's way of behaving at a given time. It is a mapping from perceived states of environment to actions to be taken when in those states. It is usually implemented as a deep learning model because the state and action spaces are too large to be completely known. Reward A reward is the feedback which provides an agent to the agent for taking actions that lead to good outcomes (i.g., achieve the objective of the network operator). Ullah, et al. Expires 23 October 2022 [Page 8] Internet-Draft ML-based Virtual Network Embedding April 2022 Environment An environment is the agent's world in which it lives and interacts. The agent can interact with the environment by performing some action but cannot influence the rules of the environment by those actions. 4. Problem Space RL contains three main components: state representation, action space, and reward description. For solving a VNE problem, we need to consider how to design the three main RL components. In addition, a specific RL algorithm, training environment, sim2real gap, and generalization are also important issues that should be considered and addressed. We will describe each one in detail as follows. 4.1. State Representation The way to understand and observe the VNE problem is crucial for an RL agent to establish a thorough knowledge of the network status and generate efficient embedding decisions. Therefore, it is essential to firstly design the state representation that serves as the input to the agent. The state representation is the information which an agent can receive from the environment, and consists of a set of values representing the current situation in the environment. Based on the state representation, the RL agent selects the most appropriate action through its policy model. In the VNE problem, an RL agent needs to know the information of the overall SN entities and their current status in order to use the resources of the nodes and links of the substrate network. Also it must know the requirements of the VNR. Therefore, in the VNE problem, the state usually should represent the current resource state of the nodes and links of the substrate network (ie, CPU, memory, storage, bandwidth, delay, loss rate, etc.) and the requirements of the virtual node and link of the VNR. The collected status information is used as raw input, or refined status information through the feature extraction process is used as input for the RL agent. The state representation may vary depending on the operator's objective and VNE strategy. The method of determining such feature extraction and representation greatly affects the performance of the agent. 4.2. Action Space In RL, an action represents a decision that an RL agent can take based on current state representation. The set of all possible actions is called an action space. In the VNE problems, actions are generally divided into node embedding and link embedding. The action for node embedding means the VNR's nodes are assigned to which nodes in the SN. Also, for link embedding, the action represents the Ullah, et al. Expires 23 October 2022 [Page 9] Internet-Draft ML-based Virtual Network Embedding April 2022 selected paths between the selected substrate network nodes from the node embedding result. If the policy model of the RL agent is well trained, it will select the embedding result to maximize the reward appropriate for the operator's objectives. The output actions generated from the agent will indicate the adjustment of allocated resources. It is noted that, at each point of time step, an RL algorithm may decide to 1) embed each virtual node onto substrate nodes and then embed each virtual link onto substrate paths separately, or 2) embed the given whole VNR onto substrate nodes and links in the SN at once. In the former case, at every single step, a learning agent focuses on exactly one virtual node from the current VNR, and it generates a certain substrate node to host the virtual node. Link embedding is then performed separately in the same time step. To solve the VNE problem efficiently, mapping of virtual nodes and links are considered together, although they are performed separately. Link mapping is considering more complex than node mapping, because a virtual link can be mapped onto a physical path with different hops. On the other hand, at every single step, a learning agent can try to embed the given whole VNR, i.e., all virtual nodes and links in the given VNR, onto a subset of SN components. The whole VNR embedding should be handled as a graph embedding, so that the action space is huge and the design of the RL algorithm is usually more difficult than the one with each node and link embedding. 4.3. Reward Description Designing rewards is an important issue for an RL algorithm. In general, the reward is the benefit that an RL agent follows when performing its determined action. Reward is an immediate value that evaluates only the current state and action. The value of reward depends on success or failure of each step. In order to select the action that gives the best results in the long run, an RL agent needs to select the action with the highest cumulative reward. The reward is calculated through the reward function according to the objective of the environment, and even in the same environment, it may be different depending on the operator's objective. Based on the given reward the agent can evaluate the effectiveness to improve the policy. Hence, the reward function play a important rules in the training process of RL. In the VNE problem, the overall objectives are to reduce the VNE rejection, embed them with minimum cost, maximize the revenue, and increase the resource utilization of physical resources. Reward function should be designed to achieve one or multiple ones of these objectives. Each objective and its correspondent reward design are outlined as follows: Ullah, et al. Expires 23 October 2022 [Page 10] Internet-Draft ML-based Virtual Network Embedding April 2022 Revenue Revenue is the sum of the virtual resources requested by the VN, and calculated to determine the total cost of the resources. Typically, a successful action (e.g., VNR is embedded without violation) is treated to be a good reward which also increases the revenue. Otherwise, a failed action (e.g., VNR is rejected) leads that the agent will receive a negative reward as well as decreasing the revenue. Cost Cost is the expenditure incurred when VNR is embedded as a substrate network. It's not a good embedding result to pursue only high revenue. It is important for the network operator and SP to spend less. The lower the cost, the better the agent will be rewarded. Acceptance Ratio Acceptance ratio is the ratio measured by the number of successfully embedded virtual network requests divided by total number of virtual network requests. To achieve a high acceptance ratio, the agent is trying to embed maximum VNR and get a good reward. Getting a good reward is usually proportional to the acceptance ratio. Revenue-to-cost ratio To balance and compare the cost of resources for embedding VNR, the revenue is divided by cost. Revenue-to-cost ratio compares the embedding algorithms with respect to their embedding results in terms of the cost and revenue. Since most VNOs are most interested in this objective, a reward function should be made to relate to this performance metric. 4.4. Policy and RL Algorithms The policy is the strategy that the agent employs to determine the next action based on the current state. It maps states to actions that promise the highest reward. Therefore, an RL agent updates its policy repeatedly in the learning phase to maximize the expected cumulative reward. Unlike supervised learning, in which each sample has a corresponding label indicating the preferred output of the learning model, an RL agent relies on reward signals to evaluate the effectiveness of actions and further improve the policy. From the perspective of RL, the goal of VNE is to find an optimal policy to embed an VNR onto the given SN in any state at any time. There are two types of RL algorithms: on-policy and off-policy. In on-policy RL algorithms, the (behaviour) policy of the exploration step to select an action and the policy to learn are the same. On-policy algorithms work with a single policy, and require any observations Ullah, et al. Expires 23 October 2022 [Page 11] Internet-Draft ML-based Virtual Network Embedding April 2022 (state, action, reward, next state) to have been generated using that policy. Representative on-policy algorithms include A2C, A3C, TRPO, and PPO. On the other hand, off-policy RL algorithms work with two policies. These are a policy being learned, called the target policy, and the policy being followed that generates the observations, called the behaviour policy. In off-policy RL algorithms, the learning policy and the behaviour policy are not necessarily the same. It allows the use of exploratory policies for collecting the experience, since learning and behavior policies are separated. In the VNE problem, various experiences can be accumulated by extracting embedding results using various behavior policies. Representative off-policy algorithms include Q-learning, DQN, DDPG, and SAC. There are different classifications for RL algorithms: model-based and model-free. In model-based RL algorithms, an RL agent learns its optimal behavior indirectly by learning a model of the environment by taking actions and observing the outcomes that include the next state and the immediate reward. The models predict the outcomes of actions. The model is used instead of the environment or in addition to interaction with it to learn optimal policies. This becomes, however, impractical when the state and action space is large. Unlike model-based algorithms, model-free RL algorithms learn directly by trial and error with the environment and do not require the relatively large memory. Since data efficiency or safety is very important even in VNE problems, the use of model-based algorithms can be actively considered. However, since it is not easy to build a good model that mimics a real network environment, a model-free RL algorithm may be more suitable for VNE problems. In conclusion, a good RL algorithm selection plays an important role in solving the VNE problem, and VNE performance metrics vary depending on the selected RL algorithm. 4.5. Training Environment Simulation is the use of software to simulate an interacting environment that is difficult to actually execute and test. An RL algorithm learns by iteratively interacting with the environment. However, in the real environment, various variables such as failure and component consumption exist. Therefore, it is necessary to learn through a simulation that simulates the real environment. In order to solve the VNE problem, we need to use a network simulator similar to the real environment because it is difficult to repeatedly experiment with real network environments using an RL algorithm, and it is very challenging and overwhelming to directly apply an RL algorithm to real-world environments. When solving VNE problems, a network simulation environment similar to a real network is required. The network simulation environment should have a general SN environment and VNR required by the operator. The SN has nodes and links between nodes, and each has capacity such as CPU and Bandwidth. Ullah, et al. Expires 23 October 2022 [Page 12] Internet-Draft ML-based Virtual Network Embedding April 2022 In the case of VNR, there are virtual nodes and links required by the operator, and each must have its own requirements. As described in [DTwin2022], a digital twin network is a virtual representation of the physical network environment and can be built by applying digital twin technologies to the environment and creating virtual images of diverse physical network facilities. The digital twin for networks is an expansion platform of network simulation. In [DTwin2022], Section 8.2 describes that a digital twin network provides the complete machine learning lifecycle development by providing a realistic network environment, including network topologies, etc. Hence, RL algorithms to solve the VNE problem can be trained and verified on a digital twin network upfront before deployed to the physical networks, and the verification accuracy will be generally high when the digital twin network reproduces network behaviors well under various conditions. On the other hand, two placeholders marked as [DTwin2022] in the above new paragraph should be replaced with the right reference number after inserting the following new Internet-Draft, which introduces the definition, architecture, and use-cases of digital twin network, into "Section 7. Informative References" of our Internet Draft. 4.6. Sim2Real Gap Sim-to-real is a very comprehensive concept and applied in many fields including robotics and classic machine vision tasks. An RL algorithm iteratively learns through a simulation environment to train a model of the desired policy. The trained model is then applied to the real environment and/or tuned more for adapting to the real one. However, when the trained model is applied in the simulation to the real environment, sim2real gap problem arises. Closing the gap between simulation and reality gap in terms of actuation requires simulators to be more accurate, and to account for variability in agent dynamics. Obviously, the simulation environment does not match perfectly to the real environment which mostly fails in the tuning process and gives poor performance in the model because of the Sim2Real gap. The sim2real gap is caused by the difference between the simulation and the real environment. It is because the simulation environment cannot perfectly simulate the real environment, and there are many variables in the real environment. In a real network environment for VNE, the SN's nodes and links may fail due to external factors, or capacity such as CPU may change suddenly. In order to solve this problem, the simulation environment should be more robust or the trained RL model should be generalized. To reduce the gap between sim and real network environments we need to train our model with an efficient and large number of VNR and keep learning the agent not only depend on previous memorization. Ullah, et al. Expires 23 October 2022 [Page 13] Internet-Draft ML-based Virtual Network Embedding April 2022 4.7. Generalization Generalization refers to the trained model's ability to adapt properly to previously unseen new observations. An RL algorithm tries to learn a model that optimizes some objective with the purpose of performing well on data that has never been seen by the model during training. In terms of VNE problems, the generalization is a measure of how the agent's policy model performs on predicting unseen VNR. The RL agent not only has to memorize all the previous variance of the VNR but also to learn and explore more possible variance. It is important to have good and efficient training data for VNR with good variance and train the model with all possible VNRs. 5. IANA Considerations This memo includes no request to IANA. All drafts are required to have an IANA considerations section (see Guidelines for Writing an IANA Considerations Section in RFCs [RFC5226] for a guide). If the draft does not require IANA to do anything, the section contains an explicit statement that this is the case (as above). If there are no requirements for IANA, the section will be removed during conversion into an RFC by the RFC Editor. 6. Security Considerations All drafts are required to have a security considerations section. See RFC 3552 [RFC3552] for a guide. 7. Informative References [ASNVT2020] Sharif, Kashif., Li, Fan., Latif, Zohaib., Karim, MM., and Sujit. Biswas, "A Survey of Network Virtualization Techniques for Internet of Things using SND and NFV", DOI 10.1145/3379444, April 2020, . [CDVNE2020] "A Continuous-Decision Virtual Network Embedding Scheme Relying on Reinforcement Learning", DOI 10.1109/TNSM.2020.2971543, February 2020, . Ullah, et al. Expires 23 October 2022 [Page 14] Internet-Draft ML-based Virtual Network Embedding April 2022 [DeepViNE2019] Dolati, M., Hassanpour, S. B., Ghaderi, M., and A. Khonsari, "DeepViNE: Virtual Network Embedding with Deep Reinforcement Learning", BCP 72, RFC 3552, DOI 10.1109/INFCOMW.2019.8845171, September 2019, . [DTwin2022] Yang, H., Zhou, C., Duan, X., Lopez, D., Pastor, A., and Q. Wu, "Digital Twin Network: Concepts and Reference Architecture", DOI https://datatracker.ietf.org/doc/draft- irtf-nmrg-network-digital-twin-arch/, March 2022, . [DVNEGCN2021] Zhang, Peiying., Wang, Chao., Kumar, NeeraJ., Zhang,, Weishan., and Lei. Liu, "Dynamic Virtual Network Embedding Algorithm based on Graph Convolution Neural Network and Reinforcement Learning", DOI 10.1109/JIOT.2021.3095094, July 2021, . [ENViNE2021] ULLAH, IHSAN., Lim, Hyun-Kyo., and Youn-Hee. Han, "Ego Network-Based Virtual Network Embedding Scheme for Revenue Maximization", DOI 10.1109/ICAIIC51459.2021.9415185, April 2021, . [MLCNM2018] Ayoubi, Sara., Noura, Limam., Salahuddin, Mohammad., Shahriar, Nashid., Boutaba, NRaouf., Estrada-Solano, Felipe., and Oscar. M. Caicedo, "Machine Learning for Cognitive Network Management", DOI 10.1109/MCOM.2018.1700560, January 2018, . [MOQL2018] "Multi-Objective Virtual Network Embedding Algorithm Based on Q-learning and Curiosity-Driven", DOI 10.1109/TETC.2018.2871549, June 2018, . [MVNE2020] "Modeling on Virtual Network Embedding using Reinforcement Learning", DOI 10.1002/cpe.6020, September 2020, . Ullah, et al. Expires 23 October 2022 [Page 15] Internet-Draft ML-based Virtual Network Embedding April 2022 [MVNNML2021] Boutaba, Raouf., Shahriar, Nashid., A, Mohammad., and Noura. Limam, "Managing Virtualized Networks and Services with Machine Learning", DOI 48b8fc73c1609d4632d7db5e67e373a62a3cc1f6, January 2021, . [NeuroViNE2018] "NeuroViNE: A Neural Preprocessor for Your Virtual Network Embedding Algorithm", DOI 10.1109/INFOCOM.2018.8486263, June 2018, . [NFVDeep2019] Xiao, Y., Zhang, Q., Liu, F., Wang, J., Zhao, M., Zhang, Z., and J. Zhang, "NFVdeep: Adaptive Online Service Function Chain Deployment with Deep Reinforcement Learning", RFC 1129, DOI 10.1145/3326285.3329056, June 2019, . [NRRL2020] "Network Resource Allocation Strategy Based on Deep Reinforcement Learning", DOI 10.1109/OJCS.2020.3000330, June 2020, . [PPRL2020] "A Privacy-Preserving Reinforcement Learning Algorithm for Multi-Domain Virtual Network Embedding", DOI 10.1109/TNSM.2020.2971543, September 2020, . [QLDC2019] "A Q-Learning-Based Approach for Virtual Network Embedding in Data Center", DOI 10.1007/s00521-019-04376, July 2019, . [QVNE2020] Yuan, Y., Tian, Z., Wang, C., Zheng, F., and Y. Lv, "A Q- learning-Based Approach for Virtual Network Embedding in Data Center", DOI 10.1007/s00521-019-04376-6, July 2020, . [RDAM2018] "RDAM: A Reinforcement Learning Based Dynamic Attribute Matrix Representation for Virtual Network Embedding", DOI 10.1109/TETC.2018.2871549, September 2018, . Ullah, et al. Expires 23 October 2022 [Page 16] Internet-Draft ML-based Virtual Network Embedding April 2022 [RFC3552] Rescorla, E. and B. Korver, "Guidelines for Writing RFC Text on Security Considerations", BCP 72, RFC 3552, DOI 10.17487/RFC3552, July 2003, . [RFC5226] Narten, T. and H. Alvestrand, "Guidelines for Writing an IANA Considerations Section in RFCs", RFC 5226, DOI 10.17487/RFC5226, May 2008, . [RFC7364] Thomas, P.T., Eric, Y., David, A., Luyuan, A., Larry, A., and A. Maria Napierala, "Problem Statement: Overlays for Network Virtualization", October 2015, . [RLVNEWSN2020] "Reinforcement Learning for Virtual Network Embedding in Wireless Sensor Networks", DOI 10.1109/WiMob50308.2020.9253442, October 2020, . [SUR2018] Boutaba, Raouf., Salahuddin, Mohammad., Limam, Noura., Ayoubi, Sara., Shahriar, Nashid., Estrada-Solano, Felipe., and Oscar. M. Caicedo, "A Comprehensive survey on Machine Learning for Networking: Evolution, Applications and Research Opportunities", DOI 10.1186/s13174-018-0087-2, June 2018, . [VNEGCN2020] Yan, Z., Ge, J., Wu, Y., Li, L., and T. Li, "Automatic Virtual Network Embedding: A Deep Reinforcement Learning Approach With Graph Convolutional Networks", RFC 1129, DOI 10.1109/JSAC.2020.2986662, April 2020, . [VNEQS2021] Wang, Chao., Batth, Ranbir Singh., Zhang, Peiying., Aujla, Gagangeet., Duan, Youxiang., and Lihua. Ren, "VNE Solution for Network Differentiated QoS and Security Requirements: From the Perspective of Deep Reinforcement Learning", DOI 10.1007/s00607-020-00883-w, January 2021, . [VNESURV2013] Fischer, Fischer., Botero, Juan Felipe., Till Beck, Michael;., Karim, MM., De Meer, Hermann., and Xavier. Ullah, et al. Expires 23 October 2022 [Page 17] Internet-Draft ML-based Virtual Network Embedding April 2022 Hesselbach, "Virtual Network Embedding: A Survey", DOI 10.1109/SURV.2013.013013.00155, April 2020, . [VNETD2019] Wang, S., Bi, J., V.Vasilakos, A., and Q. Fan, "VNE-TD: A Virtual Network Embedding Algorithm Based on Temporal- Difference Learning", BCP 72, RFC 3552, DOI 10.1016/j.comnet.2019.05.004, October 2019, . [VNFFG2020] Anh Quang, P.T., Hadjadj-Aoul, Y., and A. Outtagarts, "Evolutionary Actor-Multi-Critic Model for VNF-FG Embedding", RFC 1129, DOI 10.1109/CCNC46108.2020.9045434, January 2020, . [ZTORCH2018] Sciancalepore, V., Chen, X., Yousaf, F. Z., and X. Costa- Perez, "Z-TORCH: An Automated NFV Orchestration and Monitoring Solution", BCP 72, RFC 3552, DOI 10.1109/TNSM.2018.2867827, August 2018, . Authors' Addresses Ihsan Ullah KOREATECH 1600, Chungjeol-ro, Byeongcheon-myeon, Dongnam-gu Cheonan Chungcheongnam-do 31253 Republic of Korea Email: ihsan@koreatech.ac.kr Youn-Hee Han KOREATECH 1600, Chungjeol-ro, Byeongcheon-myeon, Dongnam-gu Cheonan Chungcheongnam-do 31253 Republic of Korea Email: yhhan@koreatech.ac.kr Ullah, et al. Expires 23 October 2022 [Page 18] Internet-Draft ML-based Virtual Network Embedding April 2022 TaeYeon Kim ETRI 218 Gajeong-ro, Yuseong-gu Daejeon 34129 Republic of Korea Email: tykim@etri.re.kr Ullah, et al. Expires 23 October 2022 [Page 19]