| Internet-Draft | EPE-OAM | May 2022 | 
| Hegde, et al. | Expires 2 November 2022 | [Page] | 
Egress Peer Engineering (EPE) is an application of Segment Routing to solve the problem of egress peer selection. The Segment Routing based BGP-EPE solution allows a centralized controller, e.g., a Software Defined Network (SDN) controller, to program any egress peer. The EPE solution requires a node to program the PeerNode Segment Identifier (SID) describing a session between two nodes, the PeerAdj SID describing the link (one or more) that is used by sessions between peer nodes, and the PeerSet SID describing an arbitrary set of sessions or links between a local node and its peers. This document provides new sub-TLVs for EPE Segment Identifiers (SIDs) that would be used in the MPLS Target FEC Stack TLV (Type 1) in MPLS Ping and Traceroute procedures.¶
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.¶
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.¶
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."¶
This Internet-Draft will expire on 2 November 2022.¶
Copyright (c) 2022 IETF Trust and the persons identified as the document authors. All rights reserved.¶
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.¶
Egress Peer Engineering (EPE), as defined in [I-D.ietf-spring-segment-routing-central-epe], is an effective mechanism to select the egress peer link based on different criteria. The EPE-SIDs provide a means to represent egress peer links. Many network deployments consist of multiple Autonomous Systems, either for ease of operations or as a result of network mergers and acquisitions. The inter-AS links connecting two such Autonomous Systems can be traffic engineered using EPE-SIDs in this case as well. It is important to be able to validate the control plane to forwarding plane synchronization for these SIDs so that any anomaly can be detected easily by the operator.¶
   +---------+      +------+
   |         |      |      |
   |    H    B------D      G
   |         | +---/| AS 2 |\  +------+
   |         |/     +------+ \ |      |---L/8
   A   AS1   C---+            \|      |
   |         |\\  \  +------+ /| AS 4 |---M/8
   |         | \\  +-E      |/ +------+
   |    X    |  \\   |      K
   |         |   +===F AS 3 |
   +---------+       +------+
In this reference diagram, EPE-SIDs are advertised from AS1 to AS2 and AS3. In certain cases, the EPE-SIDs advertised by the control plane may not be in synchronization with the label programmed in the data plane. For example, on C a PeerAdj SID could be advertised to indicate it is for the link C->D. Due to some software anomaly, the actual data forwarding on this PeerAdj SID could be happening over the C->E link. If E had relevant data paths for further forwarding the packet, this kind of anomaly would go unnoticed by the operator. A FEC definition for the EPE-SIDs defines the details of the control plane association of the SID, and the data plane validation of the SID is done during the MPLS Traceroute procedure. When there is a multi-hop EBGP session between the ASBRs, a PeerNode SID is advertised and traffic is load-balanced between the interfaces connecting the two nodes. In the reference diagram, C and F could have a PeerNode SID advertised. When the OAM packet is received on F, F needs to validate that the packet arrived on one of the two interfaces connected to C.¶
This document provides Target Forwarding Equivalence Class (FEC) Stack TLV definitions for EPE-SIDs. Other procedures for MPLS Ping and Traceroute, as defined in Section 7 of [RFC8287] and clarified by [RFC8690], are applicable for EPE-SIDs as well.¶
[I-D.ietf-idr-bgpls-segment-routing-epe] provides mechanisms to advertise the EPE-SIDs in BGP-LS. These EPE-SIDs may be used to build Segment Routing paths as described in [I-D.ietf-spring-segment-routing-policy] or using Path Computation Element Protocol (PCEP) extensions as defined in [RFC8664]. Data plane monitoring for such paths, which consist of EPE-SIDs, will use the extensions defined in this document to build the Target FEC Stack TLV. The MPLS Ping and Traceroute procedures MAY be initiated by the head-end of the Segment Routing path or by a centralized topology-aware data plane monitoring system as described in [RFC8403]. The extensions in [I-D.ietf-spring-segment-routing-policy] and [RFC8664] do not define the details of the SID, and such extensions are out of scope for this document. The node initiating the data plane monitoring may acquire the details of EPE-SIDs through BGP-LS advertisements as described in [I-D.ietf-idr-bgpls-segment-routing-epe]. There may be other mechanisms to learn the definition of the SID from a controller. Details of such mechanisms are out of scope for this document.¶
The EPE-SIDs are advertised for inter-AS links which run EBGP sessions. The procedures to operate EBGP sessions in a scenario with unnumbered interfaces are not well defined and are hence out of scope for this document. During AS migration, the procedures described in [RFC7705] may be in force. In these scenarios, if the local and remote AS fields in the FEC as described in Section 4 carry the global AS and not the "local AS" as defined in [RFC7705], the FEC validation procedures may fail.¶
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "NOT RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in BCP 14, [RFC2119], [RFC8174] when, and only when, they appear in all capitals, as shown here.¶
Three new sub-TLVs are defined for the Target FEC Stack TLV (Type 1), the Reverse-Path Target FEC Stack TLV (Type 16), and the Reply Path TLV (Type 21).¶
            Sub-Type    Sub-TLV Name
            --------  ---------------
             TBD1      PeerAdj SID Sub-TLV
             TBD2      PeerNode SID Sub-TLV
             TBD3      PeerSet SID Sub-TLV
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Type = TBD1                    |          Length               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |               Local AS Number (4 octets)                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Remote AS Number (4 octets)                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Local BGP Router ID (4 octets)                   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Remote BGP Router ID (4 octets)                  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Local Interface Address (4/16 octets)            |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Remote Interface Address (4/16 octets)           |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type : TBD1¶
Length : variable, depending on whether the interface addresses are IPv4 or IPv6. The Length excludes the length of the Type and Length fields. For IPv4 interface addresses, the Length is 24. For IPv6 interface addresses, the Length is 48.¶
Local AS Number :¶
4 octet unsigned integer representing the Member ASN inside the Confederation [RFC5065]. The AS number corresponds to the AS to which the node advertising the PeerAdj SID belongs.¶
Remote AS Number :¶
4 octet unsigned integer representing the Member ASN inside the Confederation [RFC5065]. The AS number corresponds to the AS of the remote node for which the PeerAdj SID is advertised.¶
Local BGP Router ID :¶
4 octet unsigned integer of the advertising node representing the BGP Identifier as defined in [RFC4271] and [RFC6286].¶
Remote BGP Router ID :¶
4 octet unsigned integer of the receiving node representing the BGP Identifier as defined in [RFC4271] and [RFC6286].¶
Local Interface Address :¶
For a PeerAdj SID, the local interface address corresponding to the PeerAdj SID should be specified in this field. For IPv4, this field is 4 octets; for IPv6, this field is 16 octets. Link-local IPv6 addresses are for further study.¶
Remote Interface Address :¶
For a PeerAdj SID, the remote interface address corresponding to the PeerAdj SID should be specified in this field. For IPv4, this field is 4 octets; for IPv6, this field is 16 octets. Link-local IPv6 addresses are for further study.¶
[I-D.ietf-idr-bgpls-segment-routing-epe] mandates sending the local interface ID and remote interface ID in the Link Descriptors and allows a value of 0 in the remote descriptors. It is useful to validate the incoming interface for an OAM packet, and if the remote descriptor is 0, this validation is not possible. [I-D.ietf-idr-bgpls-segment-routing-epe] allows optional link descriptors of local and remote interface addresses as described in section 4.2. This document recommends sending these optional descriptors and using them to validate the incoming interface. When these local and remote interface addresses are not available, an ingress node can send 0 in the local and/or remote interface address field. The receiver SHOULD skip the validation for the incoming interface if the address field contains 0.¶
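As an illustration of the PeerAdj SID sub-TLV layout described above, the following Python sketch packs the fields in network byte order. It is not part of the specification: the function name encode_peer_adj_sid and its parameters are hypothetical, Router IDs are assumed to be supplied in dotted-quad form, and sub_type stands in for the code point that is still to be assigned.

```python
import ipaddress
import struct


def encode_peer_adj_sid(sub_type, local_as, remote_as, local_rid, remote_rid,
                        local_if, remote_if):
    """Pack a PeerAdj SID sub-TLV (hypothetical helper, not from the draft).

    sub_type is a placeholder for the unassigned sub-TLV type.
    """
    local_addr = ipaddress.ip_address(local_if)
    remote_addr = ipaddress.ip_address(remote_if)
    if local_addr.version != remote_addr.version:
        raise ValueError("interface addresses must share one address family")

    value = struct.pack("!II", local_as, remote_as)      # two 4-octet ASNs
    value += ipaddress.IPv4Address(local_rid).packed     # Local BGP Router ID
    value += ipaddress.IPv4Address(remote_rid).packed    # Remote BGP Router ID
    value += local_addr.packed + remote_addr.packed      # 4 or 16 octets each

    # Length covers the value only, not the 4 octets of Type and Length.
    return struct.pack("!HH", sub_type, len(value)) + value
```

With two IPv4 interface addresses the packed Length is 24, and with two IPv6 addresses it is 48. Passing "0.0.0.0" or "::" models the case where an interface address is not available.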
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Type = TBD2                    |          Length               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |               Local AS Number (4 octets)                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Remote AS Number (4 octets)                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Local BGP Router ID (4 octets)                   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Remote BGP Router ID (4 octets)                  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type : TBD2¶
Length : 16¶
Local AS Number :¶
4 octet unsigned integer representing the Member ASN inside the Confederation [RFC5065]. The AS number corresponds to the AS to which the node advertising the PeerNode SID belongs.¶
Remote AS Number :¶
4 octet unsigned integer representing the Member ASN inside the Confederation [RFC5065]. The AS number corresponds to the AS of the remote node for which the PeerNode SID is advertised.¶
Local BGP Router ID :¶
4 octet unsigned integer of the advertising node representing the BGP Identifier as defined in [RFC4271] and [RFC6286].¶
Remote BGP Router ID :¶
4 octet unsigned integer of the receiving node representing the BGP Identifier as defined in [RFC4271] and [RFC6286].¶
When there is a multi-hop EBGP session between two ASBRs, a PeerNode SID is advertised for this session and traffic can be load-balanced across the interfaces connecting the two ASBRs. An EPE controller that does bandwidth management for these links should be aware of the links on which the traffic will be load-balanced. As per [RFC8029], the node advertising the EPE-SIDs will send a Downstream Detailed Mapping TLV (DDMT) specifying the details of the next-hop interfaces on which the OAM packet will be sent out. Based on this information, the controller MAY choose to verify the actual forwarding state against the topology information it has. On the router, the validation procedures will include received DDMT validation as specified in [RFC8029] to verify the control and forwarding state synchronization on the two routers. Any discrepancies between the controller's state and the forwarding state will not be detected by the procedures described in this document.¶
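The fixed-size PeerNode SID sub-TLV layout described above can be sketched as a small encoder and decoder. This is an illustrative sketch only: the function names are hypothetical, Router IDs are assumed to be handled in dotted-quad form, and sub_type stands in for the code point that is still to be assigned.

```python
import ipaddress
import struct


def encode_peer_node_sid(sub_type, local_as, remote_as, local_rid, remote_rid):
    """Pack a PeerNode SID sub-TLV (hypothetical helper, not from the draft)."""
    value = struct.pack("!II", local_as, remote_as)
    value += ipaddress.IPv4Address(local_rid).packed
    value += ipaddress.IPv4Address(remote_rid).packed
    # Length covers the value only: four 4-octet fields, i.e. 16.
    return struct.pack("!HH", sub_type, len(value)) + value


def decode_peer_node_sid(data):
    """Return (sub_type, local_as, remote_as, local_rid, remote_rid)."""
    if len(data) < 4:
        raise ValueError("truncated sub-TLV header")
    sub_type, length = struct.unpack("!HH", data[:4])
    if length != 16 or len(data) != 4 + length:
        raise ValueError("PeerNode SID sub-TLV must carry exactly 16 octets")
    local_as, remote_as = struct.unpack("!II", data[4:12])
    local_rid = str(ipaddress.IPv4Address(data[12:16]))
    remote_rid = str(ipaddress.IPv4Address(data[16:20]))
    return sub_type, local_as, remote_as, local_rid, remote_rid
```

Because the PeerNode SID FEC carries no interface addresses, its Length is always 16, which lets the decoder reject any other value up front.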
        0                   1                   2                   3
        0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |Type = TBD3                    |          Length               |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Local AS Number (4 octets)                       |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Local BGP Router ID (4 octets)                   |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |   No. of elements in set      |          Reserved             |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Remote AS Number (4 octets)                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Remote BGP Router ID (4 octets)                  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
        Each element in the set consists of the fields below:
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Remote AS Number (4 octets)                      |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
       |              Remote BGP Router ID (4 octets)                  |
       +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Type : TBD3¶
Length : variable based on the number of elements in the set. The length field does not include the length of Type and Length fields.¶
Local AS Number :¶
4 octet unsigned integer representing the Member ASN inside the Confederation [RFC5065]. The AS number corresponds to the AS to which the node advertising the PeerSet SID belongs.¶
Remote AS Number :¶
4 octet unsigned integer representing the Member ASN inside the Confederation [RFC5065]. The AS number corresponds to the AS of the remote node for which the PeerSet SID is advertised.¶
Local BGP Router ID :¶
4 octet unsigned integer of the advertising node representing the BGP Identifier as defined in [RFC4271] and [RFC6286].¶
Remote BGP Router ID :¶
4 octet unsigned integer of the receiving node representing the BGP Identifier as defined in [RFC4271] and [RFC6286].¶
No. of elements in set :¶
Number of remote ASes over which the PeerSet SID load-balances.¶
A PeerSet SID may be associated with a number of PeerNode SIDs and PeerAdj SIDs. The remote AS number and the Router ID of each of these PeerNode SIDs and PeerAdj SIDs MUST be included in the FEC.¶
When a remote ASBR of the EPE-SID advertisement receives the MPLS OAM packet with the top FEC being the EPE-SID, it SHOULD perform validity checks on the content of the EPE-SID FEC sub-TLV. The basic length check should be performed on the received FEC.¶
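To make the variable-length PeerSet SID layout concrete, the sketch below packs the fixed header followed by one (Remote AS Number, Remote BGP Router ID) pair per element. It is illustrative only: the function name is hypothetical, Router IDs are assumed to be supplied in dotted-quad form, and sub_type stands in for the code point that is still to be assigned.

```python
import ipaddress
import struct


def encode_peer_set_sid(sub_type, local_as, local_rid, elements):
    """Pack a PeerSet SID sub-TLV (hypothetical helper, not from the draft).

    elements is a list of (remote_as, remote_router_id) tuples, one per
    PeerNode SID or PeerAdj SID associated with the PeerSet SID.
    """
    value = struct.pack("!I", local_as)
    value += ipaddress.IPv4Address(local_rid).packed
    # 16-bit element count followed by a 16-bit Reserved field set to zero.
    value += struct.pack("!HH", len(elements), 0)
    for remote_as, remote_rid in elements:
        value += struct.pack("!I", remote_as)
        value += ipaddress.IPv4Address(remote_rid).packed
    # Length covers the value only, not the 4 octets of Type and Length.
    return struct.pack("!HH", sub_type, len(value)) + value
```

For a set with two remote peers, the value holds 12 octets of fixed fields plus 8 octets per element, so the packed Length is 28.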
 PeerAdj SID
 -----------
 Length = 24 or 48
 PeerNode SID
 ------------
 Length = 16
 PeerSet SID
 -----------
 Length = 12 + No. of elements in the set * 8
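A receiver's basic length check might look like the following sketch. The constants are derived from the field layouts drawn in the sub-TLV format diagrams (16 octets of fixed fields plus two interface addresses for PeerAdj, four 4-octet fields for PeerNode, and 12 octets of fixed fields plus 8 octets per element for PeerSet); treat them, and the function names, as assumptions of this example rather than normative values.

```python
import struct

# RFC 8029 return code for a malformed request.
RC_MALFORMED_ECHO_REQUEST = 1


def length_is_valid(kind, value):
    """Check the value length of an EPE-SID FEC sub-TLV.

    kind is "adj", "node", or "set" (hypothetical labels); value is the
    sub-TLV value without the Type and Length fields.
    """
    if kind == "adj":
        # 16 octets of AS/Router ID fields plus two IPv4 or two IPv6 addresses.
        return len(value) in (24, 48)
    if kind == "node":
        return len(value) == 16
    if kind == "set":
        if len(value) < 12:
            return False
        # The element count sits after Local AS Number and Local Router ID.
        (count,) = struct.unpack("!H", value[8:10])
        return len(value) == 12 + 8 * count
    return False


def length_check_return_code(kind, value):
    """Return 1 for a malformed sub-TLV, or None to continue validation."""
    return None if length_is_valid(kind, value) else RC_MALFORMED_ECHO_REQUEST
```

Reading the element count before comparing lengths lets the PeerSet check catch both truncated and over-long element lists.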
If a malformed FEC sub-TLV is received, then a return code of 1, "Malformed echo request received", as defined in [RFC8029], SHOULD be sent. The section below augments Section 7.4 of [RFC8287].¶
      4a. Segment Routing EPE-SID Validation:
    If the Label-stack-depth is 0 and the Target FEC Stack sub-TLV
         at FEC-stack-depth is TBD1 (PeerAdj SID sub-TLV),
            Set the Best-return-code to 10, "Mapping for this FEC is not
            the given label at stack-depth", if any of the conditions
            below fail:
               o  Validate that the Receiving Node BGP Local AS matches
                  the Remote AS field in the received PeerAdj SID
                  FEC sub-TLV.
               o  Validate that the Receiving Node BGP Router-ID matches
                  the Remote Router ID field in the received
                  PeerAdj SID FEC sub-TLV.
               o  Validate that there is an EBGP session with a peer
                  whose AS number and BGP Router-ID match the
                  Local AS number and Local Router-ID fields
                  in the received PeerAdj SID FEC sub-TLV.
            If the Remote interface address is not zero, validate the
            incoming interface.
            Set the Best-return-code to 35, "Mapping for this FEC is not
            associated with the incoming interface" [RFC8287], if the
            condition below fails:
               o  Validate that the incoming interface on which the OAM
                  packet was received matches the remote interface
                  specified in the PeerAdj SID FEC sub-TLV.
            If all above validations have passed, set the return code to 3,
            "Replying router is an egress for the FEC at stack-depth".
    Else, if the Target FEC sub-TLV at FEC-stack-depth is TBD2
         (PeerNode SID sub-TLV),
            Set the Best-return-code to 10, "Mapping for this FEC is not
            the given label at stack-depth", if any of the conditions
            below fail:
               o  Validate that the Receiving Node BGP Local AS matches
                  the Remote AS field in the received PeerNode SID
                  FEC sub-TLV.
               o  Validate that the Receiving Node BGP Router-ID matches
                  the Remote Router ID field in the received
                  PeerNode SID FEC sub-TLV.
               o  Validate that there is an EBGP session with a peer
                  whose AS number and BGP Router-ID match the
                  Local AS number and Local Router-ID fields
                  in the received PeerNode SID FEC sub-TLV.
            If all above validations have passed, set the return code to 3,
            "Replying router is an egress for the FEC at stack-depth".
    Else, if the Target FEC sub-TLV at FEC-stack-depth is TBD3
         (PeerSet SID sub-TLV),
            Set the Best-return-code to 10, "Mapping for this FEC is not
            the given label at stack-depth", if any of the conditions
            below fail:
               o  Validate that the Receiving Node BGP Local AS matches
                  one of the Remote AS fields in the received PeerSet
                  SID FEC sub-TLV.
               o  Validate that the Receiving Node BGP Router-ID matches
                  one of the Remote Router ID fields in the received
                  PeerSet SID FEC sub-TLV.
               o  Validate that there is an EBGP session with a peer
                  whose AS number and BGP Router-ID match the
                  Local AS number and Local Router-ID fields
                  in the received PeerSet SID FEC sub-TLV.
            If all above validations have passed, set the return code to 3,
            "Replying router is an egress for the FEC at stack-depth".
IANA is requested to allocate three new Target FEC Stack sub-TLVs from the "Sub-TLVs for TLV Types 1, 16, and 21" subregistry in the "TLVs" registry of the "Multiprotocol Label Switching (MPLS) Label Switched Paths (LSPs) Ping Parameters" namespace.¶
The three lowest free values from the Standards Action range should be allocated if possible.¶
The EPE-SIDs are advertised for egress links for Egress Peer Engineering purposes or for inter-AS links between co-operating ASes. When co-operating domains are involved, they can allow the packets arriving on trusted interfaces to reach the control plane and get processed. When EPE-SIDs are created for egress TE links where the neighbor AS is an independent entity, the neighbor AS may not allow packets arriving from the external world to reach the control plane. In such deployments, MPLS OAM packets will be dropped by the neighboring AS that receives them. In MPLS Traceroute applications, when the AS boundary is crossed with the EPE-SIDs, the FEC stack is changed. [RFC8287] does not mandate that the initiator, upon receiving an MPLS Echo Reply message that includes the FEC Stack Change TLV with one or more of the original segments being popped, remove the corresponding FEC(s) from the Target FEC Stack TLV in the next (TTL+1) traceroute request. If an initiator does not remove the FECs belonging to the previous AS that has been traversed, it MAY expose the internal AS information to the following AS being traversed in the traceroute.¶
Thanks to Loa Andersson, Dhruv Dhody, Ketan Talaulikar, Italo Busi, Alexander Vainshtein, and Deepti Rathi for their careful review and comments.¶