rfc9692v2.txt | rfc9692.txt | |||
---|---|---|---|---|
skipping to change at line 12 | skipping to change at line 12 | |||
Internet Engineering Task Force (IETF) T. Przygienda, Ed. | Internet Engineering Task Force (IETF) T. Przygienda, Ed. | |||
Request for Comments: 9692 J. Head, Ed. | Request for Comments: 9692 J. Head, Ed. | |||
Category: Standards Track Juniper Networks | Category: Standards Track Juniper Networks | |||
ISSN: 2070-1721 A. Sharma | ISSN: 2070-1721 A. Sharma | |||
Hudson River Trading | Hudson River Trading | |||
P. Thubert | P. Thubert | |||
B. Rijsman | B. Rijsman | |||
Individual | Individual | |||
D. Afanasiev | D. Afanasiev | |||
Yandex | Yandex | |||
December 2024 | January 2025 | |||
RIFT: Routing in Fat Trees | RIFT: Routing in Fat Trees | |||
Abstract | Abstract | |||
This document defines a specialized, dynamic routing protocol for | This document defines a specialized, dynamic routing protocol for | |||
Clos, Fat Tree, and variants thereof. These topologies were | Clos, fat tree, and variants thereof. These topologies were | |||
initially used within crossbar interconnects and consequently router | initially used within crossbar interconnects and consequently router | |||
and switch backplanes, but their characteristics make them ideal for | and switch backplanes, but their characteristics make them ideal for | |||
constructing IP fabrics as well. The protocol specified by this | constructing IP fabrics as well. The protocol specified by this | |||
document is optimized towards the minimization of control plane state | document is optimized towards the minimization of control plane state | |||
to support very large substrates as well as the minimization of | to support very large substrates as well as the minimization of | |||
configuration and operational complexity to allow for a simplified | configuration and operational complexity to allow for a simplified | |||
deployment of said topologies. | deployment of said topologies. | |||
Status of This Memo | Status of This Memo | |||
skipping to change at line 44 | skipping to change at line 44 | |||
received public review and has been approved for publication by the | received public review and has been approved for publication by the | |||
Internet Engineering Steering Group (IESG). Further information on | Internet Engineering Steering Group (IESG). Further information on | |||
Internet Standards is available in Section 2 of RFC 7841. | Internet Standards is available in Section 2 of RFC 7841. | |||
Information about the current status of this document, any errata, | Information about the current status of this document, any errata, | |||
and how to provide feedback on it may be obtained at | and how to provide feedback on it may be obtained at | |||
https://www.rfc-editor.org/info/rfc9692. | https://www.rfc-editor.org/info/rfc9692. | |||
Copyright Notice | Copyright Notice | |||
Copyright (c) 2024 IETF Trust and the persons identified as the | Copyright (c) 2025 IETF Trust and the persons identified as the | |||
document authors. All rights reserved. | document authors. All rights reserved. | |||
This document is subject to BCP 78 and the IETF Trust's Legal | This document is subject to BCP 78 and the IETF Trust's Legal | |||
Provisions Relating to IETF Documents | Provisions Relating to IETF Documents | |||
(https://trustee.ietf.org/license-info) in effect on the date of | (https://trustee.ietf.org/license-info) in effect on the date of | |||
publication of this document. Please review these documents | publication of this document. Please review these documents | |||
carefully, as they describe your rights and restrictions with respect | carefully, as they describe your rights and restrictions with respect | |||
to this document. Code Components extracted from this document must | to this document. Code Components extracted from this document must | |||
include Revised BSD License text as described in Section 4.e of the | include Revised BSD License text as described in Section 4.e of the | |||
Trust Legal Provisions and are provided without warranty as described | Trust Legal Provisions and are provided without warranty as described | |||
skipping to change at line 114 | skipping to change at line 114 | |||
6.7.6. Resulting Topologies | 6.7.6. Resulting Topologies | |||
6.8. Further Mechanisms | 6.8. Further Mechanisms | |||
6.8.1. Route Preferences | 6.8.1. Route Preferences | |||
6.8.2. Overload Bit | 6.8.2. Overload Bit | |||
6.8.3. Optimized Route Computation on Leaves | 6.8.3. Optimized Route Computation on Leaves | |||
6.8.4. Mobility | 6.8.4. Mobility | |||
6.8.5. Key/Value (KV) Store | 6.8.5. Key/Value (KV) Store | |||
6.8.6. Interactions with BFD | 6.8.6. Interactions with BFD | |||
6.8.7. Fabric Bandwidth Balancing | 6.8.7. Fabric Bandwidth Balancing | |||
6.8.8. Label Binding | 6.8.8. Label Binding | |||
6.8.9. Leaf-to-Leaf Procedures | 6.8.9. L2L Procedures | |||
6.8.10. Address Family and Multi-Topology Considerations | 6.8.10. Address Family and Multi-Topology Considerations | |||
6.8.11. One-Hop Healing of Levels with East-West Links | 6.8.11. One-Hop Healing of Levels with East-West Links | |||
6.9. Security | 6.9. Security | |||
6.9.1. Security Model | 6.9.1. Security Model | |||
6.9.2. Security Mechanisms | 6.9.2. Security Mechanisms | |||
6.9.3. Security Envelope | 6.9.3. Security Envelope | |||
6.9.4. Weak Nonces | 6.9.4. Weak Nonces | |||
6.9.5. Lifetime | 6.9.5. Lifetime | |||
6.9.6. Security Association Changes | 6.9.6. Security Association Changes | |||
7. Information Elements Schema | 7. Information Elements Schema | |||
skipping to change at line 202 | skipping to change at line 202 | |||
Acknowledgments | Acknowledgments | |||
Contributors | Contributors | |||
Authors' Addresses | Authors' Addresses | |||
1. Introduction | 1. Introduction | |||
Clos [CLOS] topologies have gained prominence in today's networking, | Clos [CLOS] topologies have gained prominence in today's networking, | |||
primarily as a result of the paradigm shift towards a centralized | primarily as a result of the paradigm shift towards a centralized | |||
data center architecture that is poised to deliver a majority of | data center architecture that is poised to deliver a majority of | |||
computation and storage services in the future. Such networks are | computation and storage services in the future. Such networks are | |||
commonly called a Fat Tree / network in modern IP fabric | commonly called a fat tree / network in modern IP fabric | |||
considerations [VAHDAT08] as a homonym to the original definition of | considerations [VAHDAT08] as a similar term for the original | |||
the term [FATTREE]. In most generic terms, and disregarding | definition of the term Fat Tree [FATTREE]. In most generic terms, | |||
exceptions like horizontal shortcuts, those networks are all | and disregarding exceptions like horizontal shortcuts, those networks | |||
variations of a structured design isomorphic to a ranked lattice | are all variations of a structured design isomorphic to a ranked | |||
where the least upper bound is the "top of the fabric" and links | lattice where the least upper bound is the "top of the fabric" and | |||
closer to the top may be "fatter" to guarantee non-blocking | links closer to the top may be "fatter" to guarantee non-blocking | |||
bisectional capacity. | bisectional capacity. | |||
Many builders of such IP fabrics desire a protocol that | Many builders of such IP fabrics desire a protocol that | |||
autoconfigures itself and deals with failures and misconfigurations | autoconfigures itself and deals with failures and misconfigurations | |||
with a minimum amount of human intervention. Such a solution would | with a minimum amount of human intervention. Such a solution would | |||
allow local IP fabric bandwidth to be consumed in a "standard | allow local IP fabric bandwidth to be consumed in a "standard | |||
component" fashion, i.e., provision it much faster and operate it at | component" fashion, i.e., provision it much faster and operate it at | |||
much lower costs than today, much like compute or storage is consumed | much lower costs than today, similar to how compute or storage is | |||
already. | consumed already. | |||
In looking at the problem through the lens of such IP fabric | In looking at the problem through the lens of such IP fabric | |||
requirements, Routing in Fat Trees (RIFT) addresses those challenges | requirements, Routing in Fat Trees (RIFT) addresses those challenges | |||
not through an incremental modification of either a link-state | not through an incremental modification of either a link-state | |||
(distributed computation) or distance-vector (diffused computation) | (distributed computation) or distance-vector (diffused computation) | |||
technique but rather a mixture of both, briefly described as "link- | technique but rather a mixture of both, briefly described as "link- | |||
state towards the spines" and "distance vector towards the leaves". | state towards the spines" and "distance vector towards the leaves". | |||
In other words, "bottom" levels are flooding their link-state | In other words, "bottom" levels are flooding their link-state | |||
information in the "northern" direction while each node generates | information in the "northern" direction while each node generates | |||
under normal conditions a "default route" and floods it in the | under normal conditions a "default route" and floods it in the | |||
"southern" direction. This type of protocol naturally supports | "southern" direction. This type of protocol naturally supports | |||
highly desirable address aggregation. Alas, such aggregation could | highly desirable address aggregation. Alas, such aggregation could | |||
drop traffic in cases of misconfiguration or while failures are being | drop traffic in cases of misconfiguration or while failures are being | |||
resolved or even cause persistent network partitioning and this has | resolved. It could also cause persistent network partitioning, which | |||
to be addressed by some adequate mechanism. The approach RIFT takes | has to be addressed by some adequate mechanism. The approach RIFT | |||
is described in Section 6.5 and is based on automatic, sufficient | takes is described in Section 6.5 and is based on automatic, | |||
disaggregation of prefixes in case of link and node failures. | sufficient disaggregation of prefixes in case of link and node | |||
failures. | ||||
The protocol further provides: | The protocol further provides: | |||
* optional fully automated construction of Fat Tree topologies based | * optional fully automated construction of fat tree topologies based | |||
on detection of links without any configuration (Section 6.7) | on detection of links without any configuration (Section 6.7) | |||
while allowing for conventional configuration methods or an | while allowing for conventional configuration methods or an | |||
arbitrary mix of both, | arbitrary mix of both, | |||
* the minimum amount of routing state held by nodes, | * the minimum amount of routing state held by nodes, | |||
* automatic pruning and load balancing of topology flooding | * automatic pruning and load balancing of topology flooding | |||
exchanges over a sufficient subset of links (Section 6.3.9), | exchanges over a sufficient subset of links (Section 6.3.9), | |||
* automatic address aggregation (Section 6.3.8) and consequently | * automatic address aggregation (Section 6.3.8) and consequently | |||
skipping to change at line 337 | skipping to change at line 338 | |||
on their level of interest. The authors recommend reading the HTML | on their level of interest. The authors recommend reading the HTML | |||
or PDF versions of this document due to the inherent limitation of | or PDF versions of this document due to the inherent limitation of | |||
text version to represent complex figures. | text version to represent complex figures. | |||
The "Terminology" (Section 3.1) section should be used as a | The "Terminology" (Section 3.1) section should be used as a | |||
supporting reference as the document is read. | supporting reference as the document is read. | |||
The indications of direction (i.e., "top", "bottom", etc.) referenced | The indications of direction (i.e., "top", "bottom", etc.) referenced | |||
in Section 1 are of paramount importance. RIFT requires a topology | in Section 1 are of paramount importance. RIFT requires a topology | |||
with a sense of top and bottom in order to properly achieve a sorted | with a sense of top and bottom in order to properly achieve a sorted | |||
topology. Clos, Fat Tree, and other similarly structured networks | topology. Clos, fat tree, and other similarly structured networks | |||
are conducive to such requirements. Where RIFT allows for further | are conducive to such requirements. Where RIFT allows for further | |||
relaxation of these constraints will be mentioned later in this | relaxation of these constraints will be mentioned later in this | |||
section. | section. | |||
Several of the images in this document are annotated with "northern | Several of the images in this document are annotated with "northern | |||
view" or "southern view" to indicate perspective to the reader. A | view" or "southern view" to indicate perspective to the reader. A | |||
"northern view" should be interpreted as "from the top of the fabric | "northern view" should be interpreted as "from the top of the fabric | |||
looking down", whereas "southern view" should be interpreted as "from | looking down", whereas "southern view" should be interpreted as "from | |||
the bottom looking up". | the bottom looking up". | |||
skipping to change at line 455 | skipping to change at line 456 | |||
different algorithms whether the link should be included. | different algorithms whether the link should be included. | |||
Bow-tying: | Bow-tying: | |||
Traffic patterns in fully converged IP fabrics normally traverse | Traffic patterns in fully converged IP fabrics normally traverse | |||
the shortest route based on hop count towards their destination | the shortest route based on hop count towards their destination | |||
(e.g., leaf, spine, leaf). Some failure scenarios with partial | (e.g., leaf, spine, leaf). Some failure scenarios with partial | |||
routing information cause nodes to lose the required downstream | routing information cause nodes to lose the required downstream | |||
reachability to a destination and force traffic to utilize routes | reachability to a destination and force traffic to utilize routes | |||
that traverse higher levels in the fabric in order to turn south | that traverse higher levels in the fabric in order to turn south | |||
again using a different route to resolve reachability (e.g., leaf, | again using a different route to resolve reachability (e.g., leaf, | |||
spine-1, super-spine, spine-2, leaf). | spine-1, superspine, spine-2, leaf). | |||
Clos / Fat Tree: | Clos / fat tree: | |||
This document uses the terms "Clos" and "Fat Tree" interchangeably | This document uses the terms "Clos" and "fat tree" interchangeably | |||
where it always refers to a folded spine-and-leaf topology with | where it always refers to a folded spine-and-leaf topology with | |||
possibly multiple Points of Delivery (PoDs) and one or multiple | possibly multiple Points of Delivery (PoDs) and one or multiple | |||
Top of Fabric (ToF) planes. Several modifications such as leaf- | Top of Fabric (ToF) planes. Several modifications such as L2L | |||
to-leaf shortcuts and shortcuts that span multiple levels are | shortcuts and multi-level shortcuts are possible and described | |||
possible and described further in the document. | further in the document. | |||
Cost: | Cost: | |||
A natural number without a unit associated with two entities. The | A natural number without the unit associated with two entities. | |||
usual natural numbers algebra can be applied to costs. A cost may | The cost is a monoid under addition. A cost may be associated | |||
be associated with either a single link or prefix, or it may | with either a single link or prefix, or it may represent the sum | |||
represent the sum of costs (distance) of links in the path between | of costs (distance) of links in the path between two nodes. | |||
two nodes. | ||||
Crossbar: | Crossbar: | |||
Physical arrangement of ports in a switching matrix without | Physical arrangement of ports in a switching matrix without | |||
implying any further scheduling or buffering disciplines. | implying any further scheduling or buffering disciplines. | |||
Directed Acyclic Graph (DAG): | Directed Acyclic Graph (DAG): | |||
A finite directed graph with no directed cycles (loops). If links | A finite directed graph with no directed cycles (loops). If links | |||
in a Clos are considered as either being all directed towards the | in a Clos are considered as either being all directed towards the | |||
top or vice versa, each of two such graphs is a DAG. | top or vice versa, each of two such graphs is a DAG. | |||
skipping to change at line 495 | skipping to change at line 495 | |||
is performed to prevent traffic loss and suboptimal routing to the | is performed to prevent traffic loss and suboptimal routing to the | |||
more specific prefixes. | more specific prefixes. | |||
Distance: | Distance: | |||
The sum of costs (bound by the infinite cost constant) between two | The sum of costs (bound by the infinite cost constant) between two | |||
nodes. A distance is primarily used to express separation between | nodes. A distance is primarily used to express separation between | |||
two entities and can be used again as cost in another context. | two entities and can be used again as cost in another context. | |||
East-West (E-W) Link: | East-West (E-W) Link: | |||
A link between two nodes at the same level. East-West links are | A link between two nodes at the same level. East-West links are | |||
normally not part of Clos or Fat Tree topologies. | normally not part of Clos or fat tree topologies. | |||
Flood Repeater (FR): | Flood Repeater (FR): | |||
A node can designate one or more northbound neighbor nodes to be | A node can designate one or more northbound neighbor nodes to be | |||
flood repeaters. The flood repeaters are responsible for flooding | flood repeaters. The flood repeaters are responsible for flooding | |||
northbound TIEs further north. The document sometimes calls them | northbound TIEs further north. The document sometimes calls them | |||
flood leaders as well. | flood leaders as well. | |||
Folded Spine-and-Leaf: | Folded Spine-and-Leaf: | |||
In case the Clos fabric input and output stages are equivalent, | In case the Clos fabric input and output stages are equivalent, | |||
the fabric can be "folded" to build a "superspine" or top, which | the fabric can be "folded" to build a "superspine" or top, which | |||
skipping to change at line 525 | skipping to change at line 525 | |||
Leaf-to-Leaf (L2L) Shortcuts: | Leaf-to-Leaf (L2L) Shortcuts: | |||
East-West links at leaf level will need to be differentiated from | East-West links at leaf level will need to be differentiated from | |||
East-West links at other levels. | East-West links at other levels. | |||
Leaf: | Leaf: | |||
A node without southbound adjacencies. Level 0 implies a leaf in | A node without southbound adjacencies. Level 0 implies a leaf in | |||
RIFT, but a leaf does not have to be level 0. | RIFT, but a leaf does not have to be level 0. | |||
Level: | Level: | |||
Clos and Fat Tree networks are topologically partially ordered | Clos and fat tree networks are topologically partially ordered | |||
graphs, and "level" denotes the set of nodes at the same height in | graphs, and "level" denotes the set of nodes at the same height in | |||
such a network. Nodes at the top level (i.e., ToF) are at the | such a network. Nodes at the top level (i.e., ToF) are at the | |||
level with the highest value and count down to the nodes at the | level with the highest value and count down to the nodes at the | |||
bottom level (i.e., leaf) with the lowest value. A node will have | bottom level (i.e., leaf) with the lowest value. A node will have | |||
links to nodes one level down and/or one level up. In some | links to nodes one level down and/or one level up. In some | |||
circumstances, a node may have links to other nodes at the same | circumstances, a node may have links to other nodes at the same | |||
level. A leaf node may also have links to nodes multiple levels | level. A leaf node may also have links to nodes multiple levels | |||
higher. In RIFT, level 0 always indicates that a node is a leaf, | higher. In RIFT, level 0 always indicates that a node is a leaf, | |||
but a leaf does not have to be level 0. Level values can be configured | but a leaf does not have to be level 0. Level values can be configured | |||
manually or automatically as described in Section 6.7. As a final | manually or automatically as described in Section 6.7. | |||
footnote: Clos terminology often uses the concept of "stage", but | ||||
due to the folded nature of the Fat Tree, it is not used from this | | As a final footnote: Clos terminology often uses the concept | |||
point on to prevent misunderstandings. | | of "stage", but due to the folded nature of the fat tree, it | |||
| is not used from this point on to prevent misunderstandings. | ||||
LIE: | LIE: | |||
This is an acronym for a "Link Information Element" exchanged on | This is an acronym for a "Link Information Element" exchanged on | |||
all the system's links running RIFT to form _ThreeWay_ adjacencies | all the system's links running RIFT to form _ThreeWay_ adjacencies | |||
and carry information used to perform RIFT Zero Touch Provisioning | and carry information used to perform RIFT Zero Touch Provisioning | |||
(ZTP) of levels. | (ZTP) of levels. | |||
Metric: | Metric: | |||
Used interchangeably with "cost". | Used interchangeably with "cost". | |||
skipping to change at line 584 | skipping to change at line 585 | |||
Northbound Representation: | Northbound Representation: | |||
The subset of topology information flooded towards higher levels | The subset of topology information flooded towards higher levels | |||
of the fabric. | of the fabric. | |||
Overloaded: | Overloaded: | |||
Applies to a node advertising the _overload_ attribute as set. | Applies to a node advertising the _overload_ attribute as set. | |||
The overload attribute is carried in the _NodeFlags_ object of the | The overload attribute is carried in the _NodeFlags_ object of the | |||
encoding schema. | encoding schema. | |||
Point of Delivery (PoD): | Point of Delivery (PoD): | |||
A self-contained vertical slice or subset of a Clos or Fat Tree | A self-contained vertical slice or subset of a Clos or fat tree | |||
network normally containing only level 0 and level 1 nodes. A | network normally containing only level 0 and level 1 nodes. A | |||
node in a PoD communicates with nodes in other PoDs via the ToF | node in a PoD communicates with nodes in other PoDs via the ToF | |||
nodes. PoDs are numbered to distinguish them, and PoD value 0 | nodes. PoDs are numbered to distinguish them, and PoD value 0 | |||
(defined later in the encoding schema as _common.default_pod_) is | (defined later in the encoding schema as _common.default_pod_) is | |||
used to denote "undefined" or "any" PoD. | used to denote "undefined" or "any" PoD. | |||
Prefix TIE: | Prefix TIE: | |||
This is an acronym for a "Prefix Topology Information Element", | This is an acronym for a "Prefix Topology Information Element", | |||
and it contains all prefixes directly attached to this node in | and it contains all prefixes directly attached to this node in | |||
case of a North TIE and the necessary default routes the node | case of a North TIE and the necessary default routes the node | |||
skipping to change at line 607 | skipping to change at line 608 | |||
Radix: | Radix: | |||
A radix of a switch is the number of switching ports it provides. | A radix of a switch is the number of switching ports it provides. | |||
It's sometimes called "fanout" as well. | It's sometimes called "fanout" as well. | |||
Routing on the Host (RotH): | Routing on the Host (RotH): | |||
A modern data center architecture variant where servers/leaves are | A modern data center architecture variant where servers/leaves are | |||
multihomed and consequently participate in routing. | multihomed and consequently participate in routing. | |||
Security Envelope: | Security Envelope: | |||
RIFT packets are flooded within an authenticated security envelope | RIFT packets are flooded within an authenticated security envelope | |||
that allows to protect the integrity of information a node accepts | that optionally enables protection of the integrity of information | |||
if any of the mechanisms in Section 10.2 are used. This is | a node accepts if any of the mechanisms in Section 10.2 are used. | |||
further described in Section 6.9.3. | This is further described in Section 6.9.3. | |||
Shortest Path First (SPF): | Shortest Path First (SPF): | |||
A well-known graph algorithm attributed to Dijkstra [DIJKSTRA] | A well-known graph algorithm attributed to Dijkstra [DIJKSTRA] | |||
that establishes a tree of shortest paths from a source to | that establishes a tree of shortest paths from a source to | |||
destinations on the graph. The SPF acronym is used due to its | destinations on the graph. The SPF acronym is used due to its | |||
familiarity as a general term for the node reachability | familiarity as a general term for the node reachability | |||
calculations RIFT can employ to ultimately calculate routes, of | calculations RIFT can employ to ultimately calculate routes, of | |||
which Dijkstra's algorithm is a possible one. | which Dijkstra's algorithm is a possible one. | |||
South Reflection: | South Reflection: | |||
skipping to change at line 633 | skipping to change at line 634 | |||
aware of each other's node Topology Information Elements (TIEs). | aware of each other's node Topology Information Elements (TIEs). | |||
South SPF (S-SPF): | South SPF (S-SPF): | |||
A reachability calculation that is progressing southbound, for | A reachability calculation that is progressing southbound, for | |||
example, SPF that is using North Node TIEs only. | example, SPF that is using North Node TIEs only. | |||
South/Southbound and North/Northbound (Direction): | South/Southbound and North/Northbound (Direction): | |||
When describing protocol elements and procedures, in different | When describing protocol elements and procedures, in different | |||
situations, the directionality of the compass is used, i.e., | situations, the directionality of the compass is used, i.e., | |||
"lower", "south", and "southbound" mean moving towards the bottom | "lower", "south", and "southbound" mean moving towards the bottom | |||
of the Clos or Fat Tree network and "higher", "north", and | of the Clos or fat tree network and "higher", "north", and | |||
"northbound" mean moving towards the top of the Clos or Fat Tree | "northbound" mean moving towards the top of the Clos or fat tree | |||
network. | network. | |||
Southbound Link: | Southbound Link: | |||
A link to a node one level down or, in other words, one level | A link to a node one level down or, in other words, one level | |||
further south. | further south. | |||
Southbound Representation: | Southbound Representation: | |||
The subset of topology information sent towards a lower level. | The subset of topology information sent towards a lower level. | |||
Spine: | Spine: | |||
Any nodes north of leaves and south of ToF nodes. Multiple layers | Any nodes north of leaves and south of ToF nodes. Multiple layers | |||
of spines in a PoD are possible. | of spines in a PoD are possible. | |||
Superspine, Aggregation/Spine, and Edge/Leaf Switches: | Superspine, Aggregation/Spine, and Edge/Leaf Switches: | |||
Traditional level names in 5 stages folded Clos for levels 2, 1, | Typical level names in 5 stages folded Clos for levels 2, 1, and | |||
and 0, respectively (counting up from the bottom). We normalize | 0, respectively (counting up from the bottom). We normalize this | |||
this language to talk about ToF, Top-of-Pod (ToP), and leaves. | language to talk about ToF, Top-of-Pod (ToP), and leaves. | |||
System ID: | System ID: | |||
RIFT nodes identify themselves with a unique network-wide number | RIFT nodes identify themselves with a unique network-wide number | |||
when trying to build adjacencies or describe their topology. RIFT | when trying to build adjacencies or describe their topology. RIFT | |||
System IDs can be auto-derived or configured. | System IDs can be auto-derived or configured. | |||
ThreeWay Adjacency: | ThreeWay Adjacency: | |||
RIFT tries to form a unique adjacency between two nodes over a | RIFT tries to form a unique adjacency between two nodes over a | |||
point-to-point interface and exchange local configuration and | point-to-point interface and exchange local configuration and | |||
necessary RIFT ZTP information. An adjacency is only advertised | necessary RIFT ZTP information. An adjacency is only advertised | |||
skipping to change at line 773 | skipping to change at line 774 | |||
|Leaf111+~~~~~~~~~~+Leaf112| |Leaf121| |Leaf122| | |Leaf111+~~~~~~~~~~+Leaf112| |Leaf121| |Leaf122| | |||
+-+-----+ +-+---+-+ +--+--+-+ +-+-----+ | +-+-----+ +-+---+-+ +--+--+-+ +-+-----+ | |||
+ + \ / + + | + + \ / + + | |||
Prefix111 Prefix112 \ / Prefix121 Prefix122 | Prefix111 Prefix112 \ / Prefix121 Prefix122 | |||
multihomed | multihomed | |||
Prefix | Prefix | |||
+---------- PoD 1 ---------+ +---------- PoD 2 ---------+ | +---------- PoD 1 ---------+ +---------- PoD 2 ---------+ | |||
Figure 2: A Three-Level Spine-and-Leaf Topology | Figure 2: A Three-Level Spine-and-Leaf Topology | |||
____________________________________________________________________________ | ||||
| [Plane A] . [Plane B] . [Plane C] . [Plane D] | | ||||
|..........................................................................| | ||||
| +-+ . +-+ . +-+ . +-+ | | ||||
| |n| . |n| . |n| . |n| | | ||||
| +++ . +++ . +++ . +++ | | ||||
| . | | . . | | . . | | . . | | | | ||||
| . | | . . | | . . | | . . | | | | ||||
| +-+ | | . +-+ | | . +-+ | | . +-+ | | | | ||||
| |1| +-+ | . |1| +-+ | . |1| +-+ | . |1| +-+ | | | ||||
| +++ | | . +++ | | . +++ | | . +++ | | | | ||||
| || | | . || | | . || | | . || | | | | ||||
| || | | . || | | . || | | . || | | | | ||||
| |+--|--+| . |+--|--+| . |+--|--+| . |+--|----+ | | ||||
| | | || . | | || . | | || . | | || | | ||||
| | | || . | | || . | | || . | | +|---+ | | ||||
=====|===|==||=========|===|==||=========|===|==||=========|===|====|===|=== | | ||||
/ | | | || . | | || . | | || . | | / | | / | | ||||
/ | | | || . | | || . | | || . | | / ++---++ / | | ||||
/ | | | || . | | || . | | || . | | / | n | / | | ||||
/ | | | || . | | || . | | || . | | / +++-+++ / | | ||||
/ | ++---++ || . ++---++ || . ++---++ || . ++---++/ / | | ||||
/ | | 1 | || . | 2 | || . | 3 | || . | 4 |/ / | | ||||
/ | +++-+++ || . +++-+++ || . +++-+++ || . +++-+++/ / | | ||||
/ | || || || . || || || . || || || . || || / / | | ||||
/ \__||_||_____________||_||_____________||_||_____________||_||_/_________/_/ | ||||
/ || || || || || || || || / || || / | ||||
/ || || +-----------+| || || || || || / || || / | ||||
/ || || |+-----------|-||-------------+| || || || / || || / | ||||
/ || || ||+----------|-||--------------|-||-------------+| || / || || / | ||||
/ || || ||| | || | || +-------+ || / || || / | ||||
/ || || ||| | |+--------------|-||------|---+ || / || || / | ||||
/ || || ||| | | | || | | +-+| / || || / | ||||
/ || || ||| | +-----------+ | || | | | | / || || / | ||||
/ || +|-|||----------|------------+| | |+------|---|---|-+| / || || / | ||||
/ || +-|||----------|------------||---|-|-------|-+ | | || / || || / | ||||
/ || ||| | +------||---+ | | | | | || / || || / | ||||
/ |+----|||-----+ | |+-----||-----|-------+ | | | || / || || / | ||||
/ | ||| | | || || | | | | || / || || / | ||||
/ | ||| | | || || | +----|-|---+ || / || || / | ||||
/ | ||| | | || || | | | | || / || || / | ||||
/ |+----+|| | | || || | | | | || / || || / | ||||
/ || +---+| | | +---+| |+---+ | | | +---+ || / +++-+++ / | ||||
/ || |+---+ +---+| |+---+ +---+| |+---+ +----+| || / | n | / | ||||
/ || || || || || || || || / +++-+++ / | ||||
/ +++-+++ +++-+++ +++-+++ +++-+++/=========/ | ||||
/ | 1 | | 2 | | 3 | . . . | n |/ ^^ | ||||
/ +++-+++ +-----+ +-----+ +-----+/ // | ||||
/ / PoDs | ||||
================================================================== // | ||||
Figure 3: Topology with Multiple Planes | ||||
The topology in Figure 2 is referred to in all further | The topology in Figure 2 is referred to in all further | |||
considerations. This figure depicts a generic "single plane Fat | considerations. This figure depicts a generic "single-plane fat | |||
Tree" and the concepts explained using three levels apply by | tree" and the concepts explained using three levels apply by | |||
induction to further levels and higher degrees of connectivity. | induction to further levels and higher degrees of connectivity. | |||
(Artwork only available as SVG: see | ||||
https://www.rfc-editor.org/rfc/rfc9692.html) | ||||
Figure 3: Topology with Multiple Planes | ||||
Further, this document will also deal with designs that provide only | Further, this document will also deal with designs that provide only | |||
sparser connectivity and "partitioned spines", as shown in Figure 3 | sparser connectivity and "partitioned spines", as shown in Figure 3 | |||
and explained further in Section 5.2. | and explained further in Section 5.2. | |||
4. RIFT: Routing in Fat Trees | 4. RIFT: Routing in Fat Trees | |||
The remainder of this document presents the detailed specification of | The remainder of this document presents the detailed specification of | |||
the RIFT protocol, which in the most abstract terms has many | the RIFT protocol, which in the most abstract terms has many | |||
properties of a modified link-state protocol when distributing | properties of a modified link-state protocol when distributing | |||
information northbound and a distance-vector protocol when | information northbound and a distance-vector protocol when | |||
skipping to change at line 857 ¶ | skipping to change at line 811 ¶ | |||
The most singular property of RIFT is that it only floods link-state | The most singular property of RIFT is that it only floods link-state | |||
information northbound so that each level obtains the full topology | information northbound so that each level obtains the full topology | |||
of levels south of it. Link-State information is, with some | of levels south of it. Link-State information is, with some | |||
exceptions, not flooded East-West nor back south again. Exceptions | exceptions, not flooded East-West nor back south again. Exceptions | |||
like south reflection are explained in detail in Section 6.5.1, and | like south reflection are explained in detail in Section 6.5.1, and | |||
east-west flooding at the ToF level in multi-plane fabrics is | east-west flooding at the ToF level in multi-plane fabrics is | |||
outlined in Section 5.2. In the southbound direction, the necessary | outlined in Section 5.2. In the southbound direction, the necessary | |||
routing information required (normally just a default route as per | routing information required (normally just a default route as per | |||
Section 6.3.8) only propagates one hop south. Those nodes then | Section 6.3.8) only propagates one hop south. Those nodes then | |||
generate their own routing information and flood it south to avoid | generate their own routing information and flood it south to avoid | |||
the overhead of building an update per adjacency. For the moment, | the overhead of building an update per adjacency. The East-West | |||
describing the East-West direction is left out until later in the | direction is described later in the document. | |||
document. | ||||
Those information flow constraints create not only an anisotropic | Those information flow constraints create not only an anisotropic | |||
protocol (i.e., the information is not distributed "evenly" or | protocol (i.e., the information is not distributed "evenly" or | |||
"clumped" but summarized along the north-south gradient) but also a | "clumped" but summarized along the north-south gradient) but also a | |||
"smooth" information propagation where nodes do not receive the same | "smooth" information propagation where nodes do not receive the same | |||
information from multiple directions at the same time. Normally, | information from multiple directions at the same time. Normally, | |||
accepting the same reachability on any link, without understanding | accepting the same reachability on any link, without understanding | |||
its topological significance, forces tie-breaking on some kind of | its topological significance, forces tie-breaking on some kind of | |||
distance function. And such tie-breaking ultimately leads to hop-by- | distance function. And such tie-breaking ultimately leads to hop-by- | |||
hop forwarding by shortest paths only. In contrast to that, RIFT, | hop forwarding by shortest paths only. In contrast to that, RIFT, | |||
skipping to change at line 883 ¶ | skipping to change at line 836 ¶ | |||
valley-free [VFR] forwarding behavior. In the shortest terms, | valley-free [VFR] forwarding behavior. In the shortest terms, | |||
valley-free paths allow reversal of direction from a packet heading | valley-free paths allow reversal of direction from a packet heading | |||
northbound to southbound while permitting traversal of horizontal | northbound to southbound while permitting traversal of horizontal | |||
links in the northbound phase at most once. Those principles | links in the northbound phase at most once. Those principles | |||
guarantee loop-free forwarding and with that can take advantage of | guarantee loop-free forwarding and with that can take advantage of | |||
all such feasible paths on a fabric. This is another highly | all such feasible paths on a fabric. This is another highly | |||
desirable property if available bandwidth should be utilized to the | desirable property if available bandwidth should be utilized to the | |||
maximum extent possible. | maximum extent possible. | |||
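The valley-free rule described above can be sketched as a small check over the directions of the hops a packet takes. This is only an illustration under stated assumptions: the hop labels "N", "S", and "H" (horizontal, i.e., East-West) and the function name are hypothetical and not part of the protocol, and the horizontal hop is assumed to be allowed anywhere before the first southbound hop.

```python
def is_valley_free(hops):
    """Return True if a sequence of hop directions is valley-free.

    hops: iterable of "N" (northbound), "S" (southbound), or
    "H" (horizontal / East-West). Labels are illustrative only.
    """
    southbound = False        # set once the packet has turned south
    horizontal_used = False   # at most one horizontal hop is allowed
    for hop in hops:
        if hop == "S":
            southbound = True
        elif hop not in ("N", "H"):
            raise ValueError(f"unknown hop direction: {hop!r}")
        elif southbound:
            # Going north or sideways after turning south forms a valley.
            return False
        elif hop == "H":
            if horizontal_used:
                return False
            horizontal_used = True
    return True
```

For example, a path that climbs two levels and descends two levels ("NNSS") passes, while a path that turns north again after heading south ("NSN") is rejected.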
To account for the "northern" and the "southern" information split, | To account for the "northern" and the "southern" information split, | |||
the link state database is partitioned accordingly into "north | the link state database (LSDB) is partitioned accordingly into "north | |||
representation" and "south representation" Topology Information | representation" and "south representation" Topology Information | |||
Elements (TIEs). In the simplest terms, the North TIEs contain a | Elements (TIEs). In the simplest terms, the North TIEs contain a | |||
link-state topology description of lower levels and South TIEs simply | link-state topology description of lower levels and South TIEs simply | |||
carry a node description of the level above and default routes | carry a node description of the level above and default routes | |||
pointing north. This oversimplified view will be refined gradually | pointing north. This oversimplified view will be refined gradually | |||
in the following sections while introducing protocol procedures and | in the following sections while introducing protocol procedures and | |||
state machines at the same time. | state machines at the same time. | |||
5.2. Generalized Topology View | 5.2. Generalized Topology View | |||
This section and Section 6.5.2 are dedicated to multi-plane fabrics, | This section and Section 6.5.2 are dedicated to multi-plane fabrics, | |||
in contrast with the single plane designs where all ToF nodes are | in contrast with the single-plane designs where all ToF nodes are | |||
topologically equal and initially connected to all the switches at | topologically equal and initially connected to all the switches at | |||
the level below them. | the level below them. | |||
The multi-plane design is effectively a multidimensional switching | The multi-plane design is effectively a multidimensional switching | |||
matrix. To make that easier to visualize, this document introduces a | matrix. To make that easier to visualize, this document introduces a | |||
methodology depicting the connectivity in two-dimensional pictures. | methodology depicting the connectivity in two-dimensional pictures. | |||
Further, it can be leveraged that what is under consideration here is | Further, it can be leveraged that what is under consideration here is | |||
basically stacked crossbar fabrics where ports align "on top of each | basically stacked crossbar fabrics where ports align "on top of each | |||
other" in a regular fashion. | other" in a regular fashion. | |||
skipping to change at line 947 ¶ | skipping to change at line 900 ¶ | |||
ToF Plane: | ToF Plane: | |||
Set of ToFs that are aware of each other by means of south | Set of ToFs that are aware of each other by means of south | |||
reflection. Planes are designated by capital letters, e.g., plane | reflection. Planes are designated by capital letters, e.g., plane | |||
A. | A. | |||
N: | N: | |||
Denotes the number of independent ToF planes in a topology. | Denotes the number of independent ToF planes in a topology. | |||
R: | R: | |||
Denotes a redundancy factor, i.e., the number of connections a | Denotes a redundancy factor, i.e., the number of ToP nodes in a | |||
spine has towards a ToF plane. In a single plane design, K_TOP is | PoD that are connected to a ToF plane. In a single-plane design, | |||
equal to R. | R is equal to K_LEAF. | |||
Fallen Leaf: | Fallen Leaf: | |||
A fallen leaf in a plane Z is a switch that lost all connectivity | A fallen leaf in a plane Z is a switch that lost all connectivity | |||
northbound to Z. | northbound to Z. | |||
5.2.2. Clos as Crossed, Stacked Crossbars | 5.2.2. Clos as Crossed, Stacked Crossbars | |||
The typical topology for which RIFT is defined is built of P number | The typical topology for which RIFT is defined is built of P number | |||
of PoDs and connected together by S number of ToF nodes. A PoD node | of PoDs and connected together by S number of ToF nodes. A PoD node | |||
has K number of ports. From here on, half of them (K=Radix/2) are | has 2K number of ports. From here on, half of them (K=Radix/2) are | |||
assumed to connect host devices from the south, and the other half is | assumed to connect host devices from the south, and the other half is | |||
assumed to connect to interleaved PoD top-level switches to the | assumed to connect to interleaved PoD top-level switches to the | |||
north. The K ratio can be chosen differently without loss of | north. The K ratio can be chosen differently without loss of | |||
generality when port speeds differ or the fabric is oversubscribed, | generality when port speeds differ or the fabric is oversubscribed, | |||
but K=Radix/2 allows for more readable representation whereby there | but K=Radix/2 allows for more readable representation whereby there | |||
are as many ports facing north as south on any intermediate node. A | are as many ports facing north as south on any intermediate node. A | |||
node is hence represented in a schematic fashion with ports "sticking | node is hence represented in a schematic fashion with ports "sticking | |||
out" to its north and south, rather than by the usual real-world | out" to its north and south, rather than by the usual real-world | |||
front faceplate designs of the day. | front faceplate designs of the day. | |||
skipping to change at line 1008 ¶ | skipping to change at line 961 ¶ | |||
+----+ +------------------------------------------------+ | +----+ +------------------------------------------------+ | |||
|| || || || || || || | || || || || || || || | |||
Side Views | Side Views | |||
Figure 4: A Leaf Node, K_LEAF=6 | Figure 4: A Leaf Node, K_LEAF=6 | |||
The Radix of a PoD's top node may be different than that of the leaf | The Radix of a PoD's top node may be different than that of the leaf | |||
node. Though, more often than not, the same type of node is used for | node. Though, more often than not, the same type of node is used for | |||
both, effectively forming a square (K*K). In the general case, | both, effectively forming a square (K*K). In the general case, | |||
switches at the top of the PoD with K_TOP southern ports not | switches at the top of the PoD with K_TOP southern ports not | |||
necessarily equal to K_LEAF could be considered . For instance, in | necessarily equal to K_LEAF could be considered. For instance, in | |||
the representations below, we pick a 6-port K_LEAF and an 8-port | the representations below, we pick a 6-port K_LEAF and an 8-port | |||
K_TOP. In order to form a crossbar, K_TOP leaf nodes are necessary | K_TOP. In order to form a crossbar, K_TOP leaf nodes are necessary | |||
as illustrated in Figure 5. | as illustrated in Figure 5. | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| O | | O | | O | | O | | O | | O | | O | | O | | | O | | O | | O | | O | | O | | O | | O | | O | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
| O | | O | | O | | O | | O | | O | | O | | O | | | O | | O | | O | | O | | O | | O | | O | | O | | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
skipping to change at line 1073 ¶ | skipping to change at line 1026 ¶ | |||
^ | | ^ | | |||
| | | | | | |||
| ---------- ----------------------- | | | ---------- ----------------------- | | |||
+----- Leaf Node Top-of-PoD Node (Spine) --+ | +----- Leaf Node Top-of-PoD Node (Spine) --+ | |||
---------- ----------------------- | ---------- ----------------------- | |||
Figure 6: Northern View of a PoD's Spines, K_TOP=8 | Figure 6: Northern View of a PoD's Spines, K_TOP=8 | |||
Side views of this PoD are illustrated in Figures 7 and 8. | Side views of this PoD are illustrated in Figures 7 and 8. | |||
Connecting to Spine Nodes | Connecting to ToP Nodes | |||
|| || || || || || || || | || || || || || || || || | |||
+----------------------------------------------------------------+ N | +----------------------------------------------------------------+ N | |||
| Top-of-PoD Node (Sideways) | ^ | | Top-of-PoD Node (Sideways) | ^ | |||
+----------------------------------------------------------------+ | | +----------------------------------------------------------------+ | | |||
|| || || || || || || || * | || || || || || || || || * | |||
+----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | | |||
|Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| v | |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| |Leaf| v | |||
|Node| |Node| |Node| |Node| |Node| |Node| |Node| |Node| S | |Node| |Node| |Node| |Node| |Node| |Node| |Node| |Node| S | |||
+----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | +----+ +----+ +----+ +----+ +----+ +----+ +----+ +----+ | |||
|| || || || || || || || | || || || || || || || || | |||
Connecting to Client Nodes | Connecting to Client Nodes | |||
Figure 7: Side View of a PoD, K_TOP=8, K_LEAF=6 | Figure 7: Side View of a PoD, K_TOP=8, K_LEAF=6 | |||
Connecting to Spine Nodes | Connecting to ToP Nodes | |||
|| || || || || || | || || || || || || | |||
+----+ +----+ +----+ +----+ +----+ +----+ N | +----+ +----+ +----+ +----+ +----+ +----+ N | |||
|ToP | |ToP | |ToP | |ToP | |ToP | |ToP | ^ | |ToP | |ToP | |ToP | |ToP | |ToP | |ToP | ^ | |||
|Node| |Node| |Node| |Node| |Node| |Node| | | |Node| |Node| |Node| |Node| |Node| |Node| | | |||
+----+ +----+ +----+ +----+ +----+ +----+ * | +----+ +----+ +----+ +----+ +----+ +----+ * | |||
|| || || || || || | | || || || || || || | | |||
+------------------------------------------------+ v | +------------------------------------------------+ v | |||
| Leaf Node (Sideways) | S | | Leaf Node (Sideways) | S | |||
+------------------------------------------------+ | +------------------------------------------------+ | |||
skipping to change at line 1117 ¶ | skipping to change at line 1070 ¶ | |||
As a next step, observe that a resulting PoD can be abstracted as a | As a next step, observe that a resulting PoD can be abstracted as a | |||
bigger node with a number K of K_POD = K_TOP * K_LEAF, and the design | bigger node with a number K of K_POD = K_TOP * K_LEAF, and the design | |||
can recurse. | can recurse. | |||
It will be critical at this point that, before progressing further, | It will be critical at this point that, before progressing further, | |||
the concept and the picture of "crossed crossbars" is understood. | the concept and the picture of "crossed crossbars" is understood. | |||
Else, the following considerations might be difficult to comprehend. | Else, the following considerations might be difficult to comprehend. | |||
To continue, the PoDs are interconnected with each other through a | To continue, the PoDs are interconnected with each other through a | |||
ToF node at the very top or the north edge of the fabric. The | ToF node at the very top or the north edge of the fabric. The | |||
resulting ToF is *not* partitioned if and only if (IIF) every PoD | resulting ToF is *not* partitioned if and only if (IIF) every ToP | |||
top-level node (spine) is connected to every ToF node. This topology | node is connected to every ToF node. This topology is also referred | |||
is also referred to as a single plane configuration and is quite | to as a single-plane configuration and is quite popular due to its | |||
popular due to its simplicity. There are K_TOP ToF nodes and K_LEAF | simplicity. There are K_TOP ToF nodes and K_LEAF ToP nodes because | |||
ToP nodes because each port of a ToP node connects to a different ToF | each port of a ToP node connects to a different ToF node. | |||
node. Consequently, it will take at least P * K_LEAF ports on a ToF | Consequently, it will take at least P * K_LEAF ports on a ToF node to | |||
node to connect to each of the K_LEAF ToP nodes of the P PoDs. | connect to each of the K_LEAF ToP nodes of the P PoDs. Figure 9 | |||
Figure 9 illustrates this, looking at P=3 PoDs from above and 2 | illustrates this, looking at P=3 PoDs from above and 2 sides. The | |||
sides. The large view is the one from above, with the 8 ToF of 3 * 6 | large view is the one from above, with the 8 ToF of 3 * 6 ports each | |||
ports each interconnecting the PoDs and every ToP Node being | interconnecting the PoDs and every ToP Node being connected to every | |||
connected to every ToF node. | ToF node. | |||
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] <-----+ | [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] <-----+ | |||
| | | | | | | | | | | | | | | | | | | | |||
[=================================] | -------------- | [=================================] | -------------- | |||
| | | | | | | | +----- ToF | | | | | | | | | +----- ToF | |||
[ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] +----- Node ---+ | [ ] [ ] [ ] [ ] [ ] [ ] [ ] [ ] +----- Node ---+ | |||
| -------------- | | | -------------- | | |||
| v | | v | |||
+-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ <-----+ +-+ | +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ <-----+ +-+ | |||
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | |||
skipping to change at line 1161 ¶ | skipping to change at line 1114 ¶ | |||
| | | | | | | | | | | | | | | | -+ +- +-+ v | | | | | | | | | | | | | | | | | | | -+ +- +-+ v | | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] | ----- | --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] | ----- | --| |--[ ]--| | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] +--- PoD ---+ --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] +--- PoD ---+ --| |--[ ]--| | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] | ----- | --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] | ----- | --| |--[ ]--| | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | [ |o| |o| |o| |o| |o| |o| |o| |o| ] | | --| |--[ ]--| | | |||
| | | | | | | | | | | | | | | | -+ +- +-+ | | | | | | | | | | | | | | | | | | | -+ +- +-+ | | | |||
+-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ | +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ | |||
Figure 9: Fabric Spines and ToFs in Single Plane Design, 3 PoDs | Figure 9: Fabric Spines and ToFs in Single-Plane Design, 3 PoDs | |||
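The single-plane dimensioning stated above can be written out as a short calculation. The sketch below is illustrative only; the function name and dictionary keys are hypothetical, and the example values (P=3, K_TOP=8, K_LEAF=6) are the ones used in the figures.

```python
def single_plane_dimensions(p, k_top, k_leaf):
    """Dimension a non-partitioned (single-plane) fabric.

    p:      number of PoDs
    k_top:  ports of a ToP node in one direction
    k_leaf: ports of a leaf node in one direction
    """
    return {
        # Each northern port of a ToP node connects to a different ToF node.
        "tof_nodes": k_top,
        # Each northern port of a leaf connects to a different ToP node.
        "top_nodes_per_pod": k_leaf,
        # Every ToF node connects to every ToP node of every PoD.
        "min_tof_ports": p * k_leaf,
        # Port count of the PoD when abstracted as one bigger node.
        "k_pod": k_top * k_leaf,
    }


if __name__ == "__main__":
    print(single_plane_dimensions(p=3, k_top=8, k_leaf=6))
```

With P=3, K_TOP=8, and K_LEAF=6, this yields 8 ToF nodes of at least 3 * 6 = 18 ports each, matching the description of Figure 9.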
The top view can be collapsed into a third dimension where the hidden | The top view can be collapsed into a third dimension where the hidden | |||
depth index is representing the PoD number. One PoD can be shown | depth index is representing the PoD number. One PoD can be shown | |||
then as a class of PoDs and hence save one dimension in the | then as a class of PoDs and hence save one dimension in the | |||
representation. The spine node expands in the depth and the vertical | representation. The ToF node expands in the depth and the vertical | |||
dimensions, whereas the PoD top-level nodes are constrained in the | dimensions, whereas the ToP nodes are constrained in the horizontal | |||
horizontal dimension. A port in the 2-D representation effectively | dimension. A port in the 2-D representation effectively represents | |||
represents the class of all the ports at the same position in all the | the class of all the ports at the same position in all the PoDs that | |||
PoDs that are projected in its position along the depth axis. This | are projected in its position along the depth axis. This is shown in | |||
is shown in Figure 10. | Figure 10. | |||
/ / / / / / / / / / / / / / / / | / / / / / / / / / / / / / / / / | |||
/ / / / / / / / / / / / / / / / | / / / / / / / / / / / / / / / / | |||
/ / / / / / / / / / / / / / / / | / / / / / / / / / / / / / / / / | |||
/ / / / / / / / / / / / / / / / ] | / / / / / / / / / / / / / / / / ] | |||
+-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ ]] | +-+ +-+ +-+ +-+ +-+ +-+ +-+ +-+ ]] | |||
| | | | | | | | | | | | | | | | ] ----------------------- | | | | | | | | | | | | | | | | | ] ----------------------- | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] <-- Top of PoD Node (Spine) | [ |o| |o| |o| |o| |o| |o| |o| |o| ] <-- Top of PoD Node (Spine) | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] ----------------------- | [ |o| |o| |o| |o| |o| |o| |o| |o| ] ----------------------- | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ]]]] | [ |o| |o| |o| |o| |o| |o| |o| |o| ]]]] | |||
skipping to change at line 1194 ¶ | skipping to change at line 1147 ¶ | |||
[ |o| |o| |o| |o| |o| |o| |o| |o| ] // (depth) | [ |o| |o| |o| |o| |o| |o| |o| |o| ] // (depth) | |||
| |/| |/| |/| |/| |/| |/| |/| |/ // | | |/| |/| |/| |/| |/| |/| |/| |/ // | |||
+-+ +-+ +-+/+-+/+-+ +-+ +-+ +-+ // | +-+ +-+ +-+/+-+/+-+ +-+ +-+ +-+ // | |||
^ | ^ | |||
| -------- | | -------- | |||
+----- ToF Node | +----- ToF Node | |||
-------- | -------- | |||
Figure 10: Collapsed Northern View of a Fabric for Any Number of PoDs | Figure 10: Collapsed Northern View of a Fabric for Any Number of PoDs | |||
As simple as a single plane deployment is, it introduces a limit due | As simple as a single-plane deployment is, it introduces a limit due | |||
to the bound on the available radix of the ToF nodes that has to be | to the bound on the available radix of the ToF nodes that has to be | |||
at least P * K_LEAF. Nevertheless, it will become clear that a | at least P * K_LEAF. Nevertheless, it will become clear that a | |||
distinct advantage of a connected or non-partitioned ToF is that all | distinct advantage of a connected or non-partitioned ToF is that all | |||
failures can be resolved by simple, non-transitive, positive | failures can be resolved by simple, non-transitive, positive | |||
disaggregation (i.e., nodes advertising more specific prefixes with | disaggregation (i.e., nodes advertising more specific prefixes with | |||
the default to the level below them that is not propagated further | the default to the level below them that is not propagated further | |||
down the fabric) as described in Section 6.5.1. In other words, non- | down the fabric) as described in Section 6.5.1. In other words, non- | |||
partitioned ToF nodes can always reach nodes below or withdraw the | partitioned ToF nodes can always reach nodes below or withdraw the | |||
routes from PoDs they cannot reach unambiguously. And with this, | routes from PoDs they cannot reach unambiguously. And with this, | |||
positive disaggregation can heal all failures and still allow all the | positive disaggregation can heal all failures and still allow all the | |||
ToF nodes to be aware of each other via south reflection. | ToF nodes to be aware of each other via south reflection. | |||
Disaggregation will be explained in further detail in Section 6.5. | Disaggregation will be explained in further detail in Section 6.5. | |||
In order to scale beyond the "single plane limit", the ToF can be | In order to scale beyond the "single-plane limit", the ToF can be | |||
partitioned into N number of identically wired planes where N is an | partitioned into N number of identically wired planes where N is an | |||
integer divisor of K_LEAF. The 1:1 ratio and the desired symmetry | integer divisor of K_LEAF. The 1:1 ratio and the desired symmetry | |||
are still served, this time with (K_TOP*N) ToF nodes, each of | are still served, this time with (K_TOP*N) ToF nodes, each of | |||
(P*K_LEAF/N) ports. N=1 represents a non-partitioned Spine, and | (P*K_LEAF/N) ports. N=1 represents a non-partitioned ToF | |||
N=K_LEAF is a maximally partitioned Spine. Further, if R is any | (superspine), and N=K_LEAF is a maximally partitioned ToF. Further, | |||
integer divisor of K_LEAF, then N=K_LEAF/R is a feasible number of | if R is any integer divisor of K_LEAF, then N=K_LEAF/R is a feasible | |||
planes and R is a redundancy factor that denotes the number of | number of planes and R is a redundancy factor that denotes the number | |||
independent paths between 2 leaves within a plane. It proves | of independent paths between 2 leaves within a plane. It proves | |||
convenient for deployments to use a radix for the leaf nodes that is | convenient for deployments to use a radix for the leaf nodes that is | |||
a power of 2 so they can pick a number of planes that is a lower | a power of 2 so they can pick a number of planes that is a lower | |||
power of 2. The example in Figure 11 splits the Spine in 2 planes | power of 2. The example in Figure 11 splits the ToF in 2 planes with | |||
with a redundancy factor of R=3, meaning that there are 3 non- | a redundancy factor of R=3, meaning that there are 3 non-intersecting | |||
intersecting paths between any leaf node and any ToF node. A ToF | paths between any leaf node and any ToF node. A ToF node must have, | |||
node must have, in this case, at least 3*P ports and be directly | in this case, at least 3*P ports and be directly connected to 3 of | |||
connected to 3 of the 6 ToP nodes (spines) in each PoD. The ToP | the 6 ToP nodes (spines) in each PoD. The ToP nodes are represented | |||
nodes are represented horizontally with K_TOP=8 ports northwards | horizontally with K_TOP=8 ports northwards each. | |||
each. | ||||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
| | O | | O | | O | | O | | O | | O | | O | | O | | | | | O | | O | | O | | O | | O | | O | | O | | O | | | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
| | O | | O | | O | | O | | O | | O | | O | | O | | | | | O | | O | | O | | O | | O | | O | | O | | O | | | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | |||
| | O | | O | | O | | O | | O | | O | | O | | O | | | | | O | | O | | O | | O | | O | | O | | O | | O | | | |||
skipping to change at line 1263 ¶ | skipping to change at line 1215 ¶ | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | |||
^ | ^ | |||
| | | | |||
| --------------------- | | --------------------- | |||
+----- ToF Node Across Depth | +----- ToF Node Across Depth | |||
--------------------- | --------------------- | |||
Figure 11: Northern View of a Multi-Plane ToF Level, K_LEAF=6, N=2 | Figure 11: Northern View of a Multi-Plane ToF Level, K_LEAF=6, N=2 | |||
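The relationship between N, R, and the ToF port count can be checked with a small calculation. This is a sketch under stated assumptions: the function names and dictionary keys are hypothetical, and P=3 and K_TOP=8 are taken from the earlier single-plane example rather than mandated by this figure.

```python
def feasible_plane_counts(k_leaf):
    """All N that are integer divisors of K_LEAF."""
    return [n for n in range(1, k_leaf + 1) if k_leaf % n == 0]


def multi_plane_dimensions(p, k_top, k_leaf, n):
    """Dimension a ToF level partitioned into n identically wired planes."""
    if n < 1 or k_leaf % n != 0:
        raise ValueError("N must be a positive integer divisor of K_LEAF")
    r = k_leaf // n  # redundancy factor R = K_LEAF / N
    return {
        "planes": n,
        "redundancy": r,
        "tof_nodes": k_top * n,   # K_TOP * N ToF nodes in total
        "min_tof_ports": p * r,   # P * K_LEAF / N ports per ToF node
    }


if __name__ == "__main__":
    for n in feasible_plane_counts(6):
        print(multi_plane_dimensions(p=3, k_top=8, k_leaf=6, n=n))
```

For K_LEAF=6 and N=2, this gives R=3 and at least 3*P ports per ToF node. N=1 reproduces the single-plane case, and N=6 gives R=1 with P ports per ToF node.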
At the extreme end of the spectrum, it is even possible to fully | At the extreme end of the spectrum, it is even possible to fully | |||
partition the spine with N=K_LEAF and R=1 while maintaining | partition the ToF with N=K_LEAF and R=1 while maintaining | |||
connectivity between each leaf node and each ToF node. In that case, | connectivity between each leaf node and each ToF node. In that case, | |||
the ToF node connects to a single port per PoD, so it appears as a | the ToF node connects to a single port per PoD, so it appears as a | |||
single port in the projected view represented in Figure 12. The | single port in the projected view represented in Figure 12. The | |||
number of ports required on the spine node is more than or equal to | number of ports required on the ToF node is more than or equal to P, | |||
P, i.e., the number of PoDs. | i.e., the number of PoDs. | |||
Plane 1 | Plane 1 | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ -+ | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ -+ | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | | |||
| | O | | O | | O | | O | | O | | O | | O | | O | | | | | | O | | O | | O | | O | | O | | O | | O | | O | | | | |||
+-| |--| |--| |--| |--| |--| |--| |--| |-+ | | +-| |--| |--| |--| |--| |--| |--| |--| |-+ | | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | | |||
----------- . ------------------- . ------------ . ------- | | ----------- . ------------------- . ------------ . ------- | | |||
Plane 2 | | Plane 2 | | |||
+---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | | +---+ +---+ +---+ +---+ +---+ +---+ +---+ +---+ | | |||
skipping to change at line 1332 ¶ | skipping to change at line 1284 ¶ | |||
for fabrics with a north-south orientation and a high level of | for fabrics with a north-south orientation and a high level of | |||
interleaving paths. A non-partitioned fabric makes a total loss of | interleaving paths. A non-partitioned fabric makes a total loss of | |||
connectivity between a ToF node at the north and a leaf node at the | connectivity between a ToF node at the north and a leaf node at the | |||
south a very rare but possible occasion that is fully healed by | south a very rare but possible occasion that is fully healed by | |||
positive disaggregation as described in Section 6.5.1. In large | positive disaggregation as described in Section 6.5.1. In large | |||
fabrics or fabrics built from switches with a low radix, the ToF may | fabrics or fabrics built from switches with a low radix, the ToF may | |||
often become partitioned in planes, which makes it more likely that a | often become partitioned in planes, which makes it more likely that a | |||
given leaf is only reachable from a subset of the ToF nodes. This | given leaf is only reachable from a subset of the ToF nodes. This | |||
makes some further considerations necessary. | makes some further considerations necessary. | |||
A "Fallen Leaf" is a leaf that can be reached by only a subset of ToF | A "fallen leaf" is a leaf that can be reached by only a subset of ToF | |||
nodes due to missing connectivity. If R is the redundancy factor, | nodes due to missing connectivity. If R is the redundancy factor, | |||
then it takes at least R breakages to reach a "Fallen Leaf" | then it takes at least R breakages to reach a "fallen leaf" | |||
situation. | situation. | |||
In a maximally partitioned fabric, the redundancy factor is R=1, so | In a maximally partitioned fabric, the redundancy factor is R=1, so | |||
any breakage in the fabric will cause one or more fallen leaves in | any breakage in the fabric will cause one or more fallen leaves in | |||
the affected plane. R=2 guarantees that a single breakage will not | the affected plane. R=2 guarantees that a single breakage will not | |||
cause a fallen leaf. However, not all cases require disaggregation. | cause a fallen leaf. However, not all cases require disaggregation. | |||
The following cases do not require particular action: | The following cases do not require particular action: | |||
* If a southern link on a node goes down, then connectivity through | * If a southern link on a node goes down, then connectivity through | |||
that node is lost for all nodes south of it. There is no need to | that node is lost for all nodes south of that link. There is no | |||
disaggregate since the connectivity to this node is lost for all | need to disaggregate since the connectivity to this node is lost | |||
spine nodes in the same fashion. | for all spine nodes in the same fashion. | |||
* If a ToF node goes down, then northern traffic towards it is | * If a ToF node goes down, then northern traffic towards it is | |||
routed via alternate ToF nodes in the same plane and there is no | routed via alternate ToF nodes in the same plane and there is no | |||
need to disaggregate routes. | need to disaggregate routes. | |||
In a general manner, the mechanism of non-transitive, positive | In a general manner, the mechanism of non-transitive, positive | |||
disaggregation is sufficient when the disaggregating ToF nodes | disaggregation is sufficient when the disaggregating ToF nodes | |||
collectively connect to all the ToP nodes in the broken plane. This | collectively connect to all the ToP nodes in the broken plane. This | |||
happens in the following case: | happens in the following case: | |||
skipping to change at line 1377 ¶ | skipping to change at line 1329 ¶ | |||
* If the breakage is the last northern link from a leaf node within | * If the breakage is the last northern link from a leaf node within | |||
a plane (there is only one such link in a maximally partitioned | a plane (there is only one such link in a maximally partitioned | |||
fabric) that goes down, then connectivity to all unicast prefixes | fabric) that goes down, then connectivity to all unicast prefixes | |||
attached to the leaf node is lost within the plane where the link | attached to the leaf node is lost within the plane where the link | |||
is located. Southern Reflection by a leaf node, e.g., between ToP | is located. Southern Reflection by a leaf node, e.g., between ToP | |||
nodes, if the PoD has only 2 levels, happens in between planes, | nodes, if the PoD has only 2 levels, happens in between planes, | |||
allowing the ToP nodes to detect the problem within the PoD where | allowing the ToP nodes to detect the problem within the PoD where | |||
it occurs and positively disaggregate. The breakage can be | it occurs and positively disaggregate. The breakage can be | |||
observed by the ToF nodes in the same plane through the north | observed by the ToF nodes in the same plane through the north | |||
flooding of TIEs from the ToP nodes However, the ToF nodes need to | flooding of TIEs from the ToP nodes. However, the ToF nodes need | |||
be aware of all the affected prefixes for the negative, possibly | to be aware of all the affected prefixes for the negative, | |||
transitive, disaggregation to be fully effective (i.e., a node | possibly transitive, disaggregation to be fully effective (i.e., a | |||
advertising in the control plane that it cannot reach a certain | node advertising in the control plane that it cannot reach a | |||
more specific prefix than default, whereas such disaggregation in | certain more specific prefix than the default prefix, whereas such | |||
the extreme condition must be propagated further down southbound). | disaggregation in the extreme condition must be propagated further | |||
The problem can also be observed by the ToF nodes in the other | down southbound). The problem can also be observed by the ToF | |||
planes through the flooding of North TIEs from the affected leaf | nodes in the other planes through the flooding of North TIEs from | |||
nodes, together with non-node North TIEs, which indicate the | the affected leaf nodes, together with non-node North TIEs, which | |||
affected prefixes. To be effective in that case, the positive | indicate the affected prefixes. To be effective in that case, the | |||
disaggregation must reach down to the nodes that make the plane | positive disaggregation must reach down to the nodes that make the | |||
selection, which are typically the ingress leaf nodes. The | plane selection, which are typically the ingress leaf nodes. The | |||
information is not useful for routing in the intermediate levels. | information is not useful for routing in the intermediate levels. | |||
* If the breakage is a ToP node in a maximally partitioned fabric | * If the breakage is a ToP node in a maximally partitioned fabric | |||
(in which case it is the only ToP node serving the plane in that | (in which case it is the only ToP node serving the plane in that | |||
PoD that goes down), then the connectivity to all the nodes in the | PoD that goes down), then the connectivity to all the nodes in the | |||
PoD is lost within the plane where the ToP node is located. | PoD is lost within the plane where the ToP node is located. | |||
Consequently, all leaves of the PoD fall in this plane. Since the | Consequently, all leaves of the PoD fall in this plane. Since the | |||
Southern Reflection between the ToF nodes happens only within a | Southern Reflection between the ToF nodes happens only within a | |||
plane, ToF nodes in other planes cannot discover fallen leaves in | plane, ToF nodes in other planes cannot discover fallen leaves in | |||
a different plane. They also cannot determine beyond their local | a different plane. They also cannot determine beyond their local | |||
skipping to change at line 1459 ¶ | skipping to change at line 1411 ¶ | |||
possible to connect them together by interplane bidirectional rings | possible to connect them together by interplane bidirectional rings | |||
as illustrated in Figure 13. The rings will be used to exchange full | as illustrated in Figure 13. The rings will be used to exchange full | |||
north topology information between planes. All ToFs having the same | north topology information between planes. All ToFs having the same | |||
north topology allows, by the means of transitive, negative | north topology allows, by the means of transitive, negative | |||
disaggregation described in Section 6.5.2, to efficiently fix any | disaggregation described in Section 6.5.2, to efficiently fix any | |||
possible fallen leaf scenario. Somewhat as a side effect, the | possible fallen leaf scenario. Somewhat as a side effect, the | |||
exchange of information fulfills the requirement for a full view of | exchange of information fulfills the requirement for a full view of | |||
the fabric topology at the ToF level without the need to collate it | the fabric topology at the ToF level without the need to collate it | |||
from multiple points. | from multiple points. | |||
____________________________________________________________________________ | _______________________________________________________________________ | |||
| [Plane A] . [Plane B] . [Plane C] . [Plane D] | | | [Plane A] . [Plane B] . [Plane C] . [Plane D] | | |||
|..........................................................................| | |.....................................................................| | |||
| +-------------------------------------------------------------+ | | | +------------------------------------------------------------+ | | |||
| | +---+ . +---+ . +---+ . +---+ | | | | | +---+ . +---+ . +---+ . +---+ | | | |||
| +-+ n +-------------+ n +-------------+ n +-------------+ n +-+ | | | +-+ n +-------------+ n +-------------+ n +------------+ n +-+ | | |||
| +--++ . +-+++ . +-+++ . +--++ | | | +--++ . +-+++ . +-+++ . +--++ | | |||
| || . || . || . || | | | || . || . || . || | | |||
| +---------||---------------||----------------||---------------+ || | | | +---------||---------------||----------------||--------------+ || | | |||
| | +---+ || . +---+ || . +---+ || . +---+ | || | | | | +---+ || . +---+ || . +---+ || . +---+ | || | | |||
| +-+ 1 +---||--------+ 1 +--||---------+ 1 +--||---------+ 1 +-+ || | | | +-+ 1 +---||--------+ 1 +--||---------+ 1 +--||--------+ 1 +-+ || | | |||
| +--++ || . +-+++ || . +-+++ || . +-+++ || | | | +--++ || . +-+++ || . +-+++ || . +-+++ || | | |||
| || || . || || . || || . || || | | | || || . || || . || || . || || | | |||
| || || . || || . || || . || || | | | || || . || || . || || . || || | | |||
Figure 13: Using Rings to Bring All Planes and Bind Them at the ToF | Figure 13: Using Rings to Bring All Planes and Bind Them at the ToF | |||
5.5. Addressing the Fallen Leaves Problem | 5.5. Addressing the Fallen Leaves Problem | |||
One consequence of the "Fallen Leaf" problem is that some prefixes | One consequence of the "fallen leaf" problem is that some prefixes | |||
attached to the fallen leaf become unreachable from some of the ToF | attached to the fallen leaf become unreachable from some of the ToF | |||
nodes. RIFT defines two methods to address this issue, denoted as | nodes. RIFT defines two methods to address this issue, denoted as | |||
positive disaggregation and negative disaggregation. Both methods | positive disaggregation and negative disaggregation. Both methods | |||
flood corresponding types of South TIEs to advertise the impacted | flood corresponding types of South TIEs to advertise the impacted | |||
prefix(es). | prefix(es). | |||
When used for the operation of disaggregation, a positive South TIE, | When used for the operation of disaggregation, a positive South TIE, | |||
as usual, indicates reachability to a prefix of given length and all | as usual, indicates reachability to a prefix of given length and all | |||
addresses subsumed by it. In contrast, a negative route | addresses subsumed by it. In contrast, a negative route | |||
advertisement indicates that the origin cannot route to the | advertisement indicates that the origin cannot route to the | |||
skipping to change at line 1576 ¶ | skipping to change at line 1528 ¶ | |||
observable behavior equivalent to the behavior of the standardized | observable behavior equivalent to the behavior of the standardized | |||
FSMs. | FSMs. | |||
The FSMs can use "timers" for different situations. Those timers are | The FSMs can use "timers" for different situations. Those timers are | |||
started through actions, and their expiration leads to queuing of | started through actions, and their expiration leads to queuing of | |||
corresponding events to be processed. | corresponding events to be processed. | |||
The term "holdtime" is used often as shorthand for "holddown timer" | The term "holdtime" is used often as shorthand for "holddown timer" | |||
and signifies either the length of the holding down period or the | and signifies either the length of the holding down period or the | |||
timer used to expire after such period. Such timers are used to | timer used to expire after such period. Such timers are used to | |||
"hold down" the state within an FSM that is cleaned if the machine | "holddown" the state within an FSM that is cleaned if the machine | |||
triggers a _HoldtimeExpired_ event. | triggers a _HoldtimeExpired_ event. | |||
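
The timer behavior described above can be illustrated with a minimal sketch. It is not taken from the specification: the class name, the method names, and the simulated clock are assumptions made for illustration only. It shows that an expiring timer does not act on the FSM directly but only queues an event, and that processing a _HoldtimeExpired_ event cleans the held state.

```python
from collections import deque


class TimerDrivenFsm:
    """Toy FSM (hypothetical): actions start timers, expiry queues events."""

    def __init__(self):
        self.events = deque()   # events waiting to be processed
        self.timers = {}        # event name -> absolute expiry time
        self.held_state = None  # state kept alive during the holddown period

    def start_timer(self, event_name, now, interval):
        # Action: (re)start a timer that queues `event_name` when it expires.
        self.timers[event_name] = now + interval

    def tick(self, now):
        # Expired timers only queue their event; they change no state here.
        expired = [name for name, t in self.timers.items() if t <= now]
        for name in expired:
            del self.timers[name]
            self.events.append(name)

    def process_events(self):
        # Drain the queue in arrival order.
        while self.events:
            event = self.events.popleft()
            if event == "HoldtimeExpired":
                self.held_state = None  # held state is cleaned


fsm = TimerDrivenFsm()
fsm.held_state = {"neighbor": "node-a"}  # illustrative held state
fsm.start_timer("HoldtimeExpired", now=0.0, interval=3.0)

fsm.tick(now=1.0)                # timer still running, nothing queued
fsm.process_events()
state_before_expiry = fsm.held_state

fsm.tick(now=3.0)                # timer expires, event is queued
fsm.process_events()
state_after_expiry = fsm.held_state
```

An implementation is free to structure its timers differently as long as the observable behavior matches the standardized FSMs.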
6.1. Transport | 6.1. Transport | |||
All normative RIFT packet structures and their contents are defined | All normative RIFT packet structures and their contents are defined | |||
in the Thrift [thrift] models in Section 7. The packet structure | in the Thrift [thrift] models in Section 7. The packet structure | |||
itself is defined in _ProtocolPacket_, which contains the packet | itself is defined in _ProtocolPacket_, which contains the packet | |||
header in _PacketHeader_ and the packet contents in _PacketContent_. | header in _PacketHeader_ and the packet contents in _PacketContent_. | |||
_PacketContent_ is a union of the LIE, TIE, TIDE, and TIRE packets, | _PacketContent_ is a union of the LIE, TIE, TIDE, and TIRE packets, | |||
which are subsequently defined in _LIEPacket_, _TIEPacket_, | which are subsequently defined in _LIEPacket_, _TIEPacket_, | |||
skipping to change at line 1611 ¶ | skipping to change at line 1563 ¶ | |||
state, at which point it is ready to exchange TIEs as described in | state, at which point it is ready to exchange TIEs as described in | |||
Section 6.3. The adjacency exchanges RIFT ZTP information | Section 6.3. The adjacency exchanges RIFT ZTP information | |||
(Section 6.7) in any of the states, i.e., it is not necessary to | (Section 6.7) in any of the states, i.e., it is not necessary to | |||
reach _ThreeWay_ for ZTP to operate. | reach _ThreeWay_ for ZTP to operate. | |||
RIFT supports any combination of IPv4 and IPv6 addressing, including | RIFT supports any combination of IPv4 and IPv6 addressing, including | |||
link-local scope, on the fabric to form adjacencies with the | link-local scope, on the fabric to form adjacencies with the | |||
additional capability for forwarding paths that are capable of | additional capability for forwarding paths that are capable of | |||
forwarding IPv4 packets in the presence of IPv6 addressing only. | forwarding IPv4 packets in the presence of IPv6 addressing only. | |||
IPv4 LIE exchange happens by default over well-known administratively | IPv4 LIE exchange happens by default over a well-known IPv4 multicast | |||
locally scoped and configured or otherwise well-known IPv4 multicast | address [RFC2365] that may also be administratively configured (e.g., | |||
address [RFC2365]. For IPv6 [RFC8200], exchange is performed over | with a local scope). For IPv6 [RFC8200], exchange is performed over | |||
the link-local multicast scope [RFC4291] address, which is configured | the link-local multicast scope [RFC4291] address, which is configured | |||
or otherwise well-known. In both cases, a destination UDP port | or otherwise well-known. In both cases, a destination UDP port | |||
defined in the schema (Section 7.2) is used unless configured | defined in the schema (Section 7.2) is used unless configured | |||
otherwise. LIEs MUST be sent with an IPv4 Time to Live (TTL) or an | otherwise. LIEs MUST be sent with an IPv4 Time to Live (TTL) or an | |||
IPv6 Hop Limit (HL) of either 1 or 255 to prevent RIFT information | IPv6 Hop Limit (HL) of either 1 or 255 to prevent RIFT information | |||
reaching beyond a single Layer 3 (L3) next hop in the topology. | reaching beyond a single Layer 3 (L3) next hop in the topology. | |||
Observe that, for the allocated link-local scope IP multicast | Observe that, for the allocated link-local scope IP multicast | |||
address, the TTL value of 1 is a more logical choice since the TTL | address, the TTL value of 1 is a more logical choice since the TTL | |||
value of 255 may, in some environments, lead to an early drop due to | value of 255 may, in some environments, lead to an early drop due to | |||
the suspicious TTL value for a packet addressed to such a | the suspicious TTL value for a packet addressed to such a | |||
skipping to change at line 1691 ¶ | skipping to change at line 1643 ¶ | |||
_ipv4_forwarding_capable_ flag setting across the same address family | _ipv4_forwarding_capable_ flag setting across the same address family | |||
combinations. The table is symmetric, i.e., the local and remote | combinations. The table is symmetric, i.e., the local and remote | |||
columns can be exchanged to construct the remaining combinations. | columns can be exchanged to construct the remaining combinations. | |||
The specific forwarding implementation to support the described | The specific forwarding implementation to support the described | |||
behavior is out of scope for this document. | behavior is out of scope for this document. | |||
+==========+==========+==========================================+ | +==========+==========+==========================================+ | |||
| Local | Remote | LIE Exchange Behavior | | | Local | Remote | LIE Exchange Behavior | | |||
| Neighbor | Neighbor | | | | Neighbor | Neighbor | | | |||
| AF | AF | | | | Address | Address | | | |||
| Family | Family | | | ||||
+==========+==========+==========================================+ | +==========+==========+==========================================+ | |||
| IPv4 | IPv4 | LIEs and TIEs are exchanged over IPv4 | | | IPv4 | IPv4 | LIEs and TIEs are exchanged over IPv4 | | |||
| | | only. The local neighbor receives TIEs | | | | | only. The local neighbor receives TIEs | | |||
| | | from remote neighbors on any of the LIE | | | | | from remote neighbors on any of the LIE | | |||
| | | source addresses. | | | | | source addresses. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv6 | IPv6 | LIEs and TIEs are exchanged over IPv6 | | | IPv6 | IPv6 | LIEs and TIEs are exchanged over IPv6 | | |||
| | | only. The local neighbor receives TIEs | | | | | only. The local neighbor receives TIEs | | |||
| | | from remote neighbors on any of the LIE | | | | | from remote neighbors on any of the LIE | | |||
| | | source addresses. | | | | | source addresses. | | |||
skipping to change at line 1723 ¶ | skipping to change at line 1676 ¶ | |||
| | | the remote neighbors on any of the IPv4 | | | | | the remote neighbors on any of the IPv4 | | |||
| | | or IPv6 LIE source addresses. | | | | | or IPv6 LIE source addresses. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv4, | IPv4 | The local neighbor sends LIEs for both | | | IPv4, | IPv4 | The local neighbor sends LIEs for both | | |||
| IPv6 | | IPv4 and IPv6, while the remote neighbor | | | IPv6 | | IPv4 and IPv6, while the remote neighbor | | |||
| | | only sends LIEs for IPv4. The resulting | | | | | only sends LIEs for IPv4. The resulting | | |||
| | | adjacency will exchange TIEs over IPv4 | | | | | adjacency will exchange TIEs over IPv4 | | |||
| | | on any of the IPv4 LIE source addresses. | | | | | on any of the IPv4 LIE source addresses. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
Table 1: Control Plane Behavior for Neighbor AF Combinations | Table 1: Control Plane Behavior for Neighbor Address Family | |||
Combinations | ||||
+==========+==========+==========================================+ | +==========+==========+==========================================+ | |||
| Local | Remote | Forwarding Behavior | | | Local | Remote | Forwarding Behavior | | |||
| Neighbor | Neighbor | | | | Neighbor | Neighbor | | | |||
| AF | AF | | | | Address | Address | | | |||
| Family | Family | | | ||||
+==========+==========+==========================================+ | +==========+==========+==========================================+ | |||
| IPv4 | IPv4 | Only IPv4 traffic can be forwarded. | | | IPv4 | IPv4 | Only IPv4 traffic can be forwarded. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv6 | IPv6 | If either neighbor sets | | | IPv6 | IPv6 | If either neighbor sets | | |||
| | | _ipv4_forwarding_capable_ to false, only | | | | | _ipv4_forwarding_capable_ to false, only | | |||
| | | IPv6 traffic can be forwarded. If both | | | | | IPv6 traffic can be forwarded. If both | | |||
| | | neighbors set _ipv4_forwarding_capable_ | | | | | neighbors set _ipv4_forwarding_capable_ | | |||
| | | to true, IPv4 traffic is also forwarded | | | | | to true, IPv4 traffic is also forwarded | | |||
| | | via IPv6 gateways. | | | | | via IPv6 gateways. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
skipping to change at line 1755 ¶ | skipping to change at line 1710 ¶ | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv4, | IPv4, | IPv4 and IPv6 traffic can be forwarded. | | | IPv4, | IPv4, | IPv4 and IPv6 traffic can be forwarded. | | |||
| IPv6 | IPv6 | If IPv4 and IPv6 LIEs advertise | | | IPv6 | IPv6 | If IPv4 and IPv6 LIEs advertise | | |||
| | | conflicting _ipv4_forwarding_capable_ | | | | | conflicting _ipv4_forwarding_capable_ | | |||
| | | flags, the behavior is unspecified. | | | | | flags, the behavior is unspecified. | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
| IPv4, | IPv4 | IPv4 traffic can be forwarded. | | | IPv4, | IPv4 | IPv4 traffic can be forwarded. | | |||
| IPv6 | | | | | IPv6 | | | | |||
+----------+----------+------------------------------------------+ | +----------+----------+------------------------------------------+ | |||
Table 2: Forwarding Behavior for Neighbor AF Combinations | Table 2: Forwarding Behavior for Neighbor Address Family | |||
Combinations | ||||
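
As a reading aid for Table 2, the rows visible in this excerpt can be written as a small lookup. This is a sketch only: the function name and parameter names are hypothetical, the table rows elided from this excerpt are deliberately not reconstructed, and the case of conflicting _ipv4_forwarding_capable_ flags between dual-stack neighbors (unspecified in the table) is not modeled.

```python
V4 = frozenset({"IPv4"})
V6 = frozenset({"IPv6"})
DUAL = V4 | V6


def forwardable_families(local_afs, remote_afs,
                         local_v4_capable=False, remote_v4_capable=False):
    """Return the traffic families that can be forwarded (hypothetical helper).

    Only the Table 2 rows shown in this excerpt are covered.
    """
    # A set of the two sides makes the lookup symmetric, like the table.
    pair = {frozenset(local_afs), frozenset(remote_afs)}

    if pair == {V4}:
        return {"IPv4"}
    if pair == {V6}:
        # IPv4 is carried via IPv6 gateways only if BOTH sides set the flag.
        if local_v4_capable and remote_v4_capable:
            return {"IPv4", "IPv6"}
        return {"IPv6"}
    if pair == {DUAL}:
        # Conflicting flags are unspecified by the table; not modeled here.
        return {"IPv4", "IPv6"}
    if pair == {DUAL, V4}:
        return {"IPv4"}
    raise NotImplementedError("combination not shown in this excerpt")


v6_only_one_flag = forwardable_families({"IPv6"}, {"IPv6"}, True, False)
v6_only_both_flags = forwardable_families({"IPv6"}, {"IPv6"}, True, True)
```

With these inputs, `v6_only_one_flag` contains only IPv6, while `v6_only_both_flags` contains both families.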
The protocol does *not* support selective disabling of address | The protocol does *not* support selective disabling of address | |||
families after adjacency formation, disabling IPv4 forwarding | families after adjacency formation, disabling IPv4 forwarding | |||
capability, or any local address changes in _ThreeWay_ state, i.e., | capability, or any local address changes in _ThreeWay_ state, i.e., | |||
if a link has entered ThreeWay IPv4 and/or IPv6 with a neighbor on an | if a link has entered ThreeWay IPv4 and/or IPv6 with a neighbor on an | |||
adjacency and it wants to stop supporting one of the families, change | adjacency and it wants to stop supporting one of the families, change | |||
any of its local addresses, or stop IPv4 forwarding, it MUST tear | any of its local addresses, or stop IPv4 forwarding, it MUST tear | |||
down and rebuild the adjacency. It MUST also remove any state it | down and rebuild the adjacency. It MUST also remove any state it | |||
stored about the remote side of the adjacency such as associated LIE | stored about the remote side of the adjacency such as associated LIE | |||
source addresses. | source addresses. | |||
skipping to change at line 1779 ¶ | skipping to change at line 1735 ¶ | |||
in the _level_ of the _PacketHeader_ schema element. It MAY also be | in the _level_ of the _PacketHeader_ schema element. It MAY also be | |||
provisioned with its PoD. If the level is not provisioned, it is not | provisioned with its PoD. If the level is not provisioned, it is not | |||
present in the optional _PacketHeader_ schema element and established | present in the optional _PacketHeader_ schema element and established | |||
by ZTP procedures, if feasible. If PoD is not provisioned, it is | by ZTP procedures, if feasible. If PoD is not provisioned, it is | |||
governed by the _LIEPacket_ schema element assuming the | governed by the _LIEPacket_ schema element assuming the | |||
_common.default_pod_ value. This means that switches except ToF do | _common.default_pod_ value. This means that switches except ToF do | |||
not need to be configured at all. Necessary information to configure | not need to be configured at all. Necessary information to configure | |||
all values is exchanged in the _LIEPacket_ and _PacketHeader_ or | all values is exchanged in the _LIEPacket_ and _PacketHeader_ or | |||
derived by the node automatically. | derived by the node automatically. | |||
Further definitions of leaf flags are found in Section 6.7 given they | Further leaf flag definitions are found in Section 6.7 as they have | |||
have implications in terms of level and adjacency forming here. Leaf | implications in terms of level and adjacency formation. Leaf flags | |||
flags are carried in _HierarchyIndications_. | are carried in _HierarchyIndications_. | |||
A node MUST form a _ThreeWay_ adjacency if, at a minimum, the | A node MUST form a _ThreeWay_ adjacency if, at a minimum, the | |||
following first order logic conditions are satisfied on a LIE packet, | following first order logic conditions are satisfied on a LIE packet, | |||
as specified by the _LIEPacket_ schema element and received on a link | as specified by the _LIEPacket_ schema element and received on a link | |||
(such a LIE is considered a "minimally valid" LIE). Observe that, | (such a LIE is considered a "minimally valid" LIE). Observe that, | |||
depending on the FSM involved and its state, further conditions may | depending on the FSM involved and its state, further conditions may | |||
be checked, and even a minimally valid LIE can be considered | be checked, and even a minimally valid LIE can be considered | |||
ultimately invalid if any of the additional conditions fail: | ultimately invalid if any of the additional conditions fail: | |||
1. the neighboring node is running the same major schema version as | 1. the neighboring node is running the same major schema version as | |||
indicated in the _major_version_ element in _PacketHeader_; | indicated in the _major_version_ element in _PacketHeader_ *and* | |||
2. the neighboring node uses a valid System ID (i.e., a value | 2. the neighboring node uses a valid System ID (i.e., a value | |||
different from _IllegalSystemID_) in the _sender_ element in | different from _IllegalSystemID_) in the _sender_ element in | |||
_PacketHeader_; | _PacketHeader_ *and* | |||
3. the neighboring node uses a different System ID than the node | 3. the neighboring node uses a different System ID than the node | |||
itself; | itself *and* | |||
4. the advertised MTU values in the _LIEPacket_ element match on | 4. (the advertised MTU values in the _LIEPacket_ element match on | |||
both sides, while a missing MTU in the _LIEPacket_ element is | both sides, while a missing MTU in the _LIEPacket_ element is | |||
interpreted as _default_mtu_size_; | interpreted as _default_mtu_size_) *and* | |||
5. both nodes advertise defined level values in the _level_ element | 5. both nodes advertise defined level values in the _level_ element | |||
in _PacketHeader_, *and* | in _PacketHeader_ *and* | |||
6. either: | 6. [ | |||
a. the node is at the _leaf_level_ value and has no _ThreeWay_ | a. the node is at the _leaf_level_ value and does not already | |||
adjacencies already to nodes at Highest Adjacency _ThreeWay_ | have any _ThreeWay_ adjacencies to nodes that are at the | |||
(HAT), as defined later in Section 6.7.1, with the level | Highest Adjacency _ThreeWay_ (HAT), as defined in | |||
different than the adjacent node; | Section 6.7.1, with a level that is different than the | |||
adjacent node *or* | ||||
b. the node is not at the _leaf_level_ value and the neighboring | b. the node is not at the _leaf_level_ value and the neighboring | |||
node is at the _leaf_level_ value; | node is at the _leaf_level_ value *or* | |||
c. both nodes are at the _leaf_level_ values *and* both indicate | c. both nodes are at the _leaf_level_ value *and* both indicate | |||
support for that described in Section 6.8.9; *or* | support for that described in Section 6.8.9 *or* | |||
d. neither node is at the _leaf_level_ value and the neighboring | d. neither node is at the _leaf_level_ value and the neighboring | |||
node is, at most, one level away. | node is, at most, one level away. | |||
] | ||||
LIEs arriving with IPv4 Time to Live (TTL) or an IPv6 Hop Limit (HL) | LIEs arriving with IPv4 Time to Live (TTL) or an IPv6 Hop Limit (HL) | |||
different than 1 or 255 MUST be ignored. | different than 1 or 255 MUST be ignored. | |||
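
The conditions above, together with the TTL/HL rule, can be sketched as a single predicate. This is a non-normative illustration: the `LieView` record, the function name, the `local_hat` parameter, and the `leaf_to_leaf` flag (standing in for the support indication of Section 6.8.9) are hypothetical, and the three constants are assumed stand-ins for the schema values _IllegalSystemID_, _default_mtu_size_, and _leaf_level_. The version and System ID numbers in the usage lines are arbitrary.

```python
from dataclasses import dataclass
from typing import Optional

ILLEGAL_SYSTEM_ID = 0    # assumed stand-in for _IllegalSystemID_
DEFAULT_MTU_SIZE = 1400  # assumed stand-in for _default_mtu_size_
LEAF_LEVEL = 0           # assumed stand-in for _leaf_level_


@dataclass
class LieView:
    """Hypothetical flattened view of the fields used by the checks."""
    major_version: int
    sender: int                  # System ID
    level: Optional[int]         # None means "level not defined"
    mtu: Optional[int] = None    # None means "MTU not advertised"
    leaf_to_leaf: bool = False   # stand-in for Section 6.8.9 support


def accept_lie(local, remote, ttl_or_hl, local_hat=None):
    """True if `remote` is a minimally valid LIE as seen by `local`."""
    if ttl_or_hl not in (1, 255):                      # TTL/HL rule
        return False
    if remote.major_version != local.major_version:    # condition 1
        return False
    if remote.sender == ILLEGAL_SYSTEM_ID:             # condition 2
        return False
    if remote.sender == local.sender:                  # condition 3
        return False
    local_mtu = DEFAULT_MTU_SIZE if local.mtu is None else local.mtu
    remote_mtu = DEFAULT_MTU_SIZE if remote.mtu is None else remote.mtu
    if local_mtu != remote_mtu:                        # condition 4
        return False
    if local.level is None or remote.level is None:    # condition 5
        return False

    local_leaf = local.level == LEAF_LEVEL
    remote_leaf = remote.level == LEAF_LEVEL
    return (                                           # condition 6
        # a. leaf with no HAT adjacency at a level other than the neighbor's
        (local_leaf and (local_hat is None or local_hat == remote.level))
        # b. non-leaf node facing a leaf
        or (not local_leaf and remote_leaf)
        # c. two leaves that both indicate leaf-to-leaf support
        or (local_leaf and remote_leaf
            and local.leaf_to_leaf and remote.leaf_to_leaf)
        # d. two non-leaf nodes at most one level apart
        or (not local_leaf and not remote_leaf
            and abs(local.level - remote.level) <= 1)
    )


leaf = LieView(major_version=1, sender=2, level=0)
spine = LieView(major_version=1, sender=1, level=1)
tof = LieView(major_version=1, sender=3, level=3)

leaf_accepts_spine = accept_lie(leaf, spine, ttl_or_hl=255)
spine_accepts_tof = accept_lie(spine, tof, ttl_or_hl=1)  # two levels apart
```

A real implementation would run further FSM-dependent checks afterwards, since a minimally valid LIE can still be rejected later.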
6.2.1. LIE Finite State Machine | 6.2.1. LIE Finite State Machine | |||
This section specifies the precise, normative LIE FSM, which is also | This section specifies the precise, normative LIE FSM, which is also | |||
shown in Figure 14. Additionally, some sets of actions often repeat | shown in Figure 14. Additionally, some sets of actions often repeat | |||
and are hence summarized into well-known procedures. | and are hence summarized into well-known procedures. | |||
Events generated are fairly fine grained, especially when indicating | Events generated are fairly fine grained, especially when indicating | |||
skipping to change at line 2036 ¶ | skipping to change at line 1995 ¶ | |||
from LIE's address, then PUSH NeighborChangedAddress, else | from LIE's address, then PUSH NeighborChangedAddress, else | |||
d. if any of the neighbor's flood address port, name, or | d. if any of the neighbor's flood address port, name, or | |||
local LinkID changed, then PUSH NeighborChangedMinorFields | local LinkID changed, then PUSH NeighborChangedMinorFields | |||
e. CHECK_THREE_WAY | e. CHECK_THREE_WAY | |||
* CHECK_THREE_WAY: if the current state is _OneWay_, do nothing, | * CHECK_THREE_WAY: if the current state is _OneWay_, do nothing, | |||
else | else | |||
1. if LIE packet does not contain a neighbor and if the current | 1. if LIE packet does not contain a neighbor then if the current | |||
state is _ThreeWay_, then PUSH NeighborDroppedReflection, else | state is _ThreeWay_, then PUSH NeighborDroppedReflection, else | |||
2. if the packet reflects this System ID and local port and the | 2. if the packet reflects this System ID and local port and the | |||
state is _ThreeWay_, then PUSH the ValidReflection event, else | state is _ThreeWay_, then PUSH the ValidReflection event, else | |||
PUSH the MultipleNeighbors event. | PUSH the MultipleNeighbors event. | |||
States: | States: | |||
* OneWay: The initial state the FSM is starting from. In this | * OneWay: The initial state the FSM is starting from. In this | |||
state, the router did not receive any valid LIEs from a neighbor. | state, the router did not receive any valid LIEs from a neighbor. | |||
skipping to change at line 2112 ¶ | skipping to change at line 2071 ¶ | |||
* MTUMismatch: MTU mismatched. | * MTUMismatch: MTU mismatched. | |||
* NeighborChangedMinorFields: Minor fields changed in the neighbor's | * NeighborChangedMinorFields: Minor fields changed in the neighbor's | |||
LIE. | LIE. | |||
* HoldtimeExpired: Adjacency holddown timer expired. | * HoldtimeExpired: Adjacency holddown timer expired. | |||
* MultipleNeighbors: More than one neighbor is present on the | * MultipleNeighbors: More than one neighbor is present on the | |||
interface. | interface. | |||
* MultipleNeighborsDone: Multiple neighbors' timers expired. | * MultipleNeighborsDone: Multiple neighbors timer expired. | |||
* FloodLeadersChanged: Node's election algorithm determined new set | * FloodLeadersChanged: Node's election algorithm determined new set | |||
of flood leaders. | of flood leaders. | |||
* SendLie: Send a LIE out. | * SendLie: Send a LIE out. | |||
* UpdateZTPOffer: Update this node's ZTP offer. This is sent to the | * UpdateZTPOffer: Update this node's ZTP offer. This is sent to the | |||
ZTP FSM. | ZTP FSM. | |||
Actions: | Actions: | |||
skipping to change at line 2140 ¶ | skipping to change at line 2099 ¶ | |||
* on UnacceptableHeader in _OneWay_ finishes in OneWay: no action | * on UnacceptableHeader in _OneWay_ finishes in OneWay: no action | |||
* on NeighborChangedMinorFields in _OneWay_ finishes in OneWay: no | * on NeighborChangedMinorFields in _OneWay_ finishes in OneWay: no | |||
action | action | |||
* on SendLie in _OneWay_ finishes in OneWay: SEND_LIE | * on SendLie in _OneWay_ finishes in OneWay: SEND_LIE | |||
* on HALSChanged in _OneWay_ finishes in OneWay: store the HALS | * on HALSChanged in _OneWay_ finishes in OneWay: store the HALS | |||
* on MultipleNeighbors in _OneWay_ finishes in | * on MultipleNeighbors in _OneWay_ finishes in | |||
MultipleNeighborsWait: start multiple neighbors' timers with the | MultipleNeighborsWait: start multiple neighbors timer with the | |||
interval _multiple_neighbors_lie_holdtime_multipler_ * | interval _multiple_neighbors_lie_holdtime_multiplier_ * | |||
_default_lie_holdtime_ | _default_lie_holdtime_ | |||
* on NeighborChangedLevel in _OneWay_ finishes in OneWay: no action | * on NeighborChangedLevel in _OneWay_ finishes in OneWay: no action | |||
* on LieRcvd in _OneWay_ finishes in OneWay: PROCESS_LIE | * on LieRcvd in _OneWay_ finishes in OneWay: PROCESS_LIE | |||
* on MTUMismatch in _OneWay_ finishes in OneWay: no action | * on MTUMismatch in _OneWay_ finishes in OneWay: no action | |||
* on ValidReflection in _OneWay_ finishes in ThreeWay: no action | * on ValidReflection in _OneWay_ finishes in ThreeWay: no action | |||
skipping to change at line 2214 ¶ | skipping to change at line 2173 ¶ | |||
* on HALSChanged in _TwoWay_ finishes in TwoWay: store the HALS | * on HALSChanged in _TwoWay_ finishes in TwoWay: store the HALS | |||
* on MTUMismatch in _TwoWay_ finishes in OneWay: no action | * on MTUMismatch in _TwoWay_ finishes in OneWay: no action | |||
* on NeighborChangedAddress in _TwoWay_ finishes in OneWay: no | * on NeighborChangedAddress in _TwoWay_ finishes in OneWay: no | |||
action | action | |||
* on SendLie in _TwoWay_ finishes in TwoWay: SEND_LIE | * on SendLie in _TwoWay_ finishes in TwoWay: SEND_LIE | |||
* on MultipleNeighbors in _TwoWay_ finishes in | * on MultipleNeighbors in _TwoWay_ finishes in | |||
MultipleNeighborsWait: start multiple neighbors' timers with the | MultipleNeighborsWait: start multiple neighbors timer with the | |||
interval _multiple_neighbors_lie_holdtime_multipler_ * | interval _multiple_neighbors_lie_holdtime_multiplier_ * | |||
_default_lie_holdtime_ | _default_lie_holdtime_ | |||
* on TimerTick in _ThreeWay_ finishes in ThreeWay: PUSH the SendLie | * on TimerTick in _ThreeWay_ finishes in ThreeWay: PUSH the SendLie | |||
event, if the last valid LIE was received more than _holdtime_ ago | event, if the last valid LIE was received more than _holdtime_ ago | |||
as advertised by the neighbor, then PUSH the HoldtimeExpired event | as advertised by the neighbor, then PUSH the HoldtimeExpired event | |||
* on LevelChanged in _ThreeWay_ finishes in OneWay: update the level | * on LevelChanged in _ThreeWay_ finishes in OneWay: update the level | |||
with the event value | with the event value | |||
* on HATChanged in _ThreeWay_ finishes in ThreeWay: store HAT | * on HATChanged in _ThreeWay_ finishes in ThreeWay: store HAT | |||
* on MTUMismatch in _ThreeWay_ finishes in OneWay: no action | * on MTUMismatch in _ThreeWay_ finishes in OneWay: no action | |||
* on UnacceptableHeader in _ThreeWay_ finishes in OneWay: no action | * on UnacceptableHeader in _ThreeWay_ finishes in OneWay: no action | |||
* on MultipleNeighbors in _ThreeWay_ finishes in | * on MultipleNeighbors in _ThreeWay_ finishes in | |||
MultipleNeighborsWait: start multiple neighbors' timers with the | MultipleNeighborsWait: start multiple neighbors timer with the | |||
interval _multiple_neighbors_lie_holdtime_multipler_ * | interval _multiple_neighbors_lie_holdtime_multiplier_ * | |||
_default_lie_holdtime_ | _default_lie_holdtime_ | |||
* on NeighborChangedLevel in _ThreeWay_ finishes in OneWay: no | * on NeighborChangedLevel in _ThreeWay_ finishes in OneWay: no | |||
action | action | |||
* on HALSChanged in _ThreeWay_ finishes in ThreeWay: store the HALS | * on HALSChanged in _ThreeWay_ finishes in ThreeWay: store the HALS | |||
* on LieRcvd in _ThreeWay_ finishes in ThreeWay: PROCESS_LIE | * on LieRcvd in _ThreeWay_ finishes in ThreeWay: PROCESS_LIE | |||
* on FloodLeadersChanged in _ThreeWay_ finishes in ThreeWay: update | * on FloodLeadersChanged in _ThreeWay_ finishes in ThreeWay: update | |||
skipping to change at line 2266 ¶ | skipping to change at line 2225 ¶ | |||
* on NeighborChangedAddress in _ThreeWay_ finishes in OneWay: no | * on NeighborChangedAddress in _ThreeWay_ finishes in OneWay: no | |||
action | action | |||
* on HALChanged in _ThreeWay_ finishes in ThreeWay: store the new | * on HALChanged in _ThreeWay_ finishes in ThreeWay: store the new | |||
HAL | HAL | |||
* on SendLie in _ThreeWay_ finishes in ThreeWay: SEND_LIE | * on SendLie in _ThreeWay_ finishes in ThreeWay: SEND_LIE | |||
* on MultipleNeighbors in MultipleNeighborsWait finishes in | * on MultipleNeighbors in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: start multiple neighbors' timers with the | MultipleNeighborsWait: start multiple neighbors timer with the | |||
interval _multiple_neighbors_lie_holdtime_multipler_ * | interval _multiple_neighbors_lie_holdtime_multiplier_ * | |||
_default_lie_holdtime_ | _default_lie_holdtime_ | |||
* on FloodLeadersChanged in MultipleNeighborsWait finishes in | * on FloodLeadersChanged in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: update _you_are_flood_repeater_ LIE | MultipleNeighborsWait: update _you_are_flood_repeater_ LIE | |||
elements based on the flood leader election results | elements based on the flood leader election results | |||
* on TimerTick in MultipleNeighborsWait finishes in | * on TimerTick in MultipleNeighborsWait finishes in | |||
MultipleNeighborsWait: check MultipleNeighbors timer, if the timer | MultipleNeighborsWait: check MultipleNeighbors timer, if the timer | |||
expired, PUSH MultipleNeighborsDone | expired, PUSH MultipleNeighborsDone | |||
skipping to change at line 2374 ¶ | skipping to change at line 2333 ¶ | |||
As an example illustrating a database holding both representations, | As an example illustrating a database holding both representations, | |||
the topology in Figure 2 with the optional link between spine 111 and | the topology in Figure 2 with the optional link between spine 111 and | |||
spine 112 (so that the flooding on an East-West link can be shown) is | spine 112 (so that the flooding on an East-West link can be shown) is | |||
shown below. Unnumbered interfaces are implicitly assumed and, for | shown below. Unnumbered interfaces are implicitly assumed and, for | |||
simplicity, the key value elements, which may be included in their | simplicity, the key value elements, which may be included in their | |||
South TIEs or North TIEs, are not shown. First, Figure 15 shows the | South TIEs or North TIEs, are not shown. First, Figure 15 shows the | |||
TIEs generated by some nodes. | TIEs generated by some nodes. | |||
ToF 21 South TIEs: | ToF 21 South TIEs: | |||
Node South TIE: | South Node TIE: | |||
NodeTIEElement(level=2, | NodeTIEElement(level=2, | |||
neighbors( | neighbors( | |||
(Spine 111, level 1, cost 1, links(...)), | (Spine 111, level 1, cost 1, links(...)), | |||
(Spine 112, level 1, cost 1, links(...)), | (Spine 112, level 1, cost 1, links(...)), | |||
(Spine 121, level 1, cost 1, links(...)), | (Spine 121, level 1, cost 1, links(...)), | |||
(Spine 122, level 1, cost 1, links(...)) | (Spine 122, level 1, cost 1, links(...)) | |||
) | ) | |||
) | ) | |||
Prefix South TIE: | South Prefix TIE: | |||
PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | |||
Spine 111 South TIEs: | Spine 111 South TIEs: | |||
Node South TIE: | South Node TIE: | |||
NodeTIEElement(level=1, | NodeTIEElement(level=1, | |||
neighbors( | neighbors( | |||
(ToF 21, level 2, cost 1, links(...)), | (ToF 21, level 2, cost 1, links(...)), | |||
(ToF 22, level 2, cost 1, links(...)), | (ToF 22, level 2, cost 1, links(...)), | |||
(Spine 112, level 1, cost 1, links(...)), | (Spine 112, level 1, cost 1, links(...)), | |||
(Leaf111, level 0, cost 1, links(...)), | (Leaf111, level 0, cost 1, links(...)), | |||
(Leaf112, level 0, cost 1, links(...)) | (Leaf112, level 0, cost 1, links(...)) | |||
) | ) | |||
) | ) | |||
Prefix South TIE: | South Prefix TIE: | |||
PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | |||
Spine 111 North TIEs: | Spine 111 North TIEs: | |||
Node North TIE: | North Node TIE: | |||
NodeTIEElement(level=1, | NodeTIEElement(level=1, | |||
neighbors( | neighbors( | |||
(ToF 21, level 2, cost 1, links(...)), | (ToF 21, level 2, cost 1, links(...)), | |||
(ToF 22, level 2, cost 1, links(...)), | (ToF 22, level 2, cost 1, links(...)), | |||
(Spine 112, level 1, cost 1, links(...)), | (Spine 112, level 1, cost 1, links(...)), | |||
(Leaf111, level 0, cost 1, links(...)), | (Leaf111, level 0, cost 1, links(...)), | |||
(Leaf112, level 0, cost 1, links(...)) | (Leaf112, level 0, cost 1, links(...)) | |||
) | ) | |||
) | ) | |||
Prefix North TIE: | North Prefix TIE: | |||
PrefixTIEElement(prefixes(Spine 111.loopback) | PrefixTIEElement(prefixes(Spine 111.loopback) | |||
Spine 121 South TIEs: | Spine 121 South TIEs: | |||
Node South TIE: | South Node TIE: | |||
NodeTIEElement(level=1, | NodeTIEElement(level=1, | |||
neighbors( | neighbors( | |||
(ToF 21, level 2, cost 1, links(...)), | (ToF 21, level 2, cost 1, links(...)), | |||
(ToF 22, level 2, cost 1, links(...)), | (ToF 22, level 2, cost 1, links(...)), | |||
(Leaf121, level 0, cost 1, links(...)), | (Leaf121, level 0, cost 1, links(...)), | |||
(Leaf122, level 0, cost 1, links(...)) | (Leaf122, level 0, cost 1, links(...)) | |||
) | ) | |||
) | ) | |||
Prefix South TIE: | South Prefix TIE: | |||
PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | PrefixTIEElement(prefixes(0/0, metric 1), (::/0, metric 1)) | |||
Spine 121 North TIEs: | Spine 121 North TIEs: | |||
Node North TIE: | North Node TIE: | |||
NodeTIEElement(level=1, | NodeTIEElement(level=1, | |||
neighbors( | neighbors( | |||
(ToF 21, level 2, cost 1, links(...)), | (ToF 21, level 2, cost 1, links(...)), | |||
(ToF 22, level 2, cost 1, links(...)), | (ToF 22, level 2, cost 1, links(...)), | |||
(Leaf121, level 0, cost 1, links(...)), | (Leaf121, level 0, cost 1, links(...)), | |||
(Leaf122, level 0, cost 1, links(...)) | (Leaf122, level 0, cost 1, links(...)) | |||
) | ) | |||
) | ) | |||
Prefix North TIE: | North Prefix TIE: | |||
PrefixTIEElement(prefixes(Spine 121.loopback) | PrefixTIEElement(prefixes(Spine 121.loopback) | |||
Leaf112 North TIEs: | Leaf112 North TIEs: | |||
Node North TIE: | North Node TIE: | |||
NodeTIEElement(level=0, | NodeTIEElement(level=0, | |||
neighbors( | neighbors( | |||
(Spine 111, level 1, cost 1, links(...)), | (Spine 111, level 1, cost 1, links(...)), | |||
(Spine 112, level 1, cost 1, links(...)) | (Spine 112, level 1, cost 1, links(...)) | |||
) | ) | |||
) | ) | |||
Prefix North TIE: | North Prefix TIE: | |||
PrefixTIEElement(prefixes(Leaf112.loopback, Prefix112, Prefix_MH)) | PrefixTIEElement(prefixes(Leaf112.loopback, Prefix112, Prefix_MH)) | |||
Figure 15: Example TIEs Generated in a 2-Level Spine-and-Leaf | Figure 15: Example TIEs Generated in a 2-Level Spine-and-Leaf | |||
Topology | Topology | |||
It may not be obvious here as to why the Node South TIEs contain all | It may not be obvious here as to why the South Node TIEs contain all | |||
the adjacencies of the corresponding node. This will be necessary | the adjacencies of the corresponding node. This will be necessary | |||
for algorithms further elaborated on in Sections 6.3.9 and 6.8.7. | for algorithms further elaborated on in Sections 6.3.9 and 6.8.7. | |||
For Node TIEs to carry more adjacencies than fit into an MTU-sized | For Node TIEs to carry more adjacencies than fit into an MTU-sized | |||
packet, the _neighbors_ element may contain a different set of | packet, the _neighbors_ element may contain a different set of | |||
neighbors in each TIE. Those disjointed sets of neighbors MUST be | neighbors in each TIE. Those disjointed sets of neighbors MUST be | |||
joined during corresponding computation. However, if the following | joined during corresponding computation. However, if the following | |||
occurs across multiple Node TIEs: | occurs across multiple Node TIEs: | |||
1. _capabilities_ do not match, | 1. _capabilities_ do not match *or* | |||
2. _flags_ values do not match, *or* | 2. _flags_ values do not match *or* | |||
3. the same neighbor repeats in multiple TIEs with different values. | 3. the same neighbor repeats in multiple TIEs with different values. | |||
The implementation is expected to use the value of any of the valid | The implementation is expected to use the value of any of the valid | |||
TIEs it received, as it cannot control the arrival order of those | TIEs it received, as it cannot control the arrival order of those | |||
TIEs. | TIEs. | |||
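
The joining of disjoint neighbor sets described above can be sketched as follows. This is only an illustration under stated assumptions: the dict layout ('capabilities', 'flags', 'neighbors') and the function name join_node_ties are hypothetical and not taken from the RIFT schema, and the sketch arbitrarily keeps the first value seen on a mismatch, which is one way of using "any of the valid TIEs".

```python
def join_node_ties(node_ties):
    """Join the partial views carried by several Node TIEs of one node.

    Each element of node_ties is a hypothetical dict with the keys
    'capabilities', 'flags' and 'neighbors' (system ID -> adjacency info).
    """
    joined = {"capabilities": None, "flags": None, "neighbors": {}}
    for tie in node_ties:
        # On a mismatch any valid value may be used; this sketch keeps
        # whichever value was seen first (an arbitrary choice).
        if joined["capabilities"] is None:
            joined["capabilities"] = tie["capabilities"]
        if joined["flags"] is None:
            joined["flags"] = tie["flags"]
        for system_id, adjacency in tie["neighbors"].items():
            # A neighbor repeated with different values keeps the first copy.
            joined["neighbors"].setdefault(system_id, adjacency)
    return joined
```

Because arrival order is not controlled, a real implementation might equally keep the last value seen; the text only requires that one of the valid values is used.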
The _miscabled_links_ element SHOULD be included in every Node TIE; | The _miscabled_links_ element SHOULD be included in every Node TIE; | |||
otherwise, the behavior is undefined. | otherwise, the behavior is undefined. | |||
skipping to change at line 2578 ¶ | skipping to change at line 2537 ¶ | |||
return TIEHeader with larger tie_nr is larger | return TIEHeader with larger tie_nr is larger | |||
else: | else: | |||
return TIEHeader with larger TIEType is larger | return TIEHeader with larger TIEType is larger | |||
Figure 16: TIEHeader Comparison Function | Figure 16: TIEHeader Comparison Function | |||
All valid TIE types are defined in _TIETypeType_. This enum | All valid TIE types are defined in _TIETypeType_. This enum | |||
indicates what TIE type the TIE is carrying. In case the value is | indicates what TIE type the TIE is carrying. In case the value is | |||
not known to the receiver, the TIE MUST be reflooded with the scope | not known to the receiver, the TIE MUST be reflooded with the scope | |||
identical to the scope of a prefix TIE. This allows for future | identical to the scope of a prefix TIE. This allows for future | |||
extensions of the protocol within the same major schema with types | extensions of the protocol that are within the same major schema and | |||
opaque to some nodes with some restrictions defined in Section 7. | that have types that are opaque to some nodes; some restrictions are | |||
| defined in Section 7. | |||
6.3.3.1. Normative Flooding Procedures | 6.3.3.1. Normative Flooding Procedures | |||
On reception of a TIE with an undefined level value in the packet | On reception of a TIE with an undefined level value in the packet | |||
header, the node MUST issue a warning and discard the packet. | header, the node MUST issue a warning and discard the packet. | |||
This section specifies the precise, normative flooding mechanism and | This section specifies the precise, normative flooding mechanism and | |||
can be omitted unless the reader is pursuing an implementation of the | can be omitted unless the reader is pursuing an implementation of the | |||
protocol or looks for a deep understanding of underlying information | protocol or looks for a deep understanding of underlying information | |||
distribution mechanism. | distribution mechanism. | |||
skipping to change at line 2731 ¶ | skipping to change at line 2691 ¶ | |||
"stuck" in a part of a network while the originator reboots and | "stuck" in a part of a network while the originator reboots and | |||
reissues TIEs many times to the point its sequence number rolls over | reissues TIEs many times to the point its sequence number rolls over | |||
and forms an incomparable distance to the "stuck" copy), which | and forms an incomparable distance to the "stuck" copy), which | |||
implies that a comparison relation is possible between two elements. | implies that a comparison relation is possible between two elements. | |||
With that, it is implicitly possible to compare TIEs, TIEHeaders, and | With that, it is implicitly possible to compare TIEs, TIEHeaders, and | |||
TIEIDs to each other, whereas the shortest viable key is always | TIEIDs to each other, whereas the shortest viable key is always | |||
implied. | implied. | |||
6.3.3.1.2.1. TIDE Generation | 6.3.3.1.2.1. TIDE Generation | |||
As given by the timer constant, periodically generate TIDEs by: | ||||
NEXT_TIDE_ID: ID of the next TIE to be sent in the TIDE. | NEXT_TIDE_ID: ID of the next TIE to be sent in the TIDE. | |||
| As given by the timer constant, periodically generate TIDEs by: | |||
1. NEXT_TIDE_ID = MIN_TIEID | 1. NEXT_TIDE_ID = MIN_TIEID | |||
2. while NEXT_TIDE_ID is not equal to MAX_TIEID, do the following: | 2. while NEXT_TIDE_ID is not equal to MAX_TIEID do: | |||
a. HEADERS = Exactly TIRDEs_PER_PKT headers from FILTERED_TIEDB | a. HEADERS = Exactly TIRES_PER_TIDE_PKT headers from | |||
starting at NEXT_TIDE_ID, unless fewer than TIRDEs_PER_PKT | FILTERED_TIEDB starting at NEXT_TIDE_ID, unless fewer than | |||
remain, in which case all remaining headers. | TIRES_PER_TIDE_PKT remain, in which case all remaining | |||
| headers. | |||
b. if HEADERS is empty, then START = MIN_TIEID, else START = | b. if HEADERS is empty, then START = MIN_TIEID, else START = | |||
first element in HEADERS | first element in HEADERS | |||
c. if HEADERS' size is less than TIRDEs_PER_PKT, then END = | c. if HEADERS size is less than TIRES_PER_TIDE_PKT, then END = | |||
MAX_TIEID, else END = last element in HEADERS | MAX_TIEID, else END = last element in HEADERS | |||
d. send *sorted* HEADERS the as TIDE, setting START and END as | d. send *sorted* HEADERS as TIDE, setting START and END as its | |||
its range | range | |||
e. NEXT_TIDE_ID = END | e. NEXT_TIDE_ID = END | |||
The constant _TIRDEs_PER_PKT_ SHOULD be computed per interface and | The constant _TIRES_PER_TIDE_PKT_ SHOULD be computed per interface | |||
used by the implementation to limit the amount of TIE headers per | and used by the implementation to limit the amount of TIE headers per | |||
TIDE so the sent TIDE PDU does not exceed the interface of MTU. | TIDE so the sent TIDE PDU does not exceed the MTU of the interface. | |||
TIDE PDUs SHOULD be spaced on sending to prevent packet drops. | TIDE PDUs SHOULD be transmitted at a rate that does not lead to | |||
| packet drops. | |||
The algorithm will intentionally enter the loop once and send a | The algorithm will intentionally enter the loop once and send a | |||
single TIDE, even when the database is empty; otherwise, no TIDEs | single TIDE, even when the database is empty; otherwise, no TIDEs | |||
would be sent for in case of an empty database and break the intended | would be sent for in case of an empty database and break the intended | |||
synchronization. | synchronization. | |||
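
The TIDE generation loop above can be sketched as follows. This is a literal transcription under simplifying assumptions, not a definitive implementation: TIEIDs are modeled as plain integers, the MIN_TIEID and MAX_TIEID values are placeholders, "starting at NEXT_TIDE_ID" is read as "strictly after NEXT_TIDE_ID" so that the last header of one TIDE is not repeated in the next, and no real TIE is assumed to use MIN_TIEID. The function name generate_tides is hypothetical.

```python
from bisect import bisect_right

MIN_TIEID = 0            # placeholder for the smallest possible TIEID
MAX_TIEID = 2**64 - 1    # placeholder for the largest possible TIEID


def generate_tides(filtered_tiedb, tires_per_tide_pkt):
    """Return a list of (START, END, HEADERS) tuples, one per TIDE."""
    headers_sorted = sorted(filtered_tiedb)
    tides = []
    next_tide_id = MIN_TIEID                       # step 1
    while next_tide_id != MAX_TIEID:               # step 2
        # a. up to tires_per_tide_pkt headers after next_tide_id
        first = bisect_right(headers_sorted, next_tide_id)
        headers = headers_sorted[first:first + tires_per_tide_pkt]
        # b. START
        start = headers[0] if headers else MIN_TIEID
        # c. END
        end = MAX_TIEID if len(headers) < tires_per_tide_pkt else headers[-1]
        # d. "send" the sorted headers with their range
        tides.append((start, end, headers))
        # e. continue from END
        next_tide_id = end
    return tides
```

With an empty database the loop body still runs once and yields a single TIDE covering the whole range with no headers, matching the behavior described in the preceding paragraph.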
6.3.3.1.2.2. TIDE Processing | 6.3.3.1.2.2. TIDE Processing | |||
On reception of TIDEs, the following processing is performed: | ||||
TXKEYS: Collection of TIE headers to be sent after processing of the | TXKEYS: Collection of TIE headers to be sent after processing of the | |||
packet | packet | |||
REQKEYS: Collection of TIEIDs to be requested after processing of | REQKEYS: Collection of TIEIDs to be requested after processing of | |||
the packet | the packet | |||
CLEARKEYS: Collection of TIEIDs to be removed from flood state | CLEARKEYS: Collection of TIEIDs to be removed from flood state | |||
queues | queues | |||
LASTPROCESSED: Last processed TIEID in the TIDE | LASTPROCESSED: Last processed TIEID in the TIDE | |||
DBTIE: TIE in the Link State Database (LSDB), if found | DBTIE: TIE in the LSDB, if found | |||
| On reception of TIDEs, the following processing is performed: | |||
1. LASTPROCESSED = TIDE.start_range | 1. LASTPROCESSED = TIDE.start_range | |||
2. For every HEADER in the TIDE, do the following: | 2. For every HEADER in the TIDE do: | |||
a. DBTIE = find HEADER in the current LSDB | a. DBTIE = find HEADER in the current LSDB | |||
b. if HEADER < LASTPROCESSED, then report the error and reset | b. if HEADER < LASTPROCESSED, then report an error and reset the | |||
the adjacency and return | adjacency and return | |||
c. put all TIEs in LSDB, where TIE.HEADER > LASTPROCESSED and | c. put all TIEs in LSDB, where (TIE.HEADER > LASTPROCESSED and | |||
TIE.HEADER < HEADER, into TXKEYS | TIE.HEADER < HEADER) into TXKEYS | |||
d. LASTPROCESSED = HEADER | d. LASTPROCESSED = HEADER | |||
e. if DBTIE is not found, then | e. if DBTIE is not found, then | |||
i. if originator is this node, then bump_own_tie | i. if originator is this node, then bump_own_tie | |||
ii. else put HEADER into REQKEYS | ii. else put HEADER into REQKEYS | |||
f. if DBTIE.HEADER < HEADER, then | f. if DBTIE.HEADER < HEADER then | |||
i. if the originator is this node, then bump_own_tie, else | i. if the originator is this node, then bump_own_tie, else | |||
1. if this is a North TIE header from a northbound | 1. if this is a North TIE header from a northbound | |||
neighbor, then override DBTIE in LSDB with HEADER | neighbor, then override DBTIE in LSDB with HEADER | |||
2. else put HEADER into REQKEYS | 2. else put HEADER into REQKEYS | |||
g. if DBTIE.HEADER > HEADER, then put DBTIE.HEADER into TXKEYS | g. if DBTIE.HEADER > HEADER, then put DBTIE.HEADER into TXKEYS | |||
h. if DBTIE.HEADER = HEADER, then | h. if DBTIE.HEADER = HEADER, then | |||
i. if DBTIE has content already, then put DBTIE.HEADER into | i. if DBTIE has content already, then put DBTIE.HEADER into | |||
CLEARKEYS, else | CLEARKEYS, else | |||
ii. put HEADER into REQKEYS | ii. put HEADER into REQKEYS | |||
3. put all TIEs in LSDB, where TIE.HEADER > LASTPROCESSED and | 3. put all TIEs in LSDB, where (TIE.HEADER > LASTPROCESSED and | |||
TIE.HEADER <= TIDE.end_range, into TXKEYS | TIE.HEADER <= TIDE.end_range) into TXKEYS | |||
4. for all TIEs in TXKEYS, try_to_transmit_tie(TIE) | 4. for all TIEs in TXKEYS, try_to_transmit_tie(TIE) | |||
5. for all TIEs in REQKEYS, request_tie(TIE) | 5. for all TIEs in REQKEYS, request_tie(TIE) | |||
6. for all TIEs in CLEARKEYS, remove_from_all_queues(TIE) | 6. for all TIEs in CLEARKEYS, remove_from_all_queues(TIE) | |||
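
Steps 1 to 3 of the TIDE processing above can be sketched as follows. The sketch makes several assumptions that are not part of the specification: TIEIDs are plain integers, header versions are compared by a single sequence number only (the full comparison function is richer), the LSDB is a dict mapping a TIEID to a (header, has_content) pair, and the adjacency reset is modeled by raising ValueError. The names Header, process_tide, my_id, from_northbound, and the returned "bumped" list (standing in for bump_own_tie) are hypothetical. Steps 4 to 6 are left to the caller.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Header:
    tieid: int
    seq: int
    originator: int
    is_north: bool = False


def process_tide(lsdb, start, end, headers, my_id, from_northbound=False):
    txkeys, reqkeys, clearkeys, bumped = [], [], [], []
    last = start                                          # step 1
    for h in headers:                                     # step 2
        entry = lsdb.get(h.tieid)                         # a
        if h.tieid < last:                                # b
            raise ValueError("TIDE headers out of order")
        # c. TIEs we hold that fall into the gap before this header
        txkeys += [lsdb[t][0] for t in sorted(lsdb) if last < t < h.tieid]
        last = h.tieid                                    # d
        if entry is None:                                 # e
            (bumped if h.originator == my_id else reqkeys).append(h)
            continue
        db_header, has_content = entry
        if db_header.seq < h.seq:                         # f
            if h.originator == my_id:
                bumped.append(h)
            elif h.is_north and from_northbound:
                lsdb[h.tieid] = (h, False)   # header only, no content
            else:
                reqkeys.append(h)
        elif db_header.seq > h.seq:                       # g
            txkeys.append(db_header)
        elif has_content:                                 # h.i
            clearkeys.append(db_header)
        else:                                             # h.ii
            reqkeys.append(h)
    # step 3: TIEs we hold beyond the last header but inside the range
    txkeys += [lsdb[t][0] for t in sorted(lsdb) if last < t <= end]
    return txkeys, reqkeys, clearkeys, bumped
```

A caller would then pass the three key collections to try_to_transmit_tie, request_tie, and remove_from_all_queues respectively, as listed in steps 4 to 6.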
6.3.3.1.3. TIREs | 6.3.3.1.3. TIREs | |||
6.3.3.1.3.1. TIRE Generation | 6.3.3.1.3.1. TIRE Generation | |||
Elements from both TIES_REQ and TIES_ACK MUST be collected and sent | Elements from both TIES_REQ and TIES_ACK MUST be collected and sent | |||
out as fast as feasible as TIREs. When sending TIREs with elements | out as fast as feasible as TIREs. When sending TIREs with elements | |||
from TIES_REQ, the _remaining_lifetime_ field in | from TIES_REQ, the _remaining_lifetime_ field in | |||
_TIEHeaderWithLifeTime_ MUST be set to 0 to force reflooding from the | _TIEHeaderWithLifeTime_ MUST be set to 0 to force reflooding from the | |||
neighbor even if the TIEs seem to be the same. | neighbor even if the TIEs seem to be the same. | |||
6.3.3.1.3.2. TIRE Processing | 6.3.3.1.3.2. TIRE Processing | |||
On reception of TIREs, the following processing is performed: | ||||
TXKEYS: Collection of TIE headers to be sent after processing of the | TXKEYS: Collection of TIE headers to be sent after processing of the | |||
packet | packet | |||
REQKEYS: Collection of TIEIDs to be requested after processing of | REQKEYS: Collection of TIEIDs to be requested after processing of | |||
the packet | the packet | |||
ACKKEYS: Collection of TIEIDs that have been acknowledged | ACKKEYS: Collection of TIEIDs that have been acknowledged | |||
DBTIE: TIE in the LSDB, if found | DBTIE: TIE in the LSDB, if found | |||
1. for every HEADER in TIRE, do the following: | On reception of TIREs, the following processing is performed: | |||
| 1. for every HEADER in TIRE do: | |||
a. DBTIE = find HEADER in the current LSDB | a. DBTIE = find HEADER in the current LSDB | |||
b. if DBTIE is not found, then do nothing | b. if DBTIE is not found, then do nothing | |||
c. if DBTIE.HEADER < HEADER, then put HEADER into REQKEYS | c. if DBTIE.HEADER < HEADER, then put HEADER into REQKEYS | |||
d. if DBTIE.HEADER > HEADER, then put DBTIE.HEADER into TXKEYS | d. if DBTIE.HEADER > HEADER, then put DBTIE.HEADER into TXKEYS | |||
e. if DBTIE.HEADER = HEADER, then put DBTIE.HEADER into ACKKEYS | e. if DBTIE.HEADER = HEADER, then put DBTIE.HEADER into ACKKEYS | |||
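
The per-header classification in step 1 above can be sketched as follows. It covers only the substeps shown here and rests on assumptions: headers are modeled as (TIEID, sequence number) pairs, the LSDB as a dict from TIEID to the stored sequence number, and the function name process_tire is hypothetical.

```python
def process_tire(lsdb_seq, tire_headers):
    """Classify TIRE headers into TXKEYS, REQKEYS and ACKKEYS."""
    txkeys, reqkeys, ackkeys = [], [], []
    for tieid, seq in tire_headers:
        db_seq = lsdb_seq.get(tieid)              # a. look up DBTIE
        if db_seq is None:                        # b. not found: do nothing
            continue
        if db_seq < seq:                          # c. neighbor has newer
            reqkeys.append((tieid, seq))
        elif db_seq > seq:                        # d. we have newer
            txkeys.append((tieid, db_seq))
        else:                                     # e. same version: acked
            ackkeys.append((tieid, db_seq))
    return txkeys, reqkeys, ackkeys
```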
skipping to change at line 2886 ¶ | skipping to change at line 2848 ¶ | |||
TXTIE: TIE to transmit | TXTIE: TIE to transmit | |||
DBTIE: TIE in the LSDB, if found | DBTIE: TIE in the LSDB, if found | |||
1. DBTIE = find TIE in the current LSDB | 1. DBTIE = find TIE in the current LSDB | |||
2. if DBTIE is not found, then | 2. if DBTIE is not found, then | |||
a. if the originator is this node, then bump_own_tie with a | a. if the originator is this node, then bump_own_tie with a | |||
short remaining lifetime, else | short remaining lifetime | |||
b. insert TIE into LSDB and ACKTIE = TIE | b. else insert TIE into LSDB and ACKTIE = TIE | |||
else | else | |||
a. if DBTIE.HEADER = TIE.HEADER, then | a. if DBTIE.HEADER = TIE.HEADER, then | |||
i. if DBTIE has content already, then ACKTIE = TIE, else | i. if DBTIE has content already, then ACKTIE = TIE | |||
ii. process like the "DBTIE.HEADER < TIE.HEADER" case | ii. else process like the "DBTIE.HEADER < TIE.HEADER" case | |||
b. if DBTIE.HEADER < TIE.HEADER, then | b. if DBTIE.HEADER < TIE.HEADER, then | |||
i. if the originator is this node, then bump_own_tie, else | i. if the originator is this node, then bump_own_tie | |||
ii. insert TIE into LSDB and ACKTIE = TIE | ii. else insert TIE into LSDB and ACKTIE = TIE | |||
c. if DBTIE.HEADER > TIE.HEADER, then | c. if DBTIE.HEADER > TIE.HEADER, then | |||
i. if DBTIE has content already, then TXTIE = DBTIE, else | i. if DBTIE has content already, then TXTIE = DBTIE | |||
ii. ACKTIE = DBTIE | ii. else ACKTIE = DBTIE | |||
3. if TXTIE is set, then try_to_transmit_tie(TXTIE) | 3. if TXTIE is set, then try_to_transmit_tie(TXTIE) | |||
4. if ACKTIE is set, then ack_tie(TIE) | 4. if ACKTIE is set, then ack_tie(TIE) | |||
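
The decision logic in steps 1 and 2 above can be sketched as follows. This is a simplified model under assumptions that are not in the specification: versions are compared by a single sequence number, a stored TIE whose content is None stands for a header-only LSDB entry, and the "short remaining lifetime" variant of bump_own_tie is not distinguished from the plain one. The names Tie, process_tie, and my_id are hypothetical. Steps 3 and 4 are left to the caller.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Tie:
    tieid: int
    seq: int
    originator: int
    content: Optional[str] = None   # None models a header-only entry


def process_tie(lsdb, tie, my_id):
    """Return (ACKTIE, TXTIE, bump) for a received TIE; may update lsdb."""
    acktie = txtie = None
    bump = False

    def accept_or_bump():
        # Shared handling for "not found" and "DBTIE.HEADER < TIE.HEADER".
        nonlocal acktie, bump
        if tie.originator == my_id:
            bump = True                 # stands in for bump_own_tie
        else:
            lsdb[tie.tieid] = tie
            acktie = tie

    dbtie = lsdb.get(tie.tieid)                     # step 1
    if dbtie is None:                               # step 2, not found
        accept_or_bump()
    elif dbtie.seq == tie.seq:                      # a
        if dbtie.content is not None:
            acktie = tie
        else:
            accept_or_bump()
    elif dbtie.seq < tie.seq:                       # b
        accept_or_bump()
    elif dbtie.content is not None:                 # c.i
        txtie = dbtie
    else:                                           # c.ii
        acktie = dbtie
    return acktie, txtie, bump
```

The caller would then invoke try_to_transmit_tie when TXTIE is set and ack_tie when ACKTIE is set.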
6.3.3.1.5. Sending TIEs | 6.3.3.1.5. Sending TIEs | |||
On a periodic basis, all TIEs with a lifetime of > 0 left MUST be | On a periodic basis, all TIEs with a lifetime of > 0 left MUST be | |||
sent out on the adjacency, removed from the TIES_TX list, and | sent out on the adjacency, removed from the TIES_TX list, and | |||
requeued onto TIES_RTX list. The specific period is out of scope for | requeued onto TIES_RTX list. The specific period is out of scope for | |||
this document. | this document. | |||
6.3.3.1.6. TIEs Processing in LSDB | 6.3.3.1.6. TIEs Processing in LSDB | |||
The Link State Database (LSDB) holds the most recent copy of TIEs | The LSDB holds the most recent copy of TIEs received via flooding | |||
received via flooding from according peers. Consecutively, after | from according peers. Consecutively, after version tie-breaking by | |||
version tie-breaking by LSDB, a peer receives from the LSDB the | LSDB, a peer receives from the LSDB the newest versions of TIEs | |||
newest versions of TIEs received by other peers and processes them | received by other peers and processes them (without any filtering) | |||
(without any filtering) just like receiving TIEs from its remote | just like receiving TIEs from its remote peer. Such a publisher | |||
peer. Such a publisher model can be implemented in several ways, | model can be implemented in several ways, either in a single thread | |||
either in a single thread of execution or in multiple parallel | of execution or in multiple parallel threads. | |||
threads. | ||||
LSDB can be logically considered as the entity aging out TIEs, i.e., | LSDB can be logically considered as the entity aging out TIEs, i.e., | |||
being responsible to discard TIEs that are stored longer than | being responsible to discard TIEs that are stored longer than | |||
_remaining_lifetime_ on their reception. | _remaining_lifetime_ on their reception. | |||
LSDB is also expected to periodically reoriginate the node's own | LSDB is also expected to periodically reoriginate the node's own | |||
TIEs. Originating at an interval significantly shorter than | TIEs. Originating at an interval significantly shorter than | |||
_default_lifetime_ is RECOMMENDED to prevent TIE expiration by other | _default_lifetime_ is RECOMMENDED to prevent TIE expiration by other | |||
nodes in the network, which can lead to instabilities. | nodes in the network, which can lead to instabilities. | |||
6.3.4. TIE Flooding Scopes | 6.3.4. TIE Flooding Scopes | |||
In a somewhat analogous fashion to link-local, area, and domain | In a somewhat analogous fashion to link-local, area, and domain | |||
flooding scopes, RIFT defines several complex "flooding scopes", | flooding scopes, RIFT defines several complex "flooding scopes", | |||
depending on the direction and type of TIE propagated. | depending on the direction and type of TIE propagated. | |||
Every North TIE is flooded northbound, providing a node at a given | Every North TIE is flooded northbound, providing a node at a given | |||
level with the complete topology of the Clos or Fat Tree network that | level with the complete topology of the Clos or fat tree network that | |||
is reachable southwards of it, including all specific prefixes. This | is reachable southwards of it, including all specific prefixes. This | |||
means that a packet received from a node at the same or lower level | means that a packet received from a node at the same or lower level | |||
whose destination is covered by one of those specific prefixes will | whose destination is covered by one of those specific prefixes will | |||
be routed directly towards the node advertising that prefix, rather | be routed directly towards the node advertising that prefix, rather | |||
than sending the packet to a node at a higher level. | than sending the packet to a node at a higher level. | |||
A node's Node South TIEs, consisting of all node's adjacencies and | A node's South Node TIEs, consisting of all node's adjacencies and | |||
prefix South TIEs limited to those related to default IP prefix and | South Prefix TIEs limited to those related to default IP prefix and | |||
disaggregated prefixes, are flooded southbound in order to inform | disaggregated prefixes, are flooded southbound in order to inform | |||
nodes one level down of connectivity of the higher level as well as | nodes one level down of connectivity of the higher level as well as | |||
reachability to the rest of the fabric. In order to allow an E-W | reachability to the rest of the fabric. In order to allow an E-W | |||
disconnected node in a given level to receive the South TIEs of other | disconnected node in a given level to receive the South TIEs of other | |||
nodes at its level, every Node South TIE is "reflected" northbound to | nodes at its level, every South Node TIE is "reflected" northbound to | |||
the level from which it was received. It should be noted that East- | the level from which it was received. It should be noted that East- | |||
West links are included in South TIE flooding (except at the ToF | West links are included in South TIE flooding (except at the ToF | |||
level); those TIEs need to be flooded to satisfy the algorithms | level); those TIEs need to be flooded to satisfy the algorithms | |||
described in Section 6.4. In that way, nodes at the same level can learn | described in Section 6.4. In that way, nodes at the same level can learn | |||
about each other without using a lower level, except at the leaf | about each other without using a lower level, except at the leaf | |||
level. The precise, normative flooding scopes are given in Table 3. | level. The precise, normative flooding scopes are given in Table 3. | |||
Those rules also govern what SHOULD be included in TIDEs on the | Those rules also govern what SHOULD be included in TIDEs on the | |||
adjacency. Again, East-West flooding scopes are identical to | adjacency. Again, East-West flooding scopes are identical to | |||
southern flooding scopes, except in case of ToF East-West links | southern flooding scopes, except in case of ToF East-West links | |||
(rings), which are basically performing northbound flooding. | (rings), which are basically performing northbound flooding. | |||
Node South TIE "south reflection" enables support of positive | South Node TIE "south reflection" enables support of positive | |||
disaggregation on failures, as described in Section 6.5, and flooding | disaggregation on failures, as described in Section 6.5, and flooding | |||
reduction, as described in Section 6.3.9. | reduction, as described in Section 6.3.9. | |||
+===========+======================+==============+=================+ | +===========+======================+==============+=================+ | |||
| Type / | South | North | East-West | | | Type / | South | North | East-West | | |||
| Direction | | | | | | Direction | | | | | |||
+===========+======================+==============+=================+ | +===========+======================+==============+=================+ | |||
| Node | flood if the level | flood if the | flood only if | | | South | flood if the level | flood if the | flood only if | | |||
| South TIE | of the originator | level of the | this node is | | | Node TIE | of the originator | level of the | this node is | | |||
| | is equal to this | originator | not ToF | | | | is equal to this | originator | not ToF | | |||
| | node | is higher | | | | | node | is higher | | | |||
| | | than this | | | | | | than this | | | |||
| | | node | | | | | | node | | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
| non-Node | flood self- | flood only | flood only if | | | non-Node | flood self- | flood only | flood only if | | |||
| South TIE | originated only | if the | it is self- | | | South TIE | originated only | if the | it is self- | | |||
| | | neighbor is | originated and | | | | | neighbor is | originated and | | |||
| | | the | this node is | | | | | the | this node is | | |||
| | | originator | not ToF | | | | | originator | not ToF | | |||
| | | of TIE | | | | | | of TIE | | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
| all North | never flood | flood always | flood only if | | | all North | never flood | flood always | flood only if | | |||
| TIEs | | | this node is | | | TIEs | | | this node is | | |||
| | | | ToF | | | | | | ToF | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
| TIDE | include at least | include at | if this node is | | | TIDE | include at least | include at | if this node is | | |||
| | all non-self- | least all | ToF, then | | | | all non-self- | least all | ToF, then | | |||
| | originated North | Node South | include all | | | | originated North | South Node | include all | | |||
| | TIE headers and | TIEs and all | North TIEs; | | | | TIE headers and | TIEs and all | North TIEs; | | |||
| | self-originated | South TIEs | otherwise, only | | | | self-originated | South TIEs | otherwise, only | | |||
| | South TIE headers | originated | include self- | | | | South TIE headers | originated | include self- | | |||
| | and Node South TIEs | by a peer | originated TIEs | | | | and South Node TIEs | by a peer | originated TIEs | | |||
| | of nodes at same | and all | | | | | of nodes at same | and all | | | |||
| | level | North TIEs | | | | | level | North TIEs | | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
| TIRE as | request all North | request all | if this node is | | | TIRE as | request all North | request all | if this node is | | |||
| Request | TIEs and all peer's | South TIEs | ToF, then apply | | | Request | TIEs and all peer's | South TIEs | ToF, then apply | | |||
| | self-originated | | north scope | | | | self-originated | | north scope | | |||
| | TIEs and all Node | | rules; | | | | TIEs and all South | | rules; | | |||
| | South TIEs | | otherwise, | | | | Node TIEs | | otherwise, | | |||
| | | | apply south | | | | | | apply south | | |||
| | | | scope rules | | | | | | scope rules | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
| TIRE as | Ack all received | Ack all | Ack all | | | TIRE as | Ack all received | Ack all | Ack all | | |||
| Ack | TIEs | received | received TIEs | | | Ack | TIEs | received | received TIEs | | |||
| | | TIEs | | | | | | TIEs | | | |||
+-----------+----------------------+--------------+-----------------+ | +-----------+----------------------+--------------+-----------------+ | |||
Table 3: Normative Flooding Scopes | Table 3: Normative Flooding Scopes | |||
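The TIE rows of Table 3 can be read as a single decision function. The following Python sketch is illustrative only and not part of either compared text: the names Direction, Tie, and should_flood are hypothetical, levels are assumed to be plain integers that grow northwards, and the TIDE and TIRE rows are not covered.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    SOUTH = "south"
    NORTH = "north"
    EAST_WEST = "east-west"


@dataclass(frozen=True)
class Tie:
    """Hypothetical, simplified TIE header used only for this sketch."""
    direction: Direction   # Direction.NORTH or Direction.SOUTH TIE
    is_node_tie: bool      # True for Node TIEs, False for all other types
    originator: int        # System ID of the originating node
    originator_level: int  # level of the originating node


def should_flood(tie, adjacency, my_id, my_level, neighbor_id, i_am_tof):
    """Literal reading of the three TIE rows of Table 3.

    `adjacency` is the direction of the link towards the neighbor.
    """
    if tie.direction is Direction.NORTH:
        # Row "all North TIEs"
        if adjacency is Direction.SOUTH:
            return False
        if adjacency is Direction.NORTH:
            return True
        return i_am_tof
    if tie.is_node_tie:
        # Row "South Node TIE"
        if adjacency is Direction.SOUTH:
            return tie.originator_level == my_level
        if adjacency is Direction.NORTH:
            return tie.originator_level > my_level
        return not i_am_tof
    # Row "non-Node South TIE"
    if adjacency is Direction.SOUTH:
        return tie.originator == my_id
    if adjacency is Direction.NORTH:
        return tie.originator == neighbor_id
    return tie.originator == my_id and not i_am_tof
```

With these assumptions, a spine at level 1 reflects the South Node TIE of a level 2 node northbound to another level 2 node, while a North TIE is never sent over a southbound adjacency.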
skipping to change at line 3039 ¶ | skipping to change at line 3000 ¶ | |||
To illustrate these rules, consider using the topology in Figure 2, | To illustrate these rules, consider using the topology in Figure 2, | |||
with the optional link between spine 111 and spine 112, and the | with the optional link between spine 111 and spine 112, and the | |||
associated TIEs given in Figure 15. The flooding from particular | associated TIEs given in Figure 15. The flooding from particular | |||
nodes of the TIEs is given in Table 4. | nodes of the TIEs is given in Table 4. | |||
+============+==========+===========================================+ | +============+==========+===========================================+ | |||
| Local | Neighbor | TIEs Flooded from Local to Neighbor Node | | | Local | Neighbor | TIEs Flooded from Local to Neighbor Node | | |||
| Node | Node | | | | Node | Node | | | |||
+============+==========+===========================================+ | +============+==========+===========================================+ | |||
| Leaf111 | Spine | Leaf111 North TIEs, Spine 111 Node South | | | Leaf111 | Spine | Leaf111 North TIEs, Spine 111 South Node | | |||
| | 112 | TIE | | | | 112 | TIE | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Leaf111 | Spine | Leaf111 North TIEs, Spine 112 Node South | | | Leaf111 | Spine | Leaf111 North TIEs, Spine 112 South Node | | |||
| | 111 | TIE | | | | 111 | TIE | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ... | ... | ... | | | ... | ... | ... | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Spine | Leaf111 | Spine 111 South TIEs | | | Spine | Leaf111 | Spine 111 South TIEs | | |||
| 111 | | | | | 111 | | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Spine | Leaf112 | Spine 111 South TIEs | | | Spine | Leaf112 | Spine 111 South TIEs | | |||
| 111 | | | | | 111 | | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Spine | Spine | Spine 111 South TIEs | | | Spine | Spine | Spine 111 South TIEs | | |||
| 111 | 112 | | | | 111 | 112 | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Spine | ToF 21 | Spine 111 North TIEs, Leaf111 North TIEs, | | | Spine | ToF 21 | Spine 111 North TIEs, Leaf111 North TIEs, | | |||
| 111 | | Leaf112 North TIEs, ToF 22 Node South TIE | | | 111 | | Leaf112 North TIEs, ToF 22 South Node TIE | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| Spine | ToF 22 | Spine 111 North TIEs, Leaf111 North TIEs, | | | Spine | ToF 22 | Spine 111 North TIEs, Leaf111 North TIEs, | | |||
| 111 | | Leaf112 North TIEs, ToF 21 Node South TIE | | | 111 | | Leaf112 North TIEs, ToF 21 South Node TIE | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ... | ... | ... | | | ... | ... | ... | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ToF 21 | Spine | ToF 21 South TIEs | | | ToF 21 | Spine | ToF 21 South TIEs | | |||
| | 111 | | | | | 111 | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ToF 21 | Spine | ToF 21 South TIEs | | | ToF 21 | Spine | ToF 21 South TIEs | | |||
| | 112 | | | | | 112 | | | |||
+------------+----------+-------------------------------------------+ | +------------+----------+-------------------------------------------+ | |||
| ToF 21 | Spine | ToF 21 South TIEs | | | ToF 21 | Spine | ToF 21 South TIEs | | |||
skipping to change at line 3105 ¶ | skipping to change at line 3066 ¶ | |||
guarantees correct behavior of algorithms like disaggregation or | guarantees correct behavior of algorithms like disaggregation or | |||
default route origination. Furthermore though, the use of this bit | default route origination. Furthermore though, the use of this bit | |||
presents an inherent trade-off between processing load and | presents an inherent trade-off between processing load and | |||
convergence speed since significantly slowing down flooding of | convergence speed since significantly slowing down flooding of | |||
northbound prefixes from neighbors for an extended time will lead to | northbound prefixes from neighbors for an extended time will lead to | |||
traffic losses. | traffic losses. | |||
6.3.6. Initial and Periodic Database Synchronization | 6.3.6. Initial and Periodic Database Synchronization | |||
The initial exchange of RIFT includes periodic TIDE exchanges that | The initial exchange of RIFT includes periodic TIDE exchanges that | |||
contain descriptions of the link state database and TIREs, which | contain descriptions of the LSDB and TIREs, which perform the | |||
perform the function of requesting unknown TIEs as well as confirming | function of requesting unknown TIEs as well as confirming the | |||
the reception of flooded TIEs. The content of TIDEs and TIREs is | reception of flooded TIEs. The content of TIDEs and TIREs is | |||
governed by Table 3. | governed by Table 3. | |||
6.3.7. Purging and Rollovers | 6.3.7. Purging and Rollovers | |||
When a node exits in the network, if "unpurged", residual stale TIEs | When a node exits the network, if "unpurged", residual stale TIEs may | |||
may exist in the network until their lifetimes expire (which in case | exist in the network until their lifetimes expire (which in case of | |||
of RIFT is by default a rather long period to prevent ongoing | RIFT is by default a rather long period to prevent ongoing | |||
reorigination of TIEs in very large topologies). RIFT does not have | reorigination of TIEs in very large topologies). RIFT does not have | |||
a "purging mechanism" based on sending specialized "purge" packets. | a "purging mechanism" based on sending specialized "purge" packets. | |||
In other routing protocols, such a mechanism has proven to be complex | In other routing protocols, such a mechanism has proven to be complex | |||
and fragile based on many years of experience. RIFT simply issues a | and fragile based on many years of experience. RIFT simply issues a | |||
new, i.e., higher sequence number, empty version of the TIE with a | new, i.e., higher sequence number, empty version of the TIE with a | |||
short lifetime given by the _purge_lifetime_ constant and relies on | short lifetime given by the _purge_lifetime_ constant and relies on | |||
each node to age out and delete each TIE copy independently. | each node to age out and delete each TIE copy independently. | |||
Abundant amounts of memory are available today, even on low-end | Abundant amounts of memory are available today, even on low-end | |||
platforms, and hence, keeping those relatively short-lived extra | platforms, and hence, keeping those relatively short-lived extra | |||
copies for a while is acceptable. The information will age out and, | copies for a while is acceptable. The information will age out and, | |||
skipping to change at line 3154 ¶ | skipping to change at line 3115 ¶ | |||
propagation and processing delay by all the nodes that are within the | propagation and processing delay by all the nodes that are within the | |||
TIE's flooding scope. | TIE's flooding scope. | |||
TIE sequence numbers are rolled over using the method described in | TIE sequence numbers are rolled over using the method described in | |||
Appendix A. The first sequence number of any spontaneously | Appendix A. The first sequence number of any spontaneously | |||
originated TIE (i.e., not originated to override a detected older | originated TIE (i.e., not originated to override a detected older | |||
copy in the network) MUST be a reasonably unpredictable random number | copy in the network) MUST be a reasonably unpredictable random number | |||
(for example, [RFC4086]) in the interval [0, 2^30-1], which will | (for example, [RFC4086]) in the interval [0, 2^30-1], which will | |||
prevent otherwise identical TIE headers from remaining "stuck" in the | prevent otherwise identical TIE headers from remaining "stuck" in the | |||
network with content different from the TIE originated after reboot. | network with content different from the TIE originated after reboot. | |||
In traditional link-state protocols, this is delegated to a 16-bit | In typical link-state protocols, this is delegated to a 16-bit | |||
checksum on packet content. RIFT avoids this design due to the CPU | checksum on packet content. RIFT avoids this design due to the CPU | |||
burden presented by computation of such checksums and additional | burden presented by computation of such checksums and additional | |||
complications tied to the fact that the checksum must be "patched" | complications tied to the fact that the checksum must be "patched" | |||
into the packet after the generation of the content, which is a | into the packet after the generation of the content, which is a | |||
difficult proposition in binary, hand-crafted formats already and | difficult proposition in binary, hand-crafted formats already and | |||
highly incompatible with model-based, serialized formats. The | highly incompatible with model-based, serialized formats. The | |||
sequence number space is hence consciously chosen to be 64 bits wide | sequence number space is hence consciously chosen to be 64 bits wide | |||
to make the occurrence of a TIE with the same sequence number but | to make the occurrence of a TIE with the same sequence number but | |||
different content at least as unlikely as with the checksum | different content at least as unlikely as with the checksum | |||
method. To emulate the "checksum behavior", an implementation could | method. To emulate the "checksum behavior", an implementation could | |||
skipping to change at line 3180 ¶ | skipping to change at line 3141 ¶ | |||
Under certain conditions, nodes issue a default route in their South | Under certain conditions, nodes issue a default route in their South | |||
Prefix TIEs with costs as computed in Section 6.8.7.1. | Prefix TIEs with costs as computed in Section 6.8.7.1. | |||
A node X that | A node X that | |||
1. is *not* overloaded *and* | 1. is *not* overloaded *and* | |||
2. has southbound or East-West adjacencies | 2. has southbound or East-West adjacencies | |||
SHOULD originate such a default route in its south prefix TIE if and | SHOULD originate such a default route in its South Prefix TIE if and | |||
only if | only if | |||
1. all other nodes at X's' level are overloaded, | 1. all other nodes at X's level are overloaded *or* | |||
2. all other nodes at X's' level have NO northbound adjacencies, | 2. all other nodes at X's level have NO northbound adjacencies, *or* | |||
*or* | ||||
3. X has computed reachability to a default route during N-SPF. | 3. X has computed reachability to a default route during N-SPF. | |||
The term "all other nodes at X's' level " obviously describes just | The term "all other nodes at X's level" obviously describes just the | |||
the nodes at the same level in the PoD with a viable lower level | nodes at the same level in the PoD with a viable lower level | |||
(otherwise, the Node South TIEs cannot be reflected; the nodes in PoD | (otherwise, the South Node TIEs cannot be reflected; the nodes in PoD | |||
1 and PoD 2 are "invisible" to each other). | 1 and PoD 2 are "invisible" to each other). | |||
A node originating a southbound default route SHOULD install a | A node originating a southbound default route SHOULD install a | |||
default discard route if it did not compute a default route during | default discard route if it did not compute a default route during | |||
N-SPF. This basically means that the top of the fabric will drop | N-SPF. This basically means that the top of the fabric will drop | |||
traffic for unreachable addresses. | traffic for unreachable addresses. | |||
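The origination conditions of Section 6.3.8 combine two preconditions with a disjunction of three triggers. The Python sketch below shows one possible reading; PeerView, should_originate_default, and needs_discard_route are hypothetical names, and the peer list is assumed to hold only the other nodes at X's level that X can actually see through reflected South Node TIEs.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PeerView:
    """Hypothetical summary of another node at the same level."""
    overloaded: bool
    has_northbound_adjacencies: bool


def should_originate_default(overloaded, has_south_or_ew_adjacencies,
                             same_level_peers, default_reachable_in_nspf):
    """Return True if node X SHOULD originate a southbound default route."""
    # Preconditions: X is not overloaded and has southbound or E-W adjacencies.
    if overloaded or not has_south_or_ew_adjacencies:
        return False
    # Note: all() over an empty list is True, so in this sketch a node
    # without visible peers satisfies the first two triggers vacuously.
    all_peers_overloaded = all(p.overloaded for p in same_level_peers)
    no_peer_has_north = all(not p.has_northbound_adjacencies
                            for p in same_level_peers)
    return (all_peers_overloaded
            or no_peer_has_north
            or default_reachable_in_nspf)


def needs_discard_route(originates_default, default_reachable_in_nspf):
    """A default discard route SHOULD be installed if a default is
    originated without one having been computed during N-SPF."""
    return originates_default and not default_reachable_in_nspf
```

How an implementation treats the case of no visible peers is a design choice that this sketch does not settle; it only follows the wording of the conditions literally.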
6.3.9. Northbound TIE Flooding Reduction | 6.3.9. Northbound TIE Flooding Reduction | |||
RIFT chooses only a subset of northbound nodes to propagate flooding | RIFT chooses only a subset of northbound nodes to propagate flooding | |||
skipping to change at line 3258 ¶ | skipping to change at line 3218 ¶ | |||
In a fully connected Clos network, this means that a node selects one | In a fully connected Clos network, this means that a node selects one | |||
arbitrary parent as the FR and then a second one for redundancy. The | arbitrary parent as the FR and then a second one for redundancy. The | |||
computation can be relatively simple and completely distributed | computation can be relatively simple and completely distributed | |||
without any need for synchronization among nodes. In a "PoD" | without any need for synchronization among nodes. In a "PoD" | |||
structure, where the level L+2 is partitioned into silos of | structure, where the level L+2 is partitioned into silos of | |||
equivalent grandparents that are only reachable from respective | equivalent grandparents that are only reachable from respective | |||
parents, this means treating each silo as a fully connected Clos | parents, this means treating each silo as a fully connected Clos | |||
network and solving the problem within the silo. | network and solving the problem within the silo. | |||
In terms of signaling, a node has enough information to select its | In terms of signaling, a node has enough information to select its | |||
set of FRs; this information is derived from the node's parents' Node | set of FRs; this information is derived from the node's parents' | |||
South TIEs, which indicate the parent's reachable northbound | South Node TIEs, which indicate the parent's reachable northbound | |||
adjacencies to its own parents (the node's grandparents). A node may | adjacencies to its own parents (the node's grandparents). A node may | |||
send a LIE to a northbound neighbor with the optional boolean field | send a LIE to a northbound neighbor with the optional boolean field | |||
_you_are_flood_repeater_ set to false to indicate that the northbound | _you_are_flood_repeater_ set to false to indicate that the northbound | |||
neighbor is not a flood repeater for the node that sent the LIE. In | neighbor is not a flood repeater for the node that sent the LIE. In | |||
that case, the northbound neighbor SHOULD NOT reflood northbound TIEs | that case, the northbound neighbor SHOULD NOT reflood northbound TIEs | |||
received from the node that sent the LIE. If | received from the node that sent the LIE. If | |||
_you_are_flood_repeater_ is absent or _you_are_flood_repeater_ is set | _you_are_flood_repeater_ is absent or _you_are_flood_repeater_ is set | |||
to true, then the northbound neighbor is a flood repeater for the | to true, then the northbound neighbor is a flood repeater for the | |||
node that sent the LIE and MUST reflood northbound TIEs received from | node that sent the LIE and MUST reflood northbound TIEs received from | |||
that node. The element _you_are_flood_repeater_ MUST be ignored if | that node. The element _you_are_flood_repeater_ MUST be ignored if | |||
skipping to change at line 3299 ¶ | skipping to change at line 3259 ¶ | |||
bidirectionally reachable over adjacency ADJ(N, P); | bidirectionally reachable over adjacency ADJ(N, P); | |||
* let G be a grandparent node of N, reachable transitively via a | * let G be a grandparent node of N, reachable transitively via a | |||
parent P over adjacencies ADJ(N, P) and ADJ(P, G). Observe that N | parent P over adjacencies ADJ(N, P) and ADJ(P, G). Observe that N | |||
does not have enough information to check bidirectional | does not have enough information to check bidirectional | |||
reachability of ADJ(P, G); | reachability of ADJ(P, G); | |||
* let R be a redundancy constant integer; a value of 2 or higher for | * let R be a redundancy constant integer; a value of 2 or higher for | |||
R is RECOMMENDED; | R is RECOMMENDED; | |||
* let S be a similarly constant integer; a value in range 0 .. 2 for | * let S be a similarity constant integer; a value in range 0 .. 2 | |||
S is RECOMMENDED, and the value of 1 SHOULD be used. Two | for S is RECOMMENDED, and the value of 1 SHOULD be used. Two | |||
cardinalities are considered as equivalent if their absolute | cardinalities are considered as equivalent if their absolute | |||
difference is less than or equal to S, i.e., |a-b|<=S; and | difference is less than or equal to S, i.e., |a-b|<=S | |||
* let RND be a 64-bit random number (for example, as described in | * let RND be a 64-bit random number (for example, as described in | |||
[RFC4086]) generated by the system once on startup. | [RFC4086]) generated by the system once on startup. | |||
The algorithm consists of the following steps: | The algorithm consists of the following steps: | |||
1. Derive a 64-bit number by XORing N's System ID with RND. | 1. Derive a 64-bit number by XORing N's System ID with RND. | |||
2. Derive a 16-bit pseudo-random unsigned integer PR(N) from the | 2. Derive a 16-bit pseudo-random unsigned integer PR(N) from the | |||
resulting 64-bit number by splitting it into 16-bit-long words | resulting 64-bit number by splitting it into 16-bit-long words | |||
W1, W2, W3, W4 (where W1 are the least significant 16 bits of the | W1, W2, W3, W4 (where W1 are the least significant 16 bits of the | |||
64-bit number, and W4 are the most significant 16 bits) and then | 64-bit number, and W4 are the most significant 16 bits) and then | |||
XORing the circularly shifted resulting words together: | XORing the circularly shifted resulting words together: | |||
(W1<<1) xor (W2<<2) xor (W3<<3) xor (W4<<4); where << is the | A. (W1<<1) xor (W2<<2) xor (W3<<3) xor (W4<<4); | |||
circular shift operator. | ||||
where << is the circular shift operator. | ||||
3. Sort the parents by decreasing number of northbound adjacencies | 3. Sort the parents by decreasing number of northbound adjacencies | |||
(using decreasing System ID of the parent as a tie-breaker): | (using decreasing System ID of the parent as a tie-breaker): | |||
sort |P(N) by decreasing CN(P), for all P in |P(N), as the | sort |P(N) by decreasing CN(P), for all P in |P(N), as the | |||
ordered array |A(N) | ordered array |A(N) | |||
4. Partition |A(N) in subarrays |A_k(N) of parents with equivalent | 4. Partition |A(N) in subarrays |A_k(N) of parents with equivalent | |||
cardinality of northbound adjacencies (in other words, with | cardinality of northbound adjacencies (in other words, with | |||
equivalent number of grandparents they can reach): | equivalent number of grandparents they can reach): | |||
a. set k=0; // k is the ID of the subarray | a. set k=0; // k is the ID of the subarray | |||
b. set i=0; | b. set i=0; | |||
c. while i < CN(N) do the following: | c. while i < CN(N) do | |||
i. set j=i; | i. set j=i; | |||
ii. while i < CN(N) and CN(|A(N)[j]) - CN(|A(N)[i]) <= S: | ii. while i < CN(N) and CN(|A(N)[j]) - CN(|A(N)[i]) <= S: | |||
1. place |A(N)[i] in |A_k(N) // abstract action, maybe | 1. place |A(N)[i] in |A_k(N) // abstract action, maybe | |||
noop | noop | |||
2. set i=i+1; | 2. set i=i+1; | |||
iii. /* At this point, j is the index in |A(N) of the first | iii. /* At this point, j is the index in |A(N) of the first | |||
member of |A_k(N) and (i-j) is C_k(N) defined as the | member of |A_k(N) and (i-j) is C_k(N) defined as the | |||
cardinality of |A_k(N). */ | cardinality of |A_k(N). */ | |||
set k=k+1. | set k=k+1; | |||
/* At this point, k is the total number of subarrays, initialized | /* At this point, k is the total number of subarrays, initialized | |||
for the shuffling operation below. */ | for the shuffling operation below. */ | |||
5. Shuffle each subarray |A_k(N) of cardinality C_k(N) within |A(N) | 5. Shuffle each subarray |A_k(N) of cardinality C_k(N) within |A(N) | |||
individually using the Durstenfeld variation of the Fisher-Yates | individually using the Durstenfeld variation of the Fisher-Yates | |||
algorithm that depends on N's System ID: | algorithm that depends on N's System ID: | |||
a. while k > 0 do the following: | a. while k > 0 do | |||
i. for i from C_k(N)-1 to 1 decrementing by 1, do the | i. for i from C_k(N)-1 to 1 decrementing by 1 do | |||
following: | ||||
1. set j to PR(N) modulo i; | 1. set j to PR(N) modulo i; | |||
2. exchange |A_k[j] and |A_k[i]; | 2. exchange |A_k[j] and |A_k[i]; | |||
ii. set k=k-1. | ii. set k=k-1; | |||
6. For each grandparent G, initialize a counter c(G) with the number | 6. For each grandparent G, initialize a counter c(G) with the number | |||
of its southbound adjacencies to elected flood repeaters (which | of its southbound adjacencies to elected flood repeaters (which | |||
is initially zero): | is initially zero): | |||
a. for each G in |G(N), set c(G) = 0. | a. for each G in |G(N), set c(G) = 0; | |||
7. Finally, only keep FRs as parents that are needed to maintain the | 7. Finally, only keep FRs as parents that are needed to maintain the | |||
number of adjacencies between the FRs and any grandparent G equal | number of adjacencies between the FRs and any grandparent G equal | |||
or above the redundancy constant R: | or above the redundancy constant R: | |||
a. for each P in reshuffled |A(N): | a. for each P in reshuffled |A(N): | |||
i. if there exists an adjacency ADJ(P, G) in |NA(P) such | i. if there exists an adjacency ADJ(P, G) in |NA(P) such | |||
that c(G) < R, then | that c(G) < R, then | |||
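Steps 1 to 4 of the algorithm above can be sketched in Python as follows. This is an illustration under stated assumptions, not text from either compared version: the circular shift is taken to be a left rotation within each 16-bit word, parents are modeled as hypothetical (system_id, cn) tuples where cn stands for CN(P), and the function names are invented for this example.

```python
MASK64 = (1 << 64) - 1


def rotl16(word, count):
    """Rotate a 16-bit word left by `count` bits (assumed direction)."""
    count %= 16
    return ((word << count) | (word >> (16 - count))) & 0xFFFF


def pseudo_random(system_id, rnd):
    """Steps 1 and 2: derive the 16-bit value PR(N)."""
    mixed = (system_id ^ rnd) & MASK64
    result = 0
    for k in range(4):
        # k = 0 selects W1 (least significant word), k = 3 selects W4.
        word = (mixed >> (16 * k)) & 0xFFFF
        result ^= rotl16(word, k + 1)
    return result


def sort_and_partition(parents, s=1):
    """Steps 3 and 4: order parents, then group equivalent cardinalities.

    `parents` is a list of (system_id, cn) tuples; `s` is the similarity
    constant S.
    """
    # Decreasing CN(P), with decreasing System ID as the tie-breaker.
    ordered = sorted(parents, key=lambda p: (-p[1], -p[0]))
    groups = []
    i = 0
    while i < len(ordered):
        j = i  # index of the first member of the current subarray
        group = []
        while i < len(ordered) and ordered[j][1] - ordered[i][1] <= s:
            group.append(ordered[i])
            i += 1
        groups.append(group)
    return groups
```

Each group compares its members against the first member of the group, mirroring the inner loop condition of step 4, so a group never spans a cardinality range wider than S.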
skipping to change at line 3426 ¶ | skipping to change at line 3386 ¶ | |||
Item 6. | Item 6. | |||
5. The indication of flood reduction capability MUST be carried in | 5. The indication of flood reduction capability MUST be carried in | |||
the Node TIEs in the _flood_reduction_ element and MAY be used to | the Node TIEs in the _flood_reduction_ element and MAY be used to | |||
optimize the algorithm to account for nodes that will flood | optimize the algorithm to account for nodes that will flood | |||
regardless. | regardless. | |||
6. A node generates TIDEs as usual, but when receiving TIREs or | 6. A node generates TIDEs as usual, but when receiving TIREs or | |||
TIDEs resulting in requests for a TIE of which the newest | TIDEs resulting in requests for a TIE of which the newest | |||
received copy came on an adjacency where the node was not a flood | received copy came on an adjacency where the node was not a flood | |||
repeater, it SHOULD ignore such requests on first and only first | repeater, it SHOULD ignore such requests on only the first | |||
request. Normally, the nodes that received the TIEs as flooding | request. Normally, the nodes that received the TIEs as flooding | |||
repeaters should satisfy the requesting node and, with that, no | repeaters should satisfy the requesting node and, with that, no | |||
further TIREs for such TIEs will be generated. Otherwise, the | further TIREs for such TIEs will be generated. Otherwise, the | |||
next set of TIDEs and TIREs MUST lead to flooding independent of | next set of TIDEs and TIREs MUST lead to flooding independent of | |||
the flood repeater status. This solves a very difficult "incast" | the flood repeater status. This solves a very difficult "incast" | |||
problem on nodes restarting with a very wide fanout, especially | problem on nodes restarting with a very wide fanout, especially | |||
northbound. To retrieve the full database, they often end up | northbound. To retrieve the full database, they often end up | |||
processing many inrushing copies, whereas this approach load | processing many inrushing copies, whereas this approach load | |||
balances the incoming database between adjacent nodes and flood | balances the incoming database between adjacent nodes and flood | |||
repeaters and should guarantee that two copies are sent by | repeaters and should guarantee that two copies are sent by | |||
skipping to change at line 3521 ¶ | skipping to change at line 3481 ¶ | |||
Prefixes are carried in different types of TIEs indicating their | Prefixes are carried in different types of TIEs indicating their | |||
type. For the same prefix being included in different TIE types, | type. For the same prefix being included in different TIE types, | |||
tie-breaking is performed according to Section 6.8.1. If the same | tie-breaking is performed according to Section 6.8.1. If the same | |||
prefix is included multiple times in multiple TIEs of the same type | prefix is included multiple times in multiple TIEs of the same type | |||
originating at the same node, the resulting behavior is unspecified. | originating at the same node, the resulting behavior is unspecified. | |||
6.4.1. Northbound Reachability SPF | 6.4.1. Northbound Reachability SPF | |||
N-SPF MUST use exclusively northbound and East-West adjacencies in | N-SPF MUST use exclusively northbound and East-West adjacencies in | |||
the computing node's node North TIEs (since if the node is a leaf, it | the computing node's North Node TIEs (since if the node is a leaf, it | |||
may not have generated a Node South TIE) when starting SPF. Observe | may not have generated a South Node TIE) when starting SPF. Observe | |||
that N-SPF is really just a one-hop variety since Node South TIEs are | that N-SPF is really just a one-hop variety since South Node TIEs are | |||
not reflooded southbound beyond a single level (or East-West), and | not reflooded southbound beyond a single level (or East-West), and | |||
with that, the computation cannot progress beyond adjacent nodes. | with that, the computation cannot progress beyond adjacent nodes. | |||
Once progressing, the computation uses the next higher level's Node | Once progressing, the computation uses the next higher level's South | |||
South TIEs to find corresponding adjacencies to verify backlink | Node TIEs to find corresponding adjacencies to verify backlink | |||
connectivity. Two unidirectional links MUST be associated to confirm | connectivity. Two unidirectional links MUST be associated to confirm | |||
bidirectional connectivity, a process often known as "backlink | bidirectional connectivity, a process often known as "backlink | |||
check". As part of the check, both Node TIEs MUST contain the | check". As part of the check, both Node TIEs MUST contain the | |||
correct System IDs *and* expected levels. | correct System IDs *and* expected levels. | |||
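The backlink check described above can be sketched as follows. This is a minimal illustration, not the schema defined by the specification: the `Adjacency` and `NodeTIE` classes and the `backlink_ok` function are hypothetical names, and a real Node TIE carries considerably more information.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Adjacency:
    # Hypothetical, simplified view of one neighbor entry in a Node TIE.
    neighbor_id: int     # System ID the advertising node sees on the link
    neighbor_level: int  # level the advertising node expects of that neighbor


@dataclass
class NodeTIE:
    # Hypothetical, simplified Node TIE: originator identity plus adjacencies.
    system_id: int
    level: int
    adjacencies: list = field(default_factory=list)


def backlink_ok(near: NodeTIE, far: NodeTIE) -> bool:
    """Return True only if both unidirectional links can be associated.

    Each Node TIE must list the other node with the correct System ID
    *and* the expected level.
    """
    def lists(a: NodeTIE, b: NodeTIE) -> bool:
        return any(
            adj.neighbor_id == b.system_id and adj.neighbor_level == b.level
            for adj in a.adjacencies
        )

    return lists(near, far) and lists(far, near)
```

A link advertised in only one direction, or advertised with a mismatched level, fails the check and would not be used by the computation.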
The default route found when crossing an E-W link SHOULD be used if | The default route found when crossing an E-W link SHOULD be used if | |||
and only if: | and only if: | |||
1. the node itself does *not* have any northbound adjacencies *and* | 1. the node itself does *not* have any northbound adjacencies *and* | |||
skipping to change at line 3565 ¶ | skipping to change at line 3525 ¶ | |||
That is, the E-W link can be used as a gateway of last resort for a | That is, the E-W link can be used as a gateway of last resort for a | |||
specific prefix only. Using south prefixes across an E-W link can be | specific prefix only. Using south prefixes across an E-W link can be | |||
beneficial, e.g., on automatic disaggregation in pathological fabric | beneficial, e.g., on automatic disaggregation in pathological fabric | |||
partitioning scenarios. | partitioning scenarios. | |||
A detailed example can be found in Appendix B.4. | A detailed example can be found in Appendix B.4. | |||
6.4.2. Southbound Reachability SPF | 6.4.2. Southbound Reachability SPF | |||
S-SPF MUST use the southbound adjacencies in the Node South TIEs | S-SPF MUST use the southbound adjacencies in the South Node TIEs | |||
exclusively, i.e., progresses towards nodes at lower levels. Observe | exclusively, i.e., progresses towards nodes at lower levels. Observe | |||
that E-W adjacencies are NEVER used in this computation. This | that E-W adjacencies are NEVER used in this computation. This | |||
enforces the requirement that a packet traversing in a southbound | enforces the requirement that a packet traversing in a southbound | |||
direction must never change its direction. | direction must never change its direction. | |||
S-SPF MUST use northbound adjacencies in node North TIEs to verify | S-SPF MUST use northbound adjacencies in North Node TIEs to verify | |||
backlink connectivity by checking for the presence of the link beside | backlink connectivity by checking for the presence of the link beside | |||
the correct System ID and level. | the correct System ID and level. | |||
6.4.3. East-West Forwarding Within a Non-ToF Level | 6.4.3. East-West Forwarding Within a Non-ToF Level | |||
Using south prefixes over horizontal links MAY occur if the N-SPF | Using south prefixes over horizontal links MAY occur if the N-SPF | |||
includes East-West adjacencies in computation. It can protect | includes East-West adjacencies in computation. It can protect | |||
against pathological fabric partitioning cases that leave only paths | against pathological fabric partitioning cases that leave only paths | |||
to destinations that would necessitate multiple changes of the | to destinations that would necessitate multiple changes of forwarding | |||
forwarding direction between north and south. | direction between north and south. | |||
6.4.4. East-West Links Within a ToF Level | 6.4.4. East-West Links Within a ToF Level | |||
E-W ToF links behave in terms of flooding scopes defined in | E-W ToF links behave in terms of flooding scopes defined in | |||
Section 6.3.4 like northbound links and MUST be used exclusively for | Section 6.3.4 like northbound links and MUST be used exclusively for | |||
control plane information flooding. Even though a ToF node could be | control plane information flooding. Even though a ToF node could be | |||
tempted to use those links during southbound SPF and carry traffic | tempted to use those links during southbound SPF and carry traffic | |||
over them, this MUST NOT be attempted since it may, in anycast cases, | over them, this MUST NOT be attempted since it may, in anycast cases, | |||
lead to routing loops. An implementation MAY try to resolve the | lead to routing loops. An implementation MAY try to resolve the | |||
looping problem by following on the ring strictly tie-broken | looping problem by following on the ring strictly tie-broken | |||
skipping to change at line 3628 ¶ | skipping to change at line 3588 ¶ | |||
node or link failures can lead to several independent instances of | node or link failures can lead to several independent instances of | |||
positive disaggregation necessary to prevent looping or bow-tying the | positive disaggregation necessary to prevent looping or bow-tying the | |||
fabric. | fabric. | |||
A node determines the set of prefixes needing disaggregation using | A node determines the set of prefixes needing disaggregation using | |||
the following steps: | the following steps: | |||
1. A DAG computation in the southern direction is performed first. | 1. A DAG computation in the southern direction is performed first. | |||
The North TIEs are used to find all of the prefixes it can reach | The North TIEs are used to find all of the prefixes it can reach | |||
and the set of next hops in the lower level for each of them. | and the set of next hops in the lower level for each of them. | |||
Such a computation can be easily performed on a Fat Tree by | Such a computation can be easily performed on a fat tree by | |||
setting all link costs in the southern direction to 1 and all | setting all link costs in the southern direction to 1 and all | |||
northern directions to infinity. The set of those prefixes is | northern directions to infinity. The set of those prefixes is | |||
referred to as |R; for each prefix r in |R, its set of next hops | referred to as |R; for each prefix r in |R, its set of next hops | |||
is |H(r). | is referred to as |H(r). | |||
2. The node uses reflected South TIEs to find all nodes at the same | 2. The node uses reflected South TIEs to find all nodes at the same | |||
level in the same PoD and the set of southbound adjacencies for | level in the same PoD and the set of southbound adjacencies for | |||
each. The set of nodes at the same level is termed |N, and for | each. The set of nodes at the same level is termed |N, and for | |||
each node, n, in |N, its set of southbound adjacencies is defined | each node, n, in |N, its set of southbound adjacencies is defined | |||
to be |A(n). | to be |A(n). | |||
3. For a given r, if the intersection of |H(r) and |A(n), for any n, | 3. For a given r, if the intersection of |H(r) and |A(n), for any n, | |||
is empty, then that prefix r must be explicitly advertised by the | is empty, then that prefix r must be explicitly advertised by the | |||
node in a South TIE. | node in a South TIE. | |||
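The three steps above reduce to a set-intersection test. The sketch below assumes the sets |H(r) and |A(n) have already been computed and are passed in as plain dictionaries; the function name and the string identifiers for prefixes and nodes are illustrative only.

```python
def prefixes_to_disaggregate(next_hops, peer_adjacencies):
    """Return the prefixes that need positive disaggregation.

    next_hops:        dict mapping prefix r -> set of lower-level next hops, |H(r)
    peer_adjacencies: dict mapping same-level node n -> set of its southbound
                      adjacencies, |A(n)
    """
    result = set()
    for prefix, hops in next_hops.items():
        # Step 3: if |H(r) and |A(n) do not intersect for any peer n,
        # that peer cannot reach r southbound, so r must be advertised
        # explicitly in a South TIE.
        if any(hops.isdisjoint(adj) for adj in peer_adjacencies.values()):
            result.add(prefix)
    return result
```

For example, if a peer's only southbound adjacency is `leaf1` and a prefix is reachable solely through `leaf2`, that prefix is selected for disaggregation.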
skipping to change at line 3741 ¶ | skipping to change at line 3701 ¶ | |||
the algorithms to keep them more tractable: | the algorithms to keep them more tractable: | |||
1. All neighbor relationships MUST perform backlink checks. | 1. All neighbor relationships MUST perform backlink checks. | |||
2. The overload flag as introduced in Section 6.8.2 and carried in | 2. The overload flag as introduced in Section 6.8.2 and carried in | |||
the _overload_ schema element has to be respected during the | the _overload_ schema element has to be respected during the | |||
computation. Nodes advertising themselves as overloaded MUST NOT | computation. Nodes advertising themselves as overloaded MUST NOT | |||
be transited in reachability computation but MUST be used as | be transited in reachability computation but MUST be used as | |||
terminal nodes with prefixes they advertise being reachable. | terminal nodes with prefixes they advertise being reachable. | |||
3. All the lower-level nodes are flooded to the same disaggregated | 3. All the lower-level nodes are flooded the same disaggregated | |||
prefixes since RIFT does not build a South TIE per node, which | prefixes since RIFT does not build a South TIE per node, which | |||
would complicate things unnecessarily. The lower-level node that | would complicate things unnecessarily. The lower-level node that | |||
can compute a southbound route to the prefix will prefer it to | can compute a southbound route to the prefix will prefer it to | |||
the disaggregated route anyway based on route preference rules. | the disaggregated route anyway based on route preference rules. | |||
4. Positively disaggregated prefixes do *not* have to propagate to | 4. Positively disaggregated prefixes do *not* have to propagate to | |||
lower levels. With that, the disturbance in terms of new | lower levels. With that, the disturbance in terms of new | |||
flooding is contained to a single level experiencing failures. | flooding is contained to a single level experiencing failures. | |||
5. Disaggregated Prefix South TIEs are not "reflected" by the lower | 5. Disaggregated South Prefix TIEs are not "reflected" by the lower | |||
level. Nodes within the same level do *not* need to be aware of | level. Nodes within the same level do *not* need to be aware of | |||
which node computed the need for disaggregation. | which node computed the need for disaggregation. | |||
6. The fabric is still supporting maximum load balancing properties | 6. The fabric is still supporting maximum load balancing properties | |||
while not trying to send traffic northbound unless necessary. | while not trying to send traffic northbound unless necessary. | |||
In case positive disaggregation is triggered and due to the very | In case positive disaggregation is triggered and due to the very | |||
stable but unsynchronized nature of the algorithm, the nodes may | stable but unsynchronized nature of the algorithm, the nodes may | |||
issue the necessary disaggregated prefixes at different points in | issue the necessary disaggregated prefixes at different points in | |||
time. For a short time, this can lead to an "incast" behavior where | time. For a short time, this can lead to an "incast" behavior where | |||
the first advertising router based on the nature of the longest | the first advertising router based on the nature of the longest | |||
prefix match will attract all the traffic. Different implementation | prefix match will attract all the traffic. Different implementation | |||
strategies can be used to lessen that effect, but those are outside | strategies can be used to lessen that effect, but those are outside | |||
the scope of this specification. | the scope of this specification. | |||
It is worth observing that, in a single plane ToF, this | It is worth observing that, in a single-plane ToF, this | |||
disaggregation prevents traffic loss up to (K_LEAF * P) link failures | disaggregation prevents traffic loss up to (K_LEAF * P) link failures | |||
in terms of Section 5.2 or, in other terms, it takes at minimum that | in terms of Section 5.2 or, in other terms, it takes at minimum that | |||
many link failures to partition the ToF into multiple planes. | many link failures to partition the ToF into multiple planes. | |||
6.5.2. Negative, Transitive Disaggregation for Fallen Leaves | 6.5.2. Negative, Transitive Disaggregation for Fallen Leaves | |||
As explained in Section 5.3, failures in multi-plane ToF or more than | As explained in Section 5.3, failures in multi-plane ToF or more than | |||
(K_LEAF * P) links failing in single plane design can generate fallen | (K_LEAF * P) links failing in single-plane design can generate fallen | |||
leaves. Such a scenario cannot be addressed by positive disaggregation | leaves. Such a scenario cannot be addressed by positive disaggregation | |||
only and needs a further mechanism. | only and needs a further mechanism. | |||
6.5.2.1. Cabling of Multiple ToF Planes | 6.5.2.1. Cabling of Multiple ToF Planes | |||
Returning in this section to designs with multiple planes as shown | Returning in this section to designs with multiple planes as shown | |||
originally in Figure 3, Figure 18 highlights how the ToF is cabled in | originally in Figure 3, Figure 18 highlights how the ToF is cabled in | |||
case of two planes by the means of dual-rings to distribute all the | case of two planes by the means of dual-rings to distribute all the | |||
North TIEs within both planes. | North TIEs within both planes. | |||
____________________________________________________________________________ | _______________________________________________________________________ | |||
| [Plane A] . [Plane B] . [Plane C] . [Plane D] | | | [Plane A] . [Plane B] . [Plane C] . [Plane D] | | |||
|..........................................................................| | |.....................................................................| | |||
| +-------------------------------------------------------------+ | | | +------------------------------------------------------------+ | | |||
| | +---+ . +---+ . +---+ . +---+ | | | | | +---+ . +---+ . +---+ . +---+ | | | |||
| +-+ n +-------------+ n +-------------+ n +-------------+ n +-+ | | | +-+ n +-------------+ n +-------------+ n +------------+ n +-+ | | |||
| +--++ . +-+++ . +-+++ . +--++ | | | +--++ . +-+++ . +-+++ . +--++ | | |||
| || . || . || . || | | | || . || . || . || | | |||
| +---------||---------------||----------------||---------------+ || | | | +---------||---------------||----------------||--------------+ || | | |||
| | +---+ || . +---+ || . +---+ || . +---+ | || | | | | +---+ || . +---+ || . +---+ || . +---+ | || | | |||
| +-+ 1 +---||--------+ 1 +--||---------+ 1 +--||---------+ 1 +-+ || | | | +-+ 1 +---||--------+ 1 +--||---------+ 1 +--||--------+ 1 +-+ || | | |||
| +--++ || . +-+++ || . +-+++ || . +-+++ || | | | +--++ || . +-+++ || . +-+++ || . +-+++ || | | |||
| || || . || || . || || . || || | | | || || . || || . || || . || || | | |||
| || || . || || . || || . || || | | | || || . || || . || || . || || | | |||
Figure 18: Topologically Connected Planes | Figure 18: Topologically Connected Planes | |||
Section 5.3 already describes how failures in multi-plane fabrics can | Section 5.3 already describes how failures in multi-plane fabrics can | |||
lead to traffic loss that normal positive disaggregation cannot fix. | lead to traffic loss that normal positive disaggregation cannot fix. | |||
The mechanism of negative, transitive disaggregation incorporated in | The mechanism of negative, transitive disaggregation incorporated in | |||
RIFT provides the corresponding solution, and the next section | RIFT provides the corresponding solution, and the next section | |||
explains the involved mechanisms in more detail. | explains the involved mechanisms in more detail. | |||
6.5.2.2. Transitive Advertisement of Negative Disaggregates | 6.5.2.2. Transitive Advertisement of Negative Disaggregates | |||
A ToF node discovering that it cannot reach a fallen leaf SHOULD | A ToF node discovering that it cannot reach a fallen leaf SHOULD | |||
disaggregate all the prefixes of that leaf. For that purpose, it | disaggregate all the prefixes of that leaf. For that purpose, it | |||
uses negative prefix South TIEs that are, as usual, flooded | uses negative South Prefix TIEs that are, as usual, flooded | |||
southwards with the scope defined in Section 6.3.4. | southwards with the scope defined in Section 6.3.4. | |||
Transitively, a node explicitly loses connectivity to a prefix when | Transitively, a node explicitly loses connectivity to a prefix when | |||
none of its children advertises it and when the prefix is negatively | none of its children advertises it and when the prefix is negatively | |||
disaggregated by all of its parents. When that happens, the node | disaggregated by all of its parents. When that happens, the node | |||
originates the negative prefix further down south. Since the | originates the negative prefix further down south. Since the | |||
mechanism applies recursively south, the negative prefix may | mechanism applies recursively south, the negative prefix may | |||
propagate transitively all the way down to the leaf. This is | propagate transitively all the way down to the leaf. This is | |||
necessary since leaves connected to multiple planes by means of | necessary since leaves connected to multiple planes by means of | |||
disjointed paths may have to choose the correct plane at the very | disjointed paths may have to choose the correct plane at the very | |||
skipping to change at line 3837 ¶ | skipping to change at line 3797 ¶ | |||
When connectivity is restored, a node that disaggregated a prefix | When connectivity is restored, a node that disaggregated a prefix | |||
withdraws the negative disaggregation by the usual mechanism of re- | withdraws the negative disaggregation by the usual mechanism of re- | |||
advertising TIEs omitting the negative prefix. | advertising TIEs omitting the negative prefix. | |||
6.5.2.3. Computation of Negative Disaggregates | 6.5.2.3. Computation of Negative Disaggregates | |||
Negative prefixes can in fact be advertised due to two different | Negative prefixes can in fact be advertised due to two different | |||
triggers. These are described in turn. | triggers. These are described in turn. | |||
The first origination reason is a computation that uses all the node | The first origination reason is a computation that uses all the North | |||
North TIEs to build the set of all reachable nodes by reachability | Node TIEs to build the set of all reachable nodes by reachability | |||
computation over the complete graph, including horizontal ToF links. | computation over the complete graph, including horizontal ToF links. | |||
The computation uses the node itself as the root. This is compared | The computation uses the node itself as the root. This is compared | |||
with the result of the normal southbound SPF as described in | with the result of the normal southbound SPF as described in | |||
Section 6.4.2. The differences are the fallen leaves, and all their | Section 6.4.2. The differences are the fallen leaves, and all their | |||
attached prefixes are advertised as negative prefixes southbound if | attached prefixes are advertised as negative prefixes southbound if | |||
the node does not consider the prefix to be reachable within the | the node does not consider the prefix to be reachable within the | |||
southbound SPF. | southbound SPF. | |||
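The first origination trigger can be pictured as a difference between two reachability sets. The sketch below is a simplification under stated assumptions: both graphs are given as adjacency dictionaries already derived from the TIEs, a plain breadth-first search stands in for the SPF runs, and `leaf_prefixes` is a hypothetical map from leaf nodes to their attached prefixes.

```python
from collections import deque


def reachable(root, edges):
    """Breadth-first reachability from root over a dict of node -> neighbors."""
    seen = {root}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for nxt in edges.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def negative_prefixes(root, full_graph, south_graph, leaf_prefixes):
    """Prefixes of fallen leaves that root would advertise negatively.

    full_graph:  complete graph, including horizontal ToF links
    south_graph: southbound adjacencies only (stand-in for S-SPF)
    """
    south_nodes = reachable(root, south_graph)
    fallen = reachable(root, full_graph) - south_nodes
    south_reach = set().union(*(leaf_prefixes.get(n, set()) for n in south_nodes))
    candidates = set().union(*(leaf_prefixes.get(n, set()) for n in fallen))
    # Advertise only prefixes not already reachable in the southbound result.
    return candidates - south_reach
```

Nodes in the difference that are not leaves (for instance ToF nodes in other planes) contribute nothing here because they have no entry in `leaf_prefixes`.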
The second origination reason hinges on the understanding of how the | The second origination reason hinges on the understanding of how the | |||
negative prefixes are used within the computation as described in | negative prefixes are used within the computation as described in | |||
skipping to change at line 3875 ¶ | skipping to change at line 3835 ¶ | |||
that node's next-hop set and a distance equal to the prefix's cost | that node's next-hop set and a distance equal to the prefix's cost | |||
plus the node's minimized path distance. The RIFT route database, a | plus the node's minimized path distance. The RIFT route database, a | |||
set of (prefix, prefix-type, attributes, path_distance, next-hop | set of (prefix, prefix-type, attributes, path_distance, next-hop | |||
set), accumulates these results. | set), accumulates these results. | |||
N-SPF prefixes from each South TIE need to also be added to the RIFT | N-SPF prefixes from each South TIE need to also be added to the RIFT | |||
route database. The N-SPF is really just a stub so the computing | route database. The N-SPF is really just a stub so the computing | |||
node simply needs to determine, for each prefix in a South TIE that | node simply needs to determine, for each prefix in a South TIE that | |||
originated from an adjacent node, what next hops to use to reach that | originated from an adjacent node, what next hops to use to reach that | |||
node. Since there may be parallel links, the next hops to use can be | node. Since there may be parallel links, the next hops to use can be | |||
a set; the presence of the computing node in the associated Node | a set; the presence of the computing node in the associated South | |||
South TIE is sufficient to verify that at least one link has | Node TIE is sufficient to verify that at least one link has | |||
bidirectional connectivity. The set of minimum cost next hops from | bidirectional connectivity. The set of minimum cost next hops from | |||
the computing node X to the originating adjacent node is determined. | the computing node X to the originating adjacent node is determined. | |||
Each prefix has its cost adjusted before being added into the RIFT | Each prefix has its cost adjusted before being added into the RIFT | |||
route database. The cost of the prefix is set to the cost received | route database. The cost of the prefix is set to the cost received | |||
plus the cost of the minimum distance next hop to that neighbor while | plus the cost of the minimum distance next hop to that neighbor while | |||
considering its attributes such as mobility per Section 6.8.4. Then | considering its attributes such as mobility per Section 6.8.4. Then | |||
each prefix can be added into the RIFT route database with the next- | each prefix can be added into the RIFT route database with the next- | |||
hop set; ties are broken based upon type first and then distance and | hop set; ties are broken based upon type first and then distance and | |||
further on _PrefixAttributes_. Only the best combination is used for | further on _PrefixAttributes_. Only the best combination is used for | |||
skipping to change at line 3930 ¶ | skipping to change at line 3890 ¶ | |||
end for | end for | |||
end for | end for | |||
Figure 19: Adding Routes from South TIE Positive and Negative | Figure 19: Adding Routes from South TIE Positive and Negative | |||
Prefixes | Prefixes | |||
After the positive prefixes are attached and tie-broken, negative | After the positive prefixes are attached and tie-broken, negative | |||
prefixes are attached and used in case of northbound computation, | prefixes are attached and used in case of northbound computation, | |||
ideally from the shortest length to the longest. The next-hop | ideally from the shortest length to the longest. The next-hop | |||
adjacencies for a negative prefix are inherited from the longest | adjacencies for a negative prefix are inherited from the longest | |||
positive prefix that aggregates it, and subsequently adjacencies to | positive prefix that aggregates it; subsequently, adjacencies to | |||
nodes that advertised negative for this prefix are removed. | nodes that advertised negative disaggregation for this prefix are | |||
removed. | ||||
The rule of inheritance MUST be maintained when the next-hop list for | The rule of inheritance MUST be maintained when the next-hop list for | |||
a prefix is modified, as the modification may affect the entries for | a prefix is modified, as the modification may affect the entries for | |||
matching negative prefixes of immediately longer prefix length. For | matching negative prefixes of immediately longer prefix length. For | |||
instance, if a next hop is added, then by inheritance, it must be | instance, if a next hop is added, then by inheritance, it must be | |||
added to all the negative routes of immediately longer prefix length | added to all the negative routes of immediately longer prefix length | |||
unless it is pruned due to a negative advertisement for the same next | unless it is pruned due to a negative advertisement for the same next | |||
hop. Similarly, if a next hop is deleted for a given prefix, then it | hop. Similarly, if a next hop is deleted for a given prefix, then it | |||
is deleted for all the immediately aggregated negative routes. This | is deleted for all the immediately aggregated negative routes. This | |||
will recurse in the case of nested negative prefix aggregations. | will recurse in the case of nested negative prefix aggregations. | |||
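The inheritance rule can be illustrated with a small recomputation from scratch. This sketch is one reading of the text, not the reference algorithm: it assumes that a nested negative prefix inherits from the closest covering entry, whether that entry is a positive prefix or an already resolved negative one, and the function name and dictionary layout are hypothetical.

```python
import ipaddress


def resolve_negative(positive, negative):
    """Compute next-hop sets for negative prefixes.

    positive: dict mapping prefix string -> set of next hops
    negative: dict mapping prefix string -> set of neighbors that advertised
              the prefix negatively
    Returns a dict mapping each negative prefix string -> resulting next hops.
    """
    table = {ipaddress.ip_network(p): set(nh) for p, nh in positive.items()}
    resolved = {}
    # Process from the shortest prefix length to the longest.
    for text in sorted(negative, key=lambda p: ipaddress.ip_network(p).prefixlen):
        net = ipaddress.ip_network(text)
        covering = [
            c for c in table
            if c.version == net.version and c != net and net.subnet_of(c)
        ]
        parent = max(covering, key=lambda c: c.prefixlen, default=None)
        inherited = set(table[parent]) if parent is not None else set()
        # Remove adjacencies to nodes that advertised negative for this prefix.
        table[net] = inherited - negative[text]
        resolved[text] = table[net]
    return resolved
```

Recomputing the whole table after any next-hop change keeps the rule trivially satisfied; an implementation would more likely update the affected entries incrementally, as the paragraph above describes.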
skipping to change at line 4199 ¶ | skipping to change at line 4160 ¶ | |||
+---> | Via S4 | +---> | Via S4 | +---> | Via S4 | | +---> | Via S4 | +---> | Via S4 | +---> | Via S4 | | |||
+--------+ +--------+ +--------+ | +--------+ +--------+ +--------+ | |||
Figure 27: Abstract FIB After Negative 2001:db8:2::/48 from S4 | Figure 27: Abstract FIB After Negative 2001:db8:2::/48 from S4 | |||
6.7. Optional Zero Touch Provisioning (RIFT ZTP) | 6.7. Optional Zero Touch Provisioning (RIFT ZTP) | |||
Each RIFT node can operate in Zero Touch Provisioning (ZTP) mode, | Each RIFT node can operate in Zero Touch Provisioning (ZTP) mode, | |||
i.e., it has no RIFT-specific configuration (unless it is a ToF or it | i.e., it has no RIFT-specific configuration (unless it is a ToF or it | |||
is explicitly configured to operate in the overall topology as a leaf | is explicitly configured to operate in the overall topology as a leaf | |||
and/or support leaf-to-leaf procedures), and it will fully, | and/or support L2L procedures), and it will fully, automatically | |||
automatically derive necessary RIFT parameters itself after being | derive necessary RIFT parameters itself after being attached to the | |||
attached to the topology. Manually configured nodes and nodes | topology. Manually configured nodes and nodes operating using RIFT | |||
operating using RIFT ZTP can be mixed freely and will form a valid | ZTP can be mixed freely and will form a valid topology if achievable. | |||
topology if achievable. | ||||
The derivation of the level of each node happens based on offers | The derivation of the level of each node happens based on offers | |||
received from its neighbors, whereas each node (with the possible | received from its neighbors, whereas each node (with the possible | |||
exception of nodes configured as leaves) tries to attach at the | exception of nodes configured as leaves) tries to attach at the | |||
highest possible point in the fabric. This guarantees that even if | highest possible point in the fabric. This guarantees that even if | |||
the diffusion front of offers reaches a node from "below" faster than | the diffusion front of offers reaches a node from "below" faster than | |||
from "above", it will greedily abandon an already negotiated level | from "above", it will greedily abandon an already negotiated level | |||
derived from nodes topologically below it and properly peer with | derived from nodes topologically below it and properly peer with | |||
nodes above. | nodes above. | |||
skipping to change at line 4393 ¶ | skipping to change at line 4353 ¶ | |||
1. It advertises its LEVEL_VALUE on all LIEs (observe that this can | 1. It advertises its LEVEL_VALUE on all LIEs (observe that this can | |||
be UNDEFINED_LEVEL, which in terms of the schema, is simply an | be UNDEFINED_LEVEL, which in terms of the schema, is simply an | |||
omitted optional value). | omitted optional value). | |||
2. It computes HAL as the numerically highest available level in all | 2. It computes HAL as the numerically highest available level in all | |||
VOLs. | VOLs. | |||
3. Then, it chooses MAX(HAL-1,0) as its DERIVED_LEVEL. The node | 3. Then, it chooses MAX(HAL-1,0) as its DERIVED_LEVEL. The node | |||
then starts to advertise this derived level. | then starts to advertise this derived level. | |||
4. A node that lost all adjacencies with the HAL value MUST hold | 4. A node that lost all adjacencies with the HAL value MUST holddown | |||
down computation of the new DERIVED_LEVEL for at least one second | computation of the new DERIVED_LEVEL for at least one second | |||
unless it has no VOLs from southbound adjacencies. After the | unless it has no VOLs from southbound adjacencies. After the | |||
holddown timer expires, it MUST discard all received offers, | holddown timer expires, it MUST discard all received offers, | |||
recompute DERIVED_LEVEL, and announce it to all neighbors. | recompute DERIVED_LEVEL, and announce it to all neighbors. | |||
5. A node MUST reset any adjacency that has changed the level it is | 5. A node MUST reset any adjacency that has changed the level it is | |||
offering and is in _ThreeWay_ state. | offering and is in _ThreeWay_ state. | |||
6. A node that changed its defined level value MUST re-advertise its | 6. A node that changed its defined level value MUST re-advertise its | |||
own TIEs (since the new _PacketHeader_ will contain a different | own TIEs (since the new _PacketHeader_ will contain a different | |||
level than before). The sequence number of each TIE MUST be | level than before). The sequence number of each TIE MUST be | |||
increased. | increased. | |||
7. After a level has been derived, the node MUST set the | 7. After a level has been derived, the node MUST set the | |||
_not_a_ztp_offer_ on LIEs towards all systems offering a VOL for | _not_a_ztp_offer_ on LIEs towards all systems offering a VOL for | |||
HAL. | HAL. | |||
8. A node that changed its level SHOULD flush TIEs of all other | 8. A node that changed its level SHOULD flush TIEs of all other | |||
nodes from its link state database; otherwise, stale information | nodes from its LSDB; otherwise, stale information may persist on | |||
may persist on "direction reversal", i.e., nodes that seemed | "direction reversal", i.e., nodes that seemed south are now north | |||
south are now north or east-west. This will not prevent the | or east-west. This will not prevent the correct operation of the | |||
correct operation of the protocol but could be slightly confusing | protocol but could be slightly confusing operationally. | |||
operationally. | ||||
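Steps 2 and 3 of the procedure above amount to a short calculation. The sketch below covers only that arithmetic and none of the holddown, reset, or flushing behavior. Using `None` to stand for UNDEFINED_LEVEL and the function name `derive_level` are assumptions made for illustration.

```python
from typing import Iterable, Optional


def derive_level(valid_offered_levels: Iterable[Optional[int]]) -> Optional[int]:
    """Derive a node's level from the levels carried in its VOLs.

    None stands in for UNDEFINED_LEVEL (an omitted optional value) and is
    ignored. Returns None when there is no usable offer to derive from.
    """
    levels = [lvl for lvl in valid_offered_levels if lvl is not None]
    if not levels:
        return None
    hal = max(levels)       # HAL: numerically highest available level
    return max(hal - 1, 0)  # DERIVED_LEVEL = MAX(HAL-1, 0)
```

With offers of 24 and 23 the node would derive level 23, and an offer of 0 still yields 0 because the result is clamped at zero.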
A node starting with LEVEL_VALUE being 0 (i.e., it assumes a leaf | A node starting with LEVEL_VALUE being 0 (i.e., it assumes a leaf | |||
function by being configured with the appropriate flags or has a | function by being configured with the appropriate flags or has a | |||
CONFIGURED_LEVEL of 0) MUST follow this additional procedure: | CONFIGURED_LEVEL of 0) MUST follow this additional procedure: | |||
1. It computes HAT per the procedures above but does *not* use it to | 1. It computes HAT per the procedures above but does *not* use it to | |||
compute DERIVED_LEVEL. HAT is used to limit adjacency formation | compute DERIVED_LEVEL. HAT is used to limit adjacency formation | |||
per Section 6.2. | per Section 6.2. | |||
It MAY also follow this modified procedure: | It MAY also follow this modified procedure: | |||
skipping to change at line 4879 ¶ | skipping to change at line 4838 ¶ | |||
optional sequence number field. In case of a negatively distributed | optional sequence number field. In case of a negatively distributed | |||
prefix, this attribute MUST NOT be included by the originator and it | prefix, this attribute MUST NOT be included by the originator and it | |||
MUST be ignored by all nodes during computation. When this attribute | MUST be ignored by all nodes during computation. When this attribute | |||
is present (observe that per data schema, the attribute itself is | is present (observe that per data schema, the attribute itself is | |||
optional, but in case it is included, the "timestamp" field is | optional, but in case it is included, the "timestamp" field is | |||
required): | required): | |||
* The leaf node MAY advertise a timestamp of the latest sighting of | * The leaf node MAY advertise a timestamp of the latest sighting of | |||
a prefix, e.g., by snooping IP protocols or the node using the | a prefix, e.g., by snooping IP protocols or the node using the | |||
time at which it advertised the prefix. RIFT transports the | time at which it advertised the prefix. RIFT transports the | |||
timestamp within the desired Prefix North TIEs as the | timestamp within the desired North Prefix TIEs as the | |||
[IEEEstd1588] timestamp. | [IEEEstd1588] timestamp. | |||
* RIFT MAY interoperate with "Registration Extensions for 6LoWPAN | * RIFT MAY interoperate with "Registration Extensions for 6LoWPAN | |||
Neighbor Discovery" [RFC8505], which provides a method for | Neighbor Discovery" [RFC8505], which provides a method for | |||
registering a prefix with a sequence number called a Transaction | registering a prefix with a sequence number called a Transaction | |||
ID (TID). In such cases, RIFT SHOULD transport the derived TID | ID (TID). In such cases, RIFT SHOULD transport the derived TID | |||
without modification. | without modification. | |||
* RIFT also defines an abstract negative clock (ASNC) (also called | * RIFT also defines an abstract negative clock (ASNC) (also called | |||
an "undefined" clock). The ASNC MUST be considered older than any | an "undefined" clock). The ASNC MUST be considered older than any | |||
other defined clock. By default, when a node receives a Prefix | other defined clock. By default, when a node receives a North | |||
North TIE that does not contain a 'PrefixSequenceType' attribute, | Prefix TIE that does not contain a 'PrefixSequenceType' attribute, | |||
it MUST interpret the absence as the ASNC. | it MUST interpret the absence as the ASNC. | |||
* Any prefix present on the fabric in multiple nodes that have the | * Any prefix present on the fabric in multiple nodes that have the | |||
*same* clock is considered as anycast. | *same* clock is considered as anycast. | |||
* The RIFT specification assumes that all nodes are being | * The RIFT specification assumes that all nodes are being | |||
synchronized within at least 200 milliseconds or less. This is | synchronized within at least 200 milliseconds or less. This is | |||
achievable through the use of NTP [RFC5905]. An implementation | achievable through the use of NTP [RFC5905]. An implementation | |||
MAY provide a way to reconfigure a domain to a different value and | MAY provide a way to reconfigure a domain to a different value and | |||
provides a variable called MAXIMUM_CLOCK_DELTA for this purpose. | provides a variable called MAXIMUM_CLOCK_DELTA for this purpose. | |||
6.8.4.1. Clock Comparison | 6.8.4.1. Clock Comparison | |||
All monotonic clock values MUST be compared to each other using the | All monotonic clock values MUST be compared to each other using the | |||
following rules: | following rules: | |||
1. The ASNC is older than any other value except ASNC, | 1. The ASNC is older than any other value except ASNC *and* | |||
2. Clocks with timestamps differing by more than MAXIMUM_CLOCK_DELTA | 2. Clocks with timestamps differing by more than MAXIMUM_CLOCK_DELTA | |||
are comparable by using the timestamps only, | are comparable by using the timestamps only *and* | |||
3. Clocks with timestamps differing by less than MAXIMUM_CLOCK_DELTA | 3. Clocks with timestamps differing by less than MAXIMUM_CLOCK_DELTA | |||
are comparable by using their TIDs only, *and* | are comparable by using their TIDs only, *and* | |||
4. An undefined TID is always older than any other TID, *and* | 4. An undefined TID is always older than any other TID, *and* | |||
5. TIDs are compared using rules of [RFC8505]. | 5. TIDs are compared using rules of [RFC8505]. | |||
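The five rules above can be read as one ordered decision procedure. The following is a minimal sketch of that reading, not a normative implementation. The `Clock` container, the use of `None` for the ASNC and for an undefined TID, the nanosecond unit, and the placeholder `plain_tid_newer` ordering are all assumptions made for illustration; real TID comparison follows [RFC8505]. The rules do not say what happens when the timestamps differ by exactly MAXIMUM_CLOCK_DELTA, so the sketch arbitrarily falls through to the TID comparison in that case.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Default taken from the text above: 200 milliseconds, here in nanoseconds.
MAXIMUM_CLOCK_DELTA_NS = 200_000_000


@dataclass(frozen=True)
class Clock:
    """Hypothetical container: a timestamp plus an optional TID."""
    timestamp_ns: int
    tid: Optional[int] = None  # None models an undefined TID


def plain_tid_newer(a: int, b: int) -> bool:
    """Placeholder ordering only; real TIDs are compared per [RFC8505]."""
    return a > b


def is_newer(
    a: Optional[Clock],
    b: Optional[Clock],
    tid_newer: Callable[[int, int], bool] = plain_tid_newer,
    max_delta_ns: int = MAXIMUM_CLOCK_DELTA_NS,
) -> bool:
    """Return True if clock a is strictly newer than clock b.

    None stands for the ASNC (absent 'PrefixSequenceType').
    """
    # Rule 1: the ASNC is older than any other value except the ASNC.
    if a is None:
        return False
    if b is None:
        return True
    # Rule 2: timestamps further apart than the delta decide on their own.
    if abs(a.timestamp_ns - b.timestamp_ns) > max_delta_ns:
        return a.timestamp_ns > b.timestamp_ns
    # Rule 4: an undefined TID is older than any other TID.
    if a.tid is None:
        return False
    if b.tid is None:
        return True
    # Rules 3 and 5: within the delta, only the TIDs are compared.
    return tid_newer(a.tid, b.tid)
```

Passing the TID ordering in as a parameter keeps the rule ordering separate from the TID arithmetic, which is defined elsewhere.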
6.8.4.2. Interaction Between Timestamps and Sequence Counters | 6.8.4.2. Interaction Between Timestamps and Sequence Counters | |||
For attachment changes that occur less frequently (e.g., once per | For attachment changes that occur less frequently (e.g., once per | |||
second), the timestamp that the RIFT infrastructure captures should | second), the timestamp that the RIFT infrastructure captures should | |||
be enough to determine the most current discovery. If the point of | be enough to determine the most current discovery. If the point of | |||
attachment changes faster than the maximum drift of the timestamping | attachment changes faster than the maximum drift of the timestamping | |||
mechanism (i.e., MAXIMUM_CLOCK_DELTA), then a sequence number SHOULD | mechanism (i.e., MAXIMUM_CLOCK_DELTA), then a sequence number SHOULD | |||
be used to enable necessary precision to determine currency. | be used to enable necessary precision to determine currency. | |||
The sequence counter in [RFC8505] is encoded as one octet and wraps | The sequence counter in [RFC8505] is encoded as one octet and wraps | |||
around using Appendix A. | around using the arithmetic defined in Appendix A. | |||
Within the resolution of MAXIMUM_CLOCK_DELTA, sequence counter values | Within the resolution of MAXIMUM_CLOCK_DELTA, sequence counter values | |||
captured during 2 sequential iterations of the same timestamp SHOULD | captured during 2 sequential iterations of the same timestamp SHOULD | |||
be comparable. This means that with default values, a node may move | be comparable. This means that with default values, a node may move | |||
up to 127 times in a 200-millisecond period and the clocks will | up to 127 times in a 200-millisecond period and the clocks will | |||
remain comparable. This allows the RIFT infrastructure to explicitly | remain comparable. This allows the RIFT infrastructure to explicitly | |||
assert the most up-to-date advertisement. | assert the most up-to-date advertisement. | |||
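The figure of 127 moves follows from comparing a one-octet counter that wraps around: two values are unambiguously ordered only while they are less than half the counter space apart. The sketch below illustrates this with plain modulo-256 serial-number arithmetic. That arithmetic is an assumption chosen for illustration. It does not reproduce the normative rules of Appendix A or [RFC8505], and the names `SEQ_MOD`, `HALF`, and `seq_newer` are hypothetical.

```python
SEQ_BITS = 8
SEQ_MOD = 1 << SEQ_BITS   # 256 values in a one-octet counter
HALF = SEQ_MOD // 2       # 128; distances below this are unambiguous


def seq_newer(a: int, b: int) -> bool:
    """Return True if counter value a is newer than b.

    Assumes a and b are at most HALF - 1 (127) increments apart;
    beyond that distance the ordering is ambiguous and this sketch
    simply reports False.
    """
    diff = (a - b) % SEQ_MOD
    return 0 < diff < HALF
```

With these definitions, a counter that advanced from 250 to 10 (16 steps across the wrap) still compares as newer, while two values exactly 128 apart are not ordered in either direction.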
6.8.4.3. Anycast vs. Unicast | 6.8.4.3. Anycast vs. Unicast | |||
skipping to change at line 5031 ¶ | skipping to change at line 4990 ¶ | |||
[RFC5881] to react quickly to link failures. In such case, the | [RFC5881] to react quickly to link failures. In such case, the | |||
following procedures are introduced: | following procedures are introduced: | |||
1. After RIFT _ThreeWay_ hello adjacency convergence, a BFD session | 1. After RIFT _ThreeWay_ hello adjacency convergence, a BFD session | |||
MAY be formed automatically between the RIFT endpoints without | MAY be formed automatically between the RIFT endpoints without | |||
further configuration using the exchanged discriminators that are | further configuration using the exchanged discriminators that are | |||
equal to the _local_id_ in the _LIEPacket_. The capability of the | equal to the _local_id_ in the _LIEPacket_. The capability of the | |||
remote side to support BFD is carried in the LIEs in | remote side to support BFD is carried in the LIEs in | |||
_LinkCapabilities_. | _LinkCapabilities_. | |||
2. In case an established BFD session goes down after it was up, | 2. In case an established BFD session goes Down after it was Up, | |||
RIFT adjacency SHOULD be re-initialized and subsequently started | RIFT adjacency SHOULD be re-initialized and subsequently started | |||
from Init after it receives a consecutive BFD Up. | from Init after it receives a consecutive BFD Up. | |||
3. In case of parallel links between nodes, each link MAY run its | 3. In case of parallel links between nodes, each link MAY run its | |||
own independent BFD session or they MAY share a session. The | own independent BFD session or they MAY share a session. The | |||
specific manner in which this is implemented is outside the scope | specific manner in which this is implemented is outside the scope | |||
of this document. | of this document. | |||
4. If link identifiers or BFD capabilities change, both the LIE and | 4. If link identifiers or BFD capabilities change, both the LIE and | |||
any BFD sessions SHOULD be brought down and back up again. In | any BFD sessions SHOULD be brought down and back up again. In | |||
skipping to change at line 5111 ¶ | skipping to change at line 5070 ¶ | |||
Spine 111, and as a result, Leaf 111 wants to forward more traffic | Spine 111, and as a result, Leaf 111 wants to forward more traffic | |||
towards Spine 112. Additionally, it includes an uplink failure on | towards Spine 112. Additionally, it includes an uplink failure on | |||
Spine 111. | Spine 111. | |||
The local modification of the received default route distance from | The local modification of the received default route distance from | |||
the upper level is achieved by running a relatively simple algorithm | the upper level is achieved by running a relatively simple algorithm | |||
where the bandwidth is weighted exponentially, while the distance on | where the bandwidth is weighted exponentially, while the distance on | |||
the default route represents a multiplier for the bandwidth weight | the default route represents a multiplier for the bandwidth weight | |||
for easy operational adjustments. | for easy operational adjustments. | |||
On a node, L, use Node TIEs to compute from each non-overloaded | On a node, L, use Node TIEs to compute 3 values from each non- | |||
northbound neighbor N to compute 3 values: | overloaded northbound neighbor, N: | |||
1. L_N_u: sum of the bandwidth available from L to N (to account for | 1. L_N_u: sum of the bandwidth available from L to N (to account for | |||
parallel links) | parallel links) | |||
2. N_u: sum of the uplink bandwidth available on N | 2. N_u: sum of the uplink bandwidth available on N | |||
3. T_N_u: L_N_u * OVERSUBSCRIPTION_CONSTANT + N_u | 3. T_N_u: L_N_u * OVERSUBSCRIPTION_CONSTANT + N_u | |||
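As a worked illustration of the three values, the following minimal sketch computes them for a single northbound neighbor. The function name, the list-based inputs, and the bandwidth unit are hypothetical. The oversubscription constant is taken as a parameter because its value is not stated in this passage.

```python
from typing import Sequence, Tuple


def neighbor_bandwidth_values(
    l_to_n_links: Sequence[int],
    n_uplinks: Sequence[int],
    oversubscription_constant: int,
) -> Tuple[int, int, int]:
    """Return (L_N_u, N_u, T_N_u) for one non-overloaded neighbor N.

    l_to_n_links: bandwidth of each parallel link from L to N
    n_uplinks:    bandwidth of each uplink of N
    All bandwidths are assumed to use the same unit.
    """
    l_n_u = sum(l_to_n_links)                      # value 1
    n_u = sum(n_uplinks)                           # value 2
    t_n_u = l_n_u * oversubscription_constant + n_u  # value 3
    return l_n_u, n_u, t_n_u
```

For example, two parallel links of 100 units towards N, four uplinks of 100 units on N, and an assumed constant of 2 give L_N_u = 200, N_u = 400, and T_N_u = 200 * 2 + 400 = 800.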
For all T_N_u, determine the corresponding M_N_u as | For all T_N_u, determine the corresponding M_N_u as | |||
log_2(next_power_2(T_N_u)) and determine MAX_M_N_u as the maximum | log_2(next_power_2(T_N_u)) and determine MAX_M_N_u as the maximum | |||
skipping to change at line 5184 ¶ | skipping to change at line 5143 ¶ | |||
packet continues to flow southbound, it will take some viable, loop- | packet continues to flow southbound, it will take some viable, loop- | |||
free path to reach its destination. | free path to reach its destination. | |||
6.8.8. Label Binding | 6.8.8. Label Binding | |||
In its LIEs, a node MAY advertise a locally significant, downstream- | In its LIEs, a node MAY advertise a locally significant, downstream- | |||
assigned, interface-specific label. One use of such a label is a | assigned, interface-specific label. One use of such a label is a | |||
hop-by-hop encapsulation allowing forwarding planes to be easily | hop-by-hop encapsulation allowing forwarding planes to be easily | |||
distinguished among multiple RIFT instances. | distinguished among multiple RIFT instances. | |||
6.8.9. Leaf-to-Leaf Procedures | 6.8.9. L2L Procedures | |||
RIFT implementations SHOULD support special East-West adjacencies | RIFT implementations SHOULD support special East-West adjacencies | |||
between leaf nodes. Leaf nodes supporting these procedures MUST: | between leaf nodes. Leaf nodes supporting these procedures MUST: | |||
1. advertise the LEAF_2_LEAF flag in its node capabilities, | 1. advertise the LEAF_2_LEAF flag in its node capabilities, | |||
2. set the overload flag on all leaf's Node TIEs, | 2. set the overload flag on all leaf's Node TIEs, | |||
3. flood only a node's own North and South TIEs over E-W leaf | 3. flood only a node's own North and South TIEs over E-W leaf | |||
adjacencies, | adjacencies, | |||
4. always use E-W leaf adjacency in all SPF computations, | 4. always use E-W leaf adjacency in all SPF computations, | |||
5. install a discard route for any advertised aggregate routes in a | 5. install a discard route for any advertised aggregate routes in a | |||
leaf's TIE, *and* | leaf's TIE, *and* | |||
6. never form southbound adjacencies. | 6. never form southbound adjacencies. | |||
This will allow the E-W leaf nodes to exchange traffic strictly for | This will allow the E-W leaf nodes to exchange traffic strictly for | |||
the prefixes advertised in each other's north prefix TIEs since the | the prefixes advertised in each other's North Prefix TIEs since the | |||
southbound computation will find the reverse direction in the other | southbound computation will find the reverse direction in the other | |||
node's TIE and install its north prefixes. | node's TIE and install its north prefixes. | |||
6.8.10. Address Family and Multi-Topology Considerations | 6.8.10. Address Family and Multi-Topology Considerations | |||
Multi-Topology (MT) [RFC5120] and Multi-Instance (MI) [RFC8202] | Multi-Topology (MT) [RFC5120] and Multi-Instance (MI) [RFC8202] | |||
concepts are used today in link-state routing protocols to support | concepts are used today in link-state routing protocols to support | |||
several domains on the same physical topology. RIFT supports this | several domains on the same physical topology. RIFT supports this | |||
capability by carrying transport ports in the LIE protocol exchanges. | capability by carrying transport ports in the LIE protocol exchanges. | |||
Multiplexing of LIEs can be achieved by either choosing varying | Multiplexing of LIEs can be achieved by either choosing varying | |||
skipping to change at line 5341 ¶ | skipping to change at line 5300 ¶ | |||
* message integrity, | * message integrity, | |||
* the prevention of replay attacks, | * the prevention of replay attacks, | |||
* low processing overhead, and | * low processing overhead, and | |||
* efficient messaging | * efficient messaging | |||
unless no security is deployed by means of using | unless no security is deployed by means of using | |||
'undefined_securitykey_id' as key identifiers. | 'undefined_securitykey_id' as key identifiers (key ID). | |||
Message confidentiality is a non-goal. | Message confidentiality is a non-goal. | |||
The model in the previous section allows a range of security key | The model in the previous section allows a range of security key | |||
types that are analogous to the various security association models. | types that are analogous to the various security association models. | |||
PAM and NAM allow security associations at the port or node level | PAM and NAM allow security associations at the port or node level | |||
using symmetric or asymmetric keys that are preinstalled. FAM argues | using symmetric or asymmetric keys that are preinstalled. FAM argues | |||
for security associations to be applied only at a group level or to | for security associations to be applied only at a group level or to | |||
be refined once the topology has been established. RIFT does not | be refined once the topology has been established. RIFT does not | |||
specify how security keys are installed or updated, though it does | specify how security keys are installed or updated, though it does | |||
skipping to change at line 5364 ¶ | skipping to change at line 5323 ¶ | |||
The protocol has provisions for "weak" nonces to prevent replay | The protocol has provisions for "weak" nonces to prevent replay | |||
attacks and includes authentication mechanisms comparable to those | attacks and includes authentication mechanisms comparable to those | |||
described in [RFC5709] and [RFC7987]. | described in [RFC5709] and [RFC7987]. | |||
6.9.3. Security Envelope | 6.9.3. Security Envelope | |||
A serialized schema _ProtocolPacket_ MUST be carried in a secure | A serialized schema _ProtocolPacket_ MUST be carried in a secure | |||
envelope as illustrated in Figure 34. The _ProtocolPacket_ MUST be | envelope as illustrated in Figure 34. The _ProtocolPacket_ MUST be | |||
serialized using the default Thrift's binary protocol. Any value in | serialized using the default Thrift's binary protocol. Any value in | |||
the packet following a security fingerprint MUST be used by a | the packet following a security fingerprint MUST be used by a | |||
receiver only after the fingerprint generated based on acceptable, | receiver only after the fingerprint generated based on an acceptable, | |||
advertised key ID has been validated against the data covered by it | advertised key ID has been validated against the data covered by the | |||
bare exceptions arising from operational exigencies where, based on | bare exceptions arising from operational exigencies. Based on local | |||
local configuration, a node MAY allow for the envelope's integrity | configuration, a node MAY allow for the envelope's integrity checks | |||
checks to be skipped and for behavior specified in Section 6.9.6. | to be skipped and for the procedure specified in Section 6.9.6 to be | |||
This means that for all packets, in case the node is configured to | implemented. This means that for all packets, in case the node is | |||
validate the outer fingerprint based on a key ID, an unexpected key | configured to validate the outer fingerprint based on a key ID, an | |||
ID or fingerprint not validating against the expected key ID will | unexpected key ID or fingerprint not validating against the expected | |||
lead to packet rejection. Further, in case of reception of a TIE and | key ID will lead to packet rejection. Further, in case of reception | |||
the receiver being configured to validate the originator by checking | of a TIE and the receiver being configured to validate the originator | |||
the TIE Origin Security Envelope Header fingerprint against a key ID, | by checking the TIE Origin Security Envelope Header fingerprint | |||
an incorrect key ID or inner fingerprint not validating against the | against a key ID, an incorrect key ID or inner fingerprint not | |||
key ID will lead to the rejection of the packet. | validating against the key ID will lead to the rejection of the | |||
packet. | ||||
For reasons of clarity, it is important to observe that the | For reasons of clarity, it is important to observe that the | |||
specification uses the words "fingerprint" and "signature" | specification uses the words "fingerprint" and "signature" | |||
interchangeably since the specific properties of the fingerprint part | interchangeably since the specific properties of the fingerprint part | |||
of the envelope depend on the algorithms used to insure the payload | of the envelope depend on the algorithms used to insure the payload | |||
integrity. Moreover, any security chosen never implies encryption | integrity. Moreover, any security chosen never implies encryption | |||
due to performance impact involved but only fingerprint or signature | due to performance impact involved but only fingerprint or signature | |||
generation and validation. | generation and validation. | |||
An implementation MUST implement at least both sending and receiving | An implementation MUST implement at least both sending and receiving | |||
skipping to change at line 5624 ¶ | skipping to change at line 5584 ¶ | |||
adjacency back up. Obviously, an implementation MAY choose to stop | adjacency back up. Obviously, an implementation MAY choose to stop | |||
verifying the security envelope for the duration of the algorithm | verifying the security envelope for the duration of the algorithm | |||
change to keep the adjacency up, but since this introduces a security | change to keep the adjacency up, but since this introduces a security | |||
vulnerability window, such rollover SHOULD NOT be recommended. Other | vulnerability window, such rollover SHOULD NOT be recommended. Other | |||
approaches, such as accepting multiple algorithms for same key ID for | approaches, such as accepting multiple algorithms for same key ID for | |||
a configured time window, are possible but in the realm of | a configured time window, are possible but in the realm of | |||
implementation choices rather than protocol specification. | implementation choices rather than protocol specification. | |||
7. Information Elements Schema | 7. Information Elements Schema | |||
This section introduces the schema for information elements. The IDL | This section introduces the schema for information elements. The | |||
is Thrift [thrift]. | Interface Description Language (IDL) is Thrift [thrift]. | |||
On schema changes that | On schema changes that | |||
1. change field numbers, | 1. change field numbers *or* | |||
2. add new *required* fields, | 2. add new *required* fields *or* | |||
3. remove any fields. | 3. remove any fields *or* | |||
4. change lists into sets and unions into structures, | 4. change lists into sets, unions into structures *or* | |||
5. change the multiplicity of fields, | 5. change multiplicity of fields *or* | |||
6. change the type or name of any field, | 6. changes type or name of any field *or* | |||
7. change data types of the type of any field, | 7. change data types of the type of any field *or* | |||
8. add, change, or remove a default value of any *existing* field, | 8. adds, changes or removes a default value of any *existing* field | |||
*or* | ||||
9. remove or change any defined constant or constant value, | 9. removes or changes any defined constant or constant value *or* | |||
10. change any enumeration type except extending | 10. changes any enumeration type except extending | |||
'common.TIETypeType' (use of enumeration types is generally | `common.TIETypeType` (use of enumeration types is generally | |||
discouraged), or | discouraged) *or* | |||
11. add a new TIE type to _TIETypeType_ with the flooding scope | 11. adds new TIE type to _TIETypeType_ with flooding scope different | |||
different from the prefix TIE flooding scope | from prefix TIE flooding scope | |||
the major version of the schema MUST increase. All other changes | the major version of the schema MUST increase. All other changes | |||
MUST increase the minor version within the same major. | MUST increase the minor version within the same major. | |||
Introducing an optional field does not cause a major version increase | Introducing an optional field does not cause a major version increase | |||
even if the fields inside the structure are optional with defaults. | even if the fields inside the structure are optional with defaults. | |||
All signed integers, as forced by Thrift [thrift] support, must be | All signed integers, as forced by Thrift [thrift] support, must be | |||
cast for internal purposes to equivalent unsigned values without | cast for internal purposes to equivalent unsigned values without | |||
discarding the signedness bit. An implementation SHOULD try to avoid | discarding the signedness bit. An implementation SHOULD try to avoid | |||
skipping to change at line 5717 ¶ | skipping to change at line 5678 ¶ | |||
To support new TIE types without increasing the major version | To support new TIE types without increasing the major version | |||
enumeration, _TIEElement_ can be extended with new optional elements | enumeration, _TIEElement_ can be extended with new optional elements | |||
for new 'common.TIETypeType' values as long the scope of the new TIE | for new 'common.TIETypeType' values as long the scope of the new TIE | |||
matches the prefix TIE scope. In case it is necessary to understand | matches the prefix TIE scope. In case it is necessary to understand | |||
whether all nodes can parse the new TIE type, a node capability MUST | whether all nodes can parse the new TIE type, a node capability MUST | |||
be added in _NodeCapabilities_ to prevent a non-homogenous network. | be added in _NodeCapabilities_ to prevent a non-homogenous network. | |||
7.2. common.thrift | 7.2. common.thrift | |||
This schema references [RFC5837], [RFC5880], and [RFC6550]. | ||||
/** | /** | |||
Thrift file with common definitions for RIFT | Thrift file with common definitions for RIFT | |||
*/ | */ | |||
namespace py common | namespace py common | |||
/** @note MUST be interpreted in implementation as unsigned 64 bits. | /** @note MUST be interpreted in implementation as unsigned 64 bits. | |||
*/ | */ | |||
typedef i64 SystemIDType | typedef i64 SystemIDType | |||
typedef i32 IPv4Address | typedef i32 IPv4Address | |||
skipping to change at line 5790 ¶ | skipping to change at line 5753 ¶ | |||
value MUST be interpreted in implementation as unsigned */ | value MUST be interpreted in implementation as unsigned */ | |||
typedef i8 PrefixTransactionIDType | typedef i8 PrefixTransactionIDType | |||
/** Timestamp per IEEE 802.1AS, all values MUST be interpreted in | /** Timestamp per IEEE 802.1AS, all values MUST be interpreted in | |||
implementation as unsigned. */ | implementation as unsigned. */ | |||
struct IEEE802_1ASTimeStampType { | struct IEEE802_1ASTimeStampType { | |||
1: required i64 AS_sec; | 1: required i64 AS_sec; | |||
2: optional i32 AS_nsec; | 2: optional i32 AS_nsec; | |||
} | } | |||
/** generic counter type */ | /** generic counter type */ | |||
typedef i64 CounterType | typedef i64 CounterType | |||
/** Platform Interface Index type, i.e., index of interface on hardware, | /** Platform Interface Index type, i.e., index of interface on | |||
can be used, e.g., with RFC 5837 */ | hardware, can be used, e.g., with RFC 5837 */ | |||
typedef i32 PlatformInterfaceIndex | typedef i32 PlatformInterfaceIndex | |||
/** Flags indicating node configuration in case of ZTP. | /** Flags indicating node configuration in case of ZTP. | |||
*/ | */ | |||
enum HierarchyIndications { | enum HierarchyIndications { | |||
/** forces level to 'leaf_level' and enables according procedures */ | /** forces level to 'leaf_level' and enables | |||
according procedures */ | ||||
leaf_only = 0, | leaf_only = 0, | |||
/** forces level to 'leaf_level' and enables according procedures */ | /** forces level to 'leaf_level' and enables | |||
according procedures */ | ||||
leaf_only_and_leaf_2_leaf_procedures = 1, | leaf_only_and_leaf_2_leaf_procedures = 1, | |||
/** forces level to 'top_of_fabric' and enables according | /** forces level to 'top_of_fabric' and enables according | |||
procedures */ | procedures */ | |||
top_of_fabric = 2, | top_of_fabric = 2, | |||
} | } | |||
const PacketNumberType undefined_packet_number = 0 | const PacketNumberType undefined_packet_number = 0 | |||
/** used when node is configured as top of fabric in ZTP.*/ | /** used when node is configured as top of fabric in ZTP.*/ | |||
const LevelType top_of_fabric_level = 24 | const LevelType top_of_fabric_level = 24 | |||
/** default bandwidth on a link */ | /** default bandwidth on a link */ | |||
skipping to change at line 5831 ¶ | skipping to change at line 5796 ¶ | |||
/** any distance larger than this will be considered infinity */ | /** any distance larger than this will be considered infinity */ | |||
const MetricType infinite_distance = 0x7FFFFFFF | const MetricType infinite_distance = 0x7FFFFFFF | |||
/** represents invalid distance */ | /** represents invalid distance */ | |||
const MetricType invalid_distance = 0 | const MetricType invalid_distance = 0 | |||
const bool overload_default = false | const bool overload_default = false | |||
const bool flood_reduction_default = true | const bool flood_reduction_default = true | |||
/** default LIE FSM LIE TX interval time */ | /** default LIE FSM LIE TX interval time */ | |||
const TimeIntervalInSecType default_lie_tx_interval = 1 | const TimeIntervalInSecType default_lie_tx_interval = 1 | |||
/** default LIE FSM holddown time */ | /** default LIE FSM holddown time */ | |||
const TimeIntervalInSecType default_lie_holdtime = 3 | const TimeIntervalInSecType default_lie_holdtime = 3 | |||
/** multipler for default_lie_holdtime to hold down multiple neighbors */ | /** multiplier for default_lie_holdtime to | |||
const i8 multiple_neighbors_lie_holdtime_multipler = 4 | holddown multiple neighbors */ | |||
const i8 multiple_neighbors_lie_holdtime_multiplier = 4 | ||||
/** default ZTP FSM holddown time */ | /** default ZTP FSM holddown time */ | |||
const TimeIntervalInSecType default_ztp_holdtime = 1 | const TimeIntervalInSecType default_ztp_holdtime = 1 | |||
/** by default LIE levels are ZTP offers */ | /** by default LIE levels are ZTP offers */ | |||
const bool default_not_a_ztp_offer = false | const bool default_not_a_ztp_offer = false | |||
/** by default everyone is repeating flooding */ | /** by default everyone is repeating flooding */ | |||
const bool default_you_are_flood_repeater = true | const bool default_you_are_flood_repeater = true | |||
/** 0 is illegal for System IDs */ | /** 0 is illegal for System IDs */ | |||
const SystemIDType IllegalSystemID = 0 | const SystemIDType IllegalSystemID = 0 | |||
/** empty set of nodes */ | /** empty set of nodes */ | |||
const set<SystemIDType> empty_set_of_nodeids = {} | const set<SystemIDType> empty_set_of_nodeids = {} | |||
/** default lifetime of TIE is one week */ | /** default lifetime of TIE is one week */ | |||
const LifeTimeInSecType default_lifetime = 604800 | const LifeTimeInSecType default_lifetime = 604800 | |||
/** default lifetime when TIEs are purged is 5 minutes */ | /** default lifetime when TIEs are purged is 5 minutes */ | |||
const LifeTimeInSecType purge_lifetime = 300 | const LifeTimeInSecType purge_lifetime = 300 | |||
/** optional round down interval when TIEs are sent with security signatures | /** optional round down interval when | |||
to prevent excessive computation. **/ | * TIEs are sent with security signatures | |||
* to prevent excessive computation. | ||||
*/ | ||||
const LifeTimeInSecType rounddown_lifetime_interval = 60 | const LifeTimeInSecType rounddown_lifetime_interval = 60 | |||
/** any 'TieHeader' that has a smaller lifetime difference | /** any 'TieHeader' that has a smaller lifetime difference | |||
than this constant is equal (if other fields equal). */ | than this constant is equal (if other fields equal). */ | |||
const LifeTimeInSecType lifetime_diff2ignore = 400 | const LifeTimeInSecType lifetime_diff2ignore = 400 | |||
/** default UDP port to run LIEs on */ | /** default UDP port to run LIEs on */ | |||
const UDPPortType default_lie_udp_port = 914 | const UDPPortType default_lie_udp_port = 914 | |||
/** default UDP port to receive TIEs on, which can be peer specific */ | /** default UDP port to receive TIEs on, | |||
which can be peer specific */ | ||||
const UDPPortType default_tie_udp_flood_port = 915 | const UDPPortType default_tie_udp_flood_port = 915 | |||
/** default MTU link size to use */ | /** default MTU link size to use */ | |||
const MTUSizeType default_mtu_size = 1400 | const MTUSizeType default_mtu_size = 1400 | |||
/** default link being BFD capable */ | /** default link being BFD capable */ | |||
const bool bfd_default = true | const bool bfd_default = true | |||
/** type used to target nodes with key value */ | /** type used to target nodes with key value */ | |||
typedef i64 KeyValueTargetType | typedef i64 KeyValueTargetType | |||
/** default target for key value are all nodes. */ | /** default target for key value are all nodes. */ | |||
const KeyValueTargetType keyvaluetarget_default = 0 | const KeyValueTargetType keyvaluetarget_default = 0 | |||
/** value for _all leaves_ addressing. Represented by all bits set. */ | /** value for _all leaves_ addressing. | |||
Represented by all bits set. */ | ||||
const KeyValueTargetType keyvaluetarget_all_south_leaves = -1 | const KeyValueTargetType keyvaluetarget_all_south_leaves = -1 | |||
/** undefined nonce, equivalent to missing nonce */ | /** undefined nonce, equivalent to missing nonce */ | |||
const NonceType undefined_nonce = 0; | const NonceType undefined_nonce = 0; | |||
/** outer security key ID, MUST be interpreted as in implementation | /** outer security key ID, MUST be interpreted as in implementation | |||
as unsigned */ | as unsigned */ | |||
typedef i8 OuterSecurityKeyID | typedef i8 OuterSecurityKeyID | |||
/** security key ID, MUST be interpreted as in implementation | /** security key ID, MUST be interpreted as in implementation | |||
as unsigned */ | as unsigned */ | |||
typedef i32 TIESecurityKeyID | typedef i32 TIESecurityKeyID | |||
skipping to change at line 6060 ¶ | skipping to change at line 6030 ¶ | |||
/** Capabilities the node supports. */ | /** Capabilities the node supports. */ | |||
struct NodeCapabilities { | struct NodeCapabilities { | |||
/** Must advertise supported minor version dialect that way. */ | /** Must advertise supported minor version dialect that way. */ | |||
1: required common.MinorVersionType protocol_minor_version = | 1: required common.MinorVersionType protocol_minor_version = | |||
protocol_minor_version; | protocol_minor_version; | |||
/** indicates that node supports flood reduction. */ | /** indicates that node supports flood reduction. */ | |||
2: optional bool flood_reduction = | 2: optional bool flood_reduction = | |||
common.flood_reduction_default; | common.flood_reduction_default; | |||
/** indicates place in hierarchy, i.e., top of fabric or | /** indicates place in hierarchy, i.e., top of fabric or | |||
leaf only (in ZTP) or support for leaf-to-leaf | leaf only (in ZTP) or support for L2L | |||
procedures. */ | procedures. */ | |||
3: optional common.HierarchyIndications hierarchy_indications; | 3: optional common.HierarchyIndications hierarchy_indications; | |||
} | } | |||
/** Link capabilities. */ | /** Link capabilities. */ | |||
struct LinkCapabilities { | struct LinkCapabilities { | |||
/** Indicates that the link is supporting BFD. */ | /** Indicates that the link is supporting BFD. */ | |||
1: optional bool bfd = | 1: optional bool bfd = | |||
common.bfd_default; | common.bfd_default; | |||
/** Indicates whether the interface will support IPv4 | /** Indicates whether the interface will support IPv4 | |||
skipping to change at line 6375 ¶ | skipping to change at line 6345 ¶ | |||
the leaf routes in their own PoD to prevent traffic loss. | the leaf routes in their own PoD to prevent traffic loss. | |||
2. Leaf nodes only hold their own North TIEs and the South TIEs of | 2. Leaf nodes only hold their own North TIEs and the South TIEs of | |||
level 1 nodes they are connected to. | level 1 nodes they are connected to. | |||
3. Leaf nodes do not have to support any type of disaggregation | 3. Leaf nodes do not have to support any type of disaggregation | |||
computation or propagation. | computation or propagation. | |||
4. Leaf nodes are not required to support the overload flag. | 4. Leaf nodes are not required to support the overload flag. | |||
5. Leaf nodes do not need to originate S-TIEs unless optional leaf- | 5. Leaf nodes do not need to originate S-TIEs unless optional L2L | |||
to-leaf features are desired. | features are desired. | |||
8.2. Considerations for Spine Implementation | 8.2. Considerations for Spine Implementation | |||
Nodes that do not act as ToF are not required to discover fallen | Nodes that do not act as ToF are not required to discover fallen | |||
leaves by comparing reachable destinations with peers and therefore | leaves by comparing reachable destinations with peers and therefore | |||
do not need to run the computation of disaggregated routes based on | do not need to run the computation of disaggregated routes based on | |||
that discovery. On the other hand, non-ToF nodes need to respect | that discovery. On the other hand, non-ToF nodes need to respect | |||
disaggregated routes advertised from the north. In the case of | disaggregated routes advertised from the north. In the case of | |||
negative disaggregation, spine nodes need to generate southbound | negative disaggregation, spine nodes need to generate southbound | |||
disaggregated routes when all parents are lost for a fallen leaf. | disaggregated routes when all parents are lost for a fallen leaf. | |||
skipping to change at line 6512 ¶ | skipping to change at line 6482 ¶ | |||
combination must match the ongoing exchange and is then limited to | combination must match the ongoing exchange and is then limited to | |||
only a single flap since both nodes will advance their nonces in case | only a single flap since both nodes will advance their nonces in case | |||
the adjacency state changed. Even in the most unlikely case, the | the adjacency state changed. Even in the most unlikely case, the | |||
attack length is limited due to both sides periodically increasing | attack length is limited due to both sides periodically increasing | |||
their nonces. | their nonces. | |||
Generally, since weak nonces are not changed on every packet for | Generally, since weak nonces are not changed on every packet for | |||
performance reasons, a conceivable attack vector by a man in the | performance reasons, a conceivable attack vector by a man in the | |||
middle is to flood a receiving node with the maximum bandwidth of | middle is to flood a receiving node with the maximum bandwidth of | |||
recently observed packets, both LIEs as well as TIEs. In a scenario | recently observed packets, both LIEs as well as TIEs. In a scenario | |||
where such attacks are likely, _maximum_valid_nonce_delta_ can be | where such attacks are likely, _maximum_valid_nonce_delta_ and | |||
implemented as configurable, small value and | _nonce_regeneration_interval_ can be implemented as configurable and | |||
_nonce_regeneration_interval_ configured to very small value as well. | set to small values. This will likely present a significant | |||
This will likely present a significant computational load on large | computational load on large fabrics under normal operation. | |||
fabrics under normal operation. | ||||
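The effect of a small _maximum_valid_nonce_delta_ can be illustrated with a minimal sketch of a nonce distance check. It assumes 16-bit nonces that wrap around; the function names are hypothetical, and the handling of undefined nonces and adjacency state is deliberately left out.

```python
NONCE_BITS = 16  # assumption: nonces are 16-bit values that wrap around


def nonce_delta(a: int, b: int) -> int:
    """Shortest wrap-around distance between two nonces."""
    modulus = 1 << NONCE_BITS
    forward = (a - b) % modulus
    return min(forward, modulus - forward)


def nonce_within_delta(local: int, observed: int,
                       maximum_valid_nonce_delta: int) -> bool:
    """True if the observed nonce is close enough to the local one."""
    return nonce_delta(local, observed) <= maximum_valid_nonce_delta
```

A smaller delta narrows the window in which a replayed packet still passes this check, at the price of more frequent nonce regeneration and the computational load that comes with it.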
9.8. TIE Origin Fingerprint DoS Attacks | 9.8. TIE Origin Fingerprint DoS Attacks | |||
Even when a mechanism in Section 10.2 is enabled to generate inner | Even when a mechanism in Section 10.2 is enabled to generate inner | |||
fingerprints or signatures, further attack considerations apply. | fingerprints or signatures, further attack considerations apply. | |||
In case the inner fingerprint could be generated by a compromised | In case the inner fingerprint could be generated by a compromised | |||
node in the network other than the originator based on shared | node in the network other than the originator based on shared | |||
secrets, the deployment must fall back on use of signatures that can | secrets, the deployment must fall back on use of signatures that can | |||
be validated but not generated by any other node except the | be validated but not generated by any other node except the | |||
originator. | originator. | |||
A compromised node in the network can attempt to brute force "fake | A compromised node in the network can attempt to brute force "fake | |||
TIEs" using other nodes' TIE origin key identifiers without | TIEs" using other nodes' TIE origin key ID without possessing the | |||
possessing the necessary secrets. Albeit the ultimate validation of | necessary secrets. Albeit the ultimate validation of the origin | |||
the origin signature will fail in such scenarios and not progress | signature will fail in such scenarios and not progress further than | |||
further than immediately peering nodes, the resulting DoS attack | immediately peering nodes, the resulting DoS attack seems unavoidable | |||
seems unavoidable since the TIE origin key ID is only protected by | since the TIE origin key ID is only protected by the (here assumed to | |||
the (here assumed to be compromised) node. | be compromised) node. | |||
9.9. Host Implementations | 9.9. Host Implementations | |||
It can be reasonably expected that the proliferation of RotH servers, | It can be reasonably expected that the proliferation of RotH servers, | |||
rather than dedicated networking devices, will represent a | rather than dedicated networking devices, will represent a | |||
significant number of RIFT devices. Given their normally far wider | significant number of RIFT devices. Given their normally far wider | |||
software envelope and access granted to them, such servers are also | software envelope and access granted to them, such servers are also | |||
far more likely to be compromised and present an attack vector on the | far more likely to be compromised and present an attack vector on the | |||
protocol. Hijacking of prefixes to attract traffic is a trust | protocol. Hijacking of prefixes to attract traffic is a trust | |||
problem and cannot be easily addressed within the protocol if the | problem and cannot be easily addressed within the protocol if the | |||
skipping to change at line 6560 ¶ | skipping to change at line 6529 ¶ | |||
attempting similar resource overrun attacks. A prudent | attempting similar resource overrun attacks. A prudent | |||
implementation forming adjacencies to leaves should implement | implementation forming adjacencies to leaves should implement | |||
threshold mechanisms and raise warnings when, e.g., a leaf is | threshold mechanisms and raise warnings when, e.g., a leaf is | |||
advertising an excess number of TIEs or prefixes. Additionally, such | advertising an excess number of TIEs or prefixes. Additionally, such | |||
implementation could refuse any topology information except the | implementation could refuse any topology information except the | |||
node's own TIEs and authenticated, reflected South Node TIEs at their | node's own TIEs and authenticated, reflected South Node TIEs at their | |||
own level. | own level. | |||
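The threshold mechanism suggested above could be sketched as follows. The limit values, the `LeafLimits` type, and the `check_leaf` function are hypothetical and chosen only for illustration; a real implementation would pick limits that fit the deployment.

```python
import logging
from dataclasses import dataclass

log = logging.getLogger("rift.leaf_guard")  # hypothetical logger name


@dataclass(frozen=True)
class LeafLimits:
    max_ties: int
    max_prefixes: int


def check_leaf(leaf_id: str, tie_count: int, prefix_count: int,
               limits: LeafLimits) -> list[str]:
    """Return (and log) one warning per threshold the leaf exceeds."""
    warnings = []
    if tie_count > limits.max_ties:
        warnings.append(
            f"{leaf_id}: {tie_count} TIEs exceeds limit {limits.max_ties}")
    if prefix_count > limits.max_prefixes:
        warnings.append(
            f"{leaf_id}: {prefix_count} prefixes exceeds limit "
            f"{limits.max_prefixes}")
    for message in warnings:
        log.warning(message)
    return warnings
```

Returning the warnings in addition to logging them lets the caller decide whether to only alert or to also stop accepting further information from that leaf.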
To isolate possible attack vectors on the leaf to the largest | To isolate possible attack vectors on the leaf to the largest | |||
possible extent, a dedicated leaf-only implementation could run | possible extent, a dedicated leaf-only implementation could run | |||
without any configuration by hard-coding a well-known adjacency key | without any configuration by: | |||
(which can be always rolled over by the means of, e.g., a well-known | ||||
key value distributed from the top of the fabric), leaf level value | * hard-coding a well-known adjacency key (which can be always rolled | |||
and always setting overload flag. All other values can be derived by | over by means of, e.g., a well-known key-value distributed from | |||
automatic means as described above. | top of the fabric), | |||
* hard-coding a leaf level value, and | ||||
* always setting the overload flag | ||||
9.9.1. IPv4 Broadcast and IPv6 All-Routers Multicast Implementations | 9.9.1. IPv4 Broadcast and IPv6 All-Routers Multicast Implementations | |||
Section 6.2 describes an optional implementation that supports LIE | Section 6.2 describes an optional implementation that supports LIE | |||
exchange over IPv4 broadcast addresses and/or the IPv6 all-routers | exchange over IPv4 broadcast addresses and/or the IPv6 all-routers | |||
multicast address. It is important to consider that if an | multicast address. It is important to consider that if an | |||
implementation supports this, the attack surface widens as LIEs may | implementation supports this, the attack surface widens as LIEs may | |||
be propagated to devices outside of the intended RIFT topology. This | be propagated to devices outside of the intended RIFT topology. This | |||
may leave RIFT nodes more susceptible to the various attack vectors | may leave RIFT nodes more susceptible to the various attack vectors | |||
already described in this section. | already described in this section. | |||
skipping to change at line 6607 ¶ | skipping to change at line 6580 ¶ | |||
Description: Routing in Fat Trees Link Information Element | Description: Routing in Fat Trees Link Information Element | |||
Assignee: IESG (iesg@ietf.org) | Assignee: IESG (iesg@ietf.org) | |||
Contact: IETF Chair (chair@ietf.org) | Contact: IETF Chair (chair@ietf.org) | |||
Reference: RFC 9692 | Reference: RFC 9692 | |||
_RIFT TIE Port_ | _RIFT TIE Port_ | |||
Service Name: rift-ties | Service Name: rift-ties | |||
Port Number: 915 | Port Number: 915 | |||
Transport Protocol: udp | Transport Protocol: udp | |||
Description: Routing in Fat Trees Topology Information Element | ||||
Assignee: IESG (iesg@ietf.org) | Assignee: IESG (iesg@ietf.org) | |||
Contact: IETF Chair (chair@ietf.org) | Contact: IETF Chair (chair@ietf.org) | |||
Description: Routing in Fat Trees Topology Information Element | ||||
Reference: RFC 9692 | Reference: RFC 9692 | |||
10.2. Registry for RIFT Security Algorithms | 10.2. Registry for RIFT Security Algorithms | |||
A new registry has been created to hold the allowed RIFT security | A new registry has been created to hold the allowed RIFT security | |||
algorithms. No particular enumeration values are necessary since | algorithms. No particular enumeration values are necessary since | |||
RIFT uses a key ID abstraction on packets without disclosing any | RIFT uses a key ID abstraction on packets without disclosing any | |||
information about the algorithm or secrets used and only carries the | information about the algorithm or secrets used and only carries the | |||
resulting fingerprint or signature protecting the integrity of the | resulting fingerprint or signature protecting the integrity of the | |||
data. | data. | |||
The registry applies the "Specification Required" policy per | The registry applies the "Specification Required" policy per | |||
[RFC8126]. The designated expert should ensure that the algorithms | [RFC8126]. The designated expert should ensure that the algorithms | |||
suggested represent the state of the art at a given point in time and | suggested represent the state of the art at a given point in time and | |||
avoid introducing algorithms that do not represent enhanced security | avoid introducing algorithms that do not represent enhanced security | |||
properties or ensure such properties at a lower cost as compared to | properties or ensure such properties at a lower cost as compared to | |||
existing registry entries. | existing registry entries. | |||
+==========================+============+==========================+ | +==========================+==========================+============+ | |||
| Name | Reference | Recommendation | | | Name | Recommendation | Reference | | |||
+==========================+============+==========================+ | +==========================+==========================+============+ | |||
| HMAC-SHA256 | [SHA-2] | Simplest way to ensure | | | HMAC-SHA256 | Simplest way to ensure | [SHA-2] | | |||
| | and | integrity of | | | | integrity of | and | | |||
| | [RFC2104] | transmissions across | | | | transmissions across | [RFC2104] | | |||
| | | adjacencies when used as | | | | adjacencies when used as | | | |||
| | | outer key and integrity | | | | outer keys and integrity | | | |||
| | | of TIEs when used as | | | | of TIEs when used as | | | |||
| | | inner keys. Recommended | | | | inner keys. Recommended | | | |||
| | | for most interoperable | | | | for most interoperable | | | |||
| | | security protection. | | | | security protection. | | | |||
+--------------------------+------------+--------------------------+ | +--------------------------+--------------------------+------------+ | |||
| HMAC-SHA512 | [SHA-2] | Same as HMAC-SHA256 with | | | HMAC-SHA512 | Same as HMAC-SHA256 with | [SHA-2] | | |||
| | and | stronger protection. | | | | stronger protection. | and | | |||
| | [RFC2104] | | | | | | [RFC2104] | | |||
+--------------------------+------------+--------------------------+ | +--------------------------+--------------------------+------------+ | |||
| SHA256-RSASSA-PKCS1-v1_5 | [RFC8017], | Recommended for high | | | SHA256-RSASSA-PKCS1-v1_5 | Recommended for high | [RFC8017], | | |||
| | Section | security applications | | | | security applications | Section | | |||
| | 8.2 | where private keys are | | | | where private keys are | 8.2 | | |||
| | | protected by according | | | | protected by according | | | |||
| | | nodes. Recommended as | | | | nodes. Recommended as | | | |||
| | | well in case not only | | | | well in case not only | | | |||
| | | integrity but origin | | | | integrity but origin | | | |||
| | | validation is necessary | | | | validation is necessary | | | |||
| | | for TIEs. Recommended | | | | for TIEs. Recommended | | | |||
| | | when adjacencies must be | | | | when adjacencies must be | | | |||
| | | protected without | | | | protected without | | | |||
| | | disclosing the secrets | | | | disclosing the secrets | | | |||
| | | on both sides of the | | | | on both sides of the | | | |||
| | | adjacency. | | | | adjacency. | | | |||
+--------------------------+------------+--------------------------+ | +--------------------------+--------------------------+------------+ | |||
| SHA512-RSASSA-PKCS1-v1_5 | [RFC8017] | Same as SHA256-RSASSA- | | | SHA512-RSASSA-PKCS1-v1_5 | Same as SHA256-RSASSA- | [RFC8017] | | |||
| | | PKCS1-v1_5 with stronger | | | | PKCS1-v1_5 with stronger | | | |||
| | | protection. | | | | protection. | | | |||
+--------------------------+------------+--------------------------+ | +--------------------------+--------------------------+------------+ | |||
Table 7 | Table 7: RIFT Security Algorithms | |||
10.3. Registries with Assigned Values for Schema Values | 10.3. Registries with Assigned Values for Schema Values | |||
This section requests registries that help govern the schema via the | This section requests registries that help govern the schema via the | |||
usual IANA registry procedures. The registry group "Routing in Fat | usual IANA registry procedures. The registry group "Routing in Fat | |||
Trees (RIFT)" holds the following registries. Registry values are | Trees (RIFT)" holds the following registries. Registry values are | |||
stored with their minimum and maximum version in which they are | stored with their minimum and maximum version in which they are | |||
available. All values not provided are to be considered | available. All values not provided are to be considered | |||
"Unassigned". The range of every registry is a 16-bit integer. | "Unassigned". The range of every registry is a 16-bit integer. | |||
Allocation of new values is performed via "Expert Review" action only | Allocation of new values is performed via "Expert Review" action only | |||
skipping to change at line 6727 ¶ | skipping to change at line 6700 ¶ | |||
+-------+-----------------------+-------------+---------+---------+ | +-------+-----------------------+-------------+---------+---------+ | |||
| 4 | AddressFamilyMaxValue | 8.0 | | | | | 4 | AddressFamilyMaxValue | 8.0 | | | | |||
+-------+-----------------------+-------------+---------+---------+ | +-------+-----------------------+-------------+---------+---------+ | |||
Table 9: Address Family Type | Table 9: Address Family Type | |||
10.3.3. RIFTCommonHierarchyIndications Registry | 10.3.3. RIFTCommonHierarchyIndications Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+====================================+=====+=======+=======+=======+ | +=====+====================================+=======+=======+=======+ | |||
|Name |Value|Min. |Max. |Comment| | |Value|Name |Min. |Max. |Comment| | |||
| | |Schema |Schema | | | | | |Schema |Schema | | | |||
| | |Version|Version| | | | | |Version|Version| | | |||
+====================================+=====+=======+=======+=======+ | +=====+====================================+=======+=======+=======+ | |||
|leaf_only |0 |8.0 | | | | |0 |leaf_only |8.0 | | | | |||
+------------------------------------+-----+-------+-------+-------+ | +-----+------------------------------------+-------+-------+-------+ | |||
|leaf_only_and_leaf_2_leaf_procedures|1 |8.0 | | | | |1 |leaf_only_and_leaf_2_leaf_procedures|8.0 | | | | |||
+------------------------------------+-----+-------+-------+-------+ | +-----+------------------------------------+-------+-------+-------+ | |||
|top_of_fabric |2 |8.0 | | | | |2 |top_of_fabric |8.0 | | | | |||
+------------------------------------+-----+-------+-------+-------+ | +-----+------------------------------------+-------+-------+-------+ | |||
Table 10: Flags Indicating Node Configuration in Case of ZTP | Table 10: Flags Indicating Node Configuration in Case of ZTP | |||
10.3.4. RIFTCommonIEEE8021ASTimeStampType Registry | 10.3.4. RIFTCommonIEEE8021ASTimeStampType Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
The timestamp is per IEEE 802.1AS; all values MUST be interpreted in | The timestamp is per IEEE 802.1AS; all values MUST be interpreted in | |||
implementation as unsigned. | implementation as unsigned. | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| AS_sec | 1 | 8.0 | | | | | 1 | AS_sec | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| AS_nsec | 2 | 8.0 | | | | | 2 | AS_nsec | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
Table 11 | Table 11 | |||
10.3.5. RIFTCommonIPAddressType Registry | 10.3.5. RIFTCommonIPAddressType Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+=============+=======+=====================+=============+=========+ | +=======+=============+=====================+=============+=========+ | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+=============+=======+=====================+=============+=========+ | +=======+=============+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+-------------+-------+---------------------+-------------+---------+ | +-------+-------------+---------------------+-------------+---------+ | |||
| ipv4address | 1 | 8.0 | | Content | | | 1 | ipv4address | 8.0 | | Content | | |||
| | | | | is IPv4 | | | | | | | is IPv4 | | |||
+-------------+-------+---------------------+-------------+---------+ | +-------+-------------+---------------------+-------------+---------+ | |||
| ipv6address | 2 | 8.0 | | Content | | | 2 | ipv6address | 8.0 | | Content | | |||
| | | | | is IPv6 | | | | | | | is IPv6 | | |||
+-------------+-------+---------------------+-------------+---------+ | +-------+-------------+---------------------+-------------+---------+ | |||
Table 12: IP Address Type | Table 12: IP Address Type | |||
10.3.6. RIFTCommonIPPrefixType Registry | 10.3.6. RIFTCommonIPPrefixType Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
Note: For interface addresses, the protocol can propagate the address | | Note: For interface addresses the protocol can propagate the | |||
part beyond the subnet mask and on reachability computation that has | | address part beyond the subnet mask and on reachability | |||
to be normalized. The non-significant bits can be used for | | computation the non-significant bits have to be normalized. | |||
operational purposes. | | Those bits can be used for operational purposes. | |||
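The normalization mentioned in the note can be sketched with Python's standard `ipaddress` module. The function name `normalize_prefix` is hypothetical; the sketch only shows how the bits beyond the subnet mask are cleared before a reachability computation.

```python
import ipaddress


def normalize_prefix(interface_address: str) -> str:
    """Clear the bits beyond the prefix length of an interface address."""
    interface = ipaddress.ip_interface(interface_address)
    # .network masks off the host part and keeps the prefix length.
    return str(interface.network)
```

For example, `normalize_prefix("10.1.2.3/24")` yields `10.1.2.0/24`, while the original host bits remain available to the operator in the advertised address.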
+============+=======+=====================+=============+=========+ | +=======+============+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+============+=======+=====================+=============+=========+ | +=======+============+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+------------+-------+---------------------+-------------+---------+ | +-------+------------+---------------------+-------------+---------+ | |||
| ipv4prefix | 1 | 8.0 | | | | | 1 | ipv4prefix | 8.0 | | | | |||
+------------+-------+---------------------+-------------+---------+ | +-------+------------+---------------------+-------------+---------+ | |||
| ipv6prefix | 2 | 8.0 | | | | | 2 | ipv6prefix | 8.0 | | | | |||
+------------+-------+---------------------+-------------+---------+ | +-------+------------+---------------------+-------------+---------+ | |||
Table 13: Prefix Advertisement | Table 13: Prefix Advertisement | |||
10.3.7. RIFTCommonIPv4PrefixType Registry | 10.3.7. RIFTCommonIPv4PrefixType Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+===========+=======+=====================+=============+=========+ | +=======+===========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+===========+=======+=====================+=============+=========+ | +=======+===========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
| address | 1 | 8.0 | | | | | 1 | address | 8.0 | | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
| prefixlen | 2 | 8.0 | | | | | 2 | prefixlen | 8.0 | | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
Table 14: IPv4 Prefix Type | Table 14: IPv4 Prefix Type | |||
10.3.8. RIFTCommonIPv6PrefixType Registry | 10.3.8. RIFTCommonIPv6PrefixType Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+===========+=======+=====================+=============+=========+ | +=======+===========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+===========+=======+=====================+=============+=========+ | +=======+===========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
| address | 1 | 8.0 | | | | | 1 | address | 8.0 | | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
| prefixlen | 2 | 8.0 | | | | | 2 | prefixlen | 8.0 | | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
Table 15: IPv6 Prefix Type | Table 15: IPv6 Prefix Type | |||
10.3.9. RIFTCommonKVTypes Registry | 10.3.9. RIFTCommonKVTypes Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+==============+=======+=============+=============+=========+ | +=======+==============+=============+=============+=========+ | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+==============+=======+=============+=============+=========+ | +=======+==============+=============+=============+=========+ | |||
| Unassigned | 0 | | | | | | 0 | Unassigned | | | | | |||
+--------------+-------+-------------+-------------+---------+ | +-------+--------------+-------------+-------------+---------+ | |||
| Experimental | 1 | 8.0 | | | | | 1 | Experimental | 8.0 | | | | |||
+--------------+-------+-------------+-------------+---------+ | +-------+--------------+-------------+-------------+---------+ | |||
| WellKnown | 2 | 8.0 | | | | | 2 | WellKnown | 8.0 | | | | |||
+--------------+-------+-------------+-------------+---------+ | +-------+--------------+-------------+-------------+---------+ | |||
| OUI | 3 | 8.0 | | | | | 3 | OUI | 8.0 | | | | |||
+--------------+-------+-------------+-------------+---------+ | +-------+--------------+-------------+-------------+---------+ | |||
Table 16 | Table 16 | |||
10.3.10. RIFTCommonPrefixSequenceType Registry | 10.3.10. RIFTCommonPrefixSequenceType Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+===============+=======+=========+==========+===================+ | +=======+===============+=========+==========+===================+ | |||
| Name | Value | Min. | Max. | Comment | | | Value | Name | Min. | Max. | Comment | | |||
| | | Schema | Schema | | | | | | Schema | Schema | | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+===============+=======+=========+==========+===================+ | +=======+===============+=========+==========+===================+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+---------------+-------+---------+----------+-------------------+ | +-------+---------------+---------+----------+-------------------+ | |||
| timestamp | 1 | 8.0 | | | | | 1 | timestamp | 8.0 | | | | |||
+---------------+-------+---------+----------+-------------------+ | +-------+---------------+---------+----------+-------------------+ | |||
| transactionid | 2 | 8.0 | | Transaction ID | | | 2 | transactionid | 8.0 | | Transaction ID | | |||
| | | | | set by client in, | | | | | | | set by client in, | | |||
| | | | | e.g., 6LoWPAN. | | | | | | | e.g., 6LoWPAN. | | |||
+---------------+-------+---------+----------+-------------------+ | +-------+---------------+---------+----------+-------------------+ | |||
Table 17: Sequence of a Prefix in Case of Move | Table 17: Sequence of a Prefix in Case of Move | |||
10.3.11. RIFTCommonRouteType Registry | 10.3.11. RIFTCommonRouteType Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
Note: The only purpose of these values is to introduce an ordering, | | Note: The only purpose of these values is to introduce an | |||
whereas an implementation can internally choose any other values as | | ordering, whereas an implementation can internally choose any | |||
long as the ordering is preserved. | | other values as long as the ordering is preserved. | |||
+=====================+=======+=============+=============+=========+ | +=======+=====================+=============+=============+=========+ | |||
| Name | Value | Min. Schema | Max. | Comment | | | Value | Name | Min. Schema | Max. | Comment | | |||
| | | Version | Schema | | | | | | Version | Schema | | | |||
| | | | Version | | | | | | | Version | | | |||
+=====================+=======+=============+=============+=========+ | +=======+=====================+=============+=============+=========+ | |||
| Illegal | 0 | 8.0 | | | | | 0 | Illegal | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| RouteTypeMinValue | 1 | 8.0 | | | | | 1 | RouteTypeMinValue | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| Discard | 2 | 8.0 | | | | | 2 | Discard | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| LocalPrefix | 3 | 8.0 | | | | | 3 | LocalPrefix | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| SouthPGPPrefix | 4 | 8.0 | | | | | 4 | SouthPGPPrefix | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| NorthPGPPrefix | 5 | 8.0 | | | | | 5 | NorthPGPPrefix | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| NorthPrefix | 6 | 8.0 | | | | | 6 | NorthPrefix | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| NorthExternalPrefix | 7 | 8.0 | | | | | 7 | NorthExternalPrefix | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| SouthPrefix | 8 | 8.0 | | | | | 8 | SouthPrefix | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| SouthExternalPrefix | 9 | 8.0 | | | | | 9 | SouthExternalPrefix | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| NegativeSouthPrefix | 10 | 8.0 | | | | | 10 | NegativeSouthPrefix | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
| RouteTypeMaxValue | 11 | 8.0 | | | | | 11 | RouteTypeMaxValue | 8.0 | | | | |||
+---------------------+-------+-------------+-------------+---------+ | +-------+---------------------+-------------+-------------+---------+ | |||
Table 18: RIFT Route Types | Table 18: RIFT Route Types | |||
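As a non-normative illustration of the note on ordering, the sketch below models the initial registry values as a Python enumeration and checks that a remapped internal encoding sorts the route types identically. The class name `RouteType`, the helper `internal_value`, and the particular remapping are assumptions made for this example only; they are not part of the registry.

```python
from enum import IntEnum


class RouteType(IntEnum):
    """Initial values of the RIFTCommonRouteType registry (Table 18)."""
    Illegal = 0
    RouteTypeMinValue = 1
    Discard = 2
    LocalPrefix = 3
    SouthPGPPrefix = 4
    NorthPGPPrefix = 5
    NorthPrefix = 6
    NorthExternalPrefix = 7
    SouthPrefix = 8
    SouthExternalPrefix = 9
    NegativeSouthPrefix = 10
    RouteTypeMaxValue = 11


def internal_value(route_type: RouteType) -> int:
    # Hypothetical internal encoding: any strictly increasing function of
    # the registry value keeps the relative ordering intact.
    return 100 + 10 * int(route_type)


# Sorting by registry value and by internal value yields the same sequence.
by_registry = sorted(RouteType)
by_internal = sorted(RouteType, key=internal_value)
```

The sketch only shows that the values are comparable and that the ordering survives a monotonic remapping; it makes no statement about how an implementation uses that ordering.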
10.3.12. RIFTCommonTIETypeType Registry | 10.3.12. RIFTCommonTIETypeType Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+===================================+=====+=======+=======+=======+ | +=====+===========================================+=======+=======+=======+ | |||
|Name |Value|Min. |Max. |Comment| | |Value|Name |Min. |Max. |Comment| | |||
| | |Schema |Schema | | | | | |Schema |Schema | | | |||
| | |Version|Version| | | | | |Version|Version| | | |||
+===================================+=====+=======+=======+=======+ | +=====+===========================================+=======+=======+=======+ | |||
|Illegal |0 |8.0 | | | | |0 |Illegal |8.0 | | | | |||
+-----------------------------------+-----+-------+-------+-------+ | +-----+-------------------------------------------+-------+-------+-------+ | |||
|TIETypeMinValue |1 |8.0 | | | | |1 |TIETypeMinValue |8.0 | | | | |||
+-----------------------------------+-----+-------+-------+-------+ | +-----+-------------------------------------------+-------+-------+-------+ | |||
|NodeTIEType |2 |8.0 | | | | |2 |NodeTIEType |8.0 | | | | |||
+-----------------------------------+-----+-------+-------+-------+ | +-----+-------------------------------------------+-------+-------+-------+ | |||
|PrefixTIEType |3 |8.0 | | | | |3 |PrefixTIEType |8.0 | | | | |||
+-----------------------------------+-----+-------+-------+-------+ | +-----+-------------------------------------------+-------+-------+-------+ | |||
|PositiveDisaggregationPrefixTIEType|4 |8.0 | | | | |4 |PositiveDisaggregationPrefixTIEType |8.0 | | | | |||
+-----------------------------------+-----+-------+-------+-------+ | +-----+-------------------------------------------+-------+-------+-------+ | |||
|NegativeDisaggregationPrefixTIEType|5 |8.0 | | | | |5 |NegativeDisaggregationPrefixTIEType |8.0 | | | | |||
+-----------------------------------+-----+-------+-------+-------+ | +-----+-------------------------------------------+-------+-------+-------+ | |||
|PGPrefixTIEType |6 |8.0 | | | | |6 |PGPrefixTIEType |8.0 | | | | |||
+-----------------------------------+-----+-------+-------+-------+ | +-----+-------------------------------------------+-------+-------+-------+ | |||
|KeyValueTIEType |7 |8.0 | | | | |7 |KeyValueTIEType |8.0 | | | | |||
+-----------------------------------+-----+-------+-------+-------+ | +-----+-------------------------------------------+-------+-------+-------+ | |||
|ExternalPrefixTIEType |8 |8.0 | | | | |8 |ExternalPrefixTIEType |8.0 | | | | |||
+-----------------------------------+-----+-------+-------+-------+ | +-----+-------------------------------------------+-------+-------+-------+ | |||
|PositiveExternalDisaggregation |9 |8.0 | | | | |9 |PositiveExternalDisaggregationPrefixTIEType|8.0 | | | | |||
|PrefixTIEType | | | | | | +-----+-------------------------------------------+-------+-------+-------+ | |||
+-----------------------------------+-----+-------+-------+-------+ | |10 |TIETypeMaxValue |8.0 | | | | |||
|TIETypeMaxValue |10 |8.0 | | | | +-----+-------------------------------------------+-------+-------+-------+ | |||
+-----------------------------------+-----+-------+-------+-------+ | ||||
Table 19: Type of TIE | Table 19: Type of TIE | |||
10.3.13. RIFTCommonTieDirectionType Registry | 10.3.13. RIFTCommonTieDirectionType Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+===================+=======+=============+=============+=========+ | +=======+===================+=============+=============+=========+ | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+===================+=======+=============+=============+=========+ | +=======+===================+=============+=============+=========+ | |||
| Illegal | 0 | 8.0 | | | | | 0 | Illegal | 8.0 | | | | |||
+-------------------+-------+-------------+-------------+---------+ | +-------+-------------------+-------------+-------------+---------+ | |||
| South | 1 | 8.0 | | | | | 1 | South | 8.0 | | | | |||
+-------------------+-------+-------------+-------------+---------+ | +-------+-------------------+-------------+-------------+---------+ | |||
| North | 2 | 8.0 | | | | | 2 | North | 8.0 | | | | |||
+-------------------+-------+-------------+-------------+---------+ | +-------+-------------------+-------------+-------------+---------+ | |||
| DirectionMaxValue | 3 | 8.0 | | | | | 3 | DirectionMaxValue | 8.0 | | | | |||
+-------------------+-------+-------------+-------------+---------+ | +-------+-------------------+-------------+-------------+---------+ | |||
Table 20: Direction of TIEs | Table 20: Direction of TIEs | |||
10.3.14. RIFTEncodingCommunity Registry | 10.3.14. RIFTEncodingCommunity Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+==========+=======+=====================+=============+============+ | +=======+==========+=====================+=============+============+ | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+==========+=======+=====================+=============+============+ | +=======+==========+=====================+=============+============+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+------------+ | +-------+----------+---------------------+-------------+------------+ | |||
| top | 1 | 8.0 | | Higher | | | 1 | top | 8.0 | | Higher | | |||
| | | | | order bits | | | | | | | order bits | | |||
+----------+-------+---------------------+-------------+------------+ | +-------+----------+---------------------+-------------+------------+ | |||
| bottom | 2 | 8.0 | | Lower | | | 2 | bottom | 8.0 | | Lower | | |||
| | | | | order bits | | | | | | | order bits | | |||
+----------+-------+---------------------+-------------+------------+ | +-------+----------+---------------------+-------------+------------+ | |||
Table 21: Prefix Community | Table 21: Prefix Community | |||
10.3.15. RIFTEncodingKeyValueTIEElement Registry | 10.3.15. RIFTEncodingKeyValueTIEElement Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+===========+=======+=====================+=============+=========+ | +=======+===========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+===========+=======+=====================+=============+=========+ | +=======+===========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
| keyvalues | 1 | 8.0 | | | | | 1 | keyvalues | 8.0 | | | | |||
+-----------+-------+---------------------+-------------+---------+ | +-------+-----------+---------------------+-------------+---------+ | |||
Table 22: Generic Key Value Pairs | Table 22: Generic Key Value Pairs | |||
10.3.16. RIFTEncodingKeyValueTIEElementContent Registry | 10.3.16. RIFTEncodingKeyValueTIEElementContent Registry | |||
This registry has the following initial values. It defines the | This registry has the following initial values. It defines the | |||
targeted nodes and the value carried. | targeted nodes and the value carried. | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| targets | 1 | 8.0 | | | | | 1 | targets | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| value | 2 | 8.0 | | | | | 2 | value | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
Table 23 | Table 23 | |||
10.3.17. RIFTEncodingLIEPacket Registry | 10.3.17. RIFTEncodingLIEPacket Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
Note: This node's level is already included on the packet header. | | Note: This node's level is already included on the packet | |||
| header. | ||||
+=============================+=====+=======+========+==============+ | +=====+=============================+=======+========+==============+ | |||
| Name |Value|Min. |Max. |Comment | | |Value| Name |Min. |Max. |Comment | | |||
| | |Schema |Schema | | | | | |Schema |Schema | | | |||
| | |Version|Version | | | | | |Version|Version | | | |||
+=============================+=====+=======+========+==============+ | +=====+=============================+=======+========+==============+ | |||
| Reserved |0 |8.0 |All | | | |0 | Reserved |8.0 |All | | | |||
| | | |Versions| | | | | | |Versions| | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| name |1 |8.0 | |Node or | | |1 | name |8.0 | |Node or | | |||
| | | | |adjacency | | | | | | |adjacency | | |||
| | | | |name. | | | | | | |name. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| local_id |2 |8.0 | |Local link | | |2 | local_id |8.0 | |Local link | | |||
| | | | |ID. | | | | | | |ID. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| flood_port |3 |8.0 | |UDP port to | | |3 | flood_port |8.0 | |UDP port to | | |||
| | | | |which we can | | | | | | |which we can | | |||
| | | | |receive | | | | | | |receive | | |||
| | | | |flooded TIEs. | | | | | | |flooded TIEs. | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| link_mtu_size |4 |8.0 | |Layer 2 MTU, | | |4 | link_mtu_size |8.0 | |Layer 2 MTU, | | |||
| | | | |used to | | | | | | |used to | | |||
| | | | |discover | | | | | | |discover | | |||
| | | | |mismatch. | | | | | | |mismatch. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| link_bandwidth |5 |8.0 | |Local link | | |5 | link_bandwidth |8.0 | |Local link | | |||
| | | | |bandwidth on | | | | | | |bandwidth on | | |||
| | | | |the | | | | | | |the | | |||
| | | | |interface. | | | | | | |interface. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| neighbor |6 |8.0 | |Reflects the | | |6 | neighbor |8.0 | |Reflects the | | |||
| | | | |neighbor once | | | | | | |neighbor once | | |||
| | | | |received to | | | | | | |received to | | |||
| | | | |provide 3-way | | | | | | |provide 3-way | | |||
| | | | |connectivity. | | | | | | |connectivity. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| pod |7 |8.0 | |Node's PoD. | | |7 | pod |8.0 | |Node's PoD. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| node_capabilities |10 |8.0 | |Node | | |10 | node_capabilities |8.0 | |Node | | |||
| | | | |capabilities | | | | | | |capabilities | | |||
| | | | |supported. | | | | | | |supported. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| link_capabilities |11 |8.0 | |Capabilities | | |11 | link_capabilities |8.0 | |Capabilities | | |||
| | | | |of this link. | | | | | | |of this link. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| holdtime |12 |8.0 | |Required | | |12 | holdtime |8.0 | |Required | | |||
| | | | |holdtime of | | | | | | |holdtime of | | |||
| | | | |the | | | | | | |the | | |||
| | | | |adjacency, | | | | | | |adjacency, | | |||
| | | | |i.e., for how | | | | | | |i.e., for how | | |||
| | | | |long a period | | | | | | |long a period | | |||
| | | | |adjacency | | | | | | |adjacency | | |||
| | | | |should be | | | | | | |should be | | |||
| | | | |kept up | | | | | | |kept up | | |||
| | | | |without valid | | | | | | |without valid | | |||
| | | | |LIE | | | | | | |LIE | | |||
| | | | |reception. | | | | | | |reception. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| label |13 |8.0 | |Optional, | | |13 | label |8.0 | |Optional, | | |||
| | | | |unsolicited, | | | | | | |unsolicited, | | |||
| | | | |downstream | | | | | | |downstream | | |||
| | | | |assigned | | | | | | |assigned | | |||
| | | | |locally | | | | | | |locally | | |||
| | | | |significant | | | | | | |significant | | |||
| | | | |label value | | | | | | |label value | | |||
| | | | |for the | | | | | | |for the | | |||
| | | | |adjacency. | | | | | | |adjacency. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| not_a_ztp_offer |21 |8.0 | |Indicates | | |21 | not_a_ztp_offer |8.0 | |Indicates | | |||
| | | | |that the | | | | | | |that the | | |||
| | | | |level on the | | | | | | |level on the | | |||
| | | | |lie must not | | | | | | |LIE must not | | |||
| | | | |be used to | | | | | | |be used to | | |||
| | | | |derive a ZTP | | | | | | |derive a ZTP | | |||
| | | | |level by the | | | | | | |level by the | | |||
| | | | |receiving | | | | | | |receiving | | |||
| | | | |node. | | | | | | |node. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| you_are_flood_repeater |22 |8.0 | |Indicates to | | |22 | you_are_flood_repeater |8.0 | |Indicates to | | |||
| | | | |the | | | | | | |the | | |||
| | | | |northbound | | | | | | |northbound | | |||
| | | | |neighbor that | | | | | | |neighbor that | | |||
| | | | |it should be | | | | | | |it should be | | |||
| | | | |reflooding | | | | | | |reflooding | | |||
| | | | |ties received | | | | | | |TIEs received | | |||
| | | | |from this | | | | | | |from this | | |||
| | | | |node to | | | | | | |node to | | |||
| | | | |achieve flood | | | | | | |achieve flood | | |||
| | | | |reduction and | | | | | | |reduction and | | |||
| | | | |balancing for | | | | | | |balancing for | | |||
| | | | |northbound | | | | | | |northbound | | |||
| | | | |flooding. | | | | | | |flooding. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| you_are_sending_too_quickly |23 |8.0 | |Indicates to | | |23 | you_are_sending_too_quickly |8.0 | |Indicates to | | |||
| | | | |the neighbor | | | | | | |the neighbor | | |||
| | | | |to flood node | | | | | | |to flood node | | |||
| | | | |TIEs only and | | | | | | |TIEs only and | |||
| | | | |slow down all | | | | | | |slow down all | |||
| | | | |other TIEs. | | | | | | |other TIEs. | |||
| | | | |Ignored when | | | | | | |Ignored when | | |||
| | | | |received from | | | | | | |received from | | |||
| | | | |the | | | | | | |the | | |||
| | | | |southbound | | | | | | |southbound | | |||
| | | | |neighbor. | | | | | | |neighbor. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| instance_name |24 |8.0 | |Instance name | | |24 | instance_name |8.0 | |Instance name | | |||
| | | | |in case | | | | | | |in case | | |||
| | | | |multiple rift | | | | | | |multiple RIFT | | |||
| | | | |instances | | | | | | |instances are | | |||
| | | | |running on | | | | | | |running on | | |||
| | | | |same | | | | | | |the same | | |||
| | | | |interface. | | | | | | |interface. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
| fabric_id |35 |8.0 | |It provides | | |35 | fabric_id |8.0 | |It provides | | |||
| | | | |the optional | | | | | | |the optional | | |||
| | | | |ID of the | | | | | | |ID of the | | |||
| | | | |fabric | | | | | | |fabric | | |||
| | | | |configured. | | | | | | |configured. | | |||
| | | | |This must | | | | | | |This must | | |||
| | | | |match the | | | | | | |match the | | |||
| | | | |information | | | | | | |information | | |||
| | | | |advertised on | | | | | | |advertised on | | |||
| | | | |the node | | | | | | |the node | | |||
| | | | |element. | | | | | | |element. | | |||
+-----------------------------+-----+-------+--------+--------------+ | +-----+-----------------------------+-------+--------+--------------+ | |||
Table 24: RIFT LIE Packet | Table 24: RIFT LIE Packet | |||
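The `holdtime` comment in Table 24 can be read as a simple expiry rule: the adjacency is kept up only while a valid LIE has been received within the holdtime. The following is a minimal, non-normative sketch of that reading. The class `AdjacencyTimer`, its method names, the use of seconds as the unit, and the strict "greater than" comparison at the boundary are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdjacencyTimer:
    """Hypothetical holdtime tracker for a single adjacency."""
    holdtime: float        # holdtime applied to the adjacency (assumed seconds)
    last_valid_lie: float  # timestamp of the most recent valid LIE

    def on_valid_lie(self, now: float, holdtime: float) -> None:
        # A valid LIE refreshes the timer and may carry a new holdtime.
        self.last_valid_lie = now
        self.holdtime = holdtime

    def expired(self, now: float) -> bool:
        # Assumption: the adjacency expires once strictly more than
        # `holdtime` has elapsed without a valid LIE.
        return now - self.last_valid_lie > self.holdtime


timer = AdjacencyTimer(holdtime=3.0, last_valid_lie=10.0)
still_up = not timer.expired(now=12.5)   # within the holdtime
timed_out = timer.expired(now=13.5)      # holdtime exceeded
```

Timestamps are passed in explicitly so that the caller can supply a monotonic clock of its choice.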
10.3.18. RIFTEncodingLinkCapabilities Registry | 10.3.18. RIFTEncodingLinkCapabilities Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+=========================+=====+=========+==========+==============+ | +=====+=========================+=========+==========+==============+ | |||
| Name |Value| Min. | Max. | Comment | | |Value| Name | Min. | Max. | Comment | | |||
| | | Schema | Schema | | | | | | Schema | Schema | | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+=========================+=====+=========+==========+==============+ | +=====+=========================+=========+==========+==============+ | |||
| Reserved |0 | 8.0 | All | | | |0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+-------------------------+-----+---------+----------+--------------+ | +-----+-------------------------+---------+----------+--------------+ | |||
| bfd |1 | 8.0 | | Indicates | | |1 | bfd | 8.0 | | Indicates | | |||
| | | | | that the | | | | | | | that the | | |||
| | | | | link is | | | | | | | link is | | |||
| | | | | supporting | | | | | | | supporting | | |||
| | | | | BFD. | | | | | | | BFD. | | |||
+-------------------------+-----+---------+----------+--------------+ | +-----+-------------------------+---------+----------+--------------+ | |||
| ipv4_forwarding_capable |2 | 8.0 | | Indicates | | |2 | ipv4_forwarding_capable | 8.0 | | Indicates | | |||
| | | | | whether the | | | | | | | whether the | | |||
| | | | | interface | | | | | | | interface | | |||
| | | | | will | | | | | | | will | | |||
| | | | | support | | | | | | | support | | |||
| | | | | IPv4 | | | | | | | IPv4 | | |||
| | | | | forwarding. | | | | | | | forwarding. | | |||
+-------------------------+-----+---------+----------+--------------+ | +-----+-------------------------+---------+----------+--------------+ | |||
Table 25: Link Capabilities | Table 25: Link Capabilities | |||
10.3.19. RIFTEncodingLinkIDPair Registry | 10.3.19. RIFTEncodingLinkIDPair Registry | |||
The LinkID pair describes one of the parallel links between two | The LinkID pair describes one of the parallel links between two | |||
nodes. | nodes. | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+============================+=====+=======+========+==============+ | +=====+============================+=======+========+==============+ | |||
| Name |Value|Min. |Max. | Comment | | |Value| Name |Min. |Max. | Comment | | |||
| | |Schema |Schema | | | | | |Schema |Schema | | | |||
| | |Version|Version | | | | | |Version|Version | | | |||
+============================+=====+=======+========+==============+ | +=====+============================+=======+========+==============+ | |||
| Reserved |0 |8.0 |All | | | |0 | Reserved |8.0 |All | | | |||
| | | |Versions| | | | | | |Versions| | | |||
+----------------------------+-----+-------+--------+--------------+ | +-----+----------------------------+-------+--------+--------------+ | |||
| local_id |1 |8.0 | | Node-wide | | |1 | local_id |8.0 | | Node-wide | | |||
| | | | | unique value | | | | | | | unique value | | |||
| | | | | for the | | | | | | | for the | | |||
| | | | | local link. | | | | | | | local link. | | |||
+----------------------------+-----+-------+--------+--------------+ | +-----+----------------------------+-------+--------+--------------+ | |||
| remote_id |2 |8.0 | | Received the | | |2 | remote_id |8.0 | | Received the | | |||
| | | | | remote link | | | | | | | remote link | | |||
| | | | | ID for this | | | | | | | ID for this | | |||
| | | | | link. | | | | | | | link. | | |||
+----------------------------+-----+-------+--------+--------------+ | +-----+----------------------------+-------+--------+--------------+ | |||
| platform_interface_index |10 |8.0 | | Describes | | |10 | platform_interface_index |8.0 | | Describes | | |||
| | | | | the local | | | | | | | the local | | |||
| | | | | interface | | | | | | | interface | | |||
| | | | | index of the | | | | | | | index of the | | |||
| | | | | link. | | | | | | | link. | | |||
+----------------------------+-----+-------+--------+--------------+ | +-----+----------------------------+-------+--------+--------------+ | |||
| platform_interface_name |11 |8.0 | | Describes | | |11 | platform_interface_name |8.0 | | Describes | | |||
| | | | | the local | | | | | | | the local | | |||
| | | | | interface | | | | | | | interface | | |||
| | | | | name. | | | | | | | name. | | |||
+----------------------------+-----+-------+--------+--------------+ | +-----+----------------------------+-------+--------+--------------+ | |||
| trusted_outer_security_key |12 |8.0 | | Indicates | | |12 | trusted_outer_security_key |8.0 | | Indicates | | |||
| | | | | whether the | | | | | | | whether the | | |||
| | | | | link is | | | | | | | link is | | |||
| | | | | secured, | | | | | | | secured, | | |||
| | | | | i.e., | | | | | | | i.e., | | |||
| | | | | protected by | | | | | | | protected by | | |||
| | | | | outer key, | | | | | | | outer key, | | |||
| | | | | absence of | | | | | | | absence of | | |||
| | | | | this element | | | | | | | this element | | |||
| | | | | means no | | | | | | | means no | | |||
| | | | | indication, | | | | | | | indication, | | |||
| | | | | undefined | | | | | | | undefined | | |||
| | | | | outer key | | | | | | | outer key | | |||
| | | | | means not | | | | | | | means not | | |||
| | | | | secured. | | | | | | | secured. | | |||
+----------------------------+-----+-------+--------+--------------+ | +-----+----------------------------+-------+--------+--------------+ | |||
| bfd_up |13 |8.0 | | Indicates | | |13 | bfd_up |8.0 | | Indicates | | |||
| | | | | whether the | | | | | | | whether the | | |||
| | | | | link is | | | | | | | link is | | |||
| | | | | protected by | | | | | | | protected by | | |||
| | | | | an | | | | | | | an | | |||
| | | | | established | | | | | | | established | | |||
| | | | | BFD session. | | | | | | | BFD session. | | |||
+----------------------------+-----+-------+--------+--------------+ | +-----+----------------------------+-------+--------+--------------+ | |||
| address_families |14 |8.0 | | Optional | | |14 | address_families |8.0 | | Optional | | |||
| | | | | indication | | | | | | | indication | | |||
| | | | | that address | | | | | | | that address | | |||
| | | | | families are | | | | | | | families are | | |||
| | | | | up on the | | | | | | | up on the | | |||
| | | | | interface. | | | | | | | interface. | | |||
+----------------------------+-----+-------+--------+--------------+ | +-----+----------------------------+-------+--------+--------------+ | |||
Table 26 | Table 26 | |||
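The comment on `trusted_outer_security_key` distinguishes three cases: the element is absent, the link is protected by an outer key, or the outer key is undefined. A minimal sketch of one plausible interpretation follows. It assumes the element is carried as an optional boolean, with `None` standing for absence and `False` for an undefined outer key; the function name and the returned strings are hypothetical.

```python
from typing import Optional


def describe_outer_security(trusted_outer_security_key: Optional[bool]) -> str:
    """Map the optional element to the three cases named in Table 26."""
    if trusted_outer_security_key is None:
        # Element absent: the sender gives no indication either way.
        return "no indication"
    if trusted_outer_security_key:
        # Link is protected by an outer key.
        return "secured"
    # Assumption: a false value corresponds to an undefined outer key.
    return "not secured"
```

Keeping absence separate from `False` avoids treating a link with no indication as a link known to be unsecured.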
10.3.20. RIFTEncodingNeighbor Registry | 10.3.20. RIFTEncodingNeighbor Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+============+=======+=============+=============+=================+ | +=======+============+=============+=============+=================+ | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+============+=======+=============+=============+=================+ | +=======+============+=============+=============+=================+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+------------+-------+-------------+-------------+-----------------+ | +-------+------------+-------------+-------------+-----------------+ | |||
| originator | 1 | 8.0 | | System ID of | | | 1 | originator | 8.0 | | System ID of | | |||
| | | | | the originator. | | | | | | | the originator. | | |||
+------------+-------+-------------+-------------+-----------------+ | +-------+------------+-------------+-------------+-----------------+ | |||
| remote_id | 2 | 8.0 | | ID of remote | | | 2 | remote_id | 8.0 | | ID of remote | | |||
| | | | | side of the | | | | | | | side of the | | |||
| | | | | link. | | | | | | | link. | | |||
+------------+-------+-------------+-------------+-----------------+ | +-------+------------+-------------+-------------+-----------------+ | |||
Table 27: Neighbor Structure | Table 27: Neighbor Structure | |||
10.3.21. RIFTEncodingNodeCapabilities Registry | 10.3.21. RIFTEncodingNodeCapabilities Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+========================+=====+=========+==========+==============+ | +=====+========================+=========+==========+==============+ | |||
| Name |Value| Min. | Max. | Comment | | |Value| Name | Min. | Max. | Comment | | |||
| | | Schema | Schema | | | | | | Schema | Schema | | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+========================+=====+=========+==========+==============+ | +=====+========================+=========+==========+==============+ | |||
| Reserved |0 | 8.0 | All | | | |0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+------------------------+-----+---------+----------+--------------+ | +-----+------------------------+---------+----------+--------------+ | |||
| protocol_minor_version |1 | 8.0 | | Must | | |1 | protocol_minor_version | 8.0 | | Must | | |||
| | | | | advertise | | | | | | | advertise | | |||
| | | | | supported | | | | | | | supported | | |||
| | | | | minor | | | | | | | minor | | |||
| | | | | version | | | | | | | version | | |||
| | | | | dialect that | | | | | | | dialect that | | |||
| | | | | way. | | | | | | | way. | | |||
+------------------------+-----+---------+----------+--------------+ | +-----+------------------------+---------+----------+--------------+ | |||
| flood_reduction |2 | 8.0 | | Indicates | | |2 | flood_reduction | 8.0 | | Indicates | | |||
| | | | | that node | | | | | | | that node | | |||
| | | | | supports | | | | | | | supports | | |||
| | | | | flood | | | | | | | flood | | |||
| | | | | reduction. | | | | | | | reduction. | | |||
+------------------------+-----+---------+----------+--------------+ | +-----+------------------------+---------+----------+--------------+ | |||
| hierarchy_indications |3 | 8.0 | | Indicates | | |3 | hierarchy_indications | 8.0 | | Indicates | | |||
| | | | | place in | | | | | | | place in | | |||
| | | | | hierarchy, | | | | | | | hierarchy, | | |||
| | | | | i.e., top of | | | | | | | i.e., top of | | |||
| | | | | fabric or | | | | | | | fabric or | | |||
| | | | | leaf only | | | | | | | leaf only | | |||
| | | | | (in ZTP) or | | | | | | | (in ZTP) or | | |||
| | | | | support for | | | | | | | support for | | |||
| | | | | leaf-to-leaf | | | | | | | L2L | | |||
| | | | | procedures. | | | | | | | procedures. | | |||
+------------------------+-----+---------+----------+--------------+ | +-----+------------------------+---------+----------+--------------+ | |||
Table 28: Capabilities the Node Supports | Table 28: Capabilities the Node Supports | |||
10.3.22. RIFTEncodingNodeFlags Registry | 10.3.22. RIFTEncodingNodeFlags Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+==========+=======+=========+==========+===========================+ | +=======+==========+=========+==========+===========================+ | |||
| Name | Value | Min. | Max. | Comment | | | Value | Name | Min. | Max. | Comment | | |||
| | | Schema | Schema | | | | | | Schema | Schema | | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+==========+=======+=========+==========+===========================+ | +=======+==========+=========+==========+===========================+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------+----------+---------------------------+ | +-------+----------+---------+----------+---------------------------+ | |||
| overload | 1 | 8.0 | | Indicates that node | | | 1 | overload | 8.0 | | Indicates that node | | |||
| | | | | is in overload; do | | | | | | | is in overload; do | | |||
| | | | | not transit traffic | | | | | | | not transit traffic | | |||
| | | | | through it. | | | | | | | through it. | | |||
+----------+-------+---------+----------+---------------------------+ | +-------+----------+---------+----------+---------------------------+ | |||
Table 29: Indication Flags of the Node | Table 29: Indication Flags of the Node | |||
10.3.23. RIFTEncodingNodeNeighborsTIEElement Registry | 10.3.23. RIFTEncodingNodeNeighborsTIEElement Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+===========+=======+=========+==========+======================+ | +=======+===========+=========+==========+======================+ | |||
| Name | Value | Min. | Max. | Comment | | | Value | Name | Min. | Max. | Comment | | |||
| | | Schema | Schema | | | | | | Schema | Schema | | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+===========+=======+=========+==========+======================+ | +=======+===========+=========+==========+======================+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+-----------+-------+---------+----------+----------------------+ | +-------+-----------+---------+----------+----------------------+ | |||
| level | 1 | 8.0 | | Level of neighbor. | | | 1 | level | 8.0 | | Level of neighbor. | | |||
+-----------+-------+---------+----------+----------------------+ | +-------+-----------+---------+----------+----------------------+ | |||
| cost | 3 | 8.0 | | Cost to neighbor. | | | 3 | cost | 8.0 | | Cost to neighbor. | | |||
| | | | | Ignore anything | | | | | | | Ignore anything | | |||
| | | | | equal to or larger than | | | | | | | equal to or larger than | | |||
| | | | | 'infinite_distance' | | | | | | | 'infinite_distance' | | |||
| | | | | and equal to | | | | | | | and equal to | | |||
| | | | | 'invalid_distance'. | | | | | | | 'invalid_distance'. | | |||
+-----------+-------+---------+----------+----------------------+ | +-------+-----------+---------+----------+----------------------+ | |||
| link_ids | 4 | 8.0 | | Carries description | | | 4 | link_ids | 8.0 | | Carries description | | |||
| | | | | of multiple parallel | | | | | | | of multiple parallel | | |||
| | | | | links in a tie. | | | | | | | links in a tie. | | |||
+-----------+-------+---------+----------+----------------------+ | +-------+-----------+---------+----------+----------------------+ | |||
| bandwidth | 5 | 8.0 | | Total bandwidth to | | | 5 | bandwidth | 8.0 | | Total bandwidth to | | |||
| | | | | neighbor as sum of | | | | | | | neighbor as sum of | | |||
| | | | | all parallel links. | | | | | | | all parallel links. | | |||
+-----------+-------+---------+----------+----------------------+ | +-------+-----------+---------+----------+----------------------+ | |||
Table 30: Neighbor of a Node | Table 30: Neighbor of a Node | |||
10.3.24. RIFTEncodingNodeTIEElement Registry | 10.3.24. RIFTEncodingNodeTIEElement Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+=================+=======+=========+==========+====================+ | +=======+=================+=========+==========+====================+ | |||
| Name | Value | Min. | Max. | Comment | | | Value | Name | Min. | Max. | Comment | | |||
| | | Schema | Schema | | | | | | Schema | Schema | | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+=================+=======+=========+==========+====================+ | +=======+=================+=========+==========+====================+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
| level | 1 | 8.0 | | Level of the | | | 1 | level | 8.0 | | Level of the | | |||
| | | | | node. | | | | | | | node. | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
| neighbors | 2 | 8.0 | | Node's neighbors. | | | 2 | neighbors | 8.0 | | Node's neighbors. | | |||
| | | | | Multiple node | | | | | | | Multiple node | | |||
| | | | | ties can carry | | | | | | | ties can carry | | |||
| | | | | disjoint sets of | | | | | | | disjoint sets of | | |||
| | | | | neighbors. | | | | | | | neighbors. | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
| capabilities | 3 | 8.0 | | Capabilities of | | | 3 | capabilities | 8.0 | | Capabilities of | | |||
| | | | | the node. | | | | | | | the node. | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
| flags | 4 | 8.0 | | Flags of the | | | 4 | flags | 8.0 | | Flags of the | | |||
| | | | | node. | | | | | | | node. | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
| name | 5 | 8.0 | | Optional node | | | 5 | name | 8.0 | | Optional node | | |||
| | | | | name for easier | | | | | | | name for easier | | |||
| | | | | operations. | | | | | | | operations. | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
| pod | 6 | 8.0 | | Pod to which the | | | 6 | pod | 8.0 | | Pod to which the | | |||
| | | | | node belongs. | | | | | | | node belongs. | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
| startup_time | 7 | 8.0 | | Optional startup | | | 7 | startup_time | 8.0 | | Optional startup | | |||
| | | | | time of the node. | | | | | | | time of the node. | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
| miscabled_links | 10 | 8.0 | | If any local | | | 10 | miscabled_links | 8.0 | | If any local | | |||
| | | | | links are | | | | | | | links are | | |||
| | | | | miscabled, this | | | | | | | miscabled, this | | |||
| | | | | indication is | | | | | | | indication is | | |||
| | | | | flooded. | | | | | | | flooded. | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
| same_plane_tofs | 12 | 8.0 | | ToFs in the same | | | 12 | same_plane_tofs | 8.0 | | ToFs in the same | | |||
| | | | | plane. Only | | | | | | | plane. Only | | |||
| | | | | carried by ToF. | | | | | | | carried by ToF. | | |||
| | | | | Multiple node | | | | | | | Multiple node | | |||
| | | | | ties can carry | | | | | | | ties can carry | | |||
| | | | | disjoint sets of | | | | | | | disjoint sets of | | |||
| | | | | ToFs that must be | | | | | | | ToFs that must be | | |||
| | | | | joined to form a | | | | | | | joined to form a | | |||
| | | | | single set. | | | | | | | single set. | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
| fabric_id | 20 | 8.0 | | It provides the | | | 20 | fabric_id | 8.0 | | It provides the | | |||
| | | | | optional ID of | | | | | | | optional ID of | | |||
| | | | | the fabric | | | | | | | the fabric | | |||
| | | | | configured. | | | | | | | configured. | | |||
+-----------------+-------+---------+----------+--------------------+ | +-------+-----------------+---------+----------+--------------------+ | |||
Table 31: Description of a Node | Table 31: Description of a Node | |||
10.3.25. RIFTEncodingPacketContent Registry | 10.3.25. RIFTEncodingPacketContent Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| lie | 1 | 8.0 | | | | | 1 | lie | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| tide | 2 | 8.0 | | | | | 2 | tide | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| tire | 3 | 8.0 | | | | | 3 | tire | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| tie | 4 | 8.0 | | | | | 4 | tie | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
Table 32: Content of a RIFT Packet | Table 32: Content of a RIFT Packet | |||
10.3.26. RIFTEncodingPacketHeader Registry | 10.3.26. RIFTEncodingPacketHeader Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+===============+=======+=========+==========+===================+ | +=======+===============+=========+==========+===================+ | |||
| Name | Value | Min. | Max. | Comment | | | Value | Name | Min. | Max. | Comment | | |||
| | | Schema | Schema | | | | | | Schema | Schema | | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+===============+=======+=========+==========+===================+ | +=======+===============+=========+==========+===================+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+---------------+-------+---------+----------+-------------------+ | +-------+---------------+---------+----------+-------------------+ | |||
| major_version | 1 | 8.0 | | Major version of | | | 1 | major_version | 8.0 | | Major version of | | |||
| | | | | protocol. | | | | | | | protocol. | | |||
+---------------+-------+---------+----------+-------------------+ | +-------+---------------+---------+----------+-------------------+ | |||
| minor_version | 2 | 8.0 | | Minor version of | | | 2 | minor_version | 8.0 | | Minor version of | | |||
| | | | | protocol. | | | | | | | protocol. | | |||
+---------------+-------+---------+----------+-------------------+ | +-------+---------------+---------+----------+-------------------+ | |||
| sender | 3 | 8.0 | | Node sending the | | | 3 | sender | 8.0 | | Node sending the | | |||
| | | | | packet, in case | | | | | | | packet, in case | | |||
| | | | | of LIE/TIRE/TIDE | | | | | | | of LIE/TIRE/TIDE | | |||
| | | | | also the | | | | | | | also the | | |||
| | | | | originator of it. | | | | | | | originator of it. | | |||
+---------------+-------+---------+----------+-------------------+ | +-------+---------------+---------+----------+-------------------+ | |||
| level | 4 | 8.0 | | Level of the node | | | 4 | level | 8.0 | | Level of the node | | |||
| | | | | sending the | | | | | | | sending the | | |||
| | | | | packet, required | | | | | | | packet, required | | |||
| | | | | on everything | | | | | | | on everything | | |||
| | | | | except LIEs. | | | | | | | except LIEs. | | |||
| | | | | Lack of presence | | | | | | | Lack of presence | | |||
| | | | | on LIEs indicates | | | | | | | on LIEs indicates | | |||
| | | | | undefined_level | | | | | | | undefined_level | | |||
| | | | | and is used in | | | | | | | and is used in | | |||
| | | | | ZTP procedures. | | | | | | | ZTP procedures. | | |||
+---------------+-------+---------+----------+-------------------+ | +-------+---------------+---------+----------+-------------------+ | |||
Table 33: Common RIFT Packet Header | Table 33: Common RIFT Packet Header | |||
10.3.27. RIFTEncodingPrefixAttributes Registry | 10.3.27. RIFTEncodingPrefixAttributes Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+===================+=======+=========+==========+==================+ | +=======+===================+=========+==========+==================+ | |||
| Name | Value | Min. | Max. | Comment | | | Value | Name | Min. | Max. | Comment | | |||
| | | Schema | Schema | | | | | | Schema | Schema | | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+===================+=======+=========+==========+==================+ | +=======+===================+=========+==========+==================+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+-------------------+-------+---------+----------+------------------+ | +-------+-------------------+---------+----------+------------------+ | |||
| metric | 2 | 8.0 | | Distance of the | | | 2 | metric | 8.0 | | Distance of the | | |||
| | | | | prefix. | | | | | | | prefix. | | |||
+-------------------+-------+---------+----------+------------------+ | +-------+-------------------+---------+----------+------------------+ | |||
| tags | 3 | 8.0 | | Generic | | | 3 | tags | 8.0 | | Generic | | |||
| | | | | unordered set | | | | | | | unordered set | | |||
| | | | | of route tags, | | | | | | | of route tags, | | |||
| | | | | can be | | | | | | | can be | | |||
| | | | | redistributed | | | | | | | redistributed | | |||
| | | | | to other | | | | | | | to other | | |||
| | | | | protocols or | | | | | | | protocols or | | |||
| | | | | used within the | | | | | | | used within the | | |||
| | | | | context of real | | | | | | | context of real | | |||
| | | | | time analytics. | | | | | | | time analytics. | | |||
+-------------------+-------+---------+----------+------------------+ | +-------+-------------------+---------+----------+------------------+ | |||
| monotonic_clock | 4 | 8.0 | | Monotonic clock | | | 4 | monotonic_clock | 8.0 | | Monotonic clock | | |||
| | | | | for mobile | | | | | | | for mobile | | |||
| | | | | addresses. | | | | | | | addresses. | | |||
+-------------------+-------+---------+----------+------------------+ | +-------+-------------------+---------+----------+------------------+ | |||
| loopback | 6 | 8.0 | | Indicates if | | | 6 | loopback | 8.0 | | Indicates if | | |||
| | | | | the prefix is a | | | | | | | the prefix is a | | |||
| | | | | node loopback. | | | | | | | node loopback. | | |||
+-------------------+-------+---------+----------+------------------+ | +-------+-------------------+---------+----------+------------------+ | |||
| directly_attached | 7 | 8.0 | | Indicates that | | | 7 | directly_attached | 8.0 | | Indicates that | | |||
| | | | | the prefix is | | | | | | | the prefix is | | |||
| | | | | directly | | | | | | | directly | | |||
| | | | | attached. | | | | | | | attached. | | |||
+-------------------+-------+---------+----------+------------------+ | +-------+-------------------+---------+----------+------------------+ | |||
| from_link | 10 | 8.0 | | Link to which | | | 10 | from_link | 8.0 | | Link to which | | |||
| | | | | the address | | | | | | | the address | | |||
| | | | | belongs. | | | | | | | belongs. | | |||
+-------------------+-------+---------+----------+------------------+ | +-------+-------------------+---------+----------+------------------+ | |||
| label | 12 | 8.0 | | Optional, per- | | | 12 | label | 8.0 | | Optional, per- | | |||
| | | | | prefix | | | | | | | prefix | | |||
| | | | | significant | | | | | | | significant | | |||
| | | | | label. | | | | | | | label. | | |||
+-------------------+-------+---------+----------+------------------+ | +-------+-------------------+---------+----------+------------------+ | |||
Table 34: Attributes of a Prefix | Table 34: Attributes of a Prefix | |||
10.3.28. RIFTEncodingPrefixTIEElement Registry | 10.3.28. RIFTEncodingPrefixTIEElement Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+==========+=======+=============+=============+================+ | +=======+==========+=============+=============+================+ | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+==========+=======+=============+=============+================+ | +=======+==========+=============+=============+================+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+-------------+-------------+----------------+ | +-------+----------+-------------+-------------+----------------+ | |||
| prefixes | 1 | 8.0 | | Prefixes with | | | 1 | prefixes | 8.0 | | Prefixes with | | |||
| | | | | the associated | | | | | | | the associated | | |||
| | | | | attributes. | | | | | | | attributes. | | |||
+----------+-------+-------------+-------------+----------------+ | +-------+----------+-------------+-------------+----------------+ | |||
Table 35: TIE Carrying Prefixes | Table 35: TIE Carrying Prefixes | |||
10.3.29. RIFTEncodingProtocolPacket Registry | 10.3.29. RIFTEncodingProtocolPacket Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| header | 1 | 8.0 | | | | | 1 | header | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| content | 2 | 8.0 | | | | | 2 | content | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
Table 36: RIFT Packet Structure | Table 36: RIFT Packet Structure | |||
10.3.30. RIFTEncodingTIDEPacket Registry | 10.3.30. RIFTEncodingTIDEPacket Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+=============+=======+=============+=============+===============+ | +=======+=============+=============+=============+===============+ | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+=============+=======+=============+=============+===============+ | +=======+=============+=============+=============+===============+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+-------------+-------+-------------+-------------+---------------+ | +-------+-------------+-------------+-------------+---------------+ | |||
| start_range | 1 | 8.0 | | First TIE | | | 1 | start_range | 8.0 | | First TIE | | |||
| | | | | header in the | | | | | | | header in the | | |||
| | | | | TIDE packet. | | | | | | | TIDE packet. | | |||
+-------------+-------+-------------+-------------+---------------+ | +-------+-------------+-------------+-------------+---------------+ | |||
| end_range | 2 | 8.0 | | Last TIE | | | 2 | end_range | 8.0 | | Last TIE | | |||
| | | | | header in the | | | | | | | header in the | | |||
| | | | | TIDE packet. | | | | | | | TIDE packet. | | |||
+-------------+-------+-------------+-------------+---------------+ | +-------+-------------+-------------+-------------+---------------+ | |||
| headers | 3 | 8.0 | | _sorted_ list | | | 3 | headers | 8.0 | | _sorted_ list | | |||
| | | | | of headers. | | | | | | | of headers. | | |||
+-------------+-------+-------------+-------------+---------------+ | +-------+-------------+-------------+-------------+---------------+ | |||
Table 37: TIDE with Sorted TIE Headers | Table 37: TIDE with Sorted TIE Headers | |||
10.3.31. RIFTEncodingTIEElement Registry | 10.3.31. RIFTEncodingTIEElement Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+========================+=====+=======+========+===================+ | +=====+========================+=======+========+===================+ | |||
|Name |Value|Min. |Max. |Comment | | |Value|Name |Min. |Max. |Comment | | |||
| | |Schema |Schema | | | | | |Schema |Schema | | | |||
| | |Version|Version | | | | | |Version|Version | | | |||
+========================+=====+=======+========+===================+ | +=====+========================+=======+========+===================+ | |||
|Reserved |0 |8.0 |All | | | |0 |Reserved |8.0 |All | | | |||
| | | |Versions| | | | | | |Versions| | | |||
+------------------------+-----+-------+--------+-------------------+ | +-----+------------------------+-------+--------+-------------------+ | |||
|node |1 |8.0 | |Used in case of | | |1 |node |8.0 | |Used in case of | | |||
| | | | |enum | | | | | | |enum | | |||
| | | | |common.tietypetype.| | | | | | |common.tietypetype.| | |||
| | | | |nodetietype. | | | | | | |nodetietype. | | |||
+------------------------+-----+-------+--------+-------------------+ | +-----+------------------------+-------+--------+-------------------+ | |||
|prefixes |2 |8.0 | |Used in case of | | |2 |prefixes |8.0 | |Used in case of | | |||
| | | | |enum | | | | | | |enum | | |||
| | | | |common.tietypetype.| | | | | | |common.tietypetype.| | |||
| | | | |prefixtietype. | | | | | | |prefixtietype. | | |||
+------------------------+-----+-------+--------+-------------------+ | +-----+------------------------+-------+--------+-------------------+ | |||
|positive_disaggregation_|3 |8.0 | |Positive prefixes | | |3 |positive_disaggregation_|8.0 | |Positive prefixes | | |||
|prefixes | | | |(always | | | |prefixes | | |(always | | |||
| | | | |southbound). | | | | | | |southbound). | | |||
+------------------------+-----+-------+--------+-------------------+ | +-----+------------------------+-------+--------+-------------------+ | |||
|negative_disaggregation_|5 |8.0 | |Transitive, | | |5 |negative_disaggregation_|8.0 | |Transitive, | | |||
|prefixes | | | |negative prefixes | | | |prefixes | | |negative prefixes | | |||
| | | | |(always southbound)| | | | | | |(always southbound)| | |||
+------------------------+-----+-------+--------+-------------------+ | +-----+------------------------+-------+--------+-------------------+ | |||
|external_prefixes |6 |8.0 | |Externally | | |6 |external_prefixes |8.0 | |Externally | | |||
| | | | |reimported | | | | | | |reimported | | |||
| | | | |prefixes. | | | | | | |prefixes. | | |||
+------------------------+-----+-------+--------+-------------------+ | +-----+------------------------+-------+--------+-------------------+ | |||
|positive_external_ |7 |8.0 | |Positive external | | |7 |positive_external_ |8.0 | |Positive external | | |||
|disaggregation_prefixes | | | |disaggregated | | | |disaggregation_prefixes | | |disaggregated | | |||
| | | | |prefixes | | | | | | |prefixes | | |||
| | | | |(always | | | | | | |(always | | |||
| | | | |southbound). | | | | | | |southbound). | | |||
+------------------------+-----+-------+--------+-------------------+ | +-----+------------------------+-------+--------+-------------------+ | |||
|keyvalues |9 |8.0 | |Key-value | | |9 |keyvalues |8.0 | |Key-value | | |||
| | | | |store elements. | | | | | | |store elements. | | |||
+------------------------+-----+-------+--------+-------------------+ | +-----+------------------------+-------+--------+-------------------+ | |||
Table 38: Single Element in a TIE | Table 38: Single Element in a TIE | |||
10.3.32. RIFTEncodingTIEHeader Registry | 10.3.32. RIFTEncodingTIEHeader Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+======================+=======+=========+==========+==============+ | +=======+======================+=========+==========+==============+ | |||
| Name | Value | Min. | Max. | Comment | | | Value | Name | Min. | Max. | Comment | | |||
| | | Schema | Schema | | | | | | Schema | Schema | | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+======================+=======+=========+==========+==============+ | +=======+======================+=========+==========+==============+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------------------+-------+---------+----------+--------------+ | +-------+----------------------+---------+----------+--------------+ | |||
| tieid | 2 | 8.0 | | ID of TIE. | | | 2 | tieid | 8.0 | | ID of TIE. | | |||
+----------------------+-------+---------+----------+--------------+ | +-------+----------------------+---------+----------+--------------+ | |||
| seq_nr | 3 | 8.0 | | Sequence | | | 3 | seq_nr | 8.0 | | Sequence | | |||
| | | | | number of | | | | | | | number of | | |||
| | | | | TIE. | | | | | | | TIE. | | |||
+----------------------+-------+---------+----------+--------------+ | +-------+----------------------+---------+----------+--------------+ | |||
| origination_time | 10 | 8.0 | | Absolute | | | 10 | origination_time | 8.0 | | Absolute | | |||
| | | | | timestamp | | | | | | | timestamp | | |||
| | | | | when TIE was | | | | | | | when TIE was | | |||
| | | | | generated. | | | | | | | generated. | | |||
+----------------------+-------+---------+----------+--------------+ | +-------+----------------------+---------+----------+--------------+ | |||
| origination_lifetime | 12 | 8.0 | | Original | | | 12 | origination_lifetime | 8.0 | | Original | | |||
| | | | | lifetime | | | | | | | lifetime | | |||
| | | | | when TIE was | | | | | | | when TIE was | | |||
| | | | | generated. | | | | | | | generated. | | |||
+----------------------+-------+---------+----------+--------------+ | +-------+----------------------+---------+----------+--------------+ | |||
Table 39: Header of a TIE | Table 39: Header of a TIE | |||
10.3.33. RIFTEncodingTIEHeaderWithLifeTime Registry | 10.3.33. RIFTEncodingTIEHeaderWithLifeTime Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+====================+=======+=============+==========+===========+ | +=======+====================+=============+==========+===========+ | |||
| Name | Value | Min. Schema | Max. | Comment | | | Value | Name | Min. Schema | Max. | Comment | | |||
| | | Version | Schema | | | | | | Version | Schema | | | |||
| | | | Version | | | | | | | Version | | | |||
+====================+=======+=============+==========+===========+ | +=======+====================+=============+==========+===========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+--------------------+-------+-------------+----------+-----------+ | +-------+--------------------+-------------+----------+-----------+ | |||
| header | 1 | 8.0 | | | | | 1 | header | 8.0 | | | | |||
+--------------------+-------+-------------+----------+-----------+ | +-------+--------------------+-------------+----------+-----------+ | |||
| remaining_lifetime | 2 | 8.0 | | Remaining | | | 2 | remaining_lifetime | 8.0 | | Remaining | | |||
| | | | | lifetime. | | | | | | | lifetime. | | |||
+--------------------+-------+-------------+----------+-----------+ | +-------+--------------------+-------------+----------+-----------+ | |||
Table 40: Header of a TIE as Described in TIRE/TIDE | Table 40: Header of a TIE as Described in TIRE/TIDE | |||
10.3.34. RIFTEncodingTIEID Registry | 10.3.34. RIFTEncodingTIEID Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+============+=======+=============+=============+============+ | +=======+============+=============+=============+============+ | |||
| Name | Value | Min. Schema | Max. Schema | Comment | | | Value | Name | Min. Schema | Max. Schema | Comment | | |||
| | | Version | Version | | | | | | Version | Version | | | |||
+============+=======+=============+=============+============+ | +=======+============+=============+=============+============+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+------------+-------+-------------+-------------+------------+ | +-------+------------+-------------+-------------+------------+ | |||
| direction | 1 | 8.0 | | Direction | | | 1 | direction | 8.0 | | Direction | | |||
| | | | | of TIE. | | | | | | | of TIE. | | |||
+------------+-------+-------------+-------------+------------+ | +-------+------------+-------------+-------------+------------+ | |||
| originator | 2 | 8.0 | | Indicates | | | 2 | originator | 8.0 | | Indicates | | |||
| | | | | originator | | | | | | | originator | | |||
| | | | | of TIE. | | | | | | | of TIE. | | |||
+------------+-------+-------------+-------------+------------+ | +-------+------------+-------------+-------------+------------+ | |||
| tietype | 3 | 8.0 | | Type of | | | 3 | tietype | 8.0 | | Type of | | |||
| | | | | TIE. | | | | | | | TIE. | | |||
+------------+-------+-------------+-------------+------------+ | +-------+------------+-------------+-------------+------------+ | |||
| tie_nr | 4 | 8.0 | | Number of | | | 4 | tie_nr | 8.0 | | Number of | | |||
| | | | | TIE. | | | | | | | TIE. | | |||
+------------+-------+-------------+-------------+------------+ | +-------+------------+-------------+-------------+------------+ | |||
Table 41: Unique ID of a TIE | Table 41: Unique ID of a TIE | |||
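As a rough illustration of how an implementation might consult the registry in Table 41, the sketch below maps each registered value to its name and minimum schema version. The names `TIEID_FIELDS` and `field_name` are hypothetical and introduced only for this example; they are not defined by the RIFT schema.

```python
# Values, names, and minimum schema versions copied from Table 41.
# TIEID_FIELDS and field_name are illustrative names, not part of RIFT.
TIEID_FIELDS = {
    0: ("Reserved", "8.0"),
    1: ("direction", "8.0"),
    2: ("originator", "8.0"),
    3: ("tietype", "8.0"),
    4: ("tie_nr", "8.0"),
}


def field_name(value: int) -> str:
    """Return the registered name for a value; raise KeyError if unregistered."""
    name, _min_schema = TIEID_FIELDS[value]
    return name
```

An unregistered value raises `KeyError` in this sketch, so the caller must decide how to handle values this table does not list.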
10.3.35. RIFTEncodingTIEPacket Registry | 10.3.35. RIFTEncodingTIEPacket Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| header | 1 | 8.0 | | | | | 1 | header | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| element | 2 | 8.0 | | | | | 2 | element | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
Table 42: TIE Packet | Table 42: TIE Packet | |||
10.3.36. RIFTEncodingTIREPacket Registry | 10.3.36. RIFTEncodingTIREPacket Registry | |||
This registry has the following initial values. | This registry has the following initial values. | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Name | Value | Min. Schema Version | Max. Schema | Comment | | | Value | Name | Min. Schema Version | Max. Schema | Comment | | |||
| | | | Version | | | | | | | Version | | | |||
+==========+=======+=====================+=============+=========+ | +=======+==========+=====================+=============+=========+ | |||
| Reserved | 0 | 8.0 | All | | | | 0 | Reserved | 8.0 | All | | | |||
| | | | Versions | | | | | | | Versions | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
| headers | 1 | 8.0 | | | | | 1 | headers | 8.0 | | | | |||
+----------+-------+---------------------+-------------+---------+ | +-------+----------+---------------------+-------------+---------+ | |||
Table 43: TIRE Packet | Table 43: TIRE Packet | |||
11. References | 11. References | |||
11.1. Normative References | 11.1. Normative References | |||
[EUI64] IEEE, "Guidelines for Use of Extended Unique Identifier | [EUI64] IEEE, "Guidelines for Use of Extended Unique Identifier | |||
(EUI), Organizationally Unique Identifier (OUI), and | (EUI), Organizationally Unique Identifier (OUI), and | |||
Company ID (CID)", <https://standards-support.ieee.org/hc/ | Company ID (CID)", <https://standards-support.ieee.org/hc/ | |||
skipping to change at line 8155 ¶ | skipping to change at line 8128 ¶ | |||
21 and ToF 22. | 21 and ToF 22. | |||
Spines hold only North TIEs of level 0 for their PoD, while leaves | Spines hold only North TIEs of level 0 for their PoD, while leaves | |||
only hold their own North TIEs while, at this point, both ToF 21 and | only hold their own North TIEs while, at this point, both ToF 21 and | |||
ToF 22 (as well as any northbound connected controllers) would have | ToF 22 (as well as any northbound connected controllers) would have | |||
the complete network topology. | the complete network topology. | |||
ToF 21 and ToF 22 would then originate and flood South TIEs | ToF 21 and ToF 22 would then originate and flood South TIEs | |||
containing any established adjacencies and a default IP route to all | containing any established adjacencies and a default IP route to all | |||
spines. Spine 111, Spine 112, Spine 121, and Spine 122 will reflect | spines. Spine 111, Spine 112, Spine 121, and Spine 122 will reflect | |||
all Node South TIEs received from ToF 21 to ToF 22 and all Node South | all South Node TIEs received from ToF 21 to ToF 22 and all South Node | |||
TIEs from ToF 22 to ToF 21. South TIEs will not be re-propagated | TIEs from ToF 22 to ToF 21. South TIEs will not be re-propagated | |||
southbound. | southbound. | |||
South TIEs containing a default IP route are then originated by both | South TIEs containing a default IP route are then originated by both | |||
Spine 111 and Spine 112 towards Leaf 111 and Leaf 112. Similarly, | Spine 111 and Spine 112 towards Leaf 111 and Leaf 112. Similarly, | |||
South TIEs containing a default IP route are originated by Spine 121 | South TIEs containing a default IP route are originated by Spine 121 | |||
and Spine 122 towards Leaf 121 and Leaf 122. | and Spine 122 towards Leaf 121 and Leaf 122. | |||
At this point, IP connectivity across the maximum number of viable | At this point, IP connectivity across the maximum number of viable | |||
paths has been established for all leaves, with routing information | paths has been established for all leaves, with routing information | |||
skipping to change at line 8194 ¶ | skipping to change at line 8167 ¶ | |||
+-------+ +-------+ | +-------+ +-------+ | |||
+ + | + + | |||
Prefix111 Prefix112 | Prefix111 Prefix112 | |||
Figure 36: Single Leaf Link Failure | Figure 36: Single Leaf Link Failure | |||
In the event of a link failure between Spine 112 and Leaf 112, both | In the event of a link failure between Spine 112 and Leaf 112, both | |||
nodes will originate new Node TIEs that contain their connected | nodes will originate new Node TIEs that contain their connected | |||
adjacencies, except for the one that just failed. Leaf 112 will send | adjacencies, except for the one that just failed. Leaf 112 will send | |||
a North Node TIE to Spine 111. Spine 112 will send a North Node TIE | a North Node TIE to Spine 111. Spine 112 will send a North Node TIE | |||
to ToF 21 and ToF 22 as well as a new Node South TIE to Leaf 111 that | to ToF 21 and ToF 22 as well as a new South Node TIE to Leaf 111 that | |||
will be reflected to Spine 111. Necessary SPF recomputation will | will be reflected to Spine 111. Necessary SPF recomputation will | |||
occur, resulting in Spine 112 no longer being in the forwarding path | occur, resulting in Spine 112 no longer being in the forwarding path | |||
for Prefix 112. | for Prefix 112. | |||
Spine 111 will also disaggregate Prefix 112 by sending new Prefix | Spine 111 will also disaggregate Prefix 112 by sending new South | |||
South TIE to Leaf 111 and Leaf 112. Though disaggregation is covered | Prefix TIE to Leaf 111 and Leaf 112. Though disaggregation is | |||
in more detail in the following section, it is worth mentioning in | covered in more detail in the following section, it is worth | |||
this example as it further illustrates RIFT's mechanism to mitigate | mentioning in this example as it further illustrates RIFT's mechanism | |||
traffic loss. Consider that Leaf 111 has yet to receive the more | to mitigate traffic loss. Consider that Leaf 111 has yet to receive | |||
specific (disaggregated) route from Spine 111. In such a scenario, | the more specific (disaggregated) route from Spine 111. In such a | |||
traffic from Leaf 111 towards Prefix 112 may still use Spine 112's | scenario, traffic from Leaf 111 towards Prefix 112 may still use | |||
default route, causing it to traverse ToF 21 and ToF 22 back down via | Spine 112's default route, causing it to traverse ToF 21 and ToF 22 | |||
Spine 111. While this behavior is suboptimal, it is transient in | back down via Spine 111. While this behavior is suboptimal, it is | |||
nature and preferred to dropping traffic. | transient in nature and preferred to dropping traffic. | |||
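The transient behavior described above can be sketched with a longest-prefix-match lookup on Leaf 111. This is a minimal illustration rather than RIFT's forwarding logic: the address block 10.0.112.0/24 is a hypothetical stand-in for Prefix 112, and the `lookup` helper and the table layout are assumptions made for the example.

```python
import ipaddress


def lookup(rib, destination):
    """Return the next hops of the longest prefix in rib matching destination."""
    addr = ipaddress.ip_address(destination)
    matches = [net for net in rib if addr in net]
    best = max(matches, key=lambda net: net.prefixlen)
    return rib[best]


# Hypothetical addressing: 10.0.112.0/24 stands in for Prefix 112.
default = ipaddress.ip_network("0.0.0.0/0")
prefix_112 = ipaddress.ip_network("10.0.112.0/24")

# Leaf 111 before the disaggregated route arrives: only the default route,
# learned from both spines, so traffic may still be sent to Spine 112.
rib_before = {default: {"Spine 111", "Spine 112"}}

# Leaf 111 after Spine 111's disaggregated prefix has been installed.
rib_after = {
    default: {"Spine 111", "Spine 112"},
    prefix_112: {"Spine 111"},
}
```

Looking up an address inside the stand-in prefix in `rib_before` returns both spines, while the same lookup in `rib_after` returns only Spine 111, because the more specific route wins over the default.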
B.3. Partitioned Fabric | B.3. Partitioned Fabric | |||
+--------+ +--------+ | +--------+ +--------+ | |||
Level 2 |ToF 21| |ToF 22| | Level 2 |ToF 21| |ToF 22| | |||
++-+--+-++ ++-+--+-++ | ++-+--+-++ ++-+--+-++ | |||
| | | | | | | | | | | | | | | | | | |||
| | | | | | | 0/0 | | | | | | | | 0/0 | |||
| | | | | | | | | | | | | | | | | | |||
| | | | | | | | | | | | | | | | | | |||
skipping to change at line 8273 ¶ | skipping to change at line 8246 ¶ | |||
do not benefit from this information. Spine 111 and Spine 112 are | do not benefit from this information. Spine 111 and Spine 112 are | |||
only required to reflect the new South Node TIEs received from ToF 22 | only required to reflect the new South Node TIEs received from ToF 22 | |||
to ToF 21. In short, only the relevant nodes received the relevant | to ToF 21. In short, only the relevant nodes received the relevant | |||
updates, thereby restricting the failure to only the partitioned | updates, thereby restricting the failure to only the partitioned | |||
level rather than burdening the whole fabric with the flooding and | level rather than burdening the whole fabric with the flooding and | |||
recomputation of the new topology information. | recomputation of the new topology information. | |||
To finish this example, the following list shows sets computed by ToF | To finish this example, the following list shows sets computed by ToF | |||
22 using notation introduced in Section 6.5: | 22 using notation introduced in Section 6.5: | |||
* R = Prefix 111, Prefix 112, Prefix 121, Prefix 122 | |R = Prefix 111, Prefix 112, Prefix 121, Prefix 122 | |||
* H (for r=Prefix 111) = Spine 111, Spine 112 | |H (for r=Prefix 111) = Spine 111, Spine 112 | |||
* H (for r=Prefix 112) = Spine 111, Spine 112 | |H (for r=Prefix 112) = Spine 111, Spine 112 | |||
* H (for r=Prefix 121) = Spine 121, Spine 122 | |H (for r=Prefix 121) = Spine 121, Spine 122 | |||
* H (for r=Prefix 122) = Spine 121, Spine 122 | |H (for r=Prefix 122) = Spine 121, Spine 122 | |||
* A (for ToF 21) = Spine 111, Spine 112 | |A (for ToF 21) = Spine 111, Spine 112 | |||
With that and |H (for r=Prefix 121) and |H (for r=Prefix 122) being | With that and |H (for r=Prefix 121) and |H (for r=Prefix 122) being | |||
disjoint from |A (for ToF 21), ToF 22 will originate a South TIE with | disjoint from |A (for ToF 21), ToF 22 will originate a South TIE with | |||
Prefix 121 and Prefix 122, which will be flooded to all spines. | Prefix 121 and Prefix 122, which will be flooded to all spines. | |||
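The set comparison in this example can be sketched as follows. The code reproduces only the disjointness check stated above, using the sets listed for ToF 22; Section 6.5 remains the authoritative description of the procedure. The function name `prefixes_to_disaggregate` and the dictionary layout are assumptions made for illustration.

```python
def prefixes_to_disaggregate(routes, next_hops, peer_south_neighbors):
    """Return routes whose next-hop set shares no node with the peer's set."""
    return [
        r for r in routes
        if next_hops[r].isdisjoint(peer_south_neighbors)
    ]


# Sets computed by ToF 22, copied from the list above.
R = ["Prefix 111", "Prefix 112", "Prefix 121", "Prefix 122"]
H = {
    "Prefix 111": {"Spine 111", "Spine 112"},
    "Prefix 112": {"Spine 111", "Spine 112"},
    "Prefix 121": {"Spine 121", "Spine 122"},
    "Prefix 122": {"Spine 121", "Spine 122"},
}
A_tof21 = {"Spine 111", "Spine 112"}

disaggregated = prefixes_to_disaggregate(R, H, A_tof21)
```

With these inputs, `disaggregated` contains Prefix 121 and Prefix 122, matching the South TIE that ToF 22 originates in the example.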
B.4. Northbound Partitioned Router and Optional East-West Links | B.4. Northbound Partitioned Router and Optional East-West Links | |||
+ + + | + + + | |||
X N1 | N2 | N3 | X N1 | N2 | N3 | |||
X | | | X | | | |||
skipping to change at line 8396 ¶ | skipping to change at line 8369 ¶ | |||
Contributors | Contributors | |||
This work is a product of a list of individuals who are all to be | This work is a product of a list of individuals who are all to be | |||
considered major contributors, independent of the fact whether or not | considered major contributors, independent of the fact whether or not | |||
their name made it to the limited author list. | their name made it to the limited author list. | |||
Tony Przygienda, Ed. | Tony Przygienda, Ed. | |||
Juniper | Juniper | |||
Jordan Head, Ed. | ||||
Juniper | ||||
Alankar Sharma | ||||
Hudson River Trading | ||||
Pascal Thubert | Pascal Thubert | |||
Cisco | Cisco | |||
Bruno Rijsman | Bruno Rijsman | |||
Individual | Individual | |||
Jordan Head, Ed. | ||||
Juniper | ||||
Dmitry Afanasiev | Dmitry Afanasiev | |||
Individual | Individual | |||
Don Fedyk | Don Fedyk | |||
LabN | LabN | |||
Alia Atlas | Alia Atlas | |||
Individual | Individual | |||
John Drake | John Drake | |||
End of changes. 252 change blocks. | ||||
1294 lines changed or deleted | 1270 lines changed or added | |||