Internet-Draft | OAM Requirements for Enhanced DetNet | July 2024 |
Yan, et al. | Expires 6 January 2025 |
This document describes the specific requirements for Operations, Administration, and Maintenance (OAM) of Enhanced DetNet and analyzes the gaps in existing OAM methods. It also describes considerations for related OAM solutions.
This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.
Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.
Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."
This Internet-Draft will expire on 6 January 2025.
Copyright (c) 2024 IETF Trust and the persons identified as the document authors. All rights reserved.
This document is subject to BCP 78 and the IETF Trust's Legal Provisions Relating to IETF Documents (https://trustee.ietf.org/license-info) in effect on the date of publication of this document. Please review these documents carefully, as they describe your rights and restrictions with respect to this document. Code Components extracted from this document must include Revised BSD License text as described in Section 4.e of the Trust Legal Provisions and are provided without warranty as described in the Revised BSD License.
The framework of OAM for DetNet is specified in [RFC9551]. According to [RFC8655], DetNet functionality is divided into a forwarding sub-layer and a service sub-layer. [RFC9551] lists general functional requirements for DetNet OAM as well as functional requirements for each of the DetNet sub-layers of a DetNet domain. OAM for the DetNet IP and MPLS data planes is defined in [I-D.ietf-detnet-ip-oam] and [RFC9546], respectively.
[I-D.ietf-detnet-scaling-requirements] describes the requirements for an enhanced DetNet data plane, including deterministic latency guarantees. [I-D.ietf-detnet-dataplane-taxonomy] discusses classification criteria for the many variations and extensions of queuing mechanisms, such as ECQF [IEEE 802.1Qdv], Multi-CQF [I-D.dang-queuing-with-multiple-cyclic-buffers], TCQF [I-D.eckert-detnet-tcqf], CSQF [I-D.chen-detnet-sr-based-bounded-latency], TQF [I-D.peng-detnet-packet-timeslot-mechanism], C-SCORE [I-D.joung-detnet-stateless-fair-queuing], EDF [I-D.peng-detnet-deadline-based-forwarding], and gLBF [I-D.eckert-detnet-glbf]. These queuing mechanisms demand high precision to achieve deterministic latency. For example, as link speeds increase from 100 Mbps to 1 Gbps, 10 Gbps, 100 Gbps, or even higher in larger networks, either more bytes can be transmitted within the same cycle interval, or a smaller cycle interval is needed to transmit the same number of bytes per cycle as in low-speed networks.
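The relationship between link speed, cycle interval, and bytes per cycle described above can be illustrated with a short calculation. This is a minimal sketch and not part of any cited queuing mechanism: the function names and the 10-microsecond cycle are assumptions chosen only for illustration.

```python
# Illustrative arithmetic only; the 10 us cycle is an assumed value,
# and the function names are hypothetical.

def bytes_per_cycle(link_rate_bps: int, cycle_ns: int) -> int:
    """Bytes that fit into one cycle at the given link rate."""
    return link_rate_bps * cycle_ns // (8 * 10**9)


def cycle_ns_for_bytes(link_rate_bps: int, n_bytes: int) -> float:
    """Cycle length in nanoseconds needed to send n_bytes at the link rate."""
    return n_bytes * 8 * 10**9 / link_rate_bps


CYCLE_NS = 10_000  # assumed 10 us cycle
RATES_BPS = {"100Mbps": 10**8, "1Gbps": 10**9, "10Gbps": 10**10, "100Gbps": 10**11}

# Bytes carried by the slowest link in one cycle (the "same amount of bytes").
BASELINE_BYTES = bytes_per_cycle(RATES_BPS["100Mbps"], CYCLE_NS)

for name, rate in RATES_BPS.items():
    same_cycle = bytes_per_cycle(rate, CYCLE_NS)
    same_bytes = cycle_ns_for_bytes(rate, BASELINE_BYTES)
    print(f"{name}: {same_cycle} bytes per 10 us cycle, "
          f"or a {same_bytes:g} ns cycle for {BASELINE_BYTES} bytes")
```

Under these assumptions, a 100 Gbps link carries 125,000 bytes in a cycle that carries 125 bytes at 100 Mbps, or needs only a 10 ns cycle for the same 125 bytes, which is why timing precision requirements tighten as link speeds grow.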
Enhanced DetNet OAM is required to support DetNet services with high precision in large-scale networks, and OAM performance requirements must be strictly guaranteed. OAM needs to monitor and validate that promised service levels, such as latency and packet loss rate, are being delivered. Existing OAM methods, which include proactive and reactive techniques in both active and passive modes, are no longer sufficient to meet these monitoring and measurement requirements.
Based on the considerations above, this document describes the specific requirements for OAM of Enhanced DetNet and analyzes the gaps in existing OAM methods. It also describes considerations for related OAM solutions.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].
This section presents the enhanced requirements for Enhanced DetNet OAM and analyzes the technical gaps when applying OAM technologies as per [RFC9551] in large-scale networks.
As per [I-D.ietf-detnet-scaling-requirements], a deterministic network can use higher-speed links, especially in its backbone. As the data rate increases, the network scheduling cycle can be shortened if the same amount of data must be sent in each cycle for each application; alternatively, more data can be sent if the cycle time remains the same. The former requires more precise time control (e.g., cycles on the order of a few microseconds or less) for the input stream gate and the timed output buffer. As per [RFC9551], service protection (provided by the DetNet service sub-layer) is designed to mitigate simple network failures more rapidly than the expected response time of the DetNet Controller Plane. Accordingly, the time accuracy of the DetNet OAM mechanism must meet or exceed the accuracy requirements of service-level agreements (SLAs). For instance, if service delay measurements require microsecond-level precision, the OAM mechanism should support sub-microsecond precision in order to assess whether service delay and jitter meet the SLA.
The accuracy of time measurements may depend on the time synchronization protocol used in the network and on the specific point in the forwarding process at which timestamps are captured. Enhanced DetNet targets large-scale networks, which, according to [I-D.ietf-detnet-scaling-requirements], must tolerate time asynchrony; efficient one-way delay measurement therefore becomes critical. Additionally, under the time asynchrony assumption, synchronization technologies that leverage network effects, especially in metropolitan area networks with spine-leaf topologies, can provide synchronization accuracy down to microsecond granularity.
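To make the effect of time asynchrony on one-way delay measurement concrete, the following sketch shows that an offset between the sender and receiver clocks biases every measured one-way delay by that offset, while packet-to-packet delay variation is unaffected. It assumes a constant offset during the measurement; all timestamps, the offset value, and the function names are hypothetical.

```python
# Sketch under stated assumptions: constant clock offset, timestamps in ns.

def one_way_delays(tx_ns, rx_ns):
    """Measured one-way delay per packet: receive time minus send time."""
    return [rx - tx for tx, rx in zip(tx_ns, rx_ns)]


def delay_variation(delays):
    """Difference between the delays of consecutive packets."""
    return [later - earlier for earlier, later in zip(delays, delays[1:])]


true_delays = [50_000, 50_400, 49_900, 50_100]   # hypothetical path delays
tx = [0, 1_000_000, 2_000_000, 3_000_000]        # sender clock
OFFSET_NS = 7_300                                # receiver clock runs ahead

# Receive timestamps as read from the (offset) receiver clock.
rx = [t + d + OFFSET_NS for t, d in zip(tx, true_delays)]

measured = one_way_delays(tx, rx)
print("measured delays:", measured)                      # each biased by the offset
print("delay variation:", delay_variation(measured))     # offset cancels out
```

If the offset drifts during the measurement interval, the cancellation no longer holds exactly, which is one reason the timestamping point and the synchronization method both matter.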
According to [I-D.ietf-detnet-scaling-requirements], large-scale networks are growing in complexity to address the constant need for more bandwidth, lower latency and jitter, customized traffic prioritization, and SLA-grade network resilience. A more complex network infrastructure requires deeper visibility into the Enhanced DetNet. OAM for Enhanced DetNet requires examining network behavior at packet granularity. Legacy network monitoring technologies are insufficient because they do not offer real-time, granular visibility.
DetNet requires high reliability to fulfill the demands of service flows. However, in large-scale networks with a significant number of service flows, detecting SLA violations on a per-packet basis is challenging. The feasibility of per-packet monitoring with the OAM mechanism depends on device capabilities, including counter resources, CPU resources, and port speeds. For instance, if the time available for forwarding at line speed is 10 nanoseconds per packet in in-situ OAM mode, the network processor must parse the OAM data fields of each packet within that time. Any processing beyond 10 nanoseconds per packet would degrade the device's line-speed forwarding capability. In large-scale networks, the number of services exceeds thousands, whereas the capacity of mainstream equipment is currently in the range of a few thousand, which makes it challenging to implement per-packet OAM effectively.
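The per-packet time budget mentioned above follows directly from the frame size and the port speed. The sketch below is an illustration only: it assumes Ethernet framing with 20 bytes of per-frame overhead on the wire (7-byte preamble, 1-byte start frame delimiter, 12-byte inter-frame gap), and the function names and example frame sizes are hypothetical.

```python
# Per-packet processing budget at line rate, assuming Ethernet framing.

ETHERNET_OVERHEAD_BYTES = 20  # 7 preamble + 1 SFD + 12 inter-frame gap


def per_packet_budget_ns(frame_bytes: int, link_rate_bps: int) -> float:
    """Time one frame occupies the wire, i.e. the time available per packet."""
    wire_bits = (frame_bytes + ETHERNET_OVERHEAD_BYTES) * 8
    return wire_bits * 1e9 / link_rate_bps


def sustains_line_rate(processing_ns: float, frame_bytes: int,
                       link_rate_bps: int) -> bool:
    """True if per-packet OAM processing fits within the wire-time budget."""
    return processing_ns <= per_packet_budget_ns(frame_bytes, link_rate_bps)


RATE_100G = 10**11
for size in (64, 105, 1500):
    budget = per_packet_budget_ns(size, RATE_100G)
    ok = sustains_line_rate(10.0, size, RATE_100G)
    print(f"{size}-byte frames at 100 Gbps: {budget:.2f} ns budget, "
          f"10 ns of OAM processing fits: {ok}")
```

Under these assumptions, a 10 ns budget corresponds to 105-byte frames at 100 Gbps, while minimum-size 64-byte frames leave under 7 ns per packet.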
In DetNet, because the network scale is limited, the impact of detection and response time on OAM performance is not obvious. In large-scale networks, however, long-distance transmission increases latency, and the larger number of devices leads to more link failures. Real-time perception and monitoring of the network operating state therefore requires high-speed monitoring and response, so that services are not interrupted for long periods and service-level agreements are not violated.
The traditional information collection cycle is at the minute level, which cannot effectively locate millisecond-level latency variation or low-probability packet loss. Network faults can only be located passively, and locating them takes a long time.
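The following sketch illustrates why a minute-level collection cycle can miss millisecond-level latency variation. It is an assumption-laden toy model: one delay sample per millisecond, a 1000-microsecond baseline, a hypothetical 5 ms burst at 3000 microseconds, and an arbitrary 1500-microsecond threshold. The function name is hypothetical.

```python
# Toy model: averaging over a long collection window hides a short burst.

def violating_windows(samples_us, window, threshold_us):
    """Indices of windows whose mean delay exceeds the threshold."""
    hits = []
    for start in range(0, len(samples_us), window):
        chunk = samples_us[start:start + window]
        if sum(chunk) / len(chunk) > threshold_us:
            hits.append(start // window)
    return hits


# One sample per millisecond for 60 s, with a 5 ms burst starting at t = 30 s.
samples = [1000] * 60_000
samples[30_000:30_005] = [3000] * 5

THRESHOLD_US = 1500
minute_level = violating_windows(samples, 60_000, THRESHOLD_US)
ten_ms_level = violating_windows(samples, 10, THRESHOLD_US)

print("minute-level windows over threshold:", minute_level)
print("10 ms windows over threshold:", ten_ms_level)
```

In this model the minute-level average stays close to the baseline and reports nothing, while 10 ms windows isolate the burst to a single window.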
Traditional detection methods relying on switching and notification are unsuitable for Enhanced DetNet, as follows:
OAM methods can be classified into three types according to [RFC7799]:
Hybrid OAM mode. At present, the two most popular in-band OAM technologies are the Alternate-Marking (coloring) Method [RFC8321] and In-situ OAM (IOAM) [RFC9197], referred to in this document as in-band network telemetry (INT). Both are hybrid OAM methods and are regarded as solutions that naturally achieve fate-sharing between test and service packets on the forwarding plane. Based on forwarding plane detection technologies using coloring and INT, there are two main OAM solutions:
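As a brief illustration of the coloring method named in this section, the sketch below counts packets per color block at a sender and a receiver and takes the per-block difference as the loss. It is a simplified model of the idea rather than the procedure in [RFC8321]: it assumes no reordering across block boundaries and that no block is lost entirely, and the block sizes, dropped packets, and function names are hypothetical.

```python
# Simplified Alternate-Marking loss measurement (illustrative only).
from itertools import groupby


def mark(block_sizes):
    """Color bit for each sent packet; the color flips at every block boundary."""
    return [block % 2 for block, size in enumerate(block_sizes)
            for _ in range(size)]


def block_counts(colors):
    """Count consecutive packets of the same color, as a measurement point would."""
    return [len(list(run)) for _color, run in groupby(colors)]


sent = mark([4, 3, 5])                 # three blocks: colors 0, 1, 0
dropped = {2, 9}                       # hypothetical packets lost in transit
received = [c for i, c in enumerate(sent) if i not in dropped]

loss = [tx - rx for tx, rx in
        zip(block_counts(sent), block_counts(received))]
print("loss per block:", loss)
```

Because the marking travels in the service packets themselves, the counters at both measurement points observe exactly the traffic being protected, which is the fate-sharing property noted above.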
Based on the above analysis, there are two main solutions for Enhanced DetNet OAM to meet the high-reliability requirements.
Traditional protection switching networks employ the following three types of protection methods, each with different response speeds:
This solution targets Enhanced DetNet deployments in which the protection provided by OAM is best-effort and only protection switching mechanisms are deployed. In such networks, traditional forwarding sub-layer OAM protection mechanisms can be used:
Observability technologies introduce additional latency and are unsuitable for real-time OAM detection of online services.
This approach aims to guarantee forwarding sub-layer performance through service sub-layer functionality, using packet-level "multi-send and selective-receive" technology to mitigate the switching delay. According to [IEEE802.1CB] and [RFC8655], ensuring high reliability for time-sensitive services is challenging without deploying packet replication and elimination functions (PREF). When the network supports PREF configuration, it naturally meets reliability requirements, including packet loss rate and latency violation constraints, without the need to increase detection speed. Another factor in ensuring high reliability is the ability to support protection at the level of each node and link, rather than solely at the path level. The PREF mechanism requires each packet to be encapsulated with a unique flow-id and a sequence number to facilitate selective receiving.
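The "multi-send and selective-receive" behavior can be sketched as follows. This is a minimal illustration, not the recovery algorithm of [IEEE802.1CB]: it keeps an unbounded set of seen (flow-id, sequence number) pairs, whereas real implementations bound that history. The `Packet` type, path names, and drop pattern are hypothetical.

```python
# Minimal replication/elimination sketch keyed on (flow_id, seq).
from typing import Iterable, List, NamedTuple


class Packet(NamedTuple):
    flow_id: str
    seq: int
    path: str


def replicate(flow_id: str, seqs: Iterable[int], paths: List[str]):
    """Send one copy of every packet on each path (multi-send)."""
    seqs = list(seqs)
    return {path: [Packet(flow_id, s, path) for s in seqs] for path in paths}


def eliminate(arrivals: Iterable[Packet]) -> List[Packet]:
    """Deliver the first copy of each (flow_id, seq); drop later duplicates."""
    seen = set()
    delivered = []
    for pkt in arrivals:
        key = (pkt.flow_id, pkt.seq)
        if key not in seen:
            seen.add(key)
            delivered.append(pkt)
    return delivered


copies = replicate("flow-1", range(1, 6), ["A", "B"])
path_a = [p for p in copies["A"] if p.seq != 2]   # path A loses seq 2
path_b = [p for p in copies["B"] if p.seq != 4]   # path B loses seq 4

# Assumed arrival order at the elimination point: by sequence, then path.
arrivals = sorted(path_a + path_b, key=lambda p: (p.seq, p.path))
delivered = eliminate(arrivals)
print([(p.seq, p.path) for p in delivered])
```

Each path loses one packet, yet the merged stream is complete, and no switchover is needed, which is how the approach avoids switching delay.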
Even when the network supports PREF to guarantee a packet loss rate as low as 0.0001%, it remains essential to proactively monitor the performance of each individual path so that path defects are detected promptly. This proactive approach helps prevent SLA violations caused by problems such as link degradation and ensures stable transmission of service traffic under PREF protection.
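A small calculation shows why per-path monitoring still matters under PREF. The sketch assumes two disjoint paths whose losses are statistically independent, so a packet is lost end to end only if every copy is lost; this independence is an assumption, and the per-path loss ratios and function names are hypothetical.

```python
# End-to-end loss under replication, assuming independent per-path losses.
from fractions import Fraction
from math import prod

TARGET_LOSS = Fraction(1, 10**6)  # 0.0001% expressed as a ratio


def combined_loss(path_losses):
    """Probability that all copies of a packet are lost (independence assumed)."""
    return prod(path_losses, start=Fraction(1))


def meets_target(path_losses) -> bool:
    return combined_loss(path_losses) <= TARGET_LOSS


healthy = [Fraction(1, 1000), Fraction(1, 1000)]
degraded = [Fraction(1, 10), Fraction(1, 1000)]   # one path has degraded

for name, losses in (("healthy", healthy), ("degraded", degraded)):
    print(name, float(combined_loss(losses)), meets_target(losses))
```

In this example, degradation of a single path raises the combined loss from the target to 100 times the target even though most losses remain masked by the other path, so the defect would be visible only through per-path measurement.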
TBA
TBA
TBA