Internet-Draft OAM Requirements for Enhanced DetNet July 2024
Yan, et al. Expires 6 January 2025
Workgroup:
DetNet
Internet-Draft:
draft-yan-detnet-oam-requirements-enhancement-00
Published:
July 2024
Intended Status:
Informational
Expires:
6 January 2025
Authors:
J. Yan
ZTE Corporation
Z. Han
China Unicom
X. ZHU
ZTE Corporation

OAM Requirements for Enhanced DetNet OAM

Abstract

This document describes the specific Operations, Administration, and Maintenance (OAM) requirements for Enhanced DetNet and analyzes the gaps in existing OAM methods. It also discusses considerations for related OAM solutions.

Status of This Memo

This Internet-Draft is submitted in full conformance with the provisions of BCP 78 and BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF). Note that other groups may also distribute working documents as Internet-Drafts. The list of current Internet-Drafts is at https://datatracker.ietf.org/drafts/current/.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress."

This Internet-Draft will expire on 6 January 2025.

1. Introduction

The framework of OAM for DetNet is specified in [RFC9551]. According to [RFC8655], DetNet functionality is divided into a forwarding sub-layer and a service sub-layer. [RFC9551] lists general functional requirements for DetNet OAM as well as functional requirements for each of the DetNet sub-layers of a DetNet domain. OAM for the DetNet IP and MPLS data planes is defined in [I-D.ietf-detnet-ip-oam] and [RFC9546], respectively.

[I-D.ietf-detnet-scaling-requirements] describes the requirements for the enhanced DetNet data plane, including deterministic latency guarantees. [I-D.ietf-detnet-dataplane-taxonomy] discusses classification criteria for the many variations and extensions of queuing mechanisms, such as ECQF [IEEE 802.1Qdv], Multi-CQF [I-D.dang-queuing-with-multiple-cyclic-buffers], TCQF [I-D.eckert-detnet-tcqf], CSQF [I-D.chen-detnet-sr-based-bounded-latency], TQF [I-D.peng-detnet-packet-timeslot-mechanism], C-SCORE [I-D.joung-detnet-stateless-fair-queuing], EDF [I-D.peng-detnet-deadline-based-forwarding], and gLBF [I-D.eckert-detnet-glbf]. These queuing mechanisms demand high precision to achieve deterministic latency. For example, as link speeds increase from 100 Mbps to 1 Gbps, 10 Gbps, 100 Gbps, or even higher in larger networks, either more bytes can be transmitted within the same cycle interval, or a shorter cycle interval can be used to transmit the same number of bytes per cycle as in a low-speed network.
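
As a rough illustration of the relationship between link speed, cycle interval, and bytes per cycle, the following Python sketch computes how many bytes fit in one cycle and how short a cycle can become for a fixed payload. The function names, the 10-microsecond cycle, and the 125-byte payload are assumptions chosen for illustration and are not taken from any of the referenced queuing mechanisms. Framing overhead is ignored.

```python
def bytes_per_cycle(link_rate_bps: int, cycle_ns: int) -> float:
    """Bytes that can be serialized on the link during one cycle."""
    return link_rate_bps * cycle_ns / (8 * 10**9)


def cycle_ns_for_bytes(link_rate_bps: int, payload_bytes: int) -> float:
    """Cycle length in nanoseconds needed to serialize payload_bytes."""
    return payload_bytes * 8 * 10**9 / link_rate_bps


# Link speeds named in the text.
LINK_RATES_BPS = {
    "100 Mbps": 100 * 10**6,
    "1 Gbps": 10**9,
    "10 Gbps": 10 * 10**9,
    "100 Gbps": 100 * 10**9,
}

CYCLE_NS = 10_000      # assumed example cycle of 10 microseconds
PAYLOAD_BYTES = 125    # assumed example payload per cycle

for name, rate in LINK_RATES_BPS.items():
    print(name,
          "bytes in a 10 us cycle:", bytes_per_cycle(rate, CYCLE_NS),
          "| cycle for 125 bytes (ns):", cycle_ns_for_bytes(rate, PAYLOAD_BYTES))
```

Under these assumptions, the payload that needs a 10-microsecond cycle at 100 Mbps needs only a 10-nanosecond cycle at 100 Gbps, which is why the timing precision demanded of the data plane, and of the OAM that verifies it, grows with link speed.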

Enhanced DetNet OAM is required to support DetNet services with high precision in large-scale networks, and its own performance requirements must be strictly guaranteed. It needs to monitor and validate that the promised service levels, such as latency and packet loss rate, are being delivered. Existing OAM methods, which include proactive and reactive techniques running in both active and passive modes, are no longer sufficient to meet these monitoring and measurement requirements.

Based on the considerations above, this document describes the specific OAM requirements for Enhanced DetNet and analyzes the gaps in existing OAM methods. It also discusses considerations for related OAM solutions.

1.1. Requirements Language

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT", "SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this document are to be interpreted as described in [RFC2119].

2. Terminology

This document uses the terminology defined in [RFC8655].

3. Requirements and Gap Analysis

This section presents the requirements for Enhanced DetNet OAM and analyzes the technical gaps that arise when the OAM technologies described in [RFC9551] are applied in large-scale networks.

3.1. Support Microsecond-level Measurement Precision

As per [I-D.ietf-detnet-scaling-requirements], a deterministic network can use higher-speed links, especially in its backbone. As the data rate increases, the network scheduling cycle can be shortened if the same amount of data is to be sent in each cycle for each application; alternatively, more data can be sent per cycle if the cycle time remains the same. The former requires more precise time control (e.g., cycles on the order of a few microseconds or less) for the input stream gate and the timed output buffer. As per [RFC9551], service protection (provided by the DetNet service sub-layer) is designed to mitigate simple network failures more rapidly than the expected response time of the DetNet Controller Plane. Consequently, the time accuracy of the DetNet OAM mechanism must meet or exceed the accuracy required by service-level agreements (SLAs). For instance, if service delay measurements require microsecond-level precision, the OAM mechanism should support sub-microsecond precision in order to assess whether service delay and jitter meet the SLA.

The accuracy of time measurements depends on the time synchronization protocol used in the network and on the point in the forwarding process at which timestamps are captured. According to [I-D.ietf-detnet-scaling-requirements], Enhanced DetNet targets large-scale networks and must tolerate time asynchrony, so efficient one-way delay measurement becomes critical. Additionally, under the time asynchrony assumption, synchronization technologies leveraging network effects, especially in metropolitan area networks with spine-leaf topologies, provide solutions with synchronization accuracy down to microsecond granularity.
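
The effect of time asynchrony on one-way delay measurement can be illustrated with a minimal Python sketch. It assumes, for illustration only, that the receiver clock runs ahead of the sender clock by a constant offset over the measurement window (no drift); all names and numeric values are hypothetical. Under that assumption the offset appears in full in every absolute delay sample but cancels out of the delay variation between samples.

```python
def one_way_delay_ns(tx_timestamp_ns: int, rx_timestamp_ns: int) -> int:
    """One-way delay as computed by OAM from two timestamps."""
    return rx_timestamp_ns - tx_timestamp_ns


def delay_variation_ns(delays_ns):
    """Delay of each sample relative to the first sample."""
    return [d - delays_ns[0] for d in delays_ns]


RX_CLOCK_OFFSET_NS = 3_000               # hypothetical: rx clock 3 us ahead
tx_times = [0, 1_000_000, 2_000_000]     # tx timestamps on the sender clock
true_delays = [50_000, 50_400, 49_900]   # hypothetical true path delays (ns)

# Receiver timestamps are taken on the receiver's (offset) clock.
rx_times = [tx + d + RX_CLOCK_OFFSET_NS for tx, d in zip(tx_times, true_delays)]
measured = [one_way_delay_ns(tx, rx) for tx, rx in zip(tx_times, rx_times)]

print(measured)                       # absolute delays, each biased by the offset
print(delay_variation_ns(measured))   # variation, unaffected by a constant offset
```

In this sketch a 3-microsecond clock offset makes every absolute delay sample wrong by 3 microseconds, which would already exceed a sub-microsecond precision target, while the measured delay variation still matches the true variation.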

3.2. Support Per-packet Performance Monitoring

According to [I-D.ietf-detnet-scaling-requirements], large-scale networks are growing in complexity to address the constant need for more bandwidth, lower latency and jitter, customized traffic prioritization, and SLA-grade network resilience. A more complex network infrastructure requires deeper visibility into the Enhanced DetNet, and its OAM needs to examine network behavior at the granularity of individual packets. Legacy network monitoring technologies are insufficient because they do not offer real-time, fine-grained visibility.

DetNet requires high reliability to fulfill the demands of service flows. However, in large-scale networks with a significant number of service flows, detecting SLA violations on a per-packet basis is challenging. The feasibility of per-packet monitoring depends on device capabilities, including counter resources, CPU resources, and port speeds. For instance, if forwarding at line rate leaves 10 nanoseconds of processing time per packet in in-situ OAM mode, the network processor must parse the OAM data fields of each packet within that time. Any processing beyond 10 nanoseconds per packet would degrade the device's line-rate forwarding capability. In large-scale networks the number of services can exceed several thousand, whereas the capacity of mainstream equipment is currently in the range of a few thousand, which makes per-packet OAM difficult to implement effectively.
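
To make the per-packet time budget concrete, the following Python sketch estimates how much time a forwarding engine has per packet at line rate. It assumes Ethernet links, where each frame occupies an additional 20 bytes on the wire (preamble, start-of-frame delimiter, and inter-frame gap). The function names and the chosen frame sizes and link rates are illustrative assumptions, not figures from this document.

```python
ETHERNET_WIRE_OVERHEAD_BYTES = 20  # preamble (7) + SFD (1) + inter-frame gap (12)


def per_packet_budget_ns(link_rate_bps: int, frame_bytes: int) -> float:
    """Time between back-to-back frames of frame_bytes at line rate."""
    wire_bits = (frame_bytes + ETHERNET_WIRE_OVERHEAD_BYTES) * 8
    return wire_bits * 10**9 / link_rate_bps


def packets_per_second(link_rate_bps: int, frame_bytes: int) -> float:
    """Maximum frame rate at line rate for frames of frame_bytes."""
    wire_bits = (frame_bytes + ETHERNET_WIRE_OVERHEAD_BYTES) * 8
    return link_rate_bps / wire_bits


for rate_gbps in (10, 100):
    rate_bps = rate_gbps * 10**9
    for frame in (64, 512, 1518):
        print(f"{rate_gbps} Gbps, {frame}-byte frames: "
              f"{per_packet_budget_ns(rate_bps, frame):.2f} ns per packet")
```

Under these assumptions, minimum-size frames at 100 Gbps leave roughly 6.7 ns per packet, which is the same order of magnitude as the 10 ns figure used above. Any OAM parsing has to fit inside that window on every packet.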

3.3. Support High-speed Detection and Response

In a conventional DetNet, the limited network scale means that detection and response time has little visible impact on OAM performance. In large-scale networks, however, long-distance transmission increases latency, and the larger number of devices leads to more link failures. High-speed detection and response is therefore necessary, both to perceive and monitor the network operating state in real time and to ensure that services are not interrupted for long periods and do not violate the service level agreement.

The traditional information collection cycle is at the minute level, which is too coarse to locate millisecond-level latency variation or low-probability packet loss. Network faults can only be located passively, and the fault location cycle is long.

Traditional detection methods that rely on switching and notification are unsuitable for Enhanced DetNet for the following reasons:

  1. Active OAM protocols, such as Bidirectional Forwarding Detection (BFD) [RFC5880], typically operate at the millisecond level at best.
  2. Current in-band detection methods include the Alternate-Marking Method [RFC8321], which operates with a detection duration at the ten-second level, and IOAM Direct Export (DEX) [RFC9326], which achieves near real-time monitoring but requires packets to be encapsulated with a sequence number field. Out-of-band reporting methods, such as extensions of [RFC6374], generally offer millisecond-level delay.
  3. Telemetry provides monitoring at the minute or hour level at best. One drawback of telemetry is that packets are uploaded and processed uniformly. Moreover, the fixed round-trip time across node-to-controller paths, node CPU performance, uplink bandwidth, and controller processing capability are all bottlenecks.
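
For the BFD case, the achievable detection time follows from the timer negotiation in [RFC5880]: in asynchronous mode, the local detection time is the Detect Mult received from the remote system multiplied by the agreed transmit interval of the remote system. The Python sketch below applies that rule. The function name and the interval values are illustrative assumptions.

```python
def bfd_detection_time_us(remote_detect_mult: int,
                          remote_desired_min_tx_us: int,
                          local_required_min_rx_us: int) -> int:
    """Asynchronous-mode detection time at the local system, in microseconds.

    The agreed transmit interval of the remote system is the greater of the
    local Required Min RX Interval and the remote Desired Min TX Interval.
    """
    agreed_tx_interval_us = max(remote_desired_min_tx_us,
                                local_required_min_rx_us)
    return remote_detect_mult * agreed_tx_interval_us


# Hypothetical aggressive timers: 3.3 ms intervals, multiplier 3.
print(bfd_detection_time_us(3, 3_300, 3_300))     # 9900 us, about 10 ms
# Hypothetical conservative receiver: 50 ms Required Min RX.
print(bfd_detection_time_us(3, 10_000, 50_000))   # 150000 us, 150 ms
```

Even with aggressive timers, the detection time in this sketch stays in the millisecond range, which is consistent with item 1 above.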

4. Consideration for Enhanced DetNet OAM Solutions

OAM methods can be classified into three types according to [RFC7799]:

  1. Active OAM mode (e.g., TWAMP [RFC5357]). [RFC9546] defines DetNet OAM for the MPLS data plane, and [I-D.ietf-detnet-ip-oam] discusses DetNet OAM for the IP data plane. These two documents focus on the active OAM mode and indicate that fate sharing between test and service packets is achievable in theory but difficult to implement in practice. Although path sharing is comparatively simple, it is challenging for network operators to implement fate sharing between test packets and data traffic without introducing congestion.
  2. Passive OAM mode. Strictly speaking, passive methods cannot modify packet encapsulation, which makes it difficult to calculate packet loss when packets do not carry sequence numbers.
  3. Hybrid OAM mode. At present, the two most popular in-band OAM technologies are the Alternate-Marking (coloring) Method [RFC8321] and In-situ Network Telemetry (INT), whose data fields are specified as In Situ OAM (IOAM) in [RFC9197]. Both are hybrid OAM methods and are regarded as solutions that naturally achieve fate sharing between test and service packets on the forwarding plane. Based on forwarding plane detection technologies utilizing coloring and INT, there are two main OAM solutions:

    • The head-end or tail-end node performs local OAM data computations, such as packet loss rate and delay. Optionally, out-of-band OAM technologies (such as [RFC6374], STAMP [RFC8762], and their extensions) can be used to exchange information between the head-end and tail-end nodes.
    • Telemetry methods (such as iFIT [I-D.song-opsawg-ifit-framework]). The head-end and tail-end nodes transmit their local data to a controller or cloud platform via southbound interfaces. These third-party nodes perform the OAM-related computations and notify the head-end and tail-end nodes, also via southbound interfaces, so that they can respond.
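
The local computation at the head-end and tail-end nodes can be sketched for the coloring case as follows. This is a simplified simulation under stated assumptions, not the full procedure of [RFC8321]: packets are colored by alternating a single bit every marking period, no reordering occurs across a color change, and no block is lost entirely, so each run of same-colored packets corresponds to one block. All names and values are hypothetical.

```python
from itertools import groupby


def color_for(send_time_ms: int, period_ms: int) -> int:
    """Marking bit for a packet sent at send_time_ms (alternates per period)."""
    return (send_time_ms // period_ms) % 2


def block_counts(colors):
    """Count packets in each run of identical colors (one run per block)."""
    return [sum(1 for _ in run) for _color, run in groupby(colors)]


def loss_per_block(tx_counts, rx_counts):
    """Packets lost in each block: head-end count minus tail-end count."""
    return [tx - rx for tx, rx in zip(tx_counts, rx_counts)]


# Head-end: one packet per millisecond for 30 ms, 10 ms marking period.
sent = [(t, color_for(t, 10)) for t in range(30)]
# Path: the packets sent at t=3 and t=15 are dropped (hypothetical losses).
received = [(t, c) for t, c in sent if t not in (3, 15)]

tx_counts = block_counts(c for _t, c in sent)
rx_counts = block_counts(c for _t, c in received)
print(loss_per_block(tx_counts, rx_counts))
```

Because the counters of a block can only be compared after that block has closed, the detection delay of this approach cannot be shorter than the marking period.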

Based on the above analysis, there are two main solutions for Enhanced DetNet OAM to meet the high-reliability requirements.

4.1. Based on Traditional OAM Protection Mechanism

Traditional protection switching networks employ the following three types of protection methods, each with different response speeds:

  1. End-to-end 1+1 protection: There is a working link serving as the primary path and a protection link as the backup path, both established in advance. At the head-end node, service traffic is duplicated and transmitted over both the working and protection links. If the working link fails to transmit service flows, the tail-end node switches to receiving traffic from the protection link, which is known as single-end switching. This method switches relatively quickly, but bandwidth utilization is low because both links are allocated to one service.
  2. End-to-end 1:1 protection: Service traffic is transmitted only on the working link, leaving the protection link idle or used for low-priority services. If the transmission fails on the working link, the tail-end node notifies the head-end node to switch protected service flows from the working link to the protection link. This switching requires actions at both ends, resulting in slower switching but achieving higher bandwidth utilization.
  3. Fast Reroute (FRR): Enhanced DetNet requires explicit paths, making rerouting based on distributed protocols unsuitable.

This solution targets Enhanced DetNet deployments in which the protection provided by OAM is best-effort and only protection switching mechanisms are deployed. In such networks, traditional forwarding sub-layer OAM protection mechanisms can be used:

  1. The INT mechanism, based on sequence numbers and data collection in Direct Export mode, offers near real-time detection with limited packet reordering, while the response time depends on the protection method. This approach achieves the best detection speed but requires encapsulating and decapsulating in-situ OAM headers as needed and supporting the Direct Export Option-Type.
  2. The in-situ flow detection mechanism with traffic coloring offers a detection granularity ranging from 1 second to 300 seconds (typically about 10 seconds), so the performance monitoring period is at least on the order of seconds. Continuity Check and Connectivity Verification (CC-CV) performance is typically at the millisecond level.
  3. Active OAM mechanisms based on fate sharing compromise the requirements in Sections 3.1 and 3.2. The CC-CV method relies on protocols such as BFD.

Observability technologies introduce additional latency and are unsuitable for real-time OAM detection of online services.

4.2. Supporting PREF Protection Function in DetNet

This approach aims to guarantee forwarding sub-layer performance through service sub-layer functionality, using packet-level "multi-send and selective-receive" technology to mitigate the switching delay. According to [IEEE802.1CB] and [RFC8655], ensuring high reliability for time-sensitive services is challenging without deploying PREF. When the network supports PREF configuration, it naturally meets reliability requirements, including constraints on packet loss rate and latency violations, which eliminates the need to increase detection speed. Another factor in ensuring high reliability is the ability to support protection at the level of each node and link, rather than solely at the path level. The PREF mechanism requires each packet to be encapsulated with a unique flow-id and a sequence number to facilitate selective receiving.
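
The selective-receive side of this mechanism can be sketched in Python as a per-flow duplicate filter keyed on the flow-id and sequence number. This is a minimal illustration under stated assumptions rather than the elimination algorithm of [IEEE802.1CB] or [RFC8655]: the class name, the history window of 64 sequence numbers, and the absence of sequence-number wraparound handling are all simplifications introduced here.

```python
from collections import defaultdict


class DuplicateEliminator:
    """Forward the first copy of each (flow_id, seq) and drop later copies."""

    def __init__(self, window: int = 64):
        self.window = window              # assumed history depth
        self.seen = defaultdict(set)      # flow_id -> recent sequence numbers

    def accept(self, flow_id: str, seq: int) -> bool:
        seen = self.seen[flow_id]
        if seq in seen:
            return False                  # duplicate copy from another path
        if seen and seq <= max(seen) - self.window:
            return False                  # older than the history; drop it
        seen.add(seq)
        oldest_kept = max(seen) - self.window
        self.seen[flow_id] = {s for s in seen if s > oldest_kept}
        return True


# Two member paths carry copies of seq 0..4; each path loses one packet.
path_a = [0, 1, 3, 4]    # seq 2 lost on path A
path_b = [0, 1, 2, 3]    # seq 4 lost on path B
arrivals = [s for pair in zip(path_a, path_b) for s in pair]  # interleaved

eliminator = DuplicateEliminator()
delivered = [s for s in arrivals if eliminator.accept("flow-1", s)]
print(delivered)
```

In this sketch every sequence number is delivered exactly once even though each path lost a packet, without any switching action. Delivery can be out of order, however, so restoring the original order would need a separate ordering step.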

Even when the network supports PREF and guarantees a packet loss rate as low as 0.0001%, it remains essential to proactively monitor the performance of each individual path so that path defects are detected promptly. This proactive approach helps prevent SLA violations caused by problems such as link degradation and ensures stable transmission of service traffic during PREF protection.
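
The need for per-path monitoring can be illustrated with a simple calculation. Assuming, purely for illustration, that losses on the member paths are independent, a packet is lost end to end only if every copy is lost, so the residual loss rate is the product of the per-path loss rates. The per-path values below are hypothetical.

```python
import math


def residual_loss_rate(path_loss_rates):
    """Loss rate after replication and elimination, assuming independent paths."""
    return math.prod(path_loss_rates)


healthy = residual_loss_rate([1e-3, 1e-3])   # two paths at 0.1% loss each
degraded = residual_loss_rate([1e-3, 1.0])   # one path has failed unnoticed

print(f"healthy:  {healthy:.4%}")
print(f"degraded: {degraded:.4%}")
```

With both paths healthy, the residual loss in this sketch is 0.0001%. If one path fails without being detected, the flow falls back to the 0.1% loss rate of the remaining path, even though the service itself shows no interruption.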

5. Security Considerations

TBA

6. IANA Considerations

TBA

7. Acknowledgements

TBA

8. References

8.1. Normative References

[I-D.chen-detnet-sr-based-bounded-latency]
Chen, M., Geng, X., Li, Z., Joung, J., and J. Ryoo, "Segment Routing (SR) Based Bounded Latency", <https://datatracker.ietf.org/doc/html/draft-chen-detnet-sr-based-bounded-latency-03>.
[I-D.dang-queuing-with-multiple-cyclic-buffers]
Liu, B. and J. Dang, "A Queuing Mechanism with Multiple Cyclic Buffers", <https://datatracker.ietf.org/doc/html/draft-dang-queuing-with-multiple-cyclic-buffers-00>.
[I-D.eckert-detnet-glbf]
Eckert, T. T., Clemm, A., Bryant, S., and S. Hommes, "Deterministic Networking (DetNet) Data Plane - guaranteed Latency Based Forwarding (gLBF) for bounded latency with low jitter and asynchronous forwarding in Deterministic Networks", <https://datatracker.ietf.org/doc/html/draft-eckert-detnet-glbf-02>.
[I-D.eckert-detnet-tcqf]
Eckert, T. T., Li, Y., Bryant, S., Malis, A. G., Ryoo, J., Liu, P., Li, G., Ren, S., and F. Yang, "Deterministic Networking (DetNet) Data Plane - Tagged Cyclic Queuing and Forwarding (TCQF) for bounded latency with low jitter in large scale DetNets", <https://datatracker.ietf.org/doc/html/draft-eckert-detnet-tcqf-05>.
[I-D.ietf-detnet-dataplane-taxonomy]
Joung, J., Geng, X., Peng, S., and T. T. Eckert, "Dataplane Enhancement Taxonomy", <https://datatracker.ietf.org/doc/html/draft-ietf-detnet-dataplane-taxonomy-00>.
[I-D.ietf-detnet-ip-oam]
Mirsky, G., Chen, M., and D. L. Black, "Operations, Administration, and Maintenance (OAM) for Deterministic Networks (DetNet) with IP Data Plane", <https://datatracker.ietf.org/doc/html/draft-ietf-detnet-ip-oam-13>.
[I-D.ietf-detnet-scaling-requirements]
Liu, P., Li, Y., Eckert, T. T., Xiong, Q., Ryoo, J., zhushiyin, and X. Geng, "Requirements for Scaling Deterministic Networks", <https://datatracker.ietf.org/doc/html/draft-ietf-detnet-scaling-requirements-06>.
[I-D.joung-detnet-stateless-fair-queuing]
Joung, J., Ryoo, J., Cheung, T., Li, Y., and P. Liu, "Latency Guarantee with Stateless Fair Queuing", <https://datatracker.ietf.org/doc/html/draft-joung-detnet-stateless-fair-queuing-02>.
[I-D.peng-detnet-deadline-based-forwarding]
Peng, S., Du, Z., Basu, K., cheng, Yang, D., and C. Liu, "Deadline Based Deterministic Forwarding", <https://datatracker.ietf.org/doc/html/draft-peng-detnet-deadline-based-forwarding-10>.
[I-D.peng-detnet-packet-timeslot-mechanism]
Peng, S., Liu, P., Basu, K., Liu, A., Yang, D., and G. Peng, "Timeslot Queueing and Forwarding Mechanism", <https://datatracker.ietf.org/doc/html/draft-peng-detnet-packet-timeslot-mechanism-07>.
[I-D.song-opsawg-ifit-framework]
Song, H., Qin, F., Chen, H., Jin, J., and J. Shin, "Framework for In-situ Flow Information Telemetry", <https://datatracker.ietf.org/doc/html/draft-song-opsawg-ifit-framework-21>.
[I-D.xiong-detnet-enhanced-detnet-gap-analysis]
Xiong, Q. and A. Liu, "Gap Analysis for Enhanced DetNet", <https://datatracker.ietf.org/doc/html/draft-xiong-detnet-enhanced-detnet-gap-analysis-03>.
[I-D.xiong-detnet-large-scale-enhancements]
Xiong, Q., Du, Z., Zhao, J., and D. Yang, "Enhanced DetNet Data Plane Framework for Scaling Deterministic Networks", <https://datatracker.ietf.org/doc/html/draft-xiong-detnet-large-scale-enhancements-04>.
[IEEE802.1CB]
"IEEE Standard for Local and metropolitan area networks--Frame Replication and Elimination for Reliability", <https://ieeexplore.ieee.org/document/8091139>.
[RFC2119]
Bradner, S., "Key words for use in RFCs to Indicate Requirement Levels", BCP 14, RFC 2119, DOI 10.17487/RFC2119, <https://www.rfc-editor.org/info/rfc2119>.
[RFC5357]
Hedayat, K., Krzanowski, R., Morton, A., Yum, K., and J. Babiarz, "A Two-Way Active Measurement Protocol (TWAMP)", RFC 5357, DOI 10.17487/RFC5357, <https://www.rfc-editor.org/info/rfc5357>.
[RFC5880]
Katz, D. and D. Ward, "Bidirectional Forwarding Detection (BFD)", RFC 5880, DOI 10.17487/RFC5880, <https://www.rfc-editor.org/info/rfc5880>.
[RFC6374]
Frost, D. and S. Bryant, "Packet Loss and Delay Measurement for MPLS Networks", RFC 6374, DOI 10.17487/RFC6374, <https://www.rfc-editor.org/info/rfc6374>.
[RFC7799]
Morton, A., "Active and Passive Metrics and Methods (with Hybrid Types In-Between)", RFC 7799, DOI 10.17487/RFC7799, <https://www.rfc-editor.org/info/rfc7799>.
[RFC8321]
Fioccola, G., Ed., Capello, A., Cociglio, M., Castaldelli, L., Chen, M., Zheng, L., Mirsky, G., and T. Mizrahi, "Alternate-Marking Method for Passive and Hybrid Performance Monitoring", RFC 8321, DOI 10.17487/RFC8321, <https://www.rfc-editor.org/info/rfc8321>.
[RFC8655]
Finn, N., Thubert, P., Varga, B., and J. Farkas, "Deterministic Networking Architecture", RFC 8655, DOI 10.17487/RFC8655, <https://www.rfc-editor.org/info/rfc8655>.
[RFC8762]
Mirsky, G., Jun, G., Nydell, H., and R. Foote, "Simple Two-Way Active Measurement Protocol", RFC 8762, DOI 10.17487/RFC8762, <https://www.rfc-editor.org/info/rfc8762>.
[RFC8938]
Varga, B., Ed., Farkas, J., Berger, L., Malis, A., and S. Bryant, "Deterministic Networking (DetNet) Data Plane Framework", RFC 8938, DOI 10.17487/RFC8938, <https://www.rfc-editor.org/info/rfc8938>.
[RFC9197]
Brockners, F., Ed., Bhandari, S., Ed., and T. Mizrahi, Ed., "Data Fields for In Situ Operations, Administration, and Maintenance (IOAM)", RFC 9197, DOI 10.17487/RFC9197, <https://www.rfc-editor.org/info/rfc9197>.
[RFC9326]
Song, H., Gafni, B., Brockners, F., Bhandari, S., and T. Mizrahi, "In Situ Operations, Administration, and Maintenance (IOAM) Direct Exporting", RFC 9326, DOI 10.17487/RFC9326, <https://www.rfc-editor.org/info/rfc9326>.
[RFC9546]
Mirsky, G., Chen, M., and B. Varga, "Operations, Administration, and Maintenance (OAM) for Deterministic Networking (DetNet) with the MPLS Data Plane", RFC 9546, DOI 10.17487/RFC9546, <https://www.rfc-editor.org/info/rfc9546>.
[RFC9551]
Mirsky, G., Theoleyre, F., Papadopoulos, G., Bernardos, CJ., Varga, B., and J. Farkas, "Framework of Operations, Administration, and Maintenance (OAM) for Deterministic Networking (DetNet)", RFC 9551, DOI 10.17487/RFC9551, <https://www.rfc-editor.org/info/rfc9551>.

Authors' Addresses

Jinjie Yan
ZTE Corporation
China
Zhengxin Han
China Unicom
China
Xiangyang Zhu
ZTE Corporation
China