From: Vincent Jardin <vjardin@free.fr>
To: dev@dpdk.org
Cc: rasland@nvidia.com, thomas@monjalon.net,
	andrew.rybchenko@oktetlabs.ru, dsosnowski@nvidia.com,
	viacheslavo@nvidia.com, bingz@nvidia.com, orika@nvidia.com,
	suanmingm@nvidia.com, matan@nvidia.com,
	Vincent Jardin <vjardin@free.fr>
Subject: [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation
Date: Tue, 10 Mar 2026 10:20:05 +0100
Message-ID: <20260310092014.2762894-2-vjardin@free.fr>
In-Reply-To: <20260310092014.2762894-1-vjardin@free.fr>

The Tx Scheduling section incorrectly stated that timestamps can only
be put on the first packet in a burst. The driver actually checks every
packet's ol_flags for the timestamp dynamic flag and inserts a dedicated
WAIT WQE per timestamped packet. The eMPW path also breaks batches when
a timestamped packet is encountered.

Additionally, the ConnectX-7+ wait-on-time capability was only briefly
mentioned in the tx_pp parameter section with no explanation of how it
differs from the ConnectX-6 Dx Clock Queue approach.

This patch:
- Removes the stale first-packet-only limitation
- Documents both scheduling mechanisms (ConnectX-6 Dx Clock Queue and
  ConnectX-7+ wait-on-time) with separate requirements tables
- Clarifies that tx_pp is specific to ConnectX-6 Dx
- Fixes tx_skew applicability to cover both hardware generations
- Updates the Send Scheduling Counters intro to reflect that timestamp
  validation counters also apply to ConnectX-7+ wait-on-time mode

Fixes: 8f848f32fc24 ("net/mlx5: introduce send scheduling devargs")

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
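Note for reviewers (not part of the commit message): the two limits
quoted in the Tx Scheduling text can be sanity-checked with the minimal
sketch below. The helper names are hypothetical and are not mlx5 PMD or
testpmd functions. The sketch assumes the documented tx_pp range of 500
to 1,000,000 ns, treats a negative tx_pp as its absolute value, and
assumes integer division in the testpmd txonly expression.

```c
#include <stdint.h>

/* Documented tx_pp granularity range, in nanoseconds. */
#define TX_PP_MIN_NS 500
#define TX_PP_MAX_NS 1000000

/*
 * Hypothetical helper: estimate the "too-distant-future" timestamp limit
 * for Clock Queue mode as tx_pp (ns) multiplied by 2^23.
 * A negative tx_pp selects the test mode; its absolute value is taken as
 * the granularity (assumption). Returns 0 when the granularity is outside
 * the documented range.
 */
static inline uint64_t
tx_pp_future_limit_ns(int32_t tx_pp)
{
	int64_t gran = tx_pp < 0 ? -(int64_t)tx_pp : (int64_t)tx_pp;

	if (gran < TX_PP_MIN_NS || gran > TX_PP_MAX_NS)
		return 0;
	return (uint64_t)gran << 23;
}

/*
 * Hypothetical helper: the testpmd txonly expression
 *   (n_tx_descriptors / burst_size + 1) * inter_burst_gap
 * evaluated with integer division (assumption). The result is in the
 * same unit as inter_burst_gap. Returns 0 for a zero burst size.
 */
static inline uint64_t
txonly_limit(uint32_t n_tx_descriptors, uint32_t burst_size,
	     uint64_t inter_burst_gap)
{
	if (burst_size == 0)
		return 0;
	return ((uint64_t)(n_tx_descriptors / burst_size) + 1) *
	       inter_burst_gap;
}
```

For example, tx_pp=500 gives a limit of 4194304000 ns, roughly 4.2
seconds, which is consistent with the "in range of seconds" wording.
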
 doc/guides/nics/mlx5.rst | 109 ++++++++++++++++++++++++++++-----------
 1 file changed, 78 insertions(+), 31 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 2529c2f4c8..5b097dbc90 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -553,27 +553,32 @@ for an additional list of options shared with other mlx5 drivers.
 
 - ``tx_pp`` parameter [int]
 
+  This parameter applies to **ConnectX-6 Dx** only.
   If a nonzero value is specified the driver creates all necessary internal
-  objects to provide accurate packet send scheduling on mbuf timestamps.
+  objects (Clock Queue and Rearm Queue) to provide accurate packet send
+  scheduling on mbuf timestamps using a cross-channel approach.
   The positive value specifies the scheduling granularity in nanoseconds,
   the packet send will be accurate up to specified digits. The allowed range is
   from 500 to 1 million of nanoseconds. The negative value specifies the module
   of granularity and engages the special test mode the check the schedule rate.
   By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
-  feature is disabled.
+  feature is disabled on ConnectX-6 Dx.
 
-  Starting with ConnectX-7 the capability to schedule traffic directly
-  on timestamp specified in descriptor is provided,
-  no extra objects are needed anymore and scheduling capability
-  is advertised and handled regardless ``tx_pp`` parameter presence.
+  Starting with **ConnectX-7** the hardware provides a native wait-on-time
+  capability that inserts the scheduling delay directly in the WQE descriptor.
+  No Clock Queue or Rearm Queue is needed and the ``tx_pp`` parameter is not
+  required. The driver automatically advertises send scheduling support when
+  the HCA wait-on-time capability is detected. The ``tx_skew`` parameter can
+  still be used on ConnectX-7 and above to compensate for wire delay.
 
 - ``tx_skew`` parameter [int]
 
   The parameter adjusts the send packet scheduling on timestamps and represents
   the average delay between beginning of the transmitting descriptor processing
   by the hardware and appearance of actual packet data on the wire. The value
-  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
-  specified. The default value is zero.
+  should be provided in nanoseconds and applies to both ConnectX-6 Dx
+  (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
+  The default value is zero.
 
 - ``tx_vec_en`` parameter [int]
 
@@ -883,9 +888,13 @@ Send Scheduling Counters
 
 The mlx5 PMD provides a comprehensive set of counters designed for
 debugging and diagnostics related to packet scheduling during transmission.
-These counters are applicable only if the port was configured with the ``tx_pp`` devarg
-and reflect the status of the PMD scheduling infrastructure
-based on Clock and Rearm Queues, used as a workaround on ConnectX-6 DX NICs.
+The first group of counters (prefixed ``tx_pp_``) reflects the status of the
+Clock Queue and Rearm Queue infrastructure used on ConnectX-6 Dx and is
+applicable only if the port was configured with the ``tx_pp`` devarg.
+The timestamp validation counters
+(``tx_pp_timestamp_past_errors``, ``tx_pp_timestamp_future_errors``,
+``tx_pp_timestamp_order_errors``) are also reported on ConnectX-7 and above
+in wait-on-time mode, without requiring ``tx_pp``.
 
 ``tx_pp_missed_interrupt_errors``
   Indicates that the Rearm Queue interrupt was not serviced on time.
@@ -1960,31 +1969,54 @@ Limitations
 Tx Scheduling
 ~~~~~~~~~~~~~
 
-When PMD sees the ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on the packet
-being sent it tries to synchronize the time of packet appearing on
-the wire with the specified packet timestamp. If the specified one
-is in the past it should be ignored, if one is in the distant future
-it should be capped with some reasonable value (in range of seconds).
-These specific cases ("too late" and "distant future") can be optionally
-reported via device xstats to assist applications to detect the
-time-related problems.
-
-The timestamp upper "too-distant-future" limit
-at the moment of invoking the Tx burst routine
-can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
+When the PMD sees ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on a packet
+being sent it inserts a dedicated WAIT WQE to synchronize the time of the
+packet appearing on the wire with the specified timestamp. Every packet
+in a burst that carries the timestamp dynamic flag is individually
+scheduled -- there is no restriction to the first packet only.
+
+If the specified timestamp is in the past, the packet is sent immediately.
+If it is in the distant future it should be capped with some reasonable
+value (in range of seconds). These specific cases ("too late" and
+"distant future") can be optionally reported via device xstats to assist
+applications to detect time-related problems.
+
+The eMPW (enhanced Multi-Packet Write) data path automatically breaks
+the batch when a timestamped packet is encountered, ensuring each
+scheduled packet gets its own WAIT WQE.
+
+Two hardware mechanisms are supported:
+
+**ConnectX-6 Dx -- Clock Queue (cross-channel)**
+   The driver creates a Clock Queue and a Rearm Queue that together
+   provide a time reference for scheduling. This mode requires the
+   :ref:`tx_pp <mlx5_tx_pp_param>` devarg. The timestamp upper
+   "too-distant-future" limit at the moment of invoking the Tx burst
+   routine can be estimated as ``tx_pp`` (in nanoseconds) multiplied
+   by 2^23.
+
+**ConnectX-7 and above -- wait-on-time**
+   The hardware supports placing the scheduling delay directly inside
+   the WQE descriptor. No Clock Queue or Rearm Queue is needed and the
+   ``tx_pp`` devarg is **not** required. The driver automatically
+   advertises send scheduling support when the HCA wait-on-time
+   capability is detected.
+
 Please note, for the testpmd txonly mode,
 the limit is deduced from the expression::
 
    (n_tx_descriptors / burst_size + 1) * inter_burst_gap
 
-There is no any packet reordering according timestamps is supposed,
-neither within packet burst, nor between packets, it is an entirely
-application responsibility to generate packets and its timestamps
-in desired order.
+There is no packet reordering according to timestamps,
+neither within a packet burst, nor between packets. It is entirely the
+application's responsibility to generate packets and their timestamps
+in the desired order.
 
 Requirements
 ^^^^^^^^^^^^
 
+ConnectX-6 Dx (Clock Queue mode):
+
 =========  =============
 Minimum    Version
 =========  =============
@@ -1996,20 +2028,35 @@ rdma-core
 DPDK       20.08
 =========  =============
 
+ConnectX-7 and above (wait-on-time mode):
+
+=========  =============
+Minimum    Version
+=========  =============
+hardware   ConnectX-7
+=========  =============
+
 Firmware configuration
 ^^^^^^^^^^^^^^^^^^^^^^
 
 Runtime configuration
 ^^^^^^^^^^^^^^^^^^^^^
 
-To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
-parameter should be specified.
+**ConnectX-6 Dx**: the :ref:`tx_pp <mlx5_tx_pp_param>` parameter must be
+specified to enable send scheduling on mbuf timestamps.
+
+**ConnectX-7+**: no devarg is required. Send scheduling is automatically
+enabled when the HCA reports the wait-on-time capability.
+
+On both hardware generations the ``tx_skew`` parameter can be used to
+compensate for the delay between descriptor processing and actual wire
+time.
 
 Limitations
 ^^^^^^^^^^^
 
-#. The timestamps can be put only in the first packet
-   in the burst providing the entire burst scheduling.
+#. On ConnectX-6 Dx (Clock Queue mode) timestamps too far in the future
+   are capped (see the ``tx_pp`` x 2^23 limit above).
 
 
 .. _mlx5_tx_inline:
-- 
2.43.0

