public inbox for dev@dpdk.org
 help / color / mirror / Atom feed
* [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing
@ 2026-03-10  9:20 Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
                   ` (12 more replies)
  0 siblings, 13 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

This series adds per-queue Tx rate limiting to the mlx5 PMD using
the HW packet pacing (PP) rate table.

The ConnectX-6 Dx and later NICs expose a per-SQ
packet_pacing_rate_limit_index that can be changed on a live SQ
via modify_bitmask without queue teardown. The kernel mlx5 driver
refcounts PP contexts internally, so queues configured at the same
rate share a single HW rate table entry.

The series is structured as follows:

  1. Doc fix for stale packet pacing documentation
  2-3. common/mlx5: query PP capabilities and extend SQ modify
  4-6. net/mlx5: per-queue PP infrastructure, rate_limit callback,
       burst pacing devargs (tx_burst_bound, tx_typical_pkt_sz)
  7. net/mlx5: testpmd command to query per-queue rate state
  8. ethdev: add rte_eth_get_queue_rate_limit() symmetric getter
  9. net/mlx5: share PP rate table entries across queues
  10. net/mlx5: rate table capacity query API

Usage with testpmd:
  set port 0 queue 0 rate 1000
  set port 0 queue 1 rate 5000
  set port 0 queue 0 rate 0      # disable
  mlx5 port 0 txq 0 rate show    # query

Tested on ConnectX-6 Dx only.

Vincent Jardin (11):
  doc/nics/mlx5: fix stale packet pacing documentation
  common/mlx5: query packet pacing rate table capabilities
  common/mlx5: extend SQ modify to support rate limit update
  net/mlx5: add per-queue packet pacing infrastructure
  net/mlx5: support per-queue rate limiting
  net/mlx5: add burst pacing devargs
  net/mlx5: add testpmd command to query per-queue rate limit
  ethdev: add getter for per-queue Tx rate limit
  mailmap: update Vincent Jardin email address
  net/mlx5: share pacing rate table entries across queues
  net/mlx5: add rate table capacity query API

 .mailmap                             |   3 +-
 doc/guides/nics/mlx5.rst             | 125 +++++++++++++++++------
 drivers/common/mlx5/mlx5_devx_cmds.c |  20 ++++
 drivers/common/mlx5/mlx5_devx_cmds.h |  14 ++-
 drivers/net/mlx5/mlx5.c              |  46 +++++++++
 drivers/net/mlx5/mlx5.h              |  13 +++
 drivers/net/mlx5/mlx5_testpmd.c      |  93 +++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c           |  89 +++++++++++++++++
 drivers/net/mlx5/mlx5_tx.h           |   5 +
 drivers/net/mlx5/mlx5_txpp.c         |  75 ++++++++++++++
 drivers/net/mlx5/mlx5_txq.c          | 144 +++++++++++++++++++++++++++
 drivers/net/mlx5/rte_pmd_mlx5.h      |  57 +++++++++++
 lib/ethdev/ethdev_driver.h           |   7 ++
 lib/ethdev/rte_ethdev.c              |  28 ++++++
 lib/ethdev/rte_ethdev.h              |  24 +++++
 15 files changed, 710 insertions(+), 33 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
@ 2026-03-10  9:20 ` Vincent Jardin
  2026-03-11 12:26   ` Slava Ovsiienko
  2026-03-10  9:20 ` [PATCH v1 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

The Tx Scheduling section incorrectly stated that timestamps can only
be put on the first packet in a burst. The driver actually checks every
packet's ol_flags for the timestamp dynamic flag and inserts a dedicated
WAIT WQE per timestamped packet. The eMPW path also breaks batches when
a timestamped packet is encountered.

Additionally, the ConnectX-7+ wait-on-time capability was only briefly
mentioned in the tx_pp parameter section with no explanation of how it
differs from the ConnectX-6 Dx Clock Queue approach.

This patch:
- Removes the stale first-packet-only limitation
- Documents both scheduling mechanisms (ConnectX-6 Dx Clock Queue and
  ConnectX-7+ wait-on-time) with separate requirements tables
- Clarifies that tx_pp is specific to ConnectX-6 Dx
- Fixes tx_skew applicability to cover both hardware generations
- Updates the Send Scheduling Counters intro to reflect that timestamp
  validation counters also apply to ConnectX-7+ wait-on-time mode

Fixes: 8f848f32fc24 ("net/mlx5: introduce send scheduling devargs")

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/mlx5.rst | 109 ++++++++++++++++++++++++++++-----------
 1 file changed, 78 insertions(+), 31 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 2529c2f4c8..5b097dbc90 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -553,27 +553,32 @@ for an additional list of options shared with other mlx5 drivers.
 
 - ``tx_pp`` parameter [int]
 
+  This parameter applies to **ConnectX-6 Dx** only.
   If a nonzero value is specified the driver creates all necessary internal
-  objects to provide accurate packet send scheduling on mbuf timestamps.
+  objects (Clock Queue and Rearm Queue) to provide accurate packet send
+  scheduling on mbuf timestamps using a cross-channel approach.
   The positive value specifies the scheduling granularity in nanoseconds,
   the packet send will be accurate up to specified digits. The allowed range is
   from 500 to 1 million of nanoseconds. The negative value specifies the module
   of granularity and engages the special test mode the check the schedule rate.
   By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
-  feature is disabled.
+  feature is disabled on ConnectX-6 Dx.
 
-  Starting with ConnectX-7 the capability to schedule traffic directly
-  on timestamp specified in descriptor is provided,
-  no extra objects are needed anymore and scheduling capability
-  is advertised and handled regardless ``tx_pp`` parameter presence.
+  Starting with **ConnectX-7** the hardware provides a native wait-on-time
+  capability that inserts the scheduling delay directly in the WQE descriptor.
+  No Clock Queue or Rearm Queue is needed and the ``tx_pp`` parameter is not
+  required. The driver automatically advertises send scheduling support when
+  the HCA wait-on-time capability is detected. The ``tx_skew`` parameter can
+  still be used on ConnectX-7 and above to compensate for wire delay.
 
 - ``tx_skew`` parameter [int]
 
   The parameter adjusts the send packet scheduling on timestamps and represents
   the average delay between beginning of the transmitting descriptor processing
   by the hardware and appearance of actual packet data on the wire. The value
-  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
-  specified. The default value is zero.
+  should be provided in nanoseconds and applies to both ConnectX-6 Dx
+  (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
+  The default value is zero.
 
 - ``tx_vec_en`` parameter [int]
 
@@ -883,9 +888,13 @@ Send Scheduling Counters
 
 The mlx5 PMD provides a comprehensive set of counters designed for
 debugging and diagnostics related to packet scheduling during transmission.
-These counters are applicable only if the port was configured with the ``tx_pp`` devarg
-and reflect the status of the PMD scheduling infrastructure
-based on Clock and Rearm Queues, used as a workaround on ConnectX-6 DX NICs.
+The first group of counters (prefixed ``tx_pp_``) reflects the status of the
+Clock Queue and Rearm Queue infrastructure used on ConnectX-6 Dx and is
+applicable only if the port was configured with the ``tx_pp`` devarg.
+The timestamp validation counters
+(``tx_pp_timestamp_past_errors``, ``tx_pp_timestamp_future_errors``,
+``tx_pp_timestamp_order_errors``) are also reported on ConnectX-7 and above
+in wait-on-time mode, without requiring ``tx_pp``.
 
 ``tx_pp_missed_interrupt_errors``
   Indicates that the Rearm Queue interrupt was not serviced on time.
@@ -1960,31 +1969,54 @@ Limitations
 Tx Scheduling
 ~~~~~~~~~~~~~
 
-When PMD sees the ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on the packet
-being sent it tries to synchronize the time of packet appearing on
-the wire with the specified packet timestamp. If the specified one
-is in the past it should be ignored, if one is in the distant future
-it should be capped with some reasonable value (in range of seconds).
-These specific cases ("too late" and "distant future") can be optionally
-reported via device xstats to assist applications to detect the
-time-related problems.
-
-The timestamp upper "too-distant-future" limit
-at the moment of invoking the Tx burst routine
-can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
+When the PMD sees ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on a packet
+being sent it inserts a dedicated WAIT WQE to synchronize the time of the
+packet appearing on the wire with the specified timestamp. Every packet
+in a burst that carries the timestamp dynamic flag is individually
+scheduled -- there is no restriction to the first packet only.
+
+If the specified timestamp is in the past, the packet is sent immediately.
+If it is in the distant future it should be capped with some reasonable
+value (in range of seconds). These specific cases ("too late" and
+"distant future") can be optionally reported via device xstats to assist
+applications to detect time-related problems.
+
+The eMPW (enhanced Multi-Packet Write) data path automatically breaks
+the batch when a timestamped packet is encountered, ensuring each
+scheduled packet gets its own WAIT WQE.
+
+Two hardware mechanisms are supported:
+
+**ConnectX-6 Dx -- Clock Queue (cross-channel)**
+   The driver creates a Clock Queue and a Rearm Queue that together
+   provide a time reference for scheduling. This mode requires the
+   :ref:`tx_pp <mlx5_tx_pp_param>` devarg. The timestamp upper
+   "too-distant-future" limit at the moment of invoking the Tx burst
+   routine can be estimated as ``tx_pp`` (in nanoseconds) multiplied
+   by 2^23.
+
+**ConnectX-7 and above -- wait-on-time**
+   The hardware supports placing the scheduling delay directly inside
+   the WQE descriptor. No Clock Queue or Rearm Queue is needed and the
+   ``tx_pp`` devarg is **not** required. The driver automatically
+   advertises send scheduling support when the HCA wait-on-time
+   capability is detected.
+
 Please note, for the testpmd txonly mode,
 the limit is deduced from the expression::
 
    (n_tx_descriptors / burst_size + 1) * inter_burst_gap
 
-There is no any packet reordering according timestamps is supposed,
-neither within packet burst, nor between packets, it is an entirely
-application responsibility to generate packets and its timestamps
-in desired order.
+There is no packet reordering according to timestamps,
+neither within a packet burst, nor between packets. It is entirely the
+application's responsibility to generate packets and their timestamps
+in the desired order.
 
 Requirements
 ^^^^^^^^^^^^
 
+ConnectX-6 Dx (Clock Queue mode):
+
 =========  =============
 Minimum    Version
 =========  =============
@@ -1996,20 +2028,35 @@ rdma-core
 DPDK       20.08
 =========  =============
 
+ConnectX-7 and above (wait-on-time mode):
+
+=========  =============
+Minimum    Version
+=========  =============
+hardware   ConnectX-7
+=========  =============
+
 Firmware configuration
 ^^^^^^^^^^^^^^^^^^^^^^
 
 Runtime configuration
 ^^^^^^^^^^^^^^^^^^^^^
 
-To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
-parameter should be specified.
+**ConnectX-6 Dx**: the :ref:`tx_pp <mlx5_tx_pp_param>` parameter must be
+specified to enable send scheduling on mbuf timestamps.
+
+**ConnectX-7+**: no devarg is required. Send scheduling is automatically
+enabled when the HCA reports the wait-on-time capability.
+
+On both hardware generations the ``tx_skew`` parameter can be used to
+compensate for the delay between descriptor processing and actual wire
+time.
 
 Limitations
 ^^^^^^^^^^^
 
-#. The timestamps can be put only in the first packet
-   in the burst providing the entire burst scheduling.
+#. On ConnectX-6 Dx (Clock Queue mode) timestamps too far in the future
+   are capped (see the ``tx_pp`` x 2^23 limit above).
 
 
 .. _mlx5_tx_inline:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v1 02/10] common/mlx5: query packet pacing rate table capabilities
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
@ 2026-03-10  9:20 ` Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
                   ` (10 subsequent siblings)
  12 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

Query additional QoS packet pacing capabilities from HCA attributes:
- packet_pacing_burst_bound: HW supports burst_upper_bound parameter
- packet_pacing_typical_size: HW supports typical_packet_size parameter
- packet_pacing_max_rate / packet_pacing_min_rate: rate range in kbps
- packet_pacing_rate_table_size: number of HW rate table entries

These capabilities are needed by the upcoming per-queue rate limiting
feature to validate devarg values and report HW limits.

Supported hardware:
- ConnectX-6 Dx and later (different boards expose different subsets)
- ConnectX-5 reports packet_pacing but not all extended fields
- ConnectX-7/8 report the full capability set
- BlueField-2 and later DPUs also report these capabilities

Not supported:
- ConnectX-4 Lx and earlier (no packet_pacing capability at all)
- ConnectX-5 Ex may not report burst_bound or typical_size

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 15 +++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 11 ++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index d12ebf8487..8f53303fa7 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1244,6 +1244,21 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 				MLX5_GET(qos_cap, hcattr, packet_pacing);
 		attr->qos.wqe_rate_pp =
 				MLX5_GET(qos_cap, hcattr, wqe_rate_pp);
+		attr->qos.packet_pacing_burst_bound =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_burst_bound);
+		attr->qos.packet_pacing_typical_size =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_typical_size);
+		attr->qos.packet_pacing_max_rate =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_max_rate);
+		attr->qos.packet_pacing_min_rate =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_min_rate);
+		attr->qos.packet_pacing_rate_table_size =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_rate_table_size);
 		if (attr->qos.flow_meter_aso_sup) {
 			attr->qos.log_meter_aso_granularity =
 				MLX5_GET(qos_cap, hcattr,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index da50fc686c..930ae2c072 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -67,7 +67,16 @@ struct mlx5_hca_qos_attr {
 	/* Power of the maximum allocation granularity Object. */
 	uint32_t log_max_num_meter_aso:5;
 	/* Power of the maximum number of supported objects. */
-
+	uint32_t packet_pacing_burst_bound:1;
+	/* HW supports burst_upper_bound PP parameter. */
+	uint32_t packet_pacing_typical_size:1;
+	/* HW supports typical_packet_size PP parameter. */
+	uint32_t packet_pacing_max_rate;
+	/* Maximum supported pacing rate in kbps. */
+	uint32_t packet_pacing_min_rate;
+	/* Minimum supported pacing rate in kbps. */
+	uint16_t packet_pacing_rate_table_size;
+	/* Number of entries in the HW rate table. */
 };
 
 struct mlx5_hca_vdpa_attr {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v1 03/10] common/mlx5: extend SQ modify to support rate limit update
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
@ 2026-03-10  9:20 ` Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
                   ` (9 subsequent siblings)
  12 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

Add rl_update and packet_pacing_rate_limit_index fields to
mlx5_devx_modify_sq_attr. When rl_update is set, the modify SQ
command sets modify_bitmask bit 0 and writes the PP index into
the SQ context, allowing dynamic rate changes on a live RDY SQ
without teardown.

modify_sq_in.modify_bitmask[0x40] bit 0 controls the
packet_pacing_rate_limit_index.

Supported hardware:
- ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
- ConnectX-7/8: same SQ context field, also supports wait-on-time
- BlueField-2/3: same modify_sq command support

Not supported:
- ConnectX-5: supports packet_pacing but only at SQ creation time,
  dynamic modify_bitmask update may not be supported on all FW
- ConnectX-4 Lx and earlier: no packet_pacing support

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 5 +++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 8f53303fa7..17378e1753 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2129,6 +2129,11 @@ mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
 	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
 	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
 	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	if (sq_attr->rl_update) {
+		MLX5_SET64(modify_sq_in, in, modify_bitmask, 1);
+		MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+			 sq_attr->packet_pacing_rate_limit_index);
+	}
 	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
 					 out, sizeof(out));
 	if (ret) {
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 930ae2c072..82d949972b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -519,6 +519,9 @@ struct mlx5_devx_modify_sq_attr {
 	uint32_t state:4;
 	uint32_t hairpin_peer_rq:24;
 	uint32_t hairpin_peer_vhca:16;
+	uint32_t rl_update:1;
+	/* Set to update packet_pacing_rate_limit_index on a live SQ. */
+	uint32_t packet_pacing_rate_limit_index:16;
 };
 
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v1 04/10] net/mlx5: add per-queue packet pacing infrastructure
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                   ` (2 preceding siblings ...)
  2026-03-10  9:20 ` [PATCH v1 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
@ 2026-03-10  9:20 ` Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
                   ` (8 subsequent siblings)
  12 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

Add mlx5_txq_rate_limit structure and alloc/free helpers for
per-queue data-rate packet pacing. Each Tx queue can now hold
its own PP (Packet Pacing) index allocated via mlx5dv_pp_alloc()
with MLX5_DATA_RATE mode.

mlx5_txq_alloc_pp_rate_limit() converts Mbps to kbps for the PRM
rate_limit field and allocates a dedicated PP index from the HW
rate table. mlx5_txq_free_pp_rate_limit() releases it.

The existing Clock Queue path (sh->txpp.pp / sh->txpp.pp_id) is
untouched — it uses MLX5_WQE_RATE for per-packet scheduling,
while per-queue rate limiting uses MLX5_DATA_RATE.

PP index cleanup is added to mlx5_txq_release() to prevent leaks
when queues are destroyed.

Supported hardware:
- ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
- ConnectX-7/8: same mechanism, plus wait-on-time coexistence
- BlueField-2/3: same PP allocation support

Not supported:
- ConnectX-5: packet_pacing exists but MLX5_DATA_RATE mode may
  not be available on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.h      | 11 +++++++
 drivers/net/mlx5/mlx5_tx.h   |  1 +
 drivers/net/mlx5/mlx5_txpp.c | 64 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_txq.c  |  1 +
 4 files changed, 77 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b83dda5652..c48c3072d1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1296,6 +1296,13 @@ struct mlx5_txpp_ts {
 	RTE_ATOMIC(uint64_t) ts;
 };
 
+/* Per-queue rate limit tracking. */
+struct mlx5_txq_rate_limit {
+	void *pp;		/* Packet pacing context from dv_alloc_pp. */
+	uint16_t pp_id;		/* Packet pacing index. */
+	uint32_t rate_mbps;	/* Current rate in Mbps, 0 = disabled. */
+};
+
 /* Tx packet pacing structure. */
 struct mlx5_dev_txpp {
 	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */
@@ -2634,6 +2641,10 @@ int mlx5_txpp_xstats_get_names(struct rte_eth_dev *dev,
 void mlx5_txpp_interrupt_handler(void *cb_arg);
 int mlx5_txpp_map_hca_bar(struct rte_eth_dev *dev);
 void mlx5_txpp_unmap_hca_bar(struct rte_eth_dev *dev);
+int mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
+				 struct mlx5_txq_rate_limit *rl,
+				 uint32_t rate_mbps);
+void mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rl);
 
 /* mlx5_rxtx.c */
 
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 0134a2e003..b1b3653247 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -192,6 +192,7 @@ struct mlx5_txq_ctrl {
 	uint16_t dump_file_n; /* Number of dump files. */
 	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	uint32_t hairpin_status; /* Hairpin binding status. */
+	struct mlx5_txq_rate_limit rl; /* Per-queue rate limit. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 0e99b58bde..e57e263628 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -128,6 +128,70 @@ mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared *sh)
 #endif
 }
 
+/* Free a per-queue packet pacing index. */
+void
+mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rl)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	if (rl->pp) {
+		mlx5_glue->dv_free_pp(rl->pp);
+		rl->pp = NULL;
+		rl->pp_id = 0;
+		rl->rate_mbps = 0;
+	}
+#else
+	RTE_SET_USED(rl);
+#endif
+}
+
+/* Allocate a per-queue packet pacing index for data-rate limiting. */
+int
+mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
+			     struct mlx5_txq_rate_limit *rl,
+			     uint32_t rate_mbps)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	uint32_t pp[MLX5_ST_SZ_DW(set_pp_rate_limit_context)];
+	uint32_t rate_kbps;
+
+	MLX5_ASSERT(rate_mbps > 0);
+	/* Free previous allocation if any. */
+	mlx5_txq_free_pp_rate_limit(rl);
+	memset(&pp, 0, sizeof(pp));
+	rate_kbps = rate_mbps * 1000;
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, rate_kbps);
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
+	rl->pp = mlx5_glue->dv_alloc_pp
+				(sh->cdev->ctx, sizeof(pp), &pp,
+				 MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX);
+	if (rl->pp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
+			rate_mbps);
+		rte_errno = errno;
+		return -errno;
+	}
+	rl->pp_id = ((struct mlx5dv_pp *)(rl->pp))->index;
+	if (!rl->pp_id) {
+		DRV_LOG(ERR, "Zero PP index allocated for rate %u Mbps.",
+			rate_mbps);
+		mlx5_txq_free_pp_rate_limit(rl);
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	rl->rate_mbps = rate_mbps;
+	DRV_LOG(DEBUG, "Allocated PP index %u for rate %u Mbps.",
+		rl->pp_id, rate_mbps);
+	return 0;
+#else
+	RTE_SET_USED(sh);
+	RTE_SET_USED(rl);
+	RTE_SET_USED(rate_mbps);
+	DRV_LOG(ERR, "Per-queue rate limit requires rdma-core PP support.");
+	rte_errno = ENOTSUP;
+	return -ENOTSUP;
+#endif
+}
+
 static void
 mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)
 {
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9275efb58e..fa9bb48fd4 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1338,6 +1338,7 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
 	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
 	if (rte_atomic_fetch_sub_explicit(&txq_ctrl->refcnt, 1, rte_memory_order_relaxed) - 1 > 1)
 		return 1;
+	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
 	if (txq_ctrl->obj) {
 		priv->obj_ops.txq_obj_release(txq_ctrl->obj);
 		LIST_REMOVE(txq_ctrl->obj, next);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v1 05/10] net/mlx5: support per-queue rate limiting
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                   ` (3 preceding siblings ...)
  2026-03-10  9:20 ` [PATCH v1 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
@ 2026-03-10  9:20 ` Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

Wire rte_eth_set_queue_rate_limit() to the mlx5 PMD. The callback
allocates a per-queue PP index with the requested data rate, then
modifies the live SQ via modify_bitmask bit 0 to apply the new
packet_pacing_rate_limit_index — no queue teardown required.

Setting tx_rate=0 clears the PP index on the SQ and frees it.

Capability check uses hca_attr.qos.packet_pacing directly (not
dev_cap.txpp_en which requires Clock Queue prerequisites). This
allows per-queue rate limiting without the tx_pp devarg.

The callback rejects hairpin queues and queues whose SQ is not
yet created.

testpmd usage (no testpmd changes needed):
  set port 0 queue 0 rate 1000
  set port 0 queue 1 rate 5000
  set port 0 queue 0 rate 0     # disable

Supported hardware:
- ConnectX-6 Dx: full support, per-SQ rate via HW rate table
- ConnectX-7/8: full support, coexists with wait-on-time scheduling
- BlueField-2/3: full support as DPU rep ports

Not supported:
- ConnectX-5: packet_pacing exists but dynamic SQ modify may not
  work on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.c     |  2 +
 drivers/net/mlx5/mlx5_tx.h  |  2 +
 drivers/net/mlx5/mlx5_txq.c | 96 +++++++++++++++++++++++++++++++++++++
 3 files changed, 100 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4d3bfddc36..c390406ac7 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2690,6 +2690,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 	.get_restore_flags = mlx5_get_restore_flags,
+	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2783,6 +2784,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.count_aggr_ports = mlx5_count_aggr_ports,
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.get_restore_flags = mlx5_get_restore_flags,
+	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index b1b3653247..3a37f5bb4d 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -222,6 +222,8 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint32_t tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index fa9bb48fd4..85959ca365 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1363,6 +1363,102 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
 	return 0;
 }
 
+/**
+ * Set per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param tx_rate
+ *   TX rate in Mbps, 0 to disable rate limiting.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			  uint32_t tx_rate)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	int ret;
+
+	if (!sh->cdev->config.hca_attr.qos.packet_pacing) {
+		DRV_LOG(ERR, "Port %u packet pacing not supported.",
+			dev->data->port_id);
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	if (queue_idx >= dev->data->nb_tx_queues) {
+		DRV_LOG(ERR, "Port %u Tx queue %u out of range.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u not configured.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = container_of((*priv->txqs)[queue_idx],
+				struct mlx5_txq_ctrl, txq);
+	if (txq_ctrl->is_hairpin) {
+		DRV_LOG(ERR, "Port %u Tx queue %u is hairpin.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (txq_ctrl->obj == NULL || txq_ctrl->obj->sq == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (tx_rate == 0) {
+		/* Disable rate limiting. */
+		if (txq_ctrl->rl.pp_id == 0)
+			return 0; /* Already disabled. */
+		sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.rl_update = 1;
+		sq_attr.packet_pacing_rate_limit_index = 0;
+		ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+		if (ret) {
+			DRV_LOG(ERR,
+				"Port %u Tx queue %u failed to clear rate.",
+				dev->data->port_id, queue_idx);
+			return ret;
+		}
+		mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
+		DRV_LOG(DEBUG, "Port %u Tx queue %u rate limit disabled.",
+			dev->data->port_id, queue_idx);
+		return 0;
+	}
+	/* Allocate a new PP index for the requested rate. */
+	ret = mlx5_txq_alloc_pp_rate_limit(sh, &txq_ctrl->rl, tx_rate);
+	if (ret)
+		return ret;
+	/* Modify live SQ to use the new PP index. */
+	sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+	sq_attr.state = MLX5_SQC_STATE_RDY;
+	sq_attr.rl_update = 1;
+	sq_attr.packet_pacing_rate_limit_index = txq_ctrl->rl.pp_id;
+	ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u Mbps.",
+			dev->data->port_id, queue_idx, tx_rate);
+		mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
+		return ret;
+	}
+	DRV_LOG(DEBUG, "Port %u Tx queue %u rate set to %u Mbps (PP idx %u).",
+		dev->data->port_id, queue_idx, tx_rate, txq_ctrl->rl.pp_id);
+	return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v1 06/10] net/mlx5: add burst pacing devargs
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                   ` (4 preceding siblings ...)
  2026-03-10  9:20 ` [PATCH v1 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
@ 2026-03-10  9:20 ` Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
                   ` (6 subsequent siblings)
  12 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

Expose burst_upper_bound and typical_packet_size from the PRM
set_pp_rate_limit_context as devargs:
- tx_burst_bound=<bytes>: max burst before rate evaluation kicks in
- tx_typical_pkt_sz=<bytes>: typical packet size for accuracy

These parameters apply to both per-queue rate limiting
(rte_eth_set_queue_rate_limit) and Clock Queue pacing (tx_pp).

Values are validated against HCA capabilities
(packet_pacing_burst_bound and packet_pacing_typical_size).
If the HW does not support them, a warning is logged and the
value is silently zeroed. Test mode still overrides both values.

Shared context mismatch checks ensure all ports on the same
device use the same burst parameters.

Supported hardware:
- ConnectX-6 Dx: burst_upper_bound and typical_packet_size
  reported via packet_pacing_burst_bound / packet_pacing_typical_size
  QoS capability bits
- ConnectX-7/8: full support for both parameters
- BlueField-2/3: same capabilities as host-side ConnectX

Not supported:
- ConnectX-5: may not report burst_bound or typical_size caps
- ConnectX-4 Lx and earlier: no packet_pacing at all

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/mlx5.rst     | 16 ++++++++++++++
 drivers/net/mlx5/mlx5.c      | 42 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5.h      |  2 ++
 drivers/net/mlx5/mlx5_txpp.c | 12 +++++++++++
 4 files changed, 72 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 5b097dbc90..2507fae846 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -580,6 +580,22 @@ for an additional list of options shared with other mlx5 drivers.
   (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
   The default value is zero.
 
+- ``tx_burst_bound`` parameter [int]
+
+  Specifies the burst upper bound in bytes for packet pacing rate evaluation.
+  When set, the hardware considers this burst size when enforcing the configured
+  rate limit. Only effective when the HCA reports ``packet_pacing_burst_bound``
+  capability. Applies to both per-queue rate limiting
+  (``rte_eth_set_queue_rate_limit()``) and Clock Queue pacing (``tx_pp``).
+  The default value is zero (hardware default).
+
+- ``tx_typical_pkt_sz`` parameter [int]
+
+  Specifies the typical packet size in bytes for packet pacing rate accuracy
+  improvement. Only effective when the HCA reports
+  ``packet_pacing_typical_size`` capability. Applies to both per-queue rate
+  limiting and Clock Queue pacing. The default value is zero (hardware default).
+
 - ``tx_vec_en`` parameter [int]
 
   A nonzero value enables Tx vector with ConnectX-5 NICs and above.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index c390406ac7..f399e0d5c9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -119,6 +119,18 @@
  */
 #define MLX5_TX_SKEW "tx_skew"
 
+/*
+ * Device parameter to specify burst upper bound in bytes
+ * for packet pacing rate evaluation.
+ */
+#define MLX5_TX_BURST_BOUND "tx_burst_bound"
+
+/*
+ * Device parameter to specify typical packet size in bytes
+ * for packet pacing rate accuracy improvement.
+ */
+#define MLX5_TX_TYPICAL_PKT_SZ "tx_typical_pkt_sz"
+
 /*
  * Device parameter to enable hardware Tx vector.
  * Deprecated, ignored (no vectorized Tx routines anymore).
@@ -1405,6 +1417,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->tx_pp = tmp;
 	} else if (strcmp(MLX5_TX_SKEW, key) == 0) {
 		config->tx_skew = tmp;
+	} else if (strcmp(MLX5_TX_BURST_BOUND, key) == 0) {
+		config->tx_burst_bound = tmp;
+	} else if (strcmp(MLX5_TX_TYPICAL_PKT_SZ, key) == 0) {
+		config->tx_typical_pkt_sz = tmp;
 	} else if (strcmp(MLX5_L3_VXLAN_EN, key) == 0) {
 		config->l3_vxlan_en = !!tmp;
 	} else if (strcmp(MLX5_VF_NL_EN, key) == 0) {
@@ -1518,8 +1534,10 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 				struct mlx5_sh_config *config)
 {
 	const char **params = (const char *[]){
+		MLX5_TX_BURST_BOUND,
 		MLX5_TX_PP,
 		MLX5_TX_SKEW,
+		MLX5_TX_TYPICAL_PKT_SZ,
 		MLX5_L3_VXLAN_EN,
 		MLX5_VF_NL_EN,
 		MLX5_DV_ESW_EN,
@@ -1626,6 +1644,18 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(WARNING,
 			"\"tx_skew\" doesn't affect without \"tx_pp\".");
 	}
+	if (config->tx_burst_bound &&
+	    !sh->cdev->config.hca_attr.qos.packet_pacing_burst_bound) {
+		DRV_LOG(WARNING,
+			"HW does not support burst_upper_bound, ignoring.");
+		config->tx_burst_bound = 0;
+	}
+	if (config->tx_typical_pkt_sz &&
+	    !sh->cdev->config.hca_attr.qos.packet_pacing_typical_size) {
+		DRV_LOG(WARNING,
+			"HW does not support typical_packet_size, ignoring.");
+		config->tx_typical_pkt_sz = 0;
+	}
 	/* Check for LRO support. */
 	if (mlx5_devx_obj_ops_en(sh) && sh->cdev->config.hca_attr.lro_cap) {
 		/* TBD check tunnel lro caps. */
@@ -3260,6 +3290,18 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.tx_burst_bound != config->tx_burst_bound) {
+		DRV_LOG(ERR, "\"tx_burst_bound\" "
+			"configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
+	if (sh->config.tx_typical_pkt_sz != config->tx_typical_pkt_sz) {
+		DRV_LOG(ERR, "\"tx_typical_pkt_sz\" "
+			"configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.txq_mem_algn != config->txq_mem_algn) {
 		DRV_LOG(ERR, "\"TxQ memory alignment\" "
 			"configuration mismatch for shared %s context. %u - %u",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c48c3072d1..a8d71482ac 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -382,6 +382,8 @@ struct mlx5_port_config {
 struct mlx5_sh_config {
 	int tx_pp; /* Timestamp scheduling granularity in nanoseconds. */
 	int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
+	uint32_t tx_burst_bound; /* Burst upper bound in bytes, 0 = default. */
+	uint32_t tx_typical_pkt_sz; /* Typical packet size in bytes, 0 = default. */
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index e57e263628..5327af8d75 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -88,6 +88,12 @@ mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared *sh)
 	rate = NS_PER_S / sh->txpp.tick;
 	if (rate * sh->txpp.tick != NS_PER_S)
 		DRV_LOG(WARNING, "Packet pacing frequency is not precise.");
+	if (sh->config.tx_burst_bound)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 burst_upper_bound, sh->config.tx_burst_bound);
+	if (sh->config.tx_typical_pkt_sz)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 typical_packet_size, sh->config.tx_typical_pkt_sz);
 	if (sh->txpp.test) {
 		uint32_t len;
 
@@ -161,6 +167,12 @@ mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
 	rate_kbps = rate_mbps * 1000;
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, rate_kbps);
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
+	if (sh->config.tx_burst_bound)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 burst_upper_bound, sh->config.tx_burst_bound);
+	if (sh->config.tx_typical_pkt_sz)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 typical_packet_size, sh->config.tx_typical_pkt_sz);
 	rl->pp = mlx5_glue->dv_alloc_pp
 				(sh->cdev->ctx, sizeof(pp), &pp,
 				 MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v1 07/10] net/mlx5: add testpmd command to query per-queue rate limit
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                   ` (5 preceding siblings ...)
  2026-03-10  9:20 ` [PATCH v1 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
@ 2026-03-10  9:20 ` Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
                   ` (5 subsequent siblings)
  12 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

Add a new testpmd command to display the per-queue packet pacing
rate limit state, including the PP index from both driver state
and FW SQ context readback:

  testpmd> mlx5 port <port_id> txq <queue_id> rate show

This helps verify that the FW actually applied the PP index to
the SQ after setting a per-queue rate limit.

Expose a new PMD API rte_pmd_mlx5_txq_rate_limit_query() that
queries txq_ctrl->rl for driver state and mlx5_devx_cmd_query_sq()
for the FW packet_pacing_rate_limit_index field.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_testpmd.c | 93 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c      | 38 ++++++++++++++
 drivers/net/mlx5/mlx5_txq.c     | 19 +++++--
 drivers/net/mlx5/rte_pmd_mlx5.h | 30 +++++++++++
 4 files changed, 177 insertions(+), 3 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_testpmd.c b/drivers/net/mlx5/mlx5_testpmd.c
index 1bb5a89559..fd3efecc5d 100644
--- a/drivers/net/mlx5/mlx5_testpmd.c
+++ b/drivers/net/mlx5/mlx5_testpmd.c
@@ -1365,6 +1365,94 @@ cmdline_parse_inst_t mlx5_cmd_dump_rq_context_options = {
 	}
 };
 
+/* Show per-queue rate limit PP index for a given port/queue */
+struct mlx5_cmd_show_rate_limit_options {
+	cmdline_fixed_string_t mlx5;
+	cmdline_fixed_string_t port;
+	portid_t port_id;
+	cmdline_fixed_string_t txq;
+	queueid_t queue_id;
+	cmdline_fixed_string_t rate;
+	cmdline_fixed_string_t show;
+};
+
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_mlx5 =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 mlx5, "mlx5");
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_port =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 port, "port");
+cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_port_id =
+	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+			      port_id, RTE_UINT16);
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_txq =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 txq, "txq");
+cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_queue_id =
+	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+			      queue_id, RTE_UINT16);
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_rate =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 rate, "rate");
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_show =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 show, "show");
+
+static void
+mlx5_cmd_show_rate_limit_parsed(void *parsed_result,
+				__rte_unused struct cmdline *cl,
+				__rte_unused void *data)
+{
+	struct mlx5_cmd_show_rate_limit_options *res = parsed_result;
+	struct rte_pmd_mlx5_txq_rate_limit_info info;
+	int ret;
+
+	ret = rte_pmd_mlx5_txq_rate_limit_query(res->port_id, res->queue_id,
+						 &info);
+	switch (ret) {
+	case 0:
+		break;
+	case -ENODEV:
+		fprintf(stderr, "invalid port_id %u\n", res->port_id);
+		return;
+	case -EINVAL:
+		fprintf(stderr, "invalid queue index (%u), out of range\n",
+			res->queue_id);
+		return;
+	case -EIO:
+		fprintf(stderr, "failed to query SQ context\n");
+		return;
+	default:
+		fprintf(stderr, "query failed (%d)\n", ret);
+		return;
+	}
+	fprintf(stdout, "Port %u Txq %u rate limit info:\n",
+		res->port_id, res->queue_id);
+	if (info.rate_mbps > 0)
+		fprintf(stdout, "  Configured rate: %u Mbps\n",
+			info.rate_mbps);
+	else
+		fprintf(stdout, "  Configured rate: disabled\n");
+	fprintf(stdout, "  PP index (driver): %u\n", info.pp_index);
+	fprintf(stdout, "  PP index (FW readback): %u\n", info.fw_pp_index);
+}
+
+cmdline_parse_inst_t mlx5_cmd_show_rate_limit = {
+	.f = mlx5_cmd_show_rate_limit_parsed,
+	.data = NULL,
+	.help_str = "mlx5 port <port_id> txq <queue_id> rate show",
+	.tokens = {
+		(void *)&mlx5_cmd_show_rate_limit_mlx5,
+		(void *)&mlx5_cmd_show_rate_limit_port,
+		(void *)&mlx5_cmd_show_rate_limit_port_id,
+		(void *)&mlx5_cmd_show_rate_limit_txq,
+		(void *)&mlx5_cmd_show_rate_limit_queue_id,
+		(void *)&mlx5_cmd_show_rate_limit_rate,
+		(void *)&mlx5_cmd_show_rate_limit_show,
+		NULL,
+	}
+};
+
 static struct testpmd_driver_commands mlx5_driver_cmds = {
 	.commands = {
 		{
@@ -1440,6 +1528,11 @@ static struct testpmd_driver_commands mlx5_driver_cmds = {
 			.help = "mlx5 port (port_id) queue (queue_id) dump rq_context (file_name)\n"
 				"    Dump mlx5 RQ Context\n\n",
 		},
+		{
+			.ctx = &mlx5_cmd_show_rate_limit,
+			.help = "mlx5 port (port_id) txq (queue_id) rate show\n"
+				"    Show per-queue rate limit PP index\n\n",
+		},
 		{
 			.ctx = NULL,
 		},
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 8085b5c306..7051390a5e 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -848,3 +848,41 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const ch
 	fclose(fd);
 	return ret;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_txq_rate_limit_query, 26.07)
+int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
+				       struct rte_pmd_mlx5_txq_rate_limit_info *info)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	struct mlx5_txq_data *txq_data;
+	struct mlx5_txq_ctrl *txq_ctrl;
+	uint32_t sq_out[MLX5_ST_SZ_DW(query_sq_out)] = {0};
+	int ret;
+
+	if (info == NULL)
+		return -EINVAL;
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return -ENODEV;
+	if (rte_eth_tx_queue_is_valid(port_id, queue_id))
+		return -EINVAL;
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	txq_data = (*priv->txqs)[queue_id];
+	txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	info->rate_mbps = txq_ctrl->rl.rate_mbps;
+	info->pp_index = txq_ctrl->rl.pp_id;
+	if (txq_ctrl->obj == NULL) {
+		info->fw_pp_index = 0;
+		return 0;
+	}
+	ret = mlx5_devx_cmd_query_sq(txq_ctrl->obj->sq_obj.sq,
+				     sq_out, sizeof(sq_out));
+	if (ret)
+		return -EIO;
+	info->fw_pp_index = MLX5_GET(sqc,
+				     MLX5_ADDR_OF(query_sq_out, sq_out,
+						  sq_context),
+				     packet_pacing_rate_limit_index);
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 85959ca365..0e051b0839 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1412,7 +1412,20 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
-	if (txq_ctrl->obj == NULL || txq_ctrl->obj->sq == NULL) {
+	if (txq_ctrl->obj == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u not initialized.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	/*
+	 * For non-hairpin queues the SQ DevX object lives in
+	 * obj->sq_obj.sq (used by DevX/HWS mode), while hairpin
+	 * queues use obj->sq directly.  These are different members
+	 * of a union inside mlx5_txq_obj.
+	 */
+	struct mlx5_devx_obj *sq_devx = txq_ctrl->obj->sq_obj.sq;
+	if (sq_devx == NULL) {
 		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
 			dev->data->port_id, queue_idx);
 		rte_errno = EINVAL;
@@ -1426,7 +1439,7 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 		sq_attr.state = MLX5_SQC_STATE_RDY;
 		sq_attr.rl_update = 1;
 		sq_attr.packet_pacing_rate_limit_index = 0;
-		ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+		ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Port %u Tx queue %u failed to clear rate.",
@@ -1447,7 +1460,7 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 	sq_attr.state = MLX5_SQC_STATE_RDY;
 	sq_attr.rl_update = 1;
 	sq_attr.packet_pacing_rate_limit_index = txq_ctrl->rl.pp_id;
-	ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+	ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u Mbps.",
 			dev->data->port_id, queue_idx, tx_rate);
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 7acfdae97d..698d7d2032 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -420,6 +420,36 @@ __rte_experimental
 int
 rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const char *filename);
 
+/**
+ * Per-queue rate limit information.
+ */
+struct rte_pmd_mlx5_txq_rate_limit_info {
+	uint32_t rate_mbps;	/**< Configured rate in Mbps, 0 = disabled. */
+	uint16_t pp_index;	/**< PP index from driver state. */
+	uint16_t fw_pp_index;	/**< PP index read back from FW SQ context. */
+};
+
+/**
+ * Query per-queue rate limit state for a given Tx queue.
+ *
+ * @param[in] port_id
+ *   Port ID.
+ * @param[in] queue_id
+ *   Tx queue ID.
+ * @param[out] info
+ *   Rate limit information.
+ *
+ * @return
+ *   0 on success, negative errno on failure:
+ *   - -ENODEV: invalid port_id.
+ *   - -EINVAL: invalid queue_id.
+ *   - -EIO: FW query failed.
+ */
+__rte_experimental
+int
+rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
+				  struct rte_pmd_mlx5_txq_rate_limit_info *info);
+
 /** Type of mlx5 driver event for which custom callback is called. */
 enum rte_pmd_mlx5_driver_event_cb_type {
 	/** Called after HW Rx queue is created. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v1 08/10] ethdev: add getter for per-queue Tx rate limit
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                   ` (6 preceding siblings ...)
  2026-03-10  9:20 ` [PATCH v1 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
@ 2026-03-10  9:20 ` Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 9/10] net/mlx5: share pacing rate table entries across queues Vincent Jardin
                   ` (4 subsequent siblings)
  12 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

The existing rte_eth_set_queue_rate_limit() API allows setting a
per-queue Tx rate but provides no way to read it back. Applications
such as grout are forced to maintain a shadow copy of the rate to
be able to report it.

Add rte_eth_get_queue_rate_limit() as the symmetric getter, following
the established DPDK pattern (e.g. rte_eth_dev_set_mtu/get_mtu,
rte_eth_dev_set_vlan_offload/get_vlan_offload).

This adds:
- eth_get_queue_rate_limit_t driver callback in ethdev_driver.h
- rte_eth_get_queue_rate_limit() public experimental API (26.07)
- mlx5 PMD implementation reading from the existing per-queue
  rate_mbps tracking field

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.c     |  2 ++
 drivers/net/mlx5/mlx5_tx.h  |  2 ++
 drivers/net/mlx5/mlx5_txq.c | 34 ++++++++++++++++++++++++++++++++++
 lib/ethdev/ethdev_driver.h  |  7 +++++++
 lib/ethdev/rte_ethdev.c     | 28 ++++++++++++++++++++++++++++
 lib/ethdev/rte_ethdev.h     | 24 ++++++++++++++++++++++++
 6 files changed, 97 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f399e0d5c9..6e21ed31f3 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2721,6 +2721,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 	.get_restore_flags = mlx5_get_restore_flags,
 	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
+	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2815,6 +2816,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.get_restore_flags = mlx5_get_restore_flags,
 	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
+	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 3a37f5bb4d..46e199d93e 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -224,6 +224,8 @@ int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
 int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 			      uint32_t tx_rate);
+int mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint32_t *tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 0e051b0839..4065585ce7 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1472,6 +1472,40 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 }
 
+/**
+ * Get per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param[out] tx_rate
+ *   Pointer to store the TX rate in Mbps, 0 if rate limiting is disabled.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			  uint32_t *tx_rate)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *txq_ctrl;
+
+	if (queue_idx >= dev->data->nb_tx_queues) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = container_of((*priv->txqs)[queue_idx],
+				struct mlx5_txq_ctrl, txq);
+	*tx_rate = txq_ctrl->rl.rate_mbps;
+	return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 1255cd6f2c..0f336f9567 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -762,6 +762,11 @@ typedef int (*eth_set_queue_rate_limit_t)(struct rte_eth_dev *dev,
 				uint16_t queue_idx,
 				uint32_t tx_rate);
 
+/** @internal Get queue Tx rate. */
+typedef int (*eth_get_queue_rate_limit_t)(struct rte_eth_dev *dev,
+				uint16_t queue_idx,
+				uint32_t *tx_rate);
+
 /** @internal Add tunneling UDP port. */
 typedef int (*eth_udp_tunnel_port_add_t)(struct rte_eth_dev *dev,
 					 struct rte_eth_udp_tunnel *tunnel_udp);
@@ -1522,6 +1527,8 @@ struct eth_dev_ops {
 
 	/** Set queue rate limit */
 	eth_set_queue_rate_limit_t set_queue_rate_limit;
+	/** Get queue rate limit */
+	eth_get_queue_rate_limit_t get_queue_rate_limit;
 
 	/** Configure RSS hash protocols and hashing key */
 	rss_hash_update_t          rss_hash_update;
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 2edc7a362e..c6ad399033 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5694,6 +5694,34 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	return ret;
 }
 
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_get_queue_rate_limit, 26.07)
+int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
+					uint32_t *tx_rate)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (tx_rate == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Get queue rate limit:port %u: NULL tx_rate pointer",
+			port_id);
+		return -EINVAL;
+	}
+
+	if (queue_idx >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Get queue rate limit:port %u: invalid queue ID=%u",
+			port_id, queue_idx);
+		return -EINVAL;
+	}
+
+	if (dev->dev_ops->get_queue_rate_limit == NULL)
+		return -ENOTSUP;
+	return eth_err(port_id, dev->dev_ops->get_queue_rate_limit(dev, queue_idx, tx_rate));
+}
+
 RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_rx_avail_thresh_set, 22.07)
 int rte_eth_rx_avail_thresh_set(uint16_t port_id, uint16_t queue_id,
 			       uint8_t avail_thresh)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 0d8e2d0236..e525217b77 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4817,6 +4817,30 @@ int rte_eth_dev_uc_all_hash_table_set(uint16_t port_id, uint8_t on);
 int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 			uint32_t tx_rate);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Get the rate limitation for a queue on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_idx
+ *   The queue ID.
+ * @param[out] tx_rate
+ *   A pointer to retrieve the Tx rate in Mbps.
+ *   0 means rate limiting is disabled.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support this feature.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
+			uint32_t *tx_rate);
+
 /**
  * Configuration of Receive Side Scaling hash computation of Ethernet device.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v1 9/10] net/mlx5: share pacing rate table entries across queues
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                   ` (7 preceding siblings ...)
  2026-03-10  9:20 ` [PATCH v1 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
@ 2026-03-10  9:20 ` Vincent Jardin
  2026-03-10  9:20 ` [PATCH v1 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
                   ` (3 subsequent siblings)
  12 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

Allocating PP contexts with MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX
forces one HW rate table entry per TX queue. On ConnectX-6 Dx the
rate table is small (typically 128 entries), so setting the same rate
on many queues exhausts it quickly and returns ENOSPC.

Without the dedicated flag, the kernel mlx5 driver shares a single
rate table entry across all PP contexts with identical parameters
(rate, burst, packet size) using internal refcounting. Each queue
still gets its own PP handle for proper cleanup, but the underlying
HW index is shared.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_txpp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 5327af8d75..e40ee539f2 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -174,8 +174,7 @@ mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
 		MLX5_SET(set_pp_rate_limit_context, &pp,
 			 typical_packet_size, sh->config.tx_typical_pkt_sz);
 	rl->pp = mlx5_glue->dv_alloc_pp
-				(sh->cdev->ctx, sizeof(pp), &pp,
-				 MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX);
+				(sh->cdev->ctx, sizeof(pp), &pp, 0);
 	if (rl->pp == NULL) {
 		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
 			rate_mbps);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v1 10/10] net/mlx5: add rate table capacity query API
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                   ` (8 preceding siblings ...)
  2026-03-10  9:20 ` [PATCH v1 9/10] net/mlx5: share pacing rate table entries across queues Vincent Jardin
@ 2026-03-10  9:20 ` Vincent Jardin
  2026-03-10 14:20 ` [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
                   ` (2 subsequent siblings)
  12 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10  9:20 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, Vincent Jardin

Add rte_pmd_mlx5_pp_rate_table_query() to report the HW packet
pacing rate table size and how many entries are currently in use.

The total comes from the HCA QoS capability
packet_pacing_rate_table_size. The used count is derived by
collecting unique non-zero PP indices across all TX queues.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_tx.c      | 51 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/rte_pmd_mlx5.h | 27 +++++++++++++++++
 2 files changed, 78 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 7051390a5e..4cd0e1ce60 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -886,3 +886,54 @@ int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
 				     packet_pacing_rate_limit_index);
 	return 0;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_pp_rate_table_query, 26.07)
+int rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
+				     struct rte_pmd_mlx5_pp_rate_table_info *info)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	uint16_t used = 0;
+	uint16_t seen[RTE_MAX_QUEUES_PER_PORT];
+	unsigned int i;
+
+	if (info == NULL)
+		return -EINVAL;
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return -ENODEV;
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	if (!priv->sh->cdev->config.hca_attr.qos.packet_pacing) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	info->total = priv->sh->cdev->config.hca_attr.qos
+			.packet_pacing_rate_table_size;
+	/* Count unique non-zero PP indices across all TX queues. */
+	for (i = 0; i < priv->txqs_n; i++) {
+		struct mlx5_txq_data *txq_data;
+		struct mlx5_txq_ctrl *txq_ctrl;
+		uint16_t pp_id;
+		uint16_t j;
+		bool dup;
+
+		if (priv->txqs == NULL || (*priv->txqs)[i] == NULL)
+			continue;
+		txq_data = (*priv->txqs)[i];
+		txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
+		pp_id = txq_ctrl->rl.pp_id;
+		if (pp_id == 0)
+			continue;
+		dup = false;
+		for (j = 0; j < used; j++) {
+			if (seen[j] == pp_id) {
+				dup = true;
+				break;
+			}
+		}
+		if (!dup && used < RTE_DIM(seen))
+			seen[used++] = pp_id;
+	}
+	info->used = used;
+	return 0;
+}
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 698d7d2032..4033b9acc7 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -450,6 +450,33 @@ int
 rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
 				  struct rte_pmd_mlx5_txq_rate_limit_info *info);
 
+/**
+ * Packet pacing rate table capacity information.
+ */
+struct rte_pmd_mlx5_pp_rate_table_info {
+	uint16_t total;		/**< Total HW rate table entries. */
+	uint16_t used;		/**< Currently allocated entries. */
+};
+
+/**
+ * Query packet pacing rate table capacity.
+ *
+ * @param[in] port_id
+ *   Port ID.
+ * @param[out] info
+ *   Rate table capacity information.
+ *
+ * @return
+ *   0 on success, negative errno on failure:
+ *   - -ENODEV: invalid port_id.
+ *   - -EINVAL: info is NULL.
+ *   - -ENOTSUP: packet pacing not supported.
+ */
+__rte_experimental
+int
+rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
+				 struct rte_pmd_mlx5_pp_rate_table_info *info);
+
 /** Type of mlx5 driver event for which custom callback is called. */
 enum rte_pmd_mlx5_driver_event_cb_type {
 	/** Called after HW Rx queue is created. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                   ` (9 preceding siblings ...)
  2026-03-10  9:20 ` [PATCH v1 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
@ 2026-03-10 14:20 ` Stephen Hemminger
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
  12 siblings, 0 replies; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-10 14:20 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

On Tue, 10 Mar 2026 10:20:04 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> This series adds per-queue Tx rate limiting to the mlx5 PMD using
> the HW packet pacing (PP) rate table.
> 
> The ConnectX-6 Dx and later NICs expose a per-SQ
> packet_pacing_rate_limit_index that can be changed on a live SQ
> via modify_bitmask without queue teardown. The kernel mlx5 driver
> refcounts PP contexts internally, so queues configured at the same
> rate share a single HW rate table entry.
> 
> The series is structured as follows:
> 
>   1. Doc fix for stale packet pacing documentation
>   2-3. common/mlx5: query PP capabilities and extend SQ modify
>   4-6. net/mlx5: per-queue PP infrastructure, rate_limit callback,
>        burst pacing devargs (tx_burst_bound, tx_typical_pkt_sz)
>   7. net/mlx5: testpmd command to query per-queue rate state
>   8. ethdev: add rte_eth_get_queue_rate_limit() symmetric getter
>   9. net/mlx5: share PP rate table entries across queues
>   10. net/mlx5: rate table capacity query API
> 
> Usage with testpmd:
>   set port 0 queue 0 rate 1000
>   set port 0 queue 1 rate 5000
>   set port 0 queue 0 rate 0      # disable
>   mlx5 port 0 txq 0 rate show    # query
> 
> Tested on ConnectX-6 Dx only.
> 
> Vincent Jardin (11):
>   doc/nics/mlx5: fix stale packet pacing documentation
>   common/mlx5: query packet pacing rate table capabilities
>   common/mlx5: extend SQ modify to support rate limit update
>   net/mlx5: add per-queue packet pacing infrastructure
>   net/mlx5: support per-queue rate limiting
>   net/mlx5: add burst pacing devargs
>   net/mlx5: add testpmd command to query per-queue rate limit
>   ethdev: add getter for per-queue Tx rate limit
>   mailmap: update Vincent Jardin email address
>   net/mlx5: share pacing rate table entries across queues
>   net/mlx5: add rate table capacity query API
> 
>  .mailmap                             |   3 +-
>  doc/guides/nics/mlx5.rst             | 125 +++++++++++++++++------
>  drivers/common/mlx5/mlx5_devx_cmds.c |  20 ++++
>  drivers/common/mlx5/mlx5_devx_cmds.h |  14 ++-
>  drivers/net/mlx5/mlx5.c              |  46 +++++++++
>  drivers/net/mlx5/mlx5.h              |  13 +++
>  drivers/net/mlx5/mlx5_testpmd.c      |  93 +++++++++++++++++
>  drivers/net/mlx5/mlx5_tx.c           |  89 +++++++++++++++++
>  drivers/net/mlx5/mlx5_tx.h           |   5 +
>  drivers/net/mlx5/mlx5_txpp.c         |  75 ++++++++++++++
>  drivers/net/mlx5/mlx5_txq.c          | 144 +++++++++++++++++++++++++++
>  drivers/net/mlx5/rte_pmd_mlx5.h      |  57 +++++++++++
>  lib/ethdev/ethdev_driver.h           |   7 ++
>  lib/ethdev/rte_ethdev.c              |  28 ++++++
>  lib/ethdev/rte_ethdev.h              |  24 +++++
>  15 files changed, 710 insertions(+), 33 deletions(-)
> 

Lots to digest here, so did a first pass review using AI.

Review: [PATCH v1 00/10] mlx5 per-queue packet pacing rate limiting
Submitter: Vincent Jardin <vjardin@free.fr>

================================================================
Patch 4/10: net/mlx5: add per-queue packet pacing infrastructure
================================================================

Error: Integer overflow in Mbps-to-kbps conversion
  mlx5_txq_alloc_pp_rate_limit() computes:
    rate_kbps = rate_mbps * 1000;
  Both rate_mbps and rate_kbps are uint32_t. If rate_mbps > 4,294,967
  (roughly 4.3 Tbps), the multiplication overflows and silently wraps,
  programming a wrong rate into the HW rate table. While 4 Tbps is
  beyond current HW, the function has no upper-bound validation against
  hca_attr.qos.packet_pacing_max_rate. At minimum, validate rate_mbps
  against the HCA max rate before the multiply, or widen to uint64_t:
    uint64_t rate_kbps = (uint64_t)rate_mbps * 1000;
  and check it fits in the 32-bit PRM field.

Warning: No validation of rate_mbps against HCA min/max rate
  The HCA reports packet_pacing_min_rate and packet_pacing_max_rate
  (queried in patch 2). mlx5_txq_alloc_pp_rate_limit() does not check
  the requested rate against these bounds. A rate below the HW minimum
  will likely be silently rounded or rejected by FW with an opaque
  error. Validating early with a clear log message would be more
  helpful.

================================================================
Patch 5/10: net/mlx5: support per-queue rate limiting
================================================================

Error: PP index leaked when mlx5_devx_cmd_modify_sq() fails on rate set
  In mlx5_set_queue_rate_limit() for the tx_rate > 0 path:
    ret = mlx5_txq_alloc_pp_rate_limit(sh, &txq_ctrl->rl, tx_rate);
    if (ret)
        return ret;
    ...
    ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
    if (ret) {
        ...
        mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
        return ret;
    }
  This looks correct on error -- good. However, note that
  mlx5_txq_alloc_pp_rate_limit() calls mlx5_txq_free_pp_rate_limit()
  internally at the top ("Free previous allocation if any"), meaning
  the OLD PP index is freed BEFORE the new one is allocated, and before
  the SQ is modified. If the new allocation succeeds but modify_sq
  fails, the SQ still has the OLD pp_id programmed, but that old PP
  context was already freed. The SQ is now referencing a freed PP index
  until the next successful set or queue teardown.

  Suggested fix: Do not free the old PP context inside
  mlx5_txq_alloc_pp_rate_limit(). Instead, allocate the new PP into a
  temporary mlx5_txq_rate_limit, modify the SQ, and only on success
  free the old PP and swap in the new one. On failure, free the new PP
  and leave the old one intact.

Warning: mlx5_set_queue_rate_limit return value inconsistency
  The disable path (tx_rate == 0) returns the raw ret from
  mlx5_devx_cmd_modify_sq() without setting rte_errno, while the
  capability-check and validation paths return -rte_errno. The ethdev
  layer expects negative errno values. Verify that
  mlx5_devx_cmd_modify_sq() returns negative errno consistently.

================================================================
Patch 7/10: net/mlx5: add testpmd command to query per-queue rate
================================================================

Error: Inverted return value check for rte_eth_tx_queue_is_valid()
  In rte_pmd_mlx5_txq_rate_limit_query():
    if (rte_eth_tx_queue_is_valid(port_id, queue_id))
        return -EINVAL;
  rte_eth_tx_queue_is_valid() returns 0 on success (valid queue) and
  non-zero on error. This check returns -EINVAL when the queue IS
  valid, and proceeds when it is NOT valid -- the logic is inverted.
  Should be:
    if (!rte_eth_tx_queue_is_valid(port_id, queue_id))
  or:
    if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)

  Without this fix, the query always fails on valid queues and may
  dereference invalid memory on invalid queues.

Warning: SQ object field mismatch between set and query paths
  In mlx5_set_queue_rate_limit() (patch 7 update), the code uses
  txq_ctrl->obj->sq_obj.sq for modify_sq. But in
  rte_pmd_mlx5_txq_rate_limit_query(), the query uses
  txq_ctrl->obj->sq_obj.sq as well (via sq_out). Make sure
  mlx5_devx_cmd_query_sq() exists and accepts this object -- the
  existing codebase has mlx5_devx_cmd_query_sq() but verify the
  parameter is the same sq_obj.sq vs sq distinction.

================================================================
Patch 8/10: ethdev: add getter for per-queue Tx rate limit
================================================================

Warning: New ethdev API needs broader discussion
  Adding a new eth_dev_ops callback (get_queue_rate_limit) changes the
  ethdev driver interface. This typically requires:
  - An RFC or discussion on the mailing list before the patch
  - Agreement from ethdev maintainers (Thomas, Andrew, Ferruh)
  - Consideration of whether this belongs as a generic ethdev API or
    should remain PMD-specific
  The patch adds it only to mlx5; other PMDs that support
  set_queue_rate_limit (ixgbe, i40e, ice) would need implementations
  too, or applications will get inconsistent -ENOTSUP.

Warning: Missing release notes for new ethdev API
  rte_eth_get_queue_rate_limit() is a new public experimental API in
  lib/ethdev. This needs an entry in doc/guides/rel_notes/ for the
  26.07 release.

Warning: Missing RTE_EXPORT_EXPERIMENTAL_SYMBOL for mlx5 PMD functions
  rte_pmd_mlx5_txq_rate_limit_query (patch 7) has the export macro.
  Verify rte_pmd_mlx5_pp_rate_table_query (patch 10) does too -- it
  appears to, which is good.

================================================================
Patch 10/10: net/mlx5: add rate table capacity query API
================================================================

Warning: VLA-sized stack array in rte_pmd_mlx5_pp_rate_table_query()
  The function declares:
    uint16_t seen[RTE_MAX_QUEUES_PER_PORT];
  RTE_MAX_QUEUES_PER_PORT is typically 1024, so this is a 2KB stack
  allocation. While not enormous, this is inside a function that could
  be called from any context. The O(n*m) dedup loop (for each queue,
  scan seen[] for duplicates) is also inefficient for large queue
  counts. Consider using a bitmap or a small hash set, or just
  counting unique pp_ids from the shared context rather than scanning
  all queues.

Warning: "used" count only reflects this port's queues
  The function counts unique PP indices across priv->txqs_n queues
  for one port, but the HW rate table is shared across all ports on
  the same device (shared context). If multiple ports share the same
  PCI device, the "used" count will underreport. Consider iterating
  over all ports on the same sh, or documenting that the count is
  per-port only.

================================================================
General series observations
================================================================

Info: Patch 1 (doc fix) has a Fixes tag referencing the original
  scheduling devargs commit, which is appropriate. However, patches
  2-10 are new features and should not have Fixes tags (they don't,
  which is correct).

Info: The series adds three new experimental PMD APIs and one new
  ethdev API. All new APIs in headers have __rte_experimental on
  their own line, which is correct. The export macros use the 26.07
  version tag.

Info: The series lacks a cover letter in this bundle. For a 10-patch
  series adding a significant new feature, a cover letter explaining
  the overall design and testing would be expected by reviewers.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v2 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                   ` (10 preceding siblings ...)
  2026-03-10 14:20 ` [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
@ 2026-03-10 23:26 ` Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
                     ` (9 more replies)
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
  12 siblings, 10 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

This series adds per-queue Tx data-rate limiting to the mlx5 PMD using
hardware packet pacing (PP), and a symmetric rte_eth_get_queue_rate_limit()
ethdev API to read back the configured rate.

Each Tx queue can be assigned an individual rate (in Mbps) at runtime via
rte_eth_set_queue_rate_limit(). The mlx5 implementation allocates a
dedicated PP index per rate from the HW rate table, programs it into the
SQ via modify_sq, and shares identical rates across queues to conserve
table entries. A PMD-specific API exposes per-queue PP diagnostics and
rate table capacity.

Patch breakdown:

  1. doc/nics/mlx5: fix stale packet pacing documentation
  2-3. common/mlx5: query PP capabilities and extend SQ modify
  4-6. net/mlx5: per-queue PP infrastructure, rate_limit callback,
       burst pacing devargs (tx_burst_bound, tx_typical_pkt_sz)
  7. net/mlx5: testpmd command to query per-queue rate state
  8. ethdev: add rte_eth_get_queue_rate_limit() symmetric getter
  9. net/mlx5: share PP rate table entries across queues
  10. net/mlx5: rate table capacity query API

Usage with testpmd:
  set port 0 queue 0 rate 1000
  set port 0 queue 1 rate 5000
  set port 0 queue 0 rate 0      # disable
  mlx5 port 0 txq 0 rate show    # query

Changes since v1:

Addressed review feedback from Stephen Hemminger's AI:

Patch 4 (per-queue packet pacing infrastructure):
  - Validate rate_mbps against HCA packet_pacing_min_rate and
    packet_pacing_max_rate bounds; return -ERANGE on out-of-range
  - Widen rate_kbps from uint32_t to uint64_t to prevent
    overflow on rate_mbps * 1000
  - Remove early mlx5_txq_free_pp_rate_limit() call from the
    allocator (moved to caller, see patch 5)

Patch 5 (support per-queue rate limiting):
  - Fix PP index leak on modify_sq failure: allocate new PP into a
    temporary struct mlx5_txq_rate_limit; only swap into txq_ctrl->rl
    after modify_sq succeeds. On failure the old PP context stays intact.
  - Set rte_errno = -ret before returning errors from both the
    disable (tx_rate=0) and enable paths

Patch 7 (testpmd command to query per-queue rate limit):
  - Fix inverted rte_eth_tx_queue_is_valid() return value check:
    was "if (rte_eth_tx_queue_is_valid(...))" (accepts invalid queues),
    changed to "if (rte_eth_tx_queue_is_valid(...) != 0)"

Patch 8 (ethdev getter):
  - Add release note for rte_eth_get_queue_rate_limit() in
    doc/guides/rel_notes/release_26_03.rst

Patch 10 (rate table capacity query):
  - Replace uint16_t seen[RTE_MAX_QUEUES_PER_PORT] (2 KB stack array)
    with heap-allocated mlx5_malloc(priv->txqs_n, ...) + mlx5_free()
  - Add early return when txqs == NULL || txqs_n == 0
  - Document in the API Doxygen that "used" reflects only the queried
    port's queues; other ports on the same device may also consume
    rate table entries
  - Add -ENOMEM to documented return values
  - Add release note for mlx5 per-queue rate limiting

Not addressed in v2 (requires discussion):

  - Patch 8: ethdev API breadth: rte_eth_get_queue_rate_limit()
    is currently only implemented by mlx5. Other PMDs (ixgbe, i40e, ice)
    would need implementations for full consistency. Feedbacks are
    welcomed.

Testing:
  - Build: GCC, no warnings
  - devtools/check-git-log.sh -n 11: 11/11 valid
  - devtools/checkpatches.sh -n 11: 10/11 valid (pre-existing
    stdout warning in testpmd command, not introduced by this series)
  - devtools/check-doc-vs-code.sh: clean
  - devtools/check-meson.py: clean

Hardware tested:
  - ConnectX-6 Dx (packet pacing with MLX5_DATA_RATE)

Vincent Jardin (10):
  doc/nics/mlx5: fix stale packet pacing documentation
  common/mlx5: query packet pacing rate table capabilities
  common/mlx5: extend SQ modify to support rate limit update
  net/mlx5: add per-queue packet pacing infrastructure
  net/mlx5: support per-queue rate limiting
  net/mlx5: add burst pacing devargs
  net/mlx5: add testpmd command to query per-queue rate limit
  ethdev: add getter for per-queue Tx rate limit
  net/mlx5: share pacing rate table entries across queues
  net/mlx5: add rate table capacity query API

 doc/guides/nics/mlx5.rst               | 125 +++++++++++++++-----
 doc/guides/rel_notes/release_26_03.rst |  10 ++
 drivers/common/mlx5/mlx5_devx_cmds.c   |  20 ++++
 drivers/common/mlx5/mlx5_devx_cmds.h   |  14 ++-
 drivers/net/mlx5/mlx5.c                |  46 ++++++++
 drivers/net/mlx5/mlx5.h                |  13 +++
 drivers/net/mlx5/mlx5_testpmd.c        |  93 +++++++++++++++
 drivers/net/mlx5/mlx5_tx.c             | 105 ++++++++++++++++-
 drivers/net/mlx5/mlx5_tx.h             |   5 +
 drivers/net/mlx5/mlx5_txpp.c           |  86 ++++++++++++++
 drivers/net/mlx5/mlx5_txq.c            | 151 +++++++++++++++++++++++++
 drivers/net/mlx5/rte_pmd_mlx5.h        |  62 ++++++++++
 lib/ethdev/ethdev_driver.h             |   7 ++
 lib/ethdev/rte_ethdev.c                |  28 +++++
 lib/ethdev/rte_ethdev.h                |  24 ++++
 15 files changed, 755 insertions(+), 33 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v2 01/10] doc/nics/mlx5: fix stale packet pacing documentation
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
@ 2026-03-10 23:26   ` Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
                     ` (8 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

The Tx Scheduling section incorrectly stated that timestamps can only
be put on the first packet in a burst. The driver actually checks every
packet's ol_flags for the timestamp dynamic flag and inserts a dedicated
WAIT WQE per timestamped packet. The eMPW path also breaks batches when
a timestamped packet is encountered.

Additionally, the ConnectX-7+ wait-on-time capability was only briefly
mentioned in the tx_pp parameter section with no explanation of how it
differs from the ConnectX-6 Dx Clock Queue approach.

This patch:
- Removes the stale first-packet-only limitation
- Documents both scheduling mechanisms (ConnectX-6 Dx Clock Queue and
  ConnectX-7+ wait-on-time) with separate requirements tables
- Clarifies that tx_pp is specific to ConnectX-6 Dx
- Fixes tx_skew applicability to cover both hardware generations
- Updates the Send Scheduling Counters intro to reflect that timestamp
  validation counters also apply to ConnectX-7+ wait-on-time mode

Fixes: 8f848f32fc24 ("net/mlx5: introduce send scheduling devargs")

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/mlx5.rst | 109 ++++++++++++++++++++++++++++-----------
 1 file changed, 78 insertions(+), 31 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 2529c2f4c8..5b097dbc90 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -553,27 +553,32 @@ for an additional list of options shared with other mlx5 drivers.
 
 - ``tx_pp`` parameter [int]
 
+  This parameter applies to **ConnectX-6 Dx** only.
   If a nonzero value is specified the driver creates all necessary internal
-  objects to provide accurate packet send scheduling on mbuf timestamps.
+  objects (Clock Queue and Rearm Queue) to provide accurate packet send
+  scheduling on mbuf timestamps using a cross-channel approach.
   The positive value specifies the scheduling granularity in nanoseconds,
   the packet send will be accurate up to specified digits. The allowed range is
   from 500 to 1 million of nanoseconds. The negative value specifies the module
   of granularity and engages the special test mode the check the schedule rate.
   By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
-  feature is disabled.
+  feature is disabled on ConnectX-6 Dx.
 
-  Starting with ConnectX-7 the capability to schedule traffic directly
-  on timestamp specified in descriptor is provided,
-  no extra objects are needed anymore and scheduling capability
-  is advertised and handled regardless ``tx_pp`` parameter presence.
+  Starting with **ConnectX-7** the hardware provides a native wait-on-time
+  capability that inserts the scheduling delay directly in the WQE descriptor.
+  No Clock Queue or Rearm Queue is needed and the ``tx_pp`` parameter is not
+  required. The driver automatically advertises send scheduling support when
+  the HCA wait-on-time capability is detected. The ``tx_skew`` parameter can
+  still be used on ConnectX-7 and above to compensate for wire delay.
 
 - ``tx_skew`` parameter [int]
 
   The parameter adjusts the send packet scheduling on timestamps and represents
   the average delay between beginning of the transmitting descriptor processing
   by the hardware and appearance of actual packet data on the wire. The value
-  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
-  specified. The default value is zero.
+  should be provided in nanoseconds and applies to both ConnectX-6 Dx
+  (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
+  The default value is zero.
 
 - ``tx_vec_en`` parameter [int]
 
@@ -883,9 +888,13 @@ Send Scheduling Counters
 
 The mlx5 PMD provides a comprehensive set of counters designed for
 debugging and diagnostics related to packet scheduling during transmission.
-These counters are applicable only if the port was configured with the ``tx_pp`` devarg
-and reflect the status of the PMD scheduling infrastructure
-based on Clock and Rearm Queues, used as a workaround on ConnectX-6 DX NICs.
+The first group of counters (prefixed ``tx_pp_``) reflects the status of the
+Clock Queue and Rearm Queue infrastructure used on ConnectX-6 Dx and is
+applicable only if the port was configured with the ``tx_pp`` devarg.
+The timestamp validation counters
+(``tx_pp_timestamp_past_errors``, ``tx_pp_timestamp_future_errors``,
+``tx_pp_timestamp_order_errors``) are also reported on ConnectX-7 and above
+in wait-on-time mode, without requiring ``tx_pp``.
 
 ``tx_pp_missed_interrupt_errors``
   Indicates that the Rearm Queue interrupt was not serviced on time.
@@ -1960,31 +1969,54 @@ Limitations
 Tx Scheduling
 ~~~~~~~~~~~~~
 
-When PMD sees the ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on the packet
-being sent it tries to synchronize the time of packet appearing on
-the wire with the specified packet timestamp. If the specified one
-is in the past it should be ignored, if one is in the distant future
-it should be capped with some reasonable value (in range of seconds).
-These specific cases ("too late" and "distant future") can be optionally
-reported via device xstats to assist applications to detect the
-time-related problems.
-
-The timestamp upper "too-distant-future" limit
-at the moment of invoking the Tx burst routine
-can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
+When the PMD sees ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on a packet
+being sent it inserts a dedicated WAIT WQE to synchronize the time of the
+packet appearing on the wire with the specified timestamp. Every packet
+in a burst that carries the timestamp dynamic flag is individually
+scheduled -- there is no restriction to the first packet only.
+
+If the specified timestamp is in the past, the packet is sent immediately.
+If it is in the distant future it should be capped with some reasonable
+value (in range of seconds). These specific cases ("too late" and
+"distant future") can be optionally reported via device xstats to assist
+applications to detect time-related problems.
+
+The eMPW (enhanced Multi-Packet Write) data path automatically breaks
+the batch when a timestamped packet is encountered, ensuring each
+scheduled packet gets its own WAIT WQE.
+
+Two hardware mechanisms are supported:
+
+**ConnectX-6 Dx -- Clock Queue (cross-channel)**
+   The driver creates a Clock Queue and a Rearm Queue that together
+   provide a time reference for scheduling. This mode requires the
+   :ref:`tx_pp <mlx5_tx_pp_param>` devarg. The timestamp upper
+   "too-distant-future" limit at the moment of invoking the Tx burst
+   routine can be estimated as ``tx_pp`` (in nanoseconds) multiplied
+   by 2^23.
+
+**ConnectX-7 and above -- wait-on-time**
+   The hardware supports placing the scheduling delay directly inside
+   the WQE descriptor. No Clock Queue or Rearm Queue is needed and the
+   ``tx_pp`` devarg is **not** required. The driver automatically
+   advertises send scheduling support when the HCA wait-on-time
+   capability is detected.
+
 Please note, for the testpmd txonly mode,
 the limit is deduced from the expression::
 
    (n_tx_descriptors / burst_size + 1) * inter_burst_gap
 
-There is no any packet reordering according timestamps is supposed,
-neither within packet burst, nor between packets, it is an entirely
-application responsibility to generate packets and its timestamps
-in desired order.
+There is no packet reordering according to timestamps,
+neither within a packet burst, nor between packets. It is entirely the
+application's responsibility to generate packets and their timestamps
+in the desired order.
 
 Requirements
 ^^^^^^^^^^^^
 
+ConnectX-6 Dx (Clock Queue mode):
+
 =========  =============
 Minimum    Version
 =========  =============
@@ -1996,20 +2028,35 @@ rdma-core
 DPDK       20.08
 =========  =============
 
+ConnectX-7 and above (wait-on-time mode):
+
+=========  =============
+Minimum    Version
+=========  =============
+hardware   ConnectX-7
+=========  =============
+
 Firmware configuration
 ^^^^^^^^^^^^^^^^^^^^^^
 
 Runtime configuration
 ^^^^^^^^^^^^^^^^^^^^^
 
-To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
-parameter should be specified.
+**ConnectX-6 Dx**: the :ref:`tx_pp <mlx5_tx_pp_param>` parameter must be
+specified to enable send scheduling on mbuf timestamps.
+
+**ConnectX-7+**: no devarg is required. Send scheduling is automatically
+enabled when the HCA reports the wait-on-time capability.
+
+On both hardware generations the ``tx_skew`` parameter can be used to
+compensate for the delay between descriptor processing and actual wire
+time.
 
 Limitations
 ^^^^^^^^^^^
 
-#. The timestamps can be put only in the first packet
-   in the burst providing the entire burst scheduling.
+#. On ConnectX-6 Dx (Clock Queue mode) timestamps too far in the future
+   are capped (see the ``tx_pp`` x 2^23 limit above).
 
 
 .. _mlx5_tx_inline:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 02/10] common/mlx5: query packet pacing rate table capabilities
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
@ 2026-03-10 23:26   ` Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
                     ` (7 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Query additional QoS packet pacing capabilities from HCA attributes:
- packet_pacing_burst_bound: HW supports burst_upper_bound parameter
- packet_pacing_typical_size: HW supports typical_packet_size parameter
- packet_pacing_max_rate / packet_pacing_min_rate: rate range in kbps
- packet_pacing_rate_table_size: number of HW rate table entries

These capabilities are needed by the upcoming per-queue rate limiting
feature to validate devarg values and report HW limits.

Supported hardware:
- ConnectX-6 Dx and later (different boards expose different subsets)
- ConnectX-5 reports packet_pacing but not all extended fields
- ConnectX-7/8 report the full capability set
- BlueField-2 and later DPUs also report these capabilities

Not supported:
- ConnectX-4 Lx and earlier (no packet_pacing capability at all)
- ConnectX-5 Ex may not report burst_bound or typical_size

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 15 +++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 11 ++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index d12ebf8487..8f53303fa7 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1244,6 +1244,21 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 				MLX5_GET(qos_cap, hcattr, packet_pacing);
 		attr->qos.wqe_rate_pp =
 				MLX5_GET(qos_cap, hcattr, wqe_rate_pp);
+		attr->qos.packet_pacing_burst_bound =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_burst_bound);
+		attr->qos.packet_pacing_typical_size =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_typical_size);
+		attr->qos.packet_pacing_max_rate =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_max_rate);
+		attr->qos.packet_pacing_min_rate =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_min_rate);
+		attr->qos.packet_pacing_rate_table_size =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_rate_table_size);
 		if (attr->qos.flow_meter_aso_sup) {
 			attr->qos.log_meter_aso_granularity =
 				MLX5_GET(qos_cap, hcattr,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index da50fc686c..930ae2c072 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -67,7 +67,16 @@ struct mlx5_hca_qos_attr {
 	/* Power of the maximum allocation granularity Object. */
 	uint32_t log_max_num_meter_aso:5;
 	/* Power of the maximum number of supported objects. */
-
+	uint32_t packet_pacing_burst_bound:1;
+	/* HW supports burst_upper_bound PP parameter. */
+	uint32_t packet_pacing_typical_size:1;
+	/* HW supports typical_packet_size PP parameter. */
+	uint32_t packet_pacing_max_rate;
+	/* Maximum supported pacing rate in kbps. */
+	uint32_t packet_pacing_min_rate;
+	/* Minimum supported pacing rate in kbps. */
+	uint16_t packet_pacing_rate_table_size;
+	/* Number of entries in the HW rate table. */
 };
 
 struct mlx5_hca_vdpa_attr {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 03/10] common/mlx5: extend SQ modify to support rate limit update
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
@ 2026-03-10 23:26   ` Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
                     ` (6 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Add rl_update and packet_pacing_rate_limit_index fields to
mlx5_devx_modify_sq_attr. When rl_update is set, the modify SQ
command sets modify_bitmask bit 0 and writes the PP index into
the SQ context, allowing dynamic rate changes on a live RDY SQ
without teardown.

modify_sq_in.modify_bitmask[0x40] bit 0 controls the
packet_pacing_rate_limit_index.

Supported hardware:
- ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
- ConnectX-7/8: same SQ context field, also supports wait-on-time
- BlueField-2/3: same modify_sq command support

Not supported:
- ConnectX-5: supports packet_pacing but only at SQ creation time,
  dynamic modify_bitmask update may not be supported on all FW
- ConnectX-4 Lx and earlier: no packet_pacing support

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 5 +++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 8f53303fa7..17378e1753 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2129,6 +2129,11 @@ mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
 	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
 	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
 	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	if (sq_attr->rl_update) {
+		MLX5_SET64(modify_sq_in, in, modify_bitmask, 1);
+		MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+			 sq_attr->packet_pacing_rate_limit_index);
+	}
 	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
 					 out, sizeof(out));
 	if (ret) {
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 930ae2c072..82d949972b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -519,6 +519,9 @@ struct mlx5_devx_modify_sq_attr {
 	uint32_t state:4;
 	uint32_t hairpin_peer_rq:24;
 	uint32_t hairpin_peer_vhca:16;
+	uint32_t rl_update:1;
+	/* Set to update packet_pacing_rate_limit_index on a live SQ. */
+	uint32_t packet_pacing_rate_limit_index:16;
 };
 
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 04/10] net/mlx5: add per-queue packet pacing infrastructure
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
                     ` (2 preceding siblings ...)
  2026-03-10 23:26   ` [PATCH v2 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
@ 2026-03-10 23:26   ` Vincent Jardin
  2026-03-11 16:29     ` Stephen Hemminger
  2026-03-10 23:26   ` [PATCH v2 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
                     ` (5 subsequent siblings)
  9 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Add mlx5_txq_rate_limit structure and alloc/free helpers for
per-queue data-rate packet pacing. Each Tx queue can now hold
its own PP (Packet Pacing) index allocated via mlx5dv_pp_alloc()
with MLX5_DATA_RATE mode.

mlx5_txq_alloc_pp_rate_limit() converts Mbps to kbps for the PRM
rate_limit field and allocates a dedicated PP index from the HW
rate table. mlx5_txq_free_pp_rate_limit() releases it.

The existing Clock Queue path (sh->txpp.pp / sh->txpp.pp_id) is
untouched — it uses MLX5_WQE_RATE for per-packet scheduling,
while per-queue rate limiting uses MLX5_DATA_RATE.

PP index cleanup is added to mlx5_txq_release() to prevent leaks
when queues are destroyed.

Supported hardware:
- ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
- ConnectX-7/8: same mechanism, plus wait-on-time coexistence
- BlueField-2/3: same PP allocation support

Not supported:
- ConnectX-5: packet_pacing exists but MLX5_DATA_RATE mode may
  not be available on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.h      | 11 ++++++
 drivers/net/mlx5/mlx5_tx.h   |  1 +
 drivers/net/mlx5/mlx5_txpp.c | 75 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_txq.c  |  1 +
 4 files changed, 88 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b83dda5652..c48c3072d1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1296,6 +1296,13 @@ struct mlx5_txpp_ts {
 	RTE_ATOMIC(uint64_t) ts;
 };
 
+/* Per-queue rate limit tracking. */
+struct mlx5_txq_rate_limit {
+	void *pp;		/* Packet pacing context from dv_alloc_pp. */
+	uint16_t pp_id;		/* Packet pacing index. */
+	uint32_t rate_mbps;	/* Current rate in Mbps, 0 = disabled. */
+};
+
 /* Tx packet pacing structure. */
 struct mlx5_dev_txpp {
 	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */
@@ -2634,6 +2641,10 @@ int mlx5_txpp_xstats_get_names(struct rte_eth_dev *dev,
 void mlx5_txpp_interrupt_handler(void *cb_arg);
 int mlx5_txpp_map_hca_bar(struct rte_eth_dev *dev);
 void mlx5_txpp_unmap_hca_bar(struct rte_eth_dev *dev);
+int mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
+				 struct mlx5_txq_rate_limit *rl,
+				 uint32_t rate_mbps);
+void mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rl);
 
 /* mlx5_rxtx.c */
 
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 0134a2e003..b1b3653247 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -192,6 +192,7 @@ struct mlx5_txq_ctrl {
 	uint16_t dump_file_n; /* Number of dump files. */
 	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	uint32_t hairpin_status; /* Hairpin binding status. */
+	struct mlx5_txq_rate_limit rl; /* Per-queue rate limit. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 0e99b58bde..5469120a83 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -128,6 +128,81 @@ mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared *sh)
 #endif
 }
 
+/* Free a per-queue packet pacing index. */
+void
+mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rl)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	if (rl->pp) {
+		mlx5_glue->dv_free_pp(rl->pp);
+		rl->pp = NULL;
+		rl->pp_id = 0;
+		rl->rate_mbps = 0;
+	}
+#else
+	RTE_SET_USED(rl);
+#endif
+}
+
+/* Allocate a per-queue packet pacing index for data-rate limiting. */
+int
+mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
+			     struct mlx5_txq_rate_limit *rl,
+			     uint32_t rate_mbps)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	uint32_t pp[MLX5_ST_SZ_DW(set_pp_rate_limit_context)];
+	uint64_t rate_kbps;
+	struct mlx5_hca_qos_attr *qos = &sh->cdev->config.hca_attr.qos;
+
+	MLX5_ASSERT(rate_mbps > 0);
+	rate_kbps = (uint64_t)rate_mbps * 1000;
+	if (qos->packet_pacing_min_rate && rate_kbps < qos->packet_pacing_min_rate) {
+		DRV_LOG(ERR, "Rate %u Mbps below HW minimum (%u kbps).",
+			rate_mbps, qos->packet_pacing_min_rate);
+		rte_errno = ERANGE;
+		return -ERANGE;
+	}
+	if (qos->packet_pacing_max_rate && rate_kbps > qos->packet_pacing_max_rate) {
+		DRV_LOG(ERR, "Rate %u Mbps exceeds HW maximum (%u kbps).",
+			rate_mbps, qos->packet_pacing_max_rate);
+		rte_errno = ERANGE;
+		return -ERANGE;
+	}
+	memset(&pp, 0, sizeof(pp));
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, (uint32_t)rate_kbps);
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
+	rl->pp = mlx5_glue->dv_alloc_pp
+				(sh->cdev->ctx, sizeof(pp), &pp,
+				 MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX);
+	if (rl->pp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
+			rate_mbps);
+		rte_errno = errno;
+		return -errno;
+	}
+	rl->pp_id = ((struct mlx5dv_pp *)(rl->pp))->index;
+	if (!rl->pp_id) {
+		DRV_LOG(ERR, "Zero PP index allocated for rate %u Mbps.",
+			rate_mbps);
+		mlx5_txq_free_pp_rate_limit(rl);
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	rl->rate_mbps = rate_mbps;
+	DRV_LOG(DEBUG, "Allocated PP index %u for rate %u Mbps.",
+		rl->pp_id, rate_mbps);
+	return 0;
+#else
+	RTE_SET_USED(sh);
+	RTE_SET_USED(rl);
+	RTE_SET_USED(rate_mbps);
+	DRV_LOG(ERR, "Per-queue rate limit requires rdma-core PP support.");
+	rte_errno = ENOTSUP;
+	return -ENOTSUP;
+#endif
+}
+
 static void
 mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)
 {
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9275efb58e..fa9bb48fd4 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1338,6 +1338,7 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
 	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
 	if (rte_atomic_fetch_sub_explicit(&txq_ctrl->refcnt, 1, rte_memory_order_relaxed) - 1 > 1)
 		return 1;
+	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
 	if (txq_ctrl->obj) {
 		priv->obj_ops.txq_obj_release(txq_ctrl->obj);
 		LIST_REMOVE(txq_ctrl->obj, next);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 05/10] net/mlx5: support per-queue rate limiting
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
                     ` (3 preceding siblings ...)
  2026-03-10 23:26   ` [PATCH v2 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
@ 2026-03-10 23:26   ` Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
                     ` (4 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Wire rte_eth_set_queue_rate_limit() to the mlx5 PMD. The callback
allocates a per-queue PP index with the requested data rate, then
modifies the live SQ via modify_bitmask bit 0 to apply the new
packet_pacing_rate_limit_index — no queue teardown required.

Setting tx_rate=0 clears the PP index on the SQ and frees it.

Capability check uses hca_attr.qos.packet_pacing directly (not
dev_cap.txpp_en which requires Clock Queue prerequisites). This
allows per-queue rate limiting without the tx_pp devarg.

The callback rejects hairpin queues and queues whose SQ is not
yet created.

testpmd usage (no testpmd changes needed):
  set port 0 queue 0 rate 1000
  set port 0 queue 1 rate 5000
  set port 0 queue 0 rate 0     # disable

Supported hardware:
- ConnectX-6 Dx: full support, per-SQ rate via HW rate table
- ConnectX-7/8: full support, coexists with wait-on-time scheduling
- BlueField-2/3: full support as DPU rep ports

Not supported:
- ConnectX-5: packet_pacing exists but dynamic SQ modify may not
  work on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.c     |   2 +
 drivers/net/mlx5/mlx5_tx.h  |   2 +
 drivers/net/mlx5/mlx5_txq.c | 103 ++++++++++++++++++++++++++++++++++++
 3 files changed, 107 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4d3bfddc36..c390406ac7 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2690,6 +2690,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 	.get_restore_flags = mlx5_get_restore_flags,
+	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2783,6 +2784,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.count_aggr_ports = mlx5_count_aggr_ports,
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.get_restore_flags = mlx5_get_restore_flags,
+	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index b1b3653247..3a37f5bb4d 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -222,6 +222,8 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint32_t tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index fa9bb48fd4..7863b529f6 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1363,6 +1363,109 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
 	return 0;
 }
 
+/**
+ * Set per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param tx_rate
+ *   TX rate in Mbps, 0 to disable rate limiting.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			  uint32_t tx_rate)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	int ret;
+
+	if (!sh->cdev->config.hca_attr.qos.packet_pacing) {
+		DRV_LOG(ERR, "Port %u packet pacing not supported.",
+			dev->data->port_id);
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	if (queue_idx >= dev->data->nb_tx_queues) {
+		DRV_LOG(ERR, "Port %u Tx queue %u out of range.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u not configured.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = container_of((*priv->txqs)[queue_idx],
+				struct mlx5_txq_ctrl, txq);
+	if (txq_ctrl->is_hairpin) {
+		DRV_LOG(ERR, "Port %u Tx queue %u is hairpin.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (txq_ctrl->obj == NULL || txq_ctrl->obj->sq == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (tx_rate == 0) {
+		/* Disable rate limiting. */
+		if (txq_ctrl->rl.pp_id == 0)
+			return 0; /* Already disabled. */
+		sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.rl_update = 1;
+		sq_attr.packet_pacing_rate_limit_index = 0;
+		ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+		if (ret) {
+			DRV_LOG(ERR,
+				"Port %u Tx queue %u failed to clear rate.",
+				dev->data->port_id, queue_idx);
+			rte_errno = -ret;
+			return ret;
+		}
+		mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
+		DRV_LOG(DEBUG, "Port %u Tx queue %u rate limit disabled.",
+			dev->data->port_id, queue_idx);
+		return 0;
+	}
+	/* Allocate a new PP index for the requested rate into a temp. */
+	struct mlx5_txq_rate_limit new_rl = { 0 };
+
+	ret = mlx5_txq_alloc_pp_rate_limit(sh, &new_rl, tx_rate);
+	if (ret)
+		return ret;
+	/* Modify live SQ to use the new PP index. */
+	sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+	sq_attr.state = MLX5_SQC_STATE_RDY;
+	sq_attr.rl_update = 1;
+	sq_attr.packet_pacing_rate_limit_index = new_rl.pp_id;
+	ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u Mbps.",
+			dev->data->port_id, queue_idx, tx_rate);
+		mlx5_txq_free_pp_rate_limit(&new_rl);
+		rte_errno = -ret;
+		return ret;
+	}
+	/* SQ updated — release old PP context, install new one. */
+	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
+	txq_ctrl->rl = new_rl;
+	DRV_LOG(DEBUG, "Port %u Tx queue %u rate set to %u Mbps (PP idx %u).",
+		dev->data->port_id, queue_idx, tx_rate, txq_ctrl->rl.pp_id);
+	return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 06/10] net/mlx5: add burst pacing devargs
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
                     ` (4 preceding siblings ...)
  2026-03-10 23:26   ` [PATCH v2 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
@ 2026-03-10 23:26   ` Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
                     ` (3 subsequent siblings)
  9 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Expose burst_upper_bound and typical_packet_size from the PRM
set_pp_rate_limit_context as devargs:
- tx_burst_bound=<bytes>: max burst before rate evaluation kicks in
- tx_typical_pkt_sz=<bytes>: typical packet size for accuracy

These parameters apply to both per-queue rate limiting
(rte_eth_set_queue_rate_limit) and Clock Queue pacing (tx_pp).

Values are validated against HCA capabilities
(packet_pacing_burst_bound and packet_pacing_typical_size).
If the HW does not support them, a warning is logged and the
value is silently zeroed. Test mode still overrides both values.

Shared context mismatch checks ensure all ports on the same
device use the same burst parameters.

Supported hardware:
- ConnectX-6 Dx: burst_upper_bound and typical_packet_size
  reported via packet_pacing_burst_bound / packet_pacing_typical_size
  QoS capability bits
- ConnectX-7/8: full support for both parameters
- BlueField-2/3: same capabilities as host-side ConnectX

Not supported:
- ConnectX-5: may not report burst_bound or typical_size caps
- ConnectX-4 Lx and earlier: no packet_pacing at all

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/mlx5.rst     | 16 ++++++++++++++
 drivers/net/mlx5/mlx5.c      | 42 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5.h      |  2 ++
 drivers/net/mlx5/mlx5_txpp.c | 12 +++++++++++
 4 files changed, 72 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 5b097dbc90..2507fae846 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -580,6 +580,22 @@ for an additional list of options shared with other mlx5 drivers.
   (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
   The default value is zero.
 
+- ``tx_burst_bound`` parameter [int]
+
+  Specifies the burst upper bound in bytes for packet pacing rate evaluation.
+  When set, the hardware considers this burst size when enforcing the configured
+  rate limit. Only effective when the HCA reports ``packet_pacing_burst_bound``
+  capability. Applies to both per-queue rate limiting
+  (``rte_eth_set_queue_rate_limit()``) and Clock Queue pacing (``tx_pp``).
+  The default value is zero (hardware default).
+
+- ``tx_typical_pkt_sz`` parameter [int]
+
+  Specifies the typical packet size in bytes for packet pacing rate accuracy
+  improvement. Only effective when the HCA reports
+  ``packet_pacing_typical_size`` capability. Applies to both per-queue rate
+  limiting and Clock Queue pacing. The default value is zero (hardware default).
+
 - ``tx_vec_en`` parameter [int]
 
   A nonzero value enables Tx vector with ConnectX-5 NICs and above.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index c390406ac7..f399e0d5c9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -119,6 +119,18 @@
  */
 #define MLX5_TX_SKEW "tx_skew"
 
+/*
+ * Device parameter to specify burst upper bound in bytes
+ * for packet pacing rate evaluation.
+ */
+#define MLX5_TX_BURST_BOUND "tx_burst_bound"
+
+/*
+ * Device parameter to specify typical packet size in bytes
+ * for packet pacing rate accuracy improvement.
+ */
+#define MLX5_TX_TYPICAL_PKT_SZ "tx_typical_pkt_sz"
+
 /*
  * Device parameter to enable hardware Tx vector.
  * Deprecated, ignored (no vectorized Tx routines anymore).
@@ -1405,6 +1417,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->tx_pp = tmp;
 	} else if (strcmp(MLX5_TX_SKEW, key) == 0) {
 		config->tx_skew = tmp;
+	} else if (strcmp(MLX5_TX_BURST_BOUND, key) == 0) {
+		config->tx_burst_bound = tmp;
+	} else if (strcmp(MLX5_TX_TYPICAL_PKT_SZ, key) == 0) {
+		config->tx_typical_pkt_sz = tmp;
 	} else if (strcmp(MLX5_L3_VXLAN_EN, key) == 0) {
 		config->l3_vxlan_en = !!tmp;
 	} else if (strcmp(MLX5_VF_NL_EN, key) == 0) {
@@ -1518,8 +1534,10 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 				struct mlx5_sh_config *config)
 {
 	const char **params = (const char *[]){
+		MLX5_TX_BURST_BOUND,
 		MLX5_TX_PP,
 		MLX5_TX_SKEW,
+		MLX5_TX_TYPICAL_PKT_SZ,
 		MLX5_L3_VXLAN_EN,
 		MLX5_VF_NL_EN,
 		MLX5_DV_ESW_EN,
@@ -1626,6 +1644,18 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(WARNING,
 			"\"tx_skew\" doesn't affect without \"tx_pp\".");
 	}
+	if (config->tx_burst_bound &&
+	    !sh->cdev->config.hca_attr.qos.packet_pacing_burst_bound) {
+		DRV_LOG(WARNING,
+			"HW does not support burst_upper_bound, ignoring.");
+		config->tx_burst_bound = 0;
+	}
+	if (config->tx_typical_pkt_sz &&
+	    !sh->cdev->config.hca_attr.qos.packet_pacing_typical_size) {
+		DRV_LOG(WARNING,
+			"HW does not support typical_packet_size, ignoring.");
+		config->tx_typical_pkt_sz = 0;
+	}
 	/* Check for LRO support. */
 	if (mlx5_devx_obj_ops_en(sh) && sh->cdev->config.hca_attr.lro_cap) {
 		/* TBD check tunnel lro caps. */
@@ -3260,6 +3290,18 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.tx_burst_bound != config->tx_burst_bound) {
+		DRV_LOG(ERR, "\"tx_burst_bound\" "
+			"configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
+	if (sh->config.tx_typical_pkt_sz != config->tx_typical_pkt_sz) {
+		DRV_LOG(ERR, "\"tx_typical_pkt_sz\" "
+			"configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.txq_mem_algn != config->txq_mem_algn) {
 		DRV_LOG(ERR, "\"TxQ memory alignment\" "
 			"configuration mismatch for shared %s context. %u - %u",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c48c3072d1..a8d71482ac 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -382,6 +382,8 @@ struct mlx5_port_config {
 struct mlx5_sh_config {
 	int tx_pp; /* Timestamp scheduling granularity in nanoseconds. */
 	int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
+	uint32_t tx_burst_bound; /* Burst upper bound in bytes, 0 = default. */
+	uint32_t tx_typical_pkt_sz; /* Typical packet size in bytes, 0 = default. */
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 5469120a83..b87565778e 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -88,6 +88,12 @@ mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared *sh)
 	rate = NS_PER_S / sh->txpp.tick;
 	if (rate * sh->txpp.tick != NS_PER_S)
 		DRV_LOG(WARNING, "Packet pacing frequency is not precise.");
+	if (sh->config.tx_burst_bound)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 burst_upper_bound, sh->config.tx_burst_bound);
+	if (sh->config.tx_typical_pkt_sz)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 typical_packet_size, sh->config.tx_typical_pkt_sz);
 	if (sh->txpp.test) {
 		uint32_t len;
 
@@ -172,6 +178,12 @@ mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
 	memset(&pp, 0, sizeof(pp));
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, (uint32_t)rate_kbps);
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
+	if (sh->config.tx_burst_bound)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 burst_upper_bound, sh->config.tx_burst_bound);
+	if (sh->config.tx_typical_pkt_sz)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 typical_packet_size, sh->config.tx_typical_pkt_sz);
 	rl->pp = mlx5_glue->dv_alloc_pp
 				(sh->cdev->ctx, sizeof(pp), &pp,
 				 MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 07/10] net/mlx5: add testpmd command to query per-queue rate limit
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
                     ` (5 preceding siblings ...)
  2026-03-10 23:26   ` [PATCH v2 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
@ 2026-03-10 23:26   ` Vincent Jardin
  2026-03-11 16:31     ` Stephen Hemminger
  2026-03-10 23:26   ` [PATCH v2 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
                     ` (2 subsequent siblings)
  9 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Add a new testpmd command to display the per-queue packet pacing
rate limit state, including the PP index from both driver state
and FW SQ context readback:

  testpmd> mlx5 port <port_id> txq <queue_id> rate show

This helps verify that the FW actually applied the PP index to
the SQ after setting a per-queue rate limit.

Expose a new PMD API rte_pmd_mlx5_txq_rate_limit_query() that
queries txq_ctrl->rl for driver state and mlx5_devx_cmd_query_sq()
for the FW packet_pacing_rate_limit_index field.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_testpmd.c | 93 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c      | 40 +++++++++++++-
 drivers/net/mlx5/mlx5_txq.c     | 19 +++++--
 drivers/net/mlx5/rte_pmd_mlx5.h | 30 +++++++++++
 4 files changed, 178 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_testpmd.c b/drivers/net/mlx5/mlx5_testpmd.c
index 1bb5a89559..fd3efecc5d 100644
--- a/drivers/net/mlx5/mlx5_testpmd.c
+++ b/drivers/net/mlx5/mlx5_testpmd.c
@@ -1365,6 +1365,94 @@ cmdline_parse_inst_t mlx5_cmd_dump_rq_context_options = {
 	}
 };
 
+/* Show per-queue rate limit PP index for a given port/queue */
+struct mlx5_cmd_show_rate_limit_options {
+	cmdline_fixed_string_t mlx5;
+	cmdline_fixed_string_t port;
+	portid_t port_id;
+	cmdline_fixed_string_t txq;
+	queueid_t queue_id;
+	cmdline_fixed_string_t rate;
+	cmdline_fixed_string_t show;
+};
+
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_mlx5 =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 mlx5, "mlx5");
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_port =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 port, "port");
+cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_port_id =
+	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+			      port_id, RTE_UINT16);
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_txq =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 txq, "txq");
+cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_queue_id =
+	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+			      queue_id, RTE_UINT16);
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_rate =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 rate, "rate");
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_show =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 show, "show");
+
+static void
+mlx5_cmd_show_rate_limit_parsed(void *parsed_result,
+				__rte_unused struct cmdline *cl,
+				__rte_unused void *data)
+{
+	struct mlx5_cmd_show_rate_limit_options *res = parsed_result;
+	struct rte_pmd_mlx5_txq_rate_limit_info info;
+	int ret;
+
+	ret = rte_pmd_mlx5_txq_rate_limit_query(res->port_id, res->queue_id,
+						 &info);
+	switch (ret) {
+	case 0:
+		break;
+	case -ENODEV:
+		fprintf(stderr, "invalid port_id %u\n", res->port_id);
+		return;
+	case -EINVAL:
+		fprintf(stderr, "invalid queue index (%u), out of range\n",
+			res->queue_id);
+		return;
+	case -EIO:
+		fprintf(stderr, "failed to query SQ context\n");
+		return;
+	default:
+		fprintf(stderr, "query failed (%d)\n", ret);
+		return;
+	}
+	fprintf(stdout, "Port %u Txq %u rate limit info:\n",
+		res->port_id, res->queue_id);
+	if (info.rate_mbps > 0)
+		fprintf(stdout, "  Configured rate: %u Mbps\n",
+			info.rate_mbps);
+	else
+		fprintf(stdout, "  Configured rate: disabled\n");
+	fprintf(stdout, "  PP index (driver): %u\n", info.pp_index);
+	fprintf(stdout, "  PP index (FW readback): %u\n", info.fw_pp_index);
+}
+
+cmdline_parse_inst_t mlx5_cmd_show_rate_limit = {
+	.f = mlx5_cmd_show_rate_limit_parsed,
+	.data = NULL,
+	.help_str = "mlx5 port <port_id> txq <queue_id> rate show",
+	.tokens = {
+		(void *)&mlx5_cmd_show_rate_limit_mlx5,
+		(void *)&mlx5_cmd_show_rate_limit_port,
+		(void *)&mlx5_cmd_show_rate_limit_port_id,
+		(void *)&mlx5_cmd_show_rate_limit_txq,
+		(void *)&mlx5_cmd_show_rate_limit_queue_id,
+		(void *)&mlx5_cmd_show_rate_limit_rate,
+		(void *)&mlx5_cmd_show_rate_limit_show,
+		NULL,
+	}
+};
+
 static struct testpmd_driver_commands mlx5_driver_cmds = {
 	.commands = {
 		{
@@ -1440,6 +1528,11 @@ static struct testpmd_driver_commands mlx5_driver_cmds = {
 			.help = "mlx5 port (port_id) queue (queue_id) dump rq_context (file_name)\n"
 				"    Dump mlx5 RQ Context\n\n",
 		},
+		{
+			.ctx = &mlx5_cmd_show_rate_limit,
+			.help = "mlx5 port (port_id) txq (queue_id) rate show\n"
+				"    Show per-queue rate limit PP index\n\n",
+		},
 		{
 			.ctx = NULL,
 		},
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 8085b5c306..fa57d3ef98 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -800,7 +800,7 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const ch
 	if (!rte_eth_dev_is_valid_port(port_id))
 		return -ENODEV;
 
-	if (rte_eth_tx_queue_is_valid(port_id, queue_id))
+	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
 		return -EINVAL;
 
 	fd = fopen(path, "w");
@@ -848,3 +848,41 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const ch
 	fclose(fd);
 	return ret;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_txq_rate_limit_query, 26.07)
+int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
+				       struct rte_pmd_mlx5_txq_rate_limit_info *info)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	struct mlx5_txq_data *txq_data;
+	struct mlx5_txq_ctrl *txq_ctrl;
+	uint32_t sq_out[MLX5_ST_SZ_DW(query_sq_out)] = {0};
+	int ret;
+
+	if (info == NULL)
+		return -EINVAL;
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return -ENODEV;
+	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
+		return -EINVAL;
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	txq_data = (*priv->txqs)[queue_id];
+	txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	info->rate_mbps = txq_ctrl->rl.rate_mbps;
+	info->pp_index = txq_ctrl->rl.pp_id;
+	if (txq_ctrl->obj == NULL) {
+		info->fw_pp_index = 0;
+		return 0;
+	}
+	ret = mlx5_devx_cmd_query_sq(txq_ctrl->obj->sq_obj.sq,
+				     sq_out, sizeof(sq_out));
+	if (ret)
+		return -EIO;
+	info->fw_pp_index = MLX5_GET(sqc,
+				     MLX5_ADDR_OF(query_sq_out, sq_out,
+						  sq_context),
+				     packet_pacing_rate_limit_index);
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 7863b529f6..155d544434 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1412,7 +1412,20 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
-	if (txq_ctrl->obj == NULL || txq_ctrl->obj->sq == NULL) {
+	if (txq_ctrl->obj == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u not initialized.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	/*
+	 * For non-hairpin queues the SQ DevX object lives in
+	 * obj->sq_obj.sq (used by DevX/HWS mode), while hairpin
+	 * queues use obj->sq directly.  These are different members
+	 * of a union inside mlx5_txq_obj.
+	 */
+	struct mlx5_devx_obj *sq_devx = txq_ctrl->obj->sq_obj.sq;
+	if (sq_devx == NULL) {
 		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
 			dev->data->port_id, queue_idx);
 		rte_errno = EINVAL;
@@ -1426,7 +1439,7 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 		sq_attr.state = MLX5_SQC_STATE_RDY;
 		sq_attr.rl_update = 1;
 		sq_attr.packet_pacing_rate_limit_index = 0;
-		ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+		ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Port %u Tx queue %u failed to clear rate.",
@@ -1450,7 +1463,7 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 	sq_attr.state = MLX5_SQC_STATE_RDY;
 	sq_attr.rl_update = 1;
 	sq_attr.packet_pacing_rate_limit_index = new_rl.pp_id;
-	ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+	ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u Mbps.",
 			dev->data->port_id, queue_idx, tx_rate);
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 7acfdae97d..698d7d2032 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -420,6 +420,36 @@ __rte_experimental
 int
 rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const char *filename);
 
+/**
+ * Per-queue rate limit information.
+ */
+struct rte_pmd_mlx5_txq_rate_limit_info {
+	uint32_t rate_mbps;	/**< Configured rate in Mbps, 0 = disabled. */
+	uint16_t pp_index;	/**< PP index from driver state. */
+	uint16_t fw_pp_index;	/**< PP index read back from FW SQ context. */
+};
+
+/**
+ * Query per-queue rate limit state for a given Tx queue.
+ *
+ * @param[in] port_id
+ *   Port ID.
+ * @param[in] queue_id
+ *   Tx queue ID.
+ * @param[out] info
+ *   Rate limit information.
+ *
+ * @return
+ *   0 on success, negative errno on failure:
+ *   - -ENODEV: invalid port_id.
+ *   - -EINVAL: invalid queue_id.
+ *   - -EIO: FW query failed.
+ */
+__rte_experimental
+int
+rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
+				  struct rte_pmd_mlx5_txq_rate_limit_info *info);
+
 /** Type of mlx5 driver event for which custom callback is called. */
 enum rte_pmd_mlx5_driver_event_cb_type {
 	/** Called after HW Rx queue is created. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 08/10] ethdev: add getter for per-queue Tx rate limit
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
                     ` (6 preceding siblings ...)
  2026-03-10 23:26   ` [PATCH v2 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
@ 2026-03-10 23:26   ` Vincent Jardin
  2026-03-11 16:17     ` Stephen Hemminger
  2026-03-11 16:26     ` Stephen Hemminger
  2026-03-10 23:26   ` [PATCH v2 09/10] net/mlx5: share pacing rate table entries across queues Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
  9 siblings, 2 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

The existing rte_eth_set_queue_rate_limit() API allows setting a
per-queue Tx rate but provides no way to read it back. Applications
such as grout are forced to maintain a shadow copy of the rate to
be able to report it.

Add rte_eth_get_queue_rate_limit() as the symmetric getter, following
the established DPDK pattern (e.g. rte_eth_dev_set_mtu/get_mtu,
rte_eth_dev_set_vlan_offload/get_vlan_offload).

This adds:
- eth_get_queue_rate_limit_t driver callback in ethdev_driver.h
- rte_eth_get_queue_rate_limit() public experimental API (26.07)
- mlx5 PMD implementation reading from the existing per-queue
  rate_mbps tracking field

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/rel_notes/release_26_03.rst |  5 ++++
 drivers/net/mlx5/mlx5.c                |  2 ++
 drivers/net/mlx5/mlx5_tx.h             |  2 ++
 drivers/net/mlx5/mlx5_txq.c            | 34 ++++++++++++++++++++++++++
 lib/ethdev/ethdev_driver.h             |  7 ++++++
 lib/ethdev/rte_ethdev.c                | 28 +++++++++++++++++++++
 lib/ethdev/rte_ethdev.h                | 24 ++++++++++++++++++
 7 files changed, 102 insertions(+)

diff --git a/doc/guides/rel_notes/release_26_03.rst b/doc/guides/rel_notes/release_26_03.rst
index b4499ec066..5afb2fd6d9 100644
--- a/doc/guides/rel_notes/release_26_03.rst
+++ b/doc/guides/rel_notes/release_26_03.rst
@@ -55,6 +55,11 @@ New Features
      Also, make sure to start the actual text at the margin.
      =======================================================
 
+* **Added per-queue Tx rate limit getter to ethdev.**
+
+  Added ``rte_eth_get_queue_rate_limit()`` experimental API
+  to retrieve the current per-queue Tx rate limit configured on a device.
+
 * **Added custom memory allocation hooks in ACL library.**
 
   Added a hook API mechanism
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f399e0d5c9..6e21ed31f3 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2721,6 +2721,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 	.get_restore_flags = mlx5_get_restore_flags,
 	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
+	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2815,6 +2816,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.get_restore_flags = mlx5_get_restore_flags,
 	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
+	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 3a37f5bb4d..46e199d93e 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -224,6 +224,8 @@ int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
 int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 			      uint32_t tx_rate);
+int mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint32_t *tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 155d544434..d5dab4f14a 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1479,6 +1479,40 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 }
 
+/**
+ * Get per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param[out] tx_rate
+ *   Pointer to store the TX rate in Mbps, 0 if rate limiting is disabled.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			  uint32_t *tx_rate)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *txq_ctrl;
+
+	if (queue_idx >= dev->data->nb_tx_queues) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = container_of((*priv->txqs)[queue_idx],
+				struct mlx5_txq_ctrl, txq);
+	*tx_rate = txq_ctrl->rl.rate_mbps;
+	return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 1255cd6f2c..0f336f9567 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -762,6 +762,11 @@ typedef int (*eth_set_queue_rate_limit_t)(struct rte_eth_dev *dev,
 				uint16_t queue_idx,
 				uint32_t tx_rate);
 
+/** @internal Get queue Tx rate. */
+typedef int (*eth_get_queue_rate_limit_t)(struct rte_eth_dev *dev,
+				uint16_t queue_idx,
+				uint32_t *tx_rate);
+
 /** @internal Add tunneling UDP port. */
 typedef int (*eth_udp_tunnel_port_add_t)(struct rte_eth_dev *dev,
 					 struct rte_eth_udp_tunnel *tunnel_udp);
@@ -1522,6 +1527,8 @@ struct eth_dev_ops {
 
 	/** Set queue rate limit */
 	eth_set_queue_rate_limit_t set_queue_rate_limit;
+	/** Get queue rate limit */
+	eth_get_queue_rate_limit_t get_queue_rate_limit;
 
 	/** Configure RSS hash protocols and hashing key */
 	rss_hash_update_t          rss_hash_update;
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 2edc7a362e..c6ad399033 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5694,6 +5694,34 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	return ret;
 }
 
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_get_queue_rate_limit, 26.07)
+int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
+					uint32_t *tx_rate)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (tx_rate == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Get queue rate limit:port %u: NULL tx_rate pointer",
+			port_id);
+		return -EINVAL;
+	}
+
+	if (queue_idx >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Get queue rate limit:port %u: invalid queue ID=%u",
+			port_id, queue_idx);
+		return -EINVAL;
+	}
+
+	if (dev->dev_ops->get_queue_rate_limit == NULL)
+		return -ENOTSUP;
+	return eth_err(port_id, dev->dev_ops->get_queue_rate_limit(dev, queue_idx, tx_rate));
+}
+
 RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_rx_avail_thresh_set, 22.07)
 int rte_eth_rx_avail_thresh_set(uint16_t port_id, uint16_t queue_id,
 			       uint8_t avail_thresh)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 0d8e2d0236..e525217b77 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4817,6 +4817,30 @@ int rte_eth_dev_uc_all_hash_table_set(uint16_t port_id, uint8_t on);
 int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 			uint32_t tx_rate);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Get the rate limitation for a queue on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_idx
+ *   The queue ID.
+ * @param[out] tx_rate
+ *   A pointer to retrieve the Tx rate in Mbps.
+ *   0 means rate limiting is disabled.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support this feature.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
+			uint32_t *tx_rate);
+
 /**
  * Configuration of Receive Side Scaling hash computation of Ethernet device.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 09/10] net/mlx5: share pacing rate table entries across queues
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
                     ` (7 preceding siblings ...)
  2026-03-10 23:26   ` [PATCH v2 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
@ 2026-03-10 23:26   ` Vincent Jardin
  2026-03-10 23:26   ` [PATCH v2 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
  9 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Allocating PP contexts with MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX
forces one HW rate table entry per TX queue. On ConnectX-6 Dx the
rate table is small (typically 128 entries), so setting the same rate
on many queues exhausts it quickly and returns ENOSPC.

Without the dedicated flag, the kernel mlx5 driver shares a single
rate table entry across all PP contexts with identical parameters
(rate, burst, packet size) using internal refcounting. Each queue
still gets its own PP handle for proper cleanup, but the underlying
HW index is shared.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_txpp.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index b87565778e..a377406e66 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -185,8 +185,7 @@ mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
 		MLX5_SET(set_pp_rate_limit_context, &pp,
 			 typical_packet_size, sh->config.tx_typical_pkt_sz);
 	rl->pp = mlx5_glue->dv_alloc_pp
-				(sh->cdev->ctx, sizeof(pp), &pp,
-				 MLX5DV_PP_ALLOC_FLAGS_DEDICATED_INDEX);
+				(sh->cdev->ctx, sizeof(pp), &pp, 0);
 	if (rl->pp == NULL) {
 		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
 			rate_mbps);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v2 10/10] net/mlx5: add rate table capacity query API
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
                     ` (8 preceding siblings ...)
  2026-03-10 23:26   ` [PATCH v2 09/10] net/mlx5: share pacing rate table entries across queues Vincent Jardin
@ 2026-03-10 23:26   ` Vincent Jardin
  2026-03-11 16:31     ` Stephen Hemminger
  2026-03-11 16:35     ` Stephen Hemminger
  9 siblings, 2 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-10 23:26 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Add rte_pmd_mlx5_pp_rate_table_query() to report the HW packet
pacing rate table size and how many entries are currently in use.

The total comes from the HCA QoS capability
packet_pacing_rate_table_size. The used count is derived by
collecting unique non-zero PP indices across all TX queues.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/rel_notes/release_26_03.rst |  5 ++
 drivers/net/mlx5/mlx5_tx.c             | 65 ++++++++++++++++++++++++++
 drivers/net/mlx5/rte_pmd_mlx5.h        | 32 +++++++++++++
 3 files changed, 102 insertions(+)

diff --git a/doc/guides/rel_notes/release_26_03.rst b/doc/guides/rel_notes/release_26_03.rst
index 5afb2fd6d9..44ff897b65 100644
--- a/doc/guides/rel_notes/release_26_03.rst
+++ b/doc/guides/rel_notes/release_26_03.rst
@@ -86,6 +86,11 @@ New Features
 
   * Added out-of-place support for CN20K SoC.
 
+* **Updated NVIDIA mlx5 net driver.**
+
+  * Added per-queue Tx rate limiting using hardware packet pacing.
+  * Added PMD-specific API to query per-queue rate limit and rate table capacity.
+
 * **Updated ZTE zxdh ethernet driver.**
 
   * Added support for modifying queue depth.
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index fa57d3ef98..cf3f93e635 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -19,6 +19,7 @@
 
 #include <mlx5_prm.h>
 #include <mlx5_common.h>
+#include <mlx5_malloc.h>
 
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"
@@ -886,3 +887,67 @@ int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
 				     packet_pacing_rate_limit_index);
 	return 0;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_pp_rate_table_query, 26.07)
+int rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
+				     struct rte_pmd_mlx5_pp_rate_table_info *info)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	uint16_t used = 0;
+	uint16_t *seen;
+	unsigned int i;
+
+	if (info == NULL)
+		return -EINVAL;
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return -ENODEV;
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	if (!priv->sh->cdev->config.hca_attr.qos.packet_pacing) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	info->total = priv->sh->cdev->config.hca_attr.qos
+			.packet_pacing_rate_table_size;
+	if (priv->txqs == NULL || priv->txqs_n == 0) {
+		info->used = 0;
+		return 0;
+	}
+	seen = mlx5_malloc(MLX5_MEM_ZERO, priv->txqs_n * sizeof(*seen),
+			   0, SOCKET_ID_ANY);
+	if (seen == NULL)
+		return -ENOMEM;
+	/*
+	 * Count unique non-zero PP indices across this port's TX queues.
+	 * Note: the count reflects only queues on this port; other ports
+	 * sharing the same device may also consume rate table entries.
+	 */
+	for (i = 0; i < priv->txqs_n; i++) {
+		struct mlx5_txq_data *txq_data;
+		struct mlx5_txq_ctrl *txq_ctrl;
+		uint16_t pp_id;
+		uint16_t j;
+		bool dup;
+
+		if ((*priv->txqs)[i] == NULL)
+			continue;
+		txq_data = (*priv->txqs)[i];
+		txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
+		pp_id = txq_ctrl->rl.pp_id;
+		if (pp_id == 0)
+			continue;
+		dup = false;
+		for (j = 0; j < used; j++) {
+			if (seen[j] == pp_id) {
+				dup = true;
+				break;
+			}
+		}
+		if (!dup)
+			seen[used++] = pp_id;
+	}
+	mlx5_free(seen);
+	info->used = used;
+	return 0;
+}
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 698d7d2032..f7970dd7fb 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -450,6 +450,38 @@ int
 rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
 				  struct rte_pmd_mlx5_txq_rate_limit_info *info);
 
+/**
+ * Packet pacing rate table capacity information.
+ */
+struct rte_pmd_mlx5_pp_rate_table_info {
+	uint16_t total;		/**< Total HW rate table entries. */
+	uint16_t used;		/**< Currently allocated entries. */
+};
+
+/**
+ * Query packet pacing rate table capacity.
+ *
+ * The ``used`` count reflects only the queried port's TX queues.
+ * Other ports sharing the same physical device may also consume
+ * rate table entries that are not included in this count.
+ *
+ * @param[in] port_id
+ *   Port ID.
+ * @param[out] info
+ *   Rate table capacity information.
+ *
+ * @return
+ *   0 on success, negative errno on failure:
+ *   - -ENODEV: invalid port_id.
+ *   - -EINVAL: info is NULL.
+ *   - -ENOTSUP: packet pacing not supported.
+ *   - -ENOMEM: allocation failure.
+ */
+__rte_experimental
+int
+rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
+				 struct rte_pmd_mlx5_pp_rate_table_info *info);
+
 /** Type of mlx5 driver event for which custom callback is called. */
 enum rte_pmd_mlx5_driver_event_cb_type {
 	/** Called after HW Rx queue is created. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* RE: [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation
  2026-03-10  9:20 ` [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
@ 2026-03-11 12:26   ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-11 12:26 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad

Nice clarification, Vincent.
Thank you.

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Tuesday, March 10, 2026 11:20 AM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; Vincent Jardin <vjardin@free.fr>
> Subject: [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation
> 
> The Tx Scheduling section incorrectly stated that timestamps can only be put on
> the first packet in a burst. The driver actually checks every packet's ol_flags for
> the timestamp dynamic flag and inserts a dedicated WAIT WQE per
> timestamped packet. The eMPW path also breaks batches when a timestamped
> packet is encountered.
> 
> Additionally, the ConnectX-7+ wait-on-time capability was only briefly
> mentioned in the tx_pp parameter section with no explanation of how it differs
> from the ConnectX-6 Dx Clock Queue approach.
> 
> This patch:
> - Removes the stale first-packet-only limitation
> - Documents both scheduling mechanisms (ConnectX-6 Dx Clock Queue and
>   ConnectX-7+ wait-on-time) with separate requirements tables
> - Clarifies that tx_pp is specific to ConnectX-6 Dx
> - Fixes tx_skew applicability to cover both hardware generations
> - Updates the Send Scheduling Counters intro to reflect that timestamp
>   validation counters also apply to ConnectX-7+ wait-on-time mode
> 
> Fixes: 8f848f32fc24 ("net/mlx5: introduce send scheduling devargs")
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  doc/guides/nics/mlx5.rst | 109 ++++++++++++++++++++++++++++-----------
>  1 file changed, 78 insertions(+), 31 deletions(-)
> 
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> 2529c2f4c8..5b097dbc90 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -553,27 +553,32 @@ for an additional list of options shared with other
> mlx5 drivers.
> 
>  - ``tx_pp`` parameter [int]
> 
> +  This parameter applies to **ConnectX-6 Dx** only.
>    If a nonzero value is specified the driver creates all necessary internal
> -  objects to provide accurate packet send scheduling on mbuf timestamps.
> +  objects (Clock Queue and Rearm Queue) to provide accurate packet send
> + scheduling on mbuf timestamps using a cross-channel approach.
>    The positive value specifies the scheduling granularity in nanoseconds,
>    the packet send will be accurate up to specified digits. The allowed range is
>    from 500 to 1 million of nanoseconds. The negative value specifies the module
>    of granularity and engages the special test mode the check the schedule rate.
>    By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
> -  feature is disabled.
> +  feature is disabled on ConnectX-6 Dx.
> 
> -  Starting with ConnectX-7 the capability to schedule traffic directly
> -  on timestamp specified in descriptor is provided,
> -  no extra objects are needed anymore and scheduling capability
> -  is advertised and handled regardless ``tx_pp`` parameter presence.
> +  Starting with **ConnectX-7** the hardware provides a native
> + wait-on-time  capability that inserts the scheduling delay directly in the WQE
> descriptor.
> +  No Clock Queue or Rearm Queue is needed and the ``tx_pp`` parameter
> + is not  required. The driver automatically advertises send scheduling
> + support when  the HCA wait-on-time capability is detected. The
> + ``tx_skew`` parameter can  still be used on ConnectX-7 and above to
> compensate for wire delay.
> 
>  - ``tx_skew`` parameter [int]
> 
>    The parameter adjusts the send packet scheduling on timestamps and
> represents
>    the average delay between beginning of the transmitting descriptor processing
>    by the hardware and appearance of actual packet data on the wire. The value
> -  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
> -  specified. The default value is zero.
> +  should be provided in nanoseconds and applies to both ConnectX-6 Dx
> + (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
> +  The default value is zero.
> 
>  - ``tx_vec_en`` parameter [int]
> 
> @@ -883,9 +888,13 @@ Send Scheduling Counters
> 
>  The mlx5 PMD provides a comprehensive set of counters designed for
> debugging and diagnostics related to packet scheduling during transmission.
> -These counters are applicable only if the port was configured with the ``tx_pp``
> devarg -and reflect the status of the PMD scheduling infrastructure -based on
> Clock and Rearm Queues, used as a workaround on ConnectX-6 DX NICs.
> +The first group of counters (prefixed ``tx_pp_``) reflects the status
> +of the Clock Queue and Rearm Queue infrastructure used on ConnectX-6 Dx
> +and is applicable only if the port was configured with the ``tx_pp`` devarg.
> +The timestamp validation counters
> +(``tx_pp_timestamp_past_errors``, ``tx_pp_timestamp_future_errors``,
> +``tx_pp_timestamp_order_errors``) are also reported on ConnectX-7 and
> +above in wait-on-time mode, without requiring ``tx_pp``.
> 
>  ``tx_pp_missed_interrupt_errors``
>    Indicates that the Rearm Queue interrupt was not serviced on time.
> @@ -1960,31 +1969,54 @@ Limitations
>  Tx Scheduling
>  ~~~~~~~~~~~~~
> 
> -When PMD sees the ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on
> the packet -being sent it tries to synchronize the time of packet appearing on -
> the wire with the specified packet timestamp. If the specified one -is in the past it
> should be ignored, if one is in the distant future -it should be capped with some
> reasonable value (in range of seconds).
> -These specific cases ("too late" and "distant future") can be optionally -
> reported via device xstats to assist applications to detect the -time-related
> problems.
> -
> -The timestamp upper "too-distant-future" limit -at the moment of invoking the
> Tx burst routine -can be estimated as ``tx_pp`` option (in nanoseconds)
> multiplied by 2^23.
> +When the PMD sees ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on
> a
> +packet being sent it inserts a dedicated WAIT WQE to synchronize the
> +time of the packet appearing on the wire with the specified timestamp.
> +Every packet in a burst that carries the timestamp dynamic flag is
> +individually scheduled -- there is no restriction to the first packet only.
> +
> +If the specified timestamp is in the past, the packet is sent immediately.
> +If it is in the distant future it should be capped with some reasonable
> +value (in range of seconds). These specific cases ("too late" and
> +"distant future") can be optionally reported via device xstats to
> +assist applications to detect time-related problems.
> +
> +The eMPW (enhanced Multi-Packet Write) data path automatically breaks
> +the batch when a timestamped packet is encountered, ensuring each
> +scheduled packet gets its own WAIT WQE.
> +
> +Two hardware mechanisms are supported:
> +
> +**ConnectX-6 Dx -- Clock Queue (cross-channel)**
> +   The driver creates a Clock Queue and a Rearm Queue that together
> +   provide a time reference for scheduling. This mode requires the
> +   :ref:`tx_pp <mlx5_tx_pp_param>` devarg. The timestamp upper
> +   "too-distant-future" limit at the moment of invoking the Tx burst
> +   routine can be estimated as ``tx_pp`` (in nanoseconds) multiplied
> +   by 2^23.
> +
> +**ConnectX-7 and above -- wait-on-time**
> +   The hardware supports placing the scheduling delay directly inside
> +   the WQE descriptor. No Clock Queue or Rearm Queue is needed and the
> +   ``tx_pp`` devarg is **not** required. The driver automatically
> +   advertises send scheduling support when the HCA wait-on-time
> +   capability is detected.
> +
>  Please note, for the testpmd txonly mode,  the limit is deduced from the
> expression::
> 
>     (n_tx_descriptors / burst_size + 1) * inter_burst_gap
> 
> -There is no any packet reordering according timestamps is supposed, -neither
> within packet burst, nor between packets, it is an entirely -application
> responsibility to generate packets and its timestamps -in desired order.
> +There is no packet reordering according to timestamps, neither within a
> +packet burst, nor between packets. It is entirely the application's
> +responsibility to generate packets and their timestamps in the desired
> +order.
> 
>  Requirements
>  ^^^^^^^^^^^^
> 
> +ConnectX-6 Dx (Clock Queue mode):
> +
>  =========  =============
>  Minimum    Version
>  =========  =============
> @@ -1996,20 +2028,35 @@ rdma-core
>  DPDK       20.08
>  =========  =============
> 
> +ConnectX-7 and above (wait-on-time mode):
> +
> +=========  =============
> +Minimum    Version
> +=========  =============
> +hardware   ConnectX-7
> +=========  =============
> +
>  Firmware configuration
>  ^^^^^^^^^^^^^^^^^^^^^^
> 
>  Runtime configuration
>  ^^^^^^^^^^^^^^^^^^^^^
> 
> -To provide the packet send scheduling on mbuf timestamps the ``tx_pp`` -
> parameter should be specified.
> +**ConnectX-6 Dx**: the :ref:`tx_pp <mlx5_tx_pp_param>` parameter must
> +be specified to enable send scheduling on mbuf timestamps.
> +
> +**ConnectX-7+**: no devarg is required. Send scheduling is
> +automatically enabled when the HCA reports the wait-on-time capability.
> +
> +On both hardware generations the ``tx_skew`` parameter can be used to
> +compensate for the delay between descriptor processing and actual wire
> +time.
> 
>  Limitations
>  ^^^^^^^^^^^
> 
> -#. The timestamps can be put only in the first packet
> -   in the burst providing the entire burst scheduling.
> +#. On ConnectX-6 Dx (Clock Queue mode) timestamps too far in the future
> +   are capped (see the ``tx_pp`` x 2^23 limit above).
> 
> 
>  .. _mlx5_tx_inline:
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 08/10] ethdev: add getter for per-queue Tx rate limit
  2026-03-10 23:26   ` [PATCH v2 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
@ 2026-03-11 16:17     ` Stephen Hemminger
  2026-03-11 16:26     ` Stephen Hemminger
  1 sibling, 0 replies; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-11 16:17 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

On Wed, 11 Mar 2026 00:26:51 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> +int
> +mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
> +			  uint32_t *tx_rate)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_txq_ctrl *txq_ctrl;
> +
> +	if (queue_idx >= dev->data->nb_tx_queues) {
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}

This check should be done in ethdev not driver; to be consistent
with other ethdev API's.


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 08/10] ethdev: add getter for per-queue Tx rate limit
  2026-03-10 23:26   ` [PATCH v2 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
  2026-03-11 16:17     ` Stephen Hemminger
@ 2026-03-11 16:26     ` Stephen Hemminger
  2026-03-12 15:54       ` Vincent Jardin
  1 sibling, 1 reply; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-11 16:26 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

On Wed, 11 Mar 2026 00:26:51 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> The existing rte_eth_set_queue_rate_limit() API allows setting a
> per-queue Tx rate but provides no way to read it back. Applications
> such as grout are forced to maintain a shadow copy of the rate to
> be able to report it.
> 
> Add rte_eth_get_queue_rate_limit() as the symmetric getter, following
> the established DPDK pattern (e.g. rte_eth_dev_set_mtu/get_mtu,
> rte_eth_dev_set_vlan_offload/get_vlan_offload).
> 
> This adds:
> - eth_get_queue_rate_limit_t driver callback in ethdev_driver.h
> - rte_eth_get_queue_rate_limit() public experimental API (26.07)
> - mlx5 PMD implementation reading from the existing per-queue
>   rate_mbps tracking field
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>

A couple of observations about this new API.

1. The queue index validation is duplicated in both ethdev and mlx5
driver. Since this is a new API, the ethdev layer should be the single place
that validates queue_idx < nb_tx_queues and rejects NULL output
pointers — the driver callbacks should be able to assume these
preconditions are met. The same applies to the existing
mlx5_set_queue_rate_limit() which re-checks bounds that
rte_eth_set_queue_rate_limit() already validates. Remove the redundant
checks from the driver.

2. The new rte_eth_get_queue_rate_limit() API uses uint32_t for the
rate. Since this is a brand-new experimental API with no backward
compatibility constraint, use uint64_t for the rate value to
accommodate future devices with rates exceeding 4 Gbps. This applies to
both the getter and the setter callback — adding a new getter that
already needs a type change next year defeats the purpose of getting
the API right now.

Surprising to me, that AI review found nits, but totally missed
several architectural issues.


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 04/10] net/mlx5: add per-queue packet pacing infrastructure
  2026-03-10 23:26   ` [PATCH v2 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
@ 2026-03-11 16:29     ` Stephen Hemminger
  0 siblings, 0 replies; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-11 16:29 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

On Wed, 11 Mar 2026 00:26:47 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> Add mlx5_txq_rate_limit structure and alloc/free helpers for
> per-queue data-rate packet pacing. Each Tx queue can now hold
> its own PP (Packet Pacing) index allocated via mlx5dv_pp_alloc()
> with MLX5_DATA_RATE mode.
> 
> mlx5_txq_alloc_pp_rate_limit() converts Mbps to kbps for the PRM
> rate_limit field and allocates a dedicated PP index from the HW
> rate table. mlx5_txq_free_pp_rate_limit() releases it.
> 
> The existing Clock Queue path (sh->txpp.pp / sh->txpp.pp_id) is
> untouched — it uses MLX5_WQE_RATE for per-packet scheduling,
> while per-queue rate limiting uses MLX5_DATA_RATE.
> 
> PP index cleanup is added to mlx5_txq_release() to prevent leaks
> when queues are destroyed.
> 
> Supported hardware:
> - ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
> - ConnectX-7/8: same mechanism, plus wait-on-time coexistence
> - BlueField-2/3: same PP allocation support
> 
> Not supported:
> - ConnectX-5: packet_pacing exists but MLX5_DATA_RATE mode may
>   not be available on all firmware versions
> - ConnectX-4 Lx and earlier: no packet_pacing capability
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>

For better type safety, void * pointers should be avoided.
In this patch, struct mlx5_txq_rate_limit stores the PP context as void *pp.
This opaque pointer hides the type and makes the code harder for static analysis.

Use the actual type (struct mlx5dv_pp *) behind the HAVE_MLX5DV_PP_ALLOC guard
so the cast at ((struct mlx5dv_pp *)(rl->pp))->index becomes a direct member access.

Minor nit:
The line break in the dv_alloc_pp call hurts readability — the function call, 
its arguments, and the NULL check would be clearer on fewer lines:


rl->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp), &pp, 0);
if (rl->pp == NULL) {

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 07/10] net/mlx5: add testpmd command to query per-queue rate limit
  2026-03-10 23:26   ` [PATCH v2 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
@ 2026-03-11 16:31     ` Stephen Hemminger
  0 siblings, 0 replies; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-11 16:31 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

On Wed, 11 Mar 2026 00:26:50 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> Add a new testpmd command to display the per-queue packet pacing
> rate limit state, including the PP index from both driver state
> and FW SQ context readback:
> 
>   testpmd> mlx5 port <port_id> txq <queue_id> rate show  
> 
> This helps verify that the FW actually applied the PP index to
> the SQ after setting a per-queue rate limit.
> 
> Expose a new PMD API rte_pmd_mlx5_txq_rate_limit_query() that
> queries txq_ctrl->rl for driver state and mlx5_devx_cmd_query_sq()
> for the FW packet_pacing_rate_limit_index field.
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---

This was obvious to me, but not AI...


The testpmd query command is implemented as an mlx5-specific 
 mlx5 port <id> txq <id> rate show 
command, but per-queue rate limiting is an
ethdev-level feature (rte_eth_set_queue_rate_limit exists for any PMD).
The query side should also be generic — add a testpmd command like
  show port <id> txq <id> rate
that calls the new
rte_eth_get_queue_rate_limit(). 


Once again, AI sees the trees not the forest!

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 10/10] net/mlx5: add rate table capacity query API
  2026-03-10 23:26   ` [PATCH v2 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
@ 2026-03-11 16:31     ` Stephen Hemminger
  2026-03-11 16:35     ` Stephen Hemminger
  1 sibling, 0 replies; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-11 16:31 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

On Wed, 11 Mar 2026 00:26:53 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> Add rte_pmd_mlx5_pp_rate_table_query() to report the HW packet
> pacing rate table size and how many entries are currently in use.
> 
> The total comes from the HCA QoS capability
> packet_pacing_rate_table_size. The used count is derived by
> collecting unique non-zero PP indices across all TX queues.
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>

This will be considered for 26.07 so need to hold off until those
release notes are setup

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 10/10] net/mlx5: add rate table capacity query API
  2026-03-10 23:26   ` [PATCH v2 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
  2026-03-11 16:31     ` Stephen Hemminger
@ 2026-03-11 16:35     ` Stephen Hemminger
  2026-03-12 15:05       ` Vincent Jardin
  1 sibling, 1 reply; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-11 16:35 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

On Wed, 11 Mar 2026 00:26:53 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_pp_rate_table_query, 26.07)
> +int rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
> +				     struct rte_pmd_mlx5_pp_rate_table_info *info)
> +{
> +	struct rte_eth_dev *dev;
> +	struct mlx5_priv *priv;
> +	uint16_t used = 0;
> +	uint16_t *seen;
> +	unsigned int i;
> +
> +	if (info == NULL)
> +		return -EINVAL;

Prefer NULL checks in ethdev layer

> +	if (!rte_eth_dev_is_valid_port(port_id))
> +		return -ENODEV;

Ditto checks for port_id should be at ethdev

> +	dev = &rte_eth_devices[port_id];
> +	priv = dev->data->dev_private;
> +	if (!priv->sh->cdev->config.hca_attr.qos.packet_pacing) {
> +		rte_errno = ENOTSUP;
> +		return -ENOTSUP;
> +	}
> +	info->total = priv->sh->cdev->config.hca_attr.qos
> +			.packet_pacing_rate_table_size;

Since DPDK allows 100 character lines now, don't need line break

> +	if (priv->txqs == NULL || priv->txqs_n == 0) {
> +		info->used = 0;
> +		return 0;
> +	}
> +	seen = mlx5_malloc(MLX5_MEM_ZERO, priv->txqs_n * sizeof(*seen),
> +			   0, SOCKET_ID_ANY);

Since this only has lifetime of this function, use calloc() instead
since that avoids using huge page memory, and compiler and other checkers
"know about" malloc functions and engage more checks.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 10/10] net/mlx5: add rate table capacity query API
  2026-03-11 16:35     ` Stephen Hemminger
@ 2026-03-12 15:05       ` Vincent Jardin
  2026-03-12 16:01         ` Stephen Hemminger
  0 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 15:05 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

Hi Stephen,

Thank you for the review, see below,

Le 11/03/26 09:35, Stephen Hemminger a écrit :
> On Wed, 11 Mar 2026 00:26:53 +0100
> Vincent Jardin <vjardin@free.fr> wrote:
> 
> > +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_pp_rate_table_query, 26.07)
> > +int rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
> > +				     struct rte_pmd_mlx5_pp_rate_table_info *info)
> > +{
> > +	struct rte_eth_dev *dev;
> > +	struct mlx5_priv *priv;
> > +	uint16_t used = 0;
> > +	uint16_t *seen;
> > +	unsigned int i;
> > +
> > +	if (info == NULL)
> > +		return -EINVAL;
> 
> Prefer NULL checks in ethdev layer
> 
> > +	if (!rte_eth_dev_is_valid_port(port_id))
> > +		return -ENODEV;
> 
> Ditto checks for port_id should be at ethdev

This function is a PMD-specific API declared in rte_pmd_mlx5.h, not an ethdev op.
  application -> rte_pmd_mlx5_pp_rate_table_query() -> mlx5 internals

Therefore, the function must validate its own inputs:
 - port_id validity (via rte_eth_dev_get_by_name() / rte_eth_dev_is_valid_port())
 - info != NULL

Adding a generic ethdev op (ex eth_rate_table_query_t) for a concept
only one driver supports would be over-engineering.  If other drivers later
expose a similar rate table concept, that would be the time to factor out a
generic ethdev API.

> > +	dev = &rte_eth_devices[port_id];
> > +	priv = dev->data->dev_private;
> > +	if (!priv->sh->cdev->config.hca_attr.qos.packet_pacing) {
> > +		rte_errno = ENOTSUP;
> > +		return -ENOTSUP;
> > +	}
> > +	info->total = priv->sh->cdev->config.hca_attr.qos
> > +			.packet_pacing_rate_table_size;
> 
> Since DPDK allows 100 character lines now, don't need line break

ok

> > +	if (priv->txqs == NULL || priv->txqs_n == 0) {
> > +		info->used = 0;
> > +		return 0;
> > +	}
> > +	seen = mlx5_malloc(MLX5_MEM_ZERO, priv->txqs_n * sizeof(*seen),
> > +			   0, SOCKET_ID_ANY);
> 
> Since this only has lifetime of this function, use calloc() instead
> since that avoids using huge page memory, and compiler and other checkers
> "know about" malloc functions and engage more checks.

Nope, I'll use
  mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO, ...
in order to remain consistent with mlx5's cases I could grep despite there are
still 3 other calloc() occurences that I did not analyze.

In any cases, it'll end up with calloc() (mlx5_malloc()).

Best regards,
  Vincent

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 08/10] ethdev: add getter for per-queue Tx rate limit
  2026-03-11 16:26     ` Stephen Hemminger
@ 2026-03-12 15:54       ` Vincent Jardin
  0 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 15:54 UTC (permalink / raw)
  To: Stephen Hemminger
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

Hi,

Thanks for the review, see below.

Le 11/03/26 09:26, Stephen Hemminger a écrit :
> A couple of observations about this new API.
> 
> 1. The queue index validation is duplicated in both ethdev and mlx5
> driver. Since this is a new API, the ethdev layer should be the single place
> that validates queue_idx < nb_tx_queues and rejects NULL output
> pointers - the driver callbacks should be able to assume these
> preconditions are met. The same applies to the existing
> mlx5_set_queue_rate_limit() which re-checks bounds that
> rte_eth_set_queue_rate_limit() already validates. Remove the redundant
> checks from the driver.

You are right. In v3, I'll:

- rte_eth_get_queue_rate_limit() - validate queue_idx
  and NULL tx_rate pointer before calling the driver op. This is the
  single validation point.

- mlx5_get_queue_rate_limit() - no longer check queue_idx bounds or
  NULL -- it trusts the ethdev preconditions.

- mlx5_set_queue_rate_limit() - also set a redundant queue_idx bounds
  check that duplicated what rte_eth_set_queue_rate_limit() already
  does -- removed as well.

The only driver-level checks that remain are things the ethdev layer
cannot know about and whether the HW supports packet_pacing since
it seems a bit specific. Later on, it could be revisisted.

> 
> 2. The new rte_eth_get_queue_rate_limit() API uses uint32_t for the
> rate. Since this is a brand-new experimental API with no backward
> compatibility constraint, use uint64_t for the rate value to
> accommodate future devices with rates exceeding 4 Gbps. This applies to
> both the getter and the setter callback - adding a new getter that
> already needs a type change next year defeats the purpose of getting
> the API right now.

No, it would break API symmetry which I do not like.

The rate unit is Mbps, not bps. This is documented in both the setter and getter,
and matches how rte_eth_set_queue_rate_limit().

uint32_t in Mbps covers up to about 4.3 Tbps. Current top-end NICs are 800 Gbps
aggregate; per-queue rates are a fraction of that. We are more than
three orders of magnitude away from saturating uint32_t.

Changing the getter to uint64_t would break API symmetry -- the existing
setter is:
   int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
                                    uint32_t tx_rate);

The day per-queue rates approach the Tbps range, both setter and getter
would need to evolve together as part of a broader ethdev ABI revision.
Until then, let's keep it consistent.

> 
> Surprising to me, that AI review found nits, but totally missed
> several architectural issues.

+1, AI helps, but it is appreciated with the human reviews too ;) so we can have our flavours.

Best regards,
  Vincent

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v2 10/10] net/mlx5: add rate table capacity query API
  2026-03-12 15:05       ` Vincent Jardin
@ 2026-03-12 16:01         ` Stephen Hemminger
  0 siblings, 0 replies; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-12 16:01 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

On Thu, 12 Mar 2026 16:05:35 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> >   
> > > +	if (!rte_eth_dev_is_valid_port(port_id))
> > > +		return -ENODEV;  
> > 
> > Ditto checks for port_id should be at ethdev  
> 
> This function is a PMD-specific API declared in rte_pmd_mlx5.h, not an ethdev op.
>   application -> rte_pmd_mlx5_pp_rate_table_query() -> mlx5 internals
> 
> Therefore, the function must validate its own inputs:
>  - port_id validity (via rte_eth_dev_get_by_name() / rte_eth_dev_is_valid_port())
>  - info != NULL
> 
> Adding a generic ethdev op (ex eth_rate_table_query_t) for a concept
> only one driver supports would be over-engineering.  If other drivers later
> expose a similar rate table concept, that would be the time to factor out a
> generic ethdev API.

OK, but I have a different point view than most DPDK developers.
Allowing and having driver specific API's in the first place was a mistake.
It encourages one off API's and makes applications locked in to a specific
NIC; which is good for NIC vendors but bad design pattern.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing
  2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                   ` (11 preceding siblings ...)
  2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
@ 2026-03-12 22:01 ` Vincent Jardin
  2026-03-12 22:01   ` [PATCH v3 01/9] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
                     ` (10 more replies)
  12 siblings, 11 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 22:01 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

This series adds per-queue Tx data-rate limiting to the mlx5 PMD using
hardware packet pacing (PP), and a symmetric rte_eth_get_queue_rate_limit()
ethdev API to read back the configured rate.

Each Tx queue can be assigned an individual rate (in Mbps) at runtime via
rte_eth_set_queue_rate_limit(). The mlx5 implementation allocates a
dedicated PP index per rate from the HW rate table, programs it into the
SQ via modify_sq, and shares identical rates across queues to conserve
table entries. A PMD-specific API exposes per-queue PP diagnostics and
rate table capacity.

Patch breakdown:

  1. doc/nics/mlx5: fix stale packet pacing documentation
  2-3. common/mlx5: query PP capabilities and extend SQ modify
  4-6. net/mlx5: per-queue PP infrastructure, rate_limit callback,
       burst pacing devargs (tx_burst_bound, tx_typical_pkt_sz)
  7. net/mlx5: testpmd command to query per-queue rate state
  8. ethdev: add rte_eth_get_queue_rate_limit() symmetric getter
       + testpmd "show port <id> queue <id> rate" command
  9. net/mlx5: rate table capacity query API

Usage with testpmd:
  set port 0 queue 0 rate 1000
  set port 0 queue 1 rate 5000
  set port 0 queue 0 rate 0      # disable
  show port 0 queue 0 rate       # generic ethdev query
  mlx5 port 0 txq 0 rate show    # mlx5 PMD-specific query

Changes since v2:

Patch 4 (per-queue packet pacing infrastructure):
  - Folded "share pacing rate table entries across queues" into
    this patch (was a separate patch in v2)

Patch 5 (support per-queue rate limiting):
  - Remove redundant queue_idx >= nb_tx_queues check (ethdev
    layer already validates before calling the PMD callback)

Patch 8 (ethdev getter):
  - Add testpmd "show port <id> queue <id> rate" command
    in app/test-pmd/cmdline.c using rte_eth_get_queue_rate_limit()
  - Drop release notes (targeting 26.07, not 26.03)
  - Remove redundant queue_idx bounds check from mlx5 getter

Patch 9 (rate table capacity query):
  - Use MLX5_MEM_SYS flag in mlx5_malloc() for system memory
  - Minor code style cleanups (line wrapping, cast formatting)

Changes since v1:

Addressed review feedback from Stephen Hemminger's AI:

Patch 4 (per-queue packet pacing infrastructure):
  - Validate rate_mbps against HCA packet_pacing_min_rate and
    packet_pacing_max_rate bounds; return -ERANGE on out-of-range
  - Widen rate_kbps from uint32_t to uint64_t to prevent
    overflow on rate_mbps * 1000
  - Remove early mlx5_txq_free_pp_rate_limit() call from the
    allocator (moved to caller, see patch 5)

Patch 5 (support per-queue rate limiting):
  - Fix PP index leak on modify_sq failure: allocate new PP into a
    temporary struct mlx5_txq_rate_limit; only swap into txq_ctrl->rl
    after modify_sq succeeds. On failure the old PP context stays intact.
  - Set rte_errno = -ret before returning errors from both the
    disable (tx_rate=0) and enable paths

Patch 7 (testpmd command to query per-queue rate limit):
  - Fix inverted rte_eth_tx_queue_is_valid() return value check:
    was "if (rte_eth_tx_queue_is_valid(...))" (accepts invalid queues),
    changed to "if (rte_eth_tx_queue_is_valid(...) != 0)"

Patch 9 (rate table capacity query, was patch 10 in v1):
  - Replace uint16_t seen[RTE_MAX_QUEUES_PER_PORT] (2 KB stack array)
    with heap-allocated mlx5_malloc(priv->txqs_n, ...) + mlx5_free()
  - Add early return when txqs == NULL || txqs_n == 0
  - Document in the API Doxygen that "used" reflects only the queried
    port's queues; other ports on the same device may also consume
    rate table entries
  - Add -ENOMEM to documented return values

Hardware tested:
  - ConnectX-6 Dx (packet pacing with MLX5_DATA_RATE)

Vincent Jardin (9):
  doc/nics/mlx5: fix stale packet pacing documentation
  common/mlx5: query packet pacing rate table capabilities
  common/mlx5: extend SQ modify to support rate limit update
  net/mlx5: add per-queue packet pacing infrastructure
  net/mlx5: support per-queue rate limiting
  net/mlx5: add burst pacing devargs
  net/mlx5: add testpmd command to query per-queue rate limit
  ethdev: add getter for per-queue Tx rate limit
  net/mlx5: add rate table capacity query API

 app/test-pmd/cmdline.c               |  69 +++++++++++++
 doc/guides/nics/mlx5.rst             | 125 ++++++++++++++++++------
 drivers/common/mlx5/mlx5_devx_cmds.c |  20 ++++
 drivers/common/mlx5/mlx5_devx_cmds.h |  14 ++-
 drivers/net/mlx5/mlx5.c              |  46 +++++++++
 drivers/net/mlx5/mlx5.h              |  13 +++
 drivers/net/mlx5/mlx5_testpmd.c      |  93 ++++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c           | 104 +++++++++++++++++++-
 drivers/net/mlx5/mlx5_tx.h           |   5 +
 drivers/net/mlx5/mlx5_txpp.c         |  85 ++++++++++++++++
 drivers/net/mlx5/mlx5_txq.c          | 141 +++++++++++++++++++++++++++
 drivers/net/mlx5/rte_pmd_mlx5.h      |  62 ++++++++++++
 lib/ethdev/ethdev_driver.h           |   7 ++
 lib/ethdev/rte_ethdev.c              |  28 ++++++
 lib/ethdev/rte_ethdev.h              |  24 +++++
 15 files changed, 802 insertions(+), 33 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v3 01/9] doc/nics/mlx5: fix stale packet pacing documentation
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
@ 2026-03-12 22:01   ` Vincent Jardin
  2026-03-12 22:01   ` [PATCH v3 02/9] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
                     ` (9 subsequent siblings)
  10 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 22:01 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

The Tx Scheduling section incorrectly stated that timestamps can only
be put on the first packet in a burst. The driver actually checks every
packet's ol_flags for the timestamp dynamic flag and inserts a dedicated
WAIT WQE per timestamped packet. The eMPW path also breaks batches when
a timestamped packet is encountered.

Additionally, the ConnectX-7+ wait-on-time capability was only briefly
mentioned in the tx_pp parameter section with no explanation of how it
differs from the ConnectX-6 Dx Clock Queue approach.

This patch:
- Removes the stale first-packet-only limitation
- Documents both scheduling mechanisms (ConnectX-6 Dx Clock Queue and
  ConnectX-7+ wait-on-time) with separate requirements tables
- Clarifies that tx_pp is specific to ConnectX-6 Dx
- Fixes tx_skew applicability to cover both hardware generations
- Updates the Send Scheduling Counters intro to reflect that timestamp
  validation counters also apply to ConnectX-7+ wait-on-time mode

Fixes: 8f848f32fc24 ("net/mlx5: introduce send scheduling devargs")

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/mlx5.rst | 109 ++++++++++++++++++++++++++++-----------
 1 file changed, 78 insertions(+), 31 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 2529c2f4c8..5b097dbc90 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -553,27 +553,32 @@ for an additional list of options shared with other mlx5 drivers.
 
 - ``tx_pp`` parameter [int]
 
+  This parameter applies to **ConnectX-6 Dx** only.
   If a nonzero value is specified the driver creates all necessary internal
-  objects to provide accurate packet send scheduling on mbuf timestamps.
+  objects (Clock Queue and Rearm Queue) to provide accurate packet send
+  scheduling on mbuf timestamps using a cross-channel approach.
   The positive value specifies the scheduling granularity in nanoseconds,
   the packet send will be accurate up to specified digits. The allowed range is
   from 500 to 1 million of nanoseconds. The negative value specifies the module
   of granularity and engages the special test mode the check the schedule rate.
   By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
-  feature is disabled.
+  feature is disabled on ConnectX-6 Dx.
 
-  Starting with ConnectX-7 the capability to schedule traffic directly
-  on timestamp specified in descriptor is provided,
-  no extra objects are needed anymore and scheduling capability
-  is advertised and handled regardless ``tx_pp`` parameter presence.
+  Starting with **ConnectX-7** the hardware provides a native wait-on-time
+  capability that inserts the scheduling delay directly in the WQE descriptor.
+  No Clock Queue or Rearm Queue is needed and the ``tx_pp`` parameter is not
+  required. The driver automatically advertises send scheduling support when
+  the HCA wait-on-time capability is detected. The ``tx_skew`` parameter can
+  still be used on ConnectX-7 and above to compensate for wire delay.
 
 - ``tx_skew`` parameter [int]
 
   The parameter adjusts the send packet scheduling on timestamps and represents
   the average delay between beginning of the transmitting descriptor processing
   by the hardware and appearance of actual packet data on the wire. The value
-  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
-  specified. The default value is zero.
+  should be provided in nanoseconds and applies to both ConnectX-6 Dx
+  (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
+  The default value is zero.
 
 - ``tx_vec_en`` parameter [int]
 
@@ -883,9 +888,13 @@ Send Scheduling Counters
 
 The mlx5 PMD provides a comprehensive set of counters designed for
 debugging and diagnostics related to packet scheduling during transmission.
-These counters are applicable only if the port was configured with the ``tx_pp`` devarg
-and reflect the status of the PMD scheduling infrastructure
-based on Clock and Rearm Queues, used as a workaround on ConnectX-6 DX NICs.
+The first group of counters (prefixed ``tx_pp_``) reflects the status of the
+Clock Queue and Rearm Queue infrastructure used on ConnectX-6 Dx and is
+applicable only if the port was configured with the ``tx_pp`` devarg.
+The timestamp validation counters
+(``tx_pp_timestamp_past_errors``, ``tx_pp_timestamp_future_errors``,
+``tx_pp_timestamp_order_errors``) are also reported on ConnectX-7 and above
+in wait-on-time mode, without requiring ``tx_pp``.
 
 ``tx_pp_missed_interrupt_errors``
   Indicates that the Rearm Queue interrupt was not serviced on time.
@@ -1960,31 +1969,54 @@ Limitations
 Tx Scheduling
 ~~~~~~~~~~~~~
 
-When PMD sees the ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on the packet
-being sent it tries to synchronize the time of packet appearing on
-the wire with the specified packet timestamp. If the specified one
-is in the past it should be ignored, if one is in the distant future
-it should be capped with some reasonable value (in range of seconds).
-These specific cases ("too late" and "distant future") can be optionally
-reported via device xstats to assist applications to detect the
-time-related problems.
-
-The timestamp upper "too-distant-future" limit
-at the moment of invoking the Tx burst routine
-can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
+When the PMD sees ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on a packet
+being sent it inserts a dedicated WAIT WQE to synchronize the time of the
+packet appearing on the wire with the specified timestamp. Every packet
+in a burst that carries the timestamp dynamic flag is individually
+scheduled -- there is no restriction to the first packet only.
+
+If the specified timestamp is in the past, the packet is sent immediately.
+If it is in the distant future it should be capped with some reasonable
+value (in range of seconds). These specific cases ("too late" and
+"distant future") can be optionally reported via device xstats to assist
+applications to detect time-related problems.
+
+The eMPW (enhanced Multi-Packet Write) data path automatically breaks
+the batch when a timestamped packet is encountered, ensuring each
+scheduled packet gets its own WAIT WQE.
+
+Two hardware mechanisms are supported:
+
+**ConnectX-6 Dx -- Clock Queue (cross-channel)**
+   The driver creates a Clock Queue and a Rearm Queue that together
+   provide a time reference for scheduling. This mode requires the
+   :ref:`tx_pp <mlx5_tx_pp_param>` devarg. The timestamp upper
+   "too-distant-future" limit at the moment of invoking the Tx burst
+   routine can be estimated as ``tx_pp`` (in nanoseconds) multiplied
+   by 2^23.
+
+**ConnectX-7 and above -- wait-on-time**
+   The hardware supports placing the scheduling delay directly inside
+   the WQE descriptor. No Clock Queue or Rearm Queue is needed and the
+   ``tx_pp`` devarg is **not** required. The driver automatically
+   advertises send scheduling support when the HCA wait-on-time
+   capability is detected.
+
 Please note, for the testpmd txonly mode,
 the limit is deduced from the expression::
 
    (n_tx_descriptors / burst_size + 1) * inter_burst_gap
 
-There is no any packet reordering according timestamps is supposed,
-neither within packet burst, nor between packets, it is an entirely
-application responsibility to generate packets and its timestamps
-in desired order.
+There is no packet reordering according to timestamps,
+neither within a packet burst, nor between packets. It is entirely the
+application's responsibility to generate packets and their timestamps
+in the desired order.
 
 Requirements
 ^^^^^^^^^^^^
 
+ConnectX-6 Dx (Clock Queue mode):
+
 =========  =============
 Minimum    Version
 =========  =============
@@ -1996,20 +2028,35 @@ rdma-core
 DPDK       20.08
 =========  =============
 
+ConnectX-7 and above (wait-on-time mode):
+
+=========  =============
+Minimum    Version
+=========  =============
+hardware   ConnectX-7
+=========  =============
+
 Firmware configuration
 ^^^^^^^^^^^^^^^^^^^^^^
 
 Runtime configuration
 ^^^^^^^^^^^^^^^^^^^^^
 
-To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
-parameter should be specified.
+**ConnectX-6 Dx**: the :ref:`tx_pp <mlx5_tx_pp_param>` parameter must be
+specified to enable send scheduling on mbuf timestamps.
+
+**ConnectX-7+**: no devarg is required. Send scheduling is automatically
+enabled when the HCA reports the wait-on-time capability.
+
+On both hardware generations the ``tx_skew`` parameter can be used to
+compensate for the delay between descriptor processing and actual wire
+time.
 
 Limitations
 ^^^^^^^^^^^
 
-#. The timestamps can be put only in the first packet
-   in the burst providing the entire burst scheduling.
+#. On ConnectX-6 Dx (Clock Queue mode) timestamps too far in the future
+   are capped (see the ``tx_pp`` x 2^23 limit above).
 
 
 .. _mlx5_tx_inline:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 02/9] common/mlx5: query packet pacing rate table capabilities
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
  2026-03-12 22:01   ` [PATCH v3 01/9] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
@ 2026-03-12 22:01   ` Vincent Jardin
  2026-03-20 12:02     ` Slava Ovsiienko
  2026-03-12 22:01   ` [PATCH v3 03/9] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
                     ` (8 subsequent siblings)
  10 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 22:01 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Query additional QoS packet pacing capabilities from HCA attributes:
- packet_pacing_burst_bound: HW supports burst_upper_bound parameter
- packet_pacing_typical_size: HW supports typical_packet_size parameter
- packet_pacing_max_rate / packet_pacing_min_rate: rate range in kbps
- packet_pacing_rate_table_size: number of HW rate table entries

These capabilities are needed by the upcoming per-queue rate limiting
feature to validate devarg values and report HW limits.

Supported hardware:
- ConnectX-6 Dx and later (different boards expose different subsets)
- ConnectX-5 reports packet_pacing but not all extended fields
- ConnectX-7/8 report the full capability set
- BlueField-2 and later DPUs also report these capabilities

Not supported:
- ConnectX-4 Lx and earlier (no packet_pacing capability at all)
- ConnectX-5 Ex may not report burst_bound or typical_size

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 15 +++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 11 ++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index d12ebf8487..8f53303fa7 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1244,6 +1244,21 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 				MLX5_GET(qos_cap, hcattr, packet_pacing);
 		attr->qos.wqe_rate_pp =
 				MLX5_GET(qos_cap, hcattr, wqe_rate_pp);
+		attr->qos.packet_pacing_burst_bound =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_burst_bound);
+		attr->qos.packet_pacing_typical_size =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_typical_size);
+		attr->qos.packet_pacing_max_rate =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_max_rate);
+		attr->qos.packet_pacing_min_rate =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_min_rate);
+		attr->qos.packet_pacing_rate_table_size =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_rate_table_size);
 		if (attr->qos.flow_meter_aso_sup) {
 			attr->qos.log_meter_aso_granularity =
 				MLX5_GET(qos_cap, hcattr,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index da50fc686c..930ae2c072 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -67,7 +67,16 @@ struct mlx5_hca_qos_attr {
 	/* Power of the maximum allocation granularity Object. */
 	uint32_t log_max_num_meter_aso:5;
 	/* Power of the maximum number of supported objects. */
-
+	uint32_t packet_pacing_burst_bound:1;
+	/* HW supports burst_upper_bound PP parameter. */
+	uint32_t packet_pacing_typical_size:1;
+	/* HW supports typical_packet_size PP parameter. */
+	uint32_t packet_pacing_max_rate;
+	/* Maximum supported pacing rate in kbps. */
+	uint32_t packet_pacing_min_rate;
+	/* Minimum supported pacing rate in kbps. */
+	uint16_t packet_pacing_rate_table_size;
+	/* Number of entries in the HW rate table. */
 };
 
 struct mlx5_hca_vdpa_attr {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 03/9] common/mlx5: extend SQ modify to support rate limit update
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
  2026-03-12 22:01   ` [PATCH v3 01/9] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
  2026-03-12 22:01   ` [PATCH v3 02/9] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
@ 2026-03-12 22:01   ` Vincent Jardin
  2026-03-20 12:01     ` Slava Ovsiienko
  2026-03-12 22:01   ` [PATCH v3 04/9] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
                     ` (7 subsequent siblings)
  10 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 22:01 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Add rl_update and packet_pacing_rate_limit_index fields to
mlx5_devx_modify_sq_attr. When rl_update is set, the modify SQ
command sets modify_bitmask bit 0 and writes the PP index into
the SQ context, allowing dynamic rate changes on a live RDY SQ
without teardown.

modify_sq_in.modify_bitmask[0x40] bit 0 controls the
packet_pacing_rate_limit_index.

Supported hardware:
- ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
- ConnectX-7/8: same SQ context field, also supports wait-on-time
- BlueField-2/3: same modify_sq command support

Not supported:
- ConnectX-5: supports packet_pacing but only at SQ creation time,
  dynamic modify_bitmask update may not be supported on all FW
- ConnectX-4 Lx and earlier: no packet_pacing support

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 5 +++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 3 +++
 2 files changed, 8 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 8f53303fa7..17378e1753 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2129,6 +2129,11 @@ mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
 	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
 	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
 	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	if (sq_attr->rl_update) {
+		MLX5_SET64(modify_sq_in, in, modify_bitmask, 1);
+		MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+			 sq_attr->packet_pacing_rate_limit_index);
+	}
 	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
 					 out, sizeof(out));
 	if (ret) {
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 930ae2c072..82d949972b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -519,6 +519,9 @@ struct mlx5_devx_modify_sq_attr {
 	uint32_t state:4;
 	uint32_t hairpin_peer_rq:24;
 	uint32_t hairpin_peer_vhca:16;
+	uint32_t rl_update:1;
+	/* Set to update packet_pacing_rate_limit_index on a live SQ. */
+	uint32_t packet_pacing_rate_limit_index:16;
 };
 
 
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 04/9] net/mlx5: add per-queue packet pacing infrastructure
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                     ` (2 preceding siblings ...)
  2026-03-12 22:01   ` [PATCH v3 03/9] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
@ 2026-03-12 22:01   ` Vincent Jardin
  2026-03-20 12:51     ` Slava Ovsiienko
  2026-03-12 22:01   ` [PATCH v3 05/9] net/mlx5: support per-queue rate limiting Vincent Jardin
                     ` (6 subsequent siblings)
  10 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 22:01 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Add mlx5_txq_rate_limit structure and alloc/free helpers for
per-queue data-rate packet pacing. Each Tx queue can now hold
its own PP (Packet Pacing) index allocated via mlx5dv_pp_alloc()
with MLX5_DATA_RATE mode.

mlx5_txq_alloc_pp_rate_limit() converts Mbps to kbps for the PRM
rate_limit field and allocates a PP index from the HW rate table.
mlx5_txq_free_pp_rate_limit() releases it.

PP allocation uses shared mode (flags=0) so that the kernel mlx5
driver can reuse a single HW rate table entry for all PP contexts
with identical parameters (rate, burst, packet size). This avoids
exhausting the rate table (typically 128 entries on ConnectX-6 Dx)
when many queues share the same rate. Each queue still gets its
own PP handle for proper cleanup.

The existing Clock Queue path (sh->txpp.pp / sh->txpp.pp_id) is
untouched — it uses MLX5_WQE_RATE for per-packet scheduling with
a dedicated index, while per-queue rate limiting uses MLX5_DATA_RATE.

PP index cleanup is added to mlx5_txq_release() to prevent leaks
when queues are destroyed.

Supported hardware:
- ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
- ConnectX-7/8: same mechanism, plus wait-on-time coexistence
- BlueField-2/3: same PP allocation support

Not supported:
- ConnectX-5: packet_pacing exists but MLX5_DATA_RATE mode may
  not be available on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.h      | 11 ++++++
 drivers/net/mlx5/mlx5_tx.h   |  1 +
 drivers/net/mlx5/mlx5_txpp.c | 73 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_txq.c  |  1 +
 4 files changed, 86 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index b83dda5652..c48c3072d1 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1296,6 +1296,13 @@ struct mlx5_txpp_ts {
 	RTE_ATOMIC(uint64_t) ts;
 };
 
+/* Per-queue rate limit tracking. */
+struct mlx5_txq_rate_limit {
+	void *pp;		/* Packet pacing context from dv_alloc_pp. */
+	uint16_t pp_id;		/* Packet pacing index. */
+	uint32_t rate_mbps;	/* Current rate in Mbps, 0 = disabled. */
+};
+
 /* Tx packet pacing structure. */
 struct mlx5_dev_txpp {
 	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */
@@ -2634,6 +2641,10 @@ int mlx5_txpp_xstats_get_names(struct rte_eth_dev *dev,
 void mlx5_txpp_interrupt_handler(void *cb_arg);
 int mlx5_txpp_map_hca_bar(struct rte_eth_dev *dev);
 void mlx5_txpp_unmap_hca_bar(struct rte_eth_dev *dev);
+int mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
+				 struct mlx5_txq_rate_limit *rl,
+				 uint32_t rate_mbps);
+void mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rl);
 
 /* mlx5_rxtx.c */
 
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 0134a2e003..b1b3653247 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -192,6 +192,7 @@ struct mlx5_txq_ctrl {
 	uint16_t dump_file_n; /* Number of dump files. */
 	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	uint32_t hairpin_status; /* Hairpin binding status. */
+	struct mlx5_txq_rate_limit rl; /* Per-queue rate limit. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 0e99b58bde..0a883b0a94 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -128,6 +128,79 @@ mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared *sh)
 #endif
 }
 
+/* Free a per-queue packet pacing index. */
+void
+mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rl)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	if (rl->pp) {
+		mlx5_glue->dv_free_pp(rl->pp);
+		rl->pp = NULL;
+		rl->pp_id = 0;
+		rl->rate_mbps = 0;
+	}
+#else
+	RTE_SET_USED(rl);
+#endif
+}
+
+/* Allocate a per-queue packet pacing index for data-rate limiting. */
+int
+mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
+			     struct mlx5_txq_rate_limit *rl,
+			     uint32_t rate_mbps)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	uint32_t pp[MLX5_ST_SZ_DW(set_pp_rate_limit_context)];
+	uint64_t rate_kbps;
+	struct mlx5_hca_qos_attr *qos = &sh->cdev->config.hca_attr.qos;
+
+	MLX5_ASSERT(rate_mbps > 0);
+	rate_kbps = (uint64_t)rate_mbps * 1000;
+	if (qos->packet_pacing_min_rate && rate_kbps < qos->packet_pacing_min_rate) {
+		DRV_LOG(ERR, "Rate %u Mbps below HW minimum (%u kbps).",
+			rate_mbps, qos->packet_pacing_min_rate);
+		rte_errno = ERANGE;
+		return -ERANGE;
+	}
+	if (qos->packet_pacing_max_rate && rate_kbps > qos->packet_pacing_max_rate) {
+		DRV_LOG(ERR, "Rate %u Mbps exceeds HW maximum (%u kbps).",
+			rate_mbps, qos->packet_pacing_max_rate);
+		rte_errno = ERANGE;
+		return -ERANGE;
+	}
+	memset(&pp, 0, sizeof(pp));
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, (uint32_t)rate_kbps);
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
+	rl->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp), &pp, 0);
+	if (rl->pp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
+			rate_mbps);
+		rte_errno = errno;
+		return -errno;
+	}
+	rl->pp_id = ((struct mlx5dv_pp *)rl->pp)->index;
+	if (!rl->pp_id) {
+		DRV_LOG(ERR, "Zero PP index allocated for rate %u Mbps.",
+			rate_mbps);
+		mlx5_txq_free_pp_rate_limit(rl);
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	rl->rate_mbps = rate_mbps;
+	DRV_LOG(DEBUG, "Allocated PP index %u for rate %u Mbps.",
+		rl->pp_id, rate_mbps);
+	return 0;
+#else
+	RTE_SET_USED(sh);
+	RTE_SET_USED(rl);
+	RTE_SET_USED(rate_mbps);
+	DRV_LOG(ERR, "Per-queue rate limit requires rdma-core PP support.");
+	rte_errno = ENOTSUP;
+	return -ENOTSUP;
+#endif
+}
+
 static void
 mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)
 {
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9275efb58e..fa9bb48fd4 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1338,6 +1338,7 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
 	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
 	if (rte_atomic_fetch_sub_explicit(&txq_ctrl->refcnt, 1, rte_memory_order_relaxed) - 1 > 1)
 		return 1;
+	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
 	if (txq_ctrl->obj) {
 		priv->obj_ops.txq_obj_release(txq_ctrl->obj);
 		LIST_REMOVE(txq_ctrl->obj, next);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 05/9] net/mlx5: support per-queue rate limiting
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                     ` (3 preceding siblings ...)
  2026-03-12 22:01   ` [PATCH v3 04/9] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
@ 2026-03-12 22:01   ` Vincent Jardin
  2026-03-20 15:11     ` Slava Ovsiienko
  2026-03-12 22:01   ` [PATCH v3 06/9] net/mlx5: add burst pacing devargs Vincent Jardin
                     ` (5 subsequent siblings)
  10 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 22:01 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Wire rte_eth_set_queue_rate_limit() to the mlx5 PMD. The callback
allocates a per-queue PP index with the requested data rate, then
modifies the live SQ via modify_bitmask bit 0 to apply the new
packet_pacing_rate_limit_index — no queue teardown required.

Setting tx_rate=0 clears the PP index on the SQ and frees it.

Capability check uses hca_attr.qos.packet_pacing directly (not
dev_cap.txpp_en which requires Clock Queue prerequisites). This
allows per-queue rate limiting without the tx_pp devarg.

The callback rejects hairpin queues and queues whose SQ is not
yet created.

testpmd usage (no testpmd changes needed):
  set port 0 queue 0 rate 1000
  set port 0 queue 1 rate 5000
  set port 0 queue 0 rate 0     # disable

Supported hardware:
- ConnectX-6 Dx: full support, per-SQ rate via HW rate table
- ConnectX-7/8: full support, coexists with wait-on-time scheduling
- BlueField-2/3: full support as DPU rep ports

Not supported:
- ConnectX-5: packet_pacing exists but dynamic SQ modify may not
  work on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.c     |  2 +
 drivers/net/mlx5/mlx5_tx.h  |  2 +
 drivers/net/mlx5/mlx5_txq.c | 97 +++++++++++++++++++++++++++++++++++++
 3 files changed, 101 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 4d3bfddc36..c390406ac7 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2690,6 +2690,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 	.get_restore_flags = mlx5_get_restore_flags,
+	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2783,6 +2784,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.count_aggr_ports = mlx5_count_aggr_ports,
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.get_restore_flags = mlx5_get_restore_flags,
+	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index b1b3653247..3a37f5bb4d 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -222,6 +222,8 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint32_t tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index fa9bb48fd4..f2ed2454a0 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1363,6 +1363,103 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
 	return 0;
 }
 
+/**
+ * Set per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param tx_rate
+ *   TX rate in Mbps, 0 to disable rate limiting.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			  uint32_t tx_rate)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	int ret;
+
+	if (!sh->cdev->config.hca_attr.qos.packet_pacing) {
+		DRV_LOG(ERR, "Port %u packet pacing not supported.",
+			dev->data->port_id);
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u not configured.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = container_of((*priv->txqs)[queue_idx],
+				struct mlx5_txq_ctrl, txq);
+	if (txq_ctrl->is_hairpin) {
+		DRV_LOG(ERR, "Port %u Tx queue %u is hairpin.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (txq_ctrl->obj == NULL || txq_ctrl->obj->sq == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (tx_rate == 0) {
+		/* Disable rate limiting. */
+		if (txq_ctrl->rl.pp_id == 0)
+			return 0; /* Already disabled. */
+		sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.rl_update = 1;
+		sq_attr.packet_pacing_rate_limit_index = 0;
+		ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+		if (ret) {
+			DRV_LOG(ERR,
+				"Port %u Tx queue %u failed to clear rate.",
+				dev->data->port_id, queue_idx);
+			rte_errno = -ret;
+			return ret;
+		}
+		mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
+		DRV_LOG(DEBUG, "Port %u Tx queue %u rate limit disabled.",
+			dev->data->port_id, queue_idx);
+		return 0;
+	}
+	/* Allocate a new PP index for the requested rate into a temp. */
+	struct mlx5_txq_rate_limit new_rl = { 0 };
+
+	ret = mlx5_txq_alloc_pp_rate_limit(sh, &new_rl, tx_rate);
+	if (ret)
+		return ret;
+	/* Modify live SQ to use the new PP index. */
+	sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+	sq_attr.state = MLX5_SQC_STATE_RDY;
+	sq_attr.rl_update = 1;
+	sq_attr.packet_pacing_rate_limit_index = new_rl.pp_id;
+	ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u Mbps.",
+			dev->data->port_id, queue_idx, tx_rate);
+		mlx5_txq_free_pp_rate_limit(&new_rl);
+		rte_errno = -ret;
+		return ret;
+	}
+	/* SQ updated — release old PP context, install new one. */
+	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
+	txq_ctrl->rl = new_rl;
+	DRV_LOG(DEBUG, "Port %u Tx queue %u rate set to %u Mbps (PP idx %u).",
+		dev->data->port_id, queue_idx, tx_rate, txq_ctrl->rl.pp_id);
+	return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 06/9] net/mlx5: add burst pacing devargs
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                     ` (4 preceding siblings ...)
  2026-03-12 22:01   ` [PATCH v3 05/9] net/mlx5: support per-queue rate limiting Vincent Jardin
@ 2026-03-12 22:01   ` Vincent Jardin
  2026-03-20 15:19     ` Slava Ovsiienko
  2026-03-12 22:01   ` [PATCH v3 07/9] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
                     ` (4 subsequent siblings)
  10 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 22:01 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Expose burst_upper_bound and typical_packet_size from the PRM
set_pp_rate_limit_context as devargs:
- tx_burst_bound=<bytes>: max burst before rate evaluation kicks in
- tx_typical_pkt_sz=<bytes>: typical packet size for accuracy

These parameters apply to both per-queue rate limiting
(rte_eth_set_queue_rate_limit) and Clock Queue pacing (tx_pp).

Values are validated against HCA capabilities
(packet_pacing_burst_bound and packet_pacing_typical_size).
If the HW does not support them, a warning is logged and the
value is silently zeroed. Test mode still overrides both values.

Shared context mismatch checks ensure all ports on the same
device use the same burst parameters.

Supported hardware:
- ConnectX-6 Dx: burst_upper_bound and typical_packet_size
  reported via packet_pacing_burst_bound / packet_pacing_typical_size
  QoS capability bits
- ConnectX-7/8: full support for both parameters
- BlueField-2/3: same capabilities as host-side ConnectX

Not supported:
- ConnectX-5: may not report burst_bound or typical_size caps
- ConnectX-4 Lx and earlier: no packet_pacing at all

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/mlx5.rst     | 16 ++++++++++++++
 drivers/net/mlx5/mlx5.c      | 42 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5.h      |  2 ++
 drivers/net/mlx5/mlx5_txpp.c | 12 +++++++++++
 4 files changed, 72 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 5b097dbc90..2507fae846 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -580,6 +580,22 @@ for an additional list of options shared with other mlx5 drivers.
   (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
   The default value is zero.
 
+- ``tx_burst_bound`` parameter [int]
+
+  Specifies the burst upper bound in bytes for packet pacing rate evaluation.
+  When set, the hardware considers this burst size when enforcing the configured
+  rate limit. Only effective when the HCA reports ``packet_pacing_burst_bound``
+  capability. Applies to both per-queue rate limiting
+  (``rte_eth_set_queue_rate_limit()``) and Clock Queue pacing (``tx_pp``).
+  The default value is zero (hardware default).
+
+- ``tx_typical_pkt_sz`` parameter [int]
+
+  Specifies the typical packet size in bytes for packet pacing rate accuracy
+  improvement. Only effective when the HCA reports
+  ``packet_pacing_typical_size`` capability. Applies to both per-queue rate
+  limiting and Clock Queue pacing. The default value is zero (hardware default).
+
 - ``tx_vec_en`` parameter [int]
 
   A nonzero value enables Tx vector with ConnectX-5 NICs and above.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index c390406ac7..f399e0d5c9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -119,6 +119,18 @@
  */
 #define MLX5_TX_SKEW "tx_skew"
 
+/*
+ * Device parameter to specify burst upper bound in bytes
+ * for packet pacing rate evaluation.
+ */
+#define MLX5_TX_BURST_BOUND "tx_burst_bound"
+
+/*
+ * Device parameter to specify typical packet size in bytes
+ * for packet pacing rate accuracy improvement.
+ */
+#define MLX5_TX_TYPICAL_PKT_SZ "tx_typical_pkt_sz"
+
 /*
  * Device parameter to enable hardware Tx vector.
  * Deprecated, ignored (no vectorized Tx routines anymore).
@@ -1405,6 +1417,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->tx_pp = tmp;
 	} else if (strcmp(MLX5_TX_SKEW, key) == 0) {
 		config->tx_skew = tmp;
+	} else if (strcmp(MLX5_TX_BURST_BOUND, key) == 0) {
+		config->tx_burst_bound = tmp;
+	} else if (strcmp(MLX5_TX_TYPICAL_PKT_SZ, key) == 0) {
+		config->tx_typical_pkt_sz = tmp;
 	} else if (strcmp(MLX5_L3_VXLAN_EN, key) == 0) {
 		config->l3_vxlan_en = !!tmp;
 	} else if (strcmp(MLX5_VF_NL_EN, key) == 0) {
@@ -1518,8 +1534,10 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 				struct mlx5_sh_config *config)
 {
 	const char **params = (const char *[]){
+		MLX5_TX_BURST_BOUND,
 		MLX5_TX_PP,
 		MLX5_TX_SKEW,
+		MLX5_TX_TYPICAL_PKT_SZ,
 		MLX5_L3_VXLAN_EN,
 		MLX5_VF_NL_EN,
 		MLX5_DV_ESW_EN,
@@ -1626,6 +1644,18 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(WARNING,
 			"\"tx_skew\" doesn't affect without \"tx_pp\".");
 	}
+	if (config->tx_burst_bound &&
+	    !sh->cdev->config.hca_attr.qos.packet_pacing_burst_bound) {
+		DRV_LOG(WARNING,
+			"HW does not support burst_upper_bound, ignoring.");
+		config->tx_burst_bound = 0;
+	}
+	if (config->tx_typical_pkt_sz &&
+	    !sh->cdev->config.hca_attr.qos.packet_pacing_typical_size) {
+		DRV_LOG(WARNING,
+			"HW does not support typical_packet_size, ignoring.");
+		config->tx_typical_pkt_sz = 0;
+	}
 	/* Check for LRO support. */
 	if (mlx5_devx_obj_ops_en(sh) && sh->cdev->config.hca_attr.lro_cap) {
 		/* TBD check tunnel lro caps. */
@@ -3260,6 +3290,18 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.tx_burst_bound != config->tx_burst_bound) {
+		DRV_LOG(ERR, "\"tx_burst_bound\" "
+			"configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
+	if (sh->config.tx_typical_pkt_sz != config->tx_typical_pkt_sz) {
+		DRV_LOG(ERR, "\"tx_typical_pkt_sz\" "
+			"configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.txq_mem_algn != config->txq_mem_algn) {
 		DRV_LOG(ERR, "\"TxQ memory alignment\" "
 			"configuration mismatch for shared %s context. %u - %u",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index c48c3072d1..a8d71482ac 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -382,6 +382,8 @@ struct mlx5_port_config {
 struct mlx5_sh_config {
 	int tx_pp; /* Timestamp scheduling granularity in nanoseconds. */
 	int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
+	uint32_t tx_burst_bound; /* Burst upper bound in bytes, 0 = default. */
+	uint32_t tx_typical_pkt_sz; /* Typical packet size in bytes, 0 = default. */
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 0a883b0a94..756a772cc5 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -88,6 +88,12 @@ mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared *sh)
 	rate = NS_PER_S / sh->txpp.tick;
 	if (rate * sh->txpp.tick != NS_PER_S)
 		DRV_LOG(WARNING, "Packet pacing frequency is not precise.");
+	if (sh->config.tx_burst_bound)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 burst_upper_bound, sh->config.tx_burst_bound);
+	if (sh->config.tx_typical_pkt_sz)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 typical_packet_size, sh->config.tx_typical_pkt_sz);
 	if (sh->txpp.test) {
 		uint32_t len;
 
@@ -172,6 +178,12 @@ mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
 	memset(&pp, 0, sizeof(pp));
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, (uint32_t)rate_kbps);
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
+	if (sh->config.tx_burst_bound)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 burst_upper_bound, sh->config.tx_burst_bound);
+	if (sh->config.tx_typical_pkt_sz)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 typical_packet_size, sh->config.tx_typical_pkt_sz);
 	rl->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp), &pp, 0);
 	if (rl->pp == NULL) {
 		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 07/9] net/mlx5: add testpmd command to query per-queue rate limit
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                     ` (5 preceding siblings ...)
  2026-03-12 22:01   ` [PATCH v3 06/9] net/mlx5: add burst pacing devargs Vincent Jardin
@ 2026-03-12 22:01   ` Vincent Jardin
  2026-03-20 15:38     ` Slava Ovsiienko
  2026-03-12 22:01   ` [PATCH v3 08/9] ethdev: add getter for per-queue Tx " Vincent Jardin
                     ` (3 subsequent siblings)
  10 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 22:01 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Add a new testpmd command to display the per-queue packet pacing
rate limit state, including the PP index from both driver state
and FW SQ context readback:

  testpmd> mlx5 port <port_id> txq <queue_id> rate show

This helps verify that the FW actually applied the PP index to
the SQ after setting a per-queue rate limit.

Expose a new PMD API rte_pmd_mlx5_txq_rate_limit_query() that
queries txq_ctrl->rl for driver state and mlx5_devx_cmd_query_sq()
for the FW packet_pacing_rate_limit_index field.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_testpmd.c | 93 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c      | 40 +++++++++++++-
 drivers/net/mlx5/mlx5_txq.c     | 19 +++++--
 drivers/net/mlx5/rte_pmd_mlx5.h | 30 +++++++++++
 4 files changed, 178 insertions(+), 4 deletions(-)

diff --git a/drivers/net/mlx5/mlx5_testpmd.c b/drivers/net/mlx5/mlx5_testpmd.c
index 1bb5a89559..fd3efecc5d 100644
--- a/drivers/net/mlx5/mlx5_testpmd.c
+++ b/drivers/net/mlx5/mlx5_testpmd.c
@@ -1365,6 +1365,94 @@ cmdline_parse_inst_t mlx5_cmd_dump_rq_context_options = {
 	}
 };
 
+/* Show per-queue rate limit PP index for a given port/queue */
+struct mlx5_cmd_show_rate_limit_options {
+	cmdline_fixed_string_t mlx5;
+	cmdline_fixed_string_t port;
+	portid_t port_id;
+	cmdline_fixed_string_t txq;
+	queueid_t queue_id;
+	cmdline_fixed_string_t rate;
+	cmdline_fixed_string_t show;
+};
+
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_mlx5 =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 mlx5, "mlx5");
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_port =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 port, "port");
+cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_port_id =
+	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+			      port_id, RTE_UINT16);
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_txq =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 txq, "txq");
+cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_queue_id =
+	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+			      queue_id, RTE_UINT16);
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_rate =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 rate, "rate");
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_show =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 show, "show");
+
+static void
+mlx5_cmd_show_rate_limit_parsed(void *parsed_result,
+				__rte_unused struct cmdline *cl,
+				__rte_unused void *data)
+{
+	struct mlx5_cmd_show_rate_limit_options *res = parsed_result;
+	struct rte_pmd_mlx5_txq_rate_limit_info info;
+	int ret;
+
+	ret = rte_pmd_mlx5_txq_rate_limit_query(res->port_id, res->queue_id,
+						 &info);
+	switch (ret) {
+	case 0:
+		break;
+	case -ENODEV:
+		fprintf(stderr, "invalid port_id %u\n", res->port_id);
+		return;
+	case -EINVAL:
+		fprintf(stderr, "invalid queue index (%u), out of range\n",
+			res->queue_id);
+		return;
+	case -EIO:
+		fprintf(stderr, "failed to query SQ context\n");
+		return;
+	default:
+		fprintf(stderr, "query failed (%d)\n", ret);
+		return;
+	}
+	fprintf(stdout, "Port %u Txq %u rate limit info:\n",
+		res->port_id, res->queue_id);
+	if (info.rate_mbps > 0)
+		fprintf(stdout, "  Configured rate: %u Mbps\n",
+			info.rate_mbps);
+	else
+		fprintf(stdout, "  Configured rate: disabled\n");
+	fprintf(stdout, "  PP index (driver): %u\n", info.pp_index);
+	fprintf(stdout, "  PP index (FW readback): %u\n", info.fw_pp_index);
+}
+
+cmdline_parse_inst_t mlx5_cmd_show_rate_limit = {
+	.f = mlx5_cmd_show_rate_limit_parsed,
+	.data = NULL,
+	.help_str = "mlx5 port <port_id> txq <queue_id> rate show",
+	.tokens = {
+		(void *)&mlx5_cmd_show_rate_limit_mlx5,
+		(void *)&mlx5_cmd_show_rate_limit_port,
+		(void *)&mlx5_cmd_show_rate_limit_port_id,
+		(void *)&mlx5_cmd_show_rate_limit_txq,
+		(void *)&mlx5_cmd_show_rate_limit_queue_id,
+		(void *)&mlx5_cmd_show_rate_limit_rate,
+		(void *)&mlx5_cmd_show_rate_limit_show,
+		NULL,
+	}
+};
+
 static struct testpmd_driver_commands mlx5_driver_cmds = {
 	.commands = {
 		{
@@ -1440,6 +1528,11 @@ static struct testpmd_driver_commands mlx5_driver_cmds = {
 			.help = "mlx5 port (port_id) queue (queue_id) dump rq_context (file_name)\n"
 				"    Dump mlx5 RQ Context\n\n",
 		},
+		{
+			.ctx = &mlx5_cmd_show_rate_limit,
+			.help = "mlx5 port (port_id) txq (queue_id) rate show\n"
+				"    Show per-queue rate limit PP index\n\n",
+		},
 		{
 			.ctx = NULL,
 		},
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 8085b5c306..fa57d3ef98 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -800,7 +800,7 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const ch
 	if (!rte_eth_dev_is_valid_port(port_id))
 		return -ENODEV;
 
-	if (rte_eth_tx_queue_is_valid(port_id, queue_id))
+	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
 		return -EINVAL;
 
 	fd = fopen(path, "w");
@@ -848,3 +848,41 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const ch
 	fclose(fd);
 	return ret;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_txq_rate_limit_query, 26.07)
+int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
+				       struct rte_pmd_mlx5_txq_rate_limit_info *info)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	struct mlx5_txq_data *txq_data;
+	struct mlx5_txq_ctrl *txq_ctrl;
+	uint32_t sq_out[MLX5_ST_SZ_DW(query_sq_out)] = {0};
+	int ret;
+
+	if (info == NULL)
+		return -EINVAL;
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return -ENODEV;
+	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
+		return -EINVAL;
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	txq_data = (*priv->txqs)[queue_id];
+	txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	info->rate_mbps = txq_ctrl->rl.rate_mbps;
+	info->pp_index = txq_ctrl->rl.pp_id;
+	if (txq_ctrl->obj == NULL) {
+		info->fw_pp_index = 0;
+		return 0;
+	}
+	ret = mlx5_devx_cmd_query_sq(txq_ctrl->obj->sq_obj.sq,
+				     sq_out, sizeof(sq_out));
+	if (ret)
+		return -EIO;
+	info->fw_pp_index = MLX5_GET(sqc,
+				     MLX5_ADDR_OF(query_sq_out, sq_out,
+						  sq_context),
+				     packet_pacing_rate_limit_index);
+	return 0;
+}
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index f2ed2454a0..f0881f3560 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1406,7 +1406,20 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 		rte_errno = EINVAL;
 		return -rte_errno;
 	}
-	if (txq_ctrl->obj == NULL || txq_ctrl->obj->sq == NULL) {
+	if (txq_ctrl->obj == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u not initialized.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	/*
+	 * For non-hairpin queues the SQ DevX object lives in
+	 * obj->sq_obj.sq (used by DevX/HWS mode), while hairpin
+	 * queues use obj->sq directly.  These are different members
+	 * of a union inside mlx5_txq_obj.
+	 */
+	struct mlx5_devx_obj *sq_devx = txq_ctrl->obj->sq_obj.sq;
+	if (sq_devx == NULL) {
 		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
 			dev->data->port_id, queue_idx);
 		rte_errno = EINVAL;
@@ -1420,7 +1433,7 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 		sq_attr.state = MLX5_SQC_STATE_RDY;
 		sq_attr.rl_update = 1;
 		sq_attr.packet_pacing_rate_limit_index = 0;
-		ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+		ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
 		if (ret) {
 			DRV_LOG(ERR,
 				"Port %u Tx queue %u failed to clear rate.",
@@ -1444,7 +1457,7 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 	sq_attr.state = MLX5_SQC_STATE_RDY;
 	sq_attr.rl_update = 1;
 	sq_attr.packet_pacing_rate_limit_index = new_rl.pp_id;
-	ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
+	ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
 	if (ret) {
 		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u Mbps.",
 			dev->data->port_id, queue_idx, tx_rate);
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 7acfdae97d..698d7d2032 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -420,6 +420,36 @@ __rte_experimental
 int
 rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const char *filename);
 
+/**
+ * Per-queue rate limit information.
+ */
+struct rte_pmd_mlx5_txq_rate_limit_info {
+	uint32_t rate_mbps;	/**< Configured rate in Mbps, 0 = disabled. */
+	uint16_t pp_index;	/**< PP index from driver state. */
+	uint16_t fw_pp_index;	/**< PP index read back from FW SQ context. */
+};
+
+/**
+ * Query per-queue rate limit state for a given Tx queue.
+ *
+ * @param[in] port_id
+ *   Port ID.
+ * @param[in] queue_id
+ *   Tx queue ID.
+ * @param[out] info
+ *   Rate limit information.
+ *
+ * @return
+ *   0 on success, negative errno on failure:
+ *   - -ENODEV: invalid port_id.
+ *   - -EINVAL: invalid queue_id.
+ *   - -EIO: FW query failed.
+ */
+__rte_experimental
+int
+rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
+				  struct rte_pmd_mlx5_txq_rate_limit_info *info);
+
 /** Type of mlx5 driver event for which custom callback is called. */
 enum rte_pmd_mlx5_driver_event_cb_type {
 	/** Called after HW Rx queue is created. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 08/9] ethdev: add getter for per-queue Tx rate limit
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                     ` (6 preceding siblings ...)
  2026-03-12 22:01   ` [PATCH v3 07/9] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
@ 2026-03-12 22:01   ` Vincent Jardin
  2026-03-20 15:44     ` Slava Ovsiienko
  2026-03-12 22:01   ` [PATCH v3 09/9] net/mlx5: add rate table capacity query API Vincent Jardin
                     ` (2 subsequent siblings)
  10 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 22:01 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

The existing rte_eth_set_queue_rate_limit() API allows setting a
per-queue Tx rate but provides no way to read it back. Applications
such as grout are forced to maintain a shadow copy of the rate to
be able to report it.

Add rte_eth_get_queue_rate_limit() as the symmetric getter, following
the established DPDK pattern (e.g. rte_eth_dev_set_mtu/get_mtu,
rte_eth_dev_set_vlan_offload/get_vlan_offload).

This adds:
- eth_get_queue_rate_limit_t driver callback in ethdev_driver.h
- rte_eth_get_queue_rate_limit() public experimental API (26.07)
- mlx5 PMD implementation reading from the existing per-queue
  rate_mbps tracking field

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 app/test-pmd/cmdline.c      | 69 +++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5.c     |  2 ++
 drivers/net/mlx5/mlx5_tx.h  |  2 ++
 drivers/net/mlx5/mlx5_txq.c | 30 ++++++++++++++++
 lib/ethdev/ethdev_driver.h  |  7 ++++
 lib/ethdev/rte_ethdev.c     | 28 +++++++++++++++
 lib/ethdev/rte_ethdev.h     | 24 +++++++++++++
 7 files changed, 162 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index c33c66f327..ee532984e8 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -8912,6 +8912,74 @@ static cmdline_parse_inst_t cmd_queue_rate_limit = {
 	},
 };
 
+/* *** SHOW RATE LIMIT FOR A QUEUE OF A PORT *** */
+struct cmd_show_queue_rate_limit_result {
+	cmdline_fixed_string_t show;
+	cmdline_fixed_string_t port;
+	uint16_t port_num;
+	cmdline_fixed_string_t queue;
+	uint16_t queue_num;
+	cmdline_fixed_string_t rate;
+};
+
+static void cmd_show_queue_rate_limit_parsed(void *parsed_result,
+		__rte_unused struct cmdline *cl,
+		__rte_unused void *data)
+{
+	struct cmd_show_queue_rate_limit_result *res = parsed_result;
+	uint32_t tx_rate = 0;
+	int ret;
+
+	ret = rte_eth_get_queue_rate_limit(res->port_num, res->queue_num,
+					   &tx_rate);
+	if (ret) {
+		fprintf(stderr, "Get queue rate limit failed: %s\n",
+			rte_strerror(-ret));
+		return;
+	}
+	if (tx_rate)
+		printf("Port %u Queue %u rate limit: %u Mbps\n",
+		       res->port_num, res->queue_num, tx_rate);
+	else
+		printf("Port %u Queue %u rate limit: disabled\n",
+		       res->port_num, res->queue_num);
+}
+
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				show, "show");
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				port, "port");
+static cmdline_parse_token_num_t cmd_show_queue_rate_limit_portnum =
+	TOKEN_NUM_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				port_num, RTE_UINT16);
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_queue =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				queue, "queue");
+static cmdline_parse_token_num_t cmd_show_queue_rate_limit_queuenum =
+	TOKEN_NUM_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				queue_num, RTE_UINT16);
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_rate =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				rate, "rate");
+
+static cmdline_parse_inst_t cmd_show_queue_rate_limit = {
+	.f = cmd_show_queue_rate_limit_parsed,
+	.data = NULL,
+	.help_str = "show port <port_id> queue <queue_id> rate: "
+		"Show rate limit for a queue on port_id",
+	.tokens = {
+		(void *)&cmd_show_queue_rate_limit_show,
+		(void *)&cmd_show_queue_rate_limit_port,
+		(void *)&cmd_show_queue_rate_limit_portnum,
+		(void *)&cmd_show_queue_rate_limit_queue,
+		(void *)&cmd_show_queue_rate_limit_queuenum,
+		(void *)&cmd_show_queue_rate_limit_rate,
+		NULL,
+	},
+};
+
 /* *** SET RATE LIMIT FOR A VF OF A PORT *** */
 struct cmd_vf_rate_limit_result {
 	cmdline_fixed_string_t set;
@@ -14198,6 +14266,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	&cmd_set_uc_all_hash_filter,
 	&cmd_vf_mac_addr_filter,
 	&cmd_queue_rate_limit,
+	&cmd_show_queue_rate_limit,
 	&cmd_tunnel_udp_config,
 	&cmd_showport_rss_hash,
 	&cmd_showport_rss_hash_key,
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index f399e0d5c9..6e21ed31f3 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2721,6 +2721,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 	.get_restore_flags = mlx5_get_restore_flags,
 	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
+	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2815,6 +2816,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.get_restore_flags = mlx5_get_restore_flags,
 	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
+	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 3a37f5bb4d..46e199d93e 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -224,6 +224,8 @@ int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
 int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 			      uint32_t tx_rate);
+int mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint32_t *tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index f0881f3560..c81542113e 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1473,6 +1473,36 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 }
 
+/**
+ * Get per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param[out] tx_rate
+ *   Pointer to store the TX rate in Mbps, 0 if rate limiting is disabled.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			  uint32_t *tx_rate)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *txq_ctrl;
+
+	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = container_of((*priv->txqs)[queue_idx],
+				struct mlx5_txq_ctrl, txq);
+	*tx_rate = txq_ctrl->rl.rate_mbps;
+	return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 1255cd6f2c..0f336f9567 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -762,6 +762,11 @@ typedef int (*eth_set_queue_rate_limit_t)(struct rte_eth_dev *dev,
 				uint16_t queue_idx,
 				uint32_t tx_rate);
 
+/** @internal Get queue Tx rate. */
+typedef int (*eth_get_queue_rate_limit_t)(struct rte_eth_dev *dev,
+				uint16_t queue_idx,
+				uint32_t *tx_rate);
+
 /** @internal Add tunneling UDP port. */
 typedef int (*eth_udp_tunnel_port_add_t)(struct rte_eth_dev *dev,
 					 struct rte_eth_udp_tunnel *tunnel_udp);
@@ -1522,6 +1527,8 @@ struct eth_dev_ops {
 
 	/** Set queue rate limit */
 	eth_set_queue_rate_limit_t set_queue_rate_limit;
+	/** Get queue rate limit */
+	eth_get_queue_rate_limit_t get_queue_rate_limit;
 
 	/** Configure RSS hash protocols and hashing key */
 	rss_hash_update_t          rss_hash_update;
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 2edc7a362e..c6ad399033 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5694,6 +5694,34 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	return ret;
 }
 
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_get_queue_rate_limit, 26.07)
+int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
+					uint32_t *tx_rate)
+{
+	struct rte_eth_dev *dev;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (tx_rate == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Get queue rate limit:port %u: NULL tx_rate pointer",
+			port_id);
+		return -EINVAL;
+	}
+
+	if (queue_idx >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Get queue rate limit:port %u: invalid queue ID=%u",
+			port_id, queue_idx);
+		return -EINVAL;
+	}
+
+	if (dev->dev_ops->get_queue_rate_limit == NULL)
+		return -ENOTSUP;
+	return eth_err(port_id, dev->dev_ops->get_queue_rate_limit(dev, queue_idx, tx_rate));
+}
+
 RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_rx_avail_thresh_set, 22.07)
 int rte_eth_rx_avail_thresh_set(uint16_t port_id, uint16_t queue_id,
 			       uint8_t avail_thresh)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 0d8e2d0236..e525217b77 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4817,6 +4817,30 @@ int rte_eth_dev_uc_all_hash_table_set(uint16_t port_id, uint8_t on);
 int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 			uint32_t tx_rate);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Get the rate limitation for a queue on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_idx
+ *   The queue ID.
+ * @param[out] tx_rate
+ *   A pointer to retrieve the Tx rate in Mbps.
+ *   0 means rate limiting is disabled.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support this feature.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
+			uint32_t *tx_rate);
+
 /**
  * Configuration of Receive Side Scaling hash computation of Ethernet device.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v3 09/9] net/mlx5: add rate table capacity query API
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                     ` (7 preceding siblings ...)
  2026-03-12 22:01   ` [PATCH v3 08/9] ethdev: add getter for per-queue Tx " Vincent Jardin
@ 2026-03-12 22:01   ` Vincent Jardin
  2026-03-20 15:49     ` Slava Ovsiienko
  2026-03-16 16:04   ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
  10 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-12 22:01 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, Vincent Jardin

Add rte_pmd_mlx5_pp_rate_table_query() to report the HW packet
pacing rate table size and how many entries are currently in use.

The total comes from the HCA QoS capability
packet_pacing_rate_table_size. The used count is derived by
collecting unique non-zero PP indices across all TX queues.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_tx.c      | 64 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/rte_pmd_mlx5.h | 32 +++++++++++++++++
 2 files changed, 96 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index fa57d3ef98..968aceb24f 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -19,6 +19,7 @@
 
 #include <mlx5_prm.h>
 #include <mlx5_common.h>
+#include <mlx5_malloc.h>
 
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"
@@ -886,3 +887,66 @@ int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
 				     packet_pacing_rate_limit_index);
 	return 0;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_pp_rate_table_query, 26.07)
+int rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
+				     struct rte_pmd_mlx5_pp_rate_table_info *info)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	uint16_t used = 0;
+	uint16_t *seen;
+	unsigned int i;
+
+	if (info == NULL)
+		return -EINVAL;
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return -ENODEV;
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	if (!priv->sh->cdev->config.hca_attr.qos.packet_pacing) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	info->total = priv->sh->cdev->config.hca_attr.qos.packet_pacing_rate_table_size;
+	if (priv->txqs == NULL || priv->txqs_n == 0) {
+		info->used = 0;
+		return 0;
+	}
+	seen = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
+			   priv->txqs_n * sizeof(*seen), 0, SOCKET_ID_ANY);
+	if (seen == NULL)
+		return -ENOMEM;
+	/*
+	 * Count unique non-zero PP indices across this port's TX queues.
+	 * Note: the count reflects only queues on this port; other ports
+	 * sharing the same device may also consume rate table entries.
+	 */
+	for (i = 0; i < priv->txqs_n; i++) {
+		struct mlx5_txq_data *txq_data;
+		struct mlx5_txq_ctrl *txq_ctrl;
+		uint16_t pp_id;
+		uint16_t j;
+		bool dup;
+
+		if ((*priv->txqs)[i] == NULL)
+			continue;
+		txq_data = (*priv->txqs)[i];
+		txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
+		pp_id = txq_ctrl->rl.pp_id;
+		if (pp_id == 0)
+			continue;
+		dup = false;
+		for (j = 0; j < used; j++) {
+			if (seen[j] == pp_id) {
+				dup = true;
+				break;
+			}
+		}
+		if (!dup)
+			seen[used++] = pp_id;
+	}
+	mlx5_free(seen);
+	info->used = used;
+	return 0;
+}
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 698d7d2032..f7970dd7fb 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -450,6 +450,38 @@ int
 rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
 				  struct rte_pmd_mlx5_txq_rate_limit_info *info);
 
+/**
+ * Packet pacing rate table capacity information.
+ */
+struct rte_pmd_mlx5_pp_rate_table_info {
+	uint16_t total;		/**< Total HW rate table entries. */
+	uint16_t used;		/**< Currently allocated entries. */
+};
+
+/**
+ * Query packet pacing rate table capacity.
+ *
+ * The ``used`` count reflects only the queried port's TX queues.
+ * Other ports sharing the same physical device may also consume
+ * rate table entries that are not included in this count.
+ *
+ * @param[in] port_id
+ *   Port ID.
+ * @param[out] info
+ *   Rate table capacity information.
+ *
+ * @return
+ *   0 on success, negative errno on failure:
+ *   - -ENODEV: invalid port_id.
+ *   - -EINVAL: info is NULL.
+ *   - -ENOTSUP: packet pacing not supported.
+ *   - -ENOMEM: allocation failure.
+ */
+__rte_experimental
+int
+rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
+				 struct rte_pmd_mlx5_pp_rate_table_info *info);
+
 /** Type of mlx5 driver event for which custom callback is called. */
 enum rte_pmd_mlx5_driver_event_cb_type {
 	/** Called after HW Rx queue is created. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                     ` (8 preceding siblings ...)
  2026-03-12 22:01   ` [PATCH v3 09/9] net/mlx5: add rate table capacity query API Vincent Jardin
@ 2026-03-16 16:04   ` Stephen Hemminger
  2026-03-22 14:16     ` Vincent Jardin
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
  10 siblings, 1 reply; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-16 16:04 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan

On Thu, 12 Mar 2026 23:01:11 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> This series adds per-queue Tx data-rate limiting to the mlx5 PMD using
> hardware packet pacing (PP), and a symmetric rte_eth_get_queue_rate_limit()
> ethdev API to read back the configured rate.
> 
> Each Tx queue can be assigned an individual rate (in Mbps) at runtime via
> rte_eth_set_queue_rate_limit(). The mlx5 implementation allocates a
> dedicated PP index per rate from the HW rate table, programs it into the
> SQ via modify_sq, and shares identical rates across queues to conserve
> table entries. A PMD-specific API exposes per-queue PP diagnostics and
> rate table capacity.
> 
> Patch breakdown:
> 
>   1. doc/nics/mlx5: fix stale packet pacing documentation
>   2-3. common/mlx5: query PP capabilities and extend SQ modify
>   4-6. net/mlx5: per-queue PP infrastructure, rate_limit callback,
>        burst pacing devargs (tx_burst_bound, tx_typical_pkt_sz)
>   7. net/mlx5: testpmd command to query per-queue rate state
>   8. ethdev: add rte_eth_get_queue_rate_limit() symmetric getter
>        + testpmd "show port <id> queue <id> rate" command
>   9. net/mlx5: rate table capacity query API
> 
> Usage with testpmd:
>   set port 0 queue 0 rate 1000
>   set port 0 queue 1 rate 5000
>   set port 0 queue 0 rate 0      # disable
>   show port 0 queue 0 rate       # generic ethdev query
>   mlx5 port 0 txq 0 rate show    # mlx5 PMD-specific query
> 
> Changes since v2:
> 
> Patch 4 (per-queue packet pacing infrastructure):
>   - Folded "share pacing rate table entries across queues" into
>     this patch (was a separate patch in v2)
> 
> Patch 5 (support per-queue rate limiting):
>   - Remove redundant queue_idx >= nb_tx_queues check (ethdev
>     layer already validates before calling the PMD callback)
> 
> Patch 8 (ethdev getter):
>   - Add testpmd "show port <id> queue <id> rate" command
>     in app/test-pmd/cmdline.c using rte_eth_get_queue_rate_limit()
>   - Drop release notes (targeting 26.07, not 26.03)
>   - Remove redundant queue_idx bounds check from mlx5 getter
> 
> Patch 9 (rate table capacity query):
>   - Use MLX5_MEM_SYS flag in mlx5_malloc() for system memory
>   - Minor code style cleanups (line wrapping, cast formatting)
> 
> Changes since v1:
> 
> Addressed review feedback from Stephen Hemminger's AI:
> 
> Patch 4 (per-queue packet pacing infrastructure):
>   - Validate rate_mbps against HCA packet_pacing_min_rate and
>     packet_pacing_max_rate bounds; return -ERANGE on out-of-range
>   - Widen rate_kbps from uint32_t to uint64_t to prevent
>     overflow on rate_mbps * 1000
>   - Remove early mlx5_txq_free_pp_rate_limit() call from the
>     allocator (moved to caller, see patch 5)
> 
> Patch 5 (support per-queue rate limiting):
>   - Fix PP index leak on modify_sq failure: allocate new PP into a
>     temporary struct mlx5_txq_rate_limit; only swap into txq_ctrl->rl
>     after modify_sq succeeds. On failure the old PP context stays intact.
>   - Set rte_errno = -ret before returning errors from both the
>     disable (tx_rate=0) and enable paths
> 
> Patch 7 (testpmd command to query per-queue rate limit):
>   - Fix inverted rte_eth_tx_queue_is_valid() return value check:
>     was "if (rte_eth_tx_queue_is_valid(...))" (accepts invalid queues),
>     changed to "if (rte_eth_tx_queue_is_valid(...) != 0)"
> 
> Patch 9 (rate table capacity query, was patch 10 in v1):
>   - Replace uint16_t seen[RTE_MAX_QUEUES_PER_PORT] (2 KB stack array)
>     with heap-allocated mlx5_malloc(priv->txqs_n, ...) + mlx5_free()
>   - Add early return when txqs == NULL || txqs_n == 0
>   - Document in the API Doxygen that "used" reflects only the queried
>     port's queues; other ports on the same device may also consume
>     rate table entries
>   - Add -ENOMEM to documented return values
> 
> Hardware tested:
>   - ConnectX-6 Dx (packet pacing with MLX5_DATA_RATE)
> 
> Vincent Jardin (9):
>   doc/nics/mlx5: fix stale packet pacing documentation
>   common/mlx5: query packet pacing rate table capabilities
>   common/mlx5: extend SQ modify to support rate limit update
>   net/mlx5: add per-queue packet pacing infrastructure
>   net/mlx5: support per-queue rate limiting
>   net/mlx5: add burst pacing devargs
>   net/mlx5: add testpmd command to query per-queue rate limit
>   ethdev: add getter for per-queue Tx rate limit
>   net/mlx5: add rate table capacity query API
> 
>  app/test-pmd/cmdline.c               |  69 +++++++++++++
>  doc/guides/nics/mlx5.rst             | 125 ++++++++++++++++++------
>  drivers/common/mlx5/mlx5_devx_cmds.c |  20 ++++
>  drivers/common/mlx5/mlx5_devx_cmds.h |  14 ++-
>  drivers/net/mlx5/mlx5.c              |  46 +++++++++
>  drivers/net/mlx5/mlx5.h              |  13 +++
>  drivers/net/mlx5/mlx5_testpmd.c      |  93 ++++++++++++++++++
>  drivers/net/mlx5/mlx5_tx.c           | 104 +++++++++++++++++++-
>  drivers/net/mlx5/mlx5_tx.h           |   5 +
>  drivers/net/mlx5/mlx5_txpp.c         |  85 ++++++++++++++++
>  drivers/net/mlx5/mlx5_txq.c          | 141 +++++++++++++++++++++++++++
>  drivers/net/mlx5/rte_pmd_mlx5.h      |  62 ++++++++++++
>  lib/ethdev/ethdev_driver.h           |   7 ++
>  lib/ethdev/rte_ethdev.c              |  28 ++++++
>  lib/ethdev/rte_ethdev.h              |  24 +++++
>  15 files changed, 802 insertions(+), 33 deletions(-)
> 

Only minor things left to address.

Error (1):

Patch 5 — git bisect breakage with obj->sq vs obj->sq_obj.sq. Patch 5 accesses the SQ DevX object via txq_ctrl->obj->sq, which is the hairpin union member. For non-hairpin DevX/HWS queues the correct field is obj->sq_obj.sq. Patch 7 fixes this, but patch 5 is broken as submitted — each commit needs to be independently correct for bisect. The sq_obj.sq change should be moved into patch 5.

Warnings (3):

Patch 8: mlx5_get_queue_rate_limit() doesn't bounds-check queue_idx before array access (the ethdev layer does, but the set path checks it too — inconsistent). Also missing release notes for the new ethdev API, and the new eth_dev_ops member needs ethdev maintainer agreement.
Patch 9: The "used" count is per-port but the API name suggests device-wide — consider renaming or iterating all ports on the shared context.


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 03/9] common/mlx5: extend SQ modify to support rate limit update
  2026-03-12 22:01   ` [PATCH v3 03/9] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
@ 2026-03-20 12:01     ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-20 12:01 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org

Hi,

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Friday, March 13, 2026 12:01 AM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org; Vincent Jardin
> <vjardin@free.fr>
> Subject: [PATCH v3 03/9] common/mlx5: extend SQ modify to support rate limit
> update
> 
> Add rl_update and packet_pacing_rate_limit_index fields to
> mlx5_devx_modify_sq_attr. When rl_update is set, the modify SQ command
> sets modify_bitmask bit 0 and writes the PP index into the SQ context, allowing
> dynamic rate changes on a live RDY SQ without teardown.
> 
> modify_sq_in.modify_bitmask[0x40] bit 0 controls the
> packet_pacing_rate_limit_index.
> 
> Supported hardware:
> - ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
> - ConnectX-7/8: same SQ context field, also supports wait-on-time
> - BlueField-2/3: same modify_sq command support
> 
> Not supported:
> - ConnectX-5: supports packet_pacing but only at SQ creation time,
>   dynamic modify_bitmask update may not be supported on all FW
> - ConnectX-4 Lx and earlier: no packet_pacing support
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/common/mlx5/mlx5_devx_cmds.c | 5 +++++
> drivers/common/mlx5/mlx5_devx_cmds.h | 3 +++
>  2 files changed, 8 insertions(+)
> 
> diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c
> b/drivers/common/mlx5/mlx5_devx_cmds.c
> index 8f53303fa7..17378e1753 100644
> --- a/drivers/common/mlx5/mlx5_devx_cmds.c
> +++ b/drivers/common/mlx5/mlx5_devx_cmds.c
> @@ -2129,6 +2129,11 @@ mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj
> *sq,
>  	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
>  	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
>  	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr-
> >hairpin_peer_vhca);
> +	if (sq_attr->rl_update) {
> +		MLX5_SET64(modify_sq_in, in, modify_bitmask, 1);

1. Please define enum for the bit: MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX
(please see MLX5_MODIFY_RQ_IN_MODIFY_xxx enum). Also no objections to define enums for other
MLX5_MODIFY_SQ_IN_MODIFY_BITMASK _xxx.

2. Please modify only the involved bits in the bitmask:

uint64_t bitmask;

bitmask = MLX5_GET64(modify_sq_in, in, modify_bitmask);
bitmask |= MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX;
MLX5_SET64(modify_sq_in, in, modify_bitmask, bitmask);

This would improve compatibility for the case bitmask is set somewhere before the "if (sq_attr->rl_update)".

With best regards,
Slava


> +		MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
> +			 sq_attr->packet_pacing_rate_limit_index);
> +	}
>  	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
>  					 out, sizeof(out));
>  	if (ret) {
> diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h
> b/drivers/common/mlx5/mlx5_devx_cmds.h
> index 930ae2c072..82d949972b 100644
> --- a/drivers/common/mlx5/mlx5_devx_cmds.h
> +++ b/drivers/common/mlx5/mlx5_devx_cmds.h
> @@ -519,6 +519,9 @@ struct mlx5_devx_modify_sq_attr {
>  	uint32_t state:4;
>  	uint32_t hairpin_peer_rq:24;
>  	uint32_t hairpin_peer_vhca:16;
> +	uint32_t rl_update:1;
> +	/* Set to update packet_pacing_rate_limit_index on a live SQ. */
> +	uint32_t packet_pacing_rate_limit_index:16;
>  };
> 
> 
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 02/9] common/mlx5: query packet pacing rate table capabilities
  2026-03-12 22:01   ` [PATCH v3 02/9] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
@ 2026-03-20 12:02     ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-20 12:02 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Friday, March 13, 2026 12:01 AM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org; Vincent Jardin
> <vjardin@free.fr>
> Subject: [PATCH v3 02/9] common/mlx5: query packet pacing rate table
> capabilities
> 
> Query additional QoS packet pacing capabilities from HCA attributes:
> - packet_pacing_burst_bound: HW supports burst_upper_bound parameter
> - packet_pacing_typical_size: HW supports typical_packet_size parameter
> - packet_pacing_max_rate / packet_pacing_min_rate: rate range in kbps
> - packet_pacing_rate_table_size: number of HW rate table entries
> 
> These capabilities are needed by the upcoming per-queue rate limiting feature
> to validate devarg values and report HW limits.
> 
> Supported hardware:
> - ConnectX-6 Dx and later (different boards expose different subsets)
> - ConnectX-5 reports packet_pacing but not all extended fields
> - ConnectX-7/8 report the full capability set
> - BlueField-2 and later DPUs also report these capabilities
> 
> Not supported:
> - ConnectX-4 Lx and earlier (no packet_pacing capability at all)
> - ConnectX-5 Ex may not report burst_bound or typical_size
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/common/mlx5/mlx5_devx_cmds.c | 15 +++++++++++++++
> drivers/common/mlx5/mlx5_devx_cmds.h | 11 ++++++++++-
>  2 files changed, 25 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c
> b/drivers/common/mlx5/mlx5_devx_cmds.c
> index d12ebf8487..8f53303fa7 100644
> --- a/drivers/common/mlx5/mlx5_devx_cmds.c
> +++ b/drivers/common/mlx5/mlx5_devx_cmds.c
> @@ -1244,6 +1244,21 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
>  				MLX5_GET(qos_cap, hcattr, packet_pacing);
>  		attr->qos.wqe_rate_pp =
>  				MLX5_GET(qos_cap, hcattr, wqe_rate_pp);
> +		attr->qos.packet_pacing_burst_bound =
> +				MLX5_GET(qos_cap, hcattr,
> +					packet_pacing_burst_bound);
> +		attr->qos.packet_pacing_typical_size =
> +				MLX5_GET(qos_cap, hcattr,
> +					packet_pacing_typical_size);
> +		attr->qos.packet_pacing_max_rate =
> +				MLX5_GET(qos_cap, hcattr,
> +					packet_pacing_max_rate);
> +		attr->qos.packet_pacing_min_rate =
> +				MLX5_GET(qos_cap, hcattr,
> +					packet_pacing_min_rate);
> +		attr->qos.packet_pacing_rate_table_size =
> +				MLX5_GET(qos_cap, hcattr,
> +					packet_pacing_rate_table_size);
>  		if (attr->qos.flow_meter_aso_sup) {
>  			attr->qos.log_meter_aso_granularity =
>  				MLX5_GET(qos_cap, hcattr,
> diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h
> b/drivers/common/mlx5/mlx5_devx_cmds.h
> index da50fc686c..930ae2c072 100644
> --- a/drivers/common/mlx5/mlx5_devx_cmds.h
> +++ b/drivers/common/mlx5/mlx5_devx_cmds.h
> @@ -67,7 +67,16 @@ struct mlx5_hca_qos_attr {
>  	/* Power of the maximum allocation granularity Object. */
>  	uint32_t log_max_num_meter_aso:5;
>  	/* Power of the maximum number of supported objects. */
> -
> +	uint32_t packet_pacing_burst_bound:1;
> +	/* HW supports burst_upper_bound PP parameter. */
> +	uint32_t packet_pacing_typical_size:1;
> +	/* HW supports typical_packet_size PP parameter. */
> +	uint32_t packet_pacing_max_rate;
> +	/* Maximum supported pacing rate in kbps. */
> +	uint32_t packet_pacing_min_rate;
> +	/* Minimum supported pacing rate in kbps. */
> +	uint16_t packet_pacing_rate_table_size;
> +	/* Number of entries in the HW rate table. */
>  };
> 
>  struct mlx5_hca_vdpa_attr {
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 04/9] net/mlx5: add per-queue packet pacing infrastructure
  2026-03-12 22:01   ` [PATCH v3 04/9] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
@ 2026-03-20 12:51     ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-20 12:51 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org, Saloni Pipada
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org

Hi,

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Friday, March 13, 2026 12:01 AM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org; Vincent Jardin
> <vjardin@free.fr>
> Subject: [PATCH v3 04/9] net/mlx5: add per-queue packet pacing infrastructure
> 
> Add mlx5_txq_rate_limit structure and alloc/free helpers for per-queue data-
> rate packet pacing. Each Tx queue can now hold its own PP (Packet Pacing)
> index allocated via mlx5dv_pp_alloc() with MLX5_DATA_RATE mode.
> 
> mlx5_txq_alloc_pp_rate_limit() converts Mbps to kbps for the PRM rate_limit
> field and allocates a PP index from the HW rate table.
> mlx5_txq_free_pp_rate_limit() releases it.
> 
> PP allocation uses shared mode (flags=0) so that the kernel mlx5 driver can
> reuse a single HW rate table entry for all PP contexts with identical parameters
> (rate, burst, packet size). This avoids exhausting the rate table (typically 128
> entries on ConnectX-6 Dx) when many queues share the same rate. Each queue
> still gets its own PP handle for proper cleanup.
> 
> The existing Clock Queue path (sh->txpp.pp / sh->txpp.pp_id) is untouched — it
> uses MLX5_WQE_RATE for per-packet scheduling with a dedicated index, while
> per-queue rate limiting uses MLX5_DATA_RATE.
> 
> PP index cleanup is added to mlx5_txq_release() to prevent leaks when queues
> are destroyed.
> 
> Supported hardware:
> - ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
> - ConnectX-7/8: same mechanism, plus wait-on-time coexistence
> - BlueField-2/3: same PP allocation support
> 
> Not supported:
> - ConnectX-5: packet_pacing exists but MLX5_DATA_RATE mode may
>   not be available on all firmware versions
> - ConnectX-4 Lx and earlier: no packet_pacing capability
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/net/mlx5/mlx5.h      | 11 ++++++
>  drivers/net/mlx5/mlx5_tx.h   |  1 +
>  drivers/net/mlx5/mlx5_txpp.c | 73
> ++++++++++++++++++++++++++++++++++++
>  drivers/net/mlx5/mlx5_txq.c  |  1 +
>  4 files changed, 86 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> b83dda5652..c48c3072d1 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -1296,6 +1296,13 @@ struct mlx5_txpp_ts {
>  	RTE_ATOMIC(uint64_t) ts;
>  };
> 
> +/* Per-queue rate limit tracking. */
> +struct mlx5_txq_rate_limit {
> +	void *pp;		/* Packet pacing context from dv_alloc_pp. */
> +	uint16_t pp_id;		/* Packet pacing index. */
> +	uint32_t rate_mbps;	/* Current rate in Mbps, 0 = disabled. */
> +};
> +
>  /* Tx packet pacing structure. */
>  struct mlx5_dev_txpp {
>  	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */ @@ -
> 2634,6 +2641,10 @@ int mlx5_txpp_xstats_get_names(struct rte_eth_dev
> *dev,  void mlx5_txpp_interrupt_handler(void *cb_arg);  int
> mlx5_txpp_map_hca_bar(struct rte_eth_dev *dev);  void
> mlx5_txpp_unmap_hca_bar(struct rte_eth_dev *dev);
> +int mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
> +				 struct mlx5_txq_rate_limit *rl,
> +				 uint32_t rate_mbps);
> +void mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rl);
> 
>  /* mlx5_rxtx.c */
> 
> diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h index
> 0134a2e003..b1b3653247 100644
> --- a/drivers/net/mlx5/mlx5_tx.h
> +++ b/drivers/net/mlx5/mlx5_tx.h
> @@ -192,6 +192,7 @@ struct mlx5_txq_ctrl {
>  	uint16_t dump_file_n; /* Number of dump files. */
>  	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
>  	uint32_t hairpin_status; /* Hairpin binding status. */
> +	struct mlx5_txq_rate_limit rl; /* Per-queue rate limit. */

Could we use "rate_limit" naming instead of "rl" ?
"rl" is a little bit out of the current naming style (please, see the other struct member naming).

>  	struct mlx5_txq_data txq; /* Data path structure. */
>  	/* Must be the last field in the structure, contains elts[]. */  }; diff --git
> a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c index
> 0e99b58bde..0a883b0a94 100644
> --- a/drivers/net/mlx5/mlx5_txpp.c
> +++ b/drivers/net/mlx5/mlx5_txpp.c
> @@ -128,6 +128,79 @@ mlx5_txpp_alloc_pp_index(struct
> mlx5_dev_ctx_shared *sh)  #endif  }
> 
> +/* Free a per-queue packet pacing index. */ void
> +mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rl) { #ifdef
> +HAVE_MLX5DV_PP_ALLOC
> +	if (rl->pp) {
> +		mlx5_glue->dv_free_pp(rl->pp);
> +		rl->pp = NULL;
> +		rl->pp_id = 0;
> +		rl->rate_mbps = 0;
> +	}
> +#else
> +	RTE_SET_USED(rl);
> +#endif
> +}
> +
> +/* Allocate a per-queue packet pacing index for data-rate limiting. */
> +int mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
> +			     struct mlx5_txq_rate_limit *rl,
> +			     uint32_t rate_mbps)
> +{
> +#ifdef HAVE_MLX5DV_PP_ALLOC
> +	uint32_t pp[MLX5_ST_SZ_DW(set_pp_rate_limit_context)];
> +	uint64_t rate_kbps;
> +	struct mlx5_hca_qos_attr *qos = &sh->cdev->config.hca_attr.qos;
> +
> +	MLX5_ASSERT(rate_mbps > 0);

Should we check the rate_mbps in non-debug environment?

> +	rate_kbps = (uint64_t)rate_mbps * 1000;
> +	if (qos->packet_pacing_min_rate && rate_kbps < qos-
> >packet_pacing_min_rate) {
> +		DRV_LOG(ERR, "Rate %u Mbps below HW minimum (%u
> kbps).",
> +			rate_mbps, qos->packet_pacing_min_rate);
> +		rte_errno = ERANGE;
> +		return -ERANGE;
> +	}
> +	if (qos->packet_pacing_max_rate && rate_kbps > qos-
> >packet_pacing_max_rate) {
> +		DRV_LOG(ERR, "Rate %u Mbps exceeds HW maximum (%u
> kbps).",
> +			rate_mbps, qos->packet_pacing_max_rate);
> +		rte_errno = ERANGE;
> +		return -ERANGE;
> +	}
> +	memset(&pp, 0, sizeof(pp));
> +	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit,
> (uint32_t)rate_kbps);
> +	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode,
> MLX5_DATA_RATE);
> +	rl->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp), &pp, 0);
> +	if (rl->pp == NULL) {
> +		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
> +			rate_mbps);
> +		rte_errno = errno;
> +		return -errno;
> +	}
> +	rl->pp_id = ((struct mlx5dv_pp *)rl->pp)->index;
> +	if (!rl->pp_id) {
> +		DRV_LOG(ERR, "Zero PP index allocated for rate %u Mbps.",
> +			rate_mbps);
> +		mlx5_txq_free_pp_rate_limit(rl);
> +		rte_errno = ENOTSUP;
> +		return -ENOTSUP;
> +	}
> +	rl->rate_mbps = rate_mbps;
> +	DRV_LOG(DEBUG, "Allocated PP index %u for rate %u Mbps.",
> +		rl->pp_id, rate_mbps);
> +	return 0;
> +#else
> +	RTE_SET_USED(sh);
> +	RTE_SET_USED(rl);
> +	RTE_SET_USED(rate_mbps);
> +	DRV_LOG(ERR, "Per-queue rate limit requires rdma-core PP support.");
> +	rte_errno = ENOTSUP;
> +	return -ENOTSUP;
> +#endif
> +}
> +
>  static void
>  mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)  { diff --git
> a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index
> 9275efb58e..fa9bb48fd4 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -1338,6 +1338,7 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t
> idx)
>  	txq_ctrl = container_of((*priv->txqs)[idx], struct mlx5_txq_ctrl, txq);
>  	if (rte_atomic_fetch_sub_explicit(&txq_ctrl->refcnt, 1,
> rte_memory_order_relaxed) - 1 > 1)
>  		return 1;
> +	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);

This might be a problematic teardown order.
Would be better to release rate_limit object AFTER queue destroying (in txq_obj_release).

>  	if (txq_ctrl->obj) {
>  		priv->obj_ops.txq_obj_release(txq_ctrl->obj);
>  		LIST_REMOVE(txq_ctrl->obj, next);
> --
> 2.43.0

With best regards,
Slava

^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 05/9] net/mlx5: support per-queue rate limiting
  2026-03-12 22:01   ` [PATCH v3 05/9] net/mlx5: support per-queue rate limiting Vincent Jardin
@ 2026-03-20 15:11     ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-20 15:11 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org

Hi,

Should we also update "rate limitations" supported by mlx5 in the feature matrix ?

Also, we shouId  test the code with dv_flow_en=0 devarg setting, this forces mlx5 to use
legacy rdma-core Verbs API to handle queues, and this test ensures there is no unpleasant
compatibility surprise.

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Friday, March 13, 2026 12:01 AM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org; Vincent Jardin
> <vjardin@free.fr>
> Subject: [PATCH v3 05/9] net/mlx5: support per-queue rate limiting
> 
> Wire rte_eth_set_queue_rate_limit() to the mlx5 PMD. The callback allocates a
> per-queue PP index with the requested data rate, then modifies the live SQ via
> modify_bitmask bit 0 to apply the new packet_pacing_rate_limit_index — no
> queue teardown required.
> 
> Setting tx_rate=0 clears the PP index on the SQ and frees it.
> 
> Capability check uses hca_attr.qos.packet_pacing directly (not dev_cap.txpp_en
> which requires Clock Queue prerequisites). This allows per-queue rate limiting
> without the tx_pp devarg.
> 
> The callback rejects hairpin queues and queues whose SQ is not yet created.
> 
> testpmd usage (no testpmd changes needed):
>   set port 0 queue 0 rate 1000
>   set port 0 queue 1 rate 5000
>   set port 0 queue 0 rate 0     # disable
> 
> Supported hardware:
> - ConnectX-6 Dx: full support, per-SQ rate via HW rate table
> - ConnectX-7/8: full support, coexists with wait-on-time scheduling
> - BlueField-2/3: full support as DPU rep ports
> 
> Not supported:
> - ConnectX-5: packet_pacing exists but dynamic SQ modify may not
>   work on all firmware versions
> - ConnectX-4 Lx and earlier: no packet_pacing capability
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/net/mlx5/mlx5.c     |  2 +
>  drivers/net/mlx5/mlx5_tx.h  |  2 +
>  drivers/net/mlx5/mlx5_txq.c | 97
> +++++++++++++++++++++++++++++++++++++
>  3 files changed, 101 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> 4d3bfddc36..c390406ac7 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -2690,6 +2690,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
>  	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
>  	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
>  	.get_restore_flags = mlx5_get_restore_flags,
> +	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
>  };
> 
>  /* Available operations from secondary process. */ @@ -2783,6 +2784,7 @@
> const struct eth_dev_ops mlx5_dev_ops_isolate = {
>  	.count_aggr_ports = mlx5_count_aggr_ports,
>  	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
>  	.get_restore_flags = mlx5_get_restore_flags,
> +	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
>  };
> 
>  /**
> diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h index
> b1b3653247..3a37f5bb4d 100644
> --- a/drivers/net/mlx5/mlx5_tx.h
> +++ b/drivers/net/mlx5/mlx5_tx.h
> @@ -222,6 +222,8 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev
> *dev, uint16_t idx);  int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t
> idx);  int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);  int
> mlx5_txq_verify(struct rte_eth_dev *dev);
> +int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
> +			      uint32_t tx_rate);
>  int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);  void
> mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);  void
> mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl); diff --git
> a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index
> fa9bb48fd4..f2ed2454a0 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -1363,6 +1363,103 @@ mlx5_txq_release(struct rte_eth_dev *dev,
> uint16_t idx)
>  	return 0;
>  }
> 
> +/**
> + * Set per-queue packet pacing rate limit.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param queue_idx
> + *   TX queue index.
> + * @param tx_rate
> + *   TX rate in Mbps, 0 to disable rate limiting.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
> +			  uint32_t tx_rate)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_dev_ctx_shared *sh = priv->sh;
> +	struct mlx5_txq_ctrl *txq_ctrl;
> +	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
> +	int ret;
> +
> +	if (!sh->cdev->config.hca_attr.qos.packet_pacing) {
> +		DRV_LOG(ERR, "Port %u packet pacing not supported.",
> +			dev->data->port_id);
> +		rte_errno = ENOTSUP;
> +		return -rte_errno;
> +	}
> +	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
> +		DRV_LOG(ERR, "Port %u Tx queue %u not configured.",
> +			dev->data->port_id, queue_idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	txq_ctrl = container_of((*priv->txqs)[queue_idx],
> +				struct mlx5_txq_ctrl, txq);
> +	if (txq_ctrl->is_hairpin) {
> +		DRV_LOG(ERR, "Port %u Tx queue %u is hairpin.",
> +			dev->data->port_id, queue_idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	if (txq_ctrl->obj == NULL || txq_ctrl->obj->sq == NULL) {
> +		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
> +			dev->data->port_id, queue_idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	if (tx_rate == 0) {
> +		/* Disable rate limiting. */
> +		if (txq_ctrl->rl.pp_id == 0)
> +			return 0; /* Already disabled. */
> +		sq_attr.sq_state = MLX5_SQC_STATE_RDY;
> +		sq_attr.state = MLX5_SQC_STATE_RDY;

This might be an issue - queue status is subject to be changed in runtime:
- with queue start/stop API
- on Tx error handling queue gets restarted
 ( tx_recover_qp()
   mlx5_queue_state_modify()
   mlx5_queue_state_modify_primary()
   mlx5_txq_devx_modify - goes over states ERR->RST->RDY)
   
From other side, if queue status is not RDY, 
the SQ modify call should be rejected by firmware.
We should notice the set_queue_rate_limit() works on the non-stopped queue.
And it is recommended to stop sending traffic while setting the rate.

> +		sq_attr.rl_update = 1;
> +		sq_attr.packet_pacing_rate_limit_index = 0;
> +		ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
> +		if (ret) {
> +			DRV_LOG(ERR,
> +				"Port %u Tx queue %u failed to clear rate.",
> +				dev->data->port_id, queue_idx);
> +			rte_errno = -ret;
> +			return ret;
> +		}
We failed to modify SQ, so the existing rl should be considered as still attached to the SQ.
We should not free it.

> +		mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
> +		DRV_LOG(DEBUG, "Port %u Tx queue %u rate limit disabled.",
> +			dev->data->port_id, queue_idx);
> +		return 0;
> +	}
> +	/* Allocate a new PP index for the requested rate into a temp. */
> +	struct mlx5_txq_rate_limit new_rl = { 0 };

Style nit - could we move the variable declaration to the beginning of the "{" block ?

> +
> +	ret = mlx5_txq_alloc_pp_rate_limit(sh, &new_rl, tx_rate);
> +	if (ret)
> +		return ret;
> +	/* Modify live SQ to use the new PP index. */
> +	sq_attr.sq_state = MLX5_SQC_STATE_RDY;
> +	sq_attr.state = MLX5_SQC_STATE_RDY;
> +	sq_attr.rl_update = 1;
> +	sq_attr.packet_pacing_rate_limit_index = new_rl.pp_id;
> +	ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
> +	if (ret) {
> +		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u
> Mbps.",
> +			dev->data->port_id, queue_idx, tx_rate);
> +		mlx5_txq_free_pp_rate_limit(&new_rl);
> +		rte_errno = -ret;
> +		return ret;
> +	}
> +	/* SQ updated — release old PP context, install new one. */
> +	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rl);
> +	txq_ctrl->rl = new_rl;
> +	DRV_LOG(DEBUG, "Port %u Tx queue %u rate set to %u Mbps (PP idx
> %u).",
> +		dev->data->port_id, queue_idx, tx_rate, txq_ctrl->rl.pp_id);
> +	return 0;
> +}
> +
>  /**
>   * Verify if the queue can be released.
>   *
> --
> 2.43.0

With best regards,
Slava


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 06/9] net/mlx5: add burst pacing devargs
  2026-03-12 22:01   ` [PATCH v3 06/9] net/mlx5: add burst pacing devargs Vincent Jardin
@ 2026-03-20 15:19     ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-20 15:19 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org

Hi,

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Friday, March 13, 2026 12:01 AM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org; Vincent Jardin
> <vjardin@free.fr>
> Subject: [PATCH v3 06/9] net/mlx5: add burst pacing devargs
> 
> Expose burst_upper_bound and typical_packet_size from the PRM
> set_pp_rate_limit_context as devargs:
> - tx_burst_bound=<bytes>: max burst before rate evaluation kicks in
> - tx_typical_pkt_sz=<bytes>: typical packet size for accuracy
> 
> These parameters apply to both per-queue rate limiting
> (rte_eth_set_queue_rate_limit) and Clock Queue pacing (tx_pp).

Clock Queue is special facility to overcome ConnectX-6DX hardware limitations
and handle send scheduling. It uses WQE rate pacing and doe not need 
the tx_burst_bound and tx_typical_pkt_sz be set. Please update the commit
message and remove update of mlx5_txpp_alloc_pp_index().

> 
> Values are validated against HCA capabilities (packet_pacing_burst_bound and
> packet_pacing_typical_size).
> If the HW does not support them, a warning is logged and the value is silently
> zeroed. Test mode still overrides both values.
> 
> Shared context mismatch checks ensure all ports on the same device use the
> same burst parameters.
> 
> Supported hardware:
> - ConnectX-6 Dx: burst_upper_bound and typical_packet_size
>   reported via packet_pacing_burst_bound / packet_pacing_typical_size
>   QoS capability bits
> - ConnectX-7/8: full support for both parameters
> - BlueField-2/3: same capabilities as host-side ConnectX
> 
> Not supported:
> - ConnectX-5: may not report burst_bound or typical_size caps
> - ConnectX-4 Lx and earlier: no packet_pacing at all
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  doc/guides/nics/mlx5.rst     | 16 ++++++++++++++
>  drivers/net/mlx5/mlx5.c      | 42 ++++++++++++++++++++++++++++++++++++
>  drivers/net/mlx5/mlx5.h      |  2 ++
>  drivers/net/mlx5/mlx5_txpp.c | 12 +++++++++++
>  4 files changed, 72 insertions(+)
> 
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> 5b097dbc90..2507fae846 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -580,6 +580,22 @@ for an additional list of options shared with other
> mlx5 drivers.
>    (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
>    The default value is zero.
> 
> +- ``tx_burst_bound`` parameter [int]
> +
> +  Specifies the burst upper bound in bytes for packet pacing rate evaluation.
> +  When set, the hardware considers this burst size when enforcing the
> + configured  rate limit. Only effective when the HCA reports
> + ``packet_pacing_burst_bound``  capability. Applies to both per-queue
> + rate limiting
> +  (``rte_eth_set_queue_rate_limit()``) and Clock Queue pacing (``tx_pp``).
> +  The default value is zero (hardware default).
> +
> +- ``tx_typical_pkt_sz`` parameter [int]
> +
> +  Specifies the typical packet size in bytes for packet pacing rate
> + accuracy  improvement. Only effective when the HCA reports
> + ``packet_pacing_typical_size`` capability. Applies to both per-queue
> + rate  limiting and Clock Queue pacing. The default value is zero (hardware
> default).
> +
>  - ``tx_vec_en`` parameter [int]
> 
>    A nonzero value enables Tx vector with ConnectX-5 NICs and above.
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> c390406ac7..f399e0d5c9 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -119,6 +119,18 @@
>   */
>  #define MLX5_TX_SKEW "tx_skew"
> 
> +/*
> + * Device parameter to specify burst upper bound in bytes
> + * for packet pacing rate evaluation.
> + */
> +#define MLX5_TX_BURST_BOUND "tx_burst_bound"
> +
> +/*
> + * Device parameter to specify typical packet size in bytes
> + * for packet pacing rate accuracy improvement.
> + */
> +#define MLX5_TX_TYPICAL_PKT_SZ "tx_typical_pkt_sz"
> +
>  /*
>   * Device parameter to enable hardware Tx vector.
>   * Deprecated, ignored (no vectorized Tx routines anymore).
> @@ -1405,6 +1417,10 @@ mlx5_dev_args_check_handler(const char *key,
> const char *val, void *opaque)
>  		config->tx_pp = tmp;
>  	} else if (strcmp(MLX5_TX_SKEW, key) == 0) {
>  		config->tx_skew = tmp;
> +	} else if (strcmp(MLX5_TX_BURST_BOUND, key) == 0) {
> +		config->tx_burst_bound = tmp;
> +	} else if (strcmp(MLX5_TX_TYPICAL_PKT_SZ, key) == 0) {
> +		config->tx_typical_pkt_sz = tmp;
>  	} else if (strcmp(MLX5_L3_VXLAN_EN, key) == 0) {
>  		config->l3_vxlan_en = !!tmp;
>  	} else if (strcmp(MLX5_VF_NL_EN, key) == 0) { @@ -1518,8 +1534,10
> @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
>  				struct mlx5_sh_config *config)
>  {
>  	const char **params = (const char *[]){
> +		MLX5_TX_BURST_BOUND,
>  		MLX5_TX_PP,
>  		MLX5_TX_SKEW,
> +		MLX5_TX_TYPICAL_PKT_SZ,
>  		MLX5_L3_VXLAN_EN,
>  		MLX5_VF_NL_EN,
>  		MLX5_DV_ESW_EN,
> @@ -1626,6 +1644,18 @@ mlx5_shared_dev_ctx_args_config(struct
> mlx5_dev_ctx_shared *sh,
>  		DRV_LOG(WARNING,
>  			"\"tx_skew\" doesn't affect without \"tx_pp\".");
>  	}
> +	if (config->tx_burst_bound &&
> +	    !sh->cdev->config.hca_attr.qos.packet_pacing_burst_bound) {
> +		DRV_LOG(WARNING,
> +			"HW does not support burst_upper_bound,
> ignoring.");
> +		config->tx_burst_bound = 0;
> +	}
> +	if (config->tx_typical_pkt_sz &&
> +	    !sh->cdev->config.hca_attr.qos.packet_pacing_typical_size) {
> +		DRV_LOG(WARNING,
> +			"HW does not support typical_packet_size, ignoring.");
> +		config->tx_typical_pkt_sz = 0;
> +	}
>  	/* Check for LRO support. */
>  	if (mlx5_devx_obj_ops_en(sh) && sh->cdev->config.hca_attr.lro_cap) {
>  		/* TBD check tunnel lro caps. */
> @@ -3260,6 +3290,18 @@ mlx5_probe_again_args_validate(struct
> mlx5_common_device *cdev,
>  			sh->ibdev_name);
>  		goto error;
>  	}
> +	if (sh->config.tx_burst_bound != config->tx_burst_bound) {
> +		DRV_LOG(ERR, "\"tx_burst_bound\" "
> +			"configuration mismatch for shared %s context.",
> +			sh->ibdev_name);
> +		goto error;
> +	}
> +	if (sh->config.tx_typical_pkt_sz != config->tx_typical_pkt_sz) {
> +		DRV_LOG(ERR, "\"tx_typical_pkt_sz\" "
> +			"configuration mismatch for shared %s context.",
> +			sh->ibdev_name);
> +		goto error;
> +	}
>  	if (sh->config.txq_mem_algn != config->txq_mem_algn) {
>  		DRV_LOG(ERR, "\"TxQ memory alignment\" "
>  			"configuration mismatch for shared %s context. %u -
> %u", diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> c48c3072d1..a8d71482ac 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -382,6 +382,8 @@ struct mlx5_port_config {  struct mlx5_sh_config {
>  	int tx_pp; /* Timestamp scheduling granularity in nanoseconds. */
>  	int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
> +	uint32_t tx_burst_bound; /* Burst upper bound in bytes, 0 = default. */
> +	uint32_t tx_typical_pkt_sz; /* Typical packet size in bytes, 0 =
> +default. */
>  	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
>  	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
>  	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */ diff --
> git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c index
> 0a883b0a94..756a772cc5 100644
> --- a/drivers/net/mlx5/mlx5_txpp.c
> +++ b/drivers/net/mlx5/mlx5_txpp.c

Please remove the diffs from mlx5_txpp_alloc_pp_index().

> @@ -88,6 +88,12 @@ mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared
> *sh)
>  	rate = NS_PER_S / sh->txpp.tick;
>  	if (rate * sh->txpp.tick != NS_PER_S)
>  		DRV_LOG(WARNING, "Packet pacing frequency is not
> precise.");
> +	if (sh->config.tx_burst_bound)
> +		MLX5_SET(set_pp_rate_limit_context, &pp,
> +			 burst_upper_bound, sh->config.tx_burst_bound);
> +	if (sh->config.tx_typical_pkt_sz)
> +		MLX5_SET(set_pp_rate_limit_context, &pp,
> +			 typical_packet_size, sh->config.tx_typical_pkt_sz);
>  	if (sh->txpp.test) {
>  		uint32_t len;
> 
> @@ -172,6 +178,12 @@ mlx5_txq_alloc_pp_rate_limit(struct
> mlx5_dev_ctx_shared *sh,
>  	memset(&pp, 0, sizeof(pp));
>  	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit,
> (uint32_t)rate_kbps);
>  	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode,
> MLX5_DATA_RATE);
> +	if (sh->config.tx_burst_bound)
> +		MLX5_SET(set_pp_rate_limit_context, &pp,
> +			 burst_upper_bound, sh->config.tx_burst_bound);
> +	if (sh->config.tx_typical_pkt_sz)
> +		MLX5_SET(set_pp_rate_limit_context, &pp,
> +			 typical_packet_size, sh->config.tx_typical_pkt_sz);
>  	rl->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp), &pp, 0);
>  	if (rl->pp == NULL) {
>  		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
> --
> 2.43.0

With best regards,
Slava


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 07/9] net/mlx5: add testpmd command to query per-queue rate limit
  2026-03-12 22:01   ` [PATCH v3 07/9] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
@ 2026-03-20 15:38     ` Slava Ovsiienko
  2026-03-22 14:02       ` Vincent Jardin
  0 siblings, 1 reply; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-20 15:38 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org

Hi,

BTW, we have mlx5_tx_burst_mode_get(), all information about tx rate limit could be shown there.
(I do not insist on, JFYI).

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Friday, March 13, 2026 12:01 AM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org; Vincent Jardin
> <vjardin@free.fr>
> Subject: [PATCH v3 07/9] net/mlx5: add testpmd command to query per-queue
> rate limit
> 
> Add a new testpmd command to display the per-queue packet pacing rate limit
> state, including the PP index from both driver state and FW SQ context
> readback:
> 
>   testpmd> mlx5 port <port_id> txq <queue_id> rate show
> 
> This helps verify that the FW actually applied the PP index to the SQ after setting
> a per-queue rate limit.
> 
> Expose a new PMD API rte_pmd_mlx5_txq_rate_limit_query() that queries
> txq_ctrl->rl for driver state and mlx5_devx_cmd_query_sq() for the FW
> packet_pacing_rate_limit_index field.
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/net/mlx5/mlx5_testpmd.c | 93
> +++++++++++++++++++++++++++++++++
>  drivers/net/mlx5/mlx5_tx.c      | 40 +++++++++++++-
>  drivers/net/mlx5/mlx5_txq.c     | 19 +++++--
>  drivers/net/mlx5/rte_pmd_mlx5.h | 30 +++++++++++
>  4 files changed, 178 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_testpmd.c
> b/drivers/net/mlx5/mlx5_testpmd.c index 1bb5a89559..fd3efecc5d 100644
> --- a/drivers/net/mlx5/mlx5_testpmd.c
> +++ b/drivers/net/mlx5/mlx5_testpmd.c
> @@ -1365,6 +1365,94 @@ cmdline_parse_inst_t
> mlx5_cmd_dump_rq_context_options = {
>  	}
>  };
> 
> +/* Show per-queue rate limit PP index for a given port/queue */ struct
> +mlx5_cmd_show_rate_limit_options {
> +	cmdline_fixed_string_t mlx5;
> +	cmdline_fixed_string_t port;
> +	portid_t port_id;
> +	cmdline_fixed_string_t txq;
> +	queueid_t queue_id;
> +	cmdline_fixed_string_t rate;
> +	cmdline_fixed_string_t show;
> +};
> +
> +cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_mlx5 =
> +	TOKEN_STRING_INITIALIZER(struct
> mlx5_cmd_show_rate_limit_options,
> +				 mlx5, "mlx5");
> +cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_port =
> +	TOKEN_STRING_INITIALIZER(struct
> mlx5_cmd_show_rate_limit_options,
> +				 port, "port");
> +cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_port_id =
> +	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
> +			      port_id, RTE_UINT16);
> +cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_txq =
> +	TOKEN_STRING_INITIALIZER(struct
> mlx5_cmd_show_rate_limit_options,
> +				 txq, "txq");
> +cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_queue_id =
> +	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
> +			      queue_id, RTE_UINT16);
> +cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_rate =
> +	TOKEN_STRING_INITIALIZER(struct
> mlx5_cmd_show_rate_limit_options,
> +				 rate, "rate");
> +cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_show =
> +	TOKEN_STRING_INITIALIZER(struct
> mlx5_cmd_show_rate_limit_options,
> +				 show, "show");
> +
> +static void
> +mlx5_cmd_show_rate_limit_parsed(void *parsed_result,
> +				__rte_unused struct cmdline *cl,
> +				__rte_unused void *data)
> +{
> +	struct mlx5_cmd_show_rate_limit_options *res = parsed_result;
> +	struct rte_pmd_mlx5_txq_rate_limit_info info;
> +	int ret;
> +
> +	ret = rte_pmd_mlx5_txq_rate_limit_query(res->port_id, res-
> >queue_id,
> +						 &info);
> +	switch (ret) {
> +	case 0:
> +		break;
> +	case -ENODEV:
> +		fprintf(stderr, "invalid port_id %u\n", res->port_id);
> +		return;
> +	case -EINVAL:
> +		fprintf(stderr, "invalid queue index (%u), out of range\n",
> +			res->queue_id);
> +		return;
> +	case -EIO:
> +		fprintf(stderr, "failed to query SQ context\n");
> +		return;
> +	default:
> +		fprintf(stderr, "query failed (%d)\n", ret);
> +		return;
> +	}
> +	fprintf(stdout, "Port %u Txq %u rate limit info:\n",
> +		res->port_id, res->queue_id);
> +	if (info.rate_mbps > 0)
> +		fprintf(stdout, "  Configured rate: %u Mbps\n",
> +			info.rate_mbps);
> +	else
> +		fprintf(stdout, "  Configured rate: disabled\n");
> +	fprintf(stdout, "  PP index (driver): %u\n", info.pp_index);
> +	fprintf(stdout, "  PP index (FW readback): %u\n", info.fw_pp_index); }
> +
> +cmdline_parse_inst_t mlx5_cmd_show_rate_limit = {
> +	.f = mlx5_cmd_show_rate_limit_parsed,
> +	.data = NULL,
> +	.help_str = "mlx5 port <port_id> txq <queue_id> rate show",
> +	.tokens = {
> +		(void *)&mlx5_cmd_show_rate_limit_mlx5,
> +		(void *)&mlx5_cmd_show_rate_limit_port,
> +		(void *)&mlx5_cmd_show_rate_limit_port_id,
> +		(void *)&mlx5_cmd_show_rate_limit_txq,
> +		(void *)&mlx5_cmd_show_rate_limit_queue_id,
> +		(void *)&mlx5_cmd_show_rate_limit_rate,
> +		(void *)&mlx5_cmd_show_rate_limit_show,
> +		NULL,
> +	}
> +};
> +
>  static struct testpmd_driver_commands mlx5_driver_cmds = {
>  	.commands = {
>  		{
> @@ -1440,6 +1528,11 @@ static struct testpmd_driver_commands
> mlx5_driver_cmds = {
>  			.help = "mlx5 port (port_id) queue (queue_id) dump
> rq_context (file_name)\n"
>  				"    Dump mlx5 RQ Context\n\n",
>  		},
> +		{
> +			.ctx = &mlx5_cmd_show_rate_limit,
> +			.help = "mlx5 port (port_id) txq (queue_id) rate
> show\n"
> +				"    Show per-queue rate limit PP index\n\n",
> +		},
>  		{
>  			.ctx = NULL,
>  		},
> diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c index
> 8085b5c306..fa57d3ef98 100644
> --- a/drivers/net/mlx5/mlx5_tx.c
> +++ b/drivers/net/mlx5/mlx5_tx.c
> @@ -800,7 +800,7 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t
> port_id, uint16_t queue_id, const ch
>  	if (!rte_eth_dev_is_valid_port(port_id))
>  		return -ENODEV;
> 
> -	if (rte_eth_tx_queue_is_valid(port_id, queue_id))
> +	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
>  		return -EINVAL;
> 
>  	fd = fopen(path, "w");
> @@ -848,3 +848,41 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t
> port_id, uint16_t queue_id, const ch
>  	fclose(fd);
>  	return ret;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_txq_rate_limit_query,
> +26.07) int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t
> queue_id,
> +				       struct rte_pmd_mlx5_txq_rate_limit_info
> *info) {
> +	struct rte_eth_dev *dev;
> +	struct mlx5_priv *priv;
> +	struct mlx5_txq_data *txq_data;
> +	struct mlx5_txq_ctrl *txq_ctrl;
> +	uint32_t sq_out[MLX5_ST_SZ_DW(query_sq_out)] = {0};
> +	int ret;
> +
> +	if (info == NULL)
> +		return -EINVAL;
> +	if (!rte_eth_dev_is_valid_port(port_id))
> +		return -ENODEV;
> +	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
> +		return -EINVAL;
> +	dev = &rte_eth_devices[port_id];
> +	priv = dev->data->dev_private;
> +	txq_data = (*priv->txqs)[queue_id];
> +	txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
> +	info->rate_mbps = txq_ctrl->rl.rate_mbps;
> +	info->pp_index = txq_ctrl->rl.pp_id;
> +	if (txq_ctrl->obj == NULL) {
> +		info->fw_pp_index = 0;
> +		return 0;
> +	}
> +	ret = mlx5_devx_cmd_query_sq(txq_ctrl->obj->sq_obj.sq,
> +				     sq_out, sizeof(sq_out));
> +	if (ret)
> +		return -EIO;
> +	info->fw_pp_index = MLX5_GET(sqc,
> +				     MLX5_ADDR_OF(query_sq_out, sq_out,
> +						  sq_context),
> +				     packet_pacing_rate_limit_index);
> +	return 0;
> +}
> diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index
> f2ed2454a0..f0881f3560 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -1406,7 +1406,20 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev
> *dev, uint16_t queue_idx,
>  		rte_errno = EINVAL;
>  		return -rte_errno;
>  	}
> -	if (txq_ctrl->obj == NULL || txq_ctrl->obj->sq == NULL) {

Should this code be in the commit introducing mlx5_set_queue_rate_limit() ?

> +	if (txq_ctrl->obj == NULL) {
> +		DRV_LOG(ERR, "Port %u Tx queue %u not initialized.",
> +			dev->data->port_id, queue_idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	/*
> +	 * For non-hairpin queues the SQ DevX object lives in
> +	 * obj->sq_obj.sq (used by DevX/HWS mode), while hairpin
> +	 * queues use obj->sq directly.  These are different members
> +	 * of a union inside mlx5_txq_obj.
> +	 */
> +	struct mlx5_devx_obj *sq_devx = txq_ctrl->obj->sq_obj.sq;

Style nit - could you please mode variable declaration to the {-block beginning ?

> +	if (sq_devx == NULL) {
>  		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
>  			dev->data->port_id, queue_idx);
>  		rte_errno = EINVAL;
> @@ -1420,7 +1433,7 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev
> *dev, uint16_t queue_idx,
>  		sq_attr.state = MLX5_SQC_STATE_RDY;
>  		sq_attr.rl_update = 1;
>  		sq_attr.packet_pacing_rate_limit_index = 0;
> -		ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
> +		ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
>  		if (ret) {
>  			DRV_LOG(ERR,
>  				"Port %u Tx queue %u failed to clear rate.",
> @@ -1444,7 +1457,7 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev
> *dev, uint16_t queue_idx,
>  	sq_attr.state = MLX5_SQC_STATE_RDY;
>  	sq_attr.rl_update = 1;
>  	sq_attr.packet_pacing_rate_limit_index = new_rl.pp_id;
> -	ret = mlx5_devx_cmd_modify_sq(txq_ctrl->obj->sq, &sq_attr);
> +	ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
>  	if (ret) {
>  		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u
> Mbps.",
>  			dev->data->port_id, queue_idx, tx_rate); diff --git
> a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h index
> 7acfdae97d..698d7d2032 100644
> --- a/drivers/net/mlx5/rte_pmd_mlx5.h
> +++ b/drivers/net/mlx5/rte_pmd_mlx5.h
> @@ -420,6 +420,36 @@ __rte_experimental
>  int
>  rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id,
> const char *filename);
> 
> +/**
> + * Per-queue rate limit information.
> + */
> +struct rte_pmd_mlx5_txq_rate_limit_info {
> +	uint32_t rate_mbps;	/**< Configured rate in Mbps, 0 = disabled. */
> +	uint16_t pp_index;	/**< PP index from driver state. */
> +	uint16_t fw_pp_index;	/**< PP index read back from FW SQ context.
> */
> +};
> +
> +/**
> + * Query per-queue rate limit state for a given Tx queue.
> + *
> + * @param[in] port_id
> + *   Port ID.
> + * @param[in] queue_id
> + *   Tx queue ID.
> + * @param[out] info
> + *   Rate limit information.
> + *
> + * @return
> + *   0 on success, negative errno on failure:
> + *   - -ENODEV: invalid port_id.
> + *   - -EINVAL: invalid queue_id.
> + *   - -EIO: FW query failed.
> + */
> +__rte_experimental
> +int
> +rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
> +				  struct rte_pmd_mlx5_txq_rate_limit_info
> *info);
> +
>  /** Type of mlx5 driver event for which custom callback is called. */  enum
> rte_pmd_mlx5_driver_event_cb_type {
>  	/** Called after HW Rx queue is created. */
> --
> 2.43.0
With best regards,
Slava

^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 08/9] ethdev: add getter for per-queue Tx rate limit
  2026-03-12 22:01   ` [PATCH v3 08/9] ethdev: add getter for per-queue Tx " Vincent Jardin
@ 2026-03-20 15:44     ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-20 15:44 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org

Hi,

Should we split this commit into two ones? For rte and for the driver?
Should we add tracing to the rte_eth_get_queue_rate_limit() getter like we have
in the rte_eth_set_queue_rate_limit() ?

With best regards,
Slava

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Friday, March 13, 2026 12:01 AM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org; Vincent Jardin
> <vjardin@free.fr>
> Subject: [PATCH v3 08/9] ethdev: add getter for per-queue Tx rate limit
> 
> The existing rte_eth_set_queue_rate_limit() API allows setting a per-queue Tx
> rate but provides no way to read it back. Applications such as grout are forced
> to maintain a shadow copy of the rate to be able to report it.
> 
> Add rte_eth_get_queue_rate_limit() as the symmetric getter, following the
> established DPDK pattern (e.g. rte_eth_dev_set_mtu/get_mtu,
> rte_eth_dev_set_vlan_offload/get_vlan_offload).
> 
> This adds:
> - eth_get_queue_rate_limit_t driver callback in ethdev_driver.h
> - rte_eth_get_queue_rate_limit() public experimental API (26.07)
> - mlx5 PMD implementation reading from the existing per-queue
>   rate_mbps tracking field
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  app/test-pmd/cmdline.c      | 69
> +++++++++++++++++++++++++++++++++++++
>  drivers/net/mlx5/mlx5.c     |  2 ++
>  drivers/net/mlx5/mlx5_tx.h  |  2 ++
>  drivers/net/mlx5/mlx5_txq.c | 30 ++++++++++++++++
> lib/ethdev/ethdev_driver.h  |  7 ++++
>  lib/ethdev/rte_ethdev.c     | 28 +++++++++++++++
>  lib/ethdev/rte_ethdev.h     | 24 +++++++++++++
>  7 files changed, 162 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> c33c66f327..ee532984e8 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -8912,6 +8912,74 @@ static cmdline_parse_inst_t cmd_queue_rate_limit
> = {
>  	},
>  };
> 
> +/* *** SHOW RATE LIMIT FOR A QUEUE OF A PORT *** */ struct
> +cmd_show_queue_rate_limit_result {
> +	cmdline_fixed_string_t show;
> +	cmdline_fixed_string_t port;
> +	uint16_t port_num;
> +	cmdline_fixed_string_t queue;
> +	uint16_t queue_num;
> +	cmdline_fixed_string_t rate;
> +};
> +
> +static void cmd_show_queue_rate_limit_parsed(void *parsed_result,
> +		__rte_unused struct cmdline *cl,
> +		__rte_unused void *data)
> +{
> +	struct cmd_show_queue_rate_limit_result *res = parsed_result;
> +	uint32_t tx_rate = 0;
> +	int ret;
> +
> +	ret = rte_eth_get_queue_rate_limit(res->port_num, res->queue_num,
> +					   &tx_rate);
> +	if (ret) {
> +		fprintf(stderr, "Get queue rate limit failed: %s\n",
> +			rte_strerror(-ret));
> +		return;
> +	}
> +	if (tx_rate)
> +		printf("Port %u Queue %u rate limit: %u Mbps\n",
> +		       res->port_num, res->queue_num, tx_rate);
> +	else
> +		printf("Port %u Queue %u rate limit: disabled\n",
> +		       res->port_num, res->queue_num); }
> +
> +static cmdline_parse_token_string_t cmd_show_queue_rate_limit_show =
> +	TOKEN_STRING_INITIALIZER(struct
> cmd_show_queue_rate_limit_result,
> +				show, "show");
> +static cmdline_parse_token_string_t cmd_show_queue_rate_limit_port =
> +	TOKEN_STRING_INITIALIZER(struct
> cmd_show_queue_rate_limit_result,
> +				port, "port");
> +static cmdline_parse_token_num_t cmd_show_queue_rate_limit_portnum =
> +	TOKEN_NUM_INITIALIZER(struct cmd_show_queue_rate_limit_result,
> +				port_num, RTE_UINT16);
> +static cmdline_parse_token_string_t cmd_show_queue_rate_limit_queue =
> +	TOKEN_STRING_INITIALIZER(struct
> cmd_show_queue_rate_limit_result,
> +				queue, "queue");
> +static cmdline_parse_token_num_t cmd_show_queue_rate_limit_queuenum
> =
> +	TOKEN_NUM_INITIALIZER(struct cmd_show_queue_rate_limit_result,
> +				queue_num, RTE_UINT16);
> +static cmdline_parse_token_string_t cmd_show_queue_rate_limit_rate =
> +	TOKEN_STRING_INITIALIZER(struct
> cmd_show_queue_rate_limit_result,
> +				rate, "rate");
> +
> +static cmdline_parse_inst_t cmd_show_queue_rate_limit = {
> +	.f = cmd_show_queue_rate_limit_parsed,
> +	.data = NULL,
> +	.help_str = "show port <port_id> queue <queue_id> rate: "
> +		"Show rate limit for a queue on port_id",
> +	.tokens = {
> +		(void *)&cmd_show_queue_rate_limit_show,
> +		(void *)&cmd_show_queue_rate_limit_port,
> +		(void *)&cmd_show_queue_rate_limit_portnum,
> +		(void *)&cmd_show_queue_rate_limit_queue,
> +		(void *)&cmd_show_queue_rate_limit_queuenum,
> +		(void *)&cmd_show_queue_rate_limit_rate,
> +		NULL,
> +	},
> +};
> +
>  /* *** SET RATE LIMIT FOR A VF OF A PORT *** */  struct
> cmd_vf_rate_limit_result {
>  	cmdline_fixed_string_t set;
> @@ -14198,6 +14266,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
>  	&cmd_set_uc_all_hash_filter,
>  	&cmd_vf_mac_addr_filter,
>  	&cmd_queue_rate_limit,
> +	&cmd_show_queue_rate_limit,
>  	&cmd_tunnel_udp_config,
>  	&cmd_showport_rss_hash,
>  	&cmd_showport_rss_hash_key,
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> f399e0d5c9..6e21ed31f3 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -2721,6 +2721,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
>  	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
>  	.get_restore_flags = mlx5_get_restore_flags,
>  	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
> +	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
>  };
> 
>  /* Available operations from secondary process. */ @@ -2815,6 +2816,7 @@
> const struct eth_dev_ops mlx5_dev_ops_isolate = {
>  	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
>  	.get_restore_flags = mlx5_get_restore_flags,
>  	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
> +	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
>  };
> 
>  /**
> diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h index
> 3a37f5bb4d..46e199d93e 100644
> --- a/drivers/net/mlx5/mlx5_tx.h
> +++ b/drivers/net/mlx5/mlx5_tx.h
> @@ -224,6 +224,8 @@ int mlx5_txq_releasable(struct rte_eth_dev *dev,
> uint16_t idx);  int mlx5_txq_verify(struct rte_eth_dev *dev);  int
> mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
>  			      uint32_t tx_rate);
> +int mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
> +			      uint32_t *tx_rate);
>  int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);  void
> mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);  void
> mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl); diff --git
> a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index
> f0881f3560..c81542113e 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -1473,6 +1473,36 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev
> *dev, uint16_t queue_idx,
>  	return 0;
>  }
> 
> +/**
> + * Get per-queue packet pacing rate limit.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param queue_idx
> + *   TX queue index.
> + * @param[out] tx_rate
> + *   Pointer to store the TX rate in Mbps, 0 if rate limiting is disabled.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
> +			  uint32_t *tx_rate)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_txq_ctrl *txq_ctrl;
> +
> +	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	txq_ctrl = container_of((*priv->txqs)[queue_idx],
> +				struct mlx5_txq_ctrl, txq);
> +	*tx_rate = txq_ctrl->rl.rate_mbps;
> +	return 0;
> +}
> +
>  /**
>   * Verify if the queue can be released.
>   *
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index
> 1255cd6f2c..0f336f9567 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -762,6 +762,11 @@ typedef int (*eth_set_queue_rate_limit_t)(struct
> rte_eth_dev *dev,
>  				uint16_t queue_idx,
>  				uint32_t tx_rate);
> 
> +/** @internal Get queue Tx rate. */
> +typedef int (*eth_get_queue_rate_limit_t)(struct rte_eth_dev *dev,
> +				uint16_t queue_idx,
> +				uint32_t *tx_rate);
> +
>  /** @internal Add tunneling UDP port. */  typedef int
> (*eth_udp_tunnel_port_add_t)(struct rte_eth_dev *dev,
>  					 struct rte_eth_udp_tunnel
> *tunnel_udp); @@ -1522,6 +1527,8 @@ struct eth_dev_ops {
> 
>  	/** Set queue rate limit */
>  	eth_set_queue_rate_limit_t set_queue_rate_limit;
> +	/** Get queue rate limit */
> +	eth_get_queue_rate_limit_t get_queue_rate_limit;
> 
>  	/** Configure RSS hash protocols and hashing key */
>  	rss_hash_update_t          rss_hash_update;
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> 2edc7a362e..c6ad399033 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -5694,6 +5694,34 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id, uint16_t queue_idx,
>  	return ret;
>  }
> 
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_get_queue_rate_limit, 26.07)
> int
> +rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> +					uint32_t *tx_rate)
> +{
> +	struct rte_eth_dev *dev;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (tx_rate == NULL) {
> +		RTE_ETHDEV_LOG_LINE(ERR,
> +			"Get queue rate limit:port %u: NULL tx_rate pointer",
> +			port_id);
> +		return -EINVAL;
> +	}
> +
> +	if (queue_idx >= dev->data->nb_tx_queues) {
> +		RTE_ETHDEV_LOG_LINE(ERR,
> +			"Get queue rate limit:port %u: invalid queue ID=%u",
> +			port_id, queue_idx);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->dev_ops->get_queue_rate_limit == NULL)
> +		return -ENOTSUP;
> +	return eth_err(port_id, dev->dev_ops->get_queue_rate_limit(dev,
> +queue_idx, tx_rate)); }
> +
>  RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_rx_avail_thresh_set, 22.07)
> int rte_eth_rx_avail_thresh_set(uint16_t port_id, uint16_t queue_id,
>  			       uint8_t avail_thresh)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> 0d8e2d0236..e525217b77 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -4817,6 +4817,30 @@ int rte_eth_dev_uc_all_hash_table_set(uint16_t
> port_id, uint8_t on);  int rte_eth_set_queue_rate_limit(uint16_t port_id,
> uint16_t queue_idx,
>  			uint32_t tx_rate);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice.
> + *
> + * Get the rate limitation for a queue on an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_idx
> + *   The queue ID.
> + * @param[out] tx_rate
> + *   A pointer to retrieve the Tx rate in Mbps.
> + *   0 means rate limiting is disabled.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support this feature.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EIO) if device is removed.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> +			uint32_t *tx_rate);
> +
>  /**
>   * Configuration of Receive Side Scaling hash computation of Ethernet device.
>   *
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v3 09/9] net/mlx5: add rate table capacity query API
  2026-03-12 22:01   ` [PATCH v3 09/9] net/mlx5: add rate table capacity query API Vincent Jardin
@ 2026-03-20 15:49     ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-20 15:49 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org

Hi,

Please, note - the pacing table is global shared HW resource.
It can be used by multiple entities - by firmware, by kernel, by multiple DPDK application instances.
So, the returned value of the "how many entries are currently in use" can be wrong.

And, did you try to set the same limits on several queues ? Does FW check for the parameters and
provide pacing table entry sharing ? If we have two (or more) queues, with the same limits, do they
use the same pp_id ? 
With best regards,
Slava

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Friday, March 13, 2026 12:01 AM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org; Vincent Jardin
> <vjardin@free.fr>
> Subject: [PATCH v3 09/9] net/mlx5: add rate table capacity query API
> 
> Add rte_pmd_mlx5_pp_rate_table_query() to report the HW packet pacing
> rate table size and how many entries are currently in use.
> 
> The total comes from the HCA QoS capability packet_pacing_rate_table_size.
> The used count is derived by collecting unique non-zero PP indices across all TX
> queues.
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/net/mlx5/mlx5_tx.c      | 64 +++++++++++++++++++++++++++++++++
>  drivers/net/mlx5/rte_pmd_mlx5.h | 32 +++++++++++++++++
>  2 files changed, 96 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c index
> fa57d3ef98..968aceb24f 100644
> --- a/drivers/net/mlx5/mlx5_tx.c
> +++ b/drivers/net/mlx5/mlx5_tx.c
> @@ -19,6 +19,7 @@
> 
>  #include <mlx5_prm.h>
>  #include <mlx5_common.h>
> +#include <mlx5_malloc.h>
> 
>  #include "mlx5_autoconf.h"
>  #include "mlx5_defs.h"
> @@ -886,3 +887,66 @@ int rte_pmd_mlx5_txq_rate_limit_query(uint16_t
> port_id, uint16_t queue_id,
>  				     packet_pacing_rate_limit_index);
>  	return 0;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_pp_rate_table_query,
> 26.07)
> +int rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
> +				     struct rte_pmd_mlx5_pp_rate_table_info
> *info) {
> +	struct rte_eth_dev *dev;
> +	struct mlx5_priv *priv;
> +	uint16_t used = 0;
> +	uint16_t *seen;
> +	unsigned int i;
> +
> +	if (info == NULL)
> +		return -EINVAL;
> +	if (!rte_eth_dev_is_valid_port(port_id))
> +		return -ENODEV;
> +	dev = &rte_eth_devices[port_id];
> +	priv = dev->data->dev_private;
> +	if (!priv->sh->cdev->config.hca_attr.qos.packet_pacing) {
> +		rte_errno = ENOTSUP;
> +		return -ENOTSUP;
> +	}
> +	info->total = priv->sh->cdev-
> >config.hca_attr.qos.packet_pacing_rate_table_size;
> +	if (priv->txqs == NULL || priv->txqs_n == 0) {
> +		info->used = 0;
> +		return 0;
> +	}
> +	seen = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
> +			   priv->txqs_n * sizeof(*seen), 0, SOCKET_ID_ANY);
> +	if (seen == NULL)
> +		return -ENOMEM;
> +	/*
> +	 * Count unique non-zero PP indices across this port's TX queues.
> +	 * Note: the count reflects only queues on this port; other ports
> +	 * sharing the same device may also consume rate table entries.
> +	 */
> +	for (i = 0; i < priv->txqs_n; i++) {
> +		struct mlx5_txq_data *txq_data;
> +		struct mlx5_txq_ctrl *txq_ctrl;
> +		uint16_t pp_id;
> +		uint16_t j;
> +		bool dup;
> +
> +		if ((*priv->txqs)[i] == NULL)
> +			continue;
> +		txq_data = (*priv->txqs)[i];
> +		txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
> +		pp_id = txq_ctrl->rl.pp_id;
> +		if (pp_id == 0)
> +			continue;
> +		dup = false;
> +		for (j = 0; j < used; j++) {
> +			if (seen[j] == pp_id) {
> +				dup = true;
> +				break;
> +			}
> +		}
> +		if (!dup)
> +			seen[used++] = pp_id;
> +	}
> +	mlx5_free(seen);
> +	info->used = used;
> +	return 0;
> +}
> diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h
> b/drivers/net/mlx5/rte_pmd_mlx5.h index 698d7d2032..f7970dd7fb 100644
> --- a/drivers/net/mlx5/rte_pmd_mlx5.h
> +++ b/drivers/net/mlx5/rte_pmd_mlx5.h
> @@ -450,6 +450,38 @@ int
>  rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
>  				  struct rte_pmd_mlx5_txq_rate_limit_info
> *info);
> 
> +/**
> + * Packet pacing rate table capacity information.
> + */
> +struct rte_pmd_mlx5_pp_rate_table_info {
> +	uint16_t total;		/**< Total HW rate table entries. */
> +	uint16_t used;		/**< Currently allocated entries. */
> +};
> +
> +/**
> + * Query packet pacing rate table capacity.
> + *
> + * The ``used`` count reflects only the queried port's TX queues.
> + * Other ports sharing the same physical device may also consume
> + * rate table entries that are not included in this count.
> + *
> + * @param[in] port_id
> + *   Port ID.
> + * @param[out] info
> + *   Rate table capacity information.
> + *
> + * @return
> + *   0 on success, negative errno on failure:
> + *   - -ENODEV: invalid port_id.
> + *   - -EINVAL: info is NULL.
> + *   - -ENOTSUP: packet pacing not supported.
> + *   - -ENOMEM: allocation failure.
> + */
> +__rte_experimental
> +int
> +rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
> +				 struct rte_pmd_mlx5_pp_rate_table_info
> *info);
> +
>  /** Type of mlx5 driver event for which custom callback is called. */  enum
> rte_pmd_mlx5_driver_event_cb_type {
>  	/** Called after HW Rx queue is created. */
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v4 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing
  2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
                     ` (9 preceding siblings ...)
  2026-03-16 16:04   ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
@ 2026-03-22 13:46   ` Vincent Jardin
  2026-03-22 13:46     ` [PATCH v4 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
                       ` (11 more replies)
  10 siblings, 12 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin


This series adds per-queue Tx data-rate limiting to the mlx5 PMD using
hardware packet pacing (PP), and a symmetric rte_eth_get_queue_rate_limit()
ethdev API to read back the configured rate.

Each Tx queue can be assigned an individual rate (in Mbps) at runtime via
rte_eth_set_queue_rate_limit(). The mlx5 implementation allocates a PP
context per queue from the HW rate table, programs the PP index into the
SQ via modify_sq, and relies on the kernel to share identical rates
across PP contexts to conserve table entries. A PMD-specific API exposes
per-queue PP diagnostics and rate table capacity.

Patch breakdown:

  01/10 doc/nics/mlx5: fix stale packet pacing documentation
  02/10 common/mlx5: query packet pacing rate table capabilities
  03/10 common/mlx5: extend SQ modify to support rate limit update
  04/10 net/mlx5: add per-queue packet pacing infrastructure
  05/10 net/mlx5: support per-queue rate limiting
  06/10 net/mlx5: add burst pacing devargs
  07/10 net/mlx5: add testpmd command to query per-queue rate limit
  08/10 ethdev: add getter for per-queue Tx rate limit
  09/10 net/mlx5: implement per-queue Tx rate limit getter
  10/10 net/mlx5: add rate table capacity query API

Release notes for the new ethdev API and mlx5 per-queue rate
limiting can be added to a release_26_07.rst once the file is
created at the start of the 26.07 development cycle.

Changes since v3:

  Addressed review feedback from Stephen and Slava (nvidia/Mellanox).

  Patch 02/10 (query caps):
  - Added Acked-by: Viacheslav Ovsiienko

  Patch 03/10 (SQ modify):
  - Define MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX
    enum in mlx5_prm.h, following the MLX5_MODIFY_RQ_IN_MODIFY_xxx pattern
  - Use read-modify-write for modify_bitmask (MLX5_GET64 | OR | MLX5_SET64)
    instead of direct overwrite, for forward compatibility

  Patch 04/10 (PP infrastructure):
  - Rename struct member and parameters from "rl" to "rate_limit"
    for consistency with codebase naming style
  - Replace MLX5_ASSERT(rate_mbps > 0) with runtime check returning
    -EINVAL in non-debug builds
  - Move mlx5_txq_free_pp_rate_limit() to after txq_obj_release() in
    mlx5_txq_release() — destroy the SQ before freeing the PP index
    it references
  - Clarify commit message: distinct PP handle per queue (for cleanup)
    but kernel shares the same pp_id for identical rate parameters

  Patch 05/10 (set rate):
  - Fix obj->sq vs obj->sq_obj.sq: use obj->sq_obj.sq from the start
    for non-hairpin queues (was introduced in patch 07 in v3, breaking
    git bisect)
  - Move all variable declarations to block top (sq_devx,
    new_rate_limit)
  - Add queue state check: reject set_queue_rate_limit if queue is not
    STARTED (SQ not in RDY state)
  - Update mlx5 feature matrix: Rate limitation = Y
  - Add Per-Queue Tx Rate Limiting documentation section in mlx5.rst
    covering DevX requirement, hardware support, rate table sharing,
    and testpmd usage

  Patch 06/10 (burst devargs):
  - Remove burst_upper_bound/typical_packet_size from Clock Queue
    path (mlx5_txpp_alloc_pp_index) — Clock Queue uses WQE rate
    pacing and does not need these parameters
  - Update commit message and documentation accordingly

  Patch 07/10 (testpmd + PMD query):
  - sq_obj.sq accessor change moved to patch 05 (see above)
  - sq_devx declaration moved to block top

  Patch 08/10 (ethdev getter) — split from v3 patch 08:
  - Split into ethdev API (this patch) and mlx5 driver (patch 09)
  - Add rte_eth_trace_get_queue_rate_limit() trace point matching
    the existing setter pattern

  Patch 09/10 — NEW (was part of v3 patch 08):
  - mlx5 driver implementation of get_queue_rate_limit callback,
    split out per Slava's request

  Patch 10/10 (rate table query):
  - Rename struct field "used" to "port_used" to clarify per-port
    scope
  - Strengthen Doxygen: rate table is a global shared HW resource
    (firmware, kernel, other DPDK instances may consume entries);
    port_used is a lower bound
  - Document PP sharing behavior with flags=0
  - Note that applications should aggregate across ports for
    device-wide visibility

Changes since v2:

  Addressed review feedback from Stephen Hemminger:

  Patch 04: cleaned redundant cast parentheses on (struct mlx5dv_pp *)
  Patch 04: consolidated dv_alloc_pp call onto one line
  Patch 05+08: removed redundant queue_idx bounds checks from driver
    callbacks — ethdev layer is the single validation point
  Patch 07: added generic testpmd command: show port <id> queue <id> rate
  Patch 08+10: removed release notes from release_26_03.rst (targets 26.07)
  Patch 10: use MLX5_MEM_SYS | MLX5_MEM_ZERO for heap allocation
  Patch 10: consolidated packet_pacing_rate_table_size onto one line

Changes since v1:

  Patch 01: Acked-by Viacheslav Ovsiienko
  Patch 04: rate bounds validation, uint64_t overflow fix, remove
    early PP free
  Patch 05: PP leak fix (temp struct pattern), rte_errno in error paths
  Patch 07: inverted rte_eth_tx_queue_is_valid() check
  Patch 10: stack array replaced with heap, per-port scope documented

Testing:

  - Build: GCC, no warnings
  - Hardware: ConnectX-6 Dx
  - DevX path (default): set/get/disable rate limiting verified
  - Verbs path (dv_flow_en=0): returns -EINVAL cleanly (SQ DevX
    object not available), no crash

Vincent Jardin (10):
  doc/nics/mlx5: fix stale packet pacing documentation
  common/mlx5: query packet pacing rate table capabilities
  common/mlx5: extend SQ modify to support rate limit update
  net/mlx5: add per-queue packet pacing infrastructure
  net/mlx5: support per-queue rate limiting
  net/mlx5: add burst pacing devargs
  net/mlx5: add testpmd command to query per-queue rate limit
  ethdev: add getter for per-queue Tx rate limit
  net/mlx5: implement per-queue Tx rate limit getter
  net/mlx5: add rate table capacity query API

Vincent Jardin (10):
  doc/nics/mlx5: fix stale packet pacing documentation
  common/mlx5: query packet pacing rate table capabilities
  common/mlx5: extend SQ modify to support rate limit update
  net/mlx5: add per-queue packet pacing infrastructure
  net/mlx5: support per-queue rate limiting
  net/mlx5: add burst pacing devargs
  net/mlx5: add testpmd command to query per-queue rate limit
  ethdev: add getter for per-queue Tx rate limit
  net/mlx5: implement per-queue Tx rate limit getter
  net/mlx5: add rate table capacity query API

 app/test-pmd/cmdline.c               |  69 ++++++++++
 doc/guides/nics/features/mlx5.ini    |   1 +
 doc/guides/nics/mlx5.rst             | 180 ++++++++++++++++++++++-----
 drivers/common/mlx5/mlx5_devx_cmds.c |  23 ++++
 drivers/common/mlx5/mlx5_devx_cmds.h |  14 ++-
 drivers/common/mlx5/mlx5_prm.h       |   7 ++
 drivers/net/mlx5/mlx5.c              |  46 +++++++
 drivers/net/mlx5/mlx5.h              |  13 ++
 drivers/net/mlx5/mlx5_testpmd.c      |  93 ++++++++++++++
 drivers/net/mlx5/mlx5_tx.c           | 104 +++++++++++++++-
 drivers/net/mlx5/mlx5_tx.h           |   5 +
 drivers/net/mlx5/mlx5_txpp.c         |  84 +++++++++++++
 drivers/net/mlx5/mlx5_txq.c          | 149 ++++++++++++++++++++++
 drivers/net/mlx5/rte_pmd_mlx5.h      |  74 +++++++++++
 lib/ethdev/ethdev_driver.h           |   7 ++
 lib/ethdev/ethdev_trace.h            |   9 ++
 lib/ethdev/ethdev_trace_points.c     |   3 +
 lib/ethdev/rte_ethdev.c              |  35 ++++++
 lib/ethdev/rte_ethdev.h              |  24 ++++
 19 files changed, 906 insertions(+), 33 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v4 01/10] doc/nics/mlx5: fix stale packet pacing documentation
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
@ 2026-03-22 13:46     ` Vincent Jardin
  2026-03-22 13:46     ` [PATCH v4 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
                       ` (10 subsequent siblings)
  11 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

The Tx Scheduling section incorrectly stated that timestamps can only
be put on the first packet in a burst. The driver actually checks every
packet's ol_flags for the timestamp dynamic flag and inserts a dedicated
WAIT WQE per timestamped packet. The eMPW path also breaks batches when
a timestamped packet is encountered.

Additionally, the ConnectX-7+ wait-on-time capability was only briefly
mentioned in the tx_pp parameter section with no explanation of how it
differs from the ConnectX-6 Dx Clock Queue approach.

This patch:
- Removes the stale first-packet-only limitation
- Documents both scheduling mechanisms (ConnectX-6 Dx Clock Queue and
  ConnectX-7+ wait-on-time) with separate requirements tables
- Clarifies that tx_pp is specific to ConnectX-6 Dx
- Fixes tx_skew applicability to cover both hardware generations
- Updates the Send Scheduling Counters intro to reflect that timestamp
  validation counters also apply to ConnectX-7+ wait-on-time mode

Fixes: 8f848f32fc24 ("net/mlx5: introduce send scheduling devargs")

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/mlx5.rst | 109 ++++++++++++++++++++++++++++-----------
 1 file changed, 78 insertions(+), 31 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 9dcc93cc23..6bb8c07353 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -553,27 +553,32 @@ for an additional list of options shared with other mlx5 drivers.
 
 - ``tx_pp`` parameter [int]
 
+  This parameter applies to **ConnectX-6 Dx** only.
   If a nonzero value is specified the driver creates all necessary internal
-  objects to provide accurate packet send scheduling on mbuf timestamps.
+  objects (Clock Queue and Rearm Queue) to provide accurate packet send
+  scheduling on mbuf timestamps using a cross-channel approach.
   The positive value specifies the scheduling granularity in nanoseconds,
   the packet send will be accurate up to specified digits. The allowed range is
   from 500 to 1 million of nanoseconds. The negative value specifies the module
   of granularity and engages the special test mode the check the schedule rate.
   By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
-  feature is disabled.
+  feature is disabled on ConnectX-6 Dx.
 
-  Starting with ConnectX-7 the capability to schedule traffic directly
-  on timestamp specified in descriptor is provided,
-  no extra objects are needed anymore and scheduling capability
-  is advertised and handled regardless ``tx_pp`` parameter presence.
+  Starting with **ConnectX-7** the hardware provides a native wait-on-time
+  capability that inserts the scheduling delay directly in the WQE descriptor.
+  No Clock Queue or Rearm Queue is needed and the ``tx_pp`` parameter is not
+  required. The driver automatically advertises send scheduling support when
+  the HCA wait-on-time capability is detected. The ``tx_skew`` parameter can
+  still be used on ConnectX-7 and above to compensate for wire delay.
 
 - ``tx_skew`` parameter [int]
 
   The parameter adjusts the send packet scheduling on timestamps and represents
   the average delay between beginning of the transmitting descriptor processing
   by the hardware and appearance of actual packet data on the wire. The value
-  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
-  specified. The default value is zero.
+  should be provided in nanoseconds and applies to both ConnectX-6 Dx
+  (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
+  The default value is zero.
 
 - ``tx_vec_en`` parameter [int]
 
@@ -883,9 +888,13 @@ Send Scheduling Counters
 
 The mlx5 PMD provides a comprehensive set of counters designed for
 debugging and diagnostics related to packet scheduling during transmission.
-These counters are applicable only if the port was configured with the ``tx_pp`` devarg
-and reflect the status of the PMD scheduling infrastructure
-based on Clock and Rearm Queues, used as a workaround on ConnectX-6 DX NICs.
+The first group of counters (prefixed ``tx_pp_``) reflects the status of the
+Clock Queue and Rearm Queue infrastructure used on ConnectX-6 Dx and is
+applicable only if the port was configured with the ``tx_pp`` devarg.
+The timestamp validation counters
+(``tx_pp_timestamp_past_errors``, ``tx_pp_timestamp_future_errors``,
+``tx_pp_timestamp_order_errors``) are also reported on ConnectX-7 and above
+in wait-on-time mode, without requiring ``tx_pp``.
 
 ``tx_pp_missed_interrupt_errors``
   Indicates that the Rearm Queue interrupt was not serviced on time.
@@ -1969,31 +1978,54 @@ Limitations
 Tx Scheduling
 ~~~~~~~~~~~~~
 
-When PMD sees the ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on the packet
-being sent it tries to synchronize the time of packet appearing on
-the wire with the specified packet timestamp. If the specified one
-is in the past it should be ignored, if one is in the distant future
-it should be capped with some reasonable value (in range of seconds).
-These specific cases ("too late" and "distant future") can be optionally
-reported via device xstats to assist applications to detect the
-time-related problems.
-
-The timestamp upper "too-distant-future" limit
-at the moment of invoking the Tx burst routine
-can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
+When the PMD sees ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on a packet
+being sent it inserts a dedicated WAIT WQE to synchronize the time of the
+packet appearing on the wire with the specified timestamp. Every packet
+in a burst that carries the timestamp dynamic flag is individually
+scheduled -- there is no restriction to the first packet only.
+
+If the specified timestamp is in the past, the packet is sent immediately.
+If it is in the distant future it should be capped with some reasonable
+value (in range of seconds). These specific cases ("too late" and
+"distant future") can be optionally reported via device xstats to assist
+applications to detect time-related problems.
+
+The eMPW (enhanced Multi-Packet Write) data path automatically breaks
+the batch when a timestamped packet is encountered, ensuring each
+scheduled packet gets its own WAIT WQE.
+
+Two hardware mechanisms are supported:
+
+**ConnectX-6 Dx -- Clock Queue (cross-channel)**
+   The driver creates a Clock Queue and a Rearm Queue that together
+   provide a time reference for scheduling. This mode requires the
+   :ref:`tx_pp <mlx5_tx_pp_param>` devarg. The timestamp upper
+   "too-distant-future" limit at the moment of invoking the Tx burst
+   routine can be estimated as ``tx_pp`` (in nanoseconds) multiplied
+   by 2^23.
+
+**ConnectX-7 and above -- wait-on-time**
+   The hardware supports placing the scheduling delay directly inside
+   the WQE descriptor. No Clock Queue or Rearm Queue is needed and the
+   ``tx_pp`` devarg is **not** required. The driver automatically
+   advertises send scheduling support when the HCA wait-on-time
+   capability is detected.
+
 Please note, for the testpmd txonly mode,
 the limit is deduced from the expression::
 
    (n_tx_descriptors / burst_size + 1) * inter_burst_gap
 
-There is no any packet reordering according timestamps is supposed,
-neither within packet burst, nor between packets, it is an entirely
-application responsibility to generate packets and its timestamps
-in desired order.
+There is no packet reordering according to timestamps,
+neither within a packet burst, nor between packets. It is entirely the
+application's responsibility to generate packets and their timestamps
+in the desired order.
 
 Requirements
 ^^^^^^^^^^^^
 
+ConnectX-6 Dx (Clock Queue mode):
+
 =========  =============
 Minimum    Version
 =========  =============
@@ -2005,20 +2037,35 @@ rdma-core
 DPDK       20.08
 =========  =============
 
+ConnectX-7 and above (wait-on-time mode):
+
+=========  =============
+Minimum    Version
+=========  =============
+hardware   ConnectX-7
+=========  =============
+
 Firmware configuration
 ^^^^^^^^^^^^^^^^^^^^^^
 
 Runtime configuration
 ^^^^^^^^^^^^^^^^^^^^^
 
-To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
-parameter should be specified.
+**ConnectX-6 Dx**: the :ref:`tx_pp <mlx5_tx_pp_param>` parameter must be
+specified to enable send scheduling on mbuf timestamps.
+
+**ConnectX-7+**: no devarg is required. Send scheduling is automatically
+enabled when the HCA reports the wait-on-time capability.
+
+On both hardware generations the ``tx_skew`` parameter can be used to
+compensate for the delay between descriptor processing and actual wire
+time.
 
 Limitations
 ^^^^^^^^^^^
 
-#. The timestamps can be put only in the first packet
-   in the burst providing the entire burst scheduling.
+#. On ConnectX-6 Dx (Clock Queue mode) timestamps too far in the future
+   are capped (see the ``tx_pp`` x 2^23 limit above).
 
 
 .. _mlx5_tx_inline:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v4 02/10] common/mlx5: query packet pacing rate table capabilities
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
  2026-03-22 13:46     ` [PATCH v4 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
@ 2026-03-22 13:46     ` Vincent Jardin
  2026-03-22 13:46     ` [PATCH v4 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
                       ` (9 subsequent siblings)
  11 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Query additional QoS packet pacing capabilities from HCA attributes:
- packet_pacing_burst_bound: HW supports burst_upper_bound parameter
- packet_pacing_typical_size: HW supports typical_packet_size parameter
- packet_pacing_max_rate / packet_pacing_min_rate: rate range in kbps
- packet_pacing_rate_table_size: number of HW rate table entries

These capabilities are needed by the upcoming per-queue rate limiting
feature to validate devarg values and report HW limits.

Supported hardware:
- ConnectX-6 Dx and later (different boards expose different subsets)
- ConnectX-5 reports packet_pacing but not all extended fields
- ConnectX-7/8 report the full capability set
- BlueField-2 and later DPUs also report these capabilities

Not supported:
- ConnectX-4 Lx and earlier (no packet_pacing capability at all)
- ConnectX-5 Ex may not report burst_bound or typical_size

Signed-off-by: Vincent Jardin <vjardin@free.fr>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 15 +++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 11 ++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index d12ebf8487..8f53303fa7 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1244,6 +1244,21 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 				MLX5_GET(qos_cap, hcattr, packet_pacing);
 		attr->qos.wqe_rate_pp =
 				MLX5_GET(qos_cap, hcattr, wqe_rate_pp);
+		attr->qos.packet_pacing_burst_bound =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_burst_bound);
+		attr->qos.packet_pacing_typical_size =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_typical_size);
+		attr->qos.packet_pacing_max_rate =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_max_rate);
+		attr->qos.packet_pacing_min_rate =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_min_rate);
+		attr->qos.packet_pacing_rate_table_size =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_rate_table_size);
 		if (attr->qos.flow_meter_aso_sup) {
 			attr->qos.log_meter_aso_granularity =
 				MLX5_GET(qos_cap, hcattr,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index da50fc686c..930ae2c072 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -67,7 +67,16 @@ struct mlx5_hca_qos_attr {
 	/* Power of the maximum allocation granularity Object. */
 	uint32_t log_max_num_meter_aso:5;
 	/* Power of the maximum number of supported objects. */
-
+	uint32_t packet_pacing_burst_bound:1;
+	/* HW supports burst_upper_bound PP parameter. */
+	uint32_t packet_pacing_typical_size:1;
+	/* HW supports typical_packet_size PP parameter. */
+	uint32_t packet_pacing_max_rate;
+	/* Maximum supported pacing rate in kbps. */
+	uint32_t packet_pacing_min_rate;
+	/* Minimum supported pacing rate in kbps. */
+	uint16_t packet_pacing_rate_table_size;
+	/* Number of entries in the HW rate table. */
 };
 
 struct mlx5_hca_vdpa_attr {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v4 03/10] common/mlx5: extend SQ modify to support rate limit update
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
  2026-03-22 13:46     ` [PATCH v4 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
  2026-03-22 13:46     ` [PATCH v4 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
@ 2026-03-22 13:46     ` Vincent Jardin
  2026-03-23 12:59       ` Slava Ovsiienko
  2026-03-22 13:46     ` [PATCH v4 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
                       ` (8 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Add rl_update and packet_pacing_rate_limit_index fields to
mlx5_devx_modify_sq_attr. When rl_update is set, the modify SQ
command sets modify_bitmask bit 0 and writes the PP index into
the SQ context, allowing dynamic rate changes on a live RDY SQ
without teardown.

modify_sq_in.modify_bitmask[0x40] bit 0 controls the
packet_pacing_rate_limit_index.

Supported hardware:
- ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
- ConnectX-7/8: same SQ context field, also supports wait-on-time
- BlueField-2/3: same modify_sq command support

Not supported:
- ConnectX-5: supports packet_pacing but only at SQ creation time,
  dynamic modify_bitmask update may not be supported on all FW
- ConnectX-4 Lx and earlier: no packet_pacing support

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 8 ++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 3 +++
 drivers/common/mlx5/mlx5_prm.h       | 7 +++++++
 3 files changed, 18 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 8f53303fa7..102f84fd5c 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2129,6 +2129,14 @@ mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
 	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
 	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
 	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	if (sq_attr->rl_update) {
+		uint64_t msk = MLX5_GET64(modify_sq_in, in, modify_bitmask);
+
+		msk |= MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX;
+		MLX5_SET64(modify_sq_in, in, modify_bitmask, msk);
+		MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+			 sq_attr->packet_pacing_rate_limit_index);
+	}
 	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
 					 out, sizeof(out));
 	if (ret) {
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 930ae2c072..82d949972b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -519,6 +519,9 @@ struct mlx5_devx_modify_sq_attr {
 	uint32_t state:4;
 	uint32_t hairpin_peer_rq:24;
 	uint32_t hairpin_peer_vhca:16;
+	uint32_t rl_update:1;
+	/* Set to update packet_pacing_rate_limit_index on a live SQ. */
+	uint32_t packet_pacing_rate_limit_index:16;
 };
 
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index ba33336e58..597d06362f 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -2985,6 +2985,7 @@ struct mlx5_ifc_create_tis_in_bits {
 	struct mlx5_ifc_tisc_bits ctx;
 };
 
+/* Bits for modify_rq_in.modify_bitmask (Receive Queue). */
 enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
@@ -2992,6 +2993,12 @@ enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID = 1ULL << 3,
 };
 
+/* Bits for modify_sq_in.modify_bitmask (Send Queue). */
+enum {
+	MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX =
+		1ULL << 0,
+};
+
 struct mlx5_ifc_modify_rq_in_bits {
 	u8 opcode[0x10];
 	u8 uid[0x10];
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v4 04/10] net/mlx5: add per-queue packet pacing infrastructure
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
                       ` (2 preceding siblings ...)
  2026-03-22 13:46     ` [PATCH v4 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
@ 2026-03-22 13:46     ` Vincent Jardin
  2026-03-23 13:00       ` Slava Ovsiienko
  2026-03-22 13:46     ` [PATCH v4 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
                       ` (7 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Add mlx5_txq_rate_limit structure and alloc/free helpers for
per-queue data-rate packet pacing. Each Tx queue can now hold
its own PP (Packet Pacing) context allocated via mlx5dv_pp_alloc()
with MLX5_DATA_RATE mode.

mlx5_txq_alloc_pp_rate_limit() converts Mbps to kbps for the PRM
rate_limit field and allocates a PP context from the HW rate table.
mlx5_txq_free_pp_rate_limit() releases it.

PP allocation uses shared mode (flags=0). Each dv_alloc_pp() call
returns a distinct PP handle (needed for per-queue dv_free_pp()
cleanup), but the kernel mlx5 driver internally maps identical
rate parameters to the same HW rate table entry (same pp_id) with
internal refcounting. This avoids exhausting the rate table
(typically 128 entries on ConnectX-6 Dx) when many queues share
the same rate.

The existing Clock Queue path (sh->txpp.pp / sh->txpp.pp_id) is
untouched — it uses MLX5_WQE_RATE for per-packet scheduling with
a dedicated index, while per-queue rate limiting uses MLX5_DATA_RATE.

PP index cleanup is added to mlx5_txq_release() to prevent leaks
when queues are destroyed.

Supported hardware:
- ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
- ConnectX-7/8: same mechanism, plus wait-on-time coexistence
- BlueField-2/3: same PP allocation support

Not supported:
- ConnectX-5: packet_pacing exists but MLX5_DATA_RATE mode may
  not be available on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.h      | 11 +++++
 drivers/net/mlx5/mlx5_tx.h   |  1 +
 drivers/net/mlx5/mlx5_txpp.c | 78 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_txq.c  |  1 +
 4 files changed, 91 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4da184eb47..33628d7987 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1297,6 +1297,13 @@ struct mlx5_txpp_ts {
 	RTE_ATOMIC(uint64_t) ts;
 };
 
+/* Per-queue rate limit tracking. */
+struct mlx5_txq_rate_limit {
+	void *pp;		/* Packet pacing context from dv_alloc_pp. */
+	uint16_t pp_id;		/* Packet pacing index. */
+	uint32_t rate_mbps;	/* Current rate in Mbps, 0 = disabled. */
+};
+
 /* Tx packet pacing structure. */
 struct mlx5_dev_txpp {
 	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */
@@ -2630,6 +2637,10 @@ int mlx5_txpp_xstats_get_names(struct rte_eth_dev *dev,
 void mlx5_txpp_interrupt_handler(void *cb_arg);
 int mlx5_txpp_map_hca_bar(struct rte_eth_dev *dev);
 void mlx5_txpp_unmap_hca_bar(struct rte_eth_dev *dev);
+int mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
+				 struct mlx5_txq_rate_limit *rate_limit,
+				 uint32_t rate_mbps);
+void mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rate_limit);
 
 /* mlx5_rxtx.c */
 
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 0134a2e003..51f330454a 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -192,6 +192,7 @@ struct mlx5_txq_ctrl {
 	uint16_t dump_file_n; /* Number of dump files. */
 	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	uint32_t hairpin_status; /* Hairpin binding status. */
+	struct mlx5_txq_rate_limit rate_limit; /* Per-queue rate limit. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 0e99b58bde..e34e996e9b 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -128,6 +128,84 @@ mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared *sh)
 #endif
 }
 
+/* Free a per-queue packet pacing index. */
+void
+mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rate_limit)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	if (rate_limit->pp) {
+		mlx5_glue->dv_free_pp(rate_limit->pp);
+		rate_limit->pp = NULL;
+		rate_limit->pp_id = 0;
+		rate_limit->rate_mbps = 0;
+	}
+#else
+	RTE_SET_USED(rate_limit);
+#endif
+}
+
+/* Allocate a per-queue packet pacing index for data-rate limiting. */
+int
+mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
+			     struct mlx5_txq_rate_limit *rate_limit,
+			     uint32_t rate_mbps)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	uint32_t pp[MLX5_ST_SZ_DW(set_pp_rate_limit_context)];
+	uint64_t rate_kbps;
+	struct mlx5_hca_qos_attr *qos = &sh->cdev->config.hca_attr.qos;
+
+	if (rate_mbps == 0) {
+		DRV_LOG(ERR, "Rate must be greater than zero.");
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+	rate_kbps = (uint64_t)rate_mbps * 1000;
+	if (qos->packet_pacing_min_rate && rate_kbps < qos->packet_pacing_min_rate) {
+		DRV_LOG(ERR, "Rate %u Mbps below HW minimum (%u kbps).",
+			rate_mbps, qos->packet_pacing_min_rate);
+		rte_errno = ERANGE;
+		return -ERANGE;
+	}
+	if (qos->packet_pacing_max_rate && rate_kbps > qos->packet_pacing_max_rate) {
+		DRV_LOG(ERR, "Rate %u Mbps exceeds HW maximum (%u kbps).",
+			rate_mbps, qos->packet_pacing_max_rate);
+		rte_errno = ERANGE;
+		return -ERANGE;
+	}
+	memset(&pp, 0, sizeof(pp));
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, (uint32_t)rate_kbps);
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
+	rate_limit->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp),
+						 &pp, 0);
+	if (rate_limit->pp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
+			rate_mbps);
+		rte_errno = errno;
+		return -errno;
+	}
+	rate_limit->pp_id = ((struct mlx5dv_pp *)rate_limit->pp)->index;
+	if (!rate_limit->pp_id) {
+		DRV_LOG(ERR, "Zero PP index allocated for rate %u Mbps.",
+			rate_mbps);
+		mlx5_txq_free_pp_rate_limit(rate_limit);
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	rate_limit->rate_mbps = rate_mbps;
+	DRV_LOG(DEBUG, "Allocated PP index %u for rate %u Mbps.",
+		rate_limit->pp_id, rate_mbps);
+	return 0;
+#else
+	RTE_SET_USED(sh);
+	RTE_SET_USED(rate_limit);
+	RTE_SET_USED(rate_mbps);
+	DRV_LOG(ERR, "Per-queue rate limit requires rdma-core PP support.");
+	rte_errno = ENOTSUP;
+	return -ENOTSUP;
+#endif
+}
+
 static void
 mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)
 {
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9275efb58e..3356c89758 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1344,6 +1344,7 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
 		mlx5_free(txq_ctrl->obj);
 		txq_ctrl->obj = NULL;
 	}
+	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
 	if (!txq_ctrl->is_hairpin) {
 		if (txq_ctrl->txq.fcqs) {
 			mlx5_free(txq_ctrl->txq.fcqs);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v4 05/10] net/mlx5: support per-queue rate limiting
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
                       ` (3 preceding siblings ...)
  2026-03-22 13:46     ` [PATCH v4 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
@ 2026-03-22 13:46     ` Vincent Jardin
  2026-03-23 13:17       ` Slava Ovsiienko
  2026-03-22 13:46     ` [PATCH v4 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
                       ` (6 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Wire rte_eth_set_queue_rate_limit() to the mlx5 PMD. The callback
allocates a per-queue PP index with the requested data rate, then
modifies the live SQ via modify_bitmask bit 0 to apply the new
packet_pacing_rate_limit_index — no queue teardown required.

Setting tx_rate=0 clears the PP index on the SQ and frees it.

Capability check uses hca_attr.qos.packet_pacing directly (not
dev_cap.txpp_en which requires Clock Queue prerequisites). This
allows per-queue rate limiting without the tx_pp devarg.

The callback rejects hairpin queues and queues whose SQ is not
yet created.

testpmd usage (no testpmd changes needed):
  set port 0 queue 0 rate 1000
  set port 0 queue 1 rate 5000
  set port 0 queue 0 rate 0     # disable

Supported hardware:
- ConnectX-6 Dx: full support, per-SQ rate via HW rate table
- ConnectX-7/8: full support, coexists with wait-on-time scheduling
- BlueField-2/3: full support as DPU rep ports

Not supported:
- ConnectX-5: packet_pacing exists but dynamic SQ modify may not
  work on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/features/mlx5.ini |   1 +
 doc/guides/nics/mlx5.rst          |  54 ++++++++++++++
 drivers/net/mlx5/mlx5.c           |   2 +
 drivers/net/mlx5/mlx5_tx.h        |   2 +
 drivers/net/mlx5/mlx5_txq.c       | 118 ++++++++++++++++++++++++++++++
 5 files changed, 177 insertions(+)

diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
index 4f9c4c309b..3b3eda28b8 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -30,6 +30,7 @@ Inner RSS            = Y
 SR-IOV               = Y
 VLAN filter          = Y
 Flow control         = Y
+Rate limitation      = Y
 CRC offload          = Y
 VLAN offload         = Y
 L3 checksum offload  = Y
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 6bb8c07353..c72a60f084 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -580,6 +580,60 @@ for an additional list of options shared with other mlx5 drivers.
   (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
   The default value is zero.
 
+.. _mlx5_per_queue_rate_limit:
+
+Per-Queue Tx Rate Limiting
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The mlx5 PMD supports per-queue Tx rate limiting via the standard ethdev
+API ``rte_eth_set_queue_rate_limit()`` and ``rte_eth_get_queue_rate_limit()``.
+
+This feature uses the hardware packet pacing mechanism to enforce a data
+rate on individual TX queues without tearing down the queue. The rate is
+specified in Mbps.
+
+**Requirements:**
+
+- ConnectX-6 Dx or later with ``packet_pacing`` HCA capability.
+- The DevX path must be used (default). The legacy Verbs path
+  (``dv_flow_en=0``) does not support dynamic SQ modification and
+  returns ``-EINVAL``.
+- The queue must be started (SQ in RDY state) before setting a rate.
+
+**Supported hardware:**
+
+- ConnectX-6 Dx: per-SQ rate via HW rate table.
+- ConnectX-7/8: full support, coexists with wait-on-time scheduling.
+- BlueField-2/3: full support as DPU rep ports.
+
+**Not supported:**
+
+- ConnectX-5: ``packet_pacing`` exists but dynamic SQ modify may not
+  work on all firmware versions.
+- ConnectX-4 Lx and earlier: no ``packet_pacing`` capability.
+
+**Rate table sharing:**
+
+The hardware rate table has a limited number of entries (typically 128 on
+ConnectX-6 Dx). When multiple queues are configured with identical rate
+parameters, the kernel mlx5 driver shares a single rate table entry across
+them. Each queue still has its own independent SQ and enforces the rate
+independently — queues are never merged. The rate cap applies per-queue:
+if two queues share the same 1000 Mbps entry, each can send up to
+1000 Mbps independently, they do not share a combined budget.
+
+This sharing is transparent and only affects table capacity: 128 entries
+can serve thousands of queues as long as many use the same rate. Queues
+with different rates consume separate entries.
+
+**Usage with testpmd:**
+
+.. code-block:: console
+
+   testpmd> set port 0 queue 0 rate 1000
+   testpmd> show port 0 queue 0 rate
+   testpmd> set port 0 queue 0 rate 0
+
 - ``tx_vec_en`` parameter [int]
 
   A nonzero value enables Tx vector with ConnectX-5 NICs and above.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e795948187..e718f0fa8c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2621,6 +2621,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 	.get_restore_flags = mlx5_get_restore_flags,
+	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2714,6 +2715,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.count_aggr_ports = mlx5_count_aggr_ports,
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.get_restore_flags = mlx5_get_restore_flags,
+	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 51f330454a..975ff57acd 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -222,6 +222,8 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint32_t tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 3356c89758..ce08363ca9 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1363,6 +1363,124 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
 	return 0;
 }
 
+/**
+ * Set per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param tx_rate
+ *   TX rate in Mbps, 0 to disable rate limiting.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			  uint32_t tx_rate)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_devx_obj *sq_devx;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	struct mlx5_txq_rate_limit new_rate_limit = { 0 };
+	int ret;
+
+	if (!sh->cdev->config.hca_attr.qos.packet_pacing) {
+		DRV_LOG(ERR, "Port %u packet pacing not supported.",
+			dev->data->port_id);
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u not configured.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = container_of((*priv->txqs)[queue_idx],
+				struct mlx5_txq_ctrl, txq);
+	if (txq_ctrl->is_hairpin) {
+		DRV_LOG(ERR, "Port %u Tx queue %u is hairpin.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (txq_ctrl->obj == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u not initialized.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	/*
+	 * For non-hairpin queues the SQ DevX object lives in
+	 * obj->sq_obj.sq (used by DevX/HWS mode), while hairpin
+	 * queues use obj->sq directly. These are different members
+	 * of a union inside mlx5_txq_obj.
+	 */
+	sq_devx = txq_ctrl->obj->sq_obj.sq;
+	if (sq_devx == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (dev->data->tx_queue_state[queue_idx] !=
+	    RTE_ETH_QUEUE_STATE_STARTED) {
+		DRV_LOG(ERR,
+			"Port %u Tx queue %u is not started, stop traffic before setting rate.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (tx_rate == 0) {
+		/* Disable rate limiting. */
+		if (txq_ctrl->rate_limit.pp_id == 0)
+			return 0; /* Already disabled. */
+		sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.rl_update = 1;
+		sq_attr.packet_pacing_rate_limit_index = 0;
+		ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
+		if (ret) {
+			DRV_LOG(ERR,
+				"Port %u Tx queue %u failed to clear rate.",
+				dev->data->port_id, queue_idx);
+			rte_errno = -ret;
+			return ret;
+		}
+		mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
+		DRV_LOG(DEBUG, "Port %u Tx queue %u rate limit disabled.",
+			dev->data->port_id, queue_idx);
+		return 0;
+	}
+	/* Allocate a new PP index for the requested rate into a temp. */
+	ret = mlx5_txq_alloc_pp_rate_limit(sh, &new_rate_limit, tx_rate);
+	if (ret)
+		return ret;
+	/* Modify live SQ to use the new PP index. */
+	sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+	sq_attr.state = MLX5_SQC_STATE_RDY;
+	sq_attr.rl_update = 1;
+	sq_attr.packet_pacing_rate_limit_index = new_rate_limit.pp_id;
+	ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u Mbps.",
+			dev->data->port_id, queue_idx, tx_rate);
+		mlx5_txq_free_pp_rate_limit(&new_rate_limit);
+		rte_errno = -ret;
+		return ret;
+	}
+	/* SQ updated — release old PP context, install new one. */
+	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
+	txq_ctrl->rate_limit = new_rate_limit;
+	DRV_LOG(DEBUG, "Port %u Tx queue %u rate set to %u Mbps (PP idx %u).",
+		dev->data->port_id, queue_idx, tx_rate, txq_ctrl->rate_limit.pp_id);
+	return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v4 06/10] net/mlx5: add burst pacing devargs
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
                       ` (4 preceding siblings ...)
  2026-03-22 13:46     ` [PATCH v4 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
@ 2026-03-22 13:46     ` Vincent Jardin
  2026-03-23 13:18       ` Slava Ovsiienko
  2026-03-22 13:46     ` [PATCH v4 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
                       ` (5 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Expose burst_upper_bound and typical_packet_size from the PRM
set_pp_rate_limit_context as devargs:
- tx_burst_bound=<bytes>: max burst before rate evaluation kicks in
- tx_typical_pkt_sz=<bytes>: typical packet size for accuracy

These parameters apply to per-queue rate limiting
(rte_eth_set_queue_rate_limit) only. The Clock Queue path
(tx_pp devarg) uses WQE rate pacing and does not need these
parameters.

Values are validated against HCA capabilities
(packet_pacing_burst_bound and packet_pacing_typical_size).
If the HW does not support them, a warning is logged and the
value is silently zeroed. Test mode still overrides both values.

Shared context mismatch checks ensure all ports on the same
device use the same burst parameters.

Supported hardware:
- ConnectX-6 Dx: burst_upper_bound and typical_packet_size
  reported via packet_pacing_burst_bound / packet_pacing_typical_size
  QoS capability bits
- ConnectX-7/8: full support for both parameters
- BlueField-2/3: same capabilities as host-side ConnectX

Not supported:
- ConnectX-5: may not report burst_bound or typical_size caps
- ConnectX-4 Lx and earlier: no packet_pacing at all

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/mlx5.rst     | 17 +++++++++++++++
 drivers/net/mlx5/mlx5.c      | 42 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5.h      |  2 ++
 drivers/net/mlx5/mlx5_txpp.c |  6 ++++++
 4 files changed, 67 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index c72a60f084..d0b403dd5c 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -580,6 +580,23 @@ for an additional list of options shared with other mlx5 drivers.
   (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
   The default value is zero.
 
+- ``tx_burst_bound`` parameter [int]
+
+  Specifies the burst upper bound in bytes for packet pacing rate evaluation.
+  When set, the hardware considers this burst size when enforcing the configured
+  rate limit. Only effective when the HCA reports ``packet_pacing_burst_bound``
+  capability. Applies to per-queue rate limiting
+  (``rte_eth_set_queue_rate_limit()``). The Clock Queue path (``tx_pp``)
+  uses WQE rate pacing and does not use this parameter.
+  The default value is zero (hardware default).
+
+- ``tx_typical_pkt_sz`` parameter [int]
+
+  Specifies the typical packet size in bytes for packet pacing rate accuracy
+  improvement. Only effective when the HCA reports
+  ``packet_pacing_typical_size`` capability. Applies to per-queue rate
+  limiting only. The default value is zero (hardware default).
+
 .. _mlx5_per_queue_rate_limit:
 
 Per-Queue Tx Rate Limiting
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e718f0fa8c..7d08d7886b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -119,6 +119,18 @@
  */
 #define MLX5_TX_SKEW "tx_skew"
 
+/*
+ * Device parameter to specify burst upper bound in bytes
+ * for packet pacing rate evaluation.
+ */
+#define MLX5_TX_BURST_BOUND "tx_burst_bound"
+
+/*
+ * Device parameter to specify typical packet size in bytes
+ * for packet pacing rate accuracy improvement.
+ */
+#define MLX5_TX_TYPICAL_PKT_SZ "tx_typical_pkt_sz"
+
 /*
  * Device parameter to enable hardware Tx vector.
  * Deprecated, ignored (no vectorized Tx routines anymore).
@@ -1407,6 +1419,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->tx_pp = tmp;
 	} else if (strcmp(MLX5_TX_SKEW, key) == 0) {
 		config->tx_skew = tmp;
+	} else if (strcmp(MLX5_TX_BURST_BOUND, key) == 0) {
+		config->tx_burst_bound = tmp;
+	} else if (strcmp(MLX5_TX_TYPICAL_PKT_SZ, key) == 0) {
+		config->tx_typical_pkt_sz = tmp;
 	} else if (strcmp(MLX5_L3_VXLAN_EN, key) == 0) {
 		config->l3_vxlan_en = !!tmp;
 	} else if (strcmp(MLX5_VF_NL_EN, key) == 0) {
@@ -1481,8 +1497,10 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 				struct mlx5_sh_config *config)
 {
 	const char **params = (const char *[]){
+		MLX5_TX_BURST_BOUND,
 		MLX5_TX_PP,
 		MLX5_TX_SKEW,
+		MLX5_TX_TYPICAL_PKT_SZ,
 		MLX5_L3_VXLAN_EN,
 		MLX5_VF_NL_EN,
 		MLX5_DV_ESW_EN,
@@ -1557,6 +1575,18 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(WARNING,
 			"\"tx_skew\" doesn't affect without \"tx_pp\".");
 	}
+	if (config->tx_burst_bound &&
+	    !sh->cdev->config.hca_attr.qos.packet_pacing_burst_bound) {
+		DRV_LOG(WARNING,
+			"HW does not support burst_upper_bound, ignoring.");
+		config->tx_burst_bound = 0;
+	}
+	if (config->tx_typical_pkt_sz &&
+	    !sh->cdev->config.hca_attr.qos.packet_pacing_typical_size) {
+		DRV_LOG(WARNING,
+			"HW does not support typical_packet_size, ignoring.");
+		config->tx_typical_pkt_sz = 0;
+	}
 	/* Check for LRO support. */
 	if (mlx5_devx_obj_ops_en(sh) && sh->cdev->config.hca_attr.lro_cap) {
 		/* TBD check tunnel lro caps. */
@@ -3191,6 +3221,18 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.tx_burst_bound != config->tx_burst_bound) {
+		DRV_LOG(ERR, "\"tx_burst_bound\" "
+			"configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
+	if (sh->config.tx_typical_pkt_sz != config->tx_typical_pkt_sz) {
+		DRV_LOG(ERR, "\"tx_typical_pkt_sz\" "
+			"configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.txq_mem_algn != config->txq_mem_algn) {
 		DRV_LOG(ERR, "\"TxQ memory alignment\" "
 			"configuration mismatch for shared %s context. %u - %u",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 33628d7987..5ae01ec491 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -383,6 +383,8 @@ struct mlx5_port_config {
 struct mlx5_sh_config {
 	int tx_pp; /* Timestamp scheduling granularity in nanoseconds. */
 	int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
+	uint32_t tx_burst_bound; /* Burst upper bound in bytes, 0 = default. */
+	uint32_t tx_typical_pkt_sz; /* Typical packet size in bytes, 0 = default. */
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index e34e996e9b..707ef9d111 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -176,6 +176,12 @@ mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
 	memset(&pp, 0, sizeof(pp));
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, (uint32_t)rate_kbps);
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
+	if (sh->config.tx_burst_bound)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 burst_upper_bound, sh->config.tx_burst_bound);
+	if (sh->config.tx_typical_pkt_sz)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 typical_packet_size, sh->config.tx_typical_pkt_sz);
 	rate_limit->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp),
 						 &pp, 0);
 	if (rate_limit->pp == NULL) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v4 07/10] net/mlx5: add testpmd command to query per-queue rate limit
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
                       ` (5 preceding siblings ...)
  2026-03-22 13:46     ` [PATCH v4 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
@ 2026-03-22 13:46     ` Vincent Jardin
  2026-03-23 13:19       ` Slava Ovsiienko
  2026-03-22 13:46     ` [PATCH v4 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
                       ` (4 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Add a new testpmd command to display the per-queue packet pacing
rate limit state, including the PP index from both driver state
and FW SQ context readback:

  testpmd> mlx5 port <port_id> txq <queue_id> rate show

This helps verify that the FW actually applied the PP index to
the SQ after setting a per-queue rate limit.

Expose a new PMD API rte_pmd_mlx5_txq_rate_limit_query() that
queries txq_ctrl->rate_limit for driver state and
mlx5_devx_cmd_query_sq() for the FW
packet_pacing_rate_limit_index field.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_testpmd.c | 93 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c      | 40 +++++++++++++-
 drivers/net/mlx5/rte_pmd_mlx5.h | 30 +++++++++++
 3 files changed, 162 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_testpmd.c b/drivers/net/mlx5/mlx5_testpmd.c
index 1bb5a89559..fd3efecc5d 100644
--- a/drivers/net/mlx5/mlx5_testpmd.c
+++ b/drivers/net/mlx5/mlx5_testpmd.c
@@ -1365,6 +1365,94 @@ cmdline_parse_inst_t mlx5_cmd_dump_rq_context_options = {
 	}
 };
 
+/* Show per-queue rate limit PP index for a given port/queue */
+struct mlx5_cmd_show_rate_limit_options {
+	cmdline_fixed_string_t mlx5;
+	cmdline_fixed_string_t port;
+	portid_t port_id;
+	cmdline_fixed_string_t txq;
+	queueid_t queue_id;
+	cmdline_fixed_string_t rate;
+	cmdline_fixed_string_t show;
+};
+
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_mlx5 =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 mlx5, "mlx5");
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_port =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 port, "port");
+cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_port_id =
+	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+			      port_id, RTE_UINT16);
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_txq =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 txq, "txq");
+cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_queue_id =
+	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+			      queue_id, RTE_UINT16);
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_rate =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 rate, "rate");
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_show =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 show, "show");
+
+static void
+mlx5_cmd_show_rate_limit_parsed(void *parsed_result,
+				__rte_unused struct cmdline *cl,
+				__rte_unused void *data)
+{
+	struct mlx5_cmd_show_rate_limit_options *res = parsed_result;
+	struct rte_pmd_mlx5_txq_rate_limit_info info;
+	int ret;
+
+	ret = rte_pmd_mlx5_txq_rate_limit_query(res->port_id, res->queue_id,
+						 &info);
+	switch (ret) {
+	case 0:
+		break;
+	case -ENODEV:
+		fprintf(stderr, "invalid port_id %u\n", res->port_id);
+		return;
+	case -EINVAL:
+		fprintf(stderr, "invalid queue index (%u), out of range\n",
+			res->queue_id);
+		return;
+	case -EIO:
+		fprintf(stderr, "failed to query SQ context\n");
+		return;
+	default:
+		fprintf(stderr, "query failed (%d)\n", ret);
+		return;
+	}
+	fprintf(stdout, "Port %u Txq %u rate limit info:\n",
+		res->port_id, res->queue_id);
+	if (info.rate_mbps > 0)
+		fprintf(stdout, "  Configured rate: %u Mbps\n",
+			info.rate_mbps);
+	else
+		fprintf(stdout, "  Configured rate: disabled\n");
+	fprintf(stdout, "  PP index (driver): %u\n", info.pp_index);
+	fprintf(stdout, "  PP index (FW readback): %u\n", info.fw_pp_index);
+}
+
+cmdline_parse_inst_t mlx5_cmd_show_rate_limit = {
+	.f = mlx5_cmd_show_rate_limit_parsed,
+	.data = NULL,
+	.help_str = "mlx5 port <port_id> txq <queue_id> rate show",
+	.tokens = {
+		(void *)&mlx5_cmd_show_rate_limit_mlx5,
+		(void *)&mlx5_cmd_show_rate_limit_port,
+		(void *)&mlx5_cmd_show_rate_limit_port_id,
+		(void *)&mlx5_cmd_show_rate_limit_txq,
+		(void *)&mlx5_cmd_show_rate_limit_queue_id,
+		(void *)&mlx5_cmd_show_rate_limit_rate,
+		(void *)&mlx5_cmd_show_rate_limit_show,
+		NULL,
+	}
+};
+
 static struct testpmd_driver_commands mlx5_driver_cmds = {
 	.commands = {
 		{
@@ -1440,6 +1528,11 @@ static struct testpmd_driver_commands mlx5_driver_cmds = {
 			.help = "mlx5 port (port_id) queue (queue_id) dump rq_context (file_name)\n"
 				"    Dump mlx5 RQ Context\n\n",
 		},
+		{
+			.ctx = &mlx5_cmd_show_rate_limit,
+			.help = "mlx5 port (port_id) txq (queue_id) rate show\n"
+				"    Show per-queue rate limit PP index\n\n",
+		},
 		{
 			.ctx = NULL,
 		},
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 8085b5c306..7d71782d33 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -800,7 +800,7 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const ch
 	if (!rte_eth_dev_is_valid_port(port_id))
 		return -ENODEV;
 
-	if (rte_eth_tx_queue_is_valid(port_id, queue_id))
+	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
 		return -EINVAL;
 
 	fd = fopen(path, "w");
@@ -848,3 +848,41 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const ch
 	fclose(fd);
 	return ret;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_txq_rate_limit_query, 26.07)
+int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
+				       struct rte_pmd_mlx5_txq_rate_limit_info *info)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	struct mlx5_txq_data *txq_data;
+	struct mlx5_txq_ctrl *txq_ctrl;
+	uint32_t sq_out[MLX5_ST_SZ_DW(query_sq_out)] = {0};
+	int ret;
+
+	if (info == NULL)
+		return -EINVAL;
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return -ENODEV;
+	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
+		return -EINVAL;
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	txq_data = (*priv->txqs)[queue_id];
+	txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	info->rate_mbps = txq_ctrl->rate_limit.rate_mbps;
+	info->pp_index = txq_ctrl->rate_limit.pp_id;
+	if (txq_ctrl->obj == NULL) {
+		info->fw_pp_index = 0;
+		return 0;
+	}
+	ret = mlx5_devx_cmd_query_sq(txq_ctrl->obj->sq_obj.sq,
+				     sq_out, sizeof(sq_out));
+	if (ret)
+		return -EIO;
+	info->fw_pp_index = MLX5_GET(sqc,
+				     MLX5_ADDR_OF(query_sq_out, sq_out,
+						  sq_context),
+				     packet_pacing_rate_limit_index);
+	return 0;
+}
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 7acfdae97d..698d7d2032 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -420,6 +420,36 @@ __rte_experimental
 int
 rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const char *filename);
 
+/**
+ * Per-queue rate limit information.
+ */
+struct rte_pmd_mlx5_txq_rate_limit_info {
+	uint32_t rate_mbps;	/**< Configured rate in Mbps, 0 = disabled. */
+	uint16_t pp_index;	/**< PP index from driver state. */
+	uint16_t fw_pp_index;	/**< PP index read back from FW SQ context. */
+};
+
+/**
+ * Query per-queue rate limit state for a given Tx queue.
+ *
+ * @param[in] port_id
+ *   Port ID.
+ * @param[in] queue_id
+ *   Tx queue ID.
+ * @param[out] info
+ *   Rate limit information.
+ *
+ * @return
+ *   0 on success, negative errno on failure:
+ *   - -ENODEV: invalid port_id.
+ *   - -EINVAL: invalid queue_id.
+ *   - -EIO: FW query failed.
+ */
+__rte_experimental
+int
+rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
+				  struct rte_pmd_mlx5_txq_rate_limit_info *info);
+
 /** Type of mlx5 driver event for which custom callback is called. */
 enum rte_pmd_mlx5_driver_event_cb_type {
 	/** Called after HW Rx queue is created. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v4 08/10] ethdev: add getter for per-queue Tx rate limit
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
                       ` (6 preceding siblings ...)
  2026-03-22 13:46     ` [PATCH v4 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
@ 2026-03-22 13:46     ` Vincent Jardin
  2026-03-23 13:19       ` Slava Ovsiienko
  2026-03-22 13:46     ` [PATCH v4 09/10] net/mlx5: implement per-queue Tx rate limit getter Vincent Jardin
                       ` (3 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

The existing rte_eth_set_queue_rate_limit() API allows setting a
per-queue Tx rate but provides no way to read it back. Applications
such as grout are forced to maintain a shadow copy of the rate to
be able to report it.

Add rte_eth_get_queue_rate_limit() as the symmetric getter, following
the established DPDK pattern (e.g. rte_eth_dev_set_mtu/get_mtu,
rte_eth_dev_set_vlan_offload/get_vlan_offload).

This adds:
- eth_get_queue_rate_limit_t driver callback in ethdev_driver.h
- rte_eth_get_queue_rate_limit() public experimental API (26.07)
- Trace point matching the existing setter pattern
- Generic testpmd command: show port <id> queue <id> rate

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 app/test-pmd/cmdline.c           | 69 ++++++++++++++++++++++++++++++++
 lib/ethdev/ethdev_driver.h       |  7 ++++
 lib/ethdev/ethdev_trace.h        |  9 +++++
 lib/ethdev/ethdev_trace_points.c |  3 ++
 lib/ethdev/rte_ethdev.c          | 35 ++++++++++++++++
 lib/ethdev/rte_ethdev.h          | 24 +++++++++++
 6 files changed, 147 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index c5abeb5730..cc9c462498 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -8982,6 +8982,74 @@ static cmdline_parse_inst_t cmd_queue_rate_limit = {
 	},
 };
 
+/* *** SHOW RATE LIMIT FOR A QUEUE OF A PORT *** */
+struct cmd_show_queue_rate_limit_result {
+	cmdline_fixed_string_t show;
+	cmdline_fixed_string_t port;
+	uint16_t port_num;
+	cmdline_fixed_string_t queue;
+	uint16_t queue_num;
+	cmdline_fixed_string_t rate;
+};
+
+static void cmd_show_queue_rate_limit_parsed(void *parsed_result,
+		__rte_unused struct cmdline *cl,
+		__rte_unused void *data)
+{
+	struct cmd_show_queue_rate_limit_result *res = parsed_result;
+	uint32_t tx_rate = 0;
+	int ret;
+
+	ret = rte_eth_get_queue_rate_limit(res->port_num, res->queue_num,
+					   &tx_rate);
+	if (ret) {
+		fprintf(stderr, "Get queue rate limit failed: %s\n",
+			rte_strerror(-ret));
+		return;
+	}
+	if (tx_rate)
+		printf("Port %u Queue %u rate limit: %u Mbps\n",
+		       res->port_num, res->queue_num, tx_rate);
+	else
+		printf("Port %u Queue %u rate limit: disabled\n",
+		       res->port_num, res->queue_num);
+}
+
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				show, "show");
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				port, "port");
+static cmdline_parse_token_num_t cmd_show_queue_rate_limit_portnum =
+	TOKEN_NUM_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				port_num, RTE_UINT16);
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_queue =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				queue, "queue");
+static cmdline_parse_token_num_t cmd_show_queue_rate_limit_queuenum =
+	TOKEN_NUM_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				queue_num, RTE_UINT16);
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_rate =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				rate, "rate");
+
+static cmdline_parse_inst_t cmd_show_queue_rate_limit = {
+	.f = cmd_show_queue_rate_limit_parsed,
+	.data = NULL,
+	.help_str = "show port <port_id> queue <queue_id> rate: "
+		"Show rate limit for a queue on port_id",
+	.tokens = {
+		(void *)&cmd_show_queue_rate_limit_show,
+		(void *)&cmd_show_queue_rate_limit_port,
+		(void *)&cmd_show_queue_rate_limit_portnum,
+		(void *)&cmd_show_queue_rate_limit_queue,
+		(void *)&cmd_show_queue_rate_limit_queuenum,
+		(void *)&cmd_show_queue_rate_limit_rate,
+		NULL,
+	},
+};
+
 /* *** SET RATE LIMIT FOR A VF OF A PORT *** */
 struct cmd_vf_rate_limit_result {
 	cmdline_fixed_string_t set;
@@ -14270,6 +14338,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	&cmd_set_uc_all_hash_filter,
 	&cmd_vf_mac_addr_filter,
 	&cmd_queue_rate_limit,
+	&cmd_show_queue_rate_limit,
 	&cmd_tunnel_udp_config,
 	&cmd_showport_rss_hash,
 	&cmd_showport_rss_hash_key,
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 1255cd6f2c..0f336f9567 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -762,6 +762,11 @@ typedef int (*eth_set_queue_rate_limit_t)(struct rte_eth_dev *dev,
 				uint16_t queue_idx,
 				uint32_t tx_rate);
 
+/** @internal Get queue Tx rate. */
+typedef int (*eth_get_queue_rate_limit_t)(struct rte_eth_dev *dev,
+				uint16_t queue_idx,
+				uint32_t *tx_rate);
+
 /** @internal Add tunneling UDP port. */
 typedef int (*eth_udp_tunnel_port_add_t)(struct rte_eth_dev *dev,
 					 struct rte_eth_udp_tunnel *tunnel_udp);
@@ -1522,6 +1527,8 @@ struct eth_dev_ops {
 
 	/** Set queue rate limit */
 	eth_set_queue_rate_limit_t set_queue_rate_limit;
+	/** Get queue rate limit */
+	eth_get_queue_rate_limit_t get_queue_rate_limit;
 
 	/** Configure RSS hash protocols and hashing key */
 	rss_hash_update_t          rss_hash_update;
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 482befc209..6554cc1a21 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -908,6 +908,15 @@ RTE_TRACE_POINT(
 	rte_trace_point_emit_int(ret);
 )
 
+RTE_TRACE_POINT(
+	rte_eth_trace_get_queue_rate_limit,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_idx,
+		int ret),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_idx);
+	rte_trace_point_emit_int(ret);
+)
+
 RTE_TRACE_POINT(
 	rte_eth_trace_rx_avail_thresh_set,
 	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
diff --git a/lib/ethdev/ethdev_trace_points.c b/lib/ethdev/ethdev_trace_points.c
index 071c508327..0a28378a56 100644
--- a/lib/ethdev/ethdev_trace_points.c
+++ b/lib/ethdev/ethdev_trace_points.c
@@ -347,6 +347,9 @@ RTE_TRACE_POINT_REGISTER(rte_ethdev_trace_uc_all_hash_table_set,
 RTE_TRACE_POINT_REGISTER(rte_eth_trace_set_queue_rate_limit,
 	lib.ethdev.set_queue_rate_limit)
 
+RTE_TRACE_POINT_REGISTER(rte_eth_trace_get_queue_rate_limit,
+	lib.ethdev.get_queue_rate_limit)
+
 RTE_TRACE_POINT_REGISTER(rte_eth_trace_rx_avail_thresh_set,
 	lib.ethdev.rx_avail_thresh_set)
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 2edc7a362e..5e763e1855 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5694,6 +5694,41 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	return ret;
 }
 
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_get_queue_rate_limit, 26.07)
+int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
+					uint32_t *tx_rate)
+{
+	struct rte_eth_dev *dev;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (tx_rate == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Get queue rate limit:port %u: NULL tx_rate pointer",
+			port_id);
+		return -EINVAL;
+	}
+
+	if (queue_idx >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Get queue rate limit:port %u: invalid queue ID=%u",
+			port_id, queue_idx);
+		return -EINVAL;
+	}
+
+	if (dev->dev_ops->get_queue_rate_limit == NULL)
+		return -ENOTSUP;
+	ret = eth_err(port_id,
+		      dev->dev_ops->get_queue_rate_limit(dev, queue_idx,
+							 tx_rate));
+
+	rte_eth_trace_get_queue_rate_limit(port_id, queue_idx, ret);
+
+	return ret;
+}
+
 RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_rx_avail_thresh_set, 22.07)
 int rte_eth_rx_avail_thresh_set(uint16_t port_id, uint16_t queue_id,
 			       uint8_t avail_thresh)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 0d8e2d0236..e525217b77 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4817,6 +4817,30 @@ int rte_eth_dev_uc_all_hash_table_set(uint16_t port_id, uint8_t on);
 int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 			uint32_t tx_rate);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Get the rate limitation for a queue on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_idx
+ *   The queue ID.
+ * @param[out] tx_rate
+ *   A pointer to retrieve the Tx rate in Mbps.
+ *   0 means rate limiting is disabled.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support this feature.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
+			uint32_t *tx_rate);
+
 /**
  * Configuration of Receive Side Scaling hash computation of Ethernet device.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v4 09/10] net/mlx5: implement per-queue Tx rate limit getter
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
                       ` (7 preceding siblings ...)
  2026-03-22 13:46     ` [PATCH v4 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
@ 2026-03-22 13:46     ` Vincent Jardin
  2026-03-23 13:20       ` Slava Ovsiienko
  2026-03-22 13:46     ` [PATCH v4 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
                       ` (2 subsequent siblings)
  11 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Wire the mlx5 PMD to the new rte_eth_get_queue_rate_limit()
ethdev callback. The implementation reads the per-queue
rate_mbps tracking field from the txq_ctrl structure.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.c     |  2 ++
 drivers/net/mlx5/mlx5_tx.h  |  2 ++
 drivers/net/mlx5/mlx5_txq.c | 30 ++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7d08d7886b..f5784761f9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2652,6 +2652,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 	.get_restore_flags = mlx5_get_restore_flags,
 	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
+	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2746,6 +2747,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.get_restore_flags = mlx5_get_restore_flags,
 	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
+	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 975ff57acd..02feb9e6fd 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -224,6 +224,8 @@ int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
 int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 			      uint32_t tx_rate);
+int mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint32_t *tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index ce08363ca9..867ea4b994 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1481,6 +1481,36 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 }
 
+/**
+ * Get per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param[out] tx_rate
+ *   Pointer to store the TX rate in Mbps, 0 if rate limiting is disabled.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			  uint32_t *tx_rate)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *txq_ctrl;
+
+	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = container_of((*priv->txqs)[queue_idx],
+				struct mlx5_txq_ctrl, txq);
+	*tx_rate = txq_ctrl->rate_limit.rate_mbps;
+	return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v4 10/10] net/mlx5: add rate table capacity query API
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
                       ` (8 preceding siblings ...)
  2026-03-22 13:46     ` [PATCH v4 09/10] net/mlx5: implement per-queue Tx rate limit getter Vincent Jardin
@ 2026-03-22 13:46     ` Vincent Jardin
  2026-03-23 13:20       ` Slava Ovsiienko
  2026-03-23 23:09     ` [PATCH v4 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
  11 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 13:46 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Add rte_pmd_mlx5_pp_rate_table_query() to report the HW packet
pacing rate table size and how many entries are used by this port.

The total comes from the HCA QoS capability
packet_pacing_rate_table_size. The port_used count is derived by
collecting unique non-zero PP indices across this port's TX queues.

The rate table is a global shared HW resource: firmware, kernel,
other DPDK ports on the same device, and other application
instances may all consume entries. The port_used count is therefore
a lower bound of actual HW usage.

With shared PP allocation (flags=0), the kernel mlx5 driver reuses
a single rate table entry for all PP contexts with identical
parameters (rate, burst, packet size). Multiple queues configured
with the same rate share one pp_id, so port_used counts unique
entries, not the number of queues with rate limiting enabled.

Applications that need device-wide visibility should query all
ports on the same physical device and aggregate the results,
similar to how the kernel mlx5 driver tracks usage internally.

Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_tx.c      | 64 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/rte_pmd_mlx5.h | 44 +++++++++++++++++++++++
 2 files changed, 108 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 7d71782d33..615b792836 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -19,6 +19,7 @@
 
 #include <mlx5_prm.h>
 #include <mlx5_common.h>
+#include <mlx5_malloc.h>
 
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"
@@ -886,3 +887,66 @@ int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
 				     packet_pacing_rate_limit_index);
 	return 0;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_pp_rate_table_query, 26.07)
+int rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
+				     struct rte_pmd_mlx5_pp_rate_table_info *info)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	uint16_t used = 0;
+	uint16_t *seen;
+	unsigned int i;
+
+	if (info == NULL)
+		return -EINVAL;
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return -ENODEV;
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	if (!priv->sh->cdev->config.hca_attr.qos.packet_pacing) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	info->total = priv->sh->cdev->config.hca_attr.qos.packet_pacing_rate_table_size;
+	if (priv->txqs == NULL || priv->txqs_n == 0) {
+		info->port_used = 0;
+		return 0;
+	}
+	seen = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
+			   priv->txqs_n * sizeof(*seen), 0, SOCKET_ID_ANY);
+	if (seen == NULL)
+		return -ENOMEM;
+	/*
+	 * Count unique non-zero PP indices across this port's TX queues.
+	 * Note: the count reflects only queues on this port; other ports
+	 * sharing the same device may also consume rate table entries.
+	 */
+	for (i = 0; i < priv->txqs_n; i++) {
+		struct mlx5_txq_data *txq_data;
+		struct mlx5_txq_ctrl *txq_ctrl;
+		uint16_t pp_id;
+		uint16_t j;
+		bool dup;
+
+		if ((*priv->txqs)[i] == NULL)
+			continue;
+		txq_data = (*priv->txqs)[i];
+		txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
+		pp_id = txq_ctrl->rate_limit.pp_id;
+		if (pp_id == 0)
+			continue;
+		dup = false;
+		for (j = 0; j < used; j++) {
+			if (seen[j] == pp_id) {
+				dup = true;
+				break;
+			}
+		}
+		if (!dup)
+			seen[used++] = pp_id;
+	}
+	mlx5_free(seen);
+	info->port_used = used;
+	return 0;
+}
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 698d7d2032..621d8c2b15 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -450,6 +450,50 @@ int
 rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
 				  struct rte_pmd_mlx5_txq_rate_limit_info *info);
 
+/**
+ * Packet pacing rate table capacity information.
+ */
+struct rte_pmd_mlx5_pp_rate_table_info {
+	uint16_t total;		/**< Total HW rate table entries. */
+	uint16_t port_used;	/**< Entries used by this port's TX queues. */
+};
+
+/**
+ * Query packet pacing rate table capacity.
+ *
+ * The ``port_used`` count reflects only unique PP indices allocated
+ * by the queried port's TX queues. It is a lower bound of actual HW
+ * usage because the rate table is a global shared resource:
+ * - Other DPDK ports on the same physical device may hold entries.
+ * - The kernel mlx5 driver and firmware may also consume entries.
+ * - Multiple DPDK application instances may share the device.
+ *
+ * When multiple queues on the same port are configured with identical
+ * rate parameters, the kernel shares a single rate table entry across
+ * them (with flags=0 allocation), so ``port_used`` counts unique
+ * entries, not the number of queues with rate limiting enabled.
+ *
+ * Applications that need device-wide visibility should query all
+ * ports on the same physical device and aggregate the results,
+ * similar to how the kernel mlx5 driver tracks usage internally.
+ *
+ * @param[in] port_id
+ *   Port ID.
+ * @param[out] info
+ *   Rate table capacity information.
+ *
+ * @return
+ *   0 on success, negative errno on failure:
+ *   - -ENODEV: invalid port_id.
+ *   - -EINVAL: info is NULL.
+ *   - -ENOTSUP: packet pacing not supported.
+ *   - -ENOMEM: allocation failure.
+ */
+__rte_experimental
+int
+rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
+				 struct rte_pmd_mlx5_pp_rate_table_info *info);
+
 /** Type of mlx5 driver event for which custom callback is called. */
 enum rte_pmd_mlx5_driver_event_cb_type {
 	/** Called after HW Rx queue is created. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 07/9] net/mlx5: add testpmd command to query per-queue rate limit
  2026-03-20 15:38     ` Slava Ovsiienko
@ 2026-03-22 14:02       ` Vincent Jardin
  0 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 14:02 UTC (permalink / raw)
  To: Slava Ovsiienko
  Cc: dev@dpdk.org, Raslan Darawsheh,
	NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org

Hi Slava,

Le 20/03/26 15:38, Slava Ovsiienko a écrit :
> BTW, we have mlx5_tx_burst_mode_get(), all information about tx rate limit could be shown there.
> (I do not insist on, JFYI).

Thank you for the suggestion. But I think keeping both makes sense for different use cases:

- mlx5_tx_burst_mode_get(), it returns a human-readable string describing
  the burst mode. Embedding rate limit info there would mix concerns
  and the information would only be available as unstructured text.

- rte_pmd_mlx5_txq_rate_limit_query() returns some structured data
  (rate_mbps, pp_index, fw_pp_index) that the applications and scripts
  can consume programmatically. The readback via mlx5_devx_cmd_query_sq() is also
  useful for verifying that the firmware actually applied the PP index to the SQ.

Best regards,
  Vincent

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing
  2026-03-16 16:04   ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
@ 2026-03-22 14:16     ` Vincent Jardin
  0 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-22 14:16 UTC (permalink / raw)
  To: dev; +Cc: stephen

Hi Stephen,

> mlx5_get_queue_rate_limit() doesn't bounds-check queue_idx before
> array access (the ethdev layer does, but the set path checks it
> too - inconsistent).

the fix in v4 goes in another direction: the redundant check was removed
from the setter as well, not added to the getter.

Both rte_eth_set_queue_rate_limit() and rte_eth_get_queue_rate_limit()
in the ethdev layer validate queue_idx >= nb_tx_queues before calling
the driver op. The driver callbacks can therefore assume this
precondition is met.

In v4, neither mlx5_set_queue_rate_limit() nor
mlx5_get_queue_rate_limit() checks queue_idx bounds, ethdev is the
single validation point for both.

Best regards,
  Vincent

^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v4 03/10] common/mlx5: extend SQ modify to support rate limit update
  2026-03-22 13:46     ` [PATCH v4 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
@ 2026-03-23 12:59       ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-23 12:59 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org,
	aman.deep.singh@intel.com

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Sunday, March 22, 2026 3:46 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org;
> aman.deep.singh@intel.com; Vincent Jardin <vjardin@free.fr>
> Subject: [PATCH v4 03/10] common/mlx5: extend SQ modify to support rate
> limit update
> 
> Add rl_update and packet_pacing_rate_limit_index fields to
> mlx5_devx_modify_sq_attr. When rl_update is set, the modify SQ command
> sets modify_bitmask bit 0 and writes the PP index into the SQ context, allowing
> dynamic rate changes on a live RDY SQ without teardown.
> 
> modify_sq_in.modify_bitmask[0x40] bit 0 controls the
> packet_pacing_rate_limit_index.
> 
> Supported hardware:
> - ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
> - ConnectX-7/8: same SQ context field, also supports wait-on-time
> - BlueField-2/3: same modify_sq command support
> 
> Not supported:
> - ConnectX-5: supports packet_pacing but only at SQ creation time,
>   dynamic modify_bitmask update may not be supported on all FW
> - ConnectX-4 Lx and earlier: no packet_pacing support
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/common/mlx5/mlx5_devx_cmds.c | 8 ++++++++
> drivers/common/mlx5/mlx5_devx_cmds.h | 3 +++
>  drivers/common/mlx5/mlx5_prm.h       | 7 +++++++
>  3 files changed, 18 insertions(+)
> 
> diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c
> b/drivers/common/mlx5/mlx5_devx_cmds.c
> index 8f53303fa7..102f84fd5c 100644
> --- a/drivers/common/mlx5/mlx5_devx_cmds.c
> +++ b/drivers/common/mlx5/mlx5_devx_cmds.c
> @@ -2129,6 +2129,14 @@ mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj
> *sq,
>  	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
>  	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
>  	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr-
> >hairpin_peer_vhca);
> +	if (sq_attr->rl_update) {
> +		uint64_t msk = MLX5_GET64(modify_sq_in, in,
> modify_bitmask);
> +
> +		msk |=
> MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_IND
> EX;
> +		MLX5_SET64(modify_sq_in, in, modify_bitmask, msk);
> +		MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
> +			 sq_attr->packet_pacing_rate_limit_index);
> +	}
>  	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
>  					 out, sizeof(out));
>  	if (ret) {
> diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h
> b/drivers/common/mlx5/mlx5_devx_cmds.h
> index 930ae2c072..82d949972b 100644
> --- a/drivers/common/mlx5/mlx5_devx_cmds.h
> +++ b/drivers/common/mlx5/mlx5_devx_cmds.h
> @@ -519,6 +519,9 @@ struct mlx5_devx_modify_sq_attr {
>  	uint32_t state:4;
>  	uint32_t hairpin_peer_rq:24;
>  	uint32_t hairpin_peer_vhca:16;
> +	uint32_t rl_update:1;
> +	/* Set to update packet_pacing_rate_limit_index on a live SQ. */
> +	uint32_t packet_pacing_rate_limit_index:16;
>  };
> 
> 
> diff --git a/drivers/common/mlx5/mlx5_prm.h
> b/drivers/common/mlx5/mlx5_prm.h index ba33336e58..597d06362f 100644
> --- a/drivers/common/mlx5/mlx5_prm.h
> +++ b/drivers/common/mlx5/mlx5_prm.h
> @@ -2985,6 +2985,7 @@ struct mlx5_ifc_create_tis_in_bits {
>  	struct mlx5_ifc_tisc_bits ctx;
>  };
> 
> +/* Bits for modify_rq_in.modify_bitmask (Receive Queue). */
>  enum {
>  	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
>  	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1, @@ -
> 2992,6 +2993,12 @@ enum {
>  	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID =
> 1ULL << 3,  };
> 
> +/* Bits for modify_sq_in.modify_bitmask (Send Queue). */ enum {
> +
> 	MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LI
> MIT_INDEX =
> +		1ULL << 0,
> +};
> +
>  struct mlx5_ifc_modify_rq_in_bits {
>  	u8 opcode[0x10];
>  	u8 uid[0x10];
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v4 04/10] net/mlx5: add per-queue packet pacing infrastructure
  2026-03-22 13:46     ` [PATCH v4 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
@ 2026-03-23 13:00       ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-23 13:00 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org,
	aman.deep.singh@intel.com

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Sunday, March 22, 2026 3:46 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org;
> aman.deep.singh@intel.com; Vincent Jardin <vjardin@free.fr>
> Subject: [PATCH v4 04/10] net/mlx5: add per-queue packet pacing
> infrastructure
> 
> Add mlx5_txq_rate_limit structure and alloc/free helpers for per-queue data-
> rate packet pacing. Each Tx queue can now hold its own PP (Packet Pacing)
> context allocated via mlx5dv_pp_alloc() with MLX5_DATA_RATE mode.
> 
> mlx5_txq_alloc_pp_rate_limit() converts Mbps to kbps for the PRM rate_limit
> field and allocates a PP context from the HW rate table.
> mlx5_txq_free_pp_rate_limit() releases it.
> 
> PP allocation uses shared mode (flags=0). Each dv_alloc_pp() call returns a
> distinct PP handle (needed for per-queue dv_free_pp() cleanup), but the kernel
> mlx5 driver internally maps identical rate parameters to the same HW rate table
> entry (same pp_id) with internal refcounting. This avoids exhausting the rate
> table (typically 128 entries on ConnectX-6 Dx) when many queues share the
> same rate.
> 
> The existing Clock Queue path (sh->txpp.pp / sh->txpp.pp_id) is untouched — it
> uses MLX5_WQE_RATE for per-packet scheduling with a dedicated index, while
> per-queue rate limiting uses MLX5_DATA_RATE.
> 
> PP index cleanup is added to mlx5_txq_release() to prevent leaks when queues
> are destroyed.
> 
> Supported hardware:
> - ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
> - ConnectX-7/8: same mechanism, plus wait-on-time coexistence
> - BlueField-2/3: same PP allocation support
> 
> Not supported:
> - ConnectX-5: packet_pacing exists but MLX5_DATA_RATE mode may
>   not be available on all firmware versions
> - ConnectX-4 Lx and earlier: no packet_pacing capability
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/net/mlx5/mlx5.h      | 11 +++++
>  drivers/net/mlx5/mlx5_tx.h   |  1 +
>  drivers/net/mlx5/mlx5_txpp.c | 78
> ++++++++++++++++++++++++++++++++++++
>  drivers/net/mlx5/mlx5_txq.c  |  1 +
>  4 files changed, 91 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> 4da184eb47..33628d7987 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -1297,6 +1297,13 @@ struct mlx5_txpp_ts {
>  	RTE_ATOMIC(uint64_t) ts;
>  };
> 
> +/* Per-queue rate limit tracking. */
> +struct mlx5_txq_rate_limit {
> +	void *pp;		/* Packet pacing context from dv_alloc_pp. */
> +	uint16_t pp_id;		/* Packet pacing index. */
> +	uint32_t rate_mbps;	/* Current rate in Mbps, 0 = disabled. */
> +};
> +
>  /* Tx packet pacing structure. */
>  struct mlx5_dev_txpp {
>  	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */ @@ -
> 2630,6 +2637,10 @@ int mlx5_txpp_xstats_get_names(struct rte_eth_dev
> *dev,  void mlx5_txpp_interrupt_handler(void *cb_arg);  int
> mlx5_txpp_map_hca_bar(struct rte_eth_dev *dev);  void
> mlx5_txpp_unmap_hca_bar(struct rte_eth_dev *dev);
> +int mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
> +				 struct mlx5_txq_rate_limit *rate_limit,
> +				 uint32_t rate_mbps);
> +void mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit
> +*rate_limit);
> 
>  /* mlx5_rxtx.c */
> 
> diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h index
> 0134a2e003..51f330454a 100644
> --- a/drivers/net/mlx5/mlx5_tx.h
> +++ b/drivers/net/mlx5/mlx5_tx.h
> @@ -192,6 +192,7 @@ struct mlx5_txq_ctrl {
>  	uint16_t dump_file_n; /* Number of dump files. */
>  	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
>  	uint32_t hairpin_status; /* Hairpin binding status. */
> +	struct mlx5_txq_rate_limit rate_limit; /* Per-queue rate limit. */
>  	struct mlx5_txq_data txq; /* Data path structure. */
>  	/* Must be the last field in the structure, contains elts[]. */  }; diff --git
> a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c index
> 0e99b58bde..e34e996e9b 100644
> --- a/drivers/net/mlx5/mlx5_txpp.c
> +++ b/drivers/net/mlx5/mlx5_txpp.c
> @@ -128,6 +128,84 @@ mlx5_txpp_alloc_pp_index(struct
> mlx5_dev_ctx_shared *sh)  #endif  }
> 
> +/* Free a per-queue packet pacing index. */ void
> +mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rate_limit) {
> +#ifdef HAVE_MLX5DV_PP_ALLOC
> +	if (rate_limit->pp) {
> +		mlx5_glue->dv_free_pp(rate_limit->pp);
> +		rate_limit->pp = NULL;
> +		rate_limit->pp_id = 0;
> +		rate_limit->rate_mbps = 0;
> +	}
> +#else
> +	RTE_SET_USED(rate_limit);
> +#endif
> +}
> +
> +/* Allocate a per-queue packet pacing index for data-rate limiting. */
> +int mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
> +			     struct mlx5_txq_rate_limit *rate_limit,
> +			     uint32_t rate_mbps)
> +{
> +#ifdef HAVE_MLX5DV_PP_ALLOC
> +	uint32_t pp[MLX5_ST_SZ_DW(set_pp_rate_limit_context)];
> +	uint64_t rate_kbps;
> +	struct mlx5_hca_qos_attr *qos = &sh->cdev->config.hca_attr.qos;
> +
> +	if (rate_mbps == 0) {
> +		DRV_LOG(ERR, "Rate must be greater than zero.");
> +		rte_errno = EINVAL;
> +		return -EINVAL;
> +	}
> +	rate_kbps = (uint64_t)rate_mbps * 1000;
> +	if (qos->packet_pacing_min_rate && rate_kbps < qos-
> >packet_pacing_min_rate) {
> +		DRV_LOG(ERR, "Rate %u Mbps below HW minimum (%u
> kbps).",
> +			rate_mbps, qos->packet_pacing_min_rate);
> +		rte_errno = ERANGE;
> +		return -ERANGE;
> +	}
> +	if (qos->packet_pacing_max_rate && rate_kbps > qos-
> >packet_pacing_max_rate) {
> +		DRV_LOG(ERR, "Rate %u Mbps exceeds HW maximum (%u
> kbps).",
> +			rate_mbps, qos->packet_pacing_max_rate);
> +		rte_errno = ERANGE;
> +		return -ERANGE;
> +	}
> +	memset(&pp, 0, sizeof(pp));
> +	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit,
> (uint32_t)rate_kbps);
> +	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode,
> MLX5_DATA_RATE);
> +	rate_limit->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp),
> +						 &pp, 0);
> +	if (rate_limit->pp == NULL) {
> +		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
> +			rate_mbps);
> +		rte_errno = errno;
> +		return -errno;
> +	}
> +	rate_limit->pp_id = ((struct mlx5dv_pp *)rate_limit->pp)->index;
> +	if (!rate_limit->pp_id) {
> +		DRV_LOG(ERR, "Zero PP index allocated for rate %u Mbps.",
> +			rate_mbps);
> +		mlx5_txq_free_pp_rate_limit(rate_limit);
> +		rte_errno = ENOTSUP;
> +		return -ENOTSUP;
> +	}
> +	rate_limit->rate_mbps = rate_mbps;
> +	DRV_LOG(DEBUG, "Allocated PP index %u for rate %u Mbps.",
> +		rate_limit->pp_id, rate_mbps);
> +	return 0;
> +#else
> +	RTE_SET_USED(sh);
> +	RTE_SET_USED(rate_limit);
> +	RTE_SET_USED(rate_mbps);
> +	DRV_LOG(ERR, "Per-queue rate limit requires rdma-core PP support.");
> +	rte_errno = ENOTSUP;
> +	return -ENOTSUP;
> +#endif
> +}
> +
>  static void
>  mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)  { diff --git
> a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index
> 9275efb58e..3356c89758 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -1344,6 +1344,7 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t
> idx)
>  		mlx5_free(txq_ctrl->obj);
>  		txq_ctrl->obj = NULL;
>  	}
> +	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
>  	if (!txq_ctrl->is_hairpin) {
>  		if (txq_ctrl->txq.fcqs) {
>  			mlx5_free(txq_ctrl->txq.fcqs);
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v4 05/10] net/mlx5: support per-queue rate limiting
  2026-03-22 13:46     ` [PATCH v4 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
@ 2026-03-23 13:17       ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-23 13:17 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org,
	aman.deep.singh@intel.com

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Sunday, March 22, 2026 3:46 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org;
> aman.deep.singh@intel.com; Vincent Jardin <vjardin@free.fr>
> Subject: [PATCH v4 05/10] net/mlx5: support per-queue rate limiting
> 
> Wire rte_eth_set_queue_rate_limit() to the mlx5 PMD. The callback allocates a
> per-queue PP index with the requested data rate, then modifies the live SQ via
> modify_bitmask bit 0 to apply the new packet_pacing_rate_limit_index — no
> queue teardown required.
> 
> Setting tx_rate=0 clears the PP index on the SQ and frees it.
> 
> Capability check uses hca_attr.qos.packet_pacing directly (not dev_cap.txpp_en
> which requires Clock Queue prerequisites). This allows per-queue rate limiting
> without the tx_pp devarg.
> 
> The callback rejects hairpin queues and queues whose SQ is not yet created.
> 
> testpmd usage (no testpmd changes needed):
>   set port 0 queue 0 rate 1000
>   set port 0 queue 1 rate 5000
>   set port 0 queue 0 rate 0     # disable
> 
> Supported hardware:
> - ConnectX-6 Dx: full support, per-SQ rate via HW rate table
> - ConnectX-7/8: full support, coexists with wait-on-time scheduling
> - BlueField-2/3: full support as DPU rep ports
> 
> Not supported:
> - ConnectX-5: packet_pacing exists but dynamic SQ modify may not
>   work on all firmware versions
> - ConnectX-4 Lx and earlier: no packet_pacing capability
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  doc/guides/nics/features/mlx5.ini |   1 +
>  doc/guides/nics/mlx5.rst          |  54 ++++++++++++++
>  drivers/net/mlx5/mlx5.c           |   2 +
>  drivers/net/mlx5/mlx5_tx.h        |   2 +
>  drivers/net/mlx5/mlx5_txq.c       | 118 ++++++++++++++++++++++++++++++
>  5 files changed, 177 insertions(+)
> 
> diff --git a/doc/guides/nics/features/mlx5.ini
> b/doc/guides/nics/features/mlx5.ini
> index 4f9c4c309b..3b3eda28b8 100644
> --- a/doc/guides/nics/features/mlx5.ini
> +++ b/doc/guides/nics/features/mlx5.ini
> @@ -30,6 +30,7 @@ Inner RSS            = Y
>  SR-IOV               = Y
>  VLAN filter          = Y
>  Flow control         = Y
> +Rate limitation      = Y
>  CRC offload          = Y
>  VLAN offload         = Y
>  L3 checksum offload  = Y
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> 6bb8c07353..c72a60f084 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -580,6 +580,60 @@ for an additional list of options shared with other
> mlx5 drivers.
>    (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
>    The default value is zero.
> 
> +.. _mlx5_per_queue_rate_limit:
> +
> +Per-Queue Tx Rate Limiting
> +~~~~~~~~~~~~~~~~~~~~~~~~~~
> +
> +The mlx5 PMD supports per-queue Tx rate limiting via the standard
> +ethdev API ``rte_eth_set_queue_rate_limit()`` and
> ``rte_eth_get_queue_rate_limit()``.
> +
> +This feature uses the hardware packet pacing mechanism to enforce a
> +data rate on individual TX queues without tearing down the queue. The
> +rate is specified in Mbps.
> +
> +**Requirements:**
> +
> +- ConnectX-6 Dx or later with ``packet_pacing`` HCA capability.
> +- The DevX path must be used (default). The legacy Verbs path
> +  (``dv_flow_en=0``) does not support dynamic SQ modification and
> +  returns ``-EINVAL``.
> +- The queue must be started (SQ in RDY state) before setting a rate.
> +
> +**Supported hardware:**
> +
> +- ConnectX-6 Dx: per-SQ rate via HW rate table.
> +- ConnectX-7/8: full support, coexists with wait-on-time scheduling.
> +- BlueField-2/3: full support as DPU rep ports.
> +
> +**Not supported:**
> +
> +- ConnectX-5: ``packet_pacing`` exists but dynamic SQ modify may not
> +  work on all firmware versions.
> +- ConnectX-4 Lx and earlier: no ``packet_pacing`` capability.
> +
> +**Rate table sharing:**
> +
> +The hardware rate table has a limited number of entries (typically 128
> +on
> +ConnectX-6 Dx). When multiple queues are configured with identical rate
> +parameters, the kernel mlx5 driver shares a single rate table entry
> +across them. Each queue still has its own independent SQ and enforces
> +the rate independently — queues are never merged. The rate cap applies per-
> queue:
> +if two queues share the same 1000 Mbps entry, each can send up to
> +1000 Mbps independently, they do not share a combined budget.
> +
> +This sharing is transparent and only affects table capacity: 128
> +entries can serve thousands of queues as long as many use the same
> +rate. Queues with different rates consume separate entries.
> +
> +**Usage with testpmd:**
> +
> +.. code-block:: console
> +
> +   testpmd> set port 0 queue 0 rate 1000
> +   testpmd> show port 0 queue 0 rate
> +   testpmd> set port 0 queue 0 rate 0
> +
>  - ``tx_vec_en`` parameter [int]
> 
>    A nonzero value enables Tx vector with ConnectX-5 NICs and above.
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> e795948187..e718f0fa8c 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -2621,6 +2621,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
>  	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
>  	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
>  	.get_restore_flags = mlx5_get_restore_flags,
> +	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
>  };
> 
>  /* Available operations from secondary process. */ @@ -2714,6 +2715,7 @@
> const struct eth_dev_ops mlx5_dev_ops_isolate = {
>  	.count_aggr_ports = mlx5_count_aggr_ports,
>  	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
>  	.get_restore_flags = mlx5_get_restore_flags,
> +	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
>  };
> 
>  /**
> diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h index
> 51f330454a..975ff57acd 100644
> --- a/drivers/net/mlx5/mlx5_tx.h
> +++ b/drivers/net/mlx5/mlx5_tx.h
> @@ -222,6 +222,8 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev
> *dev, uint16_t idx);  int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t
> idx);  int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);  int
> mlx5_txq_verify(struct rte_eth_dev *dev);
> +int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
> +			      uint32_t tx_rate);
>  int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);  void
> mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);  void
> mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl); diff --git
> a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index
> 3356c89758..ce08363ca9 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -1363,6 +1363,124 @@ mlx5_txq_release(struct rte_eth_dev *dev,
> uint16_t idx)
>  	return 0;
>  }
> 
> +/**
> + * Set per-queue packet pacing rate limit.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param queue_idx
> + *   TX queue index.
> + * @param tx_rate
> + *   TX rate in Mbps, 0 to disable rate limiting.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
> +			  uint32_t tx_rate)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_dev_ctx_shared *sh = priv->sh;
> +	struct mlx5_txq_ctrl *txq_ctrl;
> +	struct mlx5_devx_obj *sq_devx;
> +	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
> +	struct mlx5_txq_rate_limit new_rate_limit = { 0 };
> +	int ret;
> +
> +	if (!sh->cdev->config.hca_attr.qos.packet_pacing) {
> +		DRV_LOG(ERR, "Port %u packet pacing not supported.",
> +			dev->data->port_id);
> +		rte_errno = ENOTSUP;
> +		return -rte_errno;
> +	}
> +	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
> +		DRV_LOG(ERR, "Port %u Tx queue %u not configured.",
> +			dev->data->port_id, queue_idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	txq_ctrl = container_of((*priv->txqs)[queue_idx],
> +				struct mlx5_txq_ctrl, txq);
> +	if (txq_ctrl->is_hairpin) {
> +		DRV_LOG(ERR, "Port %u Tx queue %u is hairpin.",
> +			dev->data->port_id, queue_idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	if (txq_ctrl->obj == NULL) {
> +		DRV_LOG(ERR, "Port %u Tx queue %u not initialized.",
> +			dev->data->port_id, queue_idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	/*
> +	 * For non-hairpin queues the SQ DevX object lives in
> +	 * obj->sq_obj.sq (used by DevX/HWS mode), while hairpin
> +	 * queues use obj->sq directly. These are different members
> +	 * of a union inside mlx5_txq_obj.
> +	 */
> +	sq_devx = txq_ctrl->obj->sq_obj.sq;
> +	if (sq_devx == NULL) {
> +		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
> +			dev->data->port_id, queue_idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	if (dev->data->tx_queue_state[queue_idx] !=
> +	    RTE_ETH_QUEUE_STATE_STARTED) {
> +		DRV_LOG(ERR,
> +			"Port %u Tx queue %u is not started, stop traffic before
> setting rate.",
> +			dev->data->port_id, queue_idx);
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	if (tx_rate == 0) {
> +		/* Disable rate limiting. */
> +		if (txq_ctrl->rate_limit.pp_id == 0)
> +			return 0; /* Already disabled. */
> +		sq_attr.sq_state = MLX5_SQC_STATE_RDY;
> +		sq_attr.state = MLX5_SQC_STATE_RDY;
> +		sq_attr.rl_update = 1;
> +		sq_attr.packet_pacing_rate_limit_index = 0;
> +		ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
> +		if (ret) {
> +			DRV_LOG(ERR,
> +				"Port %u Tx queue %u failed to clear rate.",
> +				dev->data->port_id, queue_idx);
> +			rte_errno = -ret;
> +			return ret;
> +		}
> +		mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
> +		DRV_LOG(DEBUG, "Port %u Tx queue %u rate limit disabled.",
> +			dev->data->port_id, queue_idx);
> +		return 0;
> +	}
> +	/* Allocate a new PP index for the requested rate into a temp. */
> +	ret = mlx5_txq_alloc_pp_rate_limit(sh, &new_rate_limit, tx_rate);
> +	if (ret)
> +		return ret;
> +	/* Modify live SQ to use the new PP index. */
> +	sq_attr.sq_state = MLX5_SQC_STATE_RDY;
> +	sq_attr.state = MLX5_SQC_STATE_RDY;
> +	sq_attr.rl_update = 1;
> +	sq_attr.packet_pacing_rate_limit_index = new_rate_limit.pp_id;
> +	ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
> +	if (ret) {
> +		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u
> Mbps.",
> +			dev->data->port_id, queue_idx, tx_rate);
> +		mlx5_txq_free_pp_rate_limit(&new_rate_limit);
> +		rte_errno = -ret;
> +		return ret;
> +	}
> +	/* SQ updated — release old PP context, install new one. */
> +	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
> +	txq_ctrl->rate_limit = new_rate_limit;
> +	DRV_LOG(DEBUG, "Port %u Tx queue %u rate set to %u Mbps (PP idx
> %u).",
> +		dev->data->port_id, queue_idx, tx_rate, txq_ctrl-
> >rate_limit.pp_id);
> +	return 0;
> +}
> +
>  /**
>   * Verify if the queue can be released.
>   *
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v4 06/10] net/mlx5: add burst pacing devargs
  2026-03-22 13:46     ` [PATCH v4 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
@ 2026-03-23 13:18       ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-23 13:18 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org,
	aman.deep.singh@intel.com

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Sunday, March 22, 2026 3:46 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org;
> aman.deep.singh@intel.com; Vincent Jardin <vjardin@free.fr>
> Subject: [PATCH v4 06/10] net/mlx5: add burst pacing devargs
> 
> Expose burst_upper_bound and typical_packet_size from the PRM
> set_pp_rate_limit_context as devargs:
> - tx_burst_bound=<bytes>: max burst before rate evaluation kicks in
> - tx_typical_pkt_sz=<bytes>: typical packet size for accuracy
> 
> These parameters apply to per-queue rate limiting
> (rte_eth_set_queue_rate_limit) only. The Clock Queue path (tx_pp devarg) uses
> WQE rate pacing and does not need these parameters.
> 
> Values are validated against HCA capabilities (packet_pacing_burst_bound and
> packet_pacing_typical_size).
> If the HW does not support them, a warning is logged and the value is silently
> zeroed. Test mode still overrides both values.
> 
> Shared context mismatch checks ensure all ports on the same device use the
> same burst parameters.
> 
> Supported hardware:
> - ConnectX-6 Dx: burst_upper_bound and typical_packet_size
>   reported via packet_pacing_burst_bound / packet_pacing_typical_size
>   QoS capability bits
> - ConnectX-7/8: full support for both parameters
> - BlueField-2/3: same capabilities as host-side ConnectX
> 
> Not supported:
> - ConnectX-5: may not report burst_bound or typical_size caps
> - ConnectX-4 Lx and earlier: no packet_pacing at all
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  doc/guides/nics/mlx5.rst     | 17 +++++++++++++++
>  drivers/net/mlx5/mlx5.c      | 42 ++++++++++++++++++++++++++++++++++++
>  drivers/net/mlx5/mlx5.h      |  2 ++
>  drivers/net/mlx5/mlx5_txpp.c |  6 ++++++
>  4 files changed, 67 insertions(+)
> 
> diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst index
> c72a60f084..d0b403dd5c 100644
> --- a/doc/guides/nics/mlx5.rst
> +++ b/doc/guides/nics/mlx5.rst
> @@ -580,6 +580,23 @@ for an additional list of options shared with other
> mlx5 drivers.
>    (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
>    The default value is zero.
> 
> +- ``tx_burst_bound`` parameter [int]
> +
> +  Specifies the burst upper bound in bytes for packet pacing rate evaluation.
> +  When set, the hardware considers this burst size when enforcing the
> + configured  rate limit. Only effective when the HCA reports
> + ``packet_pacing_burst_bound``  capability. Applies to per-queue rate
> + limiting  (``rte_eth_set_queue_rate_limit()``). The Clock Queue path
> + (``tx_pp``)  uses WQE rate pacing and does not use this parameter.
> +  The default value is zero (hardware default).
> +
> +- ``tx_typical_pkt_sz`` parameter [int]
> +
> +  Specifies the typical packet size in bytes for packet pacing rate
> + accuracy  improvement. Only effective when the HCA reports
> + ``packet_pacing_typical_size`` capability. Applies to per-queue rate
> + limiting only. The default value is zero (hardware default).
> +
>  .. _mlx5_per_queue_rate_limit:
> 
>  Per-Queue Tx Rate Limiting
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> e718f0fa8c..7d08d7886b 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -119,6 +119,18 @@
>   */
>  #define MLX5_TX_SKEW "tx_skew"
> 
> +/*
> + * Device parameter to specify burst upper bound in bytes
> + * for packet pacing rate evaluation.
> + */
> +#define MLX5_TX_BURST_BOUND "tx_burst_bound"
> +
> +/*
> + * Device parameter to specify typical packet size in bytes
> + * for packet pacing rate accuracy improvement.
> + */
> +#define MLX5_TX_TYPICAL_PKT_SZ "tx_typical_pkt_sz"
> +
>  /*
>   * Device parameter to enable hardware Tx vector.
>   * Deprecated, ignored (no vectorized Tx routines anymore).
> @@ -1407,6 +1419,10 @@ mlx5_dev_args_check_handler(const char *key,
> const char *val, void *opaque)
>  		config->tx_pp = tmp;
>  	} else if (strcmp(MLX5_TX_SKEW, key) == 0) {
>  		config->tx_skew = tmp;
> +	} else if (strcmp(MLX5_TX_BURST_BOUND, key) == 0) {
> +		config->tx_burst_bound = tmp;
> +	} else if (strcmp(MLX5_TX_TYPICAL_PKT_SZ, key) == 0) {
> +		config->tx_typical_pkt_sz = tmp;
>  	} else if (strcmp(MLX5_L3_VXLAN_EN, key) == 0) {
>  		config->l3_vxlan_en = !!tmp;
>  	} else if (strcmp(MLX5_VF_NL_EN, key) == 0) { @@ -1481,8 +1497,10
> @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
>  				struct mlx5_sh_config *config)
>  {
>  	const char **params = (const char *[]){
> +		MLX5_TX_BURST_BOUND,
>  		MLX5_TX_PP,
>  		MLX5_TX_SKEW,
> +		MLX5_TX_TYPICAL_PKT_SZ,
>  		MLX5_L3_VXLAN_EN,
>  		MLX5_VF_NL_EN,
>  		MLX5_DV_ESW_EN,
> @@ -1557,6 +1575,18 @@ mlx5_shared_dev_ctx_args_config(struct
> mlx5_dev_ctx_shared *sh,
>  		DRV_LOG(WARNING,
>  			"\"tx_skew\" doesn't affect without \"tx_pp\".");
>  	}
> +	if (config->tx_burst_bound &&
> +	    !sh->cdev->config.hca_attr.qos.packet_pacing_burst_bound) {
> +		DRV_LOG(WARNING,
> +			"HW does not support burst_upper_bound,
> ignoring.");
> +		config->tx_burst_bound = 0;
> +	}
> +	if (config->tx_typical_pkt_sz &&
> +	    !sh->cdev->config.hca_attr.qos.packet_pacing_typical_size) {
> +		DRV_LOG(WARNING,
> +			"HW does not support typical_packet_size, ignoring.");
> +		config->tx_typical_pkt_sz = 0;
> +	}
>  	/* Check for LRO support. */
>  	if (mlx5_devx_obj_ops_en(sh) && sh->cdev->config.hca_attr.lro_cap) {
>  		/* TBD check tunnel lro caps. */
> @@ -3191,6 +3221,18 @@ mlx5_probe_again_args_validate(struct
> mlx5_common_device *cdev,
>  			sh->ibdev_name);
>  		goto error;
>  	}
> +	if (sh->config.tx_burst_bound != config->tx_burst_bound) {
> +		DRV_LOG(ERR, "\"tx_burst_bound\" "
> +			"configuration mismatch for shared %s context.",
> +			sh->ibdev_name);
> +		goto error;
> +	}
> +	if (sh->config.tx_typical_pkt_sz != config->tx_typical_pkt_sz) {
> +		DRV_LOG(ERR, "\"tx_typical_pkt_sz\" "
> +			"configuration mismatch for shared %s context.",
> +			sh->ibdev_name);
> +		goto error;
> +	}
>  	if (sh->config.txq_mem_algn != config->txq_mem_algn) {
>  		DRV_LOG(ERR, "\"TxQ memory alignment\" "
>  			"configuration mismatch for shared %s context. %u -
> %u", diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h index
> 33628d7987..5ae01ec491 100644
> --- a/drivers/net/mlx5/mlx5.h
> +++ b/drivers/net/mlx5/mlx5.h
> @@ -383,6 +383,8 @@ struct mlx5_port_config {  struct mlx5_sh_config {
>  	int tx_pp; /* Timestamp scheduling granularity in nanoseconds. */
>  	int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
> +	uint32_t tx_burst_bound; /* Burst upper bound in bytes, 0 = default. */
> +	uint32_t tx_typical_pkt_sz; /* Typical packet size in bytes, 0 =
> +default. */
>  	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
>  	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
>  	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */ diff --
> git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c index
> e34e996e9b..707ef9d111 100644
> --- a/drivers/net/mlx5/mlx5_txpp.c
> +++ b/drivers/net/mlx5/mlx5_txpp.c
> @@ -176,6 +176,12 @@ mlx5_txq_alloc_pp_rate_limit(struct
> mlx5_dev_ctx_shared *sh,
>  	memset(&pp, 0, sizeof(pp));
>  	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit,
> (uint32_t)rate_kbps);
>  	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode,
> MLX5_DATA_RATE);
> +	if (sh->config.tx_burst_bound)
> +		MLX5_SET(set_pp_rate_limit_context, &pp,
> +			 burst_upper_bound, sh->config.tx_burst_bound);
> +	if (sh->config.tx_typical_pkt_sz)
> +		MLX5_SET(set_pp_rate_limit_context, &pp,
> +			 typical_packet_size, sh->config.tx_typical_pkt_sz);
>  	rate_limit->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp),
>  						 &pp, 0);
>  	if (rate_limit->pp == NULL) {
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v4 07/10] net/mlx5: add testpmd command to query per-queue rate limit
  2026-03-22 13:46     ` [PATCH v4 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
@ 2026-03-23 13:19       ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-23 13:19 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org,
	aman.deep.singh@intel.com

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Sunday, March 22, 2026 3:46 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org;
> aman.deep.singh@intel.com; Vincent Jardin <vjardin@free.fr>
> Subject: [PATCH v4 07/10] net/mlx5: add testpmd command to query per-
> queue rate limit
> 
> Add a new testpmd command to display the per-queue packet pacing rate limit
> state, including the PP index from both driver state and FW SQ context
> readback:
> 
>   testpmd> mlx5 port <port_id> txq <queue_id> rate show
> 
> This helps verify that the FW actually applied the PP index to the SQ after setting
> a per-queue rate limit.
> 
> Expose a new PMD API rte_pmd_mlx5_txq_rate_limit_query() that queries
> txq_ctrl->rate_limit for driver state and
> mlx5_devx_cmd_query_sq() for the FW
> packet_pacing_rate_limit_index field.
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/net/mlx5/mlx5_testpmd.c | 93
> +++++++++++++++++++++++++++++++++
>  drivers/net/mlx5/mlx5_tx.c      | 40 +++++++++++++-
>  drivers/net/mlx5/rte_pmd_mlx5.h | 30 +++++++++++
>  3 files changed, 162 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/mlx5/mlx5_testpmd.c
> b/drivers/net/mlx5/mlx5_testpmd.c index 1bb5a89559..fd3efecc5d 100644
> --- a/drivers/net/mlx5/mlx5_testpmd.c
> +++ b/drivers/net/mlx5/mlx5_testpmd.c
> @@ -1365,6 +1365,94 @@ cmdline_parse_inst_t
> mlx5_cmd_dump_rq_context_options = {
>  	}
>  };
> 
> +/* Show per-queue rate limit PP index for a given port/queue */ struct
> +mlx5_cmd_show_rate_limit_options {
> +	cmdline_fixed_string_t mlx5;
> +	cmdline_fixed_string_t port;
> +	portid_t port_id;
> +	cmdline_fixed_string_t txq;
> +	queueid_t queue_id;
> +	cmdline_fixed_string_t rate;
> +	cmdline_fixed_string_t show;
> +};
> +
> +cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_mlx5 =
> +	TOKEN_STRING_INITIALIZER(struct
> mlx5_cmd_show_rate_limit_options,
> +				 mlx5, "mlx5");
> +cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_port =
> +	TOKEN_STRING_INITIALIZER(struct
> mlx5_cmd_show_rate_limit_options,
> +				 port, "port");
> +cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_port_id =
> +	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
> +			      port_id, RTE_UINT16);
> +cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_txq =
> +	TOKEN_STRING_INITIALIZER(struct
> mlx5_cmd_show_rate_limit_options,
> +				 txq, "txq");
> +cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_queue_id =
> +	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
> +			      queue_id, RTE_UINT16);
> +cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_rate =
> +	TOKEN_STRING_INITIALIZER(struct
> mlx5_cmd_show_rate_limit_options,
> +				 rate, "rate");
> +cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_show =
> +	TOKEN_STRING_INITIALIZER(struct
> mlx5_cmd_show_rate_limit_options,
> +				 show, "show");
> +
> +static void
> +mlx5_cmd_show_rate_limit_parsed(void *parsed_result,
> +				__rte_unused struct cmdline *cl,
> +				__rte_unused void *data)
> +{
> +	struct mlx5_cmd_show_rate_limit_options *res = parsed_result;
> +	struct rte_pmd_mlx5_txq_rate_limit_info info;
> +	int ret;
> +
> +	ret = rte_pmd_mlx5_txq_rate_limit_query(res->port_id, res-
> >queue_id,
> +						 &info);
> +	switch (ret) {
> +	case 0:
> +		break;
> +	case -ENODEV:
> +		fprintf(stderr, "invalid port_id %u\n", res->port_id);
> +		return;
> +	case -EINVAL:
> +		fprintf(stderr, "invalid queue index (%u), out of range\n",
> +			res->queue_id);
> +		return;
> +	case -EIO:
> +		fprintf(stderr, "failed to query SQ context\n");
> +		return;
> +	default:
> +		fprintf(stderr, "query failed (%d)\n", ret);
> +		return;
> +	}
> +	fprintf(stdout, "Port %u Txq %u rate limit info:\n",
> +		res->port_id, res->queue_id);
> +	if (info.rate_mbps > 0)
> +		fprintf(stdout, "  Configured rate: %u Mbps\n",
> +			info.rate_mbps);
> +	else
> +		fprintf(stdout, "  Configured rate: disabled\n");
> +	fprintf(stdout, "  PP index (driver): %u\n", info.pp_index);
> +	fprintf(stdout, "  PP index (FW readback): %u\n", info.fw_pp_index); }
> +
> +cmdline_parse_inst_t mlx5_cmd_show_rate_limit = {
> +	.f = mlx5_cmd_show_rate_limit_parsed,
> +	.data = NULL,
> +	.help_str = "mlx5 port <port_id> txq <queue_id> rate show",
> +	.tokens = {
> +		(void *)&mlx5_cmd_show_rate_limit_mlx5,
> +		(void *)&mlx5_cmd_show_rate_limit_port,
> +		(void *)&mlx5_cmd_show_rate_limit_port_id,
> +		(void *)&mlx5_cmd_show_rate_limit_txq,
> +		(void *)&mlx5_cmd_show_rate_limit_queue_id,
> +		(void *)&mlx5_cmd_show_rate_limit_rate,
> +		(void *)&mlx5_cmd_show_rate_limit_show,
> +		NULL,
> +	}
> +};
> +
>  static struct testpmd_driver_commands mlx5_driver_cmds = {
>  	.commands = {
>  		{
> @@ -1440,6 +1528,11 @@ static struct testpmd_driver_commands
> mlx5_driver_cmds = {
>  			.help = "mlx5 port (port_id) queue (queue_id) dump
> rq_context (file_name)\n"
>  				"    Dump mlx5 RQ Context\n\n",
>  		},
> +		{
> +			.ctx = &mlx5_cmd_show_rate_limit,
> +			.help = "mlx5 port (port_id) txq (queue_id) rate
> show\n"
> +				"    Show per-queue rate limit PP index\n\n",
> +		},
>  		{
>  			.ctx = NULL,
>  		},
> diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c index
> 8085b5c306..7d71782d33 100644
> --- a/drivers/net/mlx5/mlx5_tx.c
> +++ b/drivers/net/mlx5/mlx5_tx.c
> @@ -800,7 +800,7 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t
> port_id, uint16_t queue_id, const ch
>  	if (!rte_eth_dev_is_valid_port(port_id))
>  		return -ENODEV;
> 
> -	if (rte_eth_tx_queue_is_valid(port_id, queue_id))
> +	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
>  		return -EINVAL;
> 
>  	fd = fopen(path, "w");
> @@ -848,3 +848,41 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t
> port_id, uint16_t queue_id, const ch
>  	fclose(fd);
>  	return ret;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_txq_rate_limit_query,
> +26.07) int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t
> queue_id,
> +				       struct rte_pmd_mlx5_txq_rate_limit_info
> *info) {
> +	struct rte_eth_dev *dev;
> +	struct mlx5_priv *priv;
> +	struct mlx5_txq_data *txq_data;
> +	struct mlx5_txq_ctrl *txq_ctrl;
> +	uint32_t sq_out[MLX5_ST_SZ_DW(query_sq_out)] = {0};
> +	int ret;
> +
> +	if (info == NULL)
> +		return -EINVAL;
> +	if (!rte_eth_dev_is_valid_port(port_id))
> +		return -ENODEV;
> +	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
> +		return -EINVAL;
> +	dev = &rte_eth_devices[port_id];
> +	priv = dev->data->dev_private;
> +	txq_data = (*priv->txqs)[queue_id];
> +	txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
> +	info->rate_mbps = txq_ctrl->rate_limit.rate_mbps;
> +	info->pp_index = txq_ctrl->rate_limit.pp_id;
> +	if (txq_ctrl->obj == NULL) {
> +		info->fw_pp_index = 0;
> +		return 0;
> +	}
> +	ret = mlx5_devx_cmd_query_sq(txq_ctrl->obj->sq_obj.sq,
> +				     sq_out, sizeof(sq_out));
> +	if (ret)
> +		return -EIO;
> +	info->fw_pp_index = MLX5_GET(sqc,
> +				     MLX5_ADDR_OF(query_sq_out, sq_out,
> +						  sq_context),
> +				     packet_pacing_rate_limit_index);
> +	return 0;
> +}
> diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h
> b/drivers/net/mlx5/rte_pmd_mlx5.h index 7acfdae97d..698d7d2032 100644
> --- a/drivers/net/mlx5/rte_pmd_mlx5.h
> +++ b/drivers/net/mlx5/rte_pmd_mlx5.h
> @@ -420,6 +420,36 @@ __rte_experimental
>  int
>  rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id,
> const char *filename);
> 
> +/**
> + * Per-queue rate limit information.
> + */
> +struct rte_pmd_mlx5_txq_rate_limit_info {
> +	uint32_t rate_mbps;	/**< Configured rate in Mbps, 0 = disabled. */
> +	uint16_t pp_index;	/**< PP index from driver state. */
> +	uint16_t fw_pp_index;	/**< PP index read back from FW SQ context.
> */
> +};
> +
> +/**
> + * Query per-queue rate limit state for a given Tx queue.
> + *
> + * @param[in] port_id
> + *   Port ID.
> + * @param[in] queue_id
> + *   Tx queue ID.
> + * @param[out] info
> + *   Rate limit information.
> + *
> + * @return
> + *   0 on success, negative errno on failure:
> + *   - -ENODEV: invalid port_id.
> + *   - -EINVAL: invalid queue_id.
> + *   - -EIO: FW query failed.
> + */
> +__rte_experimental
> +int
> +rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
> +				  struct rte_pmd_mlx5_txq_rate_limit_info
> *info);
> +
>  /** Type of mlx5 driver event for which custom callback is called. */  enum
> rte_pmd_mlx5_driver_event_cb_type {
>  	/** Called after HW Rx queue is created. */
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v4 08/10] ethdev: add getter for per-queue Tx rate limit
  2026-03-22 13:46     ` [PATCH v4 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
@ 2026-03-23 13:19       ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-23 13:19 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org,
	aman.deep.singh@intel.com

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Sunday, March 22, 2026 3:46 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org;
> aman.deep.singh@intel.com; Vincent Jardin <vjardin@free.fr>
> Subject: [PATCH v4 08/10] ethdev: add getter for per-queue Tx rate limit
> 
> The existing rte_eth_set_queue_rate_limit() API allows setting a per-queue Tx
> rate but provides no way to read it back. Applications such as grout are forced
> to maintain a shadow copy of the rate to be able to report it.
> 
> Add rte_eth_get_queue_rate_limit() as the symmetric getter, following the
> established DPDK pattern (e.g. rte_eth_dev_set_mtu/get_mtu,
> rte_eth_dev_set_vlan_offload/get_vlan_offload).
> 
> This adds:
> - eth_get_queue_rate_limit_t driver callback in ethdev_driver.h
> - rte_eth_get_queue_rate_limit() public experimental API (26.07)
> - Trace point matching the existing setter pattern
> - Generic testpmd command: show port <id> queue <id> rate
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  app/test-pmd/cmdline.c           | 69 ++++++++++++++++++++++++++++++++
>  lib/ethdev/ethdev_driver.h       |  7 ++++
>  lib/ethdev/ethdev_trace.h        |  9 +++++
>  lib/ethdev/ethdev_trace_points.c |  3 ++
>  lib/ethdev/rte_ethdev.c          | 35 ++++++++++++++++
>  lib/ethdev/rte_ethdev.h          | 24 +++++++++++
>  6 files changed, 147 insertions(+)
> 
> diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c index
> c5abeb5730..cc9c462498 100644
> --- a/app/test-pmd/cmdline.c
> +++ b/app/test-pmd/cmdline.c
> @@ -8982,6 +8982,74 @@ static cmdline_parse_inst_t cmd_queue_rate_limit
> = {
>  	},
>  };
> 
> +/* *** SHOW RATE LIMIT FOR A QUEUE OF A PORT *** */ struct
> +cmd_show_queue_rate_limit_result {
> +	cmdline_fixed_string_t show;
> +	cmdline_fixed_string_t port;
> +	uint16_t port_num;
> +	cmdline_fixed_string_t queue;
> +	uint16_t queue_num;
> +	cmdline_fixed_string_t rate;
> +};
> +
> +static void cmd_show_queue_rate_limit_parsed(void *parsed_result,
> +		__rte_unused struct cmdline *cl,
> +		__rte_unused void *data)
> +{
> +	struct cmd_show_queue_rate_limit_result *res = parsed_result;
> +	uint32_t tx_rate = 0;
> +	int ret;
> +
> +	ret = rte_eth_get_queue_rate_limit(res->port_num, res->queue_num,
> +					   &tx_rate);
> +	if (ret) {
> +		fprintf(stderr, "Get queue rate limit failed: %s\n",
> +			rte_strerror(-ret));
> +		return;
> +	}
> +	if (tx_rate)
> +		printf("Port %u Queue %u rate limit: %u Mbps\n",
> +		       res->port_num, res->queue_num, tx_rate);
> +	else
> +		printf("Port %u Queue %u rate limit: disabled\n",
> +		       res->port_num, res->queue_num); }
> +
> +static cmdline_parse_token_string_t cmd_show_queue_rate_limit_show =
> +	TOKEN_STRING_INITIALIZER(struct
> cmd_show_queue_rate_limit_result,
> +				show, "show");
> +static cmdline_parse_token_string_t cmd_show_queue_rate_limit_port =
> +	TOKEN_STRING_INITIALIZER(struct
> cmd_show_queue_rate_limit_result,
> +				port, "port");
> +static cmdline_parse_token_num_t cmd_show_queue_rate_limit_portnum =
> +	TOKEN_NUM_INITIALIZER(struct cmd_show_queue_rate_limit_result,
> +				port_num, RTE_UINT16);
> +static cmdline_parse_token_string_t cmd_show_queue_rate_limit_queue =
> +	TOKEN_STRING_INITIALIZER(struct
> cmd_show_queue_rate_limit_result,
> +				queue, "queue");
> +static cmdline_parse_token_num_t cmd_show_queue_rate_limit_queuenum
> =
> +	TOKEN_NUM_INITIALIZER(struct cmd_show_queue_rate_limit_result,
> +				queue_num, RTE_UINT16);
> +static cmdline_parse_token_string_t cmd_show_queue_rate_limit_rate =
> +	TOKEN_STRING_INITIALIZER(struct
> cmd_show_queue_rate_limit_result,
> +				rate, "rate");
> +
> +static cmdline_parse_inst_t cmd_show_queue_rate_limit = {
> +	.f = cmd_show_queue_rate_limit_parsed,
> +	.data = NULL,
> +	.help_str = "show port <port_id> queue <queue_id> rate: "
> +		"Show rate limit for a queue on port_id",
> +	.tokens = {
> +		(void *)&cmd_show_queue_rate_limit_show,
> +		(void *)&cmd_show_queue_rate_limit_port,
> +		(void *)&cmd_show_queue_rate_limit_portnum,
> +		(void *)&cmd_show_queue_rate_limit_queue,
> +		(void *)&cmd_show_queue_rate_limit_queuenum,
> +		(void *)&cmd_show_queue_rate_limit_rate,
> +		NULL,
> +	},
> +};
> +
>  /* *** SET RATE LIMIT FOR A VF OF A PORT *** */  struct
> cmd_vf_rate_limit_result {
>  	cmdline_fixed_string_t set;
> @@ -14270,6 +14338,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
>  	&cmd_set_uc_all_hash_filter,
>  	&cmd_vf_mac_addr_filter,
>  	&cmd_queue_rate_limit,
> +	&cmd_show_queue_rate_limit,
>  	&cmd_tunnel_udp_config,
>  	&cmd_showport_rss_hash,
>  	&cmd_showport_rss_hash_key,
> diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index
> 1255cd6f2c..0f336f9567 100644
> --- a/lib/ethdev/ethdev_driver.h
> +++ b/lib/ethdev/ethdev_driver.h
> @@ -762,6 +762,11 @@ typedef int (*eth_set_queue_rate_limit_t)(struct
> rte_eth_dev *dev,
>  				uint16_t queue_idx,
>  				uint32_t tx_rate);
> 
> +/** @internal Get queue Tx rate. */
> +typedef int (*eth_get_queue_rate_limit_t)(struct rte_eth_dev *dev,
> +				uint16_t queue_idx,
> +				uint32_t *tx_rate);
> +
>  /** @internal Add tunneling UDP port. */  typedef int
> (*eth_udp_tunnel_port_add_t)(struct rte_eth_dev *dev,
>  					 struct rte_eth_udp_tunnel
> *tunnel_udp); @@ -1522,6 +1527,8 @@ struct eth_dev_ops {
> 
>  	/** Set queue rate limit */
>  	eth_set_queue_rate_limit_t set_queue_rate_limit;
> +	/** Get queue rate limit */
> +	eth_get_queue_rate_limit_t get_queue_rate_limit;
> 
>  	/** Configure RSS hash protocols and hashing key */
>  	rss_hash_update_t          rss_hash_update;
> diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h index
> 482befc209..6554cc1a21 100644
> --- a/lib/ethdev/ethdev_trace.h
> +++ b/lib/ethdev/ethdev_trace.h
> @@ -908,6 +908,15 @@ RTE_TRACE_POINT(
>  	rte_trace_point_emit_int(ret);
>  )
> 
> +RTE_TRACE_POINT(
> +	rte_eth_trace_get_queue_rate_limit,
> +	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_idx,
> +		int ret),
> +	rte_trace_point_emit_u16(port_id);
> +	rte_trace_point_emit_u16(queue_idx);
> +	rte_trace_point_emit_int(ret);
> +)
> +
>  RTE_TRACE_POINT(
>  	rte_eth_trace_rx_avail_thresh_set,
>  	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id, diff --
> git a/lib/ethdev/ethdev_trace_points.c b/lib/ethdev/ethdev_trace_points.c
> index 071c508327..0a28378a56 100644
> --- a/lib/ethdev/ethdev_trace_points.c
> +++ b/lib/ethdev/ethdev_trace_points.c
> @@ -347,6 +347,9 @@
> RTE_TRACE_POINT_REGISTER(rte_ethdev_trace_uc_all_hash_table_set,
>  RTE_TRACE_POINT_REGISTER(rte_eth_trace_set_queue_rate_limit,
>  	lib.ethdev.set_queue_rate_limit)
> 
> +RTE_TRACE_POINT_REGISTER(rte_eth_trace_get_queue_rate_limit,
> +	lib.ethdev.get_queue_rate_limit)
> +
>  RTE_TRACE_POINT_REGISTER(rte_eth_trace_rx_avail_thresh_set,
>  	lib.ethdev.rx_avail_thresh_set)
> 
> diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index
> 2edc7a362e..5e763e1855 100644
> --- a/lib/ethdev/rte_ethdev.c
> +++ b/lib/ethdev/rte_ethdev.c
> @@ -5694,6 +5694,41 @@ int rte_eth_set_queue_rate_limit(uint16_t
> port_id, uint16_t queue_idx,
>  	return ret;
>  }
> 
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_get_queue_rate_limit, 26.07)
> int
> +rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> +					uint32_t *tx_rate)
> +{
> +	struct rte_eth_dev *dev;
> +	int ret;
> +
> +	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
> +	dev = &rte_eth_devices[port_id];
> +
> +	if (tx_rate == NULL) {
> +		RTE_ETHDEV_LOG_LINE(ERR,
> +			"Get queue rate limit:port %u: NULL tx_rate pointer",
> +			port_id);
> +		return -EINVAL;
> +	}
> +
> +	if (queue_idx >= dev->data->nb_tx_queues) {
> +		RTE_ETHDEV_LOG_LINE(ERR,
> +			"Get queue rate limit:port %u: invalid queue ID=%u",
> +			port_id, queue_idx);
> +		return -EINVAL;
> +	}
> +
> +	if (dev->dev_ops->get_queue_rate_limit == NULL)
> +		return -ENOTSUP;
> +	ret = eth_err(port_id,
> +		      dev->dev_ops->get_queue_rate_limit(dev, queue_idx,
> +							 tx_rate));
> +
> +	rte_eth_trace_get_queue_rate_limit(port_id, queue_idx, ret);
> +
> +	return ret;
> +}
> +
>  RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_rx_avail_thresh_set, 22.07)
> int rte_eth_rx_avail_thresh_set(uint16_t port_id, uint16_t queue_id,
>  			       uint8_t avail_thresh)
> diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index
> 0d8e2d0236..e525217b77 100644
> --- a/lib/ethdev/rte_ethdev.h
> +++ b/lib/ethdev/rte_ethdev.h
> @@ -4817,6 +4817,30 @@ int rte_eth_dev_uc_all_hash_table_set(uint16_t
> port_id, uint8_t on);  int rte_eth_set_queue_rate_limit(uint16_t port_id,
> uint16_t queue_idx,
>  			uint32_t tx_rate);
> 
> +/**
> + * @warning
> + * @b EXPERIMENTAL: this API may change, or be removed, without prior
> notice.
> + *
> + * Get the rate limitation for a queue on an Ethernet device.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_idx
> + *   The queue ID.
> + * @param[out] tx_rate
> + *   A pointer to retrieve the Tx rate in Mbps.
> + *   0 means rate limiting is disabled.
> + * @return
> + *   - (0) if successful.
> + *   - (-ENOTSUP) if hardware doesn't support this feature.
> + *   - (-ENODEV) if *port_id* invalid.
> + *   - (-EIO) if device is removed.
> + *   - (-EINVAL) if bad parameter.
> + */
> +__rte_experimental
> +int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
> +			uint32_t *tx_rate);
> +
>  /**
>   * Configuration of Receive Side Scaling hash computation of Ethernet device.
>   *
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v4 09/10] net/mlx5: implement per-queue Tx rate limit getter
  2026-03-22 13:46     ` [PATCH v4 09/10] net/mlx5: implement per-queue Tx rate limit getter Vincent Jardin
@ 2026-03-23 13:20       ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-23 13:20 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org,
	aman.deep.singh@intel.com

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Sunday, March 22, 2026 3:46 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org;
> aman.deep.singh@intel.com; Vincent Jardin <vjardin@free.fr>
> Subject: [PATCH v4 09/10] net/mlx5: implement per-queue Tx rate limit getter
> 
> Wire the mlx5 PMD to the new rte_eth_get_queue_rate_limit() ethdev callback.
> The implementation reads the per-queue rate_mbps tracking field from the
> txq_ctrl structure.
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/net/mlx5/mlx5.c     |  2 ++
>  drivers/net/mlx5/mlx5_tx.h  |  2 ++
>  drivers/net/mlx5/mlx5_txq.c | 30 ++++++++++++++++++++++++++++++
>  3 files changed, 34 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c index
> 7d08d7886b..f5784761f9 100644
> --- a/drivers/net/mlx5/mlx5.c
> +++ b/drivers/net/mlx5/mlx5.c
> @@ -2652,6 +2652,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
>  	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
>  	.get_restore_flags = mlx5_get_restore_flags,
>  	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
> +	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
>  };
> 
>  /* Available operations from secondary process. */ @@ -2746,6 +2747,7 @@
> const struct eth_dev_ops mlx5_dev_ops_isolate = {
>  	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
>  	.get_restore_flags = mlx5_get_restore_flags,
>  	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
> +	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
>  };
> 
>  /**
> diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h index
> 975ff57acd..02feb9e6fd 100644
> --- a/drivers/net/mlx5/mlx5_tx.h
> +++ b/drivers/net/mlx5/mlx5_tx.h
> @@ -224,6 +224,8 @@ int mlx5_txq_releasable(struct rte_eth_dev *dev,
> uint16_t idx);  int mlx5_txq_verify(struct rte_eth_dev *dev);  int
> mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
>  			      uint32_t tx_rate);
> +int mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
> +			      uint32_t *tx_rate);
>  int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);  void
> mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);  void
> mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl); diff --git
> a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c index
> ce08363ca9..867ea4b994 100644
> --- a/drivers/net/mlx5/mlx5_txq.c
> +++ b/drivers/net/mlx5/mlx5_txq.c
> @@ -1481,6 +1481,36 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev
> *dev, uint16_t queue_idx,
>  	return 0;
>  }
> 
> +/**
> + * Get per-queue packet pacing rate limit.
> + *
> + * @param dev
> + *   Pointer to Ethernet device.
> + * @param queue_idx
> + *   TX queue index.
> + * @param[out] tx_rate
> + *   Pointer to store the TX rate in Mbps, 0 if rate limiting is disabled.
> + *
> + * @return
> + *   0 on success, a negative errno value otherwise and rte_errno is set.
> + */
> +int
> +mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
> +			  uint32_t *tx_rate)
> +{
> +	struct mlx5_priv *priv = dev->data->dev_private;
> +	struct mlx5_txq_ctrl *txq_ctrl;
> +
> +	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
> +		rte_errno = EINVAL;
> +		return -rte_errno;
> +	}
> +	txq_ctrl = container_of((*priv->txqs)[queue_idx],
> +				struct mlx5_txq_ctrl, txq);
> +	*tx_rate = txq_ctrl->rate_limit.rate_mbps;
> +	return 0;
> +}
> +
>  /**
>   * Verify if the queue can be released.
>   *
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* RE: [PATCH v4 10/10] net/mlx5: add rate table capacity query API
  2026-03-22 13:46     ` [PATCH v4 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
@ 2026-03-23 13:20       ` Slava Ovsiienko
  0 siblings, 0 replies; 87+ messages in thread
From: Slava Ovsiienko @ 2026-03-23 13:20 UTC (permalink / raw)
  To: Vincent Jardin, dev@dpdk.org
  Cc: Raslan Darawsheh, NBU-Contact-Thomas Monjalon (EXTERNAL),
	andrew.rybchenko@oktetlabs.ru, Dariusz Sosnowski, Bing Zhao,
	Ori Kam, Suanming Mou, Matan Azrad, stephen@networkplumber.org,
	aman.deep.singh@intel.com

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

> -----Original Message-----
> From: Vincent Jardin <vjardin@free.fr>
> Sent: Sunday, March 22, 2026 3:46 PM
> To: dev@dpdk.org
> Cc: Raslan Darawsheh <rasland@nvidia.com>; NBU-Contact-Thomas Monjalon
> (EXTERNAL) <thomas@monjalon.net>; andrew.rybchenko@oktetlabs.ru;
> Dariusz Sosnowski <dsosnowski@nvidia.com>; Slava Ovsiienko
> <viacheslavo@nvidia.com>; Bing Zhao <bingz@nvidia.com>; Ori Kam
> <orika@nvidia.com>; Suanming Mou <suanmingm@nvidia.com>; Matan Azrad
> <matan@nvidia.com>; stephen@networkplumber.org;
> aman.deep.singh@intel.com; Vincent Jardin <vjardin@free.fr>
> Subject: [PATCH v4 10/10] net/mlx5: add rate table capacity query API
> 
> Add rte_pmd_mlx5_pp_rate_table_query() to report the HW packet pacing
> rate table size and how many entries are used by this port.
> 
> The total comes from the HCA QoS capability packet_pacing_rate_table_size.
> The port_used count is derived by collecting unique non-zero PP indices across
> this port's TX queues.
> 
> The rate table is a global shared HW resource: firmware, kernel, other DPDK
> ports on the same device, and other application instances may all consume
> entries. The port_used count is therefore a lower bound of actual HW usage.
> 
> With shared PP allocation (flags=0), the kernel mlx5 driver reuses a single rate
> table entry for all PP contexts with identical parameters (rate, burst, packet
> size). Multiple queues configured with the same rate share one pp_id, so
> port_used counts unique entries, not the number of queues with rate limiting
> enabled.
> 
> Applications that need device-wide visibility should query all ports on the same
> physical device and aggregate the results, similar to how the kernel mlx5 driver
> tracks usage internally.
> 
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---
>  drivers/net/mlx5/mlx5_tx.c      | 64 +++++++++++++++++++++++++++++++++
>  drivers/net/mlx5/rte_pmd_mlx5.h | 44 +++++++++++++++++++++++
>  2 files changed, 108 insertions(+)
> 
> diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c index
> 7d71782d33..615b792836 100644
> --- a/drivers/net/mlx5/mlx5_tx.c
> +++ b/drivers/net/mlx5/mlx5_tx.c
> @@ -19,6 +19,7 @@
> 
>  #include <mlx5_prm.h>
>  #include <mlx5_common.h>
> +#include <mlx5_malloc.h>
> 
>  #include "mlx5_autoconf.h"
>  #include "mlx5_defs.h"
> @@ -886,3 +887,66 @@ int rte_pmd_mlx5_txq_rate_limit_query(uint16_t
> port_id, uint16_t queue_id,
>  				     packet_pacing_rate_limit_index);
>  	return 0;
>  }
> +
> +RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_pp_rate_table_query,
> 26.07)
> +int rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
> +				     struct rte_pmd_mlx5_pp_rate_table_info
> *info) {
> +	struct rte_eth_dev *dev;
> +	struct mlx5_priv *priv;
> +	uint16_t used = 0;
> +	uint16_t *seen;
> +	unsigned int i;
> +
> +	if (info == NULL)
> +		return -EINVAL;
> +	if (!rte_eth_dev_is_valid_port(port_id))
> +		return -ENODEV;
> +	dev = &rte_eth_devices[port_id];
> +	priv = dev->data->dev_private;
> +	if (!priv->sh->cdev->config.hca_attr.qos.packet_pacing) {
> +		rte_errno = ENOTSUP;
> +		return -ENOTSUP;
> +	}
> +	info->total = priv->sh->cdev-
> >config.hca_attr.qos.packet_pacing_rate_table_size;
> +	if (priv->txqs == NULL || priv->txqs_n == 0) {
> +		info->port_used = 0;
> +		return 0;
> +	}
> +	seen = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
> +			   priv->txqs_n * sizeof(*seen), 0, SOCKET_ID_ANY);
> +	if (seen == NULL)
> +		return -ENOMEM;
> +	/*
> +	 * Count unique non-zero PP indices across this port's TX queues.
> +	 * Note: the count reflects only queues on this port; other ports
> +	 * sharing the same device may also consume rate table entries.
> +	 */
> +	for (i = 0; i < priv->txqs_n; i++) {
> +		struct mlx5_txq_data *txq_data;
> +		struct mlx5_txq_ctrl *txq_ctrl;
> +		uint16_t pp_id;
> +		uint16_t j;
> +		bool dup;
> +
> +		if ((*priv->txqs)[i] == NULL)
> +			continue;
> +		txq_data = (*priv->txqs)[i];
> +		txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
> +		pp_id = txq_ctrl->rate_limit.pp_id;
> +		if (pp_id == 0)
> +			continue;
> +		dup = false;
> +		for (j = 0; j < used; j++) {
> +			if (seen[j] == pp_id) {
> +				dup = true;
> +				break;
> +			}
> +		}
> +		if (!dup)
> +			seen[used++] = pp_id;
> +	}
> +	mlx5_free(seen);
> +	info->port_used = used;
> +	return 0;
> +}
> diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h
> b/drivers/net/mlx5/rte_pmd_mlx5.h index 698d7d2032..621d8c2b15 100644
> --- a/drivers/net/mlx5/rte_pmd_mlx5.h
> +++ b/drivers/net/mlx5/rte_pmd_mlx5.h
> @@ -450,6 +450,50 @@ int
>  rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
>  				  struct rte_pmd_mlx5_txq_rate_limit_info
> *info);
> 
> +/**
> + * Packet pacing rate table capacity information.
> + */
> +struct rte_pmd_mlx5_pp_rate_table_info {
> +	uint16_t total;		/**< Total HW rate table entries. */
> +	uint16_t port_used;	/**< Entries used by this port's TX queues. */
> +};
> +
> +/**
> + * Query packet pacing rate table capacity.
> + *
> + * The ``port_used`` count reflects only unique PP indices allocated
> + * by the queried port's TX queues. It is a lower bound of actual HW
> + * usage because the rate table is a global shared resource:
> + * - Other DPDK ports on the same physical device may hold entries.
> + * - The kernel mlx5 driver and firmware may also consume entries.
> + * - Multiple DPDK application instances may share the device.
> + *
> + * When multiple queues on the same port are configured with identical
> + * rate parameters, the kernel shares a single rate table entry across
> + * them (with flags=0 allocation), so ``port_used`` counts unique
> + * entries, not the number of queues with rate limiting enabled.
> + *
> + * Applications that need device-wide visibility should query all
> + * ports on the same physical device and aggregate the results,
> + * similar to how the kernel mlx5 driver tracks usage internally.
> + *
> + * @param[in] port_id
> + *   Port ID.
> + * @param[out] info
> + *   Rate table capacity information.
> + *
> + * @return
> + *   0 on success, negative errno on failure:
> + *   - -ENODEV: invalid port_id.
> + *   - -EINVAL: info is NULL.
> + *   - -ENOTSUP: packet pacing not supported.
> + *   - -ENOMEM: allocation failure.
> + */
> +__rte_experimental
> +int
> +rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
> +				 struct rte_pmd_mlx5_pp_rate_table_info
> *info);
> +
>  /** Type of mlx5 driver event for which custom callback is called. */  enum
> rte_pmd_mlx5_driver_event_cb_type {
>  	/** Called after HW Rx queue is created. */
> --
> 2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v4 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
                       ` (9 preceding siblings ...)
  2026-03-22 13:46     ` [PATCH v4 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
@ 2026-03-23 23:09     ` Stephen Hemminger
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
  11 siblings, 0 replies; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-23 23:09 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan, aman.deep.singh

On Sun, 22 Mar 2026 14:46:19 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> This series adds per-queue Tx data-rate limiting to the mlx5 PMD using
> hardware packet pacing (PP), and a symmetric rte_eth_get_queue_rate_limit()
> ethdev API to read back the configured rate.
> 
> Each Tx queue can be assigned an individual rate (in Mbps) at runtime via
> rte_eth_set_queue_rate_limit(). The mlx5 implementation allocates a PP
> context per queue from the HW rate table, programs the PP index into the
> SQ via modify_sq, and relies on the kernel to share identical rates
> across PP contexts to conserve table entries. A PMD-specific API exposes
> per-queue PP diagnostics and rate table capacity.
> 
> Patch breakdown:
> 
>   01/10 doc/nics/mlx5: fix stale packet pacing documentation
>   02/10 common/mlx5: query packet pacing rate table capabilities
>   03/10 common/mlx5: extend SQ modify to support rate limit update
>   04/10 net/mlx5: add per-queue packet pacing infrastructure
>   05/10 net/mlx5: support per-queue rate limiting
>   06/10 net/mlx5: add burst pacing devargs
>   07/10 net/mlx5: add testpmd command to query per-queue rate limit
>   08/10 ethdev: add getter for per-queue Tx rate limit
>   09/10 net/mlx5: implement per-queue Tx rate limit getter
>   10/10 net/mlx5: add rate table capacity query API
> 
> Release notes for the new ethdev API and mlx5 per-queue rate
> limiting can be added to a release_26_07.rst once the file is
> created at the start of the 26.07 development cycle.
> 
> Changes since v3:
> 
>   Addressed review feedback from Stephen and Slava (nvidia/Mellanox).
> 
>   Patch 02/10 (query caps):
>   - Added Acked-by: Viacheslav Ovsiienko
> 
>   Patch 03/10 (SQ modify):
>   - Define MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX
>     enum in mlx5_prm.h, following the MLX5_MODIFY_RQ_IN_MODIFY_xxx pattern
>   - Use read-modify-write for modify_bitmask (MLX5_GET64 | OR | MLX5_SET64)
>     instead of direct overwrite, for forward compatibility
> 
>   Patch 04/10 (PP infrastructure):
>   - Rename struct member and parameters from "rl" to "rate_limit"
>     for consistency with codebase naming style
>   - Replace MLX5_ASSERT(rate_mbps > 0) with runtime check returning
>     -EINVAL in non-debug builds
>   - Move mlx5_txq_free_pp_rate_limit() to after txq_obj_release() in
>     mlx5_txq_release() — destroy the SQ before freeing the PP index
>     it references
>   - Clarify commit message: distinct PP handle per queue (for cleanup)
>     but kernel shares the same pp_id for identical rate parameters
> 
>   Patch 05/10 (set rate):
>   - Fix obj->sq vs obj->sq_obj.sq: use obj->sq_obj.sq from the start
>     for non-hairpin queues (was introduced in patch 07 in v3, breaking
>     git bisect)
>   - Move all variable declarations to block top (sq_devx,
>     new_rate_limit)
>   - Add queue state check: reject set_queue_rate_limit if queue is not
>     STARTED (SQ not in RDY state)
>   - Update mlx5 feature matrix: Rate limitation = Y
>   - Add Per-Queue Tx Rate Limiting documentation section in mlx5.rst
>     covering DevX requirement, hardware support, rate table sharing,
>     and testpmd usage
> 
>   Patch 06/10 (burst devargs):
>   - Remove burst_upper_bound/typical_packet_size from Clock Queue
>     path (mlx5_txpp_alloc_pp_index) — Clock Queue uses WQE rate
>     pacing and does not need these parameters
>   - Update commit message and documentation accordingly
> 
>   Patch 07/10 (testpmd + PMD query):
>   - sq_obj.sq accessor change moved to patch 05 (see above)
>   - sq_devx declaration moved to block top
> 
>   Patch 08/10 (ethdev getter) — split from v3 patch 08:
>   - Split into ethdev API (this patch) and mlx5 driver (patch 09)
>   - Add rte_eth_trace_get_queue_rate_limit() trace point matching
>     the existing setter pattern
> 
>   Patch 09/10 — NEW (was part of v3 patch 08):
>   - mlx5 driver implementation of get_queue_rate_limit callback,
>     split out per Slava's request
> 
>   Patch 10/10 (rate table query):
>   - Rename struct field "used" to "port_used" to clarify per-port
>     scope
>   - Strengthen Doxygen: rate table is a global shared HW resource
>     (firmware, kernel, other DPDK instances may consume entries);
>     port_used is a lower bound
>   - Document PP sharing behavior with flags=0
>   - Note that applications should aggregate across ports for
>     device-wide visibility
> 
> Changes since v2:
> 
>   Addressed review feedback from Stephen Hemminger:
> 
>   Patch 04: cleaned redundant cast parentheses on (struct mlx5dv_pp *)
>   Patch 04: consolidated dv_alloc_pp call onto one line
>   Patch 05+08: removed redundant queue_idx bounds checks from driver
>     callbacks — ethdev layer is the single validation point
>   Patch 07: added generic testpmd command: show port <id> queue <id> rate
>   Patch 08+10: removed release notes from release_26_03.rst (targets 26.07)
>   Patch 10: use MLX5_MEM_SYS | MLX5_MEM_ZERO for heap allocation
>   Patch 10: consolidated packet_pacing_rate_table_size onto one line
> 
> Changes since v1:
> 
>   Patch 01: Acked-by Viacheslav Ovsiienko
>   Patch 04: rate bounds validation, uint64_t overflow fix, remove
>     early PP free
>   Patch 05: PP leak fix (temp struct pattern), rte_errno in error paths
>   Patch 07: inverted rte_eth_tx_queue_is_valid() check
>   Patch 10: stack array replaced with heap, per-port scope documented
> 
> Testing:
> 
>   - Build: GCC, no warnings
>   - Hardware: ConnectX-6 Dx
>   - DevX path (default): set/get/disable rate limiting verified
>   - Verbs path (dv_flow_en=0): returns -EINVAL cleanly (SQ DevX
>     object not available), no crash
> 
> Vincent Jardin (10):
>   doc/nics/mlx5: fix stale packet pacing documentation
>   common/mlx5: query packet pacing rate table capabilities
>   common/mlx5: extend SQ modify to support rate limit update
>   net/mlx5: add per-queue packet pacing infrastructure
>   net/mlx5: support per-queue rate limiting
>   net/mlx5: add burst pacing devargs
>   net/mlx5: add testpmd command to query per-queue rate limit
>   ethdev: add getter for per-queue Tx rate limit
>   net/mlx5: implement per-queue Tx rate limit getter
>   net/mlx5: add rate table capacity query API
> 
> Vincent Jardin (10):
>   doc/nics/mlx5: fix stale packet pacing documentation
>   common/mlx5: query packet pacing rate table capabilities
>   common/mlx5: extend SQ modify to support rate limit update
>   net/mlx5: add per-queue packet pacing infrastructure
>   net/mlx5: support per-queue rate limiting
>   net/mlx5: add burst pacing devargs
>   net/mlx5: add testpmd command to query per-queue rate limit
>   ethdev: add getter for per-queue Tx rate limit
>   net/mlx5: implement per-queue Tx rate limit getter
>   net/mlx5: add rate table capacity query API
> 
>  app/test-pmd/cmdline.c               |  69 ++++++++++
>  doc/guides/nics/features/mlx5.ini    |   1 +
>  doc/guides/nics/mlx5.rst             | 180 ++++++++++++++++++++++-----
>  drivers/common/mlx5/mlx5_devx_cmds.c |  23 ++++
>  drivers/common/mlx5/mlx5_devx_cmds.h |  14 ++-
>  drivers/common/mlx5/mlx5_prm.h       |   7 ++
>  drivers/net/mlx5/mlx5.c              |  46 +++++++
>  drivers/net/mlx5/mlx5.h              |  13 ++
>  drivers/net/mlx5/mlx5_testpmd.c      |  93 ++++++++++++++
>  drivers/net/mlx5/mlx5_tx.c           | 104 +++++++++++++++-
>  drivers/net/mlx5/mlx5_tx.h           |   5 +
>  drivers/net/mlx5/mlx5_txpp.c         |  84 +++++++++++++
>  drivers/net/mlx5/mlx5_txq.c          | 149 ++++++++++++++++++++++
>  drivers/net/mlx5/rte_pmd_mlx5.h      |  74 +++++++++++
>  lib/ethdev/ethdev_driver.h           |   7 ++
>  lib/ethdev/ethdev_trace.h            |   9 ++
>  lib/ethdev/ethdev_trace_points.c     |   3 +
>  lib/ethdev/rte_ethdev.c              |  35 ++++++
>  lib/ethdev/rte_ethdev.h              |  24 ++++
>  19 files changed, 906 insertions(+), 33 deletions(-)
> 


I didn't see anything wrong, but AI nit picking did.



---

## Review: [PATCH v4 01-10/10] mlx5 per-queue rate limiting

### Patch 05/10 — `net/mlx5: support per-queue rate limiting`

**Error (Correctness): Data truncation in `rate_kbps` → PRM `rate_limit` field**

In `mlx5_txq_alloc_pp_rate_limit()`, `rate_kbps` is computed as `(uint64_t)rate_mbps * 1000` and then stored with `MLX5_SET(..., rate_limit, (uint32_t)rate_kbps)`. When `qos->packet_pacing_max_rate` is zero (HW doesn't report a maximum), the range check is skipped, and any `rate_mbps` value above ~4,294,967 will silently overflow the `uint32_t` cast, programming a completely wrong rate into the hardware.

For example, `rate_mbps = 5000000` (5 Tbps — unlikely but allowed by the `uint32_t` parameter type) yields `rate_kbps = 5,000,000,000` which truncates to `705,032,704` kbps.

Suggested fix: add an explicit check that `rate_kbps` fits in `uint32_t` before the cast, or cap it:

```c
if (rate_kbps > UINT32_MAX) {
    DRV_LOG(ERR, "Rate %u Mbps overflows PRM rate_limit field.", rate_mbps);
    rte_errno = ERANGE;
    return -ERANGE;
}
```

**Warning (Correctness): Missing NULL check on `txq_data` in `rte_pmd_mlx5_txq_rate_limit_query()` (patch 07)**

After `rte_eth_tx_queue_is_valid()` confirms the index is in range, the code does:

```c
txq_data = (*priv->txqs)[queue_id];
txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
```

If the queue index is valid but the queue has been released (entry is NULL), `container_of` on a NULL pointer produces an invalid pointer, and subsequent dereferences are undefined behavior. The setter in patch 05 checks `(*priv->txqs)[queue_idx] == NULL` but the query function does not.

Suggested fix: add a NULL check on `txq_data` before the `container_of`, similar to what the setter does.

The same issue exists in `rte_pmd_mlx5_pp_rate_table_query()` (patch 10) but there the loop body does check `(*priv->txqs)[i] == NULL` and skips, so that one is fine.

### Patch 09/10 — `net/mlx5: implement per-queue Tx rate limit getter`

**Warning (Correctness): Missing NULL check on queue data in `mlx5_get_queue_rate_limit()`**

The getter checks `priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL` which is good. However, the ethdev layer (`rte_eth_get_queue_rate_limit()` in patch 08) only checks `queue_idx >= dev->data->nb_tx_queues`, not whether `priv->txqs` is allocated. The driver-side check handles this, so this is actually fine — noting for completeness.

### Patch 01/10 — `doc/nics/mlx5: fix stale packet pacing documentation`

**Warning: Missing `Cc: stable@dpdk.org`**

The patch has a `Fixes:` tag referencing a previous commit but is missing `Cc: stable@dpdk.org` for backport consideration, even though it's a documentation-only fix.

### Patches 02, 03, 04, 06, 08, 10

No correctness issues found. The code is well-structured with proper error handling, resource cleanup on failure paths, and appropriate validation.

### General observations

The series is well organized with clean separation of concerns across patches. Error paths in `mlx5_set_queue_rate_limit()` properly clean up the temporary `new_rate_limit` PP allocation on modify-SQ failure, and `mlx5_txq_release()` correctly frees the rate limit context on queue teardown. The `rte_errno = -ret` pattern after `mlx5_devx_cmd_modify_sq()` is consistent with how other mlx5 callers handle that function's negative errno return convention.

^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v5 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing
  2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
                       ` (10 preceding siblings ...)
  2026-03-23 23:09     ` [PATCH v4 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
@ 2026-03-24 16:50     ` Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
                         ` (10 more replies)
  11 siblings, 11 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin


This series adds per-queue Tx data-rate limiting to the mlx5 PMD using
hardware packet pacing (PP), and a symmetric rte_eth_get_queue_rate_limit()
ethdev API to read back the configured rate.

Each Tx queue can be assigned an individual rate (in Mbps) at runtime via
rte_eth_set_queue_rate_limit(). The mlx5 implementation allocates a PP
context per queue from the HW rate table, programs the PP index into the
SQ via modify_sq, and relies on the kernel to share identical rates
across PP contexts to conserve table entries. A PMD-specific API exposes
per-queue PP diagnostics and rate table capacity.

Patch breakdown:

  01/10 doc/nics/mlx5: fix stale packet pacing documentation
  02/10 common/mlx5: query packet pacing rate table capabilities
  03/10 common/mlx5: extend SQ modify to support rate limit update
  04/10 net/mlx5: add per-queue packet pacing infrastructure
  05/10 net/mlx5: support per-queue rate limiting
  06/10 net/mlx5: add burst pacing devargs
  07/10 net/mlx5: add testpmd command to query per-queue rate limit
  08/10 ethdev: add getter for per-queue Tx rate limit
  09/10 net/mlx5: implement per-queue Tx rate limit getter
  10/10 net/mlx5: add rate table capacity query API

Release notes for the new ethdev API and mlx5 per-queue rate
limiting can be added to a release_26_07.rst once the file is
created at the start of the 26.07 development cycle.

Changes since v4:

  Addressed review feedback from Stephen Hemminger and added
  Acked-by from Viacheslav Ovsiienko on patches 03-10.

  Patch 05/10 (set rate):
  - Add rate_kbps > UINT32_MAX bounds check before truncating to
    the PRM rate_limit field, preventing silent overflow when HW
    reports no maximum rate

  Patch 07/10 (testpmd + PMD query):
  - Add NULL check on (*priv->txqs)[queue_id] before container_of()
    in rte_pmd_mlx5_txq_rate_limit_query(), matching the pattern
    in the setter

  Patches 03-10:
  - Added Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>

Changes since v3:

  Addressed review feedback from Stephen and Slava (nvidia/Mellanox).

  Patch 02/10 (query caps):
  - Added Acked-by: Viacheslav Ovsiienko

  Patch 03/10 (SQ modify):
  - Define MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX
    enum in mlx5_prm.h, following the MLX5_MODIFY_RQ_IN_MODIFY_xxx pattern
  - Use read-modify-write for modify_bitmask (MLX5_GET64 | OR | MLX5_SET64)
    instead of direct overwrite, for forward compatibility

  Patch 04/10 (PP infrastructure):
  - Rename struct member and parameters from "rl" to "rate_limit"
    for consistency with codebase naming style
  - Replace MLX5_ASSERT(rate_mbps > 0) with runtime check returning
    -EINVAL in non-debug builds
  - Move mlx5_txq_free_pp_rate_limit() to after txq_obj_release() in
    mlx5_txq_release() — destroy the SQ before freeing the PP index
    it references
  - Clarify commit message: distinct PP handle per queue (for cleanup)
    but kernel shares the same pp_id for identical rate parameters

  Patch 05/10 (set rate):
  - Fix obj->sq vs obj->sq_obj.sq: use obj->sq_obj.sq from the start
    for non-hairpin queues (was introduced in patch 07 in v3, breaking
    git bisect)
  - Move all variable declarations to block top (sq_devx,
    new_rate_limit)
  - Add queue state check: reject set_queue_rate_limit if queue is not
    STARTED (SQ not in RDY state)
  - Update mlx5 feature matrix: Rate limitation = Y
  - Add Per-Queue Tx Rate Limiting documentation section in mlx5.rst
    covering DevX requirement, hardware support, rate table sharing,
    and testpmd usage

  Patch 06/10 (burst devargs):
  - Remove burst_upper_bound/typical_packet_size from Clock Queue
    path (mlx5_txpp_alloc_pp_index) — Clock Queue uses WQE rate
    pacing and does not need these parameters
  - Update commit message and documentation accordingly

  Patch 07/10 (testpmd + PMD query):
  - sq_obj.sq accessor change moved to patch 05 (see above)
  - sq_devx declaration moved to block top

  Patch 08/10 (ethdev getter) — split from v3 patch 08:
  - Split into ethdev API (this patch) and mlx5 driver (patch 09)
  - Add rte_eth_trace_get_queue_rate_limit() trace point matching
    the existing setter pattern

  Patch 09/10 — NEW (was part of v3 patch 08):
  - mlx5 driver implementation of get_queue_rate_limit callback,
    split out per Slava's request

  Patch 10/10 (rate table query):
  - Rename struct field "used" to "port_used" to clarify per-port
    scope
  - Strengthen Doxygen: rate table is a global shared HW resource
    (firmware, kernel, other DPDK instances may consume entries);
    port_used is a lower bound
  - Document PP sharing behavior with flags=0
  - Note that applications should aggregate across ports for
    device-wide visibility

Changes since v2:

  Addressed review feedback from Stephen Hemminger:

  Patch 04: cleaned redundant cast parentheses on (struct mlx5dv_pp *)
  Patch 04: consolidated dv_alloc_pp call onto one line
  Patch 05+08: removed redundant queue_idx bounds checks from driver
    callbacks — ethdev layer is the single validation point
  Patch 07: added generic testpmd command: show port <id> queue <id> rate
  Patch 08+10: removed release notes from release_26_03.rst (targets 26.07)
  Patch 10: use MLX5_MEM_SYS | MLX5_MEM_ZERO for heap allocation
  Patch 10: consolidated packet_pacing_rate_table_size onto one line

Changes since v1:

  Patch 01: Acked-by Viacheslav Ovsiienko
  Patch 04: rate bounds validation, uint64_t overflow fix, remove
    early PP free
  Patch 05: PP leak fix (temp struct pattern), rte_errno in error paths
  Patch 07: inverted rte_eth_tx_queue_is_valid() check
  Patch 10: stack array replaced with heap, per-port scope documented

Testing:

  - Build: GCC, no warnings
  - Hardware: ConnectX-6 Dx
  - DevX path (default): set/get/disable rate limiting verified
  - Verbs path (dv_flow_en=0): returns -EINVAL cleanly (SQ DevX
    object not available), no crash

Vincent Jardin (10):
  doc/nics/mlx5: fix stale packet pacing documentation
  common/mlx5: query packet pacing rate table capabilities
  common/mlx5: extend SQ modify to support rate limit update
  net/mlx5: add per-queue packet pacing infrastructure
  net/mlx5: support per-queue rate limiting
  net/mlx5: add burst pacing devargs
  net/mlx5: add testpmd command to query per-queue rate limit
  ethdev: add getter for per-queue Tx rate limit
  net/mlx5: implement per-queue Tx rate limit getter
  net/mlx5: add rate table capacity query API

Vincent Jardin (10):
  doc/nics/mlx5: fix stale packet pacing documentation
  common/mlx5: query packet pacing rate table capabilities
  common/mlx5: extend SQ modify to support rate limit update
  net/mlx5: add per-queue packet pacing infrastructure
  net/mlx5: support per-queue rate limiting
  net/mlx5: add burst pacing devargs
  net/mlx5: add testpmd command to query per-queue rate limit
  ethdev: add getter for per-queue Tx rate limit
  net/mlx5: implement per-queue Tx rate limit getter
  net/mlx5: add rate table capacity query API

 app/test-pmd/cmdline.c               |  69 ++++++++++
 doc/guides/nics/features/mlx5.ini    |   1 +
 doc/guides/nics/mlx5.rst             | 180 ++++++++++++++++++++++-----
 drivers/common/mlx5/mlx5_devx_cmds.c |  23 ++++
 drivers/common/mlx5/mlx5_devx_cmds.h |  14 ++-
 drivers/common/mlx5/mlx5_prm.h       |   7 ++
 drivers/net/mlx5/mlx5.c              |  46 +++++++
 drivers/net/mlx5/mlx5.h              |  13 ++
 drivers/net/mlx5/mlx5_testpmd.c      |  93 ++++++++++++++
 drivers/net/mlx5/mlx5_tx.c           | 106 +++++++++++++++-
 drivers/net/mlx5/mlx5_tx.h           |   5 +
 drivers/net/mlx5/mlx5_txpp.c         |  90 ++++++++++++++
 drivers/net/mlx5/mlx5_txq.c          | 149 ++++++++++++++++++++++
 drivers/net/mlx5/rte_pmd_mlx5.h      |  74 +++++++++++
 lib/ethdev/ethdev_driver.h           |   7 ++
 lib/ethdev/ethdev_trace.h            |   9 ++
 lib/ethdev/ethdev_trace_points.c     |   3 +
 lib/ethdev/rte_ethdev.c              |  35 ++++++
 lib/ethdev/rte_ethdev.h              |  24 ++++
 19 files changed, 914 insertions(+), 33 deletions(-)

-- 
2.43.0


^ permalink raw reply	[flat|nested] 87+ messages in thread

* [PATCH v5 01/10] doc/nics/mlx5: fix stale packet pacing documentation
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
@ 2026-03-24 16:50       ` Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
                         ` (9 subsequent siblings)
  10 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

The Tx Scheduling section incorrectly stated that timestamps can only
be put on the first packet in a burst. The driver actually checks every
packet's ol_flags for the timestamp dynamic flag and inserts a dedicated
WAIT WQE per timestamped packet. The eMPW path also breaks batches when
a timestamped packet is encountered.

Additionally, the ConnectX-7+ wait-on-time capability was only briefly
mentioned in the tx_pp parameter section with no explanation of how it
differs from the ConnectX-6 Dx Clock Queue approach.

This patch:
- Removes the stale first-packet-only limitation
- Documents both scheduling mechanisms (ConnectX-6 Dx Clock Queue and
  ConnectX-7+ wait-on-time) with separate requirements tables
- Clarifies that tx_pp is specific to ConnectX-6 Dx
- Fixes tx_skew applicability to cover both hardware generations
- Updates the Send Scheduling Counters intro to reflect that timestamp
  validation counters also apply to ConnectX-7+ wait-on-time mode

Fixes: 8f848f32fc24 ("net/mlx5: introduce send scheduling devargs")

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/mlx5.rst | 109 ++++++++++++++++++++++++++++-----------
 1 file changed, 78 insertions(+), 31 deletions(-)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 9dcc93cc23..6bb8c07353 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -553,27 +553,32 @@ for an additional list of options shared with other mlx5 drivers.
 
 - ``tx_pp`` parameter [int]
 
+  This parameter applies to **ConnectX-6 Dx** only.
   If a nonzero value is specified the driver creates all necessary internal
-  objects to provide accurate packet send scheduling on mbuf timestamps.
+  objects (Clock Queue and Rearm Queue) to provide accurate packet send
+  scheduling on mbuf timestamps using a cross-channel approach.
   The positive value specifies the scheduling granularity in nanoseconds,
   the packet send will be accurate up to specified digits. The allowed range is
   from 500 to 1 million of nanoseconds. The negative value specifies the module
   of granularity and engages the special test mode the check the schedule rate.
   By default (if the ``tx_pp`` is not specified) send scheduling on timestamps
-  feature is disabled.
+  feature is disabled on ConnectX-6 Dx.
 
-  Starting with ConnectX-7 the capability to schedule traffic directly
-  on timestamp specified in descriptor is provided,
-  no extra objects are needed anymore and scheduling capability
-  is advertised and handled regardless ``tx_pp`` parameter presence.
+  Starting with **ConnectX-7** the hardware provides a native wait-on-time
+  capability that inserts the scheduling delay directly in the WQE descriptor.
+  No Clock Queue or Rearm Queue is needed and the ``tx_pp`` parameter is not
+  required. The driver automatically advertises send scheduling support when
+  the HCA wait-on-time capability is detected. The ``tx_skew`` parameter can
+  still be used on ConnectX-7 and above to compensate for wire delay.
 
 - ``tx_skew`` parameter [int]
 
   The parameter adjusts the send packet scheduling on timestamps and represents
   the average delay between beginning of the transmitting descriptor processing
   by the hardware and appearance of actual packet data on the wire. The value
-  should be provided in nanoseconds and is valid only if ``tx_pp`` parameter is
-  specified. The default value is zero.
+  should be provided in nanoseconds and applies to both ConnectX-6 Dx
+  (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
+  The default value is zero.
 
 - ``tx_vec_en`` parameter [int]
 
@@ -883,9 +888,13 @@ Send Scheduling Counters
 
 The mlx5 PMD provides a comprehensive set of counters designed for
 debugging and diagnostics related to packet scheduling during transmission.
-These counters are applicable only if the port was configured with the ``tx_pp`` devarg
-and reflect the status of the PMD scheduling infrastructure
-based on Clock and Rearm Queues, used as a workaround on ConnectX-6 DX NICs.
+The first group of counters (prefixed ``tx_pp_``) reflects the status of the
+Clock Queue and Rearm Queue infrastructure used on ConnectX-6 Dx and is
+applicable only if the port was configured with the ``tx_pp`` devarg.
+The timestamp validation counters
+(``tx_pp_timestamp_past_errors``, ``tx_pp_timestamp_future_errors``,
+``tx_pp_timestamp_order_errors``) are also reported on ConnectX-7 and above
+in wait-on-time mode, without requiring ``tx_pp``.
 
 ``tx_pp_missed_interrupt_errors``
   Indicates that the Rearm Queue interrupt was not serviced on time.
@@ -1969,31 +1978,54 @@ Limitations
 Tx Scheduling
 ~~~~~~~~~~~~~
 
-When PMD sees the ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on the packet
-being sent it tries to synchronize the time of packet appearing on
-the wire with the specified packet timestamp. If the specified one
-is in the past it should be ignored, if one is in the distant future
-it should be capped with some reasonable value (in range of seconds).
-These specific cases ("too late" and "distant future") can be optionally
-reported via device xstats to assist applications to detect the
-time-related problems.
-
-The timestamp upper "too-distant-future" limit
-at the moment of invoking the Tx burst routine
-can be estimated as ``tx_pp`` option (in nanoseconds) multiplied by 2^23.
+When the PMD sees ``RTE_MBUF_DYNFLAG_TX_TIMESTAMP_NAME`` set on a packet
+being sent it inserts a dedicated WAIT WQE to synchronize the time of the
+packet appearing on the wire with the specified timestamp. Every packet
+in a burst that carries the timestamp dynamic flag is individually
+scheduled -- there is no restriction to the first packet only.
+
+If the specified timestamp is in the past, the packet is sent immediately.
+If it is in the distant future it should be capped with some reasonable
+value (in range of seconds). These specific cases ("too late" and
+"distant future") can be optionally reported via device xstats to assist
+applications to detect time-related problems.
+
+The eMPW (enhanced Multi-Packet Write) data path automatically breaks
+the batch when a timestamped packet is encountered, ensuring each
+scheduled packet gets its own WAIT WQE.
+
+Two hardware mechanisms are supported:
+
+**ConnectX-6 Dx -- Clock Queue (cross-channel)**
+   The driver creates a Clock Queue and a Rearm Queue that together
+   provide a time reference for scheduling. This mode requires the
+   :ref:`tx_pp <mlx5_tx_pp_param>` devarg. The timestamp upper
+   "too-distant-future" limit at the moment of invoking the Tx burst
+   routine can be estimated as ``tx_pp`` (in nanoseconds) multiplied
+   by 2^23.
+
+**ConnectX-7 and above -- wait-on-time**
+   The hardware supports placing the scheduling delay directly inside
+   the WQE descriptor. No Clock Queue or Rearm Queue is needed and the
+   ``tx_pp`` devarg is **not** required. The driver automatically
+   advertises send scheduling support when the HCA wait-on-time
+   capability is detected.
+
 Please note, for the testpmd txonly mode,
 the limit is deduced from the expression::
 
    (n_tx_descriptors / burst_size + 1) * inter_burst_gap
 
-There is no any packet reordering according timestamps is supposed,
-neither within packet burst, nor between packets, it is an entirely
-application responsibility to generate packets and its timestamps
-in desired order.
+There is no packet reordering according to timestamps,
+neither within a packet burst, nor between packets. It is entirely the
+application's responsibility to generate packets and their timestamps
+in the desired order.
 
 Requirements
 ^^^^^^^^^^^^
 
+ConnectX-6 Dx (Clock Queue mode):
+
 =========  =============
 Minimum    Version
 =========  =============
@@ -2005,20 +2037,35 @@ rdma-core
 DPDK       20.08
 =========  =============
 
+ConnectX-7 and above (wait-on-time mode):
+
+=========  =============
+Minimum    Version
+=========  =============
+hardware   ConnectX-7
+=========  =============
+
 Firmware configuration
 ^^^^^^^^^^^^^^^^^^^^^^
 
 Runtime configuration
 ^^^^^^^^^^^^^^^^^^^^^
 
-To provide the packet send scheduling on mbuf timestamps the ``tx_pp``
-parameter should be specified.
+**ConnectX-6 Dx**: the :ref:`tx_pp <mlx5_tx_pp_param>` parameter must be
+specified to enable send scheduling on mbuf timestamps.
+
+**ConnectX-7+**: no devarg is required. Send scheduling is automatically
+enabled when the HCA reports the wait-on-time capability.
+
+On both hardware generations the ``tx_skew`` parameter can be used to
+compensate for the delay between descriptor processing and actual wire
+time.
 
 Limitations
 ^^^^^^^^^^^
 
-#. The timestamps can be put only in the first packet
-   in the burst providing the entire burst scheduling.
+#. On ConnectX-6 Dx (Clock Queue mode) timestamps too far in the future
+   are capped (see the ``tx_pp`` x 2^23 limit above).
 
 
 .. _mlx5_tx_inline:
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v5 02/10] common/mlx5: query packet pacing rate table capabilities
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
@ 2026-03-24 16:50       ` Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
                         ` (8 subsequent siblings)
  10 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Query additional QoS packet pacing capabilities from HCA attributes:
- packet_pacing_burst_bound: HW supports burst_upper_bound parameter
- packet_pacing_typical_size: HW supports typical_packet_size parameter
- packet_pacing_max_rate / packet_pacing_min_rate: rate range in kbps
- packet_pacing_rate_table_size: number of HW rate table entries

These capabilities are needed by the upcoming per-queue rate limiting
feature to validate devarg values and report HW limits.

Supported hardware:
- ConnectX-6 Dx and later (different boards expose different subsets)
- ConnectX-5 reports packet_pacing but not all extended fields
- ConnectX-7/8 report the full capability set
- BlueField-2 and later DPUs also report these capabilities

Not supported:
- ConnectX-4 Lx and earlier (no packet_pacing capability at all)
- ConnectX-5 Ex may not report burst_bound or typical_size

Signed-off-by: Vincent Jardin <vjardin@free.fr>
Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 15 +++++++++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 11 ++++++++++-
 2 files changed, 25 insertions(+), 1 deletion(-)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index d12ebf8487..8f53303fa7 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -1244,6 +1244,21 @@ mlx5_devx_cmd_query_hca_attr(void *ctx,
 				MLX5_GET(qos_cap, hcattr, packet_pacing);
 		attr->qos.wqe_rate_pp =
 				MLX5_GET(qos_cap, hcattr, wqe_rate_pp);
+		attr->qos.packet_pacing_burst_bound =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_burst_bound);
+		attr->qos.packet_pacing_typical_size =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_typical_size);
+		attr->qos.packet_pacing_max_rate =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_max_rate);
+		attr->qos.packet_pacing_min_rate =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_min_rate);
+		attr->qos.packet_pacing_rate_table_size =
+				MLX5_GET(qos_cap, hcattr,
+					packet_pacing_rate_table_size);
 		if (attr->qos.flow_meter_aso_sup) {
 			attr->qos.log_meter_aso_granularity =
 				MLX5_GET(qos_cap, hcattr,
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index da50fc686c..930ae2c072 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -67,7 +67,16 @@ struct mlx5_hca_qos_attr {
 	/* Power of the maximum allocation granularity Object. */
 	uint32_t log_max_num_meter_aso:5;
 	/* Power of the maximum number of supported objects. */
-
+	uint32_t packet_pacing_burst_bound:1;
+	/* HW supports burst_upper_bound PP parameter. */
+	uint32_t packet_pacing_typical_size:1;
+	/* HW supports typical_packet_size PP parameter. */
+	uint32_t packet_pacing_max_rate;
+	/* Maximum supported pacing rate in kbps. */
+	uint32_t packet_pacing_min_rate;
+	/* Minimum supported pacing rate in kbps. */
+	uint16_t packet_pacing_rate_table_size;
+	/* Number of entries in the HW rate table. */
 };
 
 struct mlx5_hca_vdpa_attr {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v5 03/10] common/mlx5: extend SQ modify to support rate limit update
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
@ 2026-03-24 16:50       ` Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
                         ` (7 subsequent siblings)
  10 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Add rl_update and packet_pacing_rate_limit_index fields to
mlx5_devx_modify_sq_attr. When rl_update is set, the modify SQ
command sets modify_bitmask bit 0 and writes the PP index into
the SQ context, allowing dynamic rate changes on a live RDY SQ
without teardown.

modify_sq_in.modify_bitmask[0x40] bit 0 controls the
packet_pacing_rate_limit_index.

Supported hardware:
- ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
- ConnectX-7/8: same SQ context field, also supports wait-on-time
- BlueField-2/3: same modify_sq command support

Not supported:
- ConnectX-5: supports packet_pacing but only at SQ creation time,
  dynamic modify_bitmask update may not be supported on all FW
- ConnectX-4 Lx and earlier: no packet_pacing support

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/common/mlx5/mlx5_devx_cmds.c | 8 ++++++++
 drivers/common/mlx5/mlx5_devx_cmds.h | 3 +++
 drivers/common/mlx5/mlx5_prm.h       | 7 +++++++
 3 files changed, 18 insertions(+)

diff --git a/drivers/common/mlx5/mlx5_devx_cmds.c b/drivers/common/mlx5/mlx5_devx_cmds.c
index 8f53303fa7..102f84fd5c 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.c
+++ b/drivers/common/mlx5/mlx5_devx_cmds.c
@@ -2129,6 +2129,14 @@ mlx5_devx_cmd_modify_sq(struct mlx5_devx_obj *sq,
 	MLX5_SET(sqc, sq_ctx, state, sq_attr->state);
 	MLX5_SET(sqc, sq_ctx, hairpin_peer_rq, sq_attr->hairpin_peer_rq);
 	MLX5_SET(sqc, sq_ctx, hairpin_peer_vhca, sq_attr->hairpin_peer_vhca);
+	if (sq_attr->rl_update) {
+		uint64_t msk = MLX5_GET64(modify_sq_in, in, modify_bitmask);
+
+		msk |= MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX;
+		MLX5_SET64(modify_sq_in, in, modify_bitmask, msk);
+		MLX5_SET(sqc, sq_ctx, packet_pacing_rate_limit_index,
+			 sq_attr->packet_pacing_rate_limit_index);
+	}
 	ret = mlx5_glue->devx_obj_modify(sq->obj, in, sizeof(in),
 					 out, sizeof(out));
 	if (ret) {
diff --git a/drivers/common/mlx5/mlx5_devx_cmds.h b/drivers/common/mlx5/mlx5_devx_cmds.h
index 930ae2c072..82d949972b 100644
--- a/drivers/common/mlx5/mlx5_devx_cmds.h
+++ b/drivers/common/mlx5/mlx5_devx_cmds.h
@@ -519,6 +519,9 @@ struct mlx5_devx_modify_sq_attr {
 	uint32_t state:4;
 	uint32_t hairpin_peer_rq:24;
 	uint32_t hairpin_peer_vhca:16;
+	uint32_t rl_update:1;
+	/* Set to update packet_pacing_rate_limit_index on a live SQ. */
+	uint32_t packet_pacing_rate_limit_index:16;
 };
 
 
diff --git a/drivers/common/mlx5/mlx5_prm.h b/drivers/common/mlx5/mlx5_prm.h
index ba33336e58..597d06362f 100644
--- a/drivers/common/mlx5/mlx5_prm.h
+++ b/drivers/common/mlx5/mlx5_prm.h
@@ -2985,6 +2985,7 @@ struct mlx5_ifc_create_tis_in_bits {
 	struct mlx5_ifc_tisc_bits ctx;
 };
 
+/* Bits for modify_rq_in.modify_bitmask (Receive Queue). */
 enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_WQ_LWM = 1ULL << 0,
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_VSD = 1ULL << 1,
@@ -2992,6 +2993,12 @@ enum {
 	MLX5_MODIFY_RQ_IN_MODIFY_BITMASK_RQ_COUNTER_SET_ID = 1ULL << 3,
 };
 
+/* Bits for modify_sq_in.modify_bitmask (Send Queue). */
+enum {
+	MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX =
+		1ULL << 0,
+};
+
 struct mlx5_ifc_modify_rq_in_bits {
 	u8 opcode[0x10];
 	u8 uid[0x10];
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v5 04/10] net/mlx5: add per-queue packet pacing infrastructure
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
                         ` (2 preceding siblings ...)
  2026-03-24 16:50       ` [PATCH v5 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
@ 2026-03-24 16:50       ` Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
                         ` (6 subsequent siblings)
  10 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Add mlx5_txq_rate_limit structure and alloc/free helpers for
per-queue data-rate packet pacing. Each Tx queue can now hold
its own PP (Packet Pacing) context allocated via mlx5dv_pp_alloc()
with MLX5_DATA_RATE mode.

mlx5_txq_alloc_pp_rate_limit() converts Mbps to kbps for the PRM
rate_limit field and allocates a PP context from the HW rate table.
mlx5_txq_free_pp_rate_limit() releases it.

PP allocation uses shared mode (flags=0). Each dv_alloc_pp() call
returns a distinct PP handle (needed for per-queue dv_free_pp()
cleanup), but the kernel mlx5 driver internally maps identical
rate parameters to the same HW rate table entry (same pp_id) with
internal refcounting. This avoids exhausting the rate table
(typically 128 entries on ConnectX-6 Dx) when many queues share
the same rate.

The existing Clock Queue path (sh->txpp.pp / sh->txpp.pp_id) is
untouched — it uses MLX5_WQE_RATE for per-packet scheduling with
a dedicated index, while per-queue rate limiting uses MLX5_DATA_RATE.

PP index cleanup is added to mlx5_txq_release() to prevent leaks
when queues are destroyed.

Supported hardware:
- ConnectX-6 Dx: per-SQ rate via packet_pacing_rate_limit_index
- ConnectX-7/8: same mechanism, plus wait-on-time coexistence
- BlueField-2/3: same PP allocation support

Not supported:
- ConnectX-5: packet_pacing exists but MLX5_DATA_RATE mode may
  not be available on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.h      | 11 +++++
 drivers/net/mlx5/mlx5_tx.h   |  1 +
 drivers/net/mlx5/mlx5_txpp.c | 78 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_txq.c  |  1 +
 4 files changed, 91 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 4da184eb47..33628d7987 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -1297,6 +1297,13 @@ struct mlx5_txpp_ts {
 	RTE_ATOMIC(uint64_t) ts;
 };
 
+/* Per-queue rate limit tracking. */
+struct mlx5_txq_rate_limit {
+	void *pp;		/* Packet pacing context from dv_alloc_pp. */
+	uint16_t pp_id;		/* Packet pacing index. */
+	uint32_t rate_mbps;	/* Current rate in Mbps, 0 = disabled. */
+};
+
 /* Tx packet pacing structure. */
 struct mlx5_dev_txpp {
 	pthread_mutex_t mutex; /* Pacing create/destroy mutex. */
@@ -2630,6 +2637,10 @@ int mlx5_txpp_xstats_get_names(struct rte_eth_dev *dev,
 void mlx5_txpp_interrupt_handler(void *cb_arg);
 int mlx5_txpp_map_hca_bar(struct rte_eth_dev *dev);
 void mlx5_txpp_unmap_hca_bar(struct rte_eth_dev *dev);
+int mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
+				 struct mlx5_txq_rate_limit *rate_limit,
+				 uint32_t rate_mbps);
+void mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rate_limit);
 
 /* mlx5_rxtx.c */
 
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 0134a2e003..51f330454a 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -192,6 +192,7 @@ struct mlx5_txq_ctrl {
 	uint16_t dump_file_n; /* Number of dump files. */
 	struct rte_eth_hairpin_conf hairpin_conf; /* Hairpin configuration. */
 	uint32_t hairpin_status; /* Hairpin binding status. */
+	struct mlx5_txq_rate_limit rate_limit; /* Per-queue rate limit. */
 	struct mlx5_txq_data txq; /* Data path structure. */
 	/* Must be the last field in the structure, contains elts[]. */
 };
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 0e99b58bde..e34e996e9b 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -128,6 +128,84 @@ mlx5_txpp_alloc_pp_index(struct mlx5_dev_ctx_shared *sh)
 #endif
 }
 
+/* Free a per-queue packet pacing index. */
+void
+mlx5_txq_free_pp_rate_limit(struct mlx5_txq_rate_limit *rate_limit)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	if (rate_limit->pp) {
+		mlx5_glue->dv_free_pp(rate_limit->pp);
+		rate_limit->pp = NULL;
+		rate_limit->pp_id = 0;
+		rate_limit->rate_mbps = 0;
+	}
+#else
+	RTE_SET_USED(rate_limit);
+#endif
+}
+
+/* Allocate a per-queue packet pacing index for data-rate limiting. */
+int
+mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
+			     struct mlx5_txq_rate_limit *rate_limit,
+			     uint32_t rate_mbps)
+{
+#ifdef HAVE_MLX5DV_PP_ALLOC
+	uint32_t pp[MLX5_ST_SZ_DW(set_pp_rate_limit_context)];
+	uint64_t rate_kbps;
+	struct mlx5_hca_qos_attr *qos = &sh->cdev->config.hca_attr.qos;
+
+	if (rate_mbps == 0) {
+		DRV_LOG(ERR, "Rate must be greater than zero.");
+		rte_errno = EINVAL;
+		return -EINVAL;
+	}
+	rate_kbps = (uint64_t)rate_mbps * 1000;
+	if (qos->packet_pacing_min_rate && rate_kbps < qos->packet_pacing_min_rate) {
+		DRV_LOG(ERR, "Rate %u Mbps below HW minimum (%u kbps).",
+			rate_mbps, qos->packet_pacing_min_rate);
+		rte_errno = ERANGE;
+		return -ERANGE;
+	}
+	if (qos->packet_pacing_max_rate && rate_kbps > qos->packet_pacing_max_rate) {
+		DRV_LOG(ERR, "Rate %u Mbps exceeds HW maximum (%u kbps).",
+			rate_mbps, qos->packet_pacing_max_rate);
+		rte_errno = ERANGE;
+		return -ERANGE;
+	}
+	memset(&pp, 0, sizeof(pp));
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, (uint32_t)rate_kbps);
+	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
+	rate_limit->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp),
+						 &pp, 0);
+	if (rate_limit->pp == NULL) {
+		DRV_LOG(ERR, "Failed to allocate PP index for rate %u Mbps.",
+			rate_mbps);
+		rte_errno = errno;
+		return -errno;
+	}
+	rate_limit->pp_id = ((struct mlx5dv_pp *)rate_limit->pp)->index;
+	if (!rate_limit->pp_id) {
+		DRV_LOG(ERR, "Zero PP index allocated for rate %u Mbps.",
+			rate_mbps);
+		mlx5_txq_free_pp_rate_limit(rate_limit);
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	rate_limit->rate_mbps = rate_mbps;
+	DRV_LOG(DEBUG, "Allocated PP index %u for rate %u Mbps.",
+		rate_limit->pp_id, rate_mbps);
+	return 0;
+#else
+	RTE_SET_USED(sh);
+	RTE_SET_USED(rate_limit);
+	RTE_SET_USED(rate_mbps);
+	DRV_LOG(ERR, "Per-queue rate limit requires rdma-core PP support.");
+	rte_errno = ENOTSUP;
+	return -ENOTSUP;
+#endif
+}
+
 static void
 mlx5_txpp_destroy_send_queue(struct mlx5_txpp_wq *wq)
 {
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 9275efb58e..3356c89758 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1344,6 +1344,7 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
 		mlx5_free(txq_ctrl->obj);
 		txq_ctrl->obj = NULL;
 	}
+	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
 	if (!txq_ctrl->is_hairpin) {
 		if (txq_ctrl->txq.fcqs) {
 			mlx5_free(txq_ctrl->txq.fcqs);
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v5 05/10] net/mlx5: support per-queue rate limiting
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
                         ` (3 preceding siblings ...)
  2026-03-24 16:50       ` [PATCH v5 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
@ 2026-03-24 16:50       ` Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
                         ` (5 subsequent siblings)
  10 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Wire rte_eth_set_queue_rate_limit() to the mlx5 PMD. The callback
allocates a per-queue PP index with the requested data rate, then
modifies the live SQ via modify_bitmask bit 0 to apply the new
packet_pacing_rate_limit_index — no queue teardown required.

Setting tx_rate=0 clears the PP index on the SQ and frees it.

Capability check uses hca_attr.qos.packet_pacing directly (not
dev_cap.txpp_en which requires Clock Queue prerequisites). This
allows per-queue rate limiting without the tx_pp devarg.

The callback rejects hairpin queues and queues whose SQ is not
yet created.

testpmd usage (no testpmd changes needed):
  set port 0 queue 0 rate 1000
  set port 0 queue 1 rate 5000
  set port 0 queue 0 rate 0     # disable

Supported hardware:
- ConnectX-6 Dx: full support, per-SQ rate via HW rate table
- ConnectX-7/8: full support, coexists with wait-on-time scheduling
- BlueField-2/3: full support as DPU rep ports

Not supported:
- ConnectX-5: packet_pacing exists but dynamic SQ modify may not
  work on all firmware versions
- ConnectX-4 Lx and earlier: no packet_pacing capability

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/features/mlx5.ini |   1 +
 doc/guides/nics/mlx5.rst          |  54 ++++++++++++++
 drivers/net/mlx5/mlx5.c           |   2 +
 drivers/net/mlx5/mlx5_tx.h        |   2 +
 drivers/net/mlx5/mlx5_txpp.c      |   6 ++
 drivers/net/mlx5/mlx5_txq.c       | 118 ++++++++++++++++++++++++++++++
 6 files changed, 183 insertions(+)

diff --git a/doc/guides/nics/features/mlx5.ini b/doc/guides/nics/features/mlx5.ini
index 4f9c4c309b..3b3eda28b8 100644
--- a/doc/guides/nics/features/mlx5.ini
+++ b/doc/guides/nics/features/mlx5.ini
@@ -30,6 +30,7 @@ Inner RSS            = Y
 SR-IOV               = Y
 VLAN filter          = Y
 Flow control         = Y
+Rate limitation      = Y
 CRC offload          = Y
 VLAN offload         = Y
 L3 checksum offload  = Y
diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index 6bb8c07353..c72a60f084 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -580,6 +580,60 @@ for an additional list of options shared with other mlx5 drivers.
   (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
   The default value is zero.
 
+.. _mlx5_per_queue_rate_limit:
+
+Per-Queue Tx Rate Limiting
+~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+The mlx5 PMD supports per-queue Tx rate limiting via the standard ethdev
+API ``rte_eth_set_queue_rate_limit()`` and ``rte_eth_get_queue_rate_limit()``.
+
+This feature uses the hardware packet pacing mechanism to enforce a data
+rate on individual TX queues without tearing down the queue. The rate is
+specified in Mbps.
+
+**Requirements:**
+
+- ConnectX-6 Dx or later with ``packet_pacing`` HCA capability.
+- The DevX path must be used (default). The legacy Verbs path
+  (``dv_flow_en=0``) does not support dynamic SQ modification and
+  returns ``-EINVAL``.
+- The queue must be started (SQ in RDY state) before setting a rate.
+
+**Supported hardware:**
+
+- ConnectX-6 Dx: per-SQ rate via HW rate table.
+- ConnectX-7/8: full support, coexists with wait-on-time scheduling.
+- BlueField-2/3: full support as DPU rep ports.
+
+**Not supported:**
+
+- ConnectX-5: ``packet_pacing`` exists but dynamic SQ modify may not
+  work on all firmware versions.
+- ConnectX-4 Lx and earlier: no ``packet_pacing`` capability.
+
+**Rate table sharing:**
+
+The hardware rate table has a limited number of entries (typically 128 on
+ConnectX-6 Dx). When multiple queues are configured with identical rate
+parameters, the kernel mlx5 driver shares a single rate table entry across
+them. Each queue still has its own independent SQ and enforces the rate
+independently — queues are never merged. The rate cap applies per-queue:
+if two queues share the same 1000 Mbps entry, each can send up to
+1000 Mbps independently, they do not share a combined budget.
+
+This sharing is transparent and only affects table capacity: 128 entries
+can serve thousands of queues as long as many use the same rate. Queues
+with different rates consume separate entries.
+
+**Usage with testpmd:**
+
+.. code-block:: console
+
+   testpmd> set port 0 queue 0 rate 1000
+   testpmd> show port 0 queue 0 rate
+   testpmd> set port 0 queue 0 rate 0
+
 - ``tx_vec_en`` parameter [int]
 
   A nonzero value enables Tx vector with ConnectX-5 NICs and above.
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e795948187..e718f0fa8c 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2621,6 +2621,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 	.get_restore_flags = mlx5_get_restore_flags,
+	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2714,6 +2715,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.count_aggr_ports = mlx5_count_aggr_ports,
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.get_restore_flags = mlx5_get_restore_flags,
+	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 51f330454a..975ff57acd 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -222,6 +222,8 @@ struct mlx5_txq_ctrl *mlx5_txq_get(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
+int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint32_t tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index e34e996e9b..23d18b6116 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -173,6 +173,12 @@ mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
 		rte_errno = ERANGE;
 		return -ERANGE;
 	}
+	if (rate_kbps > UINT32_MAX) {
+		DRV_LOG(ERR, "Rate %u Mbps overflows PRM rate_limit field.",
+			rate_mbps);
+		rte_errno = ERANGE;
+		return -ERANGE;
+	}
 	memset(&pp, 0, sizeof(pp));
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, (uint32_t)rate_kbps);
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index 3356c89758..ce08363ca9 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1363,6 +1363,124 @@ mlx5_txq_release(struct rte_eth_dev *dev, uint16_t idx)
 	return 0;
 }
 
+/**
+ * Set per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param tx_rate
+ *   TX rate in Mbps, 0 to disable rate limiting.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			  uint32_t tx_rate)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_dev_ctx_shared *sh = priv->sh;
+	struct mlx5_txq_ctrl *txq_ctrl;
+	struct mlx5_devx_obj *sq_devx;
+	struct mlx5_devx_modify_sq_attr sq_attr = { 0 };
+	struct mlx5_txq_rate_limit new_rate_limit = { 0 };
+	int ret;
+
+	if (!sh->cdev->config.hca_attr.qos.packet_pacing) {
+		DRV_LOG(ERR, "Port %u packet pacing not supported.",
+			dev->data->port_id);
+		rte_errno = ENOTSUP;
+		return -rte_errno;
+	}
+	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u not configured.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = container_of((*priv->txqs)[queue_idx],
+				struct mlx5_txq_ctrl, txq);
+	if (txq_ctrl->is_hairpin) {
+		DRV_LOG(ERR, "Port %u Tx queue %u is hairpin.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (txq_ctrl->obj == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u not initialized.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	/*
+	 * For non-hairpin queues the SQ DevX object lives in
+	 * obj->sq_obj.sq (used by DevX/HWS mode), while hairpin
+	 * queues use obj->sq directly. These are different members
+	 * of a union inside mlx5_txq_obj.
+	 */
+	sq_devx = txq_ctrl->obj->sq_obj.sq;
+	if (sq_devx == NULL) {
+		DRV_LOG(ERR, "Port %u Tx queue %u SQ not ready.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (dev->data->tx_queue_state[queue_idx] !=
+	    RTE_ETH_QUEUE_STATE_STARTED) {
+		DRV_LOG(ERR,
+			"Port %u Tx queue %u is not started, stop traffic before setting rate.",
+			dev->data->port_id, queue_idx);
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	if (tx_rate == 0) {
+		/* Disable rate limiting. */
+		if (txq_ctrl->rate_limit.pp_id == 0)
+			return 0; /* Already disabled. */
+		sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+		sq_attr.state = MLX5_SQC_STATE_RDY;
+		sq_attr.rl_update = 1;
+		sq_attr.packet_pacing_rate_limit_index = 0;
+		ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
+		if (ret) {
+			DRV_LOG(ERR,
+				"Port %u Tx queue %u failed to clear rate.",
+				dev->data->port_id, queue_idx);
+			rte_errno = -ret;
+			return ret;
+		}
+		mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
+		DRV_LOG(DEBUG, "Port %u Tx queue %u rate limit disabled.",
+			dev->data->port_id, queue_idx);
+		return 0;
+	}
+	/* Allocate a new PP index for the requested rate into a temp. */
+	ret = mlx5_txq_alloc_pp_rate_limit(sh, &new_rate_limit, tx_rate);
+	if (ret)
+		return ret;
+	/* Modify live SQ to use the new PP index. */
+	sq_attr.sq_state = MLX5_SQC_STATE_RDY;
+	sq_attr.state = MLX5_SQC_STATE_RDY;
+	sq_attr.rl_update = 1;
+	sq_attr.packet_pacing_rate_limit_index = new_rate_limit.pp_id;
+	ret = mlx5_devx_cmd_modify_sq(sq_devx, &sq_attr);
+	if (ret) {
+		DRV_LOG(ERR, "Port %u Tx queue %u failed to set rate %u Mbps.",
+			dev->data->port_id, queue_idx, tx_rate);
+		mlx5_txq_free_pp_rate_limit(&new_rate_limit);
+		rte_errno = -ret;
+		return ret;
+	}
+	/* SQ updated — release old PP context, install new one. */
+	mlx5_txq_free_pp_rate_limit(&txq_ctrl->rate_limit);
+	txq_ctrl->rate_limit = new_rate_limit;
+	DRV_LOG(DEBUG, "Port %u Tx queue %u rate set to %u Mbps (PP idx %u).",
+		dev->data->port_id, queue_idx, tx_rate, txq_ctrl->rate_limit.pp_id);
+	return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v5 06/10] net/mlx5: add burst pacing devargs
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
                         ` (4 preceding siblings ...)
  2026-03-24 16:50       ` [PATCH v5 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
@ 2026-03-24 16:50       ` Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
                         ` (4 subsequent siblings)
  10 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Expose burst_upper_bound and typical_packet_size from the PRM
set_pp_rate_limit_context as devargs:
- tx_burst_bound=<bytes>: max burst before rate evaluation kicks in
- tx_typical_pkt_sz=<bytes>: typical packet size for accuracy

These parameters apply to per-queue rate limiting
(rte_eth_set_queue_rate_limit) only. The Clock Queue path
(tx_pp devarg) uses WQE rate pacing and does not need these
parameters.

Values are validated against HCA capabilities
(packet_pacing_burst_bound and packet_pacing_typical_size).
If the HW does not support them, a warning is logged and the
value is silently zeroed. Test mode still overrides both values.

Shared context mismatch checks ensure all ports on the same
device use the same burst parameters.

Supported hardware:
- ConnectX-6 Dx: burst_upper_bound and typical_packet_size
  reported via packet_pacing_burst_bound / packet_pacing_typical_size
  QoS capability bits
- ConnectX-7/8: full support for both parameters
- BlueField-2/3: same capabilities as host-side ConnectX

Not supported:
- ConnectX-5: may not report burst_bound or typical_size caps
- ConnectX-4 Lx and earlier: no packet_pacing at all

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 doc/guides/nics/mlx5.rst     | 17 +++++++++++++++
 drivers/net/mlx5/mlx5.c      | 42 ++++++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5.h      |  2 ++
 drivers/net/mlx5/mlx5_txpp.c |  6 ++++++
 4 files changed, 67 insertions(+)

diff --git a/doc/guides/nics/mlx5.rst b/doc/guides/nics/mlx5.rst
index c72a60f084..d0b403dd5c 100644
--- a/doc/guides/nics/mlx5.rst
+++ b/doc/guides/nics/mlx5.rst
@@ -580,6 +580,23 @@ for an additional list of options shared with other mlx5 drivers.
   (with ``tx_pp``) and ConnectX-7+ (wait-on-time) scheduling modes.
   The default value is zero.
 
+- ``tx_burst_bound`` parameter [int]
+
+  Specifies the burst upper bound in bytes for packet pacing rate evaluation.
+  When set, the hardware considers this burst size when enforcing the configured
+  rate limit. Only effective when the HCA reports ``packet_pacing_burst_bound``
+  capability. Applies to per-queue rate limiting
+  (``rte_eth_set_queue_rate_limit()``). The Clock Queue path (``tx_pp``)
+  uses WQE rate pacing and does not use this parameter.
+  The default value is zero (hardware default).
+
+- ``tx_typical_pkt_sz`` parameter [int]
+
+  Specifies the typical packet size in bytes for packet pacing rate accuracy
+  improvement. Only effective when the HCA reports
+  ``packet_pacing_typical_size`` capability. Applies to per-queue rate
+  limiting only. The default value is zero (hardware default).
+
 .. _mlx5_per_queue_rate_limit:
 
 Per-Queue Tx Rate Limiting
diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index e718f0fa8c..7d08d7886b 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -119,6 +119,18 @@
  */
 #define MLX5_TX_SKEW "tx_skew"
 
+/*
+ * Device parameter to specify burst upper bound in bytes
+ * for packet pacing rate evaluation.
+ */
+#define MLX5_TX_BURST_BOUND "tx_burst_bound"
+
+/*
+ * Device parameter to specify typical packet size in bytes
+ * for packet pacing rate accuracy improvement.
+ */
+#define MLX5_TX_TYPICAL_PKT_SZ "tx_typical_pkt_sz"
+
 /*
  * Device parameter to enable hardware Tx vector.
  * Deprecated, ignored (no vectorized Tx routines anymore).
@@ -1407,6 +1419,10 @@ mlx5_dev_args_check_handler(const char *key, const char *val, void *opaque)
 		config->tx_pp = tmp;
 	} else if (strcmp(MLX5_TX_SKEW, key) == 0) {
 		config->tx_skew = tmp;
+	} else if (strcmp(MLX5_TX_BURST_BOUND, key) == 0) {
+		config->tx_burst_bound = tmp;
+	} else if (strcmp(MLX5_TX_TYPICAL_PKT_SZ, key) == 0) {
+		config->tx_typical_pkt_sz = tmp;
 	} else if (strcmp(MLX5_L3_VXLAN_EN, key) == 0) {
 		config->l3_vxlan_en = !!tmp;
 	} else if (strcmp(MLX5_VF_NL_EN, key) == 0) {
@@ -1481,8 +1497,10 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 				struct mlx5_sh_config *config)
 {
 	const char **params = (const char *[]){
+		MLX5_TX_BURST_BOUND,
 		MLX5_TX_PP,
 		MLX5_TX_SKEW,
+		MLX5_TX_TYPICAL_PKT_SZ,
 		MLX5_L3_VXLAN_EN,
 		MLX5_VF_NL_EN,
 		MLX5_DV_ESW_EN,
@@ -1557,6 +1575,18 @@ mlx5_shared_dev_ctx_args_config(struct mlx5_dev_ctx_shared *sh,
 		DRV_LOG(WARNING,
 			"\"tx_skew\" doesn't affect without \"tx_pp\".");
 	}
+	if (config->tx_burst_bound &&
+	    !sh->cdev->config.hca_attr.qos.packet_pacing_burst_bound) {
+		DRV_LOG(WARNING,
+			"HW does not support burst_upper_bound, ignoring.");
+		config->tx_burst_bound = 0;
+	}
+	if (config->tx_typical_pkt_sz &&
+	    !sh->cdev->config.hca_attr.qos.packet_pacing_typical_size) {
+		DRV_LOG(WARNING,
+			"HW does not support typical_packet_size, ignoring.");
+		config->tx_typical_pkt_sz = 0;
+	}
 	/* Check for LRO support. */
 	if (mlx5_devx_obj_ops_en(sh) && sh->cdev->config.hca_attr.lro_cap) {
 		/* TBD check tunnel lro caps. */
@@ -3191,6 +3221,18 @@ mlx5_probe_again_args_validate(struct mlx5_common_device *cdev,
 			sh->ibdev_name);
 		goto error;
 	}
+	if (sh->config.tx_burst_bound != config->tx_burst_bound) {
+		DRV_LOG(ERR, "\"tx_burst_bound\" "
+			"configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
+	if (sh->config.tx_typical_pkt_sz != config->tx_typical_pkt_sz) {
+		DRV_LOG(ERR, "\"tx_typical_pkt_sz\" "
+			"configuration mismatch for shared %s context.",
+			sh->ibdev_name);
+		goto error;
+	}
 	if (sh->config.txq_mem_algn != config->txq_mem_algn) {
 		DRV_LOG(ERR, "\"TxQ memory alignment\" "
 			"configuration mismatch for shared %s context. %u - %u",
diff --git a/drivers/net/mlx5/mlx5.h b/drivers/net/mlx5/mlx5.h
index 33628d7987..5ae01ec491 100644
--- a/drivers/net/mlx5/mlx5.h
+++ b/drivers/net/mlx5/mlx5.h
@@ -383,6 +383,8 @@ struct mlx5_port_config {
 struct mlx5_sh_config {
 	int tx_pp; /* Timestamp scheduling granularity in nanoseconds. */
 	int tx_skew; /* Tx scheduling skew between WQE and data on wire. */
+	uint32_t tx_burst_bound; /* Burst upper bound in bytes, 0 = default. */
+	uint32_t tx_typical_pkt_sz; /* Typical packet size in bytes, 0 = default. */
 	uint32_t reclaim_mode:2; /* Memory reclaim mode. */
 	uint32_t dv_esw_en:1; /* Enable E-Switch DV flow. */
 	/* Enable DV flow. 1 means SW steering, 2 means HW steering. */
diff --git a/drivers/net/mlx5/mlx5_txpp.c b/drivers/net/mlx5/mlx5_txpp.c
index 23d18b6116..279bd430c9 100644
--- a/drivers/net/mlx5/mlx5_txpp.c
+++ b/drivers/net/mlx5/mlx5_txpp.c
@@ -182,6 +182,12 @@ mlx5_txq_alloc_pp_rate_limit(struct mlx5_dev_ctx_shared *sh,
 	memset(&pp, 0, sizeof(pp));
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_limit, (uint32_t)rate_kbps);
 	MLX5_SET(set_pp_rate_limit_context, &pp, rate_mode, MLX5_DATA_RATE);
+	if (sh->config.tx_burst_bound)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 burst_upper_bound, sh->config.tx_burst_bound);
+	if (sh->config.tx_typical_pkt_sz)
+		MLX5_SET(set_pp_rate_limit_context, &pp,
+			 typical_packet_size, sh->config.tx_typical_pkt_sz);
 	rate_limit->pp = mlx5_glue->dv_alloc_pp(sh->cdev->ctx, sizeof(pp),
 						 &pp, 0);
 	if (rate_limit->pp == NULL) {
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v5 07/10] net/mlx5: add testpmd command to query per-queue rate limit
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
                         ` (5 preceding siblings ...)
  2026-03-24 16:50       ` [PATCH v5 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
@ 2026-03-24 16:50       ` Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
                         ` (3 subsequent siblings)
  10 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Add a new testpmd command to display the per-queue packet pacing
rate limit state, including the PP index from both driver state
and FW SQ context readback:

  testpmd> mlx5 port <port_id> txq <queue_id> rate show

This helps verify that the FW actually applied the PP index to
the SQ after setting a per-queue rate limit.

Expose a new PMD API rte_pmd_mlx5_txq_rate_limit_query() that
queries txq_ctrl->rate_limit for driver state and
mlx5_devx_cmd_query_sq() for the FW
packet_pacing_rate_limit_index field.

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_testpmd.c | 93 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/mlx5_tx.c      | 42 ++++++++++++++-
 drivers/net/mlx5/rte_pmd_mlx5.h | 30 +++++++++++
 3 files changed, 164 insertions(+), 1 deletion(-)

diff --git a/drivers/net/mlx5/mlx5_testpmd.c b/drivers/net/mlx5/mlx5_testpmd.c
index 1bb5a89559..fd3efecc5d 100644
--- a/drivers/net/mlx5/mlx5_testpmd.c
+++ b/drivers/net/mlx5/mlx5_testpmd.c
@@ -1365,6 +1365,94 @@ cmdline_parse_inst_t mlx5_cmd_dump_rq_context_options = {
 	}
 };
 
+/* Show per-queue rate limit PP index for a given port/queue */
+struct mlx5_cmd_show_rate_limit_options {
+	cmdline_fixed_string_t mlx5;
+	cmdline_fixed_string_t port;
+	portid_t port_id;
+	cmdline_fixed_string_t txq;
+	queueid_t queue_id;
+	cmdline_fixed_string_t rate;
+	cmdline_fixed_string_t show;
+};
+
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_mlx5 =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 mlx5, "mlx5");
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_port =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 port, "port");
+cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_port_id =
+	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+			      port_id, RTE_UINT16);
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_txq =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 txq, "txq");
+cmdline_parse_token_num_t mlx5_cmd_show_rate_limit_queue_id =
+	TOKEN_NUM_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+			      queue_id, RTE_UINT16);
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_rate =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 rate, "rate");
+cmdline_parse_token_string_t mlx5_cmd_show_rate_limit_show =
+	TOKEN_STRING_INITIALIZER(struct mlx5_cmd_show_rate_limit_options,
+				 show, "show");
+
+static void
+mlx5_cmd_show_rate_limit_parsed(void *parsed_result,
+				__rte_unused struct cmdline *cl,
+				__rte_unused void *data)
+{
+	struct mlx5_cmd_show_rate_limit_options *res = parsed_result;
+	struct rte_pmd_mlx5_txq_rate_limit_info info;
+	int ret;
+
+	ret = rte_pmd_mlx5_txq_rate_limit_query(res->port_id, res->queue_id,
+						 &info);
+	switch (ret) {
+	case 0:
+		break;
+	case -ENODEV:
+		fprintf(stderr, "invalid port_id %u\n", res->port_id);
+		return;
+	case -EINVAL:
+		fprintf(stderr, "invalid queue index (%u), out of range\n",
+			res->queue_id);
+		return;
+	case -EIO:
+		fprintf(stderr, "failed to query SQ context\n");
+		return;
+	default:
+		fprintf(stderr, "query failed (%d)\n", ret);
+		return;
+	}
+	fprintf(stdout, "Port %u Txq %u rate limit info:\n",
+		res->port_id, res->queue_id);
+	if (info.rate_mbps > 0)
+		fprintf(stdout, "  Configured rate: %u Mbps\n",
+			info.rate_mbps);
+	else
+		fprintf(stdout, "  Configured rate: disabled\n");
+	fprintf(stdout, "  PP index (driver): %u\n", info.pp_index);
+	fprintf(stdout, "  PP index (FW readback): %u\n", info.fw_pp_index);
+}
+
+cmdline_parse_inst_t mlx5_cmd_show_rate_limit = {
+	.f = mlx5_cmd_show_rate_limit_parsed,
+	.data = NULL,
+	.help_str = "mlx5 port <port_id> txq <queue_id> rate show",
+	.tokens = {
+		(void *)&mlx5_cmd_show_rate_limit_mlx5,
+		(void *)&mlx5_cmd_show_rate_limit_port,
+		(void *)&mlx5_cmd_show_rate_limit_port_id,
+		(void *)&mlx5_cmd_show_rate_limit_txq,
+		(void *)&mlx5_cmd_show_rate_limit_queue_id,
+		(void *)&mlx5_cmd_show_rate_limit_rate,
+		(void *)&mlx5_cmd_show_rate_limit_show,
+		NULL,
+	}
+};
+
 static struct testpmd_driver_commands mlx5_driver_cmds = {
 	.commands = {
 		{
@@ -1440,6 +1528,11 @@ static struct testpmd_driver_commands mlx5_driver_cmds = {
 			.help = "mlx5 port (port_id) queue (queue_id) dump rq_context (file_name)\n"
 				"    Dump mlx5 RQ Context\n\n",
 		},
+		{
+			.ctx = &mlx5_cmd_show_rate_limit,
+			.help = "mlx5 port (port_id) txq (queue_id) rate show\n"
+				"    Show per-queue rate limit PP index\n\n",
+		},
 		{
 			.ctx = NULL,
 		},
diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index 8085b5c306..fe8c363490 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -800,7 +800,7 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const ch
 	if (!rte_eth_dev_is_valid_port(port_id))
 		return -ENODEV;
 
-	if (rte_eth_tx_queue_is_valid(port_id, queue_id))
+	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
 		return -EINVAL;
 
 	fd = fopen(path, "w");
@@ -848,3 +848,43 @@ int rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const ch
 	fclose(fd);
 	return ret;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_txq_rate_limit_query, 26.07)
+int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
+				       struct rte_pmd_mlx5_txq_rate_limit_info *info)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	struct mlx5_txq_data *txq_data;
+	struct mlx5_txq_ctrl *txq_ctrl;
+	uint32_t sq_out[MLX5_ST_SZ_DW(query_sq_out)] = {0};
+	int ret;
+
+	if (info == NULL)
+		return -EINVAL;
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return -ENODEV;
+	if (rte_eth_tx_queue_is_valid(port_id, queue_id) != 0)
+		return -EINVAL;
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	txq_data = (*priv->txqs)[queue_id];
+	if (txq_data == NULL)
+		return -EINVAL;
+	txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
+	info->rate_mbps = txq_ctrl->rate_limit.rate_mbps;
+	info->pp_index = txq_ctrl->rate_limit.pp_id;
+	if (txq_ctrl->obj == NULL) {
+		info->fw_pp_index = 0;
+		return 0;
+	}
+	ret = mlx5_devx_cmd_query_sq(txq_ctrl->obj->sq_obj.sq,
+				     sq_out, sizeof(sq_out));
+	if (ret)
+		return -EIO;
+	info->fw_pp_index = MLX5_GET(sqc,
+				     MLX5_ADDR_OF(query_sq_out, sq_out,
+						  sq_context),
+				     packet_pacing_rate_limit_index);
+	return 0;
+}
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 7acfdae97d..698d7d2032 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -420,6 +420,36 @@ __rte_experimental
 int
 rte_pmd_mlx5_txq_dump_contexts(uint16_t port_id, uint16_t queue_id, const char *filename);
 
+/**
+ * Per-queue rate limit information.
+ */
+struct rte_pmd_mlx5_txq_rate_limit_info {
+	uint32_t rate_mbps;	/**< Configured rate in Mbps, 0 = disabled. */
+	uint16_t pp_index;	/**< PP index from driver state. */
+	uint16_t fw_pp_index;	/**< PP index read back from FW SQ context. */
+};
+
+/**
+ * Query per-queue rate limit state for a given Tx queue.
+ *
+ * @param[in] port_id
+ *   Port ID.
+ * @param[in] queue_id
+ *   Tx queue ID.
+ * @param[out] info
+ *   Rate limit information.
+ *
+ * @return
+ *   0 on success, negative errno on failure:
+ *   - -ENODEV: invalid port_id.
+ *   - -EINVAL: invalid queue_id.
+ *   - -EIO: FW query failed.
+ */
+__rte_experimental
+int
+rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
+				  struct rte_pmd_mlx5_txq_rate_limit_info *info);
+
 /** Type of mlx5 driver event for which custom callback is called. */
 enum rte_pmd_mlx5_driver_event_cb_type {
 	/** Called after HW Rx queue is created. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v5 08/10] ethdev: add getter for per-queue Tx rate limit
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
                         ` (6 preceding siblings ...)
  2026-03-24 16:50       ` [PATCH v5 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
@ 2026-03-24 16:50       ` Vincent Jardin
  2026-03-25  2:24         ` Stephen Hemminger
  2026-03-24 16:50       ` [PATCH v5 09/10] net/mlx5: implement per-queue Tx rate limit getter Vincent Jardin
                         ` (2 subsequent siblings)
  10 siblings, 1 reply; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

The existing rte_eth_set_queue_rate_limit() API allows setting a
per-queue Tx rate but provides no way to read it back. Applications
such as grout are forced to maintain a shadow copy of the rate to
be able to report it.

Add rte_eth_get_queue_rate_limit() as the symmetric getter, following
the established DPDK pattern (e.g. rte_eth_dev_set_mtu/get_mtu,
rte_eth_dev_set_vlan_offload/get_vlan_offload).

This adds:
- eth_get_queue_rate_limit_t driver callback in ethdev_driver.h
- rte_eth_get_queue_rate_limit() public experimental API (26.07)
- Trace point matching the existing setter pattern
- Generic testpmd command: show port <id> queue <id> rate

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 app/test-pmd/cmdline.c           | 69 ++++++++++++++++++++++++++++++++
 lib/ethdev/ethdev_driver.h       |  7 ++++
 lib/ethdev/ethdev_trace.h        |  9 +++++
 lib/ethdev/ethdev_trace_points.c |  3 ++
 lib/ethdev/rte_ethdev.c          | 35 ++++++++++++++++
 lib/ethdev/rte_ethdev.h          | 24 +++++++++++
 6 files changed, 147 insertions(+)

diff --git a/app/test-pmd/cmdline.c b/app/test-pmd/cmdline.c
index c5abeb5730..cc9c462498 100644
--- a/app/test-pmd/cmdline.c
+++ b/app/test-pmd/cmdline.c
@@ -8982,6 +8982,74 @@ static cmdline_parse_inst_t cmd_queue_rate_limit = {
 	},
 };
 
+/* *** SHOW RATE LIMIT FOR A QUEUE OF A PORT *** */
+struct cmd_show_queue_rate_limit_result {
+	cmdline_fixed_string_t show;
+	cmdline_fixed_string_t port;
+	uint16_t port_num;
+	cmdline_fixed_string_t queue;
+	uint16_t queue_num;
+	cmdline_fixed_string_t rate;
+};
+
+static void cmd_show_queue_rate_limit_parsed(void *parsed_result,
+		__rte_unused struct cmdline *cl,
+		__rte_unused void *data)
+{
+	struct cmd_show_queue_rate_limit_result *res = parsed_result;
+	uint32_t tx_rate = 0;
+	int ret;
+
+	ret = rte_eth_get_queue_rate_limit(res->port_num, res->queue_num,
+					   &tx_rate);
+	if (ret) {
+		fprintf(stderr, "Get queue rate limit failed: %s\n",
+			rte_strerror(-ret));
+		return;
+	}
+	if (tx_rate)
+		printf("Port %u Queue %u rate limit: %u Mbps\n",
+		       res->port_num, res->queue_num, tx_rate);
+	else
+		printf("Port %u Queue %u rate limit: disabled\n",
+		       res->port_num, res->queue_num);
+}
+
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_show =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				show, "show");
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_port =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				port, "port");
+static cmdline_parse_token_num_t cmd_show_queue_rate_limit_portnum =
+	TOKEN_NUM_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				port_num, RTE_UINT16);
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_queue =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				queue, "queue");
+static cmdline_parse_token_num_t cmd_show_queue_rate_limit_queuenum =
+	TOKEN_NUM_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				queue_num, RTE_UINT16);
+static cmdline_parse_token_string_t cmd_show_queue_rate_limit_rate =
+	TOKEN_STRING_INITIALIZER(struct cmd_show_queue_rate_limit_result,
+				rate, "rate");
+
+static cmdline_parse_inst_t cmd_show_queue_rate_limit = {
+	.f = cmd_show_queue_rate_limit_parsed,
+	.data = NULL,
+	.help_str = "show port <port_id> queue <queue_id> rate: "
+		"Show rate limit for a queue on port_id",
+	.tokens = {
+		(void *)&cmd_show_queue_rate_limit_show,
+		(void *)&cmd_show_queue_rate_limit_port,
+		(void *)&cmd_show_queue_rate_limit_portnum,
+		(void *)&cmd_show_queue_rate_limit_queue,
+		(void *)&cmd_show_queue_rate_limit_queuenum,
+		(void *)&cmd_show_queue_rate_limit_rate,
+		NULL,
+	},
+};
+
 /* *** SET RATE LIMIT FOR A VF OF A PORT *** */
 struct cmd_vf_rate_limit_result {
 	cmdline_fixed_string_t set;
@@ -14270,6 +14338,7 @@ static cmdline_parse_ctx_t builtin_ctx[] = {
 	&cmd_set_uc_all_hash_filter,
 	&cmd_vf_mac_addr_filter,
 	&cmd_queue_rate_limit,
+	&cmd_show_queue_rate_limit,
 	&cmd_tunnel_udp_config,
 	&cmd_showport_rss_hash,
 	&cmd_showport_rss_hash_key,
diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h
index 1255cd6f2c..0f336f9567 100644
--- a/lib/ethdev/ethdev_driver.h
+++ b/lib/ethdev/ethdev_driver.h
@@ -762,6 +762,11 @@ typedef int (*eth_set_queue_rate_limit_t)(struct rte_eth_dev *dev,
 				uint16_t queue_idx,
 				uint32_t tx_rate);
 
+/** @internal Get queue Tx rate. */
+typedef int (*eth_get_queue_rate_limit_t)(struct rte_eth_dev *dev,
+				uint16_t queue_idx,
+				uint32_t *tx_rate);
+
 /** @internal Add tunneling UDP port. */
 typedef int (*eth_udp_tunnel_port_add_t)(struct rte_eth_dev *dev,
 					 struct rte_eth_udp_tunnel *tunnel_udp);
@@ -1522,6 +1527,8 @@ struct eth_dev_ops {
 
 	/** Set queue rate limit */
 	eth_set_queue_rate_limit_t set_queue_rate_limit;
+	/** Get queue rate limit */
+	eth_get_queue_rate_limit_t get_queue_rate_limit;
 
 	/** Configure RSS hash protocols and hashing key */
 	rss_hash_update_t          rss_hash_update;
diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h
index 482befc209..6554cc1a21 100644
--- a/lib/ethdev/ethdev_trace.h
+++ b/lib/ethdev/ethdev_trace.h
@@ -908,6 +908,15 @@ RTE_TRACE_POINT(
 	rte_trace_point_emit_int(ret);
 )
 
+RTE_TRACE_POINT(
+	rte_eth_trace_get_queue_rate_limit,
+	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_idx,
+		int ret),
+	rte_trace_point_emit_u16(port_id);
+	rte_trace_point_emit_u16(queue_idx);
+	rte_trace_point_emit_int(ret);
+)
+
 RTE_TRACE_POINT(
 	rte_eth_trace_rx_avail_thresh_set,
 	RTE_TRACE_POINT_ARGS(uint16_t port_id, uint16_t queue_id,
diff --git a/lib/ethdev/ethdev_trace_points.c b/lib/ethdev/ethdev_trace_points.c
index 071c508327..0a28378a56 100644
--- a/lib/ethdev/ethdev_trace_points.c
+++ b/lib/ethdev/ethdev_trace_points.c
@@ -347,6 +347,9 @@ RTE_TRACE_POINT_REGISTER(rte_ethdev_trace_uc_all_hash_table_set,
 RTE_TRACE_POINT_REGISTER(rte_eth_trace_set_queue_rate_limit,
 	lib.ethdev.set_queue_rate_limit)
 
+RTE_TRACE_POINT_REGISTER(rte_eth_trace_get_queue_rate_limit,
+	lib.ethdev.get_queue_rate_limit)
+
 RTE_TRACE_POINT_REGISTER(rte_eth_trace_rx_avail_thresh_set,
 	lib.ethdev.rx_avail_thresh_set)
 
diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c
index 2edc7a362e..5e763e1855 100644
--- a/lib/ethdev/rte_ethdev.c
+++ b/lib/ethdev/rte_ethdev.c
@@ -5694,6 +5694,41 @@ int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 	return ret;
 }
 
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_get_queue_rate_limit, 26.07)
+int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
+					uint32_t *tx_rate)
+{
+	struct rte_eth_dev *dev;
+	int ret;
+
+	RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV);
+	dev = &rte_eth_devices[port_id];
+
+	if (tx_rate == NULL) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Get queue rate limit:port %u: NULL tx_rate pointer",
+			port_id);
+		return -EINVAL;
+	}
+
+	if (queue_idx >= dev->data->nb_tx_queues) {
+		RTE_ETHDEV_LOG_LINE(ERR,
+			"Get queue rate limit:port %u: invalid queue ID=%u",
+			port_id, queue_idx);
+		return -EINVAL;
+	}
+
+	if (dev->dev_ops->get_queue_rate_limit == NULL)
+		return -ENOTSUP;
+	ret = eth_err(port_id,
+		      dev->dev_ops->get_queue_rate_limit(dev, queue_idx,
+							 tx_rate));
+
+	rte_eth_trace_get_queue_rate_limit(port_id, queue_idx, ret);
+
+	return ret;
+}
+
 RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_eth_rx_avail_thresh_set, 22.07)
 int rte_eth_rx_avail_thresh_set(uint16_t port_id, uint16_t queue_id,
 			       uint8_t avail_thresh)
diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h
index 0d8e2d0236..e525217b77 100644
--- a/lib/ethdev/rte_ethdev.h
+++ b/lib/ethdev/rte_ethdev.h
@@ -4817,6 +4817,30 @@ int rte_eth_dev_uc_all_hash_table_set(uint16_t port_id, uint8_t on);
 int rte_eth_set_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
 			uint32_t tx_rate);
 
+/**
+ * @warning
+ * @b EXPERIMENTAL: this API may change, or be removed, without prior notice.
+ *
+ * Get the rate limitation for a queue on an Ethernet device.
+ *
+ * @param port_id
+ *   The port identifier of the Ethernet device.
+ * @param queue_idx
+ *   The queue ID.
+ * @param[out] tx_rate
+ *   A pointer to retrieve the Tx rate in Mbps.
+ *   0 means rate limiting is disabled.
+ * @return
+ *   - (0) if successful.
+ *   - (-ENOTSUP) if hardware doesn't support this feature.
+ *   - (-ENODEV) if *port_id* invalid.
+ *   - (-EIO) if device is removed.
+ *   - (-EINVAL) if bad parameter.
+ */
+__rte_experimental
+int rte_eth_get_queue_rate_limit(uint16_t port_id, uint16_t queue_idx,
+			uint32_t *tx_rate);
+
 /**
  * Configuration of Receive Side Scaling hash computation of Ethernet device.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v5 09/10] net/mlx5: implement per-queue Tx rate limit getter
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
                         ` (7 preceding siblings ...)
  2026-03-24 16:50       ` [PATCH v5 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
@ 2026-03-24 16:50       ` Vincent Jardin
  2026-03-24 16:50       ` [PATCH v5 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
  2026-03-25  2:25       ` [PATCH v5 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
  10 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Wire the mlx5 PMD to the new rte_eth_get_queue_rate_limit()
ethdev callback. The implementation reads the per-queue
rate_mbps tracking field from the txq_ctrl structure.

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5.c     |  2 ++
 drivers/net/mlx5/mlx5_tx.h  |  2 ++
 drivers/net/mlx5/mlx5_txq.c | 30 ++++++++++++++++++++++++++++++
 3 files changed, 34 insertions(+)

diff --git a/drivers/net/mlx5/mlx5.c b/drivers/net/mlx5/mlx5.c
index 7d08d7886b..f5784761f9 100644
--- a/drivers/net/mlx5/mlx5.c
+++ b/drivers/net/mlx5/mlx5.c
@@ -2652,6 +2652,7 @@ const struct eth_dev_ops mlx5_dev_ops = {
 	.rx_metadata_negotiate = mlx5_flow_rx_metadata_negotiate,
 	.get_restore_flags = mlx5_get_restore_flags,
 	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
+	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
 };
 
 /* Available operations from secondary process. */
@@ -2746,6 +2747,7 @@ const struct eth_dev_ops mlx5_dev_ops_isolate = {
 	.map_aggr_tx_affinity = mlx5_map_aggr_tx_affinity,
 	.get_restore_flags = mlx5_get_restore_flags,
 	.set_queue_rate_limit = mlx5_set_queue_rate_limit,
+	.get_queue_rate_limit = mlx5_get_queue_rate_limit,
 };
 
 /**
diff --git a/drivers/net/mlx5/mlx5_tx.h b/drivers/net/mlx5/mlx5_tx.h
index 975ff57acd..02feb9e6fd 100644
--- a/drivers/net/mlx5/mlx5_tx.h
+++ b/drivers/net/mlx5/mlx5_tx.h
@@ -224,6 +224,8 @@ int mlx5_txq_releasable(struct rte_eth_dev *dev, uint16_t idx);
 int mlx5_txq_verify(struct rte_eth_dev *dev);
 int mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 			      uint32_t tx_rate);
+int mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			      uint32_t *tx_rate);
 int mlx5_txq_get_sqn(struct mlx5_txq_ctrl *txq);
 void mlx5_txq_alloc_elts(struct mlx5_txq_ctrl *txq_ctrl);
 void mlx5_txq_free_elts(struct mlx5_txq_ctrl *txq_ctrl);
diff --git a/drivers/net/mlx5/mlx5_txq.c b/drivers/net/mlx5/mlx5_txq.c
index ce08363ca9..867ea4b994 100644
--- a/drivers/net/mlx5/mlx5_txq.c
+++ b/drivers/net/mlx5/mlx5_txq.c
@@ -1481,6 +1481,36 @@ mlx5_set_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
 	return 0;
 }
 
+/**
+ * Get per-queue packet pacing rate limit.
+ *
+ * @param dev
+ *   Pointer to Ethernet device.
+ * @param queue_idx
+ *   TX queue index.
+ * @param[out] tx_rate
+ *   Pointer to store the TX rate in Mbps, 0 if rate limiting is disabled.
+ *
+ * @return
+ *   0 on success, a negative errno value otherwise and rte_errno is set.
+ */
+int
+mlx5_get_queue_rate_limit(struct rte_eth_dev *dev, uint16_t queue_idx,
+			  uint32_t *tx_rate)
+{
+	struct mlx5_priv *priv = dev->data->dev_private;
+	struct mlx5_txq_ctrl *txq_ctrl;
+
+	if (priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL) {
+		rte_errno = EINVAL;
+		return -rte_errno;
+	}
+	txq_ctrl = container_of((*priv->txqs)[queue_idx],
+				struct mlx5_txq_ctrl, txq);
+	*tx_rate = txq_ctrl->rate_limit.rate_mbps;
+	return 0;
+}
+
 /**
  * Verify if the queue can be released.
  *
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* [PATCH v5 10/10] net/mlx5: add rate table capacity query API
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
                         ` (8 preceding siblings ...)
  2026-03-24 16:50       ` [PATCH v5 09/10] net/mlx5: implement per-queue Tx rate limit getter Vincent Jardin
@ 2026-03-24 16:50       ` Vincent Jardin
  2026-03-25  2:25       ` [PATCH v5 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
  10 siblings, 0 replies; 87+ messages in thread
From: Vincent Jardin @ 2026-03-24 16:50 UTC (permalink / raw)
  To: dev
  Cc: rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo, bingz,
	orika, suanmingm, matan, stephen, aman.deep.singh, Vincent Jardin

Add rte_pmd_mlx5_pp_rate_table_query() to report the HW packet
pacing rate table size and how many entries are used by this port.

The total comes from the HCA QoS capability
packet_pacing_rate_table_size. The port_used count is derived by
collecting unique non-zero PP indices across this port's TX queues.

The rate table is a global shared HW resource: firmware, kernel,
other DPDK ports on the same device, and other application
instances may all consume entries. The port_used count is therefore
a lower bound of actual HW usage.

With shared PP allocation (flags=0), the kernel mlx5 driver reuses
a single rate table entry for all PP contexts with identical
parameters (rate, burst, packet size). Multiple queues configured
with the same rate share one pp_id, so port_used counts unique
entries, not the number of queues with rate limiting enabled.

Applications that need device-wide visibility should query all
ports on the same physical device and aggregate the results,
similar to how the kernel mlx5 driver tracks usage internally.

Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
Signed-off-by: Vincent Jardin <vjardin@free.fr>
---
 drivers/net/mlx5/mlx5_tx.c      | 64 +++++++++++++++++++++++++++++++++
 drivers/net/mlx5/rte_pmd_mlx5.h | 44 +++++++++++++++++++++++
 2 files changed, 108 insertions(+)

diff --git a/drivers/net/mlx5/mlx5_tx.c b/drivers/net/mlx5/mlx5_tx.c
index fe8c363490..654323852f 100644
--- a/drivers/net/mlx5/mlx5_tx.c
+++ b/drivers/net/mlx5/mlx5_tx.c
@@ -19,6 +19,7 @@
 
 #include <mlx5_prm.h>
 #include <mlx5_common.h>
+#include <mlx5_malloc.h>
 
 #include "mlx5_autoconf.h"
 #include "mlx5_defs.h"
@@ -888,3 +889,66 @@ int rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
 				     packet_pacing_rate_limit_index);
 	return 0;
 }
+
+RTE_EXPORT_EXPERIMENTAL_SYMBOL(rte_pmd_mlx5_pp_rate_table_query, 26.07)
+int rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
+				     struct rte_pmd_mlx5_pp_rate_table_info *info)
+{
+	struct rte_eth_dev *dev;
+	struct mlx5_priv *priv;
+	uint16_t used = 0;
+	uint16_t *seen;
+	unsigned int i;
+
+	if (info == NULL)
+		return -EINVAL;
+	if (!rte_eth_dev_is_valid_port(port_id))
+		return -ENODEV;
+	dev = &rte_eth_devices[port_id];
+	priv = dev->data->dev_private;
+	if (!priv->sh->cdev->config.hca_attr.qos.packet_pacing) {
+		rte_errno = ENOTSUP;
+		return -ENOTSUP;
+	}
+	info->total = priv->sh->cdev->config.hca_attr.qos.packet_pacing_rate_table_size;
+	if (priv->txqs == NULL || priv->txqs_n == 0) {
+		info->port_used = 0;
+		return 0;
+	}
+	seen = mlx5_malloc(MLX5_MEM_SYS | MLX5_MEM_ZERO,
+			   priv->txqs_n * sizeof(*seen), 0, SOCKET_ID_ANY);
+	if (seen == NULL)
+		return -ENOMEM;
+	/*
+	 * Count unique non-zero PP indices across this port's TX queues.
+	 * Note: the count reflects only queues on this port; other ports
+	 * sharing the same device may also consume rate table entries.
+	 */
+	for (i = 0; i < priv->txqs_n; i++) {
+		struct mlx5_txq_data *txq_data;
+		struct mlx5_txq_ctrl *txq_ctrl;
+		uint16_t pp_id;
+		uint16_t j;
+		bool dup;
+
+		if ((*priv->txqs)[i] == NULL)
+			continue;
+		txq_data = (*priv->txqs)[i];
+		txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
+		pp_id = txq_ctrl->rate_limit.pp_id;
+		if (pp_id == 0)
+			continue;
+		dup = false;
+		for (j = 0; j < used; j++) {
+			if (seen[j] == pp_id) {
+				dup = true;
+				break;
+			}
+		}
+		if (!dup)
+			seen[used++] = pp_id;
+	}
+	mlx5_free(seen);
+	info->port_used = used;
+	return 0;
+}
diff --git a/drivers/net/mlx5/rte_pmd_mlx5.h b/drivers/net/mlx5/rte_pmd_mlx5.h
index 698d7d2032..621d8c2b15 100644
--- a/drivers/net/mlx5/rte_pmd_mlx5.h
+++ b/drivers/net/mlx5/rte_pmd_mlx5.h
@@ -450,6 +450,50 @@ int
 rte_pmd_mlx5_txq_rate_limit_query(uint16_t port_id, uint16_t queue_id,
 				  struct rte_pmd_mlx5_txq_rate_limit_info *info);
 
+/**
+ * Packet pacing rate table capacity information.
+ */
+struct rte_pmd_mlx5_pp_rate_table_info {
+	uint16_t total;		/**< Total HW rate table entries. */
+	uint16_t port_used;	/**< Entries used by this port's TX queues. */
+};
+
+/**
+ * Query packet pacing rate table capacity.
+ *
+ * The ``port_used`` count reflects only unique PP indices allocated
+ * by the queried port's TX queues. It is a lower bound of actual HW
+ * usage because the rate table is a global shared resource:
+ * - Other DPDK ports on the same physical device may hold entries.
+ * - The kernel mlx5 driver and firmware may also consume entries.
+ * - Multiple DPDK application instances may share the device.
+ *
+ * When multiple queues on the same port are configured with identical
+ * rate parameters, the kernel shares a single rate table entry across
+ * them (with flags=0 allocation), so ``port_used`` counts unique
+ * entries, not the number of queues with rate limiting enabled.
+ *
+ * Applications that need device-wide visibility should query all
+ * ports on the same physical device and aggregate the results,
+ * similar to how the kernel mlx5 driver tracks usage internally.
+ *
+ * @param[in] port_id
+ *   Port ID.
+ * @param[out] info
+ *   Rate table capacity information.
+ *
+ * @return
+ *   0 on success, negative errno on failure:
+ *   - -ENODEV: invalid port_id.
+ *   - -EINVAL: info is NULL.
+ *   - -ENOTSUP: packet pacing not supported.
+ *   - -ENOMEM: allocation failure.
+ */
+__rte_experimental
+int
+rte_pmd_mlx5_pp_rate_table_query(uint16_t port_id,
+				 struct rte_pmd_mlx5_pp_rate_table_info *info);
+
 /** Type of mlx5 driver event for which custom callback is called. */
 enum rte_pmd_mlx5_driver_event_cb_type {
 	/** Called after HW Rx queue is created. */
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 87+ messages in thread

* Re: [PATCH v5 08/10] ethdev: add getter for per-queue Tx rate limit
  2026-03-24 16:50       ` [PATCH v5 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
@ 2026-03-25  2:24         ` Stephen Hemminger
  0 siblings, 0 replies; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-25  2:24 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan, aman.deep.singh

On Tue, 24 Mar 2026 17:50:45 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> The existing rte_eth_set_queue_rate_limit() API allows setting a
> per-queue Tx rate but provides no way to read it back. Applications
> such as grout are forced to maintain a shadow copy of the rate to
> be able to report it.
> 
> Add rte_eth_get_queue_rate_limit() as the symmetric getter, following
> the established DPDK pattern (e.g. rte_eth_dev_set_mtu/get_mtu,
> rte_eth_dev_set_vlan_offload/get_vlan_offload).
> 
> This adds:
> - eth_get_queue_rate_limit_t driver callback in ethdev_driver.h
> - rte_eth_get_queue_rate_limit() public experimental API (26.07)
> - Trace point matching the existing setter pattern
> - Generic testpmd command: show port <id> queue <id> rate
> 
> Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> Signed-off-by: Vincent Jardin <vjardin@free.fr>
> ---

Acked-by: Stephen Hemminger <stephen@networkplumber.org>

^ permalink raw reply	[flat|nested] 87+ messages in thread

* Re: [PATCH v5 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing
  2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
                         ` (9 preceding siblings ...)
  2026-03-24 16:50       ` [PATCH v5 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
@ 2026-03-25  2:25       ` Stephen Hemminger
  10 siblings, 0 replies; 87+ messages in thread
From: Stephen Hemminger @ 2026-03-25  2:25 UTC (permalink / raw)
  To: Vincent Jardin
  Cc: dev, rasland, thomas, andrew.rybchenko, dsosnowski, viacheslavo,
	bingz, orika, suanmingm, matan, aman.deep.singh

On Tue, 24 Mar 2026 17:50:37 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> This series adds per-queue Tx data-rate limiting to the mlx5 PMD using
> hardware packet pacing (PP), and a symmetric rte_eth_get_queue_rate_limit()
> ethdev API to read back the configured rate.
> 
> Each Tx queue can be assigned an individual rate (in Mbps) at runtime via
> rte_eth_set_queue_rate_limit(). The mlx5 implementation allocates a PP
> context per queue from the HW rate table, programs the PP index into the
> SQ via modify_sq, and relies on the kernel to share identical rates
> across PP contexts to conserve table entries. A PMD-specific API exposes
> per-queue PP diagnostics and rate table capacity.
> 
> Patch breakdown:
> 
>   01/10 doc/nics/mlx5: fix stale packet pacing documentation
>   02/10 common/mlx5: query packet pacing rate table capabilities
>   03/10 common/mlx5: extend SQ modify to support rate limit update
>   04/10 net/mlx5: add per-queue packet pacing infrastructure
>   05/10 net/mlx5: support per-queue rate limiting
>   06/10 net/mlx5: add burst pacing devargs
>   07/10 net/mlx5: add testpmd command to query per-queue rate limit
>   08/10 ethdev: add getter for per-queue Tx rate limit
>   09/10 net/mlx5: implement per-queue Tx rate limit getter
>   10/10 net/mlx5: add rate table capacity query API
> 
> Release notes for the new ethdev API and mlx5 per-queue rate
> limiting can be added to a release_26_07.rst once the file is
> created at the start of the 26.07 development cycle.
> 
> Changes since v4:
> 
>   Addressed review feedback from Stephen Hemminger and added
>   Acked-by from Viacheslav Ovsiienko on patches 03-10.
> 
>   Patch 05/10 (set rate):
>   - Add rate_kbps > UINT32_MAX bounds check before truncating to
>     the PRM rate_limit field, preventing silent overflow when HW
>     reports no maximum rate
> 
>   Patch 07/10 (testpmd + PMD query):
>   - Add NULL check on (*priv->txqs)[queue_id] before container_of()
>     in rte_pmd_mlx5_txq_rate_limit_query(), matching the pattern
>     in the setter
> 
>   Patches 03-10:
>   - Added Acked-by: Viacheslav Ovsiienko <viacheslavo@nvidia.com>
> 
> Changes since v3:
> 
>   Addressed review feedback from Stephen and Slava (nvidia/Mellanox).
> 
>   Patch 02/10 (query caps):
>   - Added Acked-by: Viacheslav Ovsiienko
> 
>   Patch 03/10 (SQ modify):
>   - Define MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX
>     enum in mlx5_prm.h, following the MLX5_MODIFY_RQ_IN_MODIFY_xxx pattern
>   - Use read-modify-write for modify_bitmask (MLX5_GET64 | OR | MLX5_SET64)
>     instead of direct overwrite, for forward compatibility
> 
>   Patch 04/10 (PP infrastructure):
>   - Rename struct member and parameters from "rl" to "rate_limit"
>     for consistency with codebase naming style
>   - Replace MLX5_ASSERT(rate_mbps > 0) with runtime check returning
>     -EINVAL in non-debug builds
>   - Move mlx5_txq_free_pp_rate_limit() to after txq_obj_release() in
>     mlx5_txq_release() — destroy the SQ before freeing the PP index
>     it references
>   - Clarify commit message: distinct PP handle per queue (for cleanup)
>     but kernel shares the same pp_id for identical rate parameters
> 
>   Patch 05/10 (set rate):
>   - Fix obj->sq vs obj->sq_obj.sq: use obj->sq_obj.sq from the start
>     for non-hairpin queues (was introduced in patch 07 in v3, breaking
>     git bisect)
>   - Move all variable declarations to block top (sq_devx,
>     new_rate_limit)
>   - Add queue state check: reject set_queue_rate_limit if queue is not
>     STARTED (SQ not in RDY state)
>   - Update mlx5 feature matrix: Rate limitation = Y
>   - Add Per-Queue Tx Rate Limiting documentation section in mlx5.rst
>     covering DevX requirement, hardware support, rate table sharing,
>     and testpmd usage
> 
>   Patch 06/10 (burst devargs):
>   - Remove burst_upper_bound/typical_packet_size from Clock Queue
>     path (mlx5_txpp_alloc_pp_index) — Clock Queue uses WQE rate
>     pacing and does not need these parameters
>   - Update commit message and documentation accordingly
> 
>   Patch 07/10 (testpmd + PMD query):
>   - sq_obj.sq accessor change moved to patch 05 (see above)
>   - sq_devx declaration moved to block top
> 
>   Patch 08/10 (ethdev getter) — split from v3 patch 08:
>   - Split into ethdev API (this patch) and mlx5 driver (patch 09)
>   - Add rte_eth_trace_get_queue_rate_limit() trace point matching
>     the existing setter pattern
> 
>   Patch 09/10 — NEW (was part of v3 patch 08):
>   - mlx5 driver implementation of get_queue_rate_limit callback,
>     split out per Slava's request
> 
>   Patch 10/10 (rate table query):
>   - Rename struct field "used" to "port_used" to clarify per-port
>     scope
>   - Strengthen Doxygen: rate table is a global shared HW resource
>     (firmware, kernel, other DPDK instances may consume entries);
>     port_used is a lower bound
>   - Document PP sharing behavior with flags=0
>   - Note that applications should aggregate across ports for
>     device-wide visibility
> 
> Changes since v2:
> 
>   Addressed review feedback from Stephen Hemminger:
> 
>   Patch 04: cleaned redundant cast parentheses on (struct mlx5dv_pp *)
>   Patch 04: consolidated dv_alloc_pp call onto one line
>   Patch 05+08: removed redundant queue_idx bounds checks from driver
>     callbacks — ethdev layer is the single validation point
>   Patch 07: added generic testpmd command: show port <id> queue <id> rate
>   Patch 08+10: removed release notes from release_26_03.rst (targets 26.07)
>   Patch 10: use MLX5_MEM_SYS | MLX5_MEM_ZERO for heap allocation
>   Patch 10: consolidated packet_pacing_rate_table_size onto one line
> 
> Changes since v1:
> 
>   Patch 01: Acked-by Viacheslav Ovsiienko
>   Patch 04: rate bounds validation, uint64_t overflow fix, remove
>     early PP free
>   Patch 05: PP leak fix (temp struct pattern), rte_errno in error paths
>   Patch 07: inverted rte_eth_tx_queue_is_valid() check
>   Patch 10: stack array replaced with heap, per-port scope documented
> 
> Testing:
> 
>   - Build: GCC, no warnings
>   - Hardware: ConnectX-6 Dx
>   - DevX path (default): set/get/disable rate limiting verified
>   - Verbs path (dv_flow_en=0): returns -EINVAL cleanly (SQ DevX
>     object not available), no crash
> 
> Vincent Jardin (10):
>   doc/nics/mlx5: fix stale packet pacing documentation
>   common/mlx5: query packet pacing rate table capabilities
>   common/mlx5: extend SQ modify to support rate limit update
>   net/mlx5: add per-queue packet pacing infrastructure
>   net/mlx5: support per-queue rate limiting
>   net/mlx5: add burst pacing devargs
>   net/mlx5: add testpmd command to query per-queue rate limit
>   ethdev: add getter for per-queue Tx rate limit
>   net/mlx5: implement per-queue Tx rate limit getter
>   net/mlx5: add rate table capacity query API
> 
> Vincent Jardin (10):
>   doc/nics/mlx5: fix stale packet pacing documentation
>   common/mlx5: query packet pacing rate table capabilities
>   common/mlx5: extend SQ modify to support rate limit update
>   net/mlx5: add per-queue packet pacing infrastructure
>   net/mlx5: support per-queue rate limiting
>   net/mlx5: add burst pacing devargs
>   net/mlx5: add testpmd command to query per-queue rate limit
>   ethdev: add getter for per-queue Tx rate limit
>   net/mlx5: implement per-queue Tx rate limit getter
>   net/mlx5: add rate table capacity query API
> 
>  app/test-pmd/cmdline.c               |  69 ++++++++++
>  doc/guides/nics/features/mlx5.ini    |   1 +
>  doc/guides/nics/mlx5.rst             | 180 ++++++++++++++++++++++-----
>  drivers/common/mlx5/mlx5_devx_cmds.c |  23 ++++
>  drivers/common/mlx5/mlx5_devx_cmds.h |  14 ++-
>  drivers/common/mlx5/mlx5_prm.h       |   7 ++
>  drivers/net/mlx5/mlx5.c              |  46 +++++++
>  drivers/net/mlx5/mlx5.h              |  13 ++
>  drivers/net/mlx5/mlx5_testpmd.c      |  93 ++++++++++++++
>  drivers/net/mlx5/mlx5_tx.c           | 106 +++++++++++++++-
>  drivers/net/mlx5/mlx5_tx.h           |   5 +
>  drivers/net/mlx5/mlx5_txpp.c         |  90 ++++++++++++++
>  drivers/net/mlx5/mlx5_txq.c          | 149 ++++++++++++++++++++++
>  drivers/net/mlx5/rte_pmd_mlx5.h      |  74 +++++++++++
>  lib/ethdev/ethdev_driver.h           |   7 ++
>  lib/ethdev/ethdev_trace.h            |   9 ++
>  lib/ethdev/ethdev_trace_points.c     |   3 +
>  lib/ethdev/rte_ethdev.c              |  35 ++++++
>  lib/ethdev/rte_ethdev.h              |  24 ++++
>  19 files changed, 914 insertions(+), 33 deletions(-)
> 

I acked the ethdev changes. Since the bulk of the changes are
to mlx5, this should go through next-net-mlx5 tree.

^ permalink raw reply	[flat|nested] 87+ messages in thread

end of thread, other threads:[~2026-03-25  2:25 UTC | newest]

Thread overview: 87+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
2026-03-11 12:26   ` Slava Ovsiienko
2026-03-10  9:20 ` [PATCH v1 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 9/10] net/mlx5: share pacing rate table entries across queues Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
2026-03-10 14:20 ` [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
2026-03-11 16:29     ` Stephen Hemminger
2026-03-10 23:26   ` [PATCH v2 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
2026-03-11 16:31     ` Stephen Hemminger
2026-03-10 23:26   ` [PATCH v2 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
2026-03-11 16:17     ` Stephen Hemminger
2026-03-11 16:26     ` Stephen Hemminger
2026-03-12 15:54       ` Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 09/10] net/mlx5: share pacing rate table entries across queues Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
2026-03-11 16:31     ` Stephen Hemminger
2026-03-11 16:35     ` Stephen Hemminger
2026-03-12 15:05       ` Vincent Jardin
2026-03-12 16:01         ` Stephen Hemminger
2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
2026-03-12 22:01   ` [PATCH v3 01/9] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
2026-03-12 22:01   ` [PATCH v3 02/9] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
2026-03-20 12:02     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 03/9] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
2026-03-20 12:01     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 04/9] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
2026-03-20 12:51     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 05/9] net/mlx5: support per-queue rate limiting Vincent Jardin
2026-03-20 15:11     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 06/9] net/mlx5: add burst pacing devargs Vincent Jardin
2026-03-20 15:19     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 07/9] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
2026-03-20 15:38     ` Slava Ovsiienko
2026-03-22 14:02       ` Vincent Jardin
2026-03-12 22:01   ` [PATCH v3 08/9] ethdev: add getter for per-queue Tx " Vincent Jardin
2026-03-20 15:44     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 09/9] net/mlx5: add rate table capacity query API Vincent Jardin
2026-03-20 15:49     ` Slava Ovsiienko
2026-03-16 16:04   ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
2026-03-22 14:16     ` Vincent Jardin
2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
2026-03-22 13:46     ` [PATCH v4 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
2026-03-22 13:46     ` [PATCH v4 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
2026-03-22 13:46     ` [PATCH v4 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
2026-03-23 12:59       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
2026-03-23 13:00       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
2026-03-23 13:17       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
2026-03-23 13:18       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
2026-03-23 13:19       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
2026-03-23 13:19       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 09/10] net/mlx5: implement per-queue Tx rate limit getter Vincent Jardin
2026-03-23 13:20       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
2026-03-23 13:20       ` Slava Ovsiienko
2026-03-23 23:09     ` [PATCH v4 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
2026-03-24 16:50     ` [PATCH v5 " Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
2026-03-25  2:24         ` Stephen Hemminger
2026-03-24 16:50       ` [PATCH v5 09/10] net/mlx5: implement per-queue Tx rate limit getter Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
2026-03-25  2:25       ` [PATCH v5 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox