public inbox for dev@dpdk.org
From: Stephen Hemminger <stephen@networkplumber.org>
To: Vincent Jardin <vjardin@free.fr>
Cc: dev@dpdk.org, rasland@nvidia.com, thomas@monjalon.net,
	andrew.rybchenko@oktetlabs.ru, dsosnowski@nvidia.com,
	viacheslavo@nvidia.com, bingz@nvidia.com, orika@nvidia.com,
	suanmingm@nvidia.com, matan@nvidia.com,
	aman.deep.singh@intel.com
Subject: Re: [PATCH v4 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing
Date: Mon, 23 Mar 2026 16:09:55 -0700	[thread overview]
Message-ID: <20260323160955.5379adfe@phoenix.local> (raw)
In-Reply-To: <20260322134629.94633-1-vjardin@free.fr>

On Sun, 22 Mar 2026 14:46:19 +0100
Vincent Jardin <vjardin@free.fr> wrote:

> This series adds per-queue Tx data-rate limiting to the mlx5 PMD using
> hardware packet pacing (PP), and a symmetric rte_eth_get_queue_rate_limit()
> ethdev API to read back the configured rate.
> 
> Each Tx queue can be assigned an individual rate (in Mbps) at runtime via
> rte_eth_set_queue_rate_limit(). The mlx5 implementation allocates a PP
> context per queue from the HW rate table, programs the PP index into the
> SQ via modify_sq, and relies on the kernel to share identical rates
> across PP contexts to conserve table entries. A PMD-specific API exposes
> per-queue PP diagnostics and rate table capacity.
> 
> Patch breakdown:
> 
>   01/10 doc/nics/mlx5: fix stale packet pacing documentation
>   02/10 common/mlx5: query packet pacing rate table capabilities
>   03/10 common/mlx5: extend SQ modify to support rate limit update
>   04/10 net/mlx5: add per-queue packet pacing infrastructure
>   05/10 net/mlx5: support per-queue rate limiting
>   06/10 net/mlx5: add burst pacing devargs
>   07/10 net/mlx5: add testpmd command to query per-queue rate limit
>   08/10 ethdev: add getter for per-queue Tx rate limit
>   09/10 net/mlx5: implement per-queue Tx rate limit getter
>   10/10 net/mlx5: add rate table capacity query API
> 
> Release notes for the new ethdev API and mlx5 per-queue rate
> limiting can be added to a release_26_07.rst once the file is
> created at the start of the 26.07 development cycle.
> 
> Changes since v3:
> 
>   Addressed review feedback from Stephen and Slava (nvidia/Mellanox).
> 
>   Patch 02/10 (query caps):
>   - Added Acked-by: Viacheslav Ovsiienko
> 
>   Patch 03/10 (SQ modify):
>   - Define MLX5_MODIFY_SQ_IN_MODIFY_BITMASK_PACKET_PACING_RATE_LIMIT_INDEX
>     enum in mlx5_prm.h, following the MLX5_MODIFY_RQ_IN_MODIFY_xxx pattern
>   - Use read-modify-write for modify_bitmask (MLX5_GET64 | OR | MLX5_SET64)
>     instead of direct overwrite, for forward compatibility
> 
>   Patch 04/10 (PP infrastructure):
>   - Rename struct member and parameters from "rl" to "rate_limit"
>     for consistency with codebase naming style
>   - Replace MLX5_ASSERT(rate_mbps > 0) with runtime check returning
>     -EINVAL in non-debug builds
>   - Move mlx5_txq_free_pp_rate_limit() to after txq_obj_release() in
>     mlx5_txq_release() — destroy the SQ before freeing the PP index
>     it references
>   - Clarify commit message: distinct PP handle per queue (for cleanup)
>     but kernel shares the same pp_id for identical rate parameters
> 
>   Patch 05/10 (set rate):
>   - Fix obj->sq vs obj->sq_obj.sq: use obj->sq_obj.sq from the start
>     for non-hairpin queues (was introduced in patch 07 in v3, breaking
>     git bisect)
>   - Move all variable declarations to block top (sq_devx,
>     new_rate_limit)
>   - Add queue state check: reject set_queue_rate_limit if queue is not
>     STARTED (SQ not in RDY state)
>   - Update mlx5 feature matrix: Rate limitation = Y
>   - Add Per-Queue Tx Rate Limiting documentation section in mlx5.rst
>     covering DevX requirement, hardware support, rate table sharing,
>     and testpmd usage
> 
>   Patch 06/10 (burst devargs):
>   - Remove burst_upper_bound/typical_packet_size from Clock Queue
>     path (mlx5_txpp_alloc_pp_index) — Clock Queue uses WQE rate
>     pacing and does not need these parameters
>   - Update commit message and documentation accordingly
> 
>   Patch 07/10 (testpmd + PMD query):
>   - sq_obj.sq accessor change moved to patch 05 (see above)
>   - sq_devx declaration moved to block top
> 
>   Patch 08/10 (ethdev getter) — split from v3 patch 08:
>   - Split into ethdev API (this patch) and mlx5 driver (patch 09)
>   - Add rte_eth_trace_get_queue_rate_limit() trace point matching
>     the existing setter pattern
> 
>   Patch 09/10 — NEW (was part of v3 patch 08):
>   - mlx5 driver implementation of get_queue_rate_limit callback,
>     split out per Slava's request
> 
>   Patch 10/10 (rate table query):
>   - Rename struct field "used" to "port_used" to clarify per-port
>     scope
>   - Strengthen Doxygen: rate table is a global shared HW resource
>     (firmware, kernel, other DPDK instances may consume entries);
>     port_used is a lower bound
>   - Document PP sharing behavior with flags=0
>   - Note that applications should aggregate across ports for
>     device-wide visibility
> 
> Changes since v2:
> 
>   Addressed review feedback from Stephen Hemminger:
> 
>   Patch 04: cleaned redundant cast parentheses on (struct mlx5dv_pp *)
>   Patch 04: consolidated dv_alloc_pp call onto one line
>   Patch 05+08: removed redundant queue_idx bounds checks from driver
>     callbacks — ethdev layer is the single validation point
>   Patch 07: added generic testpmd command: show port <id> queue <id> rate
>   Patch 08+10: removed release notes from release_26_03.rst (targets 26.07)
>   Patch 10: use MLX5_MEM_SYS | MLX5_MEM_ZERO for heap allocation
>   Patch 10: consolidated packet_pacing_rate_table_size onto one line
> 
> Changes since v1:
> 
>   Patch 01: Acked-by Viacheslav Ovsiienko
>   Patch 04: rate bounds validation, uint64_t overflow fix, remove
>     early PP free
>   Patch 05: PP leak fix (temp struct pattern), rte_errno in error paths
>   Patch 07: inverted rte_eth_tx_queue_is_valid() check
>   Patch 10: stack array replaced with heap, per-port scope documented
> 
> Testing:
> 
>   - Build: GCC, no warnings
>   - Hardware: ConnectX-6 Dx
>   - DevX path (default): set/get/disable rate limiting verified
>   - Verbs path (dv_flow_en=0): returns -EINVAL cleanly (SQ DevX
>     object not available), no crash
> 
> Vincent Jardin (10):
>   doc/nics/mlx5: fix stale packet pacing documentation
>   common/mlx5: query packet pacing rate table capabilities
>   common/mlx5: extend SQ modify to support rate limit update
>   net/mlx5: add per-queue packet pacing infrastructure
>   net/mlx5: support per-queue rate limiting
>   net/mlx5: add burst pacing devargs
>   net/mlx5: add testpmd command to query per-queue rate limit
>   ethdev: add getter for per-queue Tx rate limit
>   net/mlx5: implement per-queue Tx rate limit getter
>   net/mlx5: add rate table capacity query API
> 
>  app/test-pmd/cmdline.c               |  69 ++++++++++
>  doc/guides/nics/features/mlx5.ini    |   1 +
>  doc/guides/nics/mlx5.rst             | 180 ++++++++++++++++++++++-----
>  drivers/common/mlx5/mlx5_devx_cmds.c |  23 ++++
>  drivers/common/mlx5/mlx5_devx_cmds.h |  14 ++-
>  drivers/common/mlx5/mlx5_prm.h       |   7 ++
>  drivers/net/mlx5/mlx5.c              |  46 +++++++
>  drivers/net/mlx5/mlx5.h              |  13 ++
>  drivers/net/mlx5/mlx5_testpmd.c      |  93 ++++++++++++++
>  drivers/net/mlx5/mlx5_tx.c           | 104 +++++++++++++++-
>  drivers/net/mlx5/mlx5_tx.h           |   5 +
>  drivers/net/mlx5/mlx5_txpp.c         |  84 +++++++++++++
>  drivers/net/mlx5/mlx5_txq.c          | 149 ++++++++++++++++++++++
>  drivers/net/mlx5/rte_pmd_mlx5.h      |  74 +++++++++++
>  lib/ethdev/ethdev_driver.h           |   7 ++
>  lib/ethdev/ethdev_trace.h            |   9 ++
>  lib/ethdev/ethdev_trace_points.c     |   3 +
>  lib/ethdev/rte_ethdev.c              |  35 ++++++
>  lib/ethdev/rte_ethdev.h              |  24 ++++
>  19 files changed, 906 insertions(+), 33 deletions(-)
> 


I didn't see anything wrong myself, but an AI nitpicking pass did; its findings are below.



---

## Review: [PATCH v4 01-10/10] mlx5 per-queue rate limiting

### Patch 05/10 — `net/mlx5: support per-queue rate limiting`

**Error (Correctness): Data truncation in `rate_kbps` → PRM `rate_limit` field**

In `mlx5_txq_alloc_pp_rate_limit()`, `rate_kbps` is computed as `(uint64_t)rate_mbps * 1000` and then stored with `MLX5_SET(..., rate_limit, (uint32_t)rate_kbps)`. When `qos->packet_pacing_max_rate` is zero (HW doesn't report a maximum), the range check is skipped, and any `rate_mbps` value above ~4,294,967 will silently overflow the `uint32_t` cast, programming a completely wrong rate into the hardware.

For example, `rate_mbps = 5000000` (5 Tbps — unlikely but allowed by the `uint32_t` parameter type) yields `rate_kbps = 5,000,000,000` which truncates to `705,032,704` kbps.

Suggested fix: add an explicit check that `rate_kbps` fits in `uint32_t` before the cast, or cap it:

```c
if (rate_kbps > UINT32_MAX) {
    DRV_LOG(ERR, "Rate %u Mbps overflows PRM rate_limit field.", rate_mbps);
    rte_errno = ERANGE;
    return -ERANGE;
}
```

**Warning (Correctness): Missing NULL check on `txq_data` in `rte_pmd_mlx5_txq_rate_limit_query()` (patch 07)**

After `rte_eth_tx_queue_is_valid()` confirms the index is in range, the code does:

```c
txq_data = (*priv->txqs)[queue_id];
txq_ctrl = container_of(txq_data, struct mlx5_txq_ctrl, txq);
```

If the queue index is valid but the queue has been released (entry is NULL), `container_of` on a NULL pointer produces an invalid pointer, and subsequent dereferences are undefined behavior. The setter in patch 05 checks `(*priv->txqs)[queue_idx] == NULL` but the query function does not.

Suggested fix: add a NULL check on `txq_data` before the `container_of`, similar to what the setter does.

The same issue exists in `rte_pmd_mlx5_pp_rate_table_query()` (patch 10) but there the loop body does check `(*priv->txqs)[i] == NULL` and skips, so that one is fine.

### Patch 09/10 — `net/mlx5: implement per-queue Tx rate limit getter`

**Warning (Correctness): Missing NULL check on queue data in `mlx5_get_queue_rate_limit()`**

The getter checks `priv->txqs == NULL || (*priv->txqs)[queue_idx] == NULL` which is good. However, the ethdev layer (`rte_eth_get_queue_rate_limit()` in patch 08) only checks `queue_idx >= dev->data->nb_tx_queues`, not whether `priv->txqs` is allocated. The driver-side check handles this, so this is actually fine — noting for completeness.

### Patch 01/10 — `doc/nics/mlx5: fix stale packet pacing documentation`

**Warning: Missing `Cc: stable@dpdk.org`**

The patch has a `Fixes:` tag referencing a previous commit but is missing `Cc: stable@dpdk.org` for backport consideration, even though it's a documentation-only fix.

### Patches 02, 03, 04, 06, 08, 10

No correctness issues found. The code is well-structured with proper error handling, resource cleanup on failure paths, and appropriate validation.

### General observations

The series is well organized with clean separation of concerns across patches. Error paths in `mlx5_set_queue_rate_limit()` properly clean up the temporary `new_rate_limit` PP allocation on modify-SQ failure, and `mlx5_txq_release()` correctly frees the rate limit context on queue teardown. The `rte_errno = -ret` pattern after `mlx5_devx_cmd_modify_sq()` is consistent with how other mlx5 callers handle that function's negative errno return convention.


Thread overview: 87+ messages
2026-03-10  9:20 [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
2026-03-11 12:26   ` Slava Ovsiienko
2026-03-10  9:20 ` [PATCH v1 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 9/10] net/mlx5: share pacing rate table entries across queues Vincent Jardin
2026-03-10  9:20 ` [PATCH v1 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
2026-03-10 14:20 ` [PATCH v1 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
2026-03-10 23:26 ` [PATCH v2 " Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
2026-03-11 16:29     ` Stephen Hemminger
2026-03-10 23:26   ` [PATCH v2 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
2026-03-11 16:31     ` Stephen Hemminger
2026-03-10 23:26   ` [PATCH v2 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
2026-03-11 16:17     ` Stephen Hemminger
2026-03-11 16:26     ` Stephen Hemminger
2026-03-12 15:54       ` Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 09/10] net/mlx5: share pacing rate table entries across queues Vincent Jardin
2026-03-10 23:26   ` [PATCH v2 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
2026-03-11 16:31     ` Stephen Hemminger
2026-03-11 16:35     ` Stephen Hemminger
2026-03-12 15:05       ` Vincent Jardin
2026-03-12 16:01         ` Stephen Hemminger
2026-03-12 22:01 ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
2026-03-12 22:01   ` [PATCH v3 01/9] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
2026-03-12 22:01   ` [PATCH v3 02/9] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
2026-03-20 12:02     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 03/9] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
2026-03-20 12:01     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 04/9] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
2026-03-20 12:51     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 05/9] net/mlx5: support per-queue rate limiting Vincent Jardin
2026-03-20 15:11     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 06/9] net/mlx5: add burst pacing devargs Vincent Jardin
2026-03-20 15:19     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 07/9] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
2026-03-20 15:38     ` Slava Ovsiienko
2026-03-22 14:02       ` Vincent Jardin
2026-03-12 22:01   ` [PATCH v3 08/9] ethdev: add getter for per-queue Tx " Vincent Jardin
2026-03-20 15:44     ` Slava Ovsiienko
2026-03-12 22:01   ` [PATCH v3 09/9] net/mlx5: add rate table capacity query API Vincent Jardin
2026-03-20 15:49     ` Slava Ovsiienko
2026-03-16 16:04   ` [PATCH v3 00/9] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger
2026-03-22 14:16     ` Vincent Jardin
2026-03-22 13:46   ` [PATCH v4 00/10] " Vincent Jardin
2026-03-22 13:46     ` [PATCH v4 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
2026-03-22 13:46     ` [PATCH v4 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
2026-03-22 13:46     ` [PATCH v4 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
2026-03-23 12:59       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
2026-03-23 13:00       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
2026-03-23 13:17       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
2026-03-23 13:18       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
2026-03-23 13:19       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
2026-03-23 13:19       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 09/10] net/mlx5: implement per-queue Tx rate limit getter Vincent Jardin
2026-03-23 13:20       ` Slava Ovsiienko
2026-03-22 13:46     ` [PATCH v4 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
2026-03-23 13:20       ` Slava Ovsiienko
2026-03-23 23:09     ` Stephen Hemminger [this message]
2026-03-24 16:50     ` [PATCH v5 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 01/10] doc/nics/mlx5: fix stale packet pacing documentation Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 02/10] common/mlx5: query packet pacing rate table capabilities Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 03/10] common/mlx5: extend SQ modify to support rate limit update Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 04/10] net/mlx5: add per-queue packet pacing infrastructure Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 05/10] net/mlx5: support per-queue rate limiting Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 06/10] net/mlx5: add burst pacing devargs Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 07/10] net/mlx5: add testpmd command to query per-queue rate limit Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 08/10] ethdev: add getter for per-queue Tx " Vincent Jardin
2026-03-25  2:24         ` Stephen Hemminger
2026-03-24 16:50       ` [PATCH v5 09/10] net/mlx5: implement per-queue Tx rate limit getter Vincent Jardin
2026-03-24 16:50       ` [PATCH v5 10/10] net/mlx5: add rate table capacity query API Vincent Jardin
2026-03-25  2:25       ` [PATCH v5 00/10] net/mlx5: per-queue Tx rate limiting via packet pacing Stephen Hemminger

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260323160955.5379adfe@phoenix.local \
    --to=stephen@networkplumber.org \
    --cc=aman.deep.singh@intel.com \
    --cc=andrew.rybchenko@oktetlabs.ru \
    --cc=bingz@nvidia.com \
    --cc=dev@dpdk.org \
    --cc=dsosnowski@nvidia.com \
    --cc=matan@nvidia.com \
    --cc=orika@nvidia.com \
    --cc=rasland@nvidia.com \
    --cc=suanmingm@nvidia.com \
    --cc=thomas@monjalon.net \
    --cc=viacheslavo@nvidia.com \
    --cc=vjardin@free.fr \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

Be sure your reply has a Subject: header at the top and a blank line before the message body.