From: Linhu Li <lilinhu618@gmail.com>
To: dev@dpdk.org
Cc: stable@dpdk.org, dsosnowski@nvidia.com, Linhu Li <lilinhu618@gmail.com>
Subject: [PATCH v3] net/mlx5: fix counter TAILQ race between free and query callback
Date: Wed, 10 Jun 2026 14:34:25 +0800 [thread overview]
Message-ID: <20260610063425.73808-1-lilinhu618@gmail.com> (raw)
In-Reply-To: <20260604101112.72177-1-lilinhu618@gmail.com>
flow_dv_counter_free() inserts counters into
pool->counters[pool->query_gen] under pool->csl. Meanwhile,
mlx5_flow_async_pool_query_handle() moves counters from
pool->counters[query_gen ^ 1] to the global free list via
TAILQ_CONCAT while holding only cmng->csl, not pool->csl.
The comment in flow_dv_counter_free() claims the lock is not needed
because the query callback and the release function operate on different lists.
That holds only if the free path always observes the up-to-date query_gen. It
can be violated:
1. A counter free thread (non-PMD, e.g. OVS offload thread) reads
pool->query_gen == 0 and is about to insert into counters[0].
2. The free thread is preempted by the OS scheduler; it is a regular
pthread, not pinned to a core.
3. The eal-intr-thread alarm fires: query_gen++ (now 1) and the async
query is sent.
4. Hardware completes the query and the callback runs TAILQ_CONCAT on
counters[0] (= query_gen ^ 1).
5. The free thread resumes and runs TAILQ_INSERT_TAIL on counters[0]
concurrently with step 4 on another core.
Because the two paths take different locks, TAILQ_INSERT_TAIL and
TAILQ_CONCAT run concurrently on the same list with no synchronization and
corrupt it: the pool-local list ends up with a NULL head but a dangling
tqh_last, and the global free list tail no longer points to the real tail. The just-
freed counter and every counter inserted afterwards become unreachable
and are leaked.
Non-PMD threads can be preempted for hundreds of microseconds under
CPU pressure, which is well within the async query round-trip time, so the
window is reachable in practice.
Fix it by taking pool->csl in the query completion callback before operating on
pool->counters[query_gen], serializing the CONCAT with any concurrent
INSERT. The lock is taken once per pool per query completion in the eal-intr-
thread context, not on the datapath, so the cost is negligible. Lock order is
pool->csl then cmng->csl, matching all other sites.
Also handle the error path: previously the counters accumulated in
pool->counters[query_gen] were abandoned when a query failed. Move
them back to the global free list to avoid a leak on persistent query failures.
Fixes: ac79183dc6f7 ("net/mlx5: optimize free counter lookup")
Cc: stable@dpdk.org
Signed-off-by: Linhu Li <lilinhu618@gmail.com>
Acked-by: Dariusz Sosnowski <dsosnowski@nvidia.com>
---
doc/guides/rel_notes/release_26_07.rst | 21 +++++++++++++++++
drivers/net/mlx5/mlx5_flow.c | 31 ++++++++++++++++++++++++++
2 files changed, 52 insertions(+)
diff --git a/doc/guides/rel_notes/release_26_07.rst b/doc/guides/rel_notes/release_26_07.rst
index b8a3e2ced9..30a9564884 100644
--- a/doc/guides/rel_notes/release_26_07.rst
+++ b/doc/guides/rel_notes/release_26_07.rst
@@ -153,6 +153,27 @@ ABI Changes
* No ABI change that would break compatibility with 25.11.
+Fixed Issues
+------------
+
+.. This section should contain fixed issues in this release. Sample format:
+
+ * **Add a title in the past tense with a full stop.**
+
+ Add a short 1-2 sentence description of the fix in the past tense.
+
+ This section is a comment. Do not overwrite or remove it.
+ Also, make sure to start the actual text at the margin.
+ =======================================================
+
+* **net/mlx5: Fixed counter TAILQ race between free and query callback.**
+
+ Fixed a race condition where concurrent counter free operations and async
+ query completions could corrupt the counter free list, causing counter leaks.
+ The issue occurred when non-PMD threads were preempted between reading
+ ``query_gen`` and inserting into the counter list.
+
+
Known Issues
------------
diff --git a/drivers/net/mlx5/mlx5_flow.c b/drivers/net/mlx5/mlx5_flow.c
index 915ea29a5a..2f785d58ec 100644
--- a/drivers/net/mlx5/mlx5_flow.c
+++ b/drivers/net/mlx5/mlx5_flow.c
@@ -9893,6 +9893,13 @@ void
mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
uint64_t async_id, int status)
{
+ /*
+ * Handle async counter pool query completion.
+ * query_gen is flipped each round: freed counters go into [query_gen],
+ * while this callback moves [query_gen ^ 1] to the global free list.
+ * pool->csl must be held when operating on pool->counters[] to serialize
+ * with concurrent free-path insertions.
+ */
struct mlx5_flow_counter_pool *pool =
(struct mlx5_flow_counter_pool *)(uintptr_t)async_id;
struct mlx5_counter_stats_raw *raw_to_free;
@@ -9904,6 +9911,21 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
if (unlikely(status)) {
raw_to_free = pool->raw_hw;
+ /*
+ * The query failed, so the freed counters accumulated
+ * in the old-gen list would otherwise be stranded.
+ * Move them back to the global free list. This is safe
+ * for both transient and persistent failures: the
+ * counters are still valid and can be reused.
+ */
+ if (!TAILQ_EMPTY(&pool->counters[query_gen])) {
+ rte_spinlock_lock(&pool->csl);
+ rte_spinlock_lock(&cmng->csl[cnt_type]);
+ TAILQ_CONCAT(&cmng->counters[cnt_type],
+ &pool->counters[query_gen], next);
+ rte_spinlock_unlock(&cmng->csl[cnt_type]);
+ rte_spinlock_unlock(&pool->csl);
+ }
} else {
raw_to_free = pool->raw;
if (pool->is_aged)
@@ -9913,11 +9935,20 @@ mlx5_flow_async_pool_query_handle(struct mlx5_dev_ctx_shared *sh,
rte_spinlock_unlock(&pool->sl);
/* Be sure the new raw counters data is updated in memory. */
rte_io_wmb();
+ /*
+ * A counter free thread may have read a stale query_gen
+ * before the generation was flipped and could still be
+ * inserting into this same old-gen list. Hold pool->csl to
+ * serialize TAILQ_CONCAT with that TAILQ_INSERT_TAIL and
+ * avoid corrupting the list.
+ */
if (!TAILQ_EMPTY(&pool->counters[query_gen])) {
+ rte_spinlock_lock(&pool->csl);
rte_spinlock_lock(&cmng->csl[cnt_type]);
TAILQ_CONCAT(&cmng->counters[cnt_type],
&pool->counters[query_gen], next);
rte_spinlock_unlock(&cmng->csl[cnt_type]);
+ rte_spinlock_unlock(&pool->csl);
}
}
LIST_INSERT_HEAD(&sh->sws_cmng.free_stat_raws, raw_to_free, next);
--
2.39.3 (Apple Git-146)
next prev parent reply other threads:[~2026-06-11 7:25 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-04 10:11 [PATCH] net/mlx5: fix counter TAILQ race between free and query callback Laaahu
2026-06-08 12:41 ` Dariusz Sosnowski
2026-06-08 13:25 ` [PATCH v2] " Linhu Li
2026-06-08 14:11 ` Dariusz Sosnowski
2026-06-09 9:22 ` Dariusz Sosnowski
2026-06-10 6:34 ` Linhu Li [this message]
2026-06-11 7:51 ` [PATCH v4] " Linhu Li
2026-06-16 8:03 ` [PATCH v5] " Linhu Li
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260610063425.73808-1-lilinhu618@gmail.com \
--to=lilinhu618@gmail.com \
--cc=dev@dpdk.org \
--cc=dsosnowski@nvidia.com \
--cc=stable@dpdk.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox