* [PATCH net 0/4] net/mlx5e: Fix crashes in dynamic per-channel stats and HV VHCA agent
From: Tariq Toukan @ 2026-06-04 13:50 UTC
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Eran Ben Elisha, Feng Liu, Cosmin Ratiu, Gal Pressman,
Simon Horman, Alexei Lazar, Nimrod Oren, Carolina Jubran,
Kees Cook, Lama Kayal, Eran Ben Elisha, Saeed Mahameed,
Haiyang Zhang, Joe Damato, netdev, linux-rdma, linux-kernel
Hi,
Since per-channel stats were converted to be allocated and published
lazily at first channel open in commit fa691d0c9c08 ("net/mlx5e: Allocate
per-channel stats dynamically at first usage"), priv->channel_stats[]
and priv->stats_nch are filled in incrementally during interface
bring-up. This opened a window in which the various stats readers - most
of them reachable from userspace via netlink/netdev stats queries - can
race with mlx5e_open_channel() on another CPU and observe partially
initialized state. The HV VHCA stats agent, which is created before the
channels are opened, hits related problems of its own.
This series by Feng collects the resulting crashes and fixes them.
Regards,
Tariq
Feng Liu (4):
net/mlx5e: Fix HV VHCA stats zero-sized buffer allocation
net/mlx5e: Fix HV VHCA stats agent registration race
net/mlx5e: Bounds-check stats_nch in mlx5e_get_queue_stats_rx()
net/mlx5e: Fix publication race for priv->channel_stats[]
drivers/net/ethernet/mellanox/mlx5/core/en.h | 12 ++++++
.../mellanox/mlx5/core/en/hv_vhca_stats.c | 38 +++++++++++++------
.../net/ethernet/mellanox/mlx5/core/en_main.c | 15 +++++---
.../ethernet/mellanox/mlx5/core/en_stats.c | 9 +++--
.../ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 3 +-
.../ethernet/mellanox/mlx5/core/lib/hv_vhca.c | 8 +++-
.../ethernet/mellanox/mlx5/core/lib/hv_vhca.h | 6 ++-
7 files changed, 64 insertions(+), 27 deletions(-)
base-commit: c05fa14db43ebef3bd862ca9d073981c0358b3f0
--
2.44.0
* [PATCH net 1/4] net/mlx5e: Fix HV VHCA stats zero-sized buffer allocation
From: Tariq Toukan @ 2026-06-04 13:50 UTC
From: Feng Liu <feliu@nvidia.com>
mlx5e_hv_vhca_stats_create() is called from mlx5e_nic_enable(),
before mlx5e_open(). At that point priv->stats_nch is still zero,
because it is only ever incremented in mlx5e_channel_stats_alloc(),
which is reached only from mlx5e_open_channel().
mlx5e_hv_vhca_stats_buf_size() therefore returns 0, and
kvzalloc(0, GFP_KERNEL) returns ZERO_SIZE_PTR ((void *)16) rather
than NULL. The "if (!buf)" guard does not catch this, and
mlx5e_hv_vhca_stats_create() completes "successfully" with
priv->stats_agent.buf set to ZERO_SIZE_PTR.
Once channels are opened (priv->stats_nch > 0) and the hypervisor
enables stats reporting, mlx5e_hv_vhca_stats_work() recomputes
buf_len using the new non-zero stats_nch and calls
memset(buf, 0, buf_len) on ZERO_SIZE_PTR, faulting at address 0x10.
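The failure mode can be modelled in user space. The sketch below is an illustration only: model_zalloc(), passes_null_check() and is_usable() are hypothetical names, and only the sentinel value 16 and the "less than or equal to the sentinel" comparison mirror the kernel's slab definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Same numeric value as the kernel's ZERO_SIZE_PTR; the name is a stand-in. */
#define MODEL_ZERO_SIZE_PTR ((void *)16)

/* Mirrors the shape of the kernel's ZERO_OR_NULL_PTR() test. */
#define MODEL_ZERO_OR_NULL_PTR(p) \
	((uintptr_t)(p) <= (uintptr_t)MODEL_ZERO_SIZE_PTR)

/* Hypothetical stand-in for kvzalloc(): size 0 yields the sentinel, not NULL. */
static void *model_zalloc(size_t size)
{
	if (size == 0)
		return MODEL_ZERO_SIZE_PTR;
	return calloc(1, size);
}

/* The style of guard described above: only NULL is treated as failure. */
static int passes_null_check(const void *buf)
{
	return buf != NULL;
}

/* A stricter test that also rejects the zero-size sentinel. */
static int is_usable(const void *buf)
{
	return !MODEL_ZERO_OR_NULL_PTR(buf);
}
```

In this model a zero-byte request passes the NULL check but is not usable, which is the state the stats worker later trips over. The patch itself does not add a stricter check; it avoids the zero-size request altogether.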
Allocate the buffer based on priv->max_nch, which is set in
mlx5e_priv_init() and is the upper bound on stats_nch:
- Add a separate helper mlx5e_hv_vhca_stats_buf_max_size() that
returns sizeof(per_ring_stats) * max(max_nch, stats_nch), and
use it for the kvzalloc() in mlx5e_hv_vhca_stats_create().
- Keep mlx5e_hv_vhca_stats_buf_size() (which sizes by stats_nch)
for the worker's active payload size, so the wire format
(block->rings = stats_nch) and the amount of data filled by
mlx5e_hv_vhca_fill_stats() are unchanged.
The max(max_nch, stats_nch) guard handles the rare case where
mlx5e_attach_netdev() recomputes max_nch downward across a
detach/resume cycle while priv->stats_nch persists (mlx5e_detach_netdev
does not call mlx5e_priv_cleanup, so stats_nch is only reset when
the netdev is destroyed). Without the guard, the worker could compute
buf_len from stats_nch and overrun the smaller buffer allocated based
on the reduced max_nch.
This mirrors the existing mlx5e pattern of preallocating arrays of
size max_nch (e.g. priv->channel_stats) and lazily populating
entries up to stats_nch on demand.
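The relationship between the two sizes can be sketched as follows. This is a user-space model under stated assumptions: struct model_ring_stats and struct model_priv are hypothetical stand-ins with made-up fields, and only the max(max_nch, stats_nch) arithmetic follows the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-ring record; the real layout is not reproduced here. */
struct model_ring_stats {
	unsigned long long rx_packets, rx_bytes, tx_packets, tx_bytes;
};

/* Hypothetical subset of the private structure. */
struct model_priv {
	unsigned short max_nch;   /* upper bound, known at init time */
	unsigned short stats_nch; /* channels whose stats exist so far */
};

/* Active payload size: what the worker fills and sends. */
static size_t model_buf_size(const struct model_priv *p)
{
	return sizeof(struct model_ring_stats) * p->stats_nch;
}

/* Allocation size: never smaller than the active payload size. */
static size_t model_buf_max_size(const struct model_priv *p)
{
	unsigned short n = p->max_nch > p->stats_nch ? p->max_nch
						     : p->stats_nch;

	return sizeof(struct model_ring_stats) * n;
}
```

With stats_nch still zero the allocation size is non-zero as long as max_nch is, and in the shrunken-max_nch case the allocation still covers the payload computed from stats_nch.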
Fixes: fa691d0c9c08 ("net/mlx5e: Allocate per-channel stats dynamically at first usage")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c | 8 +++++++-
1 file changed, 7 insertions(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
index 195863b2c013..06cbd49d4e98 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
@@ -54,6 +54,12 @@ static int mlx5e_hv_vhca_stats_buf_size(struct mlx5e_priv *priv)
priv->stats_nch);
}
+static int mlx5e_hv_vhca_stats_buf_max_size(struct mlx5e_priv *priv)
+{
+ return (sizeof(struct mlx5e_hv_vhca_per_ring_stats) *
+ max(priv->max_nch, priv->stats_nch));
+}
+
static void mlx5e_hv_vhca_stats_work(struct work_struct *work)
{
struct mlx5e_hv_vhca_stats_agent *sagent;
@@ -122,7 +128,7 @@ static void mlx5e_hv_vhca_stats_cleanup(struct mlx5_hv_vhca_agent *agent)
void mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
{
- int buf_len = mlx5e_hv_vhca_stats_buf_size(priv);
+ int buf_len = mlx5e_hv_vhca_stats_buf_max_size(priv);
struct mlx5_hv_vhca_agent *agent;
priv->stats_agent.buf = kvzalloc(buf_len, GFP_KERNEL);
--
2.44.0
* [PATCH net 2/4] net/mlx5e: Fix HV VHCA stats agent registration race
From: Tariq Toukan @ 2026-06-04 13:50 UTC
From: Feng Liu <feliu@nvidia.com>
mlx5e_hv_vhca_stats_create() registers the stats agent through
mlx5_hv_vhca_agent_create(). The helper publishes the agent in
hv_vhca->agents[type] under agents_lock and immediately schedules an
asynchronous control invalidation on the HV VHCA workqueue before
returning to mlx5e.
The asynchronous invalidation invokes the control agent's invalidate
callback, which reads the hypervisor control block and forwards the
command to mlx5e_hv_vhca_stats_control(). That callback may either:
- call cancel_delayed_work_sync(&priv->stats_agent.work), or
- call queue_delayed_work(priv->wq, &sagent->work, sagent->delay).
However, the delayed_work and priv->stats_agent.agent are only
initialized after mlx5_hv_vhca_agent_create() returns to mlx5e:
agent = mlx5_hv_vhca_agent_create(...); /* publish + invalidate */
...
priv->stats_agent.agent = agent; /* too late */
INIT_DELAYED_WORK(&priv->stats_agent.work, ...); /* too late */
If the asynchronous control path runs before the two assignments
above, it can:
- Operate on an uninitialized delayed_work whose timer.function is
NULL. queue_delayed_work() calls add_timer() unconditionally, so
when the timer expires the timer softirq invokes a NULL function
pointer.
- Re-initialize the timer later through INIT_DELAYED_WORK() while
the timer is already enqueued in the timer wheel, corrupting the
hlist (entry.pprev cleared while the previous bucket node still
points at this entry).
- Leave sagent->agent NULL for the worker: when
mlx5e_hv_vhca_stats_work() eventually runs, it reads the NULL
pointer and dereferences it inside mlx5_hv_vhca_agent_write().
Fix this by:
- Initializing priv->stats_agent.work before invoking
mlx5_hv_vhca_agent_create(), so the work is always in a valid
state when the control callback observes it.
- Adding a struct mlx5_hv_vhca_agent **ctx_update out-parameter
to mlx5_hv_vhca_agent_create(). The helper writes the agent
pointer to *ctx_update before publishing into hv_vhca->agents[]
and triggering the agents_update flow, so any callback
subsequently invoked from that flow already sees a valid
priv->stats_agent.agent. This avoids having the control
callback participate in agent initialization.
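The ordering change can be illustrated with a single-threaded model in which the control callback fires at the earliest possible point, inside the create helper. All names below (model_ctx, model_agent_create(), create_old_order(), create_fixed_order()) are hypothetical; the sketch only shows why "initialize, then publish" closes the window and does not reproduce the driver's workqueue or locking.

```c
#include <assert.h>
#include <stddef.h>

struct model_agent { int id; };

struct model_ctx {
	struct model_agent *agent;            /* later dereferenced by the worker */
	void (*work_fn)(struct model_ctx *);  /* stands in for the delayed_work */
	int saw_uninitialized;                /* set if the callback saw a gap */
};

static struct model_agent the_agent = { .id = 1 };

static void model_work(struct model_ctx *ctx) { (void)ctx; }

/* Models the control callback, which may run as soon as the agent is published. */
static void model_control(struct model_ctx *ctx)
{
	if (!ctx->work_fn || !ctx->agent)
		ctx->saw_uninitialized = 1;
}

/* Models create: fill the caller's pointer first, then "publish". */
static struct model_agent *model_agent_create(struct model_ctx *ctx,
					      struct model_agent **ctx_update)
{
	if (ctx_update)
		*ctx_update = &the_agent;
	model_control(ctx); /* worst case: callback runs before we return */
	return &the_agent;
}

/* Old order: work and agent pointer are set only after create returns. */
static int create_old_order(struct model_ctx *ctx)
{
	*ctx = (struct model_ctx){ 0 };
	ctx->agent = model_agent_create(ctx, NULL);
	ctx->work_fn = model_work;
	return ctx->saw_uninitialized;
}

/* Fixed order: work first, agent pointer written through the out-parameter. */
static int create_fixed_order(struct model_ctx *ctx)
{
	*ctx = (struct model_ctx){ 0 };
	ctx->work_fn = model_work;
	model_agent_create(ctx, &ctx->agent);
	return ctx->saw_uninitialized;
}
```

In this model the old order reports that the callback saw uninitialized state, while the fixed order does not.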
While at it, clear priv->stats_agent.{agent,buf} after teardown and
clear priv->stats_agent.buf on the agent_create() failure path.
Without this, an enable/disable cycle that hits an early return in
create can lead to a use-after-free or double-destroy of stale
pointers from the previous cycle.
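A minimal sketch of why clearing the pointers matters, assuming a hypothetical model_stats_agent structure and plain malloc()/free() in place of the driver's allocation and agent-destroy calls:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the stats agent state. */
struct model_stats_agent {
	void *agent;
	void *buf;
	int destroy_calls; /* counts how often resources were really released */
};

/* Releases both resources once; later calls see NULL and return early. */
static void model_destroy(struct model_stats_agent *sa)
{
	if (!sa->agent)
		return;

	sa->destroy_calls++;
	free(sa->agent);
	free(sa->buf);

	/* Clearing the pointers is what makes a second call harmless. */
	sa->agent = NULL;
	sa->buf = NULL;
}
```

Calling model_destroy() twice on the same object releases the resources only once, because the second call sees the cleared agent pointer.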
Fixes: cef35af34d6d ("net/mlx5e: Add mlx5e HV VHCA stats agent")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../mellanox/mlx5/core/en/hv_vhca_stats.c | 22 ++++++++++++-------
.../ethernet/mellanox/mlx5/core/lib/hv_vhca.c | 8 +++++--
.../ethernet/mellanox/mlx5/core/lib/hv_vhca.h | 6 +++--
3 files changed, 24 insertions(+), 12 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
index 06cbd49d4e98..2e495442a547 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
@@ -73,7 +73,7 @@ static void mlx5e_hv_vhca_stats_work(struct work_struct *work)
sagent = container_of(dwork, struct mlx5e_hv_vhca_stats_agent, work);
priv = container_of(sagent, struct mlx5e_priv, stats_agent);
buf_len = mlx5e_hv_vhca_stats_buf_size(priv);
- agent = sagent->agent;
+ agent = READ_ONCE(sagent->agent);
buf = sagent->buf;
memset(buf, 0, buf_len);
@@ -135,11 +135,14 @@ void mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
if (!priv->stats_agent.buf)
return;
+ INIT_DELAYED_WORK(&priv->stats_agent.work, mlx5e_hv_vhca_stats_work);
+
agent = mlx5_hv_vhca_agent_create(priv->mdev->hv_vhca,
MLX5_HV_VHCA_AGENT_STATS,
mlx5e_hv_vhca_stats_control, NULL,
mlx5e_hv_vhca_stats_cleanup,
- priv);
+ priv,
+ &priv->stats_agent.agent);
if (IS_ERR_OR_NULL(agent)) {
if (IS_ERR(agent))
@@ -148,18 +151,21 @@ void mlx5e_hv_vhca_stats_create(struct mlx5e_priv *priv)
agent);
kvfree(priv->stats_agent.buf);
- return;
+ priv->stats_agent.buf = NULL;
}
-
- priv->stats_agent.agent = agent;
- INIT_DELAYED_WORK(&priv->stats_agent.work, mlx5e_hv_vhca_stats_work);
}
void mlx5e_hv_vhca_stats_destroy(struct mlx5e_priv *priv)
{
- if (IS_ERR_OR_NULL(priv->stats_agent.agent))
+ struct mlx5_hv_vhca_agent *agent;
+
+ agent = READ_ONCE(priv->stats_agent.agent);
+ if (IS_ERR_OR_NULL(agent))
return;
- mlx5_hv_vhca_agent_destroy(priv->stats_agent.agent);
+ mlx5_hv_vhca_agent_destroy(agent);
kvfree(priv->stats_agent.buf);
+
+ WRITE_ONCE(priv->stats_agent.agent, NULL);
+ priv->stats_agent.buf = NULL;
}
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
index d6dc7bce855e..305752dab7bd 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.c
@@ -190,7 +190,7 @@ mlx5_hv_vhca_control_agent_create(struct mlx5_hv_vhca *hv_vhca)
return mlx5_hv_vhca_agent_create(hv_vhca, MLX5_HV_VHCA_AGENT_CONTROL,
NULL,
mlx5_hv_vhca_control_agent_invalidate,
- NULL, NULL);
+ NULL, NULL, NULL);
}
static void mlx5_hv_vhca_control_agent_destroy(struct mlx5_hv_vhca_agent *agent)
@@ -256,7 +256,8 @@ mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
void (*invalidate)(struct mlx5_hv_vhca_agent*,
u64 block_mask),
void (*cleaup)(struct mlx5_hv_vhca_agent *agent),
- void *priv)
+ void *priv,
+ struct mlx5_hv_vhca_agent **ctx_update)
{
struct mlx5_hv_vhca_agent *agent;
@@ -284,6 +285,9 @@ mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
agent->invalidate = invalidate;
agent->cleanup = cleaup;
+ if (ctx_update)
+ WRITE_ONCE(*ctx_update, agent);
+
mutex_lock(&hv_vhca->agents_lock);
hv_vhca->agents[type] = agent;
mutex_unlock(&hv_vhca->agents_lock);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
index f240ffe5116c..8b3974cf0ee4 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/lib/hv_vhca.h
@@ -43,7 +43,8 @@ mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
void (*invalidate)(struct mlx5_hv_vhca_agent*,
u64 block_mask),
void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
- void *context);
+ void *context,
+ struct mlx5_hv_vhca_agent **ctx_update);
void mlx5_hv_vhca_agent_destroy(struct mlx5_hv_vhca_agent *agent);
int mlx5_hv_vhca_agent_write(struct mlx5_hv_vhca_agent *agent,
@@ -84,7 +85,8 @@ mlx5_hv_vhca_agent_create(struct mlx5_hv_vhca *hv_vhca,
void (*invalidate)(struct mlx5_hv_vhca_agent*,
u64 block_mask),
void (*cleanup)(struct mlx5_hv_vhca_agent *agent),
- void *context)
+ void *context,
+ struct mlx5_hv_vhca_agent **ctx_update)
{
return NULL;
}
--
2.44.0
* [PATCH net 3/4] net/mlx5e: Bounds-check stats_nch in mlx5e_get_queue_stats_rx()
From: Tariq Toukan @ 2026-06-04 13:50 UTC
From: Feng Liu <feliu@nvidia.com>
mlx5e_get_queue_stats_rx() is invoked by the netdev stats core with
an RX queue index 'i' from real_num_rx_queues. Today it only guards
against priv->stats_nch == 0 and then dereferences
priv->channel_stats[i] unconditionally.
During interface bring-up channel_stats[] is populated incrementally
by mlx5e_channel_stats_alloc(), so a concurrent QSTATS netlink dump
can call into the helper with i >= stats_nch. The non-zero check
passes, channel_stats[i] is NULL, and the dereference panics.
Replace the non-zero check with an upper-bound check against
stats_nch, which subsumes the zero check and prevents the
out-of-bounds dereference.
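The effect of the upper-bound check can be shown with a small user-space model. The types and the function model_get_queue_stats_rx() are hypothetical stand-ins, and the return codes are an addition for the sake of the example (the real callback returns void).

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_NCH 8

struct model_channel_stats { unsigned long long rx_packets; };

struct model_priv {
	struct model_channel_stats *channel_stats[MODEL_MAX_NCH];
	unsigned short stats_nch; /* entries [0, stats_nch) are populated */
};

/*
 * Returns 0 and fills *out when index i is populated, -1 otherwise.
 * The single comparison covers both stats_nch == 0 and i >= stats_nch.
 */
static int model_get_queue_stats_rx(const struct model_priv *p, int i,
				    unsigned long long *out)
{
	if (i < 0 || i >= p->stats_nch)
		return -1;

	*out = p->channel_stats[i]->rx_packets;
	return 0;
}
```

An index at or beyond stats_nch is rejected before the array slot is read, so a still-NULL entry is never dereferenced.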
Fixes: 7b66ae536a78 ("net/mlx5e: Add per queue netdev-genl stats")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 8f2b3abe0092..42a658402592 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -5489,7 +5489,7 @@ static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
struct mlx5e_rq_stats *xskrq_stats;
struct mlx5e_rq_stats *rq_stats;
- if (mlx5e_is_uplink_rep(priv) || !priv->stats_nch)
+ if (mlx5e_is_uplink_rep(priv) || i >= priv->stats_nch)
return;
channel_stats = priv->channel_stats[i];
--
2.44.0
* [PATCH net 4/4] net/mlx5e: Fix publication race for priv->channel_stats[]
2026-06-04 13:50 [PATCH net 0/4] net/mlx5e: Fix crashes in dynamic per-channel stats and HV VHCA agent Tariq Toukan
` (2 preceding siblings ...)
2026-06-04 13:50 ` [PATCH net 3/4] net/mlx5e: Bounds-check stats_nch in mlx5e_get_queue_stats_rx() Tariq Toukan
@ 2026-06-04 13:50 ` Tariq Toukan
3 siblings, 0 replies; 5+ messages in thread
From: Tariq Toukan @ 2026-06-04 13:50 UTC (permalink / raw)
To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
David S. Miller
Cc: Saeed Mahameed, Leon Romanovsky, Tariq Toukan, Mark Bloch,
Eran Ben Elisha, Feng Liu, Cosmin Ratiu, Gal Pressman,
Simon Horman, Alexei Lazar, Nimrod Oren, Carolina Jubran,
Kees Cook, Lama Kayal, Eran Ben Elisha, Saeed Mahameed,
Haiyang Zhang, Joe Damato, netdev, linux-rdma, linux-kernel
From: Feng Liu <feliu@nvidia.com>
mlx5e_channel_stats_alloc() publishes a new entry to
priv->channel_stats[] and then increments priv->stats_nch as a
publication token, but neither store carries any memory barrier:
priv->channel_stats[ix] = kvzalloc_node(...);
if (!priv->channel_stats[ix])
return -ENOMEM;
priv->stats_nch++;
Concurrent readers compute the loop bound from priv->stats_nch and
then dereference priv->channel_stats[i] using plain accesses, e.g.
for (i = 0; i < priv->stats_nch; i++) {
struct mlx5e_channel_stats *cs = priv->channel_stats[i];
... cs->rq.packets ...
}
On weakly-ordered architectures (ARM, PowerPC, RISC-V) the writes to
channel_stats[ix] and stats_nch may become visible to other CPUs out
of program order. A reader can observe stats_nch == N while still
seeing channel_stats[N-1] == NULL, leading to a NULL pointer
dereference in the channel_stats loop.
This has been observed in production on BlueField-3 DPUs (arm64),
where ovs-vswitchd queries netdev statistics over netlink during NIC
bringup, racing mlx5e_open_channel() -> mlx5e_channel_stats_alloc()
on another CPU:
Unable to handle kernel NULL pointer dereference at virtual address 0x840
Hardware name: BlueField-3 DPU
pc : mlx5e_fold_sw_stats64+0x30/0x180 [mlx5_core]
Call trace:
mlx5e_fold_sw_stats64+0x30/0x180 [mlx5_core]
dev_get_stats+0x50/0xc0
ovs_vport_get_stats+0x38/0xac [openvswitch]
ovs_vport_cmd_fill_info+0x194/0x290 [openvswitch]
ovs_vport_cmd_get+0xbc/0x10c [openvswitch]
genl_family_rcv_msg_doit+0xd0/0x160
genl_rcv_msg+0xec/0x1f0
netlink_rcv_skb+0x64/0x130
genl_rcv+0x40/0x60
netlink_unicast+0x2fc/0x370
netlink_sendmsg+0x1dc/0x454
...
__arm64_sys_sendmsg+0x2c/0x40
Order the stats_nch increment through smp_store_release() in the
writer, paired with smp_load_acquire() of stats_nch in every reader.
The release/acquire pair establishes the contract:
stats_nch == N => channel_stats[0..N-1] are visible and non-NULL.
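The same contract can be sketched in portable C11, using atomic_store_explicit() with memory_order_release and atomic_load_explicit() with memory_order_acquire as rough user-space analogues of smp_store_release() and smp_load_acquire(). The structure and function names are hypothetical, a single writer is assumed, and the example does not model concurrent updates to the counters themselves.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

#define MODEL_MAX_NCH 8

struct model_channel_stats { unsigned long long rx_packets; };

struct model_priv {
	struct model_channel_stats *channel_stats[MODEL_MAX_NCH];
	_Atomic unsigned short stats_nch; /* publication token */
};

/* Writer (assumed single): store the pointer, then publish the new count. */
static int model_channel_stats_alloc(struct model_priv *p, int ix)
{
	unsigned short n;

	p->channel_stats[ix] = calloc(1, sizeof(*p->channel_stats[ix]));
	if (!p->channel_stats[ix])
		return -1;

	n = atomic_load_explicit(&p->stats_nch, memory_order_relaxed);
	/* Release: the pointer store above is visible before the new count. */
	atomic_store_explicit(&p->stats_nch, (unsigned short)(n + 1),
			      memory_order_release);
	return 0;
}

/* Reader: snapshot the count once with acquire, then walk that many entries. */
static unsigned long long model_fold_rx_packets(struct model_priv *p)
{
	unsigned short nch = atomic_load_explicit(&p->stats_nch,
						  memory_order_acquire);
	unsigned long long sum = 0;

	for (unsigned short i = 0; i < nch; i++)
		sum += p->channel_stats[i]->rx_packets;
	return sum;
}
```

Reading the count once into a local also keeps the loop bound stable for the whole walk, which matches the way the readers in this patch take a single snapshot before iterating.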
Update all readers of priv->stats_nch in mlx5e RX/TX queue stats,
mlx5e_get_base_stats(), ethtool channels stats, IPoIB stats, the
sw_stats fold and the HV VHCA stats agent to use smp_load_acquire().
The plain accesses to stats_nch in mlx5e_channel_stats_alloc() (the
writer's own read, serialized by state_lock) and in
mlx5e_priv_cleanup() (single-owner teardown) are intentionally left
unconverted.
Fixes: fa691d0c9c08 ("net/mlx5e: Allocate per-channel stats dynamically at first usage")
Signed-off-by: Feng Liu <feliu@nvidia.com>
Reviewed-by: Eran Ben Elisha <eranbe@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Reviewed-by: Nimrod Oren <noren@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
drivers/net/ethernet/mellanox/mlx5/core/en.h | 12 ++++++++++++
.../mellanox/mlx5/core/en/hv_vhca_stats.c | 10 ++++++----
drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 15 +++++++++------
.../net/ethernet/mellanox/mlx5/core/en_stats.c | 9 +++++----
.../net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c | 3 ++-
5 files changed, 34 insertions(+), 15 deletions(-)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en.h b/drivers/net/ethernet/mellanox/mlx5/core/en.h
index 2270e2e550dd..d507289096c2 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en.h
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en.h
@@ -987,6 +987,18 @@ struct mlx5e_priv {
struct ethtool_fec_hist_range *fec_ranges;
};
+static inline u16 mlx5e_stats_nch_read(const struct mlx5e_priv *priv)
+{
+ /* Pairs with smp_store_release in mlx5e_stats_nch_write(). */
+ return smp_load_acquire(&priv->stats_nch);
+}
+
+static inline void mlx5e_stats_nch_write(struct mlx5e_priv *priv, u16 n)
+{
+ /* Pairs with smp_load_acquire in mlx5e_stats_nch_read(). */
+ smp_store_release(&priv->stats_nch, n);
+}
+
struct mlx5e_dev {
struct net_device *netdev;
struct devlink_port dl_port;
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
index 2e495442a547..9747d7736d37 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en/hv_vhca_stats.c
@@ -33,9 +33,10 @@ mlx5e_hv_vhca_fill_ring_stats(struct mlx5e_priv *priv, int ch,
static void mlx5e_hv_vhca_fill_stats(struct mlx5e_priv *priv, void *data,
int buf_len)
{
+ u16 nch = mlx5e_stats_nch_read(priv);
int ch, i = 0;
- for (ch = 0; ch < priv->stats_nch; ch++) {
+ for (ch = 0; ch < nch; ch++) {
void *buf = data + i;
if (WARN_ON_ONCE(buf +
@@ -50,8 +51,9 @@ static void mlx5e_hv_vhca_fill_stats(struct mlx5e_priv *priv, void *data,
static int mlx5e_hv_vhca_stats_buf_size(struct mlx5e_priv *priv)
{
- return (sizeof(struct mlx5e_hv_vhca_per_ring_stats) *
- priv->stats_nch);
+ u16 nch = mlx5e_stats_nch_read(priv);
+
+ return sizeof(struct mlx5e_hv_vhca_per_ring_stats) * nch;
}
static int mlx5e_hv_vhca_stats_buf_max_size(struct mlx5e_priv *priv)
@@ -106,7 +108,7 @@ static void mlx5e_hv_vhca_stats_control(struct mlx5_hv_vhca_agent *agent,
sagent = &priv->stats_agent;
block->version = MLX5_HV_VHCA_STATS_VERSION;
- block->rings = priv->stats_nch;
+ block->rings = mlx5e_stats_nch_read(priv);
if (!block->command) {
cancel_delayed_work_sync(&priv->stats_agent.work);
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index 42a658402592..42ca7cb0eac1 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -2773,7 +2773,7 @@ static int mlx5e_channel_stats_alloc(struct mlx5e_priv *priv, int ix, int cpu)
GFP_KERNEL, cpu_to_node(cpu));
if (!priv->channel_stats[ix])
return -ENOMEM;
- priv->stats_nch++;
+ mlx5e_stats_nch_write(priv, priv->stats_nch + 1);
return 0;
}
@@ -4043,9 +4043,10 @@ static int mlx5e_setup_tc(struct net_device *dev, enum tc_setup_type type,
void mlx5e_fold_sw_stats64(struct mlx5e_priv *priv, struct rtnl_link_stats64 *s)
{
+ u16 nch = mlx5e_stats_nch_read(priv);
int i;
- for (i = 0; i < priv->stats_nch; i++) {
+ for (i = 0; i < nch; i++) {
struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
struct mlx5e_rq_stats *xskrq_stats = &channel_stats->xskrq;
struct mlx5e_rq_stats *rq_stats = &channel_stats->rq;
@@ -5486,10 +5487,11 @@ static void mlx5e_get_queue_stats_rx(struct net_device *dev, int i,
{
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5e_channel_stats *channel_stats;
+ u16 nch = mlx5e_stats_nch_read(priv);
struct mlx5e_rq_stats *xskrq_stats;
struct mlx5e_rq_stats *rq_stats;
- if (mlx5e_is_uplink_rep(priv) || i >= priv->stats_nch)
+ if (mlx5e_is_uplink_rep(priv) || i >= nch)
return;
channel_stats = priv->channel_stats[i];
@@ -5508,7 +5510,7 @@ static void mlx5e_get_queue_stats_tx(struct net_device *dev, int i,
struct mlx5e_priv *priv = netdev_priv(dev);
struct mlx5e_sq_stats *sq_stats;
- if (!priv->stats_nch)
+ if (!mlx5e_stats_nch_read(priv))
return;
/* no special case needed for ptp htb etc since txq2sq_stats is kept up
@@ -5525,6 +5527,7 @@ static void mlx5e_get_base_stats(struct net_device *dev,
struct netdev_queue_stats_tx *tx)
{
struct mlx5e_priv *priv = netdev_priv(dev);
+ u16 nch = mlx5e_stats_nch_read(priv);
struct mlx5e_ptp *ptp_channel;
int i, tc;
@@ -5533,7 +5536,7 @@ static void mlx5e_get_base_stats(struct net_device *dev,
rx->bytes = 0;
rx->alloc_fail = 0;
- for (i = priv->channels.params.num_channels; i < priv->stats_nch; i++) {
+ for (i = priv->channels.params.num_channels; i < nch; i++) {
struct netdev_queue_stats_rx rx_i = {0};
mlx5e_get_queue_stats_rx(dev, i, &rx_i);
@@ -5558,7 +5561,7 @@ static void mlx5e_get_base_stats(struct net_device *dev,
tx->packets = 0;
tx->bytes = 0;
- for (i = 0; i < priv->stats_nch; i++) {
+ for (i = 0; i < nch; i++) {
struct mlx5e_channel_stats *channel_stats = priv->channel_stats[i];
/* handle two cases:
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
index 1a3ecf073913..8632b73179cb 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_stats.c
@@ -516,6 +516,7 @@ static void mlx5e_stats_update_stats_rq_page_pool(struct mlx5e_channel *c)
static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
{
struct mlx5e_sw_stats *s = &priv->stats.sw;
+ u16 nch = mlx5e_stats_nch_read(priv);
int i;
memset(s, 0, sizeof(*s));
@@ -523,7 +524,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(sw)
for (i = 0; i < priv->channels.num; i++) /* for active channels only */
mlx5e_stats_update_stats_rq_page_pool(priv->channels.c[i]);
- for (i = 0; i < priv->stats_nch; i++) {
+ for (i = 0; i < nch; i++) {
struct mlx5e_channel_stats *channel_stats =
priv->channel_stats[i];
@@ -2615,7 +2616,7 @@ static MLX5E_DECLARE_STATS_GRP_OP_UPDATE_STATS(ptp) { return; }
static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(channels)
{
- int max_nch = priv->stats_nch;
+ int max_nch = mlx5e_stats_nch_read(priv);
return (NUM_RQ_STATS * max_nch) +
(NUM_CH_STATS * max_nch) +
@@ -2628,8 +2629,8 @@ static MLX5E_DECLARE_STATS_GRP_OP_NUM_STATS(channels)
static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
{
+ int max_nch = mlx5e_stats_nch_read(priv);
bool is_xsk = priv->xsk.ever_used;
- int max_nch = priv->stats_nch;
int i, j, tc;
for (i = 0; i < max_nch; i++)
@@ -2661,8 +2662,8 @@ static MLX5E_DECLARE_STATS_GRP_OP_FILL_STRS(channels)
static MLX5E_DECLARE_STATS_GRP_OP_FILL_STATS(channels)
{
+ int max_nch = mlx5e_stats_nch_read(priv);
bool is_xsk = priv->xsk.ever_used;
- int max_nch = priv->stats_nch;
int i, j, tc;
for (i = 0; i < max_nch; i++)
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
index 0a6003fe60e9..674bed721e63 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/ipoib/ipoib.c
@@ -135,10 +135,11 @@ void mlx5i_cleanup(struct mlx5e_priv *priv)
static void mlx5i_grp_sw_update_stats(struct mlx5e_priv *priv)
{
+ u16 nch = mlx5e_stats_nch_read(priv);
struct rtnl_link_stats64 s = {};
int i, j;
- for (i = 0; i < priv->stats_nch; i++) {
+ for (i = 0; i < nch; i++) {
struct mlx5e_channel_stats *channel_stats;
struct mlx5e_rq_stats *rq_stats;
--
2.44.0