Netdev List
* [PATCH net] net/mlx5e: Fix oops from ERR_PTR in act-miss restore teardown
@ 2026-06-11 13:48 Tariq Toukan
  2026-06-11 16:01 ` Alexander Lobakin
  2026-06-13  1:42 ` Jakub Kicinski
  0 siblings, 2 replies; 3+ messages in thread
From: Tariq Toukan @ 2026-06-11 13:48 UTC
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller
  Cc: Saeed Mahameed, Tariq Toukan, Mark Bloch, Leon Romanovsky,
	Vlad Buslov, Paul Blakey, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Lama Kayal, Cosmin Ratiu

From: Lama Kayal <lkayal@nvidia.com>

Restore-rule creation stores ERR_PTR(errno) in act_id_restore_rule
on failure.  Teardown still called mlx5_del_flow_rules() with that
value, which dereferenced it like a real mlx5_flow_handle and could
crash.

Clear act_id_restore_rule to NULL in the error branch after
esw_add_restore_rule() fails so teardown only sees NULL or a valid
handle.

Call Trace:
 ? page_fault+0x1e/0x30
 ? mlx5_del_flow_rules+0x12/0x140 [mlx5_core]
 mlx5e_tc_action_miss_mapping_put+0x49/0x50 [mlx5_core]
 mlx5_tc_ct_delete_flow+0x4d/0x70 [mlx5_core]
 mlx5_free_flow_attr_actions+0xd2/0x160 [mlx5_core]
 mlx5e_tc_del_fdb_flow+0x15d/0x210 [mlx5_core]
 mlx5e_flow_put+0x23/0x40 [mlx5_core]
 __mlx5e_add_fdb_flow+0xf3/0x430 [mlx5_core]
 mlx5e_tc_add_flow+0x2ab/0x9c0 [mlx5_core]
 mlx5e_configure_flower+0x2f4/0x620 [mlx5_core]
 tc_setup_cb_add+0xca/0x1e0
 fl_hw_replace_filter+0x143/0x1e0 [cls_flower]
 [...]

Fixes: dfa1e46d6093 ("net/mlx5e: TC, Fix using eswitch mapping in nic mode")
Signed-off-by: Lama Kayal <lkayal@nvidia.com>
Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_tc.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
index a9001d1c902f..4c135858f297 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
@@ -5863,6 +5863,7 @@ int mlx5e_tc_action_miss_mapping_get(struct mlx5e_priv *priv, struct mlx5_flow_a
 	attr->act_id_restore_rule = esw_add_restore_rule(esw, *act_miss_mapping);
 	if (IS_ERR(attr->act_id_restore_rule)) {
 		err = PTR_ERR(attr->act_id_restore_rule);
+		attr->act_id_restore_rule = NULL;
 		goto err_rule;
 	}
 

base-commit: 0068940907d33217ae01217f84910a5cde606c17
-- 
2.44.0



* Re: [PATCH net] net/mlx5e: Fix oops from ERR_PTR in act-miss restore teardown
From: Alexander Lobakin @ 2026-06-11 16:01 UTC
  To: Tariq Toukan
  Cc: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller, Saeed Mahameed, Mark Bloch, Leon Romanovsky,
	Vlad Buslov, Paul Blakey, netdev, linux-rdma, linux-kernel,
	Gal Pressman, Lama Kayal, Cosmin Ratiu

From: Tariq Toukan <tariqt@nvidia.com>
Date: Thu, 11 Jun 2026 16:48:36 +0300

> From: Lama Kayal <lkayal@nvidia.com>
> 
> Restore-rule creation stores ERR_PTR(errno) in act_id_restore_rule
> on failure.  Teardown still called mlx5_del_flow_rules() with that
> value, which dereferenced it like a real mlx5_flow_handle and could
> crash.
> 
> Clear act_id_restore_rule to NULL in the error branch after
> esw_add_restore_rule() fails so teardown only sees NULL or a valid
> handle.
> 
> Call Trace:
>  ? page_fault+0x1e/0x30
>  ? mlx5_del_flow_rules+0x12/0x140 [mlx5_core]
>  mlx5e_tc_action_miss_mapping_put+0x49/0x50 [mlx5_core]
>  mlx5_tc_ct_delete_flow+0x4d/0x70 [mlx5_core]
>  mlx5_free_flow_attr_actions+0xd2/0x160 [mlx5_core]
>  mlx5e_tc_del_fdb_flow+0x15d/0x210 [mlx5_core]
>  mlx5e_flow_put+0x23/0x40 [mlx5_core]
>  __mlx5e_add_fdb_flow+0xf3/0x430 [mlx5_core]
>  mlx5e_tc_add_flow+0x2ab/0x9c0 [mlx5_core]
>  mlx5e_configure_flower+0x2f4/0x620 [mlx5_core]
>  tc_setup_cb_add+0xca/0x1e0
>  fl_hw_replace_filter+0x143/0x1e0 [cls_flower]
>  [...]
> 
> Fixes: dfa1e46d6093 ("net/mlx5e: TC, Fix using eswitch mapping in nic mode")
> Signed-off-by: Lama Kayal <lkayal@nvidia.com>
> Reviewed-by: Cosmin Ratiu <cratiu@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>

Reviewed-by: Alexander Lobakin <aleksander.lobakin@intel.com>

Thanks,
Olek


* Re: [PATCH net] net/mlx5e: Fix oops from ERR_PTR in act-miss restore teardown
From: Jakub Kicinski @ 2026-06-13  1:42 UTC
  To: tariqt
  Cc: Jakub Kicinski, edumazet, pabeni, andrew+netdev, davem, saeedm,
	mbloch, leon, vladbu, paulb, netdev, linux-rdma, linux-kernel,
	gal, lkayal, cratiu

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5e: Fix oops from ERR_PTR in act-miss restore teardown

This change clears attr->act_id_restore_rule to NULL after
esw_add_restore_rule() fails, so the teardown path no longer
dereferences an ERR_PTR cookie as if it were a real mlx5_flow_handle.
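
For readers less familiar with the ERR_PTR convention, the producer-side
change can be sketched in userspace as below. This is only an illustration,
not the driver code: the ERR_PTR()/PTR_ERR()/IS_ERR() helpers are simplified
stand-ins for the kernel's <linux/err.h>, and every mock_* name is
hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error) { return (void *)error; }
static inline long PTR_ERR(const void *ptr) { return (long)ptr; }
static inline int IS_ERR(const void *ptr)
{
	/* Error cookies occupy the top MAX_ERRNO values of the address space. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

/* Hypothetical mock types; the real ones are mlx5_flow_handle/mlx5_flow_attr. */
struct mock_flow_handle { int num_rules; };
struct mock_flow_attr { struct mock_flow_handle *restore_rule; };

static struct mock_flow_handle mock_handle = { .num_rules = 1 };

/* Hypothetical producer: a valid handle, or ERR_PTR(-ENOMEM) when 'fail' is set. */
static struct mock_flow_handle *mock_add_restore_rule(int fail)
{
	return fail ? ERR_PTR(-ENOMEM) : &mock_handle;
}

/* Mirrors the patched error branch: extract the errno, then clear the field. */
static int mock_mapping_get(struct mock_flow_attr *attr, int fail)
{
	attr->restore_rule = mock_add_restore_rule(fail);
	if (IS_ERR(attr->restore_rule)) {
		int err = (int)PTR_ERR(attr->restore_rule);

		attr->restore_rule = NULL;	/* teardown never sees the cookie */
		return err;
	}
	return 0;
}
```

After a failed mock_mapping_get() the field holds NULL rather than an
error cookie; whether NULL is safe still depends on what the consumer
does with it.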

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index a9001d1c902f..4c135858f297 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -5863,6 +5863,7 @@ int mlx5e_tc_action_miss_mapping_get(struct mlx5e_priv *priv, struct mlx5_flow_a
>  	attr->act_id_restore_rule = esw_add_restore_rule(esw, *act_miss_mapping);
>  	if (IS_ERR(attr->act_id_restore_rule)) {
>  		err = PTR_ERR(attr->act_id_restore_rule);
> +		attr->act_id_restore_rule = NULL;
>  		goto err_rule;
>  	}

[High]
The commit message states that after this change "teardown only sees
NULL or a valid handle", framing NULL as benign.  Is that invariant
actually enforced on the consumer side?

Looking at the teardown helper:

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:mlx5e_tc_action_miss_mapping_put() {
    struct mapping_ctx *ctx = mlx5e_get_priv_obj_mapping(priv);

    if (is_mdev_switchdev_mode(priv->mdev))
        mlx5_del_flow_rules(attr->act_id_restore_rule);
    mapping_remove(ctx, act_miss_mapping);
}

There is no NULL check before mlx5_del_flow_rules(), and
mlx5_del_flow_rules() itself dereferences handle->rule[0] without a
NULL check:

drivers/net/ethernet/mellanox/mlx5/core/fs_core.c:mlx5_del_flow_rules() {
    struct fs_fte *fte;
    int i;
    ...
    fs_get_obj(fte, handle->rule[0]->node.parent);
    down_write_ref_node(&fte->node, false);
    ...
}

So on any path where mlx5e_tc_action_miss_mapping_put() is reached with
attr->act_id_restore_rule == NULL in switchdev mode, this patch
converts the original ERR_PTR dereference into a NULL dereference at
the same site rather than avoiding the crash.

Today the put is gated by attr->ct_attr.offloaded in
mlx5_tc_ct_delete_flow(), and __mlx5_tc_ct_flow_offload() leaves
offloaded clear when mlx5e_tc_action_miss_mapping_get() fails, so the
specific oops in the call trace should no longer be reachable.  But
should the put be made symmetric with the new "NULL or valid" producer
contract, e.g.:

    if (is_mdev_switchdev_mode(priv->mdev) && attr->act_id_restore_rule)
        mlx5_del_flow_rules(attr->act_id_restore_rule);

so that any future caller that relies on the documented invariant
(including mlx5e_clone_flow_attr_for_post_act() in en_tc.c, which
already inherits this field) does not reintroduce the same oops with a
slightly different fault address?
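
A minimal userspace sketch of the guarded put follows, under the
assumption that the consumer should tolerate a NULL handle. All demo_*
names are hypothetical, the mapping removal step is left out, and the
assert() in demo_del_flow_rules() stands in for the unconditional
dereference in mlx5_del_flow_rules().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mock types; the real ones are mlx5_flow_handle/mlx5_flow_attr. */
struct demo_flow_handle { int num_rules; };
struct demo_flow_attr { struct demo_flow_handle *restore_rule; };

static int demo_del_calls;

/* Like the real delete helper, this requires a valid handle. */
static void demo_del_flow_rules(struct demo_flow_handle *handle)
{
	assert(handle != NULL);	/* stands in for the handle->rule[0] dereference */
	handle->num_rules = 0;
	demo_del_calls++;
}

/* Guarded consumer: delete only in switchdev mode and only if a rule exists. */
static void demo_mapping_put(struct demo_flow_attr *attr, bool switchdev)
{
	if (switchdev && attr->restore_rule)
		demo_del_flow_rules(attr->restore_rule);
	/* the mapping removal that follows in the real helper is omitted here */
}
```

With this guard, an attr whose restore rule was never created can go
through the put path in switchdev mode without reaching the delete
helper.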
-- 
pw-bot: cr

^ permalink raw reply	[flat|nested] 3+ messages in thread
