All of lore.kernel.org
 help / color / mirror / Atom feed
From: Simon Horman <horms@kernel.org>
To: tariqt@nvidia.com
Cc: Simon Horman <horms@kernel.org>,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	andrew+netdev@lunn.ch, davem@davemloft.net, saeedm@nvidia.com,
	mbloch@nvidia.com, leon@kernel.org, shayd@nvidia.com,
	ohartoov@nvidia.com, edwards@nvidia.com, msanalla@nvidia.com,
	phaddad@nvidia.com, parav@nvidia.com, gbayer@linux.ibm.com,
	kees@kernel.org, moshe@nvidia.com, rongweil@nvidia.com,
	jacob.e.keller@intel.com, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	gal@nvidia.com
Subject: Re: [PATCH net 3/3] net/mlx5e: TC, skip peer flow cleanup when LAG seq is unavailable
Date: Thu, 18 Jun 2026 13:48:22 +0100	[thread overview]
Message-ID: <20260618124820.890808-3-horms@kernel.org> (raw)
In-Reply-To: <20260617063204.547427-4-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
Full review at: https://sashiko.dev

Simon says: I think the first two issues flagged below warrant
            consideration within the context of this patch.

	    But the third and final issue flagged below as pre-existing
	    seems to be something to consider in the context of
	    possible follow-up.
---
net/mlx5e: TC, skip peer flow cleanup when LAG seq is unavailable

This commit addresses a bad memory access crash by skipping peer flows when
mlx5_lag_get_dev_seq() returns an error. This occurs when the peer isn't in
the LAG or when no device is marked as master.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index a9001d1c902ff..c6e6534a5e23b 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -2155,6 +2155,9 @@ static void mlx5e_tc_del_fdb_peers_flow(struct mlx5e_tc_flow *flow)
>  	devcom = flow->priv->mdev->priv.eswitch->devcom;
>  	mlx5_devcom_for_each_peer_entry(devcom, peer_esw, pos) {
>  		i = mlx5_lag_get_dev_seq(peer_esw->dev);
> +		if (i < 0)
> +			continue;
> +

[Severity: Critical]
Does skipping the remainder of this loop prevent the flow from being
unlinked from the doubly-linked list?

By using continue here in mlx5e_tc_del_fdb_peers_flow(), we skip calling
mlx5e_tc_del_fdb_peer_flow(), which means flow->peer[i] is never
unlinked from the esw->offloads.peer_flows[original_i] list.

If the flow is subsequently freed via mlx5e_flow_put(), will this leave a
freed list node resident in the active list and guarantee a use-after-free
list corruption during the next list traversal?

>  		mlx5e_tc_del_fdb_peer_flow(flow, i);
>  	}
>  }

[ ... ]

> @@ -5526,6 +5529,9 @@ void mlx5e_tc_clean_fdb_peer_flows(struct mlx5_eswitch *esw)
>  
>  	mlx5_devcom_for_each_peer_entry(devcom, peer_esw, pos) {
>  		i = mlx5_lag_get_dev_seq(peer_esw->dev);
> +		if (i < 0)
> +			continue;
> +

[Severity: Critical]
Similarly to the above, does this skip unlinking during flow cleanup when
the LAG sequence is unavailable?

>  		list_for_each_entry_safe(flow, tmp, &esw->offloads.peer_flows[i], peer[i])
>  			mlx5e_tc_del_fdb_peers_flow(flow);
>  	}


[Severity: Critical]
This isn't a bug introduced by this patch, but is there a similar missing
bounds check during flow addition in mlx5e_tc_add_fdb_peer_flow() in this
same file?

If a flow is added concurrently with LAG teardown, mlx5_lag_get_dev_seq()
can return -ENOENT (-2):

drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:mlx5e_tc_add_fdb_peer_flow() {
    ...
    int i = mlx5_lag_get_dev_seq(peer_esw->dev);
    ...
    list_add_tail(&flow->peer[i], &esw->offloads.peer_flows[i]);
    ...
}

Does this result in an out-of-bounds write to a negative array index?

      reply	other threads:[~2026-06-18 12:50 UTC|newest]

Thread overview: 5+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-17  6:32 [PATCH net 0/3] net/mlx5: LAG bug fixes Tariq Toukan
2026-06-17  6:32 ` [PATCH net 1/3] net/mlx5: LAG, Fix off-by-one in single-FDB error rollback Tariq Toukan
2026-06-17  6:32 ` [PATCH net 2/3] net/mlx5: LAG, MPESW, Fix missing complete() on devcom error Tariq Toukan
2026-06-17  6:32 ` [PATCH net 3/3] net/mlx5e: TC, skip peer flow cleanup when LAG seq is unavailable Tariq Toukan
2026-06-18 12:48   ` Simon Horman [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20260618124820.890808-3-horms@kernel.org \
    --to=horms@kernel.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=edwards@nvidia.com \
    --cc=gal@nvidia.com \
    --cc=gbayer@linux.ibm.com \
    --cc=jacob.e.keller@intel.com \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=mbloch@nvidia.com \
    --cc=moshe@nvidia.com \
    --cc=msanalla@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=ohartoov@nvidia.com \
    --cc=pabeni@redhat.com \
    --cc=parav@nvidia.com \
    --cc=phaddad@nvidia.com \
    --cc=rongweil@nvidia.com \
    --cc=saeedm@nvidia.com \
    --cc=shayd@nvidia.com \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.