From: Shay Drori <shayd@nvidia.com>
To: Simon Horman <horms@kernel.org>, <tariqt@nvidia.com>
Cc: <edumazet@google.com>, <kuba@kernel.org>, <pabeni@redhat.com>,
<andrew+netdev@lunn.ch>, <davem@davemloft.net>,
<saeedm@nvidia.com>, <mbloch@nvidia.com>, <leon@kernel.org>,
<ohartoov@nvidia.com>, <edwards@nvidia.com>,
<msanalla@nvidia.com>, <phaddad@nvidia.com>, <parav@nvidia.com>,
<gbayer@linux.ibm.com>, <kees@kernel.org>, <moshe@nvidia.com>,
<rongweil@nvidia.com>, <jacob.e.keller@intel.com>,
<netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
<linux-kernel@vger.kernel.org>, <gal@nvidia.com>
Subject: Re: [PATCH net 3/3] net/mlx5e: TC, skip peer flow cleanup when LAG seq is unavailable
Date: Mon, 22 Jun 2026 09:40:44 +0300 [thread overview]
Message-ID: <e18662ac-413e-43f6-ac65-a4e15fd47bb7@nvidia.com> (raw)
In-Reply-To: <20260618124820.890808-3-horms@kernel.org>
On 18/06/2026 15:48, Simon Horman wrote:
> External email: Use caution opening links or attachments
>
>
> This is an AI-generated review of your patch. The human sending this
> email has considered the AI review valid, or at least plausible.
> Full review at: https://sashiko.dev
>
> Simon says: I think the first two issues flagged below warrant
> consideration within the context of this patch.
>
> But the third and final issue flagged below as pre-existing
> seems to be something to consider in the context of
> possible follow-up.
> ---
> net/mlx5e: TC, skip peer flow cleanup when LAG seq is unavailable
>
> This commit addresses a bad memory access crash by skipping peer flows when
> mlx5_lag_get_dev_seq() returns an error. This occurs when the peer isn't in
> the LAG or when no device is marked as master.
>
>> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> index a9001d1c902ff..c6e6534a5e23b 100644
>> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
>> @@ -2155,6 +2155,9 @@ static void mlx5e_tc_del_fdb_peers_flow(struct mlx5e_tc_flow *flow)
>> devcom = flow->priv->mdev->priv.eswitch->devcom;
>> mlx5_devcom_for_each_peer_entry(devcom, peer_esw, pos) {
>> i = mlx5_lag_get_dev_seq(peer_esw->dev);
>> + if (i < 0)
>> + continue;
>> +
>
> [Severity: Critical]
> Does skipping the remainder of this loop prevent the flow from being
> unlinked from the doubly-linked list?
>
> By using continue here in mlx5e_tc_del_fdb_peers_flow(), we skip calling
> mlx5e_tc_del_fdb_peer_flow(), which means flow->peer[i] is never
> unlinked from the esw->offloads.peer_flows[original_i] list.
only in case a LAG member is removed from ldev, mlx5_lag_get_dev_seq()
will return error.
before LAG member is removed, esw->devcom is cleanup, which invoke
mlx5e_tc_clean_fdb_peer_flows(), which remove all peer flows.
Hence, no flow remains.
>
> If the flow is subsequently freed via mlx5e_flow_put(), will this leave a
> freed list node resident in the active list and guarantee a use-after-free
> list corruption during the next list traversal?
>
>> mlx5e_tc_del_fdb_peer_flow(flow, i);
>> }
>> }
>
> [ ... ]
>
>> @@ -5526,6 +5529,9 @@ void mlx5e_tc_clean_fdb_peer_flows(struct mlx5_eswitch *esw)
>>
>> mlx5_devcom_for_each_peer_entry(devcom, peer_esw, pos) {
>> i = mlx5_lag_get_dev_seq(peer_esw->dev);
>> + if (i < 0)
>> + continue;
>> +
>
> [Severity: Critical]
> Similarly to the above, does this skip unlinking during flow cleanup when
> the LAG sequence is unavailable?
By the time get_dev_seq() can fail, the member is already out of ldev
- and clean_fdb_peer_flows() (devcom unpair) ran before that, while seq
was still valid, so the flows are already unlinked.
The guard just covers that later window; nothing remains to leak.
>
>> list_for_each_entry_safe(flow, tmp, &esw->offloads.peer_flows[i], peer[i])
>> mlx5e_tc_del_fdb_peers_flow(flow);
>> }
>
>
> [Severity: Critical]
> This isn't a bug introduced by this patch, but is there a similar missing
> bounds check during flow addition in mlx5e_tc_add_fdb_peer_flow() in this
> same file?
>
> If a flow is added concurrently with LAG teardown, mlx5_lag_get_dev_seq()
> can return -ENOENT (-2):
>
> drivers/net/ethernet/mellanox/mlx5/core/en_tc.c:mlx5e_tc_add_fdb_peer_flow() {
> ...
> int i = mlx5_lag_get_dev_seq(peer_esw->dev);
> ...
> list_add_tail(&flow->peer[i], &esw->offloads.peer_flows[i]);
> ...
> }
>
> Does this result in an out-of-bounds write to a negative array index?
No, if LAG is destroyed, than is_peer_flow_needed will return false and
we won't enter here.
the whole peer loop runs under the devcom read lock
(mlx5_devcom_for_each_peer_begin), while devcom unpair - which is what
precedes LAG member removal and runs clean_fdb_peer_flows - takes the
write lock. The read lock therefore blocks teardown for the duration, so
mlx5_lag_get_dev_seq() can't go negative here.
next prev parent reply other threads:[~2026-06-22 6:41 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-17 6:32 [PATCH net 0/3] net/mlx5: LAG bug fixes Tariq Toukan
2026-06-17 6:32 ` [PATCH net 1/3] net/mlx5: LAG, Fix off-by-one in single-FDB error rollback Tariq Toukan
2026-06-17 6:32 ` [PATCH net 2/3] net/mlx5: LAG, MPESW, Fix missing complete() on devcom error Tariq Toukan
2026-06-17 6:32 ` [PATCH net 3/3] net/mlx5e: TC, skip peer flow cleanup when LAG seq is unavailable Tariq Toukan
2026-06-18 12:48 ` Simon Horman
2026-06-22 6:40 ` Shay Drori [this message]
2026-06-23 0:03 ` Jakub Kicinski
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=e18662ac-413e-43f6-ac65-a4e15fd47bb7@nvidia.com \
--to=shayd@nvidia.com \
--cc=andrew+netdev@lunn.ch \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=edwards@nvidia.com \
--cc=gal@nvidia.com \
--cc=gbayer@linux.ibm.com \
--cc=horms@kernel.org \
--cc=jacob.e.keller@intel.com \
--cc=kees@kernel.org \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=mbloch@nvidia.com \
--cc=moshe@nvidia.com \
--cc=msanalla@nvidia.com \
--cc=netdev@vger.kernel.org \
--cc=ohartoov@nvidia.com \
--cc=pabeni@redhat.com \
--cc=parav@nvidia.com \
--cc=phaddad@nvidia.com \
--cc=rongweil@nvidia.com \
--cc=saeedm@nvidia.com \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.