From: Jakub Kicinski <kuba@kernel.org>
To: tariqt@nvidia.com
Cc: Jakub Kicinski <kuba@kernel.org>,
	edumazet@google.com, pabeni@redhat.com, andrew+netdev@lunn.ch,
	davem@davemloft.net, saeedm@nvidia.com, leon@kernel.org,
	mbloch@nvidia.com, shayd@nvidia.com, ohartoov@nvidia.com,
	edwards@nvidia.com, msanalla@nvidia.com, horms@kernel.org,
	parav@nvidia.com, moshe@nvidia.com, kees@kernel.org,
	phaddad@nvidia.com, gbayer@linux.ibm.com, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org, linux-kernel@vger.kernel.org,
	gal@nvidia.com
Subject: Re: [PATCH net] net/mlx5: Skip disabled vports when setting max TX speed
Date: Thu, 14 May 2026 18:35:08 -0700
Message-ID: <20260515013508.1308541-1-kuba@kernel.org>
In-Reply-To: <20260513063640.334132-1-tariqt@nvidia.com>

This is an AI-generated review of your patch. The human sending this
email has considered the AI review valid, or at least plausible.
---
net/mlx5: Skip disabled vports when setting max TX speed

This commit modifies the LAG activation process to skip setting the maximum
TX speed for disabled vports, preventing firmware commands from being sent
to uninitialized vports. It stores the aggregated speed in the vport
structure so it can be applied later if the vport is enabled.

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.c

[ ... ]

> @@ -948,6 +966,9 @@ int mlx5_esw_vport_enable(struct mlx5_eswitch *esw, struct mlx5_vport *vport,
>  
>  	esw->enabled_vports++;
>  	esw_debug(esw->dev, "Enabled VPORT(%d)\n", vport_num);
> +
> +	if (vport->agg_max_tx_speed)

Could dynamically allocated vports bypass the active LAG QoS speed limits?

If a new vport is allocated dynamically after the LAG is already active,
it appears agg_max_tx_speed would be zero-initialized. When this new vport
is subsequently enabled here, the condition evaluates to false, which
would skip the firmware speed update entirely.
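
To make the concern concrete, here is a minimal userspace sketch. Every
name in it (model_vport, model_esw, lag_speed, fw_speed) is hypothetical
and only models the behavior described above; none of it is the real mlx5
code. It assumes a new vport starts zeroed, and contrasts that with a
variant that seeds the field from an eswitch-wide cached speed at
allocation time.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only; these are not the real mlx5 structures. */
struct model_vport {
	bool enabled;
	unsigned int agg_max_tx_speed;	/* 0: LAG never recorded a speed */
	unsigned int fw_speed;		/* last speed "sent to firmware" */
};

/* Hypothetical eswitch-wide cache of the current LAG speed. */
struct model_esw {
	unsigned int lag_speed;
};

/* Mirrors the patched enable path: apply the speed only if one is stored. */
static void model_vport_enable(struct model_vport *vport)
{
	vport->enabled = true;
	if (vport->agg_max_tx_speed)
		vport->fw_speed = vport->agg_max_tx_speed;
}

/* Allocation as the review assumes it: all fields start at zero. */
static struct model_vport *model_vport_alloc(void)
{
	return calloc(1, sizeof(struct model_vport));
}

/* Hypothetical variant: seed the new vport from the eswitch-wide cache. */
static struct model_vport *model_vport_alloc_seeded(const struct model_esw *esw)
{
	struct model_vport *vport = model_vport_alloc();

	if (vport)
		vport->agg_max_tx_speed = esw->lag_speed;
	return vport;
}
```

In this model, enabling a vport from model_vport_alloc() leaves fw_speed at
zero, while one from model_vport_alloc_seeded() picks up the LAG speed on
enable.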

> +		mlx5_esw_vport_set_max_tx_speed(esw, vport);

Does this introduce a race condition with mlx5_lag_modify_device_vports_speed()?

If the LAG thread is concurrently updating the speed:

LAG thread (lockless):
    vport->agg_max_tx_speed = speed;
    if (!vport->enabled)
        continue;

ESW thread (with esw->state_lock):
    vport->enabled = true;
    ...
    if (vport->agg_max_tx_speed)
        mlx5_esw_vport_set_max_tx_speed(esw, vport);

Since there are no shared locks or explicit memory barriers, could CPU
reordering allow the LAG thread to see a false enabled state while the ESW
thread sees a zero agg_max_tx_speed, resulting in neither thread issuing the
firmware command?
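
The pattern above is the classic store-buffering shape: each side stores
one flag and then loads the other. As a hedged illustration, the sketch
below models both sides in userspace C11 with a full fence between the
store and the load, which is the role smp_mb() plays in kernel code. The
names (model_vport, model_lag_update, model_esw_enable, fw_cmds) are
hypothetical, and this is not a proposed patch; taking a common lock on
both paths would be another way to get the same guarantee.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Userspace model of the two flags; not the real mlx5_vport layout. */
struct model_vport {
	atomic_uint agg_max_tx_speed;
	atomic_bool enabled;
	atomic_uint fw_cmds;	/* counts modeled firmware commands */
};

/* LAG side: publish the speed, then check whether the vport is enabled. */
static bool model_lag_update(struct model_vport *v, unsigned int speed)
{
	atomic_store_explicit(&v->agg_max_tx_speed, speed,
			      memory_order_relaxed);
	/* Full barrier between the store and the load (kernel: smp_mb()). */
	atomic_thread_fence(memory_order_seq_cst);
	if (!atomic_load_explicit(&v->enabled, memory_order_relaxed))
		return false;
	atomic_fetch_add(&v->fw_cmds, 1);
	return true;
}

/* ESW side: publish enabled, then check whether a speed was recorded. */
static bool model_esw_enable(struct model_vport *v)
{
	atomic_store_explicit(&v->enabled, true, memory_order_relaxed);
	/* Matching full barrier on this side. */
	atomic_thread_fence(memory_order_seq_cst);
	if (!atomic_load_explicit(&v->agg_max_tx_speed, memory_order_relaxed))
		return false;
	atomic_fetch_add(&v->fw_cmds, 1);
	return true;
}
```

With the fences on both sides, the outcome where both loads see the old
value is excluded, so at least one side issues the command. Both sides may
still issue it when they overlap, which this model treats as harmless.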

>  done:
>  	mutex_unlock(&esw->state_lock);

[ ... ]

> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> --- a/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/lag/lag.c

[ ... ]

> @@ -1274,6 +1274,11 @@ static void mlx5_lag_modify_device_vports_speed(struct mlx5_core_dev *mdev,
>  		if (vport->vport == MLX5_VPORT_UPLINK)
>  			continue;
>  
> +		vport->agg_max_tx_speed = speed;
> +
> +		if (!vport->enabled)

Can the new field accesses here hit a use-after-free on the vport pointer?

This loop locklessly iterates over the esw->vports xarray. Because the xarray
iteration drops the RCU read lock before executing the loop body, and
mlx5_esw_vport_free() calls kfree() synchronously without an RCU grace
period, could a concurrent vport deletion cause these new field accesses to
operate on freed memory?
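
One generic way to keep an object alive past the lookup is to pin it with
a reference before leaving the lookup's critical section. The sketch below
is a single-threaded userspace model of that idea only; all names
(model_vport, model_vport_get, model_vport_put, model_frees) are
hypothetical, the slot pointer stands in for the xarray entry, and a
concurrent version would need a lock or an atomic refcount where this one
uses a plain int. It does not claim to be how the driver should be fixed.

```c
#include <assert.h>
#include <stdlib.h>

/* Userspace model; the real driver keeps vports in an xarray. */
struct model_vport {
	int refs;	/* plain int: single-threaded illustration only */
	unsigned int agg_max_tx_speed;
};

static int model_frees;	/* counts how many objects were actually freed */

/* Create a vport and publish it; the slot holds the first reference. */
static struct model_vport *model_vport_create(struct model_vport **slot)
{
	struct model_vport *v = calloc(1, sizeof(*v));

	if (v) {
		v->refs = 1;
		*slot = v;
	}
	return v;
}

/* Drop one reference; free only when the last one goes away. */
static void model_vport_put(struct model_vport *v)
{
	if (--v->refs == 0) {
		free(v);
		model_frees++;
	}
}

/* Lookup hands out a pinned pointer, or NULL if nothing is published. */
static struct model_vport *model_vport_get(struct model_vport **slot)
{
	struct model_vport *v = *slot;

	if (v)
		v->refs++;
	return v;
}

/* Deletion unpublishes the slot and drops only the slot's reference. */
static void model_vport_delete(struct model_vport **slot)
{
	struct model_vport *v = *slot;

	*slot = NULL;
	if (v)
		model_vport_put(v);
}
```

In this model a pointer obtained from model_vport_get() stays valid across
a model_vport_delete() and is released by the caller's final
model_vport_put().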

> +			continue;
> +
>  		ret = mlx5_modify_vport_max_tx_speed(mdev, op_mod,
>  						     vport->vport, true, speed);

Does this code contain a time-of-check to time-of-use (TOCTOU) race,
given that this path takes no locks?

Looking at mlx5_modify_vport_max_tx_speed(), it queries the firmware
admin_state and writes it back without holding esw->state_lock. If the LAG
thread reads an UP state, but mlx5_esw_vport_disable() concurrently disables
the vport (setting the hardware state to DOWN), could the LAG thread
accidentally overwrite the admin_state back to UP, effectively reverting the
disable action?
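
The interleaving in question can be replayed deterministically in a small
userspace model. The sketch below splits the read-modify-write into its
two steps and runs a modeled disable between them. All names
(model_fw_vport, model_query_state, model_write_speed_and_state,
model_disable) are hypothetical stand-ins for the firmware query and
modify commands, not the real mlx5 interfaces.

```c
#include <assert.h>
#include <stdbool.h>

enum model_admin_state { MODEL_DOWN = 0, MODEL_UP = 1 };

/* Modeled firmware vport context; field names are illustrative. */
struct model_fw_vport {
	enum model_admin_state admin_state;
	unsigned int max_tx_speed;
};

/* Step 1 of the read-modify-write: snapshot the admin state. */
static enum model_admin_state model_query_state(const struct model_fw_vport *fw)
{
	return fw->admin_state;
}

/* Step 2: write the speed together with the snapshotted state. */
static void model_write_speed_and_state(struct model_fw_vport *fw,
					unsigned int speed,
					enum model_admin_state snapshot)
{
	fw->max_tx_speed = speed;
	fw->admin_state = snapshot;	/* may be stale by now */
}

/* Modeled disable path that runs between the two steps. */
static void model_disable(struct model_fw_vport *fw)
{
	fw->admin_state = MODEL_DOWN;
}

/* Replays the interleaving; returns true if the disable was reverted. */
static bool model_replay_interleaving(void)
{
	struct model_fw_vport fw = { MODEL_UP, 0 };
	enum model_admin_state snap = model_query_state(&fw);

	model_disable(&fw);			/* concurrent disable lands here */
	model_write_speed_and_state(&fw, 100, snap);
	return fw.admin_state == MODEL_UP;
}
```

In this model the stale snapshot wins and the vport ends up UP again.
Holding one lock across both steps and the disable path, or writing only
the speed field, would avoid that in the model.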

>  		if (ret)
-- 
pw-bot: cr

Thread overview: 2+ messages
2026-05-13  6:36 [PATCH net] net/mlx5: Skip disabled vports when setting max TX speed Tariq Toukan
2026-05-15  1:35 ` Jakub Kicinski [this message]
