From: Shay Drori <shayd@nvidia.com>
To: Tariq Toukan <tariqt@nvidia.com>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	"Andrew Lunn" <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>, Mark Bloch <mbloch@nvidia.com>,
	Nimrod Oren <noren@nvidia.com>, Yael Chemla <ychemla@nvidia.com>,
	Or Har-Toov <ohartoov@nvidia.com>,
	Edward Srouji <edwards@nvidia.com>,
	Maher Sanalla <msanalla@nvidia.com>,
	Simon Horman <horms@kernel.org>, Parav Pandit <parav@nvidia.com>,
	Patrisious Haddad <phaddad@nvidia.com>,
	Kees Cook <kees@kernel.org>, Moshe Shemesh <moshe@nvidia.com>,
	<linux-kernel@vger.kernel.org>, <netdev@vger.kernel.org>,
	<linux-rdma@vger.kernel.org>, Gal Pressman <gal@nvidia.com>
Subject: Re: [PATCH net-next 03/13] net/mlx5: E-Switch, move devcom init from TC to eswitch layer
Date: Thu, 28 May 2026 21:48:42 +0300
Message-ID: <f9543e58-e5d4-49af-ba50-09ae6656153a@nvidia.com>
In-Reply-To: <20260527125427.385976-4-tariqt@nvidia.com>



On 27/05/2026 15:54, Tariq Toukan wrote:
> From: Shay Drory <shayd@nvidia.com>
> 
> Move the E-Switch devcom component management from the TC layer to the
> ESW layer.
> 
> This refactoring places devcom lifecycle management at the appropriate
> layer and prepares for SD LAG which needs devcom registration
> independent of the TC/representor initialization.
> 
> Signed-off-by: Shay Drory <shayd@nvidia.com>
> Reviewed-by: Mark Bloch <mbloch@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>   .../net/ethernet/mellanox/mlx5/core/en_tc.c   | 20 -------------------
>   .../mellanox/mlx5/core/eswitch_offloads.c     |  6 ++++++
>   2 files changed, 6 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> index a9001d1c902f..3846c16c3138 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_tc.c
> @@ -5394,8 +5394,6 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv)
>   {
>   	const size_t sz_enc_opts = sizeof(struct tunnel_match_enc_opts);
>   	u8 mapping_id[MLX5_SW_IMAGE_GUID_MAX_BYTES];
> -	struct mlx5_devcom_match_attr attr = {};
> -	struct netdev_phys_item_id ppid;
>   	struct mlx5e_rep_priv *rpriv;
>   	struct mapping_ctx *mapping;
>   	struct mlx5_eswitch *esw;
> @@ -5456,14 +5454,6 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv)
>   		goto err_action_counter;
>   	}
>   
> -	err = netif_get_port_parent_id(priv->netdev, &ppid, false);
> -	if (!err) {
> -		memcpy(&attr.key.buf, &ppid.id, ppid.id_len);
> -		attr.flags = MLX5_DEVCOM_MATCH_FLAGS_NS;
> -		attr.net = mlx5_core_net(esw->dev);
> -		mlx5_esw_offloads_devcom_init(esw, &attr);
> -	}
> -
>   	return 0;
>   
>   err_action_counter:
> @@ -5484,16 +5474,6 @@ int mlx5e_tc_esw_init(struct mlx5_rep_uplink_priv *uplink_priv)
>   
>   void mlx5e_tc_esw_cleanup(struct mlx5_rep_uplink_priv *uplink_priv)
>   {
> -	struct mlx5e_rep_priv *rpriv;
> -	struct mlx5_eswitch *esw;
> -	struct mlx5e_priv *priv;
> -
> -	rpriv = container_of(uplink_priv, struct mlx5e_rep_priv, uplink_priv);
> -	priv = netdev_priv(rpriv->netdev);
> -	esw = priv->mdev->priv.eswitch;
> -
> -	mlx5_esw_offloads_devcom_cleanup(esw);
> -
>   	mlx5e_tc_tun_cleanup(uplink_priv->encap);
>   
>   	mapping_destroy(uplink_priv->tunnel_enc_opts_mapping);
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> index 189be11c4c39..d9683d3ea0e7 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch_offloads.c
> @@ -3866,6 +3866,7 @@ bool mlx5_esw_offloads_controller_valid(const struct mlx5_eswitch *esw, u32 cont
>   int esw_offloads_enable(struct mlx5_eswitch *esw)
>   {
>   	u8 mapping_id[MLX5_SW_IMAGE_GUID_MAX_BYTES];
> +	struct mlx5_devcom_match_attr attr = {};
>   	struct mapping_ctx *reg_c0_obj_pool;
>   	struct mlx5_vport *vport;
>   	unsigned long i;
> @@ -3926,6 +3927,10 @@ int esw_offloads_enable(struct mlx5_eswitch *esw)
>   	if (err)
>   		goto err_vports;
>   
> +	memcpy(attr.key.buf, mapping_id, id_len);
> +	attr.flags = MLX5_DEVCOM_MATCH_FLAGS_NS;
> +	attr.net = mlx5_core_net(esw->dev);
> +	mlx5_esw_offloads_devcom_init(esw, &attr);

Sashiko.dev says:
"Does this code introduce a race condition by registering the VF representor
netdevices before the devcom component is initialized?
Because mlx5_eswitch_enable_pf_vf_vports() was called just before this block,
it registers the representor netdevices and emits RTM_NEWLINK events.
Userspace tools can then start asynchronously configuring the interfaces.
If userspace attempts to add a TC rule requiring peer flow programming (like
multipath or LAG) before mlx5_esw_offloads_devcom_init() completes, the driver
will attempt to access esw->devcom while it is still NULL. Functions like
mlx5_devcom_for_each_peer_begin() will return false, causing the TC rule
addition to fail with -ENODEV."

If esw->devcom is NULL, is_peer_flow_needed() returns false, so
mlx5e_add_fdb_flow() returns success instead of failing with -ENODEV.
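
To illustrate the argument only, here is a minimal toy model of that
guard. It is not the driver code: every toy_* name and field is
hypothetical, and the real is_peer_flow_needed() checks more than this.
It only shows the shape of the claim, namely that a missing devcom
means "no peer flow", not an error.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the real mlx5 structures. */
struct toy_devcom {
	int peers;			/* number of paired peer devices */
};

struct toy_esw {
	struct toy_devcom *devcom;	/* NULL until devcom init has run */
};

/* A peer flow is only considered once a devcom component exists. */
static bool toy_is_peer_flow_needed(const struct toy_esw *esw)
{
	return esw->devcom != NULL && esw->devcom->peers > 0;
}

/*
 * Returns 0 on success. Without devcom only the local rule is
 * installed, so *peer_rules stays 0 and no error is reported.
 */
static int toy_add_fdb_flow(const struct toy_esw *esw, int *peer_rules)
{
	*peer_rules = 0;
	if (toy_is_peer_flow_needed(esw))
		*peer_rules = esw->devcom->peers;
	return 0;
}
```

In this model a rule added in the window before devcom init is simply
installed without a peer copy.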


Sashiko.dev says:
"Can this unconditional devcom component initialization cause incorrect
pairing of independent devices if the system image GUID is unavailable?
Previously, netif_get_port_parent_id() would fail and safely bypass devcom
registration if the device lacked a valid hardware GUID. Now,
mlx5_query_nic_sw_system_image_guid() is called earlier in this function
and populates mapping_id and id_len.
If the device lacks a valid hardware GUID (id_len == 0), attr.key.buf will
remain filled with zeros (due to the struct initialization). Unconditionally
registering the devcom component via mlx5_esw_offloads_devcom_init()
with this all-zero key could cause multiple independent devices on the
same host to falsely pair with each other."

mlx5_query_nic_sw_system_image_guid() can only fail with -ENOMEM or
on a FW command error. In both cases mlx5 would have failed earlier
and never reached mlx5_esw_offloads_devcom_init().
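
A rough sketch of the ordering this relies on, as a toy model rather
than the actual esw_offloads_enable() body. All toy_* names, the
injected error parameter, and the 8-byte GUID are assumptions made for
illustration. The point is only that a failed query returns before any
key is copied, so registration never sees an all-zero key from that
path.

```c
#include <errno.h>
#include <stdbool.h>
#include <string.h>

#define TOY_GUID_MAX 16

/* Hypothetical state; not a real mlx5 structure. */
struct toy_state {
	bool devcom_registered;
	unsigned char key[TOY_GUID_MAX];
};

/* Either fails with the injected errno-style code or fills a non-empty id. */
static int toy_query_guid(int inject_err, unsigned char *buf,
			  unsigned char *len)
{
	if (inject_err)
		return inject_err;
	memset(buf, 0xab, 8);
	*len = 8;
	return 0;
}

static int toy_offloads_enable(struct toy_state *st, int inject_err)
{
	unsigned char id[TOY_GUID_MAX] = { 0 };
	unsigned char id_len = 0;
	int err;

	err = toy_query_guid(inject_err, id, &id_len);
	if (err)
		return err;	/* bail out before the key is ever used */

	/* Only reached with a valid id, mirroring the memcpy in the patch. */
	memcpy(st->key, id, id_len);
	st->devcom_registered = true;
	return 0;
}
```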

>   	return 0;
>   
>   err_vports:
> @@ -3970,6 +3975,7 @@ static int esw_offloads_stop(struct mlx5_eswitch *esw,
>   
>   void esw_offloads_disable(struct mlx5_eswitch *esw)
>   {
> +	mlx5_esw_offloads_devcom_cleanup(esw);
>   	mlx5_eswitch_disable_pf_vf_vports(esw);
>   	mlx5_esw_offloads_rep_unload(esw, MLX5_VPORT_UPLINK);
>   	esw_set_passing_vport_metadata(esw, false);

