Re: [PATCH net-next 7/7] net/mlx5: Add profile to auto-enable switchdev mode at device init

public inbox for netdev@vger.kernel.org

From: Mark Bloch <mbloch@nvidia.com>
To: Tariq Toukan <tariqt@nvidia.com>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>, Shay Drory <shayd@nvidia.com>,
	Or Har-Toov <ohartoov@nvidia.com>,
	Edward Srouji <edwards@nvidia.com>,
	Maher Sanalla <msanalla@nvidia.com>,
	Simon Horman <horms@kernel.org>, Moshe Shemesh <moshe@nvidia.com>,
	Kees Cook <kees@kernel.org>,
	Patrisious Haddad <phaddad@nvidia.com>,
	Gerd Bayer <gbayer@linux.ibm.com>,
	Parav Pandit <parav@nvidia.com>, Cosmin Ratiu <cratiu@nvidia.com>,
	Carolina Jubran <cjubran@nvidia.com>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-kernel@vger.kernel.org, Gal Pressman <gal@nvidia.com>,
	Dragos Tatulea <dtatulea@nvidia.com>
Subject: Re: [PATCH net-next 7/7] net/mlx5: Add profile to auto-enable switchdev mode at device init
Date: Thu, 9 Apr 2026 21:02:39 +0300
Message-ID: <77ebbbc8-2991-40cf-baa3-080f670aaae5@nvidia.com>
In-Reply-To: <20260409115550.156419-8-tariqt@nvidia.com>



On 09/04/2026 14:55, Tariq Toukan wrote:
> From: Mark Bloch <mbloch@nvidia.com>
> 
> Deployments that always operate in switchdev mode currently require
> manual devlink configuration after driver probe, which complicates
> automated provisioning.
> 
> Introduce MLX5_PROF_MASK_DEF_SWITCHDEV, a new profile mask bit, and
> profile index 4. When a device is initialized or reloaded with this
> profile, the driver automatically switches the e-switch to switchdev
> mode by calling mlx5_devlink_eswitch_mode_set() immediately after
> bringing the device online.
> 
> A no-op stub of mlx5_devlink_eswitch_mode_set() is added for builds
> without CONFIG_MLX5_ESWITCH.
> 
> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
> Reviewed-by: Moshe Shemesh <moshe@nvidia.com>
> Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
> ---
>  .../net/ethernet/mellanox/mlx5/core/eswitch.h |  6 +++++
>  .../net/ethernet/mellanox/mlx5/core/main.c    | 26 ++++++++++++++++++-
>  include/linux/mlx5/driver.h                   |  1 +
>  3 files changed, 32 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> index 256ac3ad37bc..5dcca59c3125 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/eswitch.h
> @@ -1047,6 +1047,12 @@ mlx5_esw_lag_demux_rule_create(struct mlx5_eswitch *esw, u16 vport_num,
>  	return ERR_PTR(-EOPNOTSUPP);
>  }
>  
> +static inline int
> +mlx5_devlink_eswitch_mode_set(struct devlink *devlink, u16 mode,
> +			      struct netlink_ext_ack *extack)
> +{
> +	return -EOPNOTSUPP;
> +}
>  #endif /* CONFIG_MLX5_ESWITCH */
>  
>  #endif /* __MLX5_ESWITCH_H__ */
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index dc7f20a357d9..12f39b4b6c2a 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -86,7 +86,7 @@ MODULE_PARM_DESC(debug_mask, "debug mask: 1 = dump cmd data, 2 = dump cmd exec t
>  
>  static unsigned int prof_sel = MLX5_DEFAULT_PROF;
>  module_param_named(prof_sel, prof_sel, uint, 0444);
> -MODULE_PARM_DESC(prof_sel, "profile selector. Valid range 0 - 2");
> +MODULE_PARM_DESC(prof_sel, "profile selector. Valid range 0 - 4");

Sashiko writes:
"Is it appropriate to expand a module parameter to configure a feature that
already has a standard devlink API (devlink dev eswitch set ... mode
switchdev)? Automated provisioning is typically expected to be handled in
userspace rather than configured via driver module parameters."

This is intended as an intermediate step.

The end goal is that for certain environments (e.g. ECPF/DPU), only
switchdev mode is supported and it becomes the default without requiring
user configuration.

There is also the question of multi-NIC systems, where a user may want
only a subset of devices to default to switchdev. This patch does not
aim to solve that. The long-term direction is to tie the behavior to the
NIC type / platform rather than expose it as a generic user-facing knob.

Accordingly, this is not meant to be a general-purpose API. It is intended
for controlled environments (e.g. DPU deployments) where switchdev is
the only supported mode.
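
To make the scope of the opt-in concrete, here is a small user-space
sketch of how a prof_sel value maps to the new mask bit. It is a
hypothetical model, not driver code: struct prof_model, profile_model[],
prof_sel_valid() and wants_default_switchdev() are illustrative names,
entries 0-3 are zeroed placeholders, and the bounds check on prof_sel is
an assumption. Only the bit values and the mask of profile index 4 come
from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit values copied from the driver.h hunk of this patch. */
#define PROF_MASK_QP_SIZE       ((uint64_t)1 << 0)
#define PROF_MASK_MR_CACHE      ((uint64_t)1 << 1)
#define PROF_MASK_DEF_SWITCHDEV ((uint64_t)1 << 2)

/* Hypothetical, reduced stand-in for struct mlx5_profile: only .mask. */
struct prof_model {
	uint64_t mask;
};

/*
 * Hypothetical stand-in for profile[]. Entries 0-3 are left zeroed as
 * placeholders; only entry 4's mask is taken from the patch.
 */
static const struct prof_model profile_model[] = {
	[4] = { .mask = PROF_MASK_DEF_SWITCHDEV | PROF_MASK_QP_SIZE },
};

#define NUM_PROFILES (sizeof(profile_model) / sizeof(profile_model[0]))

/* Assumed bounds check on the prof_sel module parameter. */
static bool prof_sel_valid(unsigned int sel)
{
	return sel < NUM_PROFILES;
}

/* True when the selected profile opts in to switchdev at init/reload. */
static bool wants_default_switchdev(unsigned int sel)
{
	return prof_sel_valid(sel) &&
	       (profile_model[sel].mask & PROF_MASK_DEF_SWITCHDEV) != 0;
}
```

In this model only prof_sel=4 triggers the behavior; any other profile,
including the default, leaves the e-switch mode to userspace as today.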

>  
>  static u32 sw_owner_id[4];
>  #define MAX_SW_VHCA_ID (BIT(__mlx5_bit_sz(cmd_hca_cap_2, sw_vhca_id)) - 1)
> @@ -185,6 +185,11 @@ static struct mlx5_profile profile[] = {
>  		.log_max_qp	= LOG_MAX_SUPPORTED_QPS,
>  		.num_cmd_caches = 0,
>  	},
> +	[4] = {
> +		.mask = MLX5_PROF_MASK_DEF_SWITCHDEV | MLX5_PROF_MASK_QP_SIZE,
> +		.log_max_qp = LOG_MAX_SUPPORTED_QPS,
> +		.num_cmd_caches = MLX5_NUM_COMMAND_CACHES,
> +	},
>  };
>  
>  static int wait_fw_init(struct mlx5_core_dev *dev, u32 max_wait_mili,
> @@ -1451,6 +1456,17 @@ static void mlx5_unload(struct mlx5_core_dev *dev)
>  	mlx5_free_bfreg(dev, &dev->priv.bfreg);
>  }
>  
> +static void mlx5_set_default_switchdev(struct mlx5_core_dev *dev)
> +{
> +	int err;
> +
> +	err = mlx5_devlink_eswitch_mode_set(priv_to_devlink(dev),
> +					    DEVLINK_ESWITCH_MODE_SWITCHDEV,
> +					    NULL);

Sashiko writes:
"Does calling the internal driver eswitch mode function directly bypass the
devlink core? This appears to prevent the devlink subsystem from emitting
proper netlink state notifications to userspace when the mode transitions."

The intent here is to support platforms where switchdev must be the
default (notably DPU/ARM side). Today this transition is handled via
userspace scripts.

This change provides an intermediate step, allowing the device to come
up directly in switchdev mode without relying on userspace orchestration.
It still requires explicit opt-in via the profile.
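
Since nothing in userspace is waiting on this transition, the patch
treats it as best-effort: -EOPNOTSUPP (including the stub used without
CONFIG_MLX5_ESWITCH) is silently ignored, any other error only produces a
warning, and init/load still returns 0. A minimal user-space model of that
error policy is below; model_mode_set(), model_set_default_switchdev() and
warn_count are hypothetical names, not driver symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Counts how often the model would emit the driver's warning. */
static int warn_count;

/*
 * Hypothetical stand-in for mlx5_devlink_eswitch_mode_set(). When the
 * e-switch is not built in, it behaves like the patch's stub and returns
 * -EOPNOTSUPP; otherwise it returns a caller-supplied result.
 */
static int model_mode_set(bool eswitch_built, int result)
{
	if (!eswitch_built)
		return -EOPNOTSUPP;
	return result;
}

/* Same error policy as mlx5_set_default_switchdev() in the patch. */
static void model_set_default_switchdev(bool eswitch_built, int result)
{
	int err = model_mode_set(eswitch_built, result);

	if (err && err != -EOPNOTSUPP)
		warn_count++; /* driver: mlx5_core_warn(dev, "failed setting ...") */
}
```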


> +	if (err && err != -EOPNOTSUPP)
> +		mlx5_core_warn(dev, "failed setting switchdev as default\n");
> +}
> +
>  int mlx5_init_one_devl_locked(struct mlx5_core_dev *dev)
>  {
>  	bool light_probe = mlx5_dev_is_lightweight(dev);
> @@ -1497,6 +1513,10 @@ int mlx5_init_one_devl_locked(struct mlx5_core_dev *dev)
>  		mlx5_core_err(dev, "mlx5_hwmon_dev_register failed with error code %d\n", err);
>  
>  	mutex_unlock(&dev->intf_state_mutex);
> +
> +	if (dev->profile.mask & MLX5_PROF_MASK_DEF_SWITCHDEV)
> +		mlx5_set_default_switchdev(dev);
> +

Sashiko writes:
"If a user boots with prof_sel=4 but later manually changes the eswitch mode
to legacy via standard devlink commands, will this call force the device
back into switchdev mode during a firmware recovery or device reload? This
seems to override user-driven runtime devlink configurations."

Given the intended use case, this is acceptable.

These devices are considered “switchdev by default”, so after a firmware
reset or device reload it is expected that they return to switchdev mode.
Preserving a prior user override to legacy mode is not a requirement for
this configuration.
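
The reload behavior described above can be modelled roughly as follows.
This is a hypothetical user-space sketch: struct dev_model,
model_user_set_mode() and model_reload() are illustrative names, and the
model assumes that with the profile bit clear the mode is simply left as
it was, which may not match what the real reload path does.

```c
#include <assert.h>
#include <stdbool.h>

enum esw_mode { ESW_LEGACY, ESW_SWITCHDEV };

struct dev_model {
	bool def_switchdev; /* profile.mask & MLX5_PROF_MASK_DEF_SWITCHDEV */
	enum esw_mode mode;
};

/* Runtime change, e.g. "devlink dev eswitch set ... mode legacy". */
static void model_user_set_mode(struct dev_model *d, enum esw_mode m)
{
	d->mode = m;
}

/*
 * Hypothetical model of the tail of mlx5_load_one_devl_locked(): after
 * the device is back online, the profile bit is re-applied unconditionally.
 * Assumption: with the bit clear, the mode is left as it was.
 */
static void model_reload(struct dev_model *d)
{
	if (d->def_switchdev)
		d->mode = ESW_SWITCHDEV;
}
```

With the bit set, a runtime switch to legacy lasts only until the next
reload or firmware recovery, which is the intended outcome for these
switchdev-only deployments.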

Mark

>  	return 0;
>  
>  err_register:
> @@ -1598,6 +1618,10 @@ int mlx5_load_one_devl_locked(struct mlx5_core_dev *dev, bool recovery)
>  		goto err_attach;
>  
>  	mutex_unlock(&dev->intf_state_mutex);
> +
> +	if (dev->profile.mask & MLX5_PROF_MASK_DEF_SWITCHDEV)
> +		mlx5_set_default_switchdev(dev);
> +
>  	return 0;
>  
>  err_attach:
> diff --git a/include/linux/mlx5/driver.h b/include/linux/mlx5/driver.h
> index 1268fcf35ec7..cfbc0ff6292a 100644
> --- a/include/linux/mlx5/driver.h
> +++ b/include/linux/mlx5/driver.h
> @@ -706,6 +706,7 @@ struct mlx5_st;
>  enum {
>  	MLX5_PROF_MASK_QP_SIZE		= (u64)1 << 0,
>  	MLX5_PROF_MASK_MR_CACHE		= (u64)1 << 1,
> +	MLX5_PROF_MASK_DEF_SWITCHDEV    = (u64)1 << 2,
>  };
>  
>  enum {


Thread overview: 15+ messages
2026-04-09 11:55 [PATCH net-next 0/7] net/mlx5: Improve representor lifecycle and fix work queue deadlock Tariq Toukan
2026-04-09 11:55 ` [PATCH net-next 1/7] net/mlx5: Lag: refactor representor reload handling Tariq Toukan
2026-04-09 17:57   ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 2/7] net/mlx5: E-Switch, move work queue generation counter Tariq Toukan
2026-04-09 17:58   ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 3/7] net/mlx5: E-Switch, introduce generic work queue dispatch helper Tariq Toukan
2026-04-09 11:55 ` [PATCH net-next 4/7] net/mlx5: E-Switch, fix deadlock between devlink lock and esw->wq Tariq Toukan
2026-04-09 18:01   ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 5/7] net/mlx5: E-Switch, block representors during reconfiguration Tariq Toukan
2026-04-09 18:02   ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 6/7] net/mlx5: E-switch, load reps via work queue after registration Tariq Toukan
2026-04-09 18:02   ` Mark Bloch
2026-04-09 11:55 ` [PATCH net-next 7/7] net/mlx5: Add profile to auto-enable switchdev mode at device init Tariq Toukan
2026-04-09 18:02   ` Mark Bloch [this message]
2026-04-09 18:20 ` [PATCH net-next 0/7] net/mlx5: Improve representor lifecycle and fix work queue deadlock Mark Bloch
