Netdev List
From: Mark Bloch <mbloch@nvidia.com>
To: Jiri Pirko <jiri@resnulli.us>
Cc: Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Simon Horman <horms@kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Tariq Toukan <tariqt@nvidia.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH net-next V4 3/6] devlink: Parse eswitch mode boot defaults
Date: Wed, 1 Jul 2026 15:55:48 +0300
Message-ID: <188fc320-b7b7-4041-9132-9c71f1e0fe31@nvidia.com>
In-Reply-To: <akTencQhKSanuFeW@FV6GYCPJ69>



On 01/07/2026 12:38, Jiri Pirko wrote:
> Mon, Jun 29, 2026 at 08:20:58PM +0200, mbloch@nvidia.com wrote:
>> Add devlink_eswitch_mode= kernel command line parsing for a default
>> eswitch mode.
>>
>> The supported syntax selects either all devlink handles or one explicit
>> comma-separated handle list:
>>
>>  devlink_eswitch_mode=*=<mode>
>>
>>  devlink_eswitch_mode=<handle>[,<handle>...]=<mode>
>>
>> where <mode> is one of legacy, switchdev or switchdev_inactive. All
>> selected handles receive the same mode. Assigning different modes to
>> different handle lists in the same parameter value is not supported.
>>
>> Store the parsed selector and mode in devlink core so the default can be
>> applied by a downstream patch.
>>
>> Document the devlink_eswitch_mode= syntax and duplicate handle handling.
>>
>> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
>> ---
>> .../admin-guide/kernel-parameters.txt         |  25 ++
>> .../networking/devlink/devlink-defaults.rst   |  78 ++++++
>> Documentation/networking/devlink/index.rst    |   1 +
>> net/devlink/core.c                            | 227 ++++++++++++++++++
>> 4 files changed, 331 insertions(+)
>> create mode 100644 Documentation/networking/devlink/devlink-defaults.rst
>>
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index b5493a7f8f22..117300dd589c 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1249,6 +1249,31 @@ Kernel parameters
>> 	dell_smm_hwmon.fan_max=
>> 			[HW] Maximum configurable fan speed.
>>
>> +	devlink_eswitch_mode=
>> +			[NET]
>> +			Format:
>> +			<selector>=<mode>
>> +
>> +			<selector>:
>> +			* | <handle>[,<handle>...]
>> +
>> +			<handle>:
>> +			<bus-name>/<dev-name>
>> +
>> +			Configure default devlink eswitch mode for matching
>> +			devlink instances during device initialization.
>> +
>> +			<mode>:
>> +			legacy | switchdev | switchdev_inactive
>> +
>> +			Examples:
>> +			devlink_eswitch_mode=*=switchdev
>> +			devlink_eswitch_mode=pci/0000:08:00.0=switchdev
>> +			devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
>> +
>> +			See Documentation/networking/devlink/devlink-defaults.rst
>> +			for the full syntax.
>> +
>> 	dfltcc=		[HW,S390]
>> 			Format: { on | off | def_only | inf_only | always }
>> 			on:       s390 zlib hardware support for compression on
>> diff --git a/Documentation/networking/devlink/devlink-defaults.rst b/Documentation/networking/devlink/devlink-defaults.rst
>> new file mode 100644
>> index 000000000000..380c9e99210e
>> --- /dev/null
>> +++ b/Documentation/networking/devlink/devlink-defaults.rst
>> @@ -0,0 +1,78 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +==============================
>> +Devlink Eswitch Mode Defaults
>> +==============================
>> +
>> +Devlink eswitch mode defaults allow the eswitch mode to be provided on the
>> +kernel command line and applied to matching devlink instances during device
>> +initialization.
>> +
>> +The devlink device is selected by its devlink handle. For PCI devices this is
>> +the same handle shown by ``devlink dev show``, for example
>> +``pci/0000:08:00.0``.
>> +
>> +Kernel command line syntax
>> +==========================
>> +
>> +Defaults are specified with the ``devlink_eswitch_mode=`` kernel command line
>> +parameter.
>> +
>> +The general syntax is::
>> +
>> +  devlink_eswitch_mode=<selector>=<mode>
>> +
>> +``<selector>`` is either ``*`` or one or more devlink handles::
>> +
>> +  * | <bus-name>/<dev-name>[,<bus-name>/<dev-name>...]
>> +
>> +``*`` applies the mode to every devlink instance. All handles in the same
>> +selector receive the same eswitch mode.
>> +
>> +``<mode>`` is one of ``legacy``, ``switchdev`` or ``switchdev_inactive``.
>> +
>> +Syntax rules
>> +------------
>> +
>> +The following syntax rules apply:
>> +
>> +* Specify the default in one ``devlink_eswitch_mode=`` parameter. Repeated
>> +  ``devlink_eswitch_mode=`` parameters are not accumulated.
>> +* The ``devlink_eswitch_mode=`` value is limited by the kernel command line
>> +  size.
>> +* Whitespace is not allowed within the parameter value.
>> +* ``<selector>`` must be either ``*`` or a handle list. ``*`` cannot be
>> +  combined with explicit handles.
>> +* ``<bus-name>`` and ``<dev-name>`` must not be empty.
>> +* ``<dev-name>`` may contain ``:``. This allows PCI names such as
>> +  ``0000:08:00.0``.
>> +* Handles must not contain whitespace, ``*``, ``=`` or more than one ``/``.
>> +* A comma separates handles.
>> +* Comma-separated default assignments are not supported.
>> +* Duplicate handles are rejected and the devlink eswitch mode default is
>> +  ignored.
>> +
>> +The eswitch mode default corresponds to the userspace command::
>> +
>> +  devlink dev eswitch set <handle> mode <value>
>> +
>> +
>> +Examples
>> +========
>> +
>> +Set all devlink instances to switchdev mode::
>> +
>> +  devlink_eswitch_mode=*=switchdev
>> +
>> +Set one PCI devlink instance to switchdev mode::
>> +
>> +  devlink_eswitch_mode=pci/0000:08:00.0=switchdev
>> +
>> +Set two PCI devlink instances to switchdev inactive mode::
>> +
>> +  devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
>> +
>> +The following is invalid because comma-separated default assignments are not
>> +supported::
>> +
>> +  devlink_eswitch_mode=pci/0000:08:00.0=switchdev,pci/0000:09:00.0=switchdev_inactive
> 
> Interesting. I would think that this is something a user may want to set
> for some use cases, no?
> 
> 
>> diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
>> index 32f70879ddd0..93f09cb18c44 100644
>> --- a/Documentation/networking/devlink/index.rst
>> +++ b/Documentation/networking/devlink/index.rst
>> @@ -56,6 +56,7 @@ general.
>>    :maxdepth: 1
>>
>>    devlink-dpipe
>> +   devlink-defaults
>>    devlink-eswitch-attr
>>    devlink-flash
>>    devlink-health
>> diff --git a/net/devlink/core.c b/net/devlink/core.c
> 
> Wanna have this in a separate file perhaps? "default.c"?
> 
> 
>> index fe9f6a0a67d5..5126509a9c4e 100644
>> --- a/net/devlink/core.c
>> +++ b/net/devlink/core.c
>> @@ -4,6 +4,10 @@
>>  * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
>>  */
>>
>> +#include <linux/init.h>
>> +#include <linux/list.h>
>> +#include <linux/slab.h>
>> +#include <linux/string.h>
>> #include <net/genetlink.h>
>> #define CREATE_TRACE_POINTS
>> #include <trace/events/devlink.h>
>> @@ -16,6 +20,193 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
>>
>> DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC);
>>
>> +static char *devlink_default_esw_mode_param;
>> +static bool devlink_default_esw_mode_match_all;
>> +static enum devlink_eswitch_mode devlink_default_esw_mode;
>> +static LIST_HEAD(devlink_default_esw_mode_nodes);
>> +
>> +struct devlink_default_esw_mode_node {
>> +	struct list_head list;
>> +	char *bus_name;
>> +	char *dev_name;
>> +};
>> +
>> +static int __init
>> +devlink_default_esw_mode_to_value(const char *str,
>> +				  enum devlink_eswitch_mode *mode)
>> +{
>> +	if (!strcmp(str, "legacy")) {
>> +		*mode = DEVLINK_ESWITCH_MODE_LEGACY;
>> +		return 0;
>> +	}
>> +	if (!strcmp(str, "switchdev")) {
>> +		*mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
>> +		return 0;
>> +	}
>> +	if (!strcmp(str, "switchdev_inactive")) {
>> +		*mode = DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE;
>> +		return 0;
>> +	}
>> +
>> +	return -EINVAL;
>> +}
>> +
>> +static int __init
>> +devlink_default_esw_mode_handle_parse(char *handle, char **bus_name,
>> +				      char **dev_name)
>> +{
>> +	char *slash;
>> +	char *p;
>> +
>> +	if (!*handle)
>> +		return -EINVAL;
>> +
>> +	for (p = handle; *p; p++) {
>> +		if (*p == '*' || *p == '=')
>> +			return -EINVAL;
>> +	}
>> +
>> +	slash = strchr(handle, '/');
>> +	if (!slash || slash == handle || !slash[1])
>> +		return -EINVAL;
>> +	if (strchr(slash + 1, '/'))
>> +		return -EINVAL;
>> +
>> +	*slash = '\0';
>> +
>> +	*bus_name = handle;
>> +	*dev_name = slash + 1;
>> +	return 0;
>> +}
>> +
>> +static struct devlink_default_esw_mode_node *
>> +devlink_default_esw_mode_node_find(const char *bus_name, const char *dev_name)
>> +{
>> +	struct devlink_default_esw_mode_node *node;
>> +
>> +	list_for_each_entry(node, &devlink_default_esw_mode_nodes, list) {
>> +		if (!strcmp(node->bus_name, bus_name) &&
>> +		    !strcmp(node->dev_name, dev_name))
>> +			return node;
>> +	}
>> +
>> +	return NULL;
>> +}
>> +
>> +static int __init
>> +devlink_default_esw_mode_node_add(const char *bus_name, const char *dev_name)
>> +{
>> +	struct devlink_default_esw_mode_node *node;
>> +
>> +	if (devlink_default_esw_mode_node_find(bus_name, dev_name))
>> +		return -EEXIST;
>> +
>> +	node = kzalloc_obj(*node);
>> +	if (!node)
>> +		return -ENOMEM;
>> +
>> +	INIT_LIST_HEAD(&node->list);
>> +	node->bus_name = kstrdup(bus_name, GFP_KERNEL);
>> +	node->dev_name = kstrdup(dev_name, GFP_KERNEL);
>> +	if (!node->bus_name || !node->dev_name) {
>> +		kfree(node->bus_name);
>> +		kfree(node->dev_name);
>> +		kfree(node);
>> +		return -ENOMEM;
>> +	}
>> +
>> +	list_add_tail(&node->list, &devlink_default_esw_mode_nodes);
>> +	return 0;
>> +}
>> +
>> +static int __init devlink_default_esw_mode_handles_parse(char *handles)
>> +{
>> +	char *handle;
>> +	int err;
>> +
>> +	if (!strcmp(handles, "*")) {
>> +		devlink_default_esw_mode_match_all = true;
>> +		return 0;
>> +	}
>> +
>> +	while ((handle = strsep(&handles, ",")) != NULL) {
>> +		char *bus_name;
>> +		char *dev_name;
>> +
>> +		err = devlink_default_esw_mode_handle_parse(handle, &bus_name,
>> +							    &dev_name);
>> +		if (err)
>> +			return err;
>> +
>> +		err = devlink_default_esw_mode_node_add(bus_name, dev_name);
>> +		if (err)
>> +			return err;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void __init
>> +devlink_default_esw_mode_node_free(struct devlink_default_esw_mode_node *node)
>> +{
>> +	kfree(node->bus_name);
>> +	kfree(node->dev_name);
>> +	kfree(node);
>> +}
>> +
>> +static void __init devlink_default_esw_mode_nodes_clear(void)
>> +{
>> +	struct devlink_default_esw_mode_node *node;
>> +	struct devlink_default_esw_mode_node *node_tmp;
>> +
>> +	list_for_each_entry_safe(node, node_tmp,
>> +				 &devlink_default_esw_mode_nodes, list) {
>> +		list_del(&node->list);
>> +		devlink_default_esw_mode_node_free(node);
>> +	}
>> +
>> +	devlink_default_esw_mode_match_all = false;
>> +}
>> +
>> +static int __init devlink_default_esw_mode_parse(char *str)
>> +{
>> +	char *handles;
>> +	char *separator;
>> +	char *mode;
>> +	enum devlink_eswitch_mode esw_mode;
>> +	int err;
>> +
>> +	if (!*str)
>> +		return -EINVAL;
>> +
>> +	separator = strrchr(str, '=');
>> +	if (!separator || separator == str || !separator[1])
>> +		return -EINVAL;
>> +
>> +	*separator = '\0';
>> +	handles = str;
>> +	mode = separator + 1;
>> +
>> +	err = devlink_default_esw_mode_to_value(mode, &esw_mode);
>> +	if (err)
>> +		return err;
>> +
>> +	err = devlink_default_esw_mode_handles_parse(handles);
>> +	if (err)
>> +		devlink_default_esw_mode_nodes_clear();
>> +	else
>> +		devlink_default_esw_mode = esw_mode;
>> +
>> +	return err;
>> +}
>> +
>> +static int __init devlink_default_esw_mode_setup(char *str)
>> +{
>> +	devlink_default_esw_mode_param = str;
>> +	return 1;
>> +}
>> +__setup("devlink_eswitch_mode=", devlink_default_esw_mode_setup);
>> +
>> static struct devlink *devlinks_xa_get(unsigned long index)
>> {
>> 	struct devlink *devlink;
>> @@ -382,6 +573,14 @@ struct devlink *devlinks_xa_lookup_get(struct net *net, unsigned long index)
>> /**
>>  * devl_register - Register devlink instance
>>  * @devlink: devlink
>> + *
>> + * Make @devlink visible to userspace. Drivers must call this only after the
>> + * instance is fully initialized and its devlink operations can be called.
>> + *
>> + * Context: Caller must hold the devlink instance lock. Use devlink_register()
>> + * when the lock is not already held.
>> + *
>> + * Return: 0 on success.
>>  */
>> int devl_register(struct devlink *devlink)
>> {
>> @@ -580,6 +779,31 @@ static int __init devlink_init(void)
>> {
>> 	int err;
>>
>> +	if (devlink_default_esw_mode_param) {
>> +		char *def;
>> +
>> +		def = kstrdup(devlink_default_esw_mode_param, GFP_KERNEL);
>> +		if (!def) {
>> +			devlink_default_esw_mode_param = NULL;
>> +			pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
>> +		} else {
>> +			err = devlink_default_esw_mode_parse(def);
>> +			kfree(def);
>> +			if (err == -EEXIST) {
>> +				devlink_default_esw_mode_param = NULL;
>> +				pr_warn("devlink: duplicate eswitch mode handles ignored\n");
>> +			} else if (err == -EINVAL) {
>> +				devlink_default_esw_mode_param = NULL;
>> +				pr_warn("devlink: invalid devlink_eswitch_mode parameter ignored\n");
>> +			} else if (err == -ENOMEM) {
>> +				devlink_default_esw_mode_param = NULL;
>> +				pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
>> +			} else if (err) {
>> +				goto out;
>> +			}
> 
> Move this to a separate helper alongside the other "default" functions?

I did that in the following patch (4/6). If I respin, I'll do it in this one.

Mark
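Jiri's suggestion can be sketched in userspace C. The three pr_warn() branches in the quoted devlink_init() hunk differ only in their message, so one small helper could pick the message from the error code. The name esw_default_parse_warning() is hypothetical; this is an illustration, not the helper from the following patch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns the warning the quoted devlink_init() hunk
 * prints for a given devlink_default_esw_mode_parse() result, or NULL when
 * no "parameter ignored" warning applies (success, or any other error that
 * the caller should still propagate with goto out). */
static const char *esw_default_parse_warning(int err)
{
	switch (err) {
	case -EEXIST:
		return "devlink: duplicate eswitch mode handles ignored\n";
	case -EINVAL:
		return "devlink: invalid devlink_eswitch_mode parameter ignored\n";
	case -ENOMEM:
		return "devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n";
	default:
		return NULL;
	}
}
```

A caller would clear devlink_default_esw_mode_param and pr_warn() the returned string when it is non-NULL, and still goto out for any other non-zero error, which matches the quoted control flow. The kstrdup() failure path could reuse the helper by treating that case as -ENOMEM instead of repeating the message string.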

> 
> 
>> +		}
>> +	}
>> +
>> 	err = register_pernet_subsys(&devlink_pernet_ops);
>> 	if (err)
>> 		goto out;
>> @@ -595,7 +819,10 @@ static int __init devlink_init(void)
>> out_unreg_pernet_subsys:
>> 	unregister_pernet_subsys(&devlink_pernet_ops);
>> out:
>> +	if (err)
>> +		devlink_default_esw_mode_nodes_clear();
>> 	WARN_ON(err);
>> +
>> 	return err;
>> }
>>
>> -- 
>> 2.43.0
>>
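To try the documented <selector>=<mode> rules outside the kernel, here is a minimal userspace C sketch that takes the same approach as the quoted parser. It splits the value at the last '=', maps the mode string, and then accepts either '*' or a comma-separated list of <bus-name>/<dev-name> handles. Everything in it is hypothetical and for illustration only: esw_default_parse(), struct esw_default, the ESW_* enum, and the fixed MAX_HANDLES array (with -ENOSPC), which stands in for the kernel list. Unlike the quoted devlink_default_esw_mode_handle_parse(), it also rejects spaces and tabs inside a handle, because the documented syntax rules forbid whitespace.

```c
#define _DEFAULT_SOURCE /* for strsep() in glibc */
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-ins for enum devlink_eswitch_mode (illustrative values). */
enum esw_mode { ESW_LEGACY, ESW_SWITCHDEV, ESW_SWITCHDEV_INACTIVE };

#define MAX_HANDLES 16 /* assumption: fixed array instead of a linked list */

struct esw_default {
	bool match_all;
	enum esw_mode mode;
	int nr_handles;
	char *bus[MAX_HANDLES]; /* point into the parsed string */
	char *dev[MAX_HANDLES];
};

static const struct {
	const char *name;
	enum esw_mode mode;
} mode_names[] = {
	{ "legacy", ESW_LEGACY },
	{ "switchdev", ESW_SWITCHDEV },
	{ "switchdev_inactive", ESW_SWITCHDEV_INACTIVE },
};

static int mode_from_str(const char *s, enum esw_mode *mode)
{
	for (size_t i = 0; i < sizeof(mode_names) / sizeof(mode_names[0]); i++) {
		if (!strcmp(s, mode_names[i].name)) {
			*mode = mode_names[i].mode;
			return 0;
		}
	}
	return -EINVAL;
}

/* Parses "<selector>=<mode>" in place; str is modified. */
static int esw_default_parse(char *str, struct esw_default *def)
{
	char *sep, *handles, *handle;

	memset(def, 0, sizeof(*def));

	/* The mode follows the last '='; selector and mode must be non-empty. */
	sep = strrchr(str, '=');
	if (!sep || sep == str || !sep[1])
		return -EINVAL;
	*sep = '\0';
	if (mode_from_str(sep + 1, &def->mode))
		return -EINVAL;

	handles = str;
	if (!strcmp(handles, "*")) {
		def->match_all = true;
		return 0;
	}

	while ((handle = strsep(&handles, ",")) != NULL) {
		char *slash;

		/* Empty handles and '*', '=' or whitespace inside a handle are invalid. */
		if (!*handle || strpbrk(handle, "*= \t"))
			return -EINVAL;
		/* Exactly one '/', with non-empty bus and device names. */
		slash = strchr(handle, '/');
		if (!slash || slash == handle || !slash[1] || strchr(slash + 1, '/'))
			return -EINVAL;
		*slash = '\0';
		for (int i = 0; i < def->nr_handles; i++)
			if (!strcmp(def->bus[i], handle) &&
			    !strcmp(def->dev[i], slash + 1))
				return -EEXIST;
		if (def->nr_handles == MAX_HANDLES)
			return -ENOSPC;
		def->bus[def->nr_handles] = handle;
		def->dev[def->nr_handles] = slash + 1;
		def->nr_handles++;
	}
	return 0;
}
```

Because the value is split at the last '=', the invalid example from the documentation, pci/0000:08:00.0=switchdev,pci/0000:09:00.0=switchdev_inactive, is rejected: after the split, its first handle still contains '='.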


Thread overview: 14+ messages
2026-06-29 18:20 [PATCH net-next V4 0/6] devlink: Add boot-time eswitch mode defaults Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 1/6] net/mlx5: Clear FW reset-in-progress bit before reload Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 2/6] devlink: Factor out eswitch mode setting Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 3/6] devlink: Parse eswitch mode boot defaults Mark Bloch
2026-07-01  9:38   ` Jiri Pirko
2026-07-01 12:55     ` Mark Bloch [this message]
2026-07-01 13:14     ` Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 4/6] devlink: Apply " Mark Bloch
2026-07-01  9:48   ` Jiri Pirko
2026-07-01 12:57     ` Mark Bloch
2026-07-01 14:09       ` Jiri Pirko
2026-07-01 17:42         ` Mark Bloch
2026-06-29 18:21 ` [PATCH net-next V4 5/6] devlink: Add API to apply eswitch mode boot default Mark Bloch
2026-06-29 18:21 ` [PATCH net-next V4 6/6] net/mlx5: Apply devlink eswitch mode boot default on probe Mark Bloch
