Netdev List
From: Jiri Pirko <jiri@resnulli.us>
To: Mark Bloch <mbloch@nvidia.com>
Cc: Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
	 Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	 Tariq Toukan <tariqt@nvidia.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	 Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	netdev@vger.kernel.org,  linux-rdma@vger.kernel.org,
	linux-doc@vger.kernel.org
Subject: Re: [PATCH net-next V4 3/6] devlink: Parse eswitch mode boot defaults
Date: Wed, 1 Jul 2026 11:38:03 +0200
Message-ID: <akTencQhKSanuFeW@FV6GYCPJ69>
In-Reply-To: <20260629182102.245150-4-mbloch@nvidia.com>

Mon, Jun 29, 2026 at 08:20:58PM +0200, mbloch@nvidia.com wrote:
>Add devlink_eswitch_mode= kernel command line parsing for a default
>eswitch mode.
>
>The supported syntax selects either all devlink handles or one explicit
>comma-separated handle list:
>
>  devlink_eswitch_mode=*=<mode>
>
>  devlink_eswitch_mode=<handle>[,<handle>...]=<mode>
>
>where <mode> is one of legacy, switchdev or switchdev_inactive. All
>selected handles receive the same mode. Assigning different modes to
>different handle lists in the same parameter value is not supported.
>
>Store the parsed selector and mode in devlink core so the default can be
>applied by a downstream patch.
>
>Document the devlink_eswitch_mode= syntax and duplicate handle handling.
>
>Signed-off-by: Mark Bloch <mbloch@nvidia.com>
>---
> .../admin-guide/kernel-parameters.txt         |  25 ++
> .../networking/devlink/devlink-defaults.rst   |  78 ++++++
> Documentation/networking/devlink/index.rst    |   1 +
> net/devlink/core.c                            | 227 ++++++++++++++++++
> 4 files changed, 331 insertions(+)
> create mode 100644 Documentation/networking/devlink/devlink-defaults.rst
>
>diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>index b5493a7f8f22..117300dd589c 100644
>--- a/Documentation/admin-guide/kernel-parameters.txt
>+++ b/Documentation/admin-guide/kernel-parameters.txt
>@@ -1249,6 +1249,31 @@ Kernel parameters
> 	dell_smm_hwmon.fan_max=
> 			[HW] Maximum configurable fan speed.
> 
>+	devlink_eswitch_mode=
>+			[NET]
>+			Format:
>+			<selector>=<mode>
>+
>+			<selector>:
>+			* | <handle>[,<handle>...]
>+
>+			<handle>:
>+			<bus-name>/<dev-name>
>+
>+			Configure default devlink eswitch mode for matching
>+			devlink instances during device initialization.
>+
>+			<mode>:
>+			legacy | switchdev | switchdev_inactive
>+
>+			Examples:
>+			devlink_eswitch_mode=*=switchdev
>+			devlink_eswitch_mode=pci/0000:08:00.0=switchdev
>+			devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
>+
>+			See Documentation/networking/devlink/devlink-defaults.rst
>+			for the full syntax.
>+
> 	dfltcc=		[HW,S390]
> 			Format: { on | off | def_only | inf_only | always }
> 			on:       s390 zlib hardware support for compression on
>diff --git a/Documentation/networking/devlink/devlink-defaults.rst b/Documentation/networking/devlink/devlink-defaults.rst
>new file mode 100644
>index 000000000000..380c9e99210e
>--- /dev/null
>+++ b/Documentation/networking/devlink/devlink-defaults.rst
>@@ -0,0 +1,78 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+==============================
>+Devlink Eswitch Mode Defaults
>+==============================
>+
>+Devlink eswitch mode defaults allow the eswitch mode to be provided on the
>+kernel command line and applied to matching devlink instances during device
>+initialization.
>+
>+The devlink device is selected by its devlink handle. For PCI devices this is
>+the same handle shown by ``devlink dev show``, for example
>+``pci/0000:08:00.0``.
>+
>+Kernel command line syntax
>+==========================
>+
>+Defaults are specified with the ``devlink_eswitch_mode=`` kernel command line
>+parameter.
>+
>+The general syntax is::
>+
>+  devlink_eswitch_mode=<selector>=<mode>
>+
>+``<selector>`` is either ``*`` or one or more devlink handles::
>+
>+  * | <bus-name>/<dev-name>[,<bus-name>/<dev-name>...]
>+
>+``*`` applies the mode to every devlink instance. All handles in the same
>+selector receive the same eswitch mode.
>+
>+``<mode>`` is one of ``legacy``, ``switchdev`` or ``switchdev_inactive``.
>+
>+Syntax rules
>+------------
>+
>+The following syntax rules apply:
>+
>+* Specify the default in one ``devlink_eswitch_mode=`` parameter. Repeated
>+  ``devlink_eswitch_mode=`` parameters are not accumulated.
>+* The ``devlink_eswitch_mode=`` value is limited by the kernel command line
>+  size.
>+* Whitespace is not allowed within the parameter value.
>+* ``<selector>`` must be either ``*`` or a handle list. ``*`` cannot be
>+  combined with explicit handles.
>+* ``<bus-name>`` and ``<dev-name>`` must not be empty.
>+* ``<dev-name>`` may contain ``:``. This allows PCI names such as
>+  ``0000:08:00.0``.
>+* Handles must not contain whitespace, ``*``, ``=`` or more than one ``/``.
>+* A comma separates handles.
>+* Comma-separated default assignments are not supported.
>+* Duplicate handles are rejected and the devlink eswitch mode default is
>+  ignored.
>+
>+The eswitch mode default corresponds to the userspace command::
>+
>+  devlink dev eswitch set <handle> mode <value>
>+
>+
>+Examples
>+========
>+
>+Set all devlink instances to switchdev mode::
>+
>+  devlink_eswitch_mode=*=switchdev
>+
>+Set one PCI devlink instance to switchdev mode::
>+
>+  devlink_eswitch_mode=pci/0000:08:00.0=switchdev
>+
>+Set two PCI devlink instances to switchdev inactive mode::
>+
>+  devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
>+
>+The following is invalid because comma-separated default assignments are not
>+supported::
>+
>+  devlink_eswitch_mode=pci/0000:08:00.0=switchdev,pci/0000:09:00.0=switchdev_inactive

Interesting. I would think that this is something a user may want to set
for some use cases, no?
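
For illustration, per-handle assignments could be parsed roughly as
follows. This is only a user-space sketch assuming a hypothetical
"<handle>=<mode>[,<handle>=<mode>...]" syntax, which the patch does not
support. esw_defaults_parse(), struct esw_default and the ESW_* names are
made up for the example. Since handles cannot contain "," or "=",
splitting on "," first and then on the first "=" is unambiguous. Handle
validation and duplicate detection are left out, and mixing this with the
shared-mode handle list would still need a grammar decision.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical user-space model; none of these names come from the patch. */
enum esw_mode { ESW_LEGACY, ESW_SWITCHDEV, ESW_SWITCHDEV_INACTIVE };

struct esw_default {
	const char *handle;	/* points into the caller's buffer */
	enum esw_mode mode;
};

/* Translate a mode string; returns 0 on success, -1 if unknown. */
static int esw_mode_from_str(const char *s, enum esw_mode *mode)
{
	if (!strcmp(s, "legacy"))
		*mode = ESW_LEGACY;
	else if (!strcmp(s, "switchdev"))
		*mode = ESW_SWITCHDEV;
	else if (!strcmp(s, "switchdev_inactive"))
		*mode = ESW_SWITCHDEV_INACTIVE;
	else
		return -1;
	return 0;
}

/*
 * Parse "<handle>=<mode>[,<handle>=<mode>...]" in place.
 * Returns the number of entries stored in out[], or -1 on any error
 * (empty item, missing handle or mode, unknown mode, too many entries).
 */
int esw_defaults_parse(char *str, struct esw_default *out, size_t max)
{
	size_t n = 0;
	char *item = str;

	assert(str && out);

	while (item) {
		char *next = strchr(item, ',');
		char *eq;

		/* Terminate the current item and remember where the next starts. */
		if (next)
			*next++ = '\0';

		eq = strchr(item, '=');
		if (!eq || eq == item || !eq[1] || n == max)
			return -1;
		*eq = '\0';

		if (esw_mode_from_str(eq + 1, &out[n].mode))
			return -1;
		out[n++].handle = item;
		item = next;
	}
	return (int)n;
}
```

With this, the "invalid" example above would yield two entries, one per
handle, each with its own mode.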


>diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
>index 32f70879ddd0..93f09cb18c44 100644
>--- a/Documentation/networking/devlink/index.rst
>+++ b/Documentation/networking/devlink/index.rst
>@@ -56,6 +56,7 @@ general.
>    :maxdepth: 1
> 
>    devlink-dpipe
>+   devlink-defaults
>    devlink-eswitch-attr
>    devlink-flash
>    devlink-health
>diff --git a/net/devlink/core.c b/net/devlink/core.c

Would it make sense to have this in a separate file, perhaps "default.c"?


>index fe9f6a0a67d5..5126509a9c4e 100644
>--- a/net/devlink/core.c
>+++ b/net/devlink/core.c
>@@ -4,6 +4,10 @@
>  * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
>  */
> 
>+#include <linux/init.h>
>+#include <linux/list.h>
>+#include <linux/slab.h>
>+#include <linux/string.h>
> #include <net/genetlink.h>
> #define CREATE_TRACE_POINTS
> #include <trace/events/devlink.h>
>@@ -16,6 +20,193 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
> 
> DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC);
> 
>+static char *devlink_default_esw_mode_param;
>+static bool devlink_default_esw_mode_match_all;
>+static enum devlink_eswitch_mode devlink_default_esw_mode;
>+static LIST_HEAD(devlink_default_esw_mode_nodes);
>+
>+struct devlink_default_esw_mode_node {
>+	struct list_head list;
>+	char *bus_name;
>+	char *dev_name;
>+};
>+
>+static int __init
>+devlink_default_esw_mode_to_value(const char *str,
>+				  enum devlink_eswitch_mode *mode)
>+{
>+	if (!strcmp(str, "legacy")) {
>+		*mode = DEVLINK_ESWITCH_MODE_LEGACY;
>+		return 0;
>+	}
>+	if (!strcmp(str, "switchdev")) {
>+		*mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
>+		return 0;
>+	}
>+	if (!strcmp(str, "switchdev_inactive")) {
>+		*mode = DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE;
>+		return 0;
>+	}
>+
>+	return -EINVAL;
>+}
>+
>+static int __init
>+devlink_default_esw_mode_handle_parse(char *handle, char **bus_name,
>+				      char **dev_name)
>+{
>+	char *slash;
>+	char *p;
>+
>+	if (!*handle)
>+		return -EINVAL;
>+
>+	for (p = handle; *p; p++) {
>+		if (*p == '*' || *p == '=')
>+			return -EINVAL;
>+	}
>+
>+	slash = strchr(handle, '/');
>+	if (!slash || slash == handle || !slash[1])
>+		return -EINVAL;
>+	if (strchr(slash + 1, '/'))
>+		return -EINVAL;
>+
>+	*slash = '\0';
>+
>+	*bus_name = handle;
>+	*dev_name = slash + 1;
>+	return 0;
>+}
>+
>+static struct devlink_default_esw_mode_node *
>+devlink_default_esw_mode_node_find(const char *bus_name, const char *dev_name)
>+{
>+	struct devlink_default_esw_mode_node *node;
>+
>+	list_for_each_entry(node, &devlink_default_esw_mode_nodes, list) {
>+		if (!strcmp(node->bus_name, bus_name) &&
>+		    !strcmp(node->dev_name, dev_name))
>+			return node;
>+	}
>+
>+	return NULL;
>+}
>+
>+static int __init
>+devlink_default_esw_mode_node_add(const char *bus_name, const char *dev_name)
>+{
>+	struct devlink_default_esw_mode_node *node;
>+
>+	if (devlink_default_esw_mode_node_find(bus_name, dev_name))
>+		return -EEXIST;
>+
>+	node = kzalloc_obj(*node);
>+	if (!node)
>+		return -ENOMEM;
>+
>+	INIT_LIST_HEAD(&node->list);
>+	node->bus_name = kstrdup(bus_name, GFP_KERNEL);
>+	node->dev_name = kstrdup(dev_name, GFP_KERNEL);
>+	if (!node->bus_name || !node->dev_name) {
>+		kfree(node->bus_name);
>+		kfree(node->dev_name);
>+		kfree(node);
>+		return -ENOMEM;
>+	}
>+
>+	list_add_tail(&node->list, &devlink_default_esw_mode_nodes);
>+	return 0;
>+}
>+
>+static int __init devlink_default_esw_mode_handles_parse(char *handles)
>+{
>+	char *handle;
>+	int err;
>+
>+	if (!strcmp(handles, "*")) {
>+		devlink_default_esw_mode_match_all = true;
>+		return 0;
>+	}
>+
>+	while ((handle = strsep(&handles, ",")) != NULL) {
>+		char *bus_name;
>+		char *dev_name;
>+
>+		err = devlink_default_esw_mode_handle_parse(handle, &bus_name,
>+							    &dev_name);
>+		if (err)
>+			return err;
>+
>+		err = devlink_default_esw_mode_node_add(bus_name, dev_name);
>+		if (err)
>+			return err;
>+	}
>+
>+	return 0;
>+}
>+
>+static void __init
>+devlink_default_esw_mode_node_free(struct devlink_default_esw_mode_node *node)
>+{
>+	kfree(node->bus_name);
>+	kfree(node->dev_name);
>+	kfree(node);
>+}
>+
>+static void __init devlink_default_esw_mode_nodes_clear(void)
>+{
>+	struct devlink_default_esw_mode_node *node;
>+	struct devlink_default_esw_mode_node *node_tmp;
>+
>+	list_for_each_entry_safe(node, node_tmp,
>+				 &devlink_default_esw_mode_nodes, list) {
>+		list_del(&node->list);
>+		devlink_default_esw_mode_node_free(node);
>+	}
>+
>+	devlink_default_esw_mode_match_all = false;
>+}
>+
>+static int __init devlink_default_esw_mode_parse(char *str)
>+{
>+	char *handles;
>+	char *separator;
>+	char *mode;
>+	enum devlink_eswitch_mode esw_mode;
>+	int err;
>+
>+	if (!*str)
>+		return -EINVAL;
>+
>+	separator = strrchr(str, '=');
>+	if (!separator || separator == str || !separator[1])
>+		return -EINVAL;
>+
>+	*separator = '\0';
>+	handles = str;
>+	mode = separator + 1;
>+
>+	err = devlink_default_esw_mode_to_value(mode, &esw_mode);
>+	if (err)
>+		return err;
>+
>+	err = devlink_default_esw_mode_handles_parse(handles);
>+	if (err)
>+		devlink_default_esw_mode_nodes_clear();
>+	else
>+		devlink_default_esw_mode = esw_mode;
>+
>+	return err;
>+}
>+
>+static int __init devlink_default_esw_mode_setup(char *str)
>+{
>+	devlink_default_esw_mode_param = str;
>+	return 1;
>+}
>+__setup("devlink_eswitch_mode=", devlink_default_esw_mode_setup);
>+
> static struct devlink *devlinks_xa_get(unsigned long index)
> {
> 	struct devlink *devlink;
>@@ -382,6 +573,14 @@ struct devlink *devlinks_xa_lookup_get(struct net *net, unsigned long index)
> /**
>  * devl_register - Register devlink instance
>  * @devlink: devlink
>+ *
>+ * Make @devlink visible to userspace. Drivers must call this only after the
>+ * instance is fully initialized and its devlink operations can be called.
>+ *
>+ * Context: Caller must hold the devlink instance lock. Use devlink_register()
>+ * when the lock is not already held.
>+ *
>+ * Return: 0 on success.
>  */
> int devl_register(struct devlink *devlink)
> {
>@@ -580,6 +779,31 @@ static int __init devlink_init(void)
> {
> 	int err;
> 
>+	if (devlink_default_esw_mode_param) {
>+		char *def;
>+
>+		def = kstrdup(devlink_default_esw_mode_param, GFP_KERNEL);
>+		if (!def) {
>+			devlink_default_esw_mode_param = NULL;
>+			pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
>+		} else {
>+			err = devlink_default_esw_mode_parse(def);
>+			kfree(def);
>+			if (err == -EEXIST) {
>+				devlink_default_esw_mode_param = NULL;
>+				pr_warn("devlink: duplicate eswitch mode handles ignored\n");
>+			} else if (err == -EINVAL) {
>+				devlink_default_esw_mode_param = NULL;
>+				pr_warn("devlink: invalid devlink_eswitch_mode parameter ignored\n");
>+			} else if (err == -ENOMEM) {
>+				devlink_default_esw_mode_param = NULL;
>+				pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
>+			} else if (err) {
>+				goto out;
>+			}

Move this to a separate helper alongside the other "default" functions?
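
A rough user-space sketch of what such a helper might look like. It is a
model only: esw_default_init(), esw_default_warn_msg() and the parse
callback type are hypothetical names, malloc()/fprintf() stand in for
kstrdup()/pr_warn(), and the three "ignored" branches are folded into one
lookup, which goes a bit beyond a plain move. The warning texts are the
ones from the hunk above.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for devlink_default_esw_mode_parse(); hypothetical type. */
typedef int (*esw_parse_fn)(char *str);

/* Warning text for errors that only cause the parameter to be ignored. */
const char *esw_default_warn_msg(int err)
{
	switch (err) {
	case -EEXIST:
		return "duplicate eswitch mode handles ignored";
	case -EINVAL:
		return "invalid devlink_eswitch_mode parameter ignored";
	case -ENOMEM:
		return "devlink_eswitch_mode parameter ignored, failed to allocate memory";
	default:
		return NULL;	/* unexpected error, caller should fail */
	}
}

/*
 * Parse a private copy of *param. Returns 0 if the parameter is absent,
 * parsed, or ignored with a warning (in which case *param is cleared),
 * and a negative errno for any other error.
 */
int esw_default_init(const char **param, esw_parse_fn parse)
{
	const char *msg;
	char *def;
	int err;

	assert(param && parse);
	if (!*param)
		return 0;

	/* The parser modifies its input, so hand it a copy. */
	def = malloc(strlen(*param) + 1);
	if (def)
		strcpy(def, *param);
	err = def ? parse(def) : -ENOMEM;
	free(def);
	if (!err)
		return 0;

	msg = esw_default_warn_msg(err);
	if (!msg)
		return err;

	*param = NULL;
	fprintf(stderr, "devlink: %s\n", msg);
	return 0;
}

/* Demo stub for illustration only: reports every input as a duplicate. */
int esw_parse_stub_dup(char *str)
{
	(void)str;
	return -EEXIST;
}
```

devlink_init() would then reduce to a single call plus the "goto out" on
a non-zero return.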


>+		}
>+	}
>+
> 	err = register_pernet_subsys(&devlink_pernet_ops);
> 	if (err)
> 		goto out;
>@@ -595,7 +819,10 @@ static int __init devlink_init(void)
> out_unreg_pernet_subsys:
> 	unregister_pernet_subsys(&devlink_pernet_ops);
> out:
>+	if (err)
>+		devlink_default_esw_mode_nodes_clear();
> 	WARN_ON(err);
>+
> 	return err;
> }
> 
>-- 
>2.43.0
>
