From: Tariq Toukan <tariqt@nvidia.com>
To: Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>
Cc: Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Jiri Pirko <jiri@resnulli.us>, Simon Horman <horms@kernel.org>,
"Saeed Mahameed" <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>, Mark Bloch <mbloch@nvidia.com>,
"Borislav Petkov (AMD)" <bp@alien8.de>,
Andrew Morton <akpm@linux-foundation.org>,
Randy Dunlap <rdunlap@infradead.org>,
Thomas Gleixner <tglx@kernel.org>, Petr Mladek <pmladek@suse.com>,
"Peter Zijlstra (Intel)" <peterz@infradead.org>,
"Tejun Heo" <tj@kernel.org>, Vlastimil Babka <vbabka@kernel.org>,
Feng Tang <feng.tang@linux.alibaba.com>,
Christian Brauner <brauner@kernel.org>,
"Dave Hansen" <dave.hansen@linux.intel.com>,
Dapeng Mi <dapeng1.mi@linux.intel.com>,
Kees Cook <kees@kernel.org>, Marco Elver <elver@google.com>,
Li RongQing <lirongqing@baidu.com>,
Eric Biggers <ebiggers@kernel.org>,
"Paul E. McKenney" <paulmck@kernel.org>,
<linux-doc@vger.kernel.org>, <linux-kernel@vger.kernel.org>,
<netdev@vger.kernel.org>, <linux-rdma@vger.kernel.org>,
Gal Pressman <gal@nvidia.com>,
Dragos Tatulea <dtatulea@nvidia.com>,
Jiri Pirko <jiri@nvidia.com>
Subject: [PATCH net-next 2/3] devlink: Add eswitch mode boot defaults
Date: Thu, 21 May 2026 10:24:33 +0300
Message-ID: <20260521072434.362624-3-tariqt@nvidia.com>
In-Reply-To: <20260521072434.362624-1-tariqt@nvidia.com>
From: Mark Bloch <mbloch@nvidia.com>
Add a devlink_eswitch_mode= kernel command line parameter that sets the
devlink eswitch mode of selected devices during device initialization.
The selector matches either every devlink instance or one explicit,
comma-separated list of devlink handles:
devlink_eswitch_mode=[*]:<mode>
devlink_eswitch_mode=[<handle>[,<handle>...]]:<mode>
where <mode> is one of legacy, switchdev or switchdev_inactive. All
selected handles receive the same mode. Assigning different modes to
different handle lists in the same parameter value is not supported.
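The parsing rules above can be modelled by a small userspace C function. This is a hypothetical illustration, not the kernel code in this patch. The names esw_default_parse(), struct esw_default, enum esw_mode, MAX_HANDLES and MAX_HANDLE_LEN are invented here. A fixed-size array of "bus/dev" strings stands in for the kernel's linked list, and strchr() replaces strsep() so the sketch builds with a plain C compiler.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <string.h>

/* Userspace stand-in for enum devlink_eswitch_mode; values are illustrative. */
enum esw_mode { ESW_LEGACY, ESW_SWITCHDEV, ESW_SWITCHDEV_INACTIVE };

#define MAX_HANDLES 8		/* arbitrary limit for this sketch */
#define MAX_HANDLE_LEN 64

struct esw_default {
	bool match_all;
	enum esw_mode mode;
	int nr_handles;
	char handles[MAX_HANDLES][MAX_HANDLE_LEN];	/* "bus/dev" strings */
};

static int esw_mode_from_str(const char *s, enum esw_mode *mode)
{
	if (!strcmp(s, "legacy"))
		*mode = ESW_LEGACY;
	else if (!strcmp(s, "switchdev"))
		*mode = ESW_SWITCHDEV;
	else if (!strcmp(s, "switchdev_inactive"))
		*mode = ESW_SWITCHDEV_INACTIVE;
	else
		return -EINVAL;
	return 0;
}

/*
 * Parse "[<selector>]:<mode>" in place. Returns 0, -EINVAL (syntax error)
 * or -EEXIST (duplicate handle). On error, *out must be discarded.
 */
static int esw_default_parse(char *str, struct esw_default *out)
{
	char *end, *tok, *next;
	int i;

	assert(out != NULL);
	memset(out, 0, sizeof(*out));

	if (!str || *str != '[')
		return -EINVAL;
	/* The first ']' ends the selector and must be followed by ":<mode>". */
	end = strchr(str + 1, ']');
	if (!end || end[1] != ':' || !end[2])
		return -EINVAL;
	*end = '\0';
	/* The mode is validated before the handle list, as in the patch. */
	if (!str[1] || esw_mode_from_str(end + 2, &out->mode))
		return -EINVAL;

	if (!strcmp(str + 1, "*")) {
		out->match_all = true;
		return 0;
	}

	for (tok = str + 1; tok; tok = next) {
		char *slash;

		next = strchr(tok, ',');
		if (next)
			*next++ = '\0';

		/* Non-empty, no '[', ']' or '*', exactly one '/' with text on both sides. */
		if (!*tok || strpbrk(tok, "[]*"))
			return -EINVAL;
		slash = strchr(tok, '/');
		if (!slash || slash == tok || !slash[1] || strchr(slash + 1, '/'))
			return -EINVAL;
		/* The bus name must not contain ':'; the device name may. */
		if (memchr(tok, ':', (size_t)(slash - tok)))
			return -EINVAL;

		for (i = 0; i < out->nr_handles; i++)
			if (!strcmp(out->handles[i], tok))
				return -EEXIST;

		if (out->nr_handles == MAX_HANDLES || strlen(tok) >= MAX_HANDLE_LEN)
			return -EINVAL;
		strcpy(out->handles[out->nr_handles++], tok);
	}
	return 0;
}
```

Because the mode is checked first, a value such as [pci/0000:08:00.0]:switchdev,[pci/0000:09:00.0]:switchdev_inactive fails with -EINVAL. Everything after the first "]:" is taken as the mode string, and that string is not a valid mode.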
The default is applied through the existing eswitch_mode_set() devlink
operation, the same operation used by the userspace
"devlink dev eswitch set" command.
Expose devl_apply_default_esw_mode() so drivers can apply the default at
the point where their devlink instance and eswitch operations are ready.
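Below is a minimal userspace model of the matching done by devl_apply_default_esw_mode(). It assumes hypothetical stand-in types (struct fake_devlink, struct fake_devlink_ops, struct default_handle) in place of the real devlink structures. The devlink instance lock and the extack argument are omitted. It shows the decision order: match-all first, then an exact bus-name and device-name match, then the driver's eswitch_mode_set() callback.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Userspace stand-in for enum devlink_eswitch_mode; values are illustrative. */
enum esw_mode { ESW_LEGACY, ESW_SWITCHDEV, ESW_SWITCHDEV_INACTIVE };

/* Hypothetical, trimmed-down stand-ins for struct devlink and devlink_ops. */
struct fake_devlink;

struct fake_devlink_ops {
	int (*eswitch_mode_set)(struct fake_devlink *dl, enum esw_mode mode);
};

struct fake_devlink {
	const char *bus_name;
	const char *dev_name;
	const struct fake_devlink_ops *ops;
	enum esw_mode mode;	/* current mode, updated by the fake driver */
};

/* One explicit handle from the boot parameter, already split at '/'. */
struct default_handle {
	const char *bus_name;
	const char *dev_name;
};

/* Parsed boot default; in the patch these are filled once at init time. */
static bool default_match_all;
static enum esw_mode default_mode;
static const struct default_handle *default_handles;
static size_t default_nr_handles;

static int apply_default_esw_mode(struct fake_devlink *dl)
{
	bool match = default_match_all;
	size_t i;

	assert(dl && dl->ops);

	/* Compare bus and device names separately, as the patch does. */
	for (i = 0; !match && i < default_nr_handles; i++)
		match = !strcmp(default_handles[i].bus_name, dl->bus_name) &&
			!strcmp(default_handles[i].dev_name, dl->dev_name);

	if (!match)
		return 0;	/* no default for this instance */
	if (!dl->ops->eswitch_mode_set)
		return -EOPNOTSUPP;
	return dl->ops->eswitch_mode_set(dl, default_mode);
}

/* Example driver callback that only records the requested mode. */
static int fake_eswitch_mode_set(struct fake_devlink *dl, enum esw_mode mode)
{
	dl->mode = mode;
	return 0;
}

static const struct fake_devlink_ops fake_ops = {
	.eswitch_mode_set = fake_eswitch_mode_set,
};
static const struct fake_devlink_ops no_eswitch_ops = { .eswitch_mode_set = NULL };
```

Because the helper returns 0 when nothing matches, a driver can call it unconditionally during init. Only a matching instance (including every instance under "*") whose driver lacks eswitch_mode_set() gets -EOPNOTSUPP.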
Document the devlink_eswitch_mode= syntax and duplicate handle handling.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
---
.../admin-guide/kernel-parameters.txt | 25 ++
.../networking/devlink/devlink-defaults.rst | 80 ++++++
Documentation/networking/devlink/index.rst | 1 +
include/net/devlink.h | 1 +
net/devlink/core.c | 255 ++++++++++++++++++
5 files changed, 362 insertions(+)
create mode 100644 Documentation/networking/devlink/devlink-defaults.rst
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7834ee927310..f87ae561c0dc 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1278,6 +1278,31 @@ Kernel parameters
dell_smm_hwmon.fan_max=
[HW] Maximum configurable fan speed.
+ devlink_eswitch_mode=
+ [NET]
+ Configure the default devlink eswitch mode for matching
+ devlink instances during device initialization.
+
+ Format:
+ [<selector>]:<mode>
+
+ <selector>:
+ * | <handle>[,<handle>...]
+
+ <handle>:
+ <bus-name>/<dev-name>
+
+ <mode>:
+ legacy | switchdev | switchdev_inactive
+
+ Examples:
+ devlink_eswitch_mode=[*]:switchdev
+ devlink_eswitch_mode=[pci/0000:08:00.0]:switchdev
+ devlink_eswitch_mode=[pci/0000:08:00.0,pci/0000:09:00.1]:legacy
+
+ See Documentation/networking/devlink/devlink-defaults.rst
+ for the full syntax.
+
dfltcc= [HW,S390]
Format: { on | off | def_only | inf_only | always }
on: s390 zlib hardware support for compression on
diff --git a/Documentation/networking/devlink/devlink-defaults.rst b/Documentation/networking/devlink/devlink-defaults.rst
new file mode 100644
index 000000000000..b554e75eeeea
--- /dev/null
+++ b/Documentation/networking/devlink/devlink-defaults.rst
@@ -0,0 +1,80 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
+Devlink Eswitch Mode Defaults
+==============================
+
+Devlink eswitch mode defaults let the eswitch mode be set on the kernel command
+line and applied to matching devlink instances when their driver calls
+devl_apply_default_esw_mode() during device initialization.
+
+A devlink instance is selected by its handle, the same ``<bus-name>/<dev-name>``
+string that ``devlink dev show`` prints; for a PCI device this is, for example,
+``pci/0000:08:00.0``.
+
+Kernel command line syntax
+==========================
+
+Defaults are specified with the ``devlink_eswitch_mode=`` kernel command line
+parameter.
+
+The general syntax is::
+
+ devlink_eswitch_mode=[<selector>]:<mode>
+
+``<selector>`` is either ``*`` or one or more devlink handles::
+
+ * | <bus-name>/<dev-name>[,<bus-name>/<dev-name>...]
+
+``*`` applies the mode to every devlink instance. All handles in the same
+``[]`` list receive the same eswitch mode.
+
+``<mode>`` is one of ``legacy``, ``switchdev`` or ``switchdev_inactive``.
+
+Syntax rules
+------------
+
+The following syntax rules apply:
+
+* Specify the default in a single ``devlink_eswitch_mode=`` parameter.
+ Repeated parameters are not accumulated; only the last one takes effect.
+* The ``devlink_eswitch_mode=`` value is limited by the kernel command line
+ size.
+* Whitespace is not allowed within the parameter value.
+* ``<selector>`` must be either ``*`` or a handle list. ``*`` cannot be
+ combined with explicit handles.
+* ``<bus-name>`` and ``<dev-name>`` must not be empty.
+* ``<bus-name>`` must not contain ``:``.
+* ``<dev-name>`` may contain ``:``. This allows PCI names such as
+ ``0000:08:00.0``.
+* Handles must not contain whitespace, ``[``, ``]``, ``*`` or more than one
+ ``/``.
+* A comma inside ``[]`` separates handles.
+* Comma-separated default groups are not supported.
+* If the same handle appears more than once, the whole value is rejected with
+ a warning and no eswitch mode default is applied.
+
+The eswitch mode default corresponds to the userspace command::
+
+ devlink dev eswitch set <handle> mode <value>
+
+
+Examples
+========
+
+Set all devlink instances to switchdev mode::
+
+ devlink_eswitch_mode=[*]:switchdev
+
+Set one PCI devlink instance to switchdev mode::
+
+ devlink_eswitch_mode=[pci/0000:08:00.0]:switchdev
+
+Set two PCI devlink instances to legacy mode::
+
+ devlink_eswitch_mode=[pci/0000:08:00.0,pci/0000:09:00.1]:legacy
+
+The following is invalid because comma-separated default groups are not
+supported::
+
+ devlink_eswitch_mode=[pci/0000:08:00.0]:switchdev,[pci/0000:09:00.0]:switchdev_inactive
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index f7ba7dcf477d..0d27a7008b14 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -56,6 +56,7 @@ general.
:maxdepth: 1
devlink-dpipe
+ devlink-defaults
devlink-eswitch-attr
devlink-flash
devlink-health
diff --git a/include/net/devlink.h b/include/net/devlink.h
index bcd31de1f890..98885f7c6c10 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1622,6 +1622,7 @@ int devl_trylock(struct devlink *devlink);
void devl_unlock(struct devlink *devlink);
void devl_assert_locked(struct devlink *devlink);
bool devl_lock_is_held(struct devlink *devlink);
+int devl_apply_default_esw_mode(struct devlink *devlink);
DEFINE_GUARD(devl, struct devlink *, devl_lock(_T), devl_unlock(_T));
struct ib_device;
diff --git a/net/devlink/core.c b/net/devlink/core.c
index eeb6a71f5f56..4bc1734878d1 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -4,6 +4,10 @@
* Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
*/
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/string.h>
#include <net/genetlink.h>
#define CREATE_TRACE_POINTS
#include <trace/events/devlink.h>
@@ -16,6 +20,233 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC);
+static char *devlink_default_esw_mode_param;
+static bool devlink_default_esw_mode_match_all;
+static enum devlink_eswitch_mode devlink_default_esw_mode;
+static LIST_HEAD(devlink_default_esw_mode_nodes);
+
+struct devlink_default_esw_mode_node {
+ struct list_head list;
+ char *bus_name;
+ char *dev_name;
+};
+
+static int __init
+devlink_default_esw_mode_to_value(const char *str,
+ enum devlink_eswitch_mode *mode)
+{
+ if (!strcmp(str, "legacy")) {
+ *mode = DEVLINK_ESWITCH_MODE_LEGACY;
+ return 0;
+ }
+ if (!strcmp(str, "switchdev")) {
+ *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
+ return 0;
+ }
+ if (!strcmp(str, "switchdev_inactive")) {
+ *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE;
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+static int devlink_default_esw_mode_apply(struct devlink *devlink)
+{
+ const struct devlink_ops *ops = devlink->ops;
+
+ if (!ops->eswitch_mode_set)
+ return -EOPNOTSUPP;
+
+ return ops->eswitch_mode_set(devlink, devlink_default_esw_mode,
+ NULL);
+}
+
+static int __init
+devlink_default_esw_mode_handle_parse(char *handle, char **bus_name,
+ char **dev_name)
+{
+ char *slash;
+ char *p;
+
+ if (!handle || !*handle)
+ return -EINVAL;
+
+ for (p = handle; *p; p++) {
+ if (*p == '[' || *p == ']' || *p == '*')
+ return -EINVAL;
+ }
+
+ slash = strchr(handle, '/');
+ if (!slash || slash == handle || !slash[1])
+ return -EINVAL;
+ if (strchr(slash + 1, '/'))
+ return -EINVAL;
+
+ *slash = '\0';
+ if (strchr(handle, ':'))
+ return -EINVAL;
+
+ *bus_name = handle;
+ *dev_name = slash + 1;
+ return 0;
+}
+
+static struct devlink_default_esw_mode_node *
+devlink_default_esw_mode_node_find(const char *bus_name, const char *dev_name)
+{
+ struct devlink_default_esw_mode_node *node;
+
+ list_for_each_entry(node, &devlink_default_esw_mode_nodes, list) {
+ if (!strcmp(node->bus_name, bus_name) &&
+ !strcmp(node->dev_name, dev_name))
+ return node;
+ }
+
+ return NULL;
+}
+
+static int __init
+devlink_default_esw_mode_node_add(const char *bus_name, const char *dev_name)
+{
+ struct devlink_default_esw_mode_node *node;
+
+ if (devlink_default_esw_mode_node_find(bus_name, dev_name))
+ return -EEXIST;
+
+ node = kzalloc_obj(*node);
+ if (!node)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&node->list);
+ node->bus_name = kstrdup(bus_name, GFP_KERNEL);
+ node->dev_name = kstrdup(dev_name, GFP_KERNEL);
+ if (!node->bus_name || !node->dev_name) {
+ kfree(node->bus_name);
+ kfree(node->dev_name);
+ kfree(node);
+ return -ENOMEM;
+ }
+
+ list_add_tail(&node->list, &devlink_default_esw_mode_nodes);
+ return 0;
+}
+
+static int __init devlink_default_esw_mode_handles_parse(char *handles)
+{
+ char *handle;
+ int err;
+
+ if (!strcmp(handles, "*")) {
+ devlink_default_esw_mode_match_all = true;
+ return 0;
+ }
+
+ while ((handle = strsep(&handles, ",")) != NULL) {
+ char *bus_name;
+ char *dev_name;
+
+ err = devlink_default_esw_mode_handle_parse(handle, &bus_name,
+ &dev_name);
+ if (err)
+ return err;
+
+ err = devlink_default_esw_mode_node_add(bus_name, dev_name);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+static void __init
+devlink_default_esw_mode_node_free(struct devlink_default_esw_mode_node *node)
+{
+ kfree(node->bus_name);
+ kfree(node->dev_name);
+ kfree(node);
+}
+
+static void __init devlink_default_esw_mode_nodes_clear(void)
+{
+ struct devlink_default_esw_mode_node *node;
+ struct devlink_default_esw_mode_node *node_tmp;
+
+ list_for_each_entry_safe(node, node_tmp,
+ &devlink_default_esw_mode_nodes, list) {
+ list_del(&node->list);
+ devlink_default_esw_mode_node_free(node);
+ }
+
+ devlink_default_esw_mode_match_all = false;
+}
+
+static int __init devlink_default_esw_mode_parse(char *str)
+{
+ char *handles_end;
+ char *handles;
+ char *mode;
+ int err;
+
+ if (!str || *str != '[')
+ return -EINVAL;
+
+ handles = str + 1;
+ handles_end = strchr(handles, ']');
+ if (!handles_end || handles_end[1] != ':' || !handles_end[2])
+ return -EINVAL;
+
+ *handles_end = '\0';
+ mode = handles_end + 2;
+ if (!*handles)
+ return -EINVAL;
+
+ err = devlink_default_esw_mode_to_value(mode,
+ &devlink_default_esw_mode);
+ if (err)
+ return err;
+
+ err = devlink_default_esw_mode_handles_parse(handles);
+ if (err)
+ devlink_default_esw_mode_nodes_clear();
+
+ return err;
+}
+
+/**
+ * devl_apply_default_esw_mode - Apply default eswitch mode to devlink instance
+ * @devlink: devlink instance
+ *
+ * The caller must hold the devlink instance lock.
+ *
+ * Return: 0 on success or if no default applies, negative errno otherwise.
+ */
+int devl_apply_default_esw_mode(struct devlink *devlink)
+{
+ const char *bus_name = devlink_bus_name(devlink);
+ const char *dev_name = devlink_dev_name(devlink);
+ struct devlink_default_esw_mode_node *node;
+
+ devl_assert_locked(devlink);
+
+ if (devlink_default_esw_mode_match_all)
+ return devlink_default_esw_mode_apply(devlink);
+
+ node = devlink_default_esw_mode_node_find(bus_name, dev_name);
+ if (node)
+ return devlink_default_esw_mode_apply(devlink);
+
+ return 0;
+}
+EXPORT_SYMBOL_GPL(devl_apply_default_esw_mode);
+
+static int __init devlink_default_esw_mode_setup(char *str)
+{
+ devlink_default_esw_mode_param = str;
+ return 1;
+}
+__setup("devlink_eswitch_mode=", devlink_default_esw_mode_setup);
+
static struct devlink *devlinks_xa_get(unsigned long index)
{
struct devlink *devlink;
@@ -578,6 +809,27 @@ static int __init devlink_init(void)
{
int err;
+ if (devlink_default_esw_mode_param) {
+ char *def;
+
+ def = kstrdup(devlink_default_esw_mode_param, GFP_KERNEL);
+ if (!def) {
+ err = -ENOMEM;
+ goto out;
+ }
+ err = devlink_default_esw_mode_parse(def);
+ kfree(def);
+ if (err == -EEXIST) {
+ devlink_default_esw_mode_param = NULL;
+ pr_warn("devlink: duplicate eswitch mode handles ignored\n");
+ } else if (err == -EINVAL) {
+ devlink_default_esw_mode_param = NULL;
+ pr_warn("devlink: invalid devlink_eswitch_mode parameter ignored\n");
+ } else if (err) {
+ goto out;
+ }
+ }
+
err = register_pernet_subsys(&devlink_pernet_ops);
if (err)
goto out;
@@ -593,7 +845,10 @@ static int __init devlink_init(void)
out_unreg_pernet_subsys:
unregister_pernet_subsys(&devlink_pernet_ops);
out:
+ if (err)
+ devlink_default_esw_mode_nodes_clear();
WARN_ON(err);
+
return err;
}
--
2.44.0
Thread overview: 10+ messages
2026-05-21 7:24 [PATCH net-next 0/3] devlink: Add boot-time eswitch mode defaults Tariq Toukan
2026-05-21 7:24 ` [PATCH net-next 1/3] net/mlx5: Clear FW reset-in-progress bit before reload Tariq Toukan
2026-05-21 7:24 ` Tariq Toukan [this message]
2026-05-21 7:24 ` [PATCH net-next 3/3] net/mlx5: Apply devlink default eswitch mode during init Tariq Toukan
2026-05-21 13:16 ` Mark Bloch
2026-05-21 13:41 ` Thomas Weißschuh
2026-05-21 21:02 ` Mark Bloch
2026-05-26 7:44 ` Jiri Pirko
2026-05-25 19:42 ` [PATCH net-next 0/3] devlink: Add boot-time eswitch mode defaults Jakub Kicinski
2026-05-26 7:41 ` Jiri Pirko