From: Mark Bloch <mbloch@nvidia.com>
To: Jiri Pirko <jiri@resnulli.us>, Eric Dumazet <edumazet@google.com>,
"Jakub Kicinski" <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>
Cc: Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>, <netdev@vger.kernel.org>,
<linux-rdma@vger.kernel.org>, <linux-doc@vger.kernel.org>,
Mark Bloch <mbloch@nvidia.com>
Subject: [PATCH net-next V4 3/6] devlink: Parse eswitch mode boot defaults
Date: Mon, 29 Jun 2026 21:20:58 +0300 [thread overview]
Message-ID: <20260629182102.245150-4-mbloch@nvidia.com> (raw)
In-Reply-To: <20260629182102.245150-1-mbloch@nvidia.com>
Add devlink_eswitch_mode= kernel command line parsing for a default
eswitch mode.
The supported syntax selects either all devlink handles or one explicit
comma-separated handle list:
devlink_eswitch_mode=*=<mode>
devlink_eswitch_mode=<handle>[,<handle>...]=<mode>
where <mode> is one of legacy, switchdev or switchdev_inactive. All
selected handles receive the same mode. Assigning different modes to
different handle lists in the same parameter value is not supported.
Store the parsed selector and mode in devlink core so the default can be
applied by a downstream patch.
Document the devlink_eswitch_mode= syntax and duplicate handle handling.
Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
.../admin-guide/kernel-parameters.txt | 25 ++
.../networking/devlink/devlink-defaults.rst | 78 ++++++
Documentation/networking/devlink/index.rst | 1 +
net/devlink/core.c | 227 ++++++++++++++++++
4 files changed, 331 insertions(+)
create mode 100644 Documentation/networking/devlink/devlink-defaults.rst
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index b5493a7f8f22..117300dd589c 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1249,6 +1249,31 @@ Kernel parameters
dell_smm_hwmon.fan_max=
[HW] Maximum configurable fan speed.
+ devlink_eswitch_mode=
+ [NET]
+ Format:
+ <selector>=<mode>
+
+ <selector>:
+ * | <handle>[,<handle>...]
+
+ <handle>:
+ <bus-name>/<dev-name>
+
+ Configure default devlink eswitch mode for matching
+ devlink instances during device initialization.
+
+ <mode>:
+ legacy | switchdev | switchdev_inactive
+
+ Examples:
+ devlink_eswitch_mode=*=switchdev
+ devlink_eswitch_mode=pci/0000:08:00.0=switchdev
+ devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
+
+ See Documentation/networking/devlink/devlink-defaults.rst
+ for the full syntax.
+
dfltcc= [HW,S390]
Format: { on | off | def_only | inf_only | always }
on: s390 zlib hardware support for compression on
diff --git a/Documentation/networking/devlink/devlink-defaults.rst b/Documentation/networking/devlink/devlink-defaults.rst
new file mode 100644
index 000000000000..380c9e99210e
--- /dev/null
+++ b/Documentation/networking/devlink/devlink-defaults.rst
@@ -0,0 +1,78 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+==============================
+Devlink Eswitch Mode Defaults
+==============================
+
+Devlink eswitch mode defaults allow the eswitch mode to be provided on the
+kernel command line and applied to matching devlink instances during device
+initialization.
+
+The devlink device is selected by its devlink handle. For PCI devices this is
+the same handle shown by ``devlink dev show``, for example
+``pci/0000:08:00.0``.
+
+Kernel command line syntax
+==========================
+
+Defaults are specified with the ``devlink_eswitch_mode=`` kernel command line
+parameter.
+
+The general syntax is::
+
+ devlink_eswitch_mode=<selector>=<mode>
+
+``<selector>`` is either ``*`` or one or more devlink handles::
+
+ * | <bus-name>/<dev-name>[,<bus-name>/<dev-name>...]
+
+``*`` applies the mode to every devlink instance. All handles in the same
+selector receive the same eswitch mode.
+
+``<mode>`` is one of ``legacy``, ``switchdev`` or ``switchdev_inactive``.
+
+Syntax rules
+------------
+
+The following syntax rules apply:
+
+* Specify the default in one ``devlink_eswitch_mode=`` parameter. Repeated
+ ``devlink_eswitch_mode=`` parameters are not accumulated.
+* The ``devlink_eswitch_mode=`` value is limited by the kernel command line
+ size.
+* Whitespace is not allowed within the parameter value.
+* ``<selector>`` must be either ``*`` or a handle list. ``*`` cannot be
+ combined with explicit handles.
+* ``<bus-name>`` and ``<dev-name>`` must not be empty.
+* ``<dev-name>`` may contain ``:``. This allows PCI names such as
+ ``0000:08:00.0``.
+* Handles must not contain whitespace, ``*``, ``=`` or more than one ``/``.
+* A comma separates handles.
+* Comma-separated default assignments are not supported.
+* Duplicate handles are rejected and the devlink eswitch mode default is
+ ignored.
+
+The eswitch mode default corresponds to the userspace command::
+
+ devlink dev eswitch set <handle> mode <value>
+
+
+Examples
+========
+
+Set all devlink instances to switchdev mode::
+
+ devlink_eswitch_mode=*=switchdev
+
+Set one PCI devlink instance to switchdev mode::
+
+ devlink_eswitch_mode=pci/0000:08:00.0=switchdev
+
+Set two PCI devlink instances to switchdev inactive mode::
+
+ devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
+
+The following is invalid because comma-separated default assignments are not
+supported::
+
+ devlink_eswitch_mode=pci/0000:08:00.0=switchdev,pci/0000:09:00.0=switchdev_inactive
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index 32f70879ddd0..93f09cb18c44 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -56,6 +56,7 @@ general.
:maxdepth: 1
devlink-dpipe
+ devlink-defaults
devlink-eswitch-attr
devlink-flash
devlink-health
diff --git a/net/devlink/core.c b/net/devlink/core.c
index fe9f6a0a67d5..5126509a9c4e 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -4,6 +4,10 @@
* Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
*/
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/string.h>
#include <net/genetlink.h>
#define CREATE_TRACE_POINTS
#include <trace/events/devlink.h>
@@ -16,6 +20,193 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC);
+static char *devlink_default_esw_mode_param;
+static bool devlink_default_esw_mode_match_all;
+static enum devlink_eswitch_mode devlink_default_esw_mode;
+static LIST_HEAD(devlink_default_esw_mode_nodes);
+
+struct devlink_default_esw_mode_node {
+ struct list_head list;
+ char *bus_name;
+ char *dev_name;
+};
+
+static int __init
+devlink_default_esw_mode_to_value(const char *str,
+ enum devlink_eswitch_mode *mode)
+{
+ if (!strcmp(str, "legacy")) {
+ *mode = DEVLINK_ESWITCH_MODE_LEGACY;
+ return 0;
+ }
+ if (!strcmp(str, "switchdev")) {
+ *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
+ return 0;
+ }
+ if (!strcmp(str, "switchdev_inactive")) {
+ *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE;
+ return 0;
+ }
+
+ return -EINVAL;
+}
+
+static int __init
+devlink_default_esw_mode_handle_parse(char *handle, char **bus_name,
+ char **dev_name)
+{
+ char *slash;
+ char *p;
+
+ if (!*handle)
+ return -EINVAL;
+
+ for (p = handle; *p; p++) {
+ if (*p == '*' || *p == '=')
+ return -EINVAL;
+ }
+
+ slash = strchr(handle, '/');
+ if (!slash || slash == handle || !slash[1])
+ return -EINVAL;
+ if (strchr(slash + 1, '/'))
+ return -EINVAL;
+
+ *slash = '\0';
+
+ *bus_name = handle;
+ *dev_name = slash + 1;
+ return 0;
+}
+
+static struct devlink_default_esw_mode_node *
+devlink_default_esw_mode_node_find(const char *bus_name, const char *dev_name)
+{
+ struct devlink_default_esw_mode_node *node;
+
+ list_for_each_entry(node, &devlink_default_esw_mode_nodes, list) {
+ if (!strcmp(node->bus_name, bus_name) &&
+ !strcmp(node->dev_name, dev_name))
+ return node;
+ }
+
+ return NULL;
+}
+
+static int __init
+devlink_default_esw_mode_node_add(const char *bus_name, const char *dev_name)
+{
+ struct devlink_default_esw_mode_node *node;
+
+ if (devlink_default_esw_mode_node_find(bus_name, dev_name))
+ return -EEXIST;
+
+ node = kzalloc_obj(*node);
+ if (!node)
+ return -ENOMEM;
+
+ INIT_LIST_HEAD(&node->list);
+ node->bus_name = kstrdup(bus_name, GFP_KERNEL);
+ node->dev_name = kstrdup(dev_name, GFP_KERNEL);
+ if (!node->bus_name || !node->dev_name) {
+ kfree(node->bus_name);
+ kfree(node->dev_name);
+ kfree(node);
+ return -ENOMEM;
+ }
+
+ list_add_tail(&node->list, &devlink_default_esw_mode_nodes);
+ return 0;
+}
+
+static int __init devlink_default_esw_mode_handles_parse(char *handles)
+{
+ char *handle;
+ int err;
+
+ if (!strcmp(handles, "*")) {
+ devlink_default_esw_mode_match_all = true;
+ return 0;
+ }
+
+ while ((handle = strsep(&handles, ",")) != NULL) {
+ char *bus_name;
+ char *dev_name;
+
+ err = devlink_default_esw_mode_handle_parse(handle, &bus_name,
+ &dev_name);
+ if (err)
+ return err;
+
+ err = devlink_default_esw_mode_node_add(bus_name, dev_name);
+ if (err)
+ return err;
+ }
+
+ return 0;
+}
+
+static void __init
+devlink_default_esw_mode_node_free(struct devlink_default_esw_mode_node *node)
+{
+ kfree(node->bus_name);
+ kfree(node->dev_name);
+ kfree(node);
+}
+
+static void __init devlink_default_esw_mode_nodes_clear(void)
+{
+ struct devlink_default_esw_mode_node *node;
+ struct devlink_default_esw_mode_node *node_tmp;
+
+ list_for_each_entry_safe(node, node_tmp,
+ &devlink_default_esw_mode_nodes, list) {
+ list_del(&node->list);
+ devlink_default_esw_mode_node_free(node);
+ }
+
+ devlink_default_esw_mode_match_all = false;
+}
+
+static int __init devlink_default_esw_mode_parse(char *str)
+{
+ char *handles;
+ char *separator;
+ char *mode;
+ enum devlink_eswitch_mode esw_mode;
+ int err;
+
+ if (!*str)
+ return -EINVAL;
+
+ separator = strrchr(str, '=');
+ if (!separator || separator == str || !separator[1])
+ return -EINVAL;
+
+ *separator = '\0';
+ handles = str;
+ mode = separator + 1;
+
+ err = devlink_default_esw_mode_to_value(mode, &esw_mode);
+ if (err)
+ return err;
+
+ err = devlink_default_esw_mode_handles_parse(handles);
+ if (err)
+ devlink_default_esw_mode_nodes_clear();
+ else
+ devlink_default_esw_mode = esw_mode;
+
+ return err;
+}
+
+static int __init devlink_default_esw_mode_setup(char *str)
+{
+ devlink_default_esw_mode_param = str;
+ return 1;
+}
+__setup("devlink_eswitch_mode=", devlink_default_esw_mode_setup);
+
static struct devlink *devlinks_xa_get(unsigned long index)
{
struct devlink *devlink;
@@ -382,6 +573,14 @@ struct devlink *devlinks_xa_lookup_get(struct net *net, unsigned long index)
/**
* devl_register - Register devlink instance
* @devlink: devlink
+ *
+ * Make @devlink visible to userspace. Drivers must call this only after the
+ * instance is fully initialized and its devlink operations can be called.
+ *
+ * Context: Caller must hold the devlink instance lock. Use devlink_register()
+ * when the lock is not already held.
+ *
+ * Return: 0 on success.
*/
int devl_register(struct devlink *devlink)
{
@@ -580,6 +779,31 @@ static int __init devlink_init(void)
{
int err;
+ if (devlink_default_esw_mode_param) {
+ char *def;
+
+ def = kstrdup(devlink_default_esw_mode_param, GFP_KERNEL);
+ if (!def) {
+ devlink_default_esw_mode_param = NULL;
+ pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
+ } else {
+ err = devlink_default_esw_mode_parse(def);
+ kfree(def);
+ if (err == -EEXIST) {
+ devlink_default_esw_mode_param = NULL;
+ pr_warn("devlink: duplicate eswitch mode handles ignored\n");
+ } else if (err == -EINVAL) {
+ devlink_default_esw_mode_param = NULL;
+ pr_warn("devlink: invalid devlink_eswitch_mode parameter ignored\n");
+ } else if (err == -ENOMEM) {
+ devlink_default_esw_mode_param = NULL;
+ pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
+ } else if (err) {
+ goto out;
+ }
+ }
+ }
+
err = register_pernet_subsys(&devlink_pernet_ops);
if (err)
goto out;
@@ -595,7 +819,10 @@ static int __init devlink_init(void)
out_unreg_pernet_subsys:
unregister_pernet_subsys(&devlink_pernet_ops);
out:
+ if (err)
+ devlink_default_esw_mode_nodes_clear();
WARN_ON(err);
+
return err;
}
--
2.43.0
next prev parent reply other threads:[~2026-06-29 18:22 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-29 18:20 [PATCH net-next V4 0/6] evlink: Add boot-time eswitch mode defaults Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 1/6] net/mlx5: Clear FW reset-in-progress bit before reload Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 2/6] devlink: Factor out eswitch mode setting Mark Bloch
2026-06-29 18:20 ` Mark Bloch [this message]
2026-06-29 18:20 ` [PATCH net-next V4 4/6] devlink: Apply eswitch mode boot defaults Mark Bloch
2026-06-29 18:21 ` [PATCH net-next V4 5/6] devlink: Add API to apply eswitch mode boot default Mark Bloch
2026-06-29 18:21 ` [PATCH net-next V4 6/6] net/mlx5: Apply devlink eswitch mode boot default on probe Mark Bloch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20260629182102.245150-4-mbloch@nvidia.com \
--to=mbloch@nvidia.com \
--cc=andrew+netdev@lunn.ch \
--cc=corbet@lwn.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=skhan@linuxfoundation.org \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox