From: Jiri Pirko <jiri@resnulli.us>
To: Mark Bloch <mbloch@nvidia.com>
Cc: Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>,
Paolo Abeni <pabeni@redhat.com>, Simon Horman <horms@kernel.org>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-doc@vger.kernel.org
Subject: Re: [PATCH net-next V4 3/6] devlink: Parse eswitch mode boot defaults
Date: Wed, 1 Jul 2026 11:38:03 +0200
Message-ID: <akTencQhKSanuFeW@FV6GYCPJ69>
In-Reply-To: <20260629182102.245150-4-mbloch@nvidia.com>
Mon, Jun 29, 2026 at 08:20:58PM +0200, mbloch@nvidia.com wrote:
>Add devlink_eswitch_mode= kernel command line parsing for a default
>eswitch mode.
>
>The supported syntax selects either all devlink handles or one explicit
>comma-separated handle list:
>
> devlink_eswitch_mode=*=<mode>
>
> devlink_eswitch_mode=<handle>[,<handle>...]=<mode>
>
>where <mode> is one of legacy, switchdev or switchdev_inactive. All
>selected handles receive the same mode. Assigning different modes to
>different handle lists in the same parameter value is not supported.
>
>Store the parsed selector and mode in devlink core so the default can be
>applied by a downstream patch.
>
>Document the devlink_eswitch_mode= syntax and duplicate handle handling.
>
>Signed-off-by: Mark Bloch <mbloch@nvidia.com>
>---
> .../admin-guide/kernel-parameters.txt | 25 ++
> .../networking/devlink/devlink-defaults.rst | 78 ++++++
> Documentation/networking/devlink/index.rst | 1 +
> net/devlink/core.c | 227 ++++++++++++++++++
> 4 files changed, 331 insertions(+)
> create mode 100644 Documentation/networking/devlink/devlink-defaults.rst
>
>diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>index b5493a7f8f22..117300dd589c 100644
>--- a/Documentation/admin-guide/kernel-parameters.txt
>+++ b/Documentation/admin-guide/kernel-parameters.txt
>@@ -1249,6 +1249,31 @@ Kernel parameters
> dell_smm_hwmon.fan_max=
> [HW] Maximum configurable fan speed.
>
>+ devlink_eswitch_mode=
>+ [NET]
>+ Format:
>+ <selector>=<mode>
>+
>+ <selector>:
>+ * | <handle>[,<handle>...]
>+
>+ <handle>:
>+ <bus-name>/<dev-name>
>+
>+ Configure default devlink eswitch mode for matching
>+ devlink instances during device initialization.
>+
>+ <mode>:
>+ legacy | switchdev | switchdev_inactive
>+
>+ Examples:
>+ devlink_eswitch_mode=*=switchdev
>+ devlink_eswitch_mode=pci/0000:08:00.0=switchdev
>+ devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
>+
>+ See Documentation/networking/devlink/devlink-defaults.rst
>+ for the full syntax.
>+
> dfltcc= [HW,S390]
> Format: { on | off | def_only | inf_only | always }
> on: s390 zlib hardware support for compression on
>diff --git a/Documentation/networking/devlink/devlink-defaults.rst b/Documentation/networking/devlink/devlink-defaults.rst
>new file mode 100644
>index 000000000000..380c9e99210e
>--- /dev/null
>+++ b/Documentation/networking/devlink/devlink-defaults.rst
>@@ -0,0 +1,78 @@
>+.. SPDX-License-Identifier: GPL-2.0
>+
>+==============================
>+Devlink Eswitch Mode Defaults
>+==============================
>+
>+Devlink eswitch mode defaults allow the eswitch mode to be provided on the
>+kernel command line and applied to matching devlink instances during device
>+initialization.
>+
>+The devlink device is selected by its devlink handle. For PCI devices this is
>+the same handle shown by ``devlink dev show``, for example
>+``pci/0000:08:00.0``.
>+
>+Kernel command line syntax
>+==========================
>+
>+Defaults are specified with the ``devlink_eswitch_mode=`` kernel command line
>+parameter.
>+
>+The general syntax is::
>+
>+ devlink_eswitch_mode=<selector>=<mode>
>+
>+``<selector>`` is either ``*`` or one or more devlink handles::
>+
>+ * | <bus-name>/<dev-name>[,<bus-name>/<dev-name>...]
>+
>+``*`` applies the mode to every devlink instance. All handles in the same
>+selector receive the same eswitch mode.
>+
>+``<mode>`` is one of ``legacy``, ``switchdev`` or ``switchdev_inactive``.
>+
>+Syntax rules
>+------------
>+
>+The following syntax rules apply:
>+
>+* Specify the default in one ``devlink_eswitch_mode=`` parameter. Repeated
>+ ``devlink_eswitch_mode=`` parameters are not accumulated.
>+* The ``devlink_eswitch_mode=`` value is limited by the kernel command line
>+ size.
>+* Whitespace is not allowed within the parameter value.
>+* ``<selector>`` must be either ``*`` or a handle list. ``*`` cannot be
>+ combined with explicit handles.
>+* ``<bus-name>`` and ``<dev-name>`` must not be empty.
>+* ``<dev-name>`` may contain ``:``. This allows PCI names such as
>+ ``0000:08:00.0``.
>+* Handles must not contain whitespace, ``*``, ``=`` or more than one ``/``.
>+* A comma separates handles.
>+* Comma-separated default assignments are not supported.
>+* Duplicate handles are rejected and the devlink eswitch mode default is
>+ ignored.
>+
>+The eswitch mode default corresponds to the userspace command::
>+
>+ devlink dev eswitch set <handle> mode <value>
>+
>+
>+Examples
>+========
>+
>+Set all devlink instances to switchdev mode::
>+
>+ devlink_eswitch_mode=*=switchdev
>+
>+Set one PCI devlink instance to switchdev mode::
>+
>+ devlink_eswitch_mode=pci/0000:08:00.0=switchdev
>+
>+Set two PCI devlink instances to switchdev inactive mode::
>+
>+ devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
>+
>+The following is invalid because comma-separated default assignments are not
>+supported::
>+
>+ devlink_eswitch_mode=pci/0000:08:00.0=switchdev,pci/0000:09:00.0=switchdev_inactive
Interesting. I would think that this is something a user may want to set
for some use cases, no?
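
For reference, the rejection of that last example appears to fall out of the
parser splitting on the last '=' and then refusing any handle that contains
'='. Below is a minimal userspace sketch of that behavior, not the kernel
code itself. The names handle_check() and param_check() are hypothetical,
duplicate-handle detection is left out, and the handle walk uses strchr()
instead of strsep() for portability:

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical userspace mirror of the quoted handle validation:
 * exactly one '/', non-empty bus and device names, no '*' or '='.
 */
static int handle_check(const char *handle)
{
	const char *slash;

	assert(handle);
	if (!*handle || strpbrk(handle, "*="))
		return -EINVAL;

	slash = strchr(handle, '/');
	if (!slash || slash == handle || !slash[1] || strchr(slash + 1, '/'))
		return -EINVAL;

	return 0;
}

/* Split "<selector>=<mode>" at the LAST '=', check the mode, then check
 * each comma-separated handle. Modifies buf in place, so pass a writable
 * copy (the quoted devlink_init() does the same with kstrdup()).
 */
static int param_check(char *buf)
{
	char *sep, *handle, *next;
	int err;

	sep = strrchr(buf, '=');
	if (!sep || sep == buf || !sep[1])
		return -EINVAL;
	*sep = '\0';

	if (strcmp(sep + 1, "legacy") && strcmp(sep + 1, "switchdev") &&
	    strcmp(sep + 1, "switchdev_inactive"))
		return -EINVAL;

	/* "*" alone selects every instance; no handle list to walk. */
	if (!strcmp(buf, "*"))
		return 0;

	for (handle = buf; handle; handle = next) {
		next = strchr(handle, ',');
		if (next)
			*next++ = '\0';
		err = handle_check(handle);
		if (err)
			return err;
	}

	return 0;
}
```

With the invalid example, the last '=' yields mode "switchdev_inactive" and
leaves "pci/0000:08:00.0=switchdev" as the first handle, which fails the '='
check. Supporting per-handle modes would therefore need a different
split, not only a documentation change.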
>diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
>index 32f70879ddd0..93f09cb18c44 100644
>--- a/Documentation/networking/devlink/index.rst
>+++ b/Documentation/networking/devlink/index.rst
>@@ -56,6 +56,7 @@ general.
> :maxdepth: 1
>
> devlink-dpipe
>+ devlink-defaults
> devlink-eswitch-attr
> devlink-flash
> devlink-health
>diff --git a/net/devlink/core.c b/net/devlink/core.c
Wanna have this in a separate file perhaps? "default.c"?
>index fe9f6a0a67d5..5126509a9c4e 100644
>--- a/net/devlink/core.c
>+++ b/net/devlink/core.c
>@@ -4,6 +4,10 @@
> * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
> */
>
>+#include <linux/init.h>
>+#include <linux/list.h>
>+#include <linux/slab.h>
>+#include <linux/string.h>
> #include <net/genetlink.h>
> #define CREATE_TRACE_POINTS
> #include <trace/events/devlink.h>
>@@ -16,6 +20,193 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
>
> DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC);
>
>+static char *devlink_default_esw_mode_param;
>+static bool devlink_default_esw_mode_match_all;
>+static enum devlink_eswitch_mode devlink_default_esw_mode;
>+static LIST_HEAD(devlink_default_esw_mode_nodes);
>+
>+struct devlink_default_esw_mode_node {
>+ struct list_head list;
>+ char *bus_name;
>+ char *dev_name;
>+};
>+
>+static int __init
>+devlink_default_esw_mode_to_value(const char *str,
>+ enum devlink_eswitch_mode *mode)
>+{
>+ if (!strcmp(str, "legacy")) {
>+ *mode = DEVLINK_ESWITCH_MODE_LEGACY;
>+ return 0;
>+ }
>+ if (!strcmp(str, "switchdev")) {
>+ *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
>+ return 0;
>+ }
>+ if (!strcmp(str, "switchdev_inactive")) {
>+ *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE;
>+ return 0;
>+ }
>+
>+ return -EINVAL;
>+}
>+
>+static int __init
>+devlink_default_esw_mode_handle_parse(char *handle, char **bus_name,
>+ char **dev_name)
>+{
>+ char *slash;
>+ char *p;
>+
>+ if (!*handle)
>+ return -EINVAL;
>+
>+ for (p = handle; *p; p++) {
>+ if (*p == '*' || *p == '=')
>+ return -EINVAL;
>+ }
>+
>+ slash = strchr(handle, '/');
>+ if (!slash || slash == handle || !slash[1])
>+ return -EINVAL;
>+ if (strchr(slash + 1, '/'))
>+ return -EINVAL;
>+
>+ *slash = '\0';
>+
>+ *bus_name = handle;
>+ *dev_name = slash + 1;
>+ return 0;
>+}
>+
>+static struct devlink_default_esw_mode_node *
>+devlink_default_esw_mode_node_find(const char *bus_name, const char *dev_name)
>+{
>+ struct devlink_default_esw_mode_node *node;
>+
>+ list_for_each_entry(node, &devlink_default_esw_mode_nodes, list) {
>+ if (!strcmp(node->bus_name, bus_name) &&
>+ !strcmp(node->dev_name, dev_name))
>+ return node;
>+ }
>+
>+ return NULL;
>+}
>+
>+static int __init
>+devlink_default_esw_mode_node_add(const char *bus_name, const char *dev_name)
>+{
>+ struct devlink_default_esw_mode_node *node;
>+
>+ if (devlink_default_esw_mode_node_find(bus_name, dev_name))
>+ return -EEXIST;
>+
>+ node = kzalloc_obj(*node);
>+ if (!node)
>+ return -ENOMEM;
>+
>+ INIT_LIST_HEAD(&node->list);
>+ node->bus_name = kstrdup(bus_name, GFP_KERNEL);
>+ node->dev_name = kstrdup(dev_name, GFP_KERNEL);
>+ if (!node->bus_name || !node->dev_name) {
>+ kfree(node->bus_name);
>+ kfree(node->dev_name);
>+ kfree(node);
>+ return -ENOMEM;
>+ }
>+
>+ list_add_tail(&node->list, &devlink_default_esw_mode_nodes);
>+ return 0;
>+}
>+
>+static int __init devlink_default_esw_mode_handles_parse(char *handles)
>+{
>+ char *handle;
>+ int err;
>+
>+ if (!strcmp(handles, "*")) {
>+ devlink_default_esw_mode_match_all = true;
>+ return 0;
>+ }
>+
>+ while ((handle = strsep(&handles, ",")) != NULL) {
>+ char *bus_name;
>+ char *dev_name;
>+
>+ err = devlink_default_esw_mode_handle_parse(handle, &bus_name,
>+ &dev_name);
>+ if (err)
>+ return err;
>+
>+ err = devlink_default_esw_mode_node_add(bus_name, dev_name);
>+ if (err)
>+ return err;
>+ }
>+
>+ return 0;
>+}
>+
>+static void __init
>+devlink_default_esw_mode_node_free(struct devlink_default_esw_mode_node *node)
>+{
>+ kfree(node->bus_name);
>+ kfree(node->dev_name);
>+ kfree(node);
>+}
>+
>+static void __init devlink_default_esw_mode_nodes_clear(void)
>+{
>+ struct devlink_default_esw_mode_node *node;
>+ struct devlink_default_esw_mode_node *node_tmp;
>+
>+ list_for_each_entry_safe(node, node_tmp,
>+ &devlink_default_esw_mode_nodes, list) {
>+ list_del(&node->list);
>+ devlink_default_esw_mode_node_free(node);
>+ }
>+
>+ devlink_default_esw_mode_match_all = false;
>+}
>+
>+static int __init devlink_default_esw_mode_parse(char *str)
>+{
>+ char *handles;
>+ char *separator;
>+ char *mode;
>+ enum devlink_eswitch_mode esw_mode;
>+ int err;
>+
>+ if (!*str)
>+ return -EINVAL;
>+
>+ separator = strrchr(str, '=');
>+ if (!separator || separator == str || !separator[1])
>+ return -EINVAL;
>+
>+ *separator = '\0';
>+ handles = str;
>+ mode = separator + 1;
>+
>+ err = devlink_default_esw_mode_to_value(mode, &esw_mode);
>+ if (err)
>+ return err;
>+
>+ err = devlink_default_esw_mode_handles_parse(handles);
>+ if (err)
>+ devlink_default_esw_mode_nodes_clear();
>+ else
>+ devlink_default_esw_mode = esw_mode;
>+
>+ return err;
>+}
>+
>+static int __init devlink_default_esw_mode_setup(char *str)
>+{
>+ devlink_default_esw_mode_param = str;
>+ return 1;
>+}
>+__setup("devlink_eswitch_mode=", devlink_default_esw_mode_setup);
>+
> static struct devlink *devlinks_xa_get(unsigned long index)
> {
> struct devlink *devlink;
>@@ -382,6 +573,14 @@ struct devlink *devlinks_xa_lookup_get(struct net *net, unsigned long index)
> /**
> * devl_register - Register devlink instance
> * @devlink: devlink
>+ *
>+ * Make @devlink visible to userspace. Drivers must call this only after the
>+ * instance is fully initialized and its devlink operations can be called.
>+ *
>+ * Context: Caller must hold the devlink instance lock. Use devlink_register()
>+ * when the lock is not already held.
>+ *
>+ * Return: 0 on success.
> */
> int devl_register(struct devlink *devlink)
> {
>@@ -580,6 +779,31 @@ static int __init devlink_init(void)
> {
> int err;
>
>+ if (devlink_default_esw_mode_param) {
>+ char *def;
>+
>+ def = kstrdup(devlink_default_esw_mode_param, GFP_KERNEL);
>+ if (!def) {
>+ devlink_default_esw_mode_param = NULL;
>+ pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
>+ } else {
>+ err = devlink_default_esw_mode_parse(def);
>+ kfree(def);
>+ if (err == -EEXIST) {
>+ devlink_default_esw_mode_param = NULL;
>+ pr_warn("devlink: duplicate eswitch mode handles ignored\n");
>+ } else if (err == -EINVAL) {
>+ devlink_default_esw_mode_param = NULL;
>+ pr_warn("devlink: invalid devlink_eswitch_mode parameter ignored\n");
>+ } else if (err == -ENOMEM) {
>+ devlink_default_esw_mode_param = NULL;
>+ pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
>+ } else if (err) {
>+ goto out;
>+ }
Move this to a separate helper alongside the other "default" functions?
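
One possible shape for such a helper, sketched in plain userspace C so it can
be compiled on its own. The name esw_mode_default_filter_err() is made up for
illustration and is not part of the patch. It keeps the three tolerated errors
and their warning strings from the quoted hunk in one place and passes any
other error through unchanged:

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical helper: classify the result of parsing the boot default.
 * For errors that the quoted devlink_init() tolerates, return 0 and set
 * *warning to the message it would print. For success, or for any other
 * error, return err unchanged and leave *warning NULL.
 */
static int esw_mode_default_filter_err(int err, const char **warning)
{
	assert(warning);
	*warning = NULL;

	switch (err) {
	case -EEXIST:
		*warning = "duplicate eswitch mode handles ignored";
		return 0;
	case -EINVAL:
		*warning = "invalid devlink_eswitch_mode parameter ignored";
		return 0;
	case -ENOMEM:
		*warning = "devlink_eswitch_mode parameter ignored, failed to allocate memory";
		return 0;
	default:
		return err;
	}
}
```

In the kernel the caller would presumably pr_warn() the returned string and
clear devlink_default_esw_mode_param whenever *warning is set, leaving
devlink_init() with a single call and a single error check.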
>+ }
>+ }
>+
> err = register_pernet_subsys(&devlink_pernet_ops);
> if (err)
> goto out;
>@@ -595,7 +819,10 @@ static int __init devlink_init(void)
> out_unreg_pernet_subsys:
> unregister_pernet_subsys(&devlink_pernet_ops);
> out:
>+ if (err)
>+ devlink_default_esw_mode_nodes_clear();
> WARN_ON(err);
>+
> return err;
> }
>
>--
>2.43.0
>
Thread overview: 14+ messages
2026-06-29 18:20 [PATCH net-next V4 0/6] evlink: Add boot-time eswitch mode defaults Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 1/6] net/mlx5: Clear FW reset-in-progress bit before reload Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 2/6] devlink: Factor out eswitch mode setting Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 3/6] devlink: Parse eswitch mode boot defaults Mark Bloch
2026-07-01 9:38 ` Jiri Pirko [this message]
2026-07-01 12:55 ` Mark Bloch
2026-07-01 13:14 ` Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 4/6] devlink: Apply " Mark Bloch
2026-07-01 9:48 ` Jiri Pirko
2026-07-01 12:57 ` Mark Bloch
2026-07-01 14:09 ` Jiri Pirko
2026-07-01 17:42 ` Mark Bloch
2026-06-29 18:21 ` [PATCH net-next V4 5/6] devlink: Add API to apply eswitch mode boot default Mark Bloch
2026-06-29 18:21 ` [PATCH net-next V4 6/6] net/mlx5: Apply devlink eswitch mode boot default on probe Mark Bloch