From: Mark Bloch <mbloch@nvidia.com>
To: Jiri Pirko <jiri@resnulli.us>
Cc: Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-doc@vger.kernel.org
Subject: Re: [PATCH net-next V4 3/6] devlink: Parse eswitch mode boot defaults
Date: Wed, 1 Jul 2026 15:55:48 +0300
Message-ID: <188fc320-b7b7-4041-9132-9c71f1e0fe31@nvidia.com>
In-Reply-To: <akTencQhKSanuFeW@FV6GYCPJ69>
On 01/07/2026 12:38, Jiri Pirko wrote:
> Mon, Jun 29, 2026 at 08:20:58PM +0200, mbloch@nvidia.com wrote:
>> Add devlink_eswitch_mode= kernel command line parsing for a default
>> eswitch mode.
>>
>> The supported syntax selects either all devlink handles or one explicit
>> comma-separated handle list:
>>
>> devlink_eswitch_mode=*=<mode>
>>
>> devlink_eswitch_mode=<handle>[,<handle>...]=<mode>
>>
>> where <mode> is one of legacy, switchdev or switchdev_inactive. All
>> selected handles receive the same mode. Assigning different modes to
>> different handle lists in the same parameter value is not supported.
>>
>> Store the parsed selector and mode in devlink core so the default can be
>> applied by a downstream patch.
>>
>> Document the devlink_eswitch_mode= syntax and duplicate handle handling.
>>
>> Signed-off-by: Mark Bloch <mbloch@nvidia.com>
>> ---
>> .../admin-guide/kernel-parameters.txt | 25 ++
>> .../networking/devlink/devlink-defaults.rst | 78 ++++++
>> Documentation/networking/devlink/index.rst | 1 +
>> net/devlink/core.c | 227 ++++++++++++++++++
>> 4 files changed, 331 insertions(+)
>> create mode 100644 Documentation/networking/devlink/devlink-defaults.rst
>>
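To make the selector/mode split described above concrete, here is a minimal userspace sketch. It is not the kernel code. The names esw_mode_from_string(), split_selector_mode() and enum esw_mode are hypothetical. It only mirrors the patch's approach: split the value in place at the last '=' (strrchr()), then map the mode string to a value.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for enum devlink_eswitch_mode. */
enum esw_mode { ESW_LEGACY, ESW_SWITCHDEV, ESW_SWITCHDEV_INACTIVE };

/* Mirrors devlink_default_esw_mode_to_value(): exact string match only. */
static int esw_mode_from_string(const char *str, enum esw_mode *mode)
{
	static const struct {
		const char *name;
		enum esw_mode mode;
	} table[] = {
		{ "legacy", ESW_LEGACY },
		{ "switchdev", ESW_SWITCHDEV },
		{ "switchdev_inactive", ESW_SWITCHDEV_INACTIVE },
	};
	size_t i;

	for (i = 0; i < sizeof(table) / sizeof(table[0]); i++) {
		if (!strcmp(str, table[i].name)) {
			*mode = table[i].mode;
			return 0;
		}
	}
	return -EINVAL;
}

/*
 * Split "<selector>=<mode>" in place at the LAST '=', as the patch does.
 * An empty selector, an empty mode or an unknown mode gives -EINVAL.
 */
static int split_selector_mode(char *str, char **selector, enum esw_mode *mode)
{
	char *sep = strrchr(str, '=');

	if (!sep || sep == str || !sep[1])
		return -EINVAL;
	*sep = '\0';		/* terminate the selector part */
	*selector = str;
	return esw_mode_from_string(sep + 1, mode);
}
```

Because the split happens at the last '=', the value "pci/0000:08:00.0=switchdev,pci/0000:09:00.0=switchdev_inactive" yields the mode switchdev_inactive and a selector that still contains "=switchdev". The per-handle check then rejects that selector, because handles must not contain '='.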
>> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
>> index b5493a7f8f22..117300dd589c 100644
>> --- a/Documentation/admin-guide/kernel-parameters.txt
>> +++ b/Documentation/admin-guide/kernel-parameters.txt
>> @@ -1249,6 +1249,31 @@ Kernel parameters
>> dell_smm_hwmon.fan_max=
>> [HW] Maximum configurable fan speed.
>>
>> + devlink_eswitch_mode=
>> + [NET]
>> + Format:
>> + <selector>=<mode>
>> +
>> + <selector>:
>> + * | <handle>[,<handle>...]
>> +
>> + <handle>:
>> + <bus-name>/<dev-name>
>> +
>> + Configure default devlink eswitch mode for matching
>> + devlink instances during device initialization.
>> +
>> + <mode>:
>> + legacy | switchdev | switchdev_inactive
>> +
>> + Examples:
>> + devlink_eswitch_mode=*=switchdev
>> + devlink_eswitch_mode=pci/0000:08:00.0=switchdev
>> + devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
>> +
>> + See Documentation/networking/devlink/devlink-defaults.rst
>> + for the full syntax.
>> +
>> dfltcc= [HW,S390]
>> Format: { on | off | def_only | inf_only | always }
>> on: s390 zlib hardware support for compression on
>> diff --git a/Documentation/networking/devlink/devlink-defaults.rst b/Documentation/networking/devlink/devlink-defaults.rst
>> new file mode 100644
>> index 000000000000..380c9e99210e
>> --- /dev/null
>> +++ b/Documentation/networking/devlink/devlink-defaults.rst
>> @@ -0,0 +1,78 @@
>> +.. SPDX-License-Identifier: GPL-2.0
>> +
>> +==============================
>> +Devlink Eswitch Mode Defaults
>> +==============================
>> +
>> +Devlink eswitch mode defaults allow the eswitch mode to be provided on the
>> +kernel command line and applied to matching devlink instances during device
>> +initialization.
>> +
>> +The devlink device is selected by its devlink handle. For PCI devices this is
>> +the same handle shown by ``devlink dev show``, for example
>> +``pci/0000:08:00.0``.
>> +
>> +Kernel command line syntax
>> +==========================
>> +
>> +Defaults are specified with the ``devlink_eswitch_mode=`` kernel command line
>> +parameter.
>> +
>> +The general syntax is::
>> +
>> + devlink_eswitch_mode=<selector>=<mode>
>> +
>> +``<selector>`` is either ``*`` or one or more devlink handles::
>> +
>> + * | <bus-name>/<dev-name>[,<bus-name>/<dev-name>...]
>> +
>> +``*`` applies the mode to every devlink instance. All handles in the same
>> +selector receive the same eswitch mode.
>> +
>> +``<mode>`` is one of ``legacy``, ``switchdev`` or ``switchdev_inactive``.
>> +
>> +Syntax rules
>> +------------
>> +
>> +The following syntax rules apply:
>> +
>> +* Specify the default in one ``devlink_eswitch_mode=`` parameter. Repeated
>> + ``devlink_eswitch_mode=`` parameters are not accumulated.
>> +* The ``devlink_eswitch_mode=`` value is limited by the kernel command line
>> + size.
>> +* Whitespace is not allowed within the parameter value.
>> +* ``<selector>`` must be either ``*`` or a handle list. ``*`` cannot be
>> + combined with explicit handles.
>> +* ``<bus-name>`` and ``<dev-name>`` must not be empty.
>> +* ``<dev-name>`` may contain ``:``. This allows PCI names such as
>> + ``0000:08:00.0``.
>> +* Handles must not contain whitespace, ``*``, ``=`` or more than one ``/``.
>> +* A comma separates handles.
>> +* Comma-separated default assignments are not supported.
>> +* Duplicate handles are rejected and the devlink eswitch mode default is
>> + ignored.
>> +
>> +The eswitch mode default corresponds to the userspace command::
>> +
>> + devlink dev eswitch set <handle> mode <value>
>> +
>> +
>> +Examples
>> +========
>> +
>> +Set all devlink instances to switchdev mode::
>> +
>> + devlink_eswitch_mode=*=switchdev
>> +
>> +Set one PCI devlink instance to switchdev mode::
>> +
>> + devlink_eswitch_mode=pci/0000:08:00.0=switchdev
>> +
>> +Set two PCI devlink instances to switchdev inactive mode::
>> +
>> + devlink_eswitch_mode=pci/0000:08:00.0,pci/0000:09:00.1=switchdev_inactive
>> +
>> +The following is invalid because comma-separated default assignments are not
>> +supported::
>> +
>> + devlink_eswitch_mode=pci/0000:08:00.0=switchdev,pci/0000:09:00.0=switchdev_inactive
>
> Interesting. I would think that this is something a user may want to set
> for some use cases, no?
>
>
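For reference, in this version the per-handle assignment form is rejected by the generic handle rules, not by a dedicated check. The following minimal userspace sketch mirrors the checks in devlink_default_esw_mode_handle_parse() and the duplicate lookup in devlink_default_esw_mode_node_add(). The names check_handle(), check_selector() and the MAX_HANDLES bound are hypothetical. The patch itself keeps a linked list with no fixed limit.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define MAX_HANDLES 16	/* arbitrary bound for this sketch only */

/*
 * Mirror of the per-handle checks: non-empty, no '*' or '=', exactly one
 * '/', non-empty bus and device names. ':' is allowed in the device name.
 */
static int check_handle(const char *h)
{
	const char *slash;

	if (!*h || strpbrk(h, "*="))
		return -EINVAL;
	slash = strchr(h, '/');
	if (!slash || slash == h || !slash[1] || strchr(slash + 1, '/'))
		return -EINVAL;
	return 0;
}

/*
 * Validate a selector in place: "*" alone, or a comma-separated handle
 * list without duplicates. Returns 0, -EINVAL or -EEXIST, the error codes
 * the patch reports.
 */
static int check_selector(char *sel)
{
	char *seen[MAX_HANDLES];
	size_t n = 0, i;
	char *p = sel;

	if (!strcmp(sel, "*"))
		return 0;

	for (;;) {
		char *comma = strchr(p, ',');
		int err;

		if (comma)
			*comma = '\0';	/* terminate the current handle */
		err = check_handle(p);
		if (err)
			return err;
		for (i = 0; i < n; i++)
			if (!strcmp(seen[i], p))
				return -EEXIST;
		if (n == MAX_HANDLES)
			return -ENOMEM;	/* sketch limit, not a rule from the patch */
		seen[n++] = p;
		if (!comma)
			return 0;
		p = comma + 1;
	}
}
```

With this structure, "*,pci/0000:08:00.0" fails on the '*' handle, and "pci/0000:08:00.0=switchdev,pci/0000:09:00.0" (the selector left after splitting at the last '=') fails on the '=' in its first handle. Supporting per-handle modes would need a different grammar, not just a relaxed check.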
>> diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
>> index 32f70879ddd0..93f09cb18c44 100644
>> --- a/Documentation/networking/devlink/index.rst
>> +++ b/Documentation/networking/devlink/index.rst
>> @@ -56,6 +56,7 @@ general.
>> :maxdepth: 1
>>
>> devlink-dpipe
>> + devlink-defaults
>> devlink-eswitch-attr
>> devlink-flash
>> devlink-health
>> diff --git a/net/devlink/core.c b/net/devlink/core.c
>
> Wanna have this in a separate file perhaps? "default.c"?
>
>
>> index fe9f6a0a67d5..5126509a9c4e 100644
>> --- a/net/devlink/core.c
>> +++ b/net/devlink/core.c
>> @@ -4,6 +4,10 @@
>> * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
>> */
>>
>> +#include <linux/init.h>
>> +#include <linux/list.h>
>> +#include <linux/slab.h>
>> +#include <linux/string.h>
>> #include <net/genetlink.h>
>> #define CREATE_TRACE_POINTS
>> #include <trace/events/devlink.h>
>> @@ -16,6 +20,193 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
>>
>> DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC);
>>
>> +static char *devlink_default_esw_mode_param;
>> +static bool devlink_default_esw_mode_match_all;
>> +static enum devlink_eswitch_mode devlink_default_esw_mode;
>> +static LIST_HEAD(devlink_default_esw_mode_nodes);
>> +
>> +struct devlink_default_esw_mode_node {
>> + struct list_head list;
>> + char *bus_name;
>> + char *dev_name;
>> +};
>> +
>> +static int __init
>> +devlink_default_esw_mode_to_value(const char *str,
>> + enum devlink_eswitch_mode *mode)
>> +{
>> + if (!strcmp(str, "legacy")) {
>> + *mode = DEVLINK_ESWITCH_MODE_LEGACY;
>> + return 0;
>> + }
>> + if (!strcmp(str, "switchdev")) {
>> + *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
>> + return 0;
>> + }
>> + if (!strcmp(str, "switchdev_inactive")) {
>> + *mode = DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE;
>> + return 0;
>> + }
>> +
>> + return -EINVAL;
>> +}
>> +
>> +static int __init
>> +devlink_default_esw_mode_handle_parse(char *handle, char **bus_name,
>> + char **dev_name)
>> +{
>> + char *slash;
>> + char *p;
>> +
>> + if (!*handle)
>> + return -EINVAL;
>> +
>> + for (p = handle; *p; p++) {
>> + if (*p == '*' || *p == '=')
>> + return -EINVAL;
>> + }
>> +
>> + slash = strchr(handle, '/');
>> + if (!slash || slash == handle || !slash[1])
>> + return -EINVAL;
>> + if (strchr(slash + 1, '/'))
>> + return -EINVAL;
>> +
>> + *slash = '\0';
>> +
>> + *bus_name = handle;
>> + *dev_name = slash + 1;
>> + return 0;
>> +}
>> +
>> +static struct devlink_default_esw_mode_node *
>> +devlink_default_esw_mode_node_find(const char *bus_name, const char *dev_name)
>> +{
>> + struct devlink_default_esw_mode_node *node;
>> +
>> + list_for_each_entry(node, &devlink_default_esw_mode_nodes, list) {
>> + if (!strcmp(node->bus_name, bus_name) &&
>> + !strcmp(node->dev_name, dev_name))
>> + return node;
>> + }
>> +
>> + return NULL;
>> +}
>> +
>> +static int __init
>> +devlink_default_esw_mode_node_add(const char *bus_name, const char *dev_name)
>> +{
>> + struct devlink_default_esw_mode_node *node;
>> +
>> + if (devlink_default_esw_mode_node_find(bus_name, dev_name))
>> + return -EEXIST;
>> +
>> + node = kzalloc_obj(*node);
>> + if (!node)
>> + return -ENOMEM;
>> +
>> + INIT_LIST_HEAD(&node->list);
>> + node->bus_name = kstrdup(bus_name, GFP_KERNEL);
>> + node->dev_name = kstrdup(dev_name, GFP_KERNEL);
>> + if (!node->bus_name || !node->dev_name) {
>> + kfree(node->bus_name);
>> + kfree(node->dev_name);
>> + kfree(node);
>> + return -ENOMEM;
>> + }
>> +
>> + list_add_tail(&node->list, &devlink_default_esw_mode_nodes);
>> + return 0;
>> +}
>> +
>> +static int __init devlink_default_esw_mode_handles_parse(char *handles)
>> +{
>> + char *handle;
>> + int err;
>> +
>> + if (!strcmp(handles, "*")) {
>> + devlink_default_esw_mode_match_all = true;
>> + return 0;
>> + }
>> +
>> + while ((handle = strsep(&handles, ",")) != NULL) {
>> + char *bus_name;
>> + char *dev_name;
>> +
>> + err = devlink_default_esw_mode_handle_parse(handle, &bus_name,
>> + &dev_name);
>> + if (err)
>> + return err;
>> +
>> + err = devlink_default_esw_mode_node_add(bus_name, dev_name);
>> + if (err)
>> + return err;
>> + }
>> +
>> + return 0;
>> +}
>> +
>> +static void __init
>> +devlink_default_esw_mode_node_free(struct devlink_default_esw_mode_node *node)
>> +{
>> + kfree(node->bus_name);
>> + kfree(node->dev_name);
>> + kfree(node);
>> +}
>> +
>> +static void __init devlink_default_esw_mode_nodes_clear(void)
>> +{
>> + struct devlink_default_esw_mode_node *node;
>> + struct devlink_default_esw_mode_node *node_tmp;
>> +
>> + list_for_each_entry_safe(node, node_tmp,
>> + &devlink_default_esw_mode_nodes, list) {
>> + list_del(&node->list);
>> + devlink_default_esw_mode_node_free(node);
>> + }
>> +
>> + devlink_default_esw_mode_match_all = false;
>> +}
>> +
>> +static int __init devlink_default_esw_mode_parse(char *str)
>> +{
>> + char *handles;
>> + char *separator;
>> + char *mode;
>> + enum devlink_eswitch_mode esw_mode;
>> + int err;
>> +
>> + if (!*str)
>> + return -EINVAL;
>> +
>> + separator = strrchr(str, '=');
>> + if (!separator || separator == str || !separator[1])
>> + return -EINVAL;
>> +
>> + *separator = '\0';
>> + handles = str;
>> + mode = separator + 1;
>> +
>> + err = devlink_default_esw_mode_to_value(mode, &esw_mode);
>> + if (err)
>> + return err;
>> +
>> + err = devlink_default_esw_mode_handles_parse(handles);
>> + if (err)
>> + devlink_default_esw_mode_nodes_clear();
>> + else
>> + devlink_default_esw_mode = esw_mode;
>> +
>> + return err;
>> +}
>> +
>> +static int __init devlink_default_esw_mode_setup(char *str)
>> +{
>> + devlink_default_esw_mode_param = str;
>> + return 1;
>> +}
>> +__setup("devlink_eswitch_mode=", devlink_default_esw_mode_setup);
>> +
>> static struct devlink *devlinks_xa_get(unsigned long index)
>> {
>> struct devlink *devlink;
>> @@ -382,6 +573,14 @@ struct devlink *devlinks_xa_lookup_get(struct net *net, unsigned long index)
>> /**
>> * devl_register - Register devlink instance
>> * @devlink: devlink
>> + *
>> + * Make @devlink visible to userspace. Drivers must call this only after the
>> + * instance is fully initialized and its devlink operations can be called.
>> + *
>> + * Context: Caller must hold the devlink instance lock. Use devlink_register()
>> + * when the lock is not already held.
>> + *
>> + * Return: 0 on success.
>> */
>> int devl_register(struct devlink *devlink)
>> {
>> @@ -580,6 +779,31 @@ static int __init devlink_init(void)
>> {
>> int err;
>>
>> + if (devlink_default_esw_mode_param) {
>> + char *def;
>> +
>> + def = kstrdup(devlink_default_esw_mode_param, GFP_KERNEL);
>> + if (!def) {
>> + devlink_default_esw_mode_param = NULL;
>> + pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
>> + } else {
>> + err = devlink_default_esw_mode_parse(def);
>> + kfree(def);
>> + if (err == -EEXIST) {
>> + devlink_default_esw_mode_param = NULL;
>> + pr_warn("devlink: duplicate eswitch mode handles ignored\n");
>> + } else if (err == -EINVAL) {
>> + devlink_default_esw_mode_param = NULL;
>> + pr_warn("devlink: invalid devlink_eswitch_mode parameter ignored\n");
>> + } else if (err == -ENOMEM) {
>> + devlink_default_esw_mode_param = NULL;
>> + pr_warn("devlink: devlink_eswitch_mode parameter ignored, failed to allocate memory\n");
>> + } else if (err) {
>> + goto out;
>> + }
>
> Move this to a separate helper alongside the other "default" functions?
I did that in the following patch. If I respin, I'll do it in this one.
Mark
>
>
>> + }
>> + }
>> +
>> err = register_pernet_subsys(&devlink_pernet_ops);
>> if (err)
>> goto out;
>> @@ -595,7 +819,10 @@ static int __init devlink_init(void)
>> out_unreg_pernet_subsys:
>> unregister_pernet_subsys(&devlink_pernet_ops);
>> out:
>> + if (err)
>> + devlink_default_esw_mode_nodes_clear();
>> WARN_ON(err);
>> +
>> return err;
>> }
>>
>> --
>> 2.43.0
>>
Thread overview: 14+ messages
2026-06-29 18:20 [PATCH net-next V4 0/6] evlink: Add boot-time eswitch mode defaults Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 1/6] net/mlx5: Clear FW reset-in-progress bit before reload Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 2/6] devlink: Factor out eswitch mode setting Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 3/6] devlink: Parse eswitch mode boot defaults Mark Bloch
2026-07-01 9:38 ` Jiri Pirko
2026-07-01 12:55 ` Mark Bloch [this message]
2026-07-01 13:14 ` Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 4/6] devlink: Apply " Mark Bloch
2026-07-01 9:48 ` Jiri Pirko
2026-07-01 12:57 ` Mark Bloch
2026-07-01 14:09 ` Jiri Pirko
2026-07-01 17:42 ` Mark Bloch
2026-06-29 18:21 ` [PATCH net-next V4 5/6] devlink: Add API to apply eswitch mode boot default Mark Bloch
2026-06-29 18:21 ` [PATCH net-next V4 6/6] net/mlx5: Apply devlink eswitch mode boot default on probe Mark Bloch