Netdev List
* [RFC V2 net-next 0/2] devlink: Add boot-time defaults
@ 2026-05-10 18:54 Mark Bloch
  2026-05-10 18:54 ` [RFC V2 net-next 1/2] devlink: Add eswitch mode boot defaults Mark Bloch
  2026-05-10 18:54 ` [RFC V2 net-next 2/2] net/mlx5: Apply devlink boot defaults during init Mark Bloch
  0 siblings, 2 replies; 3+ messages in thread
From: Mark Bloch @ 2026-05-10 18:54 UTC
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller, Jiri Pirko
  Cc: Jonathan Corbet, Shuah Khan, Simon Horman, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, Mark Bloch, Andrew Morton,
	Borislav Petkov (AMD), Randy Dunlap, Petr Mladek,
	Peter Zijlstra (Intel), Christian Brauner, Thomas Gleixner,
	Dapeng Mi, Kees Cook, Marco Elver, Li RongQing, Eric Biggers,
	Paul E. McKenney, linux-doc, linux-kernel, netdev, linux-rdma

This series adds a devlink= kernel command line parameter for applying
selected devlink settings during device initialization.

Following a discussion with Jakub[1], I am sending this RFC to get the
conversation moving. I started from Jakub's example/request and extended
it to cover requirements from production systems and configurations that
customers use. The implementation is intended to support the following
properties:

- A system may have multiple devlink devices that usually need the same
configuration. For a configuration such as eswitch mode switchdev, a user
can specify either all devlink devices or an explicit list of devices to
which that configuration applies.

- Deployments can set devlink defaults before normal userspace
orchestration runs, while still using devlink concepts and driver callbacks
rather than adding driver-specific module parameters.

A default is scoped to either all devlink handles or to a comma-separated
list of devlink handles, for example:

devlink=[*]:esw:mode:switchdev
devlink=[pci/0000:08:00.0,pci/0000:09:00.1]:esw:mode:switchdev_inactive

The first supported command is eswitch mode configuration.
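
As an illustration of the syntax above, the split between the bracketed
selector and the command can be modeled in userspace C. This is a minimal
sketch that loosely follows the checks in devlink_default_parse() from
patch 1; split_default() is a hypothetical name used only for this example
and is not part of the series.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace model only: split "[<selector>]:<command>" in place.
 * split_default() is a hypothetical helper, not a kernel symbol.
 * Returns 0 on success and -1 on malformed input.
 */
static int split_default(char *str, char **selector, char **command)
{
	char *end;

	if (!str || *str != '[')
		return -1;

	end = strchr(str + 1, ']');
	/* Require "]:" followed by a non-empty command. */
	if (!end || end[1] != ':' || !end[2])
		return -1;

	*end = '\0';
	if (!str[1])		/* empty selector "[]" */
		return -1;

	*selector = str + 1;
	*command = end + 2;
	return 0;
}
```

The input buffer is modified in place, so a caller would pass a writable
copy of the parameter value, as devlink_init() does with kstrdup().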

mlx5 wires this into device initialization after the devlink instance is
registered and after mlx5 devlink operations are available, so eswitch mode
defaults can be applied to matching PCI devlink devices.

Patch 1 adds the devlink boot-default parser, storage,
devl_apply_defaults() API, eswitch mode default support and documentation
for the devlink= syntax.

Patch 2 calls devl_apply_defaults() from mlx5 device initialization.

Changelog:

V1 -> V2

- Reduced the series from 4 patches to 2 patches by folding the generic
boot default infrastructure and eswitch mode support into one devlink
patch.

- Narrowed the scope to eswitch mode defaults only. Dropped runtime
devlink parameter boot defaults and simplified the parser.

- Added [*] selector support to apply the same eswitch mode default to all
devlink instances.

- Removed support for multiple comma-separated default groups and for
assigning different defaults to different handles in the same devlink=
parameter.

[1] https://lore.kernel.org/all/20260502184153.4fd8d06f@kernel.org/
V1: https://lore.kernel.org/all/20260506123739.1959770-1-mbloch@nvidia.com/

Mark Bloch (2):
  devlink: Add eswitch mode boot defaults
  net/mlx5: Apply devlink boot defaults during init

 .../admin-guide/kernel-parameters.txt         |  24 ++
 .../networking/devlink/devlink-defaults.rst   |  93 ++++++
 Documentation/networking/devlink/index.rst    |   1 +
 .../net/ethernet/mellanox/mlx5/core/main.c    |   2 +
 include/net/devlink.h                         |   1 +
 net/devlink/core.c                            | 269 ++++++++++++++++++
 6 files changed, 390 insertions(+)
 create mode 100644 Documentation/networking/devlink/devlink-defaults.rst


base-commit: a93245816556ba03549bb626de48ea2b11c7f9d2
-- 
2.34.1



* [RFC V2 net-next 1/2] devlink: Add eswitch mode boot defaults
  2026-05-10 18:54 [RFC V2 net-next 0/2] devlink: Add boot-time defaults Mark Bloch
@ 2026-05-10 18:54 ` Mark Bloch
  2026-05-10 18:54 ` [RFC V2 net-next 2/2] net/mlx5: Apply devlink boot defaults during init Mark Bloch
  1 sibling, 0 replies; 3+ messages in thread
From: Mark Bloch @ 2026-05-10 18:54 UTC
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller, Jiri Pirko
  Cc: Jonathan Corbet, Shuah Khan, Simon Horman, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, Mark Bloch, Andrew Morton,
	Borislav Petkov (AMD), Randy Dunlap, Petr Mladek,
	Peter Zijlstra (Intel), Christian Brauner, Thomas Gleixner,
	Dapeng Mi, Kees Cook, Marco Elver, Li RongQing, Eric Biggers,
	Paul E. McKenney, linux-doc, linux-kernel, netdev, linux-rdma

Add devlink= kernel command line support for configuring devlink
eswitch mode during device initialization.

The supported syntax selects either all devlink handles or one explicit
comma-separated handle list:

  devlink=[*]:esw:mode:<mode>
  devlink=[<handle>[,<handle>...]]:esw:mode:<mode>

where <mode> is one of legacy, switchdev or switchdev_inactive. All
selected handles receive the same mode. Comma-separated default groups are
not supported.
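
For illustration, the esw:mode:<mode> part of the syntax can be modeled in
userspace C as a prefix check followed by a table lookup. This is a sketch
under stated assumptions, not the code in this patch: enum esw_mode is a
stand-in for the kernel's enum devlink_eswitch_mode, and
parse_esw_command() is a hypothetical name.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for enum devlink_eswitch_mode; the values are illustrative. */
enum esw_mode { ESW_LEGACY, ESW_SWITCHDEV, ESW_SWITCHDEV_INACTIVE };

static const struct {
	const char *name;
	enum esw_mode mode;
} esw_modes[] = {
	{ "legacy", ESW_LEGACY },
	{ "switchdev", ESW_SWITCHDEV },
	{ "switchdev_inactive", ESW_SWITCHDEV_INACTIVE },
};

/* Parse "esw:mode:<mode>"; returns 0 on success and -1 otherwise. */
static int parse_esw_command(const char *cmd, enum esw_mode *mode)
{
	static const char prefix[] = "esw:mode:";
	size_t i;

	if (!cmd || strncmp(cmd, prefix, sizeof(prefix) - 1))
		return -1;
	cmd += sizeof(prefix) - 1;

	/* Whole-string comparison, so extra ":..." fields are rejected. */
	for (i = 0; i < sizeof(esw_modes) / sizeof(esw_modes[0]); i++) {
		if (!strcmp(cmd, esw_modes[i].name)) {
			*mode = esw_modes[i].mode;
			return 0;
		}
	}
	return -1;
}
```

The patch itself tokenizes with strsep(); the table form here is only one
way to express the same accept/reject behavior.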

The default is applied through the existing eswitch_mode_set() devlink
operation, matching the userspace devlink eswitch set command.

Document the devlink= syntax and duplicate handle handling.
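
The handle rules documented by this patch (non-empty bus and device names,
exactly one '/', no ':' in the bus name, no '[', ']' or '*') can be
sketched in userspace C as follows. split_handle() is a hypothetical name
for this example; it mirrors the intent of devlink_default_handle_parse()
but is not the patch code.

```c
#include <stddef.h>
#include <string.h>

/*
 * Userspace model only: split "<bus-name>/<dev-name>" in place.
 * split_handle() is a hypothetical helper.
 * Returns 0 on success and -1 otherwise.
 */
static int split_handle(char *handle, char **bus, char **dev)
{
	char *slash;

	if (!handle || !*handle)
		return -1;
	/* Selector syntax characters may not appear inside a handle. */
	if (strpbrk(handle, "[]*"))
		return -1;

	slash = strchr(handle, '/');
	/* Need exactly one '/', with text on both sides of it. */
	if (!slash || slash == handle || !slash[1] || strchr(slash + 1, '/'))
		return -1;

	*slash = '\0';
	/* ':' is allowed in the device name (PCI addresses), not the bus name. */
	if (strchr(handle, ':'))
		return -1;

	*bus = handle;
	*dev = slash + 1;
	return 0;
}
```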

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 .../admin-guide/kernel-parameters.txt         |  24 ++
 .../networking/devlink/devlink-defaults.rst   |  93 ++++++
 Documentation/networking/devlink/index.rst    |   1 +
 include/net/devlink.h                         |   1 +
 net/devlink/core.c                            | 269 ++++++++++++++++++
 5 files changed, 388 insertions(+)
 create mode 100644 Documentation/networking/devlink/devlink-defaults.rst

diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt
index 7834ee927310..46435bdfe039 100644
--- a/Documentation/admin-guide/kernel-parameters.txt
+++ b/Documentation/admin-guide/kernel-parameters.txt
@@ -1278,6 +1278,30 @@ Kernel parameters
 	dell_smm_hwmon.fan_max=
 			[HW] Maximum configurable fan speed.
 
+	devlink=	[NET]
+			Format:
+			[<selector>]:esw:mode:<mode>
+
+			<selector>:
+			* | <handle>[,<handle>...]
+
+			<handle>:
+			<bus-name>/<dev-name>
+
+			Configure default devlink settings for matching
+			devlink instances during device initialization.
+
+			Currently supported settings:
+			esw:mode:{ legacy | switchdev | switchdev_inactive }
+
+			Examples:
+			devlink=[*]:esw:mode:switchdev
+			devlink=[pci/0000:08:00.0]:esw:mode:switchdev
+			devlink=[pci/0000:08:00.0,pci/0000:09:00.1]:esw:mode:legacy
+
+			See Documentation/networking/devlink/devlink-defaults.rst
+			for the full syntax.
+
 	dfltcc=		[HW,S390]
 			Format: { on | off | def_only | inf_only | always }
 			on:       s390 zlib hardware support for compression on
diff --git a/Documentation/networking/devlink/devlink-defaults.rst b/Documentation/networking/devlink/devlink-defaults.rst
new file mode 100644
index 000000000000..4db3025c540f
--- /dev/null
+++ b/Documentation/networking/devlink/devlink-defaults.rst
@@ -0,0 +1,93 @@
+.. SPDX-License-Identifier: GPL-2.0
+
+================
+Devlink Defaults
+================
+
+Devlink defaults allow selected devlink settings to be provided on the
+kernel command line and applied to matching devlink instances during device
+initialization.
+
+The devlink device is selected by its devlink handle. For PCI devices this is
+the same handle shown by ``devlink dev show``, for example
+``pci/0000:08:00.0``.
+
+Kernel command line syntax
+==========================
+
+Defaults are specified with the ``devlink=`` kernel command line parameter.
+
+The general syntax is::
+
+  devlink=[<selector>]:esw:mode:<mode>
+
+``<selector>`` is either ``*`` or one or more devlink handles::
+
+  * | <bus-name>/<dev-name>[,<bus-name>/<dev-name>...]
+
+``*`` applies the default to every devlink instance. All handles in the same
+``[]`` list receive the same eswitch mode setting.
+
+``<mode>`` is one of ``legacy``, ``switchdev`` or ``switchdev_inactive``.
+
+Syntax rules
+------------
+
+The following syntax rules apply:
+
+* Specify the default in one ``devlink=`` parameter. Repeated ``devlink=``
+  parameters are not accumulated.
+* The ``devlink=`` value is limited by the kernel command line size.
+* Whitespace is not allowed within the parameter value.
+* ``<selector>`` must be either ``*`` or a handle list. ``*`` cannot be
+  combined with explicit handles.
+* ``<bus-name>`` and ``<dev-name>`` must not be empty.
+* ``<bus-name>`` must not contain ``:``.
+* ``<dev-name>`` may contain ``:``. This allows PCI names such as
+  ``0000:08:00.0``.
+* Handles must not contain whitespace, ``[``, ``]``, ``*`` or more than one
+  ``/``.
+* A comma inside ``[]`` separates handles.
+* Comma-separated default groups are not supported.
+* Duplicate handles are rejected and the devlink default is ignored.
+
+Supported defaults
+==================
+
+The supported command is ``esw``:
+
+.. list-table::
+   :widths: 10 25 35
+   :header-rows: 1
+
+   * - Command
+     - Options
+     - Values
+   * - ``esw``
+     - ``mode:<mode>``
+     - ``legacy``, ``switchdev``, ``switchdev_inactive``
+
+The ``esw:mode`` default corresponds to the userspace command::
+
+  devlink dev eswitch set <handle> mode <value>
+
+
+Examples
+========
+
+Set all devlink instances to switchdev mode::
+
+  devlink=[*]:esw:mode:switchdev
+
+Set one PCI devlink instance to switchdev mode::
+
+  devlink=[pci/0000:08:00.0]:esw:mode:switchdev
+
+Set two PCI devlink instances to legacy mode::
+
+  devlink=[pci/0000:08:00.0,pci/0000:09:00.1]:esw:mode:legacy
+
+The following is invalid because comma-separated default groups are not
+supported::
+
+  devlink=[pci/0000:08:00.0]:esw:mode:switchdev,[pci/0000:09:00.0]:esw:mode:switchdev_inactive
diff --git a/Documentation/networking/devlink/index.rst b/Documentation/networking/devlink/index.rst
index f7ba7dcf477d..0d27a7008b14 100644
--- a/Documentation/networking/devlink/index.rst
+++ b/Documentation/networking/devlink/index.rst
@@ -56,6 +56,7 @@ general.
    :maxdepth: 1
 
    devlink-dpipe
+   devlink-defaults
    devlink-eswitch-attr
    devlink-flash
    devlink-health
diff --git a/include/net/devlink.h b/include/net/devlink.h
index bcd31de1f890..058654d6800f 100644
--- a/include/net/devlink.h
+++ b/include/net/devlink.h
@@ -1622,6 +1622,7 @@ int devl_trylock(struct devlink *devlink);
 void devl_unlock(struct devlink *devlink);
 void devl_assert_locked(struct devlink *devlink);
 bool devl_lock_is_held(struct devlink *devlink);
+int devl_apply_defaults(struct devlink *devlink);
 DEFINE_GUARD(devl, struct devlink *, devl_lock(_T), devl_unlock(_T));
 
 struct ib_device;
diff --git a/net/devlink/core.c b/net/devlink/core.c
index eeb6a71f5f56..2cfd50f9393b 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -4,6 +4,10 @@
  * Copyright (c) 2016 Jiri Pirko <jiri@mellanox.com>
  */
 
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/slab.h>
+#include <linux/string.h>
 #include <net/genetlink.h>
 #define CREATE_TRACE_POINTS
 #include <trace/events/devlink.h>
@@ -16,6 +20,247 @@ EXPORT_TRACEPOINT_SYMBOL_GPL(devlink_trap_report);
 
 DEFINE_XARRAY_FLAGS(devlinks, XA_FLAGS_ALLOC);
 
+static char *devlink_default;
+static bool devlink_default_match_all;
+static enum devlink_eswitch_mode devlink_default_eswitch_mode;
+static LIST_HEAD(devlink_default_nodes);
+
+struct devlink_default_node {
+	struct list_head list;
+	char *bus_name;
+	char *dev_name;
+};
+
+static int __init
+devlink_default_esw_mode_to_value(const char *str,
+				  enum devlink_eswitch_mode *mode)
+{
+	if (!strcmp(str, "legacy")) {
+		*mode = DEVLINK_ESWITCH_MODE_LEGACY;
+		return 0;
+	}
+	if (!strcmp(str, "switchdev")) {
+		*mode = DEVLINK_ESWITCH_MODE_SWITCHDEV;
+		return 0;
+	}
+	if (!strcmp(str, "switchdev_inactive")) {
+		*mode = DEVLINK_ESWITCH_MODE_SWITCHDEV_INACTIVE;
+		return 0;
+	}
+
+	return -EINVAL;
+}
+
+static int __init
+devlink_default_cmd_parse(char *str)
+{
+	char *cmd;
+	char *attr;
+	char *mode;
+
+	cmd = strsep(&str, ":");
+	attr = strsep(&str, ":");
+	mode = strsep(&str, ":");
+	if (!cmd || strcmp(cmd, "esw") || !attr || strcmp(attr, "mode") ||
+	    !mode || !*mode || str)
+		return -EINVAL;
+
+	return devlink_default_esw_mode_to_value(mode,
+						 &devlink_default_eswitch_mode);
+}
+
+static int devlink_default_eswitch_apply(struct devlink *devlink)
+{
+	const struct devlink_ops *ops = devlink->ops;
+
+	if (!ops->eswitch_mode_set)
+		return -EOPNOTSUPP;
+
+	return ops->eswitch_mode_set(devlink, devlink_default_eswitch_mode,
+				     NULL);
+}
+
+static int __init
+devlink_default_handle_parse(char *handle, char **bus_name, char **dev_name)
+{
+	char *slash;
+	char *p;
+
+	if (!handle || !*handle)
+		return -EINVAL;
+
+	for (p = handle; *p; p++) {
+		if (*p == '[' || *p == ']' || *p == '*')
+			return -EINVAL;
+	}
+
+	slash = strchr(handle, '/');
+	if (!slash || slash == handle || !slash[1])
+		return -EINVAL;
+	if (strchr(slash + 1, '/'))
+		return -EINVAL;
+
+	*slash = '\0';
+	if (strchr(handle, ':'))
+		return -EINVAL;
+
+	*bus_name = handle;
+	*dev_name = slash + 1;
+	return 0;
+}
+
+static struct devlink_default_node *
+devlink_default_node_find(const char *bus_name, const char *dev_name)
+{
+	struct devlink_default_node *node;
+
+	list_for_each_entry(node, &devlink_default_nodes, list) {
+		if (!strcmp(node->bus_name, bus_name) &&
+		    !strcmp(node->dev_name, dev_name))
+			return node;
+	}
+
+	return NULL;
+}
+
+static int __init
+devlink_default_node_add(const char *bus_name, const char *dev_name)
+{
+	struct devlink_default_node *node;
+
+	if (devlink_default_node_find(bus_name, dev_name))
+		return -EEXIST;
+
+	node = kzalloc_obj(*node);
+	if (!node)
+		return -ENOMEM;
+
+	INIT_LIST_HEAD(&node->list);
+	node->bus_name = kstrdup(bus_name, GFP_KERNEL);
+	node->dev_name = kstrdup(dev_name, GFP_KERNEL);
+	if (!node->bus_name || !node->dev_name) {
+		kfree(node->bus_name);
+		kfree(node->dev_name);
+		kfree(node);
+		return -ENOMEM;
+	}
+
+	list_add_tail(&node->list, &devlink_default_nodes);
+	return 0;
+}
+
+static int __init devlink_default_handles_parse(char *handles)
+{
+	char *handle;
+	int err;
+
+	if (!strcmp(handles, "*")) {
+		devlink_default_match_all = true;
+		return 0;
+	}
+
+	while ((handle = strsep(&handles, ",")) != NULL) {
+		char *bus_name;
+		char *dev_name;
+
+		err = devlink_default_handle_parse(handle, &bus_name,
+						   &dev_name);
+		if (err)
+			return err;
+
+		err = devlink_default_node_add(bus_name, dev_name);
+		if (err)
+			return err;
+	}
+
+	return 0;
+}
+
+static void __init devlink_default_node_free(struct devlink_default_node *node)
+{
+	kfree(node->bus_name);
+	kfree(node->dev_name);
+	kfree(node);
+}
+
+static void __init devlink_default_nodes_clear(void)
+{
+	struct devlink_default_node *node;
+	struct devlink_default_node *node_tmp;
+
+	list_for_each_entry_safe(node, node_tmp, &devlink_default_nodes, list) {
+		list_del(&node->list);
+		devlink_default_node_free(node);
+	}
+
+	devlink_default_match_all = false;
+}
+
+static int __init devlink_default_parse(char *str)
+{
+	char *handles_end;
+	char *handles;
+	char *cmd;
+	int err;
+
+	if (!str || *str != '[')
+		return -EINVAL;
+
+	handles = str + 1;
+	handles_end = strchr(handles, ']');
+	if (!handles_end || handles_end[1] != ':' || !handles_end[2])
+		return -EINVAL;
+
+	*handles_end = '\0';
+	cmd = handles_end + 2;
+	if (!*handles)
+		return -EINVAL;
+
+	err = devlink_default_cmd_parse(cmd);
+	if (err)
+		return err;
+
+	err = devlink_default_handles_parse(handles);
+	if (err)
+		devlink_default_nodes_clear();
+
+	return err;
+}
+
+/**
+ * devl_apply_defaults - Apply defaults matching the devlink instance
+ * @devlink: devlink
+ *
+ * The caller must hold the devlink instance lock.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+int devl_apply_defaults(struct devlink *devlink)
+{
+	const char *bus_name = devlink_bus_name(devlink);
+	const char *dev_name = devlink_dev_name(devlink);
+	struct devlink_default_node *node;
+
+	devl_assert_locked(devlink);
+
+	if (devlink_default_match_all)
+		return devlink_default_eswitch_apply(devlink);
+
+	node = devlink_default_node_find(bus_name, dev_name);
+	if (node)
+		return devlink_default_eswitch_apply(devlink);
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(devl_apply_defaults);
+
+static int __init devlink_default_setup(char *str)
+{
+	devlink_default = str;
+	return 1;
+}
+__setup("devlink=", devlink_default_setup);
+
 static struct devlink *devlinks_xa_get(unsigned long index)
 {
 	struct devlink *devlink;
@@ -578,6 +823,27 @@ static int __init devlink_init(void)
 {
 	int err;
 
+	if (devlink_default) {
+		char *def;
+
+		def = kstrdup(devlink_default, GFP_KERNEL);
+		if (!def) {
+			err = -ENOMEM;
+			goto out;
+		}
+		err = devlink_default_parse(def);
+		kfree(def);
+		if (err == -EEXIST) {
+			devlink_default = NULL;
+			pr_warn("devlink: duplicate handle, devlink= default ignored\n");
+		} else if (err == -EINVAL) {
+			devlink_default = NULL;
+			pr_warn("devlink: invalid command line parameter ignored\n");
+		} else if (err) {
+			goto out;
+		}
+	}
+
 	err = register_pernet_subsys(&devlink_pernet_ops);
 	if (err)
 		goto out;
@@ -593,7 +859,10 @@ static int __init devlink_init(void)
 out_unreg_pernet_subsys:
 	unregister_pernet_subsys(&devlink_pernet_ops);
 out:
+	if (err)
+		devlink_default_nodes_clear();
 	WARN_ON(err);
+
 	return err;
 }
 
-- 
2.34.1



* [RFC V2 net-next 2/2] net/mlx5: Apply devlink boot defaults during init
  2026-05-10 18:54 [RFC V2 net-next 0/2] devlink: Add boot-time defaults Mark Bloch
  2026-05-10 18:54 ` [RFC V2 net-next 1/2] devlink: Add eswitch mode boot defaults Mark Bloch
@ 2026-05-10 18:54 ` Mark Bloch
  1 sibling, 0 replies; 3+ messages in thread
From: Mark Bloch @ 2026-05-10 18:54 UTC
  To: Eric Dumazet, Jakub Kicinski, Paolo Abeni, Andrew Lunn,
	David S. Miller, Jiri Pirko
  Cc: Jonathan Corbet, Shuah Khan, Simon Horman, Saeed Mahameed,
	Leon Romanovsky, Tariq Toukan, Mark Bloch, Andrew Morton,
	Borislav Petkov (AMD), Randy Dunlap, Petr Mladek,
	Peter Zijlstra (Intel), Christian Brauner, Thomas Gleixner,
	Dapeng Mi, Kees Cook, Marco Elver, Li RongQing, Eric Biggers,
	Paul E. McKenney, linux-doc, linux-kernel, netdev, linux-rdma

Apply devlink boot defaults for mlx5 devices after successful device
initialization while holding the devlink instance lock.

At this point the devlink instance is registered and the mlx5 devlink
operations are available, so eswitch mode defaults can be applied to
the matching PCI devlink handle.
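
The matching decision made when a driver calls devl_apply_defaults() can
be modeled in userspace C as below. This is a simplified sketch: struct
boot_default and default_matches() are hypothetical names, and an array
replaces the linked list of parsed handles used in patch 1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

struct handle {
	const char *bus;
	const char *dev;
};

/* Hypothetical parsed state for devlink=[<selector>]:esw:mode:<mode>. */
struct boot_default {
	bool match_all;			/* selector was "*" */
	const struct handle *handles;	/* explicit handle list otherwise */
	size_t n_handles;
};

/* Returns true when the default should be applied to bus/dev. */
static bool default_matches(const struct boot_default *def,
			    const char *bus, const char *dev)
{
	size_t i;

	if (def->match_all)
		return true;

	for (i = 0; i < def->n_handles; i++) {
		if (!strcmp(def->handles[i].bus, bus) &&
		    !strcmp(def->handles[i].dev, dev))
			return true;
	}
	return false;
}
```

With no devlink= parameter the state is empty (match_all false, no
handles), so no device matches and the call is a no-op.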

Signed-off-by: Mark Bloch <mbloch@nvidia.com>
---
 drivers/net/ethernet/mellanox/mlx5/core/main.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c b/drivers/net/ethernet/mellanox/mlx5/core/main.c
index 296c5223cf61..deea7150084f 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
@@ -1470,6 +1470,8 @@ int mlx5_init_one(struct mlx5_core_dev *dev)
 	err = mlx5_init_one_devl_locked(dev);
 	if (err)
 		devl_unregister(devlink);
+	else
+		devl_apply_defaults(devlink);
 unlock:
 	devl_unlock(devlink);
 	return err;
-- 
2.34.1


