From: Mark Bloch <mbloch@nvidia.com>
To: Jiri Pirko <jiri@resnulli.us>
Cc: Eric Dumazet <edumazet@google.com>,
    Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
    Andrew Lunn <andrew+netdev@lunn.ch>,
    "David S. Miller" <davem@davemloft.net>,
    Jonathan Corbet <corbet@lwn.net>,
    Shuah Khan <skhan@linuxfoundation.org>,
    Simon Horman <horms@kernel.org>,
    Saeed Mahameed <saeedm@nvidia.com>,
    Leon Romanovsky <leon@kernel.org>,
    Tariq Toukan <tariqt@nvidia.com>,
    Andrew Morton <akpm@linux-foundation.org>,
    "Borislav Petkov (AMD)" <bp@alien8.de>,
    Randy Dunlap <rdunlap@infradead.org>,
    Dave Hansen <dave.hansen@linux.intel.com>,
    Christian Brauner <brauner@kernel.org>,
    Petr Mladek <pmladek@suse.com>,
    "Peter Zijlstra (Intel)" <peterz@infradead.org>,
    Thomas Gleixner <tglx@kernel.org>,
    Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
    Dapeng Mi <dapeng1.mi@linux.intel.com>,
    Kees Cook <kees@kernel.org>, Marco Elver <elver@google.com>,
    Eric Biggers <ebiggers@kernel.org>,
    Li RongQing <lirongqing@baidu.com>,
    "Paul E. McKenney" <paulmck@kernel.org>,
    linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
    netdev@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [RFC net-next 0/4] devlink: Add boot-time defaults
Date: Wed, 6 May 2026 20:35:10 +0300
Message-ID: <3f9215c4-7c84-46d9-ba74-30dabe24db09@nvidia.com>
In-Reply-To: <aftaW-irGmkfA7FS@FV6GYCPJ69>

On 06/05/2026 18:22, Jiri Pirko wrote:
> Wed, May 06, 2026 at 02:37:35PM +0200, mbloch@nvidia.com wrote:
>> This series adds a devlink= kernel command line parameter for applying
>> selected devlink settings during device initialization.
>>
>> Following a discussion with Jakub[1], I am sending this RFC to get the
>> conversation moving. I started from Jakub's example/request and extended
>> it to cover requirements from production systems and configurations that
>> customers use.
>>
>> One important caveat is that the parsing logic in this RFC was written
>> with AI assistance. I am also not sure whether the resulting syntax and
>> parser are too complex for a kernel command line interface. This is part
>> of why I am sending it as an RFC: to understand what direction and level
>> of complexity would be acceptable to people.
>>
>> The implementation is intended to support the following properties:
>>
>> - A system may have multiple devlink devices that usually need the same
>> configuration. For a configuration such as eswitch mode switchdev, a
>> user should be able to specify multiple devices to which that
>> configuration applies.
>>
>> - There may be ordering dependencies between options. For example, in
>> mlx5, flow_steering_mode should be set before moving to switchdev.
>> With this in mind, defaults are applied per device in the left-to-right
>> order in which they appear on the command line.
>>
>> The intent is to let deployments set devlink defaults before normal
>> userspace orchestration runs, while still using devlink concepts and
>
> "defaults before normal userspace orchestrarion". I read it as config
> before config, which eventually could be skipped.
>
>
>> driver callbacks rather than adding driver-specific module parameters.
>> A default is scoped to one or more devlink handles, for example:
>>
>> devlink=[pci/0000:08:00.0]:esw:mode:switchdev
>> devlink=[pci/0000:08:00.0]:param:flow_steering_mode:smfs
>> devlink=[pci/0000:08:00.0,pci/0000:08:00.1]:param:flow_steering_mode:hmfs,[pci/0000:08:00.0,pci/0000:08:00.1]:esw:mode:switchdev
>
> I don't like this. What you do, you are basically introducing user
> configuration tool on kernel cmdline.
>
> The same you would achieve with a proper userspace tool/daemon.
> I did try to come up with it and push it here:
> https://github.com/systemd/systemd/pull/37393
> That didn't get merged for unknown reason, but the idea is sound. You
> provide configuration files for devlink object and systemd-devlinkd
> will apply when they appear. Wouldn't this help your case?

I agree that systemd-devlinkd is the right shape for normal devlink
configuration, and it could probably replace the udev/devlink plumbing
we use today.

The case I am trying to cover is earlier than that.

On BlueField/ECPF/DPU systems, the host PF driver cannot always finish
probing independently of the ECPF side. When the ECPF is the eswitch
manager, the host PF is kept in the initializing state until the ECPF
eswitch side is set up and mlx5 enables the external host PF HCA. That
happens as part of moving the ECPF to switchdev.

Today, userspace observes the ECPF instance and then switches the mode
through devlink, usually via udev or similar plumbing. That still leaves
a window where the ECPF has probed, userspace has not applied the mode
yet, and the host PF is waiting. With many ECPFs, this becomes visible in
host PF probe/boot time. A daemon reacting to the devlink object
appearing can make the userspace side cleaner, but it still runs after
the device has appeared and after userspace scheduling/uevent handling.

Long term, for these DPU deployments, we would like mlx5 to initialize
directly in switchdev. I am hesitant to make that unconditional because
it changes existing behavior and there is no early opt-out before probe.
The cmdline parameter was meant as an explicit opt-in middle step: ask
the driver to apply the same devlink operation during init, before this
path depends on userspace.

We previously tried to address this with an mlx5 module parameter. By
design, that was too coarse: it applied to all mlx5 devices handled by
the module. That makes it usable only for narrow DPU-only
configurations. The devlink-handle-based cmdline syntax was intended to
keep the opt-in scoped to the specific devices that need this early
switchdev transition.
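
To make the per-device scoping and left-to-right ordering concrete, here
is a rough userspace sketch in Python. It is not the parser from the RFC
patches. The grammar is only inferred from the three devlink= examples
quoted from the cover letter, and the names parse_devlink_defaults and
ENTRY_RE are hypothetical.

```python
import re

# Assumed grammar, inferred from the cover letter examples only:
#   [handle(,handle)*]:kind:name:value
# with entries separated by commas. Handles may contain ':' (PCI
# addresses), so the handle list is delimited by the brackets.
ENTRY_RE = re.compile(
    r"\[([^\]]+)\]:([^:,\[\]]+):([^:,\[\]]+):([^:,\[\]]+)"
)


def parse_devlink_defaults(arg):
    """Return {handle: [(kind, name, value), ...]} in command-line order."""
    per_dev = {}
    pos = 0
    while pos < len(arg):
        m = ENTRY_RE.match(arg, pos)
        if m is None:
            raise ValueError("malformed entry at offset %d" % pos)
        handles, kind, name, value = m.groups()
        # One entry may name several devices; each gets its own ordered
        # list, so unrelated devices are never touched.
        for handle in handles.split(","):
            per_dev.setdefault(handle, []).append((kind, name, value))
        pos = m.end()
        if pos < len(arg):
            if arg[pos] != ",":
                raise ValueError("expected ',' at offset %d" % pos)
            pos += 1
    return per_dev
```

Feeding it the third example from the cover letter yields, for each of
the two handles, the flow_steering_mode setting followed by the eswitch
mode change, while any device not named in the string has no entry at
all.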
Mark
>
> [..]