Linux RDMA and InfiniBand development
 help / color / mirror / Atom feed
From: Jiri Pirko <jiri@resnulli.us>
To: Mark Bloch <mbloch@nvidia.com>
Cc: Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>,
	 Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	 "David S. Miller" <davem@davemloft.net>,
	Jonathan Corbet <corbet@lwn.net>,
	 Shuah Khan <skhan@linuxfoundation.org>,
	Simon Horman <horms@kernel.org>,
	 Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	 Tariq Toukan <tariqt@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	 "Borislav Petkov (AMD)" <bp@alien8.de>,
	Randy Dunlap <rdunlap@infradead.org>,
	 Dave Hansen <dave.hansen@linux.intel.com>,
	Christian Brauner <brauner@kernel.org>,
	 Petr Mladek <pmladek@suse.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	 Thomas Gleixner <tglx@kernel.org>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	 Dapeng Mi <dapeng1.mi@linux.intel.com>,
	Kees Cook <kees@kernel.org>, Marco Elver <elver@google.com>,
	 Eric Biggers <ebiggers@kernel.org>,
	Li RongQing <lirongqing@baidu.com>,
	 "Paul E. McKenney" <paulmck@kernel.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	 netdev@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [RFC net-next 0/4] devlink: Add boot-time defaults
Date: Thu, 7 May 2026 13:03:08 +0200	[thread overview]
Message-ID: <afxvzOjqw-vxUAED@FV6GYCPJ69> (raw)
In-Reply-To: <3f9215c4-7c84-46d9-ba74-30dabe24db09@nvidia.com>

Wed, May 06, 2026 at 07:35:10PM +0200, mbloch@nvidia.com wrote:
>
>
>On 06/05/2026 18:22, Jiri Pirko wrote:
>> Wed, May 06, 2026 at 02:37:35PM +0200, mbloch@nvidia.com wrote:
>>> This series adds a devlink= kernel command line parameter for applying
>>> selected devlink settings during device initialization.
>>>
>>> Following a discussion with Jakub[1], I am sending this RFC to get the
>>> conversation moving. I started from Jakub's example/request and extended
>>> it to cover requirements from production systems and configurations that
>>> customers use.
>>>
>>> One important caveat is that the parsing logic in this RFC was written
>>> with AI assistance. I am also not sure whether the resulting syntax and
>>> parser are too complex for a kernel command line interface. This is part
>>> of why I am sending it as an RFC: to understand what direction and level
>>> of complexity would be acceptable to people.
>>>
>>> The implementation is intended to support the following properties:
>>>
>>> - A system may have multiple devlink devices that usually need the same
>>>  configuration. For a configuration such as eswitch mode switchdev, a
>>>  user should be able to specify multiple devices to which that
>>>  configuration applies.
>>>
>>> - There may be ordering dependencies between options. For example, in
>>>  mlx5, flow_steering_mode should be set before moving to switchdev.
>>>  With this in mind, defaults are applied per device in the left-to-right
>>>  order in which they appear on the command line.
>>>
>>> The intent is to let deployments set devlink defaults before normal
>>> userspace orchestration runs, while still using devlink concepts and
>> 
>> "defaults before normal userspace orchestrarion". I read it as config
>> before config, which eventually could be skipped.
>> 
>> 
>>> driver callbacks rather than adding driver-specific module parameters.
>>> A default is scoped to one or more devlink handles, for example:
>>>
>>>  devlink=[pci/0000:08:00.0]:esw:mode:switchdev
>>>  devlink=[pci/0000:08:00.0]:param:flow_steering_mode:smfs
>>>  devlink=[pci/0000:08:00.0,pci/0000:08:00.1]:param:flow_steering_mode:hmfs,[pci/0000:08:00.0,pci/0000:08:00.1]:esw:mode:switchdev
>> 
>> I don't like this. What you do, you are basically introducing user
>> configuration tool on kernel cmdline.
>> 
>> The same you would achieve with a proper userspace tool/daemon.
>> I did try to come up with it and push it here:
>> https://github.com/systemd/systemd/pull/37393
>> That didn't get merged for unknown reason, but the idea is sound. You
>> provide configuration files for devlink object and systemd-devlinkd
>> will apply when they appear. Wouldn't this help your case?
>
>I agree that systemd-devlinkd is the right shape for normal
>devlink configuration, and it could probably replace the udev/devlink
>plumbing we use today.
>
>The case I am trying to cover is earlier than that.
>
>On BlueField/ECPF/DPU systems, the host PF driver cannot always finish
>probing independently of the ECPF side. When the ECPF is the eswitch
>manager, the host PF is kept in initializing state until the ECPF eswitch
>side is set up and mlx5 enables the external host PF HCA. That happens as
>part of moving the ECPF to switchdev.
>
>Today userspace observes the ECPF instance and then switches the
>mode through devlink, usually via udev or similar plumbing. That still
>leaves a window where the ECPF has probed, userspace has not applied the
>mode yet, and the host PF is waiting. With many ECPFs this becomes visible
>in host PF probe/boot time. A daemon reacting to the devlink object
>appearing can make the userspace side cleaner, but it still runs after the
>device has appeared and after userspace scheduling/uevent handling.
>
>Long term, for these DPU deployments, we would like mlx5 to initialize
>directly in switchdev. I am hesitant to make that unconditional because it
>changes existing behavior and there is no early opt-out before probe. The
>cmdline parameter was meant as an explicit opt-in middle step: ask the
>driver to apply the same devlink operation during init, before this path
>depends on userspace.
>
>We previously tried to address this with an mlx5 module parameter. By
>design, that was too coarse: it applied to all mlx5 devices handled by the
>module. That makes it usable only for narrow DPU-only configurations. The
>devlink-handle based cmdline syntax was intended to keep the opt-in scoped
>to the specific devices that need this early switchdev transition.

The switchdev mode was introduced at roughly the time CX4 was out. What
stopped us from making it default for CX4+ ?

Introducing this horrible plumbing only bacause we were not able to
change the default sounds so absurd.

Can we write the default mode as a bit in ASIC NV memory perhaps? Simple
devlink cmode permanent param to write it, the driver can read this bit
during init to decide the init flow path?

  reply	other threads:[~2026-05-07 11:03 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-05-06 12:37 [RFC net-next 0/4] devlink: Add boot-time defaults Mark Bloch
2026-05-06 12:37 ` [RFC net-next 1/4] devlink: Add infrastructure for " Mark Bloch
2026-05-06 12:37 ` [RFC net-next 2/4] devlink: Add eswitch mode boot default Mark Bloch
2026-05-06 12:37 ` [RFC net-next 3/4] devlink: Add runtime parameter boot defaults Mark Bloch
2026-05-06 12:37 ` [RFC net-next 4/4] net/mlx5: Apply devlink boot defaults during init Mark Bloch
2026-05-06 15:22 ` [RFC net-next 0/4] devlink: Add boot-time defaults Jiri Pirko
2026-05-06 17:35   ` Mark Bloch
2026-05-07 11:03     ` Jiri Pirko [this message]
2026-05-08 17:59       ` Mark Bloch
2026-05-08 18:07         ` Jiri Pirko
2026-05-09  0:52           ` Jakub Kicinski
2026-05-09  7:01             ` Jiri Pirko
2026-05-10 12:31               ` Mark Bloch
2026-05-11  8:07                 ` Jiri Pirko
2026-05-10 16:37               ` Jakub Kicinski
2026-05-11  8:42                 ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=afxvzOjqw-vxUAED@FV6GYCPJ69 \
    --to=jiri@resnulli.us \
    --cc=akpm@linux-foundation.org \
    --cc=andrew+netdev@lunn.ch \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=corbet@lwn.net \
    --cc=dapeng1.mi@linux.intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=davem@davemloft.net \
    --cc=ebiggers@kernel.org \
    --cc=edumazet@google.com \
    --cc=elver@google.com \
    --cc=horms@kernel.org \
    --cc=kees@kernel.org \
    --cc=kuba@kernel.org \
    --cc=leon@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=lirongqing@baidu.com \
    --cc=mbloch@nvidia.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=paulmck@kernel.org \
    --cc=pawan.kumar.gupta@linux.intel.com \
    --cc=peterz@infradead.org \
    --cc=pmladek@suse.com \
    --cc=rdunlap@infradead.org \
    --cc=saeedm@nvidia.com \
    --cc=skhan@linuxfoundation.org \
    --cc=tariqt@nvidia.com \
    --cc=tglx@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox