Netdev List
From: Jakub Kicinski <kuba@kernel.org>
To: Jiri Pirko <jiri@resnulli.us>
Cc: Mark Bloch <mbloch@nvidia.com>,
	Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Simon Horman <horms@kernel.org>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Tariq Toukan <tariqt@nvidia.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Borislav Petkov (AMD)" <bp@alien8.de>,
	Randy Dunlap <rdunlap@infradead.org>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	Christian Brauner <brauner@kernel.org>,
	Petr Mladek <pmladek@suse.com>,
	"Peter Zijlstra (Intel)" <peterz@infradead.org>,
	Thomas Gleixner <tglx@kernel.org>,
	Pawan Gupta <pawan.kumar.gupta@linux.intel.com>,
	Dapeng Mi <dapeng1.mi@linux.intel.com>,
	Kees Cook <kees@kernel.org>, Marco Elver <elver@google.com>,
	Eric Biggers <ebiggers@kernel.org>,
	Li RongQing <lirongqing@baidu.com>,
	"Paul E. McKenney" <paulmck@kernel.org>,
	linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org,
	netdev@vger.kernel.org, linux-rdma@vger.kernel.org
Subject: Re: [RFC net-next 0/4] devlink: Add boot-time defaults
Date: Fri, 8 May 2026 17:52:13 -0700
Message-ID: <20260508175213.1952097f@kernel.org>
In-Reply-To: <af4lBIJdCuN5VKq_@FV6GYCPJ69>

On Fri, 8 May 2026 20:07:44 +0200 Jiri Pirko wrote:
> >I don't think switchdev by default should mean CX4+ in general. If we get
> >there, I would expect it to be limited to the DPU/BlueField/ECPF case, where
> >the host PF probe path can depend on the ECPF reaching switchdev. Changing the
> >default for regular host NIC deployments feels like a much larger compatibility
> >change.  
> 
> We can't travel through time, but if from CX5 onwards the default would
> be switchdev, nobody would feel broken in terms of compatibility. That
> is my point. Having "legacy" as default is simply wrong for newer NIC
> generations. That is why it is called "legacy" and it should have been
> rotten through and out since CX4 times.

legacy vs switchdev only describes the eswitch configuration.
As a non-SR-IOV user I really don't want to see the extra representors
hanging around my systems, confusing all daemons. IIRC mlx5 had some
limitations around the uplink representor. Maybe that's the disconnect.
But for real, fully featured switchdev eswitches, having the
PHY and PF representors on boot, always, will not make sense.

IOW it's not a question of the generation of the card but of
the deployment type / use case.

> >For the ASIC/NV bit: maybe technically possible, but it feels like the wrong
> >layer. This is boot/deployment policy, not a persistent hardware property, and
> >storing it in NV memory would make the state persist across kernels/hosts in a
> >surprising way.  
> 
> Well, as any other nv config, it persists across kernels/hosts. Think
> about it as "unbreak-my-not-legacy-device" bit.

For most devices the switchdev mode does not change anything
substantial about the device. It's purely a kernel / driver config. 
It changes what objects and default rules kernel / driver installs. 
So I don't get why it would make sense to flash a Linux SW stack
specific config into the device nvmem.

> >I do agree the RFC probably went too far by making a generic devlink cmdline
> >configuration language. Maybe the smaller thing to discuss is only:
> >
> >devlink=[pci/...]:esw:mode:{legacy|switchdev|switchdev_inactive}
> >
> >No runtime params, no ordering between different operations, just early eswitch
> >mode for explicitly selected handles.  

Yes, let's cut this down, AI went too far :) As I said, we should just
document how we envision the format growing, but for now we can literally
implement just the global "esw mode".

One note on the formatting, you mentioned:

  devlink=[pci/0000:08:00.0,pci/0000:08:00.1]:param:flow_steering_mode:hmfs,[pci/0000:08:00.0,pci/0000:08:00.1]:esw:mode:switchdev

TBH when I used the square brackets I meant that the field is optional.
But I guess you used them like we use them for IPv6 addresses to
separate the : signs, makes sense.
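For illustration only, a bracket-aware split of the value could look
like the minimal sketch below. It assumes entries are separated by
top-level commas and that any commas and colons inside [...] belong to
the handle list. next_entry_len() and count_entries() are hypothetical
helpers, not existing devlink code.

```c
/*
 * Hypothetical helper: length of the next top-level entry in a devlink=
 * value. Commas inside [...] separate handles, not entries, so they are
 * skipped. Returns -1 on unbalanced brackets.
 */
static long next_entry_len(const char *s)
{
	int depth = 0;
	const char *p;

	for (p = s; *p; p++) {
		if (*p == '[') {
			depth++;
		} else if (*p == ']') {
			if (--depth < 0)
				return -1;	/* stray ']' */
		} else if (*p == ',' && depth == 0) {
			break;			/* end of this entry */
		}
	}
	return depth == 0 ? (long)(p - s) : -1;
}

/* Hypothetical: count top-level entries, -1 if the value is malformed. */
int count_entries(const char *s)
{
	int n = 0;

	while (*s) {
		long len = next_entry_len(s);

		if (len <= 0)
			return -1;	/* empty entry or bad brackets */
		n++;
		s += len;
		if (*s == ',') {
			s++;
			if (*s == '\0')
				return -1;	/* trailing comma */
		}
	}
	return n;
}
```

With the two-entry example quoted above, count_entries() returns 2; an
unbalanced "[" or a trailing comma is rejected.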

Since AFAIU we only care about the global default, should we focus on
supporting:

 devlink=*:esw:mode:switchdev

meaning all devices default to switchdev?
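A rough sketch of the esw-only form, assuming the selector is either
"*" (all devices) or a single bracketed handle, and using the mode
strings from Mark's proposal. struct esw_default, parse_esw_default()
and esw_default_for() are made-up names for illustration, not an
existing kernel API.

```c
#include <stdbool.h>
#include <string.h>

enum esw_mode {
	ESW_MODE_LEGACY,
	ESW_MODE_SWITCHDEV,
	ESW_MODE_SWITCHDEV_INACTIVE,
};

/* Hypothetical parsed form of one "<selector>:esw:mode:<mode>" entry. */
struct esw_default {
	bool all;		/* selector was "*" */
	char handle[64];	/* valid when !all, e.g. "pci/0000:08:00.0" */
	enum esw_mode mode;
};

static int parse_mode(const char *s, enum esw_mode *mode)
{
	if (!strcmp(s, "legacy"))
		*mode = ESW_MODE_LEGACY;
	else if (!strcmp(s, "switchdev"))
		*mode = ESW_MODE_SWITCHDEV;
	else if (!strcmp(s, "switchdev_inactive"))
		*mode = ESW_MODE_SWITCHDEV_INACTIVE;
	else
		return -1;
	return 0;
}

/* Returns 0 on success, -1 on malformed input. */
int parse_esw_default(const char *s, struct esw_default *out)
{
	static const char key[] = ":esw:mode:";
	const char *rest;

	if (s[0] == '*') {
		out->all = true;
		out->handle[0] = '\0';
		rest = s + 1;
	} else if (s[0] == '[') {
		/* brackets keep the ':' in the PCI address out of the way */
		const char *close = strchr(s, ']');
		size_t len;

		if (!close)
			return -1;
		len = (size_t)(close - s - 1);
		if (len == 0 || len >= sizeof(out->handle))
			return -1;
		memcpy(out->handle, s + 1, len);
		out->handle[len] = '\0';
		out->all = false;
		rest = close + 1;
	} else {
		return -1;
	}
	if (strncmp(rest, key, sizeof(key) - 1))
		return -1;
	return parse_mode(rest + sizeof(key) - 1, &out->mode);
}

/* True (and *mode set) if the default applies to dev_handle. */
bool esw_default_for(const struct esw_default *d, const char *dev_handle,
		     enum esw_mode *mode)
{
	if (!d->all && strcmp(d->handle, dev_handle))
		return false;
	*mode = d->mode;
	return true;
}
```

A driver would then call something like esw_default_for() with its own
devlink handle during init and keep its built-in default when it
returns false.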

> FWIW, I'm still against this.

One more option, tho IDK if it actually is good enough for Mark,
would be to let user space "pause" devlink probing. So that the
systemd daemon can configure the device before it populates all
the netdev stuff. Basically make the devices probe into the reload_down
state, until user space configures them. IDK how much of the time
is spent building and tearing down the legacy mode on mlx5, but
the thinking is that we'd at least save that wasted effort.
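To make the saving concrete, a toy model (plain C, no kernel APIs),
assuming a paused probe simply leaves the device parked, standing in
for the reload_down state, until user space releases it.
probe_paused, struct sim_dev, sim_probe(), sim_set_mode() and
sim_release() are all hypothetical. The counters track how many times
netdevs/representors would be built and torn down.

```c
#include <stdbool.h>

/* Hypothetical states; DEV_PARKED stands in for probing into reload_down. */
enum dev_state { DEV_PARKED, DEV_UP };
enum esw_mode { ESW_MODE_LEGACY, ESW_MODE_SWITCHDEV };

struct sim_dev {
	enum dev_state state;
	enum esw_mode mode;
	int netdev_inits;	/* times netdevs/representors were built */
	int teardowns;		/* times they were torn down */
};

/* Hypothetical knob user space would set before devices probe. */
static bool probe_paused = true;

static void bring_up(struct sim_dev *d)
{
	d->netdev_inits++;
	d->state = DEV_UP;
}

void sim_probe(struct sim_dev *d)
{
	d->mode = ESW_MODE_LEGACY;	/* driver's built-in default */
	d->netdev_inits = 0;
	d->teardowns = 0;
	if (probe_paused) {
		d->state = DEV_PARKED;	/* nothing populated yet */
		return;
	}
	bring_up(d);
}

/* User space changes the mode; only an UP device pays for a rebuild. */
void sim_set_mode(struct sim_dev *d, enum esw_mode mode)
{
	if (d->state == DEV_UP && d->mode != mode) {
		d->teardowns++;
		d->mode = mode;
		bring_up(d);
		return;
	}
	d->mode = mode;
}

/* User space is done configuring: finish the deferred bring-up. */
void sim_release(struct sim_dev *d)
{
	if (d->state == DEV_PARKED)
		bring_up(d);
}
```

In this model, switching a freshly probed device to switchdev without
the pause costs one extra build and teardown of legacy mode; with the
pause, the device is brought up exactly once, already in the requested
mode.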


Thread overview: 18+ messages
2026-05-06 12:37 [RFC net-next 0/4] devlink: Add boot-time defaults Mark Bloch
2026-05-06 12:37 ` [RFC net-next 1/4] devlink: Add infrastructure for " Mark Bloch
2026-05-06 12:37 ` [RFC net-next 2/4] devlink: Add eswitch mode boot default Mark Bloch
2026-05-06 12:37 ` [RFC net-next 3/4] devlink: Add runtime parameter boot defaults Mark Bloch
2026-05-06 12:37 ` [RFC net-next 4/4] net/mlx5: Apply devlink boot defaults during init Mark Bloch
2026-05-06 15:22 ` [RFC net-next 0/4] devlink: Add boot-time defaults Jiri Pirko
2026-05-06 17:35   ` Mark Bloch
2026-05-07 11:03     ` Jiri Pirko
2026-05-08 17:59       ` Mark Bloch
2026-05-08 18:07         ` Jiri Pirko
2026-05-09  0:52           ` Jakub Kicinski [this message]
2026-05-09  7:01             ` Jiri Pirko
2026-05-10 12:31               ` Mark Bloch
2026-05-11  8:07                 ` Jiri Pirko
2026-05-11 18:21                 ` Parav Pandit
2026-05-10 16:37               ` Jakub Kicinski
2026-05-11  8:42                 ` Jiri Pirko
2026-05-11 23:41                   ` Jakub Kicinski
