From: Mark Bloch <mbloch@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>,
Paolo Abeni <pabeni@redhat.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
"David S. Miller" <davem@davemloft.net>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
Jiri Pirko <jiri@resnulli.us>, Simon Horman <horms@kernel.org>,
Sunil Goutham <sgoutham@marvell.com>,
Linu Cherian <lcherian@marvell.com>,
Geetha sowjanya <gakula@marvell.com>,
hariprasad <hkelam@marvell.com>,
Subbaraya Sundeep <sbhatta@marvell.com>,
Bharat Bhushan <bbhushan2@marvell.com>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>,
Ethan Nelson-Moore <enelsonmoore@gmail.com>,
linux-doc@vger.kernel.org, netdev@vger.kernel.org,
linux-rdma@vger.kernel.org
Subject: Re: [PATCH net-next V3 2/7] netdevsim: Register devlink after device init
Date: Thu, 11 Jun 2026 20:43:22 +0300 [thread overview]
Message-ID: <f266dfa5-0c6c-4be0-b73e-b2185dadd6a7@nvidia.com> (raw)
In-Reply-To: <20260611085440.4fe36bf2@kernel.org>
On 11/06/2026 18:54, Jakub Kicinski wrote:
> On Thu, 11 Jun 2026 09:02:03 +0300 Mark Bloch wrote:
>> On 11/06/2026 2:50, Jakub Kicinski wrote:
>>> On Fri, 5 Jun 2026 21:10:25 +0300 Mark Bloch wrote:
>>>> devl_register() makes the devlink instance visible to userspace. A later
>>>> patch also makes registration the point where devlink core may call
>>>> eswitch_mode_set() to apply a boot-time default eswitch mode.
>>>>
>>>> Move netdevsim registration after all objects (resources, params, regions,
>>>> traps, debugfs etc) are initialized, and after the initial eswitch mode is
>>>> set to legacy.
>>>>
>>>> Move devl_unregister() to the beginning of nsim_drv_remove(), before those
>>>> devlink objects are torn down. This keeps devlink register/unregister as
>>>> the notification barrier and makes the later object teardown paths run
>>>> after devlink is no longer registered, so they do not emit their own
>>>> netlink DEL notifications.
>>>
>>> This is going backwards. At some point someone from nVidia thought that
>>> we can order our way out of locking, so mlx5 is likely ordered this way,
>>> but this must not be required, or in any way normalized.
>>> We (syzbot) quickly discovered that it doesn't cover all corner cases.
>>> devl_lock() is exposed specifically to allow the driver to finish
>>> whatever init it needs without letting user space invoke callbacks, yet.
>>> Almost (?) all driver callbacks hold devl_lock(), so maybe the devlink
>>> instance is "visible" to user space but that should not matter.
>>
>> Let me clarify.
>>
>> No locking is changed here, and I don't want to make register/unregister
>> ordering a substitute for devl_lock().
>>
>> The only requirement I have for this series is that devl_register() is called
>> only once the driver is ready for devlink core to call eswitch_mode_set().
>> That follows from the earlier direction to have the core apply the default
>> mode from devl_register() instead of adding an explicit driver call.
>
> This is exactly what I'm objecting to. AFAIU we are trading off
> explicit call to get the default value for an implicit behavior
> depending on order of calls. We want to optimize for how easy it
> is to get the API wrong, not for LoC.
Right, the reason I moved in this direction is that in v1 I had
the explicit driver call, and Jiri asked to make this transparent
from devlink core instead.
>
> If we don't have a clean way to implement this without driver
> changes let's add the explicit API to get the default value.
> If driver doesn't call it schedule a work to go via the callback
> once devl_lock() is dropped. That way drivers which care can optimize
> themselves by reading the default value upfront. Drivers which don't
> care will work correctly, and there's no API call order trap.
The workqueue fallback is possible, but I think it makes the semantics
more complicated.
We would need to track devlink instances which still need the default
applied, and the worker would have to skip/remove them once handled.
More importantly, the worker can race with userspace setting the
eswitch mode, so we would also need some state to tell whether the user
already changed the mode. That feels more fragile than an explicit
driver call.
>
> Not ideal, but isn't that best we can do here?
> I still have flashbacks of the fallout from the call ordering games,
> we have too many drivers to keep this straight...
That's why I started with the explicit call in the first place.
I can switch back to this model: drivers which support boot time eswitch
defaults will opt in and call the helper once they are ready. This keeps
the support explicit per driver and avoids making it depend on where
devl_register() happens in the init path.
With that, devlink can tell at register time whether the instance supports
boot time eswitch defaults. If the user configured a default for an instance
whose driver did not opt in, devlink can write to dmesg from
devl_register().
Not perfect, but at least the user gets a visible failure instead of the
config being silently ignored.
Mark
>
>> So if the objection is to the commit message wording, I can fix that and drop
>> the "notification barrier" language.
>>
>> For unregister, I can probably leave the old ordering as-is. I moved it only
>> to mirror the register path, which felt cleaner, but it is not required for
>> the default-mode change and as the lock is held I see no issue with doing
>> that.
next prev parent reply other threads:[~2026-06-11 17:43 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-05 18:10 [PATCH net-next V3 0/7] devlink: Add boot-time eswitch mode defaults Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 1/7] devlink: Skip health recover notifications before register Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 2/7] netdevsim: Register devlink after device init Mark Bloch
2026-06-10 23:50 ` Jakub Kicinski
2026-06-11 6:02 ` Mark Bloch
2026-06-11 15:54 ` Jakub Kicinski
2026-06-11 17:43 ` Mark Bloch [this message]
2026-06-05 18:10 ` [PATCH net-next V3 3/7] net/mlx5: Clear FW reset-in-progress bit before reload Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 4/7] net/mlx5: Register devlink after device init Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 5/7] octeontx2-af: Register devlink after SR-IOV init Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 6/7] octeontx2-pf: Register devlink after SR-IOV state init Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 7/7] devlink: Add eswitch mode boot defaults Mark Bloch
2026-06-05 19:37 ` [PATCH net-next V3 0/7] devlink: Add boot-time eswitch mode defaults Borislav Petkov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=f266dfa5-0c6c-4be0-b73e-b2185dadd6a7@nvidia.com \
--to=mbloch@nvidia.com \
--cc=andrew+netdev@lunn.ch \
--cc=bbhushan2@marvell.com \
--cc=corbet@lwn.net \
--cc=davem@davemloft.net \
--cc=edumazet@google.com \
--cc=enelsonmoore@gmail.com \
--cc=gakula@marvell.com \
--cc=hkelam@marvell.com \
--cc=horms@kernel.org \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=lcherian@marvell.com \
--cc=leon@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=sbhatta@marvell.com \
--cc=sgoutham@marvell.com \
--cc=skhan@linuxfoundation.org \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox