Netdev List
 help / color / mirror / Atom feed
From: Mark Bloch <mbloch@nvidia.com>
To: Jakub Kicinski <kuba@kernel.org>, Jiri Pirko <jiri@resnulli.us>
Cc: Eric Dumazet <edumazet@google.com>,
	Paolo Abeni <pabeni@redhat.com>,
	Andrew Lunn <andrew+netdev@lunn.ch>,
	"David S. Miller" <davem@davemloft.net>,
	Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <skhan@linuxfoundation.org>,
	Jiri Pirko <jiri@resnulli.us>, Simon Horman <horms@kernel.org>,
	Sunil Goutham <sgoutham@marvell.com>,
	Linu Cherian <lcherian@marvell.com>,
	Geetha sowjanya <gakula@marvell.com>,
	hariprasad <hkelam@marvell.com>,
	Subbaraya Sundeep <sbhatta@marvell.com>,
	Bharat Bhushan <bbhushan2@marvell.com>,
	Saeed Mahameed <saeedm@nvidia.com>,
	Leon Romanovsky <leon@kernel.org>,
	Tariq Toukan <tariqt@nvidia.com>,
	Ethan Nelson-Moore <enelsonmoore@gmail.com>,
	linux-doc@vger.kernel.org, netdev@vger.kernel.org,
	linux-rdma@vger.kernel.org
Subject: Re: [PATCH net-next V3 2/7] netdevsim: Register devlink after device init
Date: Tue, 16 Jun 2026 20:29:25 +0300	[thread overview]
Message-ID: <7635d50c-1c82-4090-8907-53a72444fc04@nvidia.com> (raw)
In-Reply-To: <f266dfa5-0c6c-4be0-b73e-b2185dadd6a7@nvidia.com>



On 11/06/2026 20:43, Mark Bloch wrote:
> 
> 
> On 11/06/2026 18:54, Jakub Kicinski wrote:
>> On Thu, 11 Jun 2026 09:02:03 +0300 Mark Bloch wrote:
>>> On 11/06/2026 2:50, Jakub Kicinski wrote:
>>>> On Fri, 5 Jun 2026 21:10:25 +0300 Mark Bloch wrote:  
>>>>> devl_register() makes the devlink instance visible to userspace. A later
>>>>> patch also makes registration the point where devlink core may call
>>>>> eswitch_mode_set() to apply a boot-time default eswitch mode.
>>>>>
>>>>> Move netdevsim registration after all objects (resources, params, regions,
>>>>> traps, debugfs etc) are initialized, and after the initial eswitch mode is
>>>>> set to legacy.
>>>>>
>>>>> Move devl_unregister() to the beginning of nsim_drv_remove(), before those
>>>>> devlink objects are torn down. This keeps devlink register/unregister as
>>>>> the notification barrier and makes the later object teardown paths run
>>>>> after devlink is no longer registered, so they do not emit their own
>>>>> netlink DEL notifications.  
>>>>
>>>> This is going backwards. At some point someone from nVidia thought that
>>>> we can order our way out of locking, so mlx5 is likely ordered this way,
>>>> but this must not be required, or in any way normalized.
>>>> We (syzbot) quickly discovered that it doesn't cover all corner cases.
>>>> devl_lock() is exposed specifically to allow the driver to finish
>>>> whatever init it needs without letting user space invoke callbacks, yet.
>>>> Almost (?) all driver callbacks hold devl_lock(), so maybe the devlink
>>>> instance is "visible" to user space but that should not matter.  
>>>
>>> Let me clarify.
>>>
>>> No locking is changed here, and I don't want to make register/unregister
>>> ordering a substitute for devl_lock().
>>>
>>> The only requirement I have for this series is that devl_register() is called
>>> only once the driver is ready for devlink core to call eswitch_mode_set().
>>> That follows from the earlier direction to have the core apply the default
>>> mode from devl_register() instead of adding an explicit driver call.
>>
>> This is exactly what I'm objecting to. AFAIU we are trading off
>> explicit call to get the default value for an implicit behavior
>> depending on order of calls. We want to optimize for how easy it
>> is to get the API wrong, not for LoC.
> 
> Right, the reason I moved in this direction is that in v1 I had
> the explicit driver call, and Jiri asked to make this transparent
> from devlink core instead.
> 
>>
>> If we don't have a clean way to implement this without driver
>> changes let's add the explicit API to get the default value.
>> If driver doesn't call it schedule a work to go via the callback
>> once devl_lock() is dropped. That way drivers which care can optimize
>> themselves by reading the default value upfront. Drivers which don't 
>> care will work correctly, and there's no API call order trap.
> 
> The workqueue fallback is possible, but I think it makes the semantics
> more complicated.
> 
> We would need to track devlink instances which still need the default
> applied, and the worker would have to skip/remove them once handled.
> 
> More importantly, the worker can race with userspace setting the
> eswitch mode, so we would also need some state to tell whether the user
> already changed the mode. That feels more fragile than an explicit
> driver call.
> 
>>
>> Not ideal, but isn't that best we can do here?
>> I still have flashbacks of the fallout from the call ordering games, 
>> we have too many drivers to keep this straight...
> 
> That's why I started with the explicit call in the first place.
> 
> I can switch back to this model: drivers which support boot time eswitch
> defaults will opt in and call the helper once they are ready. This keeps
> the support explicit per driver and avoids making it depend on where
> devl_register() happens in the init path.
> 
> With that, devlink can tell at register time whether the instance supports
> boot time eswitch defaults. If the user configured a default for an instance
> whose driver did not opt in, devlink can write to dmesg from
> devl_register().
> 
> Not perfect, but at least the user gets a visible failure instead of the
> config being silently ignored.
> 
> Mark

Jakub, Jiri, any thoughts?

I think the explicit helper is the cleanest option here, without any
workqueue fallback inside devlink. It avoids depending on devl_register()
ordering, and makes the support explicit per driver.

Does that sound like an acceptable direction?

Mark

> 
>>
>>> So if the objection is to the commit message wording, I can fix that and drop
>>> the "notification barrier" language.
>>>
>>> For unregister, I can probably leave the old ordering as-is. I moved it only
>>> to mirror the register path, which felt cleaner, but it is not required for
>>> the default-mode change and as the lock is held I see no issue with doing
>>> that.
> 
> 


  reply	other threads:[~2026-06-16 17:29 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2026-06-05 18:10 [PATCH net-next V3 0/7] devlink: Add boot-time eswitch mode defaults Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 1/7] devlink: Skip health recover notifications before register Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 2/7] netdevsim: Register devlink after device init Mark Bloch
2026-06-10 23:50   ` Jakub Kicinski
2026-06-11  6:02     ` Mark Bloch
2026-06-11 15:54       ` Jakub Kicinski
2026-06-11 17:43         ` Mark Bloch
2026-06-16 17:29           ` Mark Bloch [this message]
2026-06-16 21:05             ` Jakub Kicinski
2026-06-05 18:10 ` [PATCH net-next V3 3/7] net/mlx5: Clear FW reset-in-progress bit before reload Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 4/7] net/mlx5: Register devlink after device init Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 5/7] octeontx2-af: Register devlink after SR-IOV init Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 6/7] octeontx2-pf: Register devlink after SR-IOV state init Mark Bloch
2026-06-05 18:10 ` [PATCH net-next V3 7/7] devlink: Add eswitch mode boot defaults Mark Bloch
2026-06-05 19:37 ` [PATCH net-next V3 0/7] devlink: Add boot-time eswitch mode defaults Borislav Petkov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7635d50c-1c82-4090-8907-53a72444fc04@nvidia.com \
    --to=mbloch@nvidia.com \
    --cc=andrew+netdev@lunn.ch \
    --cc=bbhushan2@marvell.com \
    --cc=corbet@lwn.net \
    --cc=davem@davemloft.net \
    --cc=edumazet@google.com \
    --cc=enelsonmoore@gmail.com \
    --cc=gakula@marvell.com \
    --cc=hkelam@marvell.com \
    --cc=horms@kernel.org \
    --cc=jiri@resnulli.us \
    --cc=kuba@kernel.org \
    --cc=lcherian@marvell.com \
    --cc=leon@kernel.org \
    --cc=linux-doc@vger.kernel.org \
    --cc=linux-rdma@vger.kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=saeedm@nvidia.com \
    --cc=sbhatta@marvell.com \
    --cc=sgoutham@marvell.com \
    --cc=skhan@linuxfoundation.org \
    --cc=tariqt@nvidia.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox