From: Mark Bloch <mbloch@nvidia.com>
To: Jiri Pirko <jiri@resnulli.us>
Cc: Eric Dumazet <edumazet@google.com>,
Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
Simon Horman <horms@kernel.org>,
Saeed Mahameed <saeedm@nvidia.com>,
Leon Romanovsky <leon@kernel.org>,
Tariq Toukan <tariqt@nvidia.com>,
Andrew Lunn <andrew+netdev@lunn.ch>,
Jonathan Corbet <corbet@lwn.net>,
Shuah Khan <skhan@linuxfoundation.org>,
netdev@vger.kernel.org, linux-rdma@vger.kernel.org,
linux-doc@vger.kernel.org
Subject: Re: [PATCH net-next V4 4/6] devlink: Apply eswitch mode boot defaults
Date: Fri, 3 Jul 2026 21:27:28 +0300 [thread overview]
Message-ID: <4fa57470-0d4f-43dc-af4d-e66ddb450923@nvidia.com> (raw)
In-Reply-To: <akYX4pMrDTnxa6yK@FV6GYCPJ69>
On 02/07/2026 10:52, Jiri Pirko wrote:
> Wed, Jul 01, 2026 at 07:42:57PM +0200, mbloch@nvidia.com wrote:
>>
>>
>> On 01/07/2026 17:09, Jiri Pirko wrote:
>>> Wed, Jul 01, 2026 at 02:57:21PM +0200, mbloch@nvidia.com wrote:
>>>>
>>>>
>>>> On 01/07/2026 12:48, Jiri Pirko wrote:
>>>>> Mon, Jun 29, 2026 at 08:20:59PM +0200, mbloch@nvidia.com wrote:
>>>>>> Apply parsed devlink_eswitch_mode= defaults after devlink registration
>>>>>> and after successful reload.
>>>>>>
>>>>>> devl_register() may still be called before the device is ready for an
>>>>>
>>>>> How so? I would assume that driver calls devl_register only after
>>>>> everything is up and running and ready. If not, isn't it a bug?
>>>>>
>>>>
>>>> You would think so :)
>>>>
>>>> Some drivers, mlx5 included, call devl_register() while holding the
>>>> devlink instance lock and then finish setting up state before releasing
>>>> the lock.
>>>>
>>>> In v3 I tried to enforce exactly that model, move devl_register() to
>>>> be the last thing the driver does. Jakub pushed back on making that a
>>>> general rule. So in v4 I changed the approach. devl_register() only
>>>> schedules the work, and the actual eswitch mode change can run only
>>>> after the driver releases the devlink lock.
>>>
>>> Wouldn't it make sense to use a completion instead of loop-reschedule of
>>> delayed work?
>>
>> Just to make sure I understand the suggestion, this would mean that the
>> work waits until the devlink lock holder drops the lock, and devl_unlock()
>> would signal it, something like:
>>
>> void devl_unlock(struct devlink *devlink)
>> {
>> ool complete_apply = devlink->default_esw_mode_apply_pending;
>>
>> mutex_unlock(&devlink->lock);
>>
>> if (complete_apply)
>> complete(&devlink->default_esw_mode_apply_ready);
>> }
>>
>> That would avoid the retry loop, but it also means the queued work
>> sleeps until the driver drops devl_lock. It does keep one worker
>> blocked per pending instance and adds this default-esw-mode signalling to
>> the generic devl_unlock() path.
>>
>> The delayed retry was meant to avoid a sleeping worker and keep the
>> instances independent. If one devlink instance is still locked, we just
>> try it again later while other instances can progress.
>>
>> If you prefer the completion approach I can switch to it, but I don't see
>> it as simpler overall.
>
> Yeah, I don't have preference. I was just wondering. Feel free to leave
> it as is.
>
> Maybe, instead of "complete", you can schedule with "0" delay in
> devl_unlock? Well, it does not really need to be delayed work, right?
> The only single schedule may be done from devl_unlock. That would help
> to eliminate the rescheduling. Am I missing something?
Yeah, that can work.
The only part I don't really like is adding default-esw-mode specific
logic to devl_unlock(). But if you are fine with that, I can switch to
this approach.
There is still a small race between mutex_unlock() and queue_work(), where
someone else can take devl_lock() first. So the worker may still wait on
the lock, but the window should be small and we get rid of the delayed
retry loop.
Mark
>
>
>>
>> Mark
>>
>>>
>>>>
>>>> Mark
>>>>
>>>>>
>>>>>> eswitch mode change, so keep a per-devlink delayed work item and pending
>>>>>> flag for the registration path. Registration queues the work, and the
>>>>>> worker tries to take the devlink instance lock.
>>>>>>
>>>>>> If the lock is busy, the worker requeues itself with a delay.
>>>>>>
>>>>>> For successful reloads that performed DRIVER_REINIT, devlink_reload()
>>>>>> already holds the devlink instance lock and the driver has completed
>>>>>> reload_up(). Clear pending work and apply the default directly from the
>>>>>> reload path instead of queueing work.
>>>>>>
>>>>>> If a user sets eswitch mode through netlink before the pending
>>>>>> registration work runs, clear the pending flag so the queued default does
>>>>>> not override that user request. Cancel pending default apply work when
>>>>>> freeing the devlink instance.
>>>>>
>>>>> These AI generated code descriptive messages are generally not very
>>>>> useful :(
>>>>>
>>>>
>>
next prev parent reply other threads:[~2026-07-03 18:27 UTC|newest]
Thread overview: 18+ messages / expand[flat|nested] mbox.gz Atom feed top
2026-06-29 18:20 [PATCH net-next V4 0/6] evlink: Add boot-time eswitch mode defaults Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 1/6] net/mlx5: Clear FW reset-in-progress bit before reload Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 2/6] devlink: Factor out eswitch mode setting Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 3/6] devlink: Parse eswitch mode boot defaults Mark Bloch
2026-07-01 9:38 ` Jiri Pirko
2026-07-01 12:55 ` Mark Bloch
2026-07-01 13:14 ` Mark Bloch
2026-06-29 18:20 ` [PATCH net-next V4 4/6] devlink: Apply " Mark Bloch
2026-07-01 9:48 ` Jiri Pirko
2026-07-01 12:57 ` Mark Bloch
2026-07-01 14:09 ` Jiri Pirko
2026-07-01 17:42 ` Mark Bloch
2026-07-02 7:52 ` Jiri Pirko
2026-07-03 18:27 ` Mark Bloch [this message]
2026-07-02 7:41 ` Paolo Abeni
2026-07-03 18:32 ` Mark Bloch
2026-06-29 18:21 ` [PATCH net-next V4 5/6] devlink: Add API to apply eswitch mode boot default Mark Bloch
2026-06-29 18:21 ` [PATCH net-next V4 6/6] net/mlx5: Apply devlink eswitch mode boot default on probe Mark Bloch
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4fa57470-0d4f-43dc-af4d-e66ddb450923@nvidia.com \
--to=mbloch@nvidia.com \
--cc=andrew+netdev@lunn.ch \
--cc=corbet@lwn.net \
--cc=edumazet@google.com \
--cc=horms@kernel.org \
--cc=jiri@resnulli.us \
--cc=kuba@kernel.org \
--cc=leon@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-rdma@vger.kernel.org \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=saeedm@nvidia.com \
--cc=skhan@linuxfoundation.org \
--cc=tariqt@nvidia.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox