Re: [PATCH v2] phy: use per-PHY lockdep keys

From: Johan Hovold <johan@kernel.org>
To: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Cc: Vinod Koul <vkoul@kernel.org>,
	Kishon Vijay Abraham I <kishon@kernel.org>,
	Abel Vesa <abel.vesa@linaro.org>,
	Neil Armstrong <neil.armstrong@linaro.org>,
	linux-arm-msm@vger.kernel.org, linux-phy@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v2] phy: use per-PHY lockdep keys
Date: Thu, 5 Jun 2025 10:03:17 +0200	[thread overview]
Message-ID: <aEFPRWErB4QkbMkt@hovoldconsulting.com> (raw)
In-Reply-To: <20250530-phy-subinit-v2-1-09dfe80e82a8@oss.qualcomm.com>

On Fri, May 30, 2025 at 07:08:28PM +0300, Dmitry Baryshkov wrote:
> If the PHY driver uses another PHY internally (e.g. in case of eUSB2,
> repeaters are represented as PHYs), then it would trigger the following
> lockdep splat because all PHYs use a single static lockdep key and thus
> lockdep can not identify whether there is a dependency or not and
> reports a false positive.
> 
> Make PHY subsystem use dynamic lockdep keys, assigning each driver a
> separate key. This way lockdep can correctly identify dependency graph
> between mutexes.
> 
>  ============================================
>  WARNING: possible recursive locking detected
>  6.15.0-rc7-next-20250522-12896-g3932f283970c #3455 Not tainted
>  --------------------------------------------
>  kworker/u51:0/78 is trying to acquire lock:
>  ffff0008116554f0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c
> 
>  but task is already holding lock:
>  ffff000813c10cf0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c
> 
>  other info that might help us debug this:
>   Possible unsafe locking scenario:
> 
>         CPU0
>         ----
>    lock(&phy->mutex);
>    lock(&phy->mutex);
> 
>   *** DEADLOCK ***
> 
>   May be due to missing lock nesting notation
> 
>  4 locks held by kworker/u51:0/78:
>   #0: ffff000800010948 ((wq_completion)events_unbound){+.+.}-{0:0}, at: process_one_work+0x18c/0x5ec
>   #1: ffff80008036bdb0 (deferred_probe_work){+.+.}-{0:0}, at: process_one_work+0x1b4/0x5ec
>   #2: ffff0008094ac8f8 (&dev->mutex){....}-{4:4}, at: __device_attach+0x38/0x188
>   #3: ffff000813c10cf0 (&phy->mutex){+.+.}-{4:4}, at: phy_init+0x4c/0x12c
> 
>  stack backtrace:
>  CPU: 0 UID: 0 PID: 78 Comm: kworker/u51:0 Not tainted 6.15.0-rc7-next-20250522-12896-g3932f283970c #3455 PREEMPT
>  Hardware name: Qualcomm CRD, BIOS 6.0.240904.BOOT.MXF.2.4-00528.1-HAMOA-1 09/ 4/2024
>  Workqueue: events_unbound deferred_probe_work_func
>  Call trace:
>   show_stack+0x18/0x24 (C)
>   dump_stack_lvl+0x90/0xd0
>   dump_stack+0x18/0x24
>   print_deadlock_bug+0x258/0x348
>   __lock_acquire+0x10fc/0x1f84
>   lock_acquire+0x1c8/0x338
>   __mutex_lock+0xb8/0x59c
>   mutex_lock_nested+0x24/0x30
>   phy_init+0x4c/0x12c
>   snps_eusb2_hsphy_init+0x54/0x1a0
>   phy_init+0xe0/0x12c
>   dwc3_core_init+0x450/0x10b4
>   dwc3_core_probe+0xce4/0x15fc
>   dwc3_probe+0x64/0xb0

>   platform_probe+0x68/0xc4
>   really_probe+0xbc/0x298
>   __driver_probe_device+0x78/0x12c
>   driver_probe_device+0x3c/0x160
>   __device_attach_driver+0xb8/0x138
>   bus_for_each_drv+0x84/0xe0
>   __device_attach+0x9c/0x188
>   device_initial_probe+0x14/0x20
>   bus_probe_device+0xac/0xb0
>   deferred_probe_work_func+0x8c/0xc8
>   process_one_work+0x208/0x5ec
>   worker_thread+0x1c0/0x368
>   kthread+0x14c/0x20c
>   ret_from_fork+0x10/0x20

Nit: This last bit of the stack trace adds little value and can be
dropped.

> Fixes: 3584f6392f09 ("phy: qcom: phy-qcom-snps-eusb2: Add support for eUSB2 repeater")
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
> ---
> Note: I've used a Fixes tag pointing to the commit which actually
> started using nested PHYs. If you think that it's incorrect, I'm fine
> with dropping it.

I think it's warranted. And if there were further users before this one
as Neil suggested you could just list them all as each has been
introducing a new splat.

> Note2: I've tried using mutex_lock_nested, however that didn't play
> well. We can not store nest level in the struct phy (as it can be used
> by different drivers), so using mutex_lock_nested() would require us to
> change and wrap all PHY APIs which take a lock internally. Using dynamic
> lockdep keys looks like a more ellegant solution, especially granted
> that there is no extra impact if lockdep is disabled.

Thanks for fixing this. I've been using a local hack based on
mutex_lock_nested() too but dynamic keys looks like the right way to go.

Perhaps you can add:

Reported-by: Johan Hovold <johan+linaro@kernel.org>
Link: https://lore.kernel.org/lkml/ZnpoAVGJMG4Zu-Jw@hovoldconsulting.com/

Works fine on the T14s:

Reviewed-by: Johan Hovold <johan+linaro@kernel.org>
Tested-by: Johan Hovold <johan+linaro@kernel.org>

Johan