linux-kernel.vger.kernel.org archive mirror
* [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
@ 2025-07-30 17:58 Russell King (Oracle)
  2025-07-30 19:43 ` Krzysztof Kozlowski
  0 siblings, 1 reply; 12+ messages in thread
From: Russell King (Oracle) @ 2025-07-30 17:58 UTC (permalink / raw)
  To: Chanwoo Choi, Krzysztof Kozlowski
  Cc: Alexandre Belloni, Linus Walleij, Bartosz Golaszewski, linux-gpio,
	linux-rtc, linux-kernel

Hi,

First, I'm not sure who is responsible for the max77620-gpio driver
(it's not in MAINTAINERS) but this bug points towards a problem with
one or other of these drivers.

Here is /proc/interrupts which may help debug this:

           CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
 94:          1          0          0          0          0          0 max77620-top   4 Edge      max77686-rtc
 95:          1          0          0          0          0          0 max77686-rtc   1 Edge      rtc-alarm1

While running 6.16-rc7 on the Jetson Xavier NX platform, upon suspend,
I receive the following lockdep splat. I've added some instrumentation
into irq_set_irq_wake() which appears twice in the calltrace to print
the IRQ number and the "on" parameter to locate which interrupts are
involved in this splat. This splat is 100% reproducible.

[   46.721367] irq_set_irq_wake: irq=95 on=1
[   46.722067] irq_set_irq_wake: irq=94 on=1
[   46.722181] ============================================
[   46.722578] WARNING: possible recursive locking detected
[   46.722852] 6.16.0-rc7-net-next+ #432 Not tainted
[   46.722965] --------------------------------------------
[   46.723127] rtcwake/3984 is trying to acquire lock:
[   46.723235] ffff0000813b2c68 (&d->lock){+.+.}-{4:4}, at: regmap_irq_lock+0x18/0x24
[   46.723452]
               but task is already holding lock:
[   46.723556] ffff00008504dc68 (&d->lock){+.+.}-{4:4}, at: regmap_irq_lock+0x18/0x24
[   46.723780]
               other info that might help us debug this:
[   46.723903]  Possible unsafe locking scenario:

[   46.724015]        CPU0
[   46.724067]        ----
[   46.724119]   lock(&d->lock);
[   46.724212]   lock(&d->lock);
[   46.724282]
                *** DEADLOCK ***

[   46.724348]  May be due to missing lock nesting notation

[   46.724492] 6 locks held by rtcwake/3984:
[   46.724576]  #0: ffff0000825693f8 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x184/0x350
[   46.724902]  #1: ffff00008fd7fa88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x104/0x1c8
[   46.725258]  #2: ffff000080a64588 (kn->active#87){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x10c/0x1c8
[   46.725609]  #3: ffff8000815d4fb8 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0x220/0x300
[   46.725897]  #4: ffff00008500a8f8 (&dev->mutex){....}-{4:4}, at: device_suspend+0x1d8/0x630
[   46.726173]  #5: ffff00008504dc68 (&d->lock){+.+.}-{4:4}, at: regmap_irq_lock+0x18/0x24
[   46.732435]
               stack backtrace:
[   46.734019] CPU: 3 UID: 0 PID: 3984 Comm: rtcwake Not tainted 6.16.0-rc7-net-next+ #432 PREEMPT
[   46.734029] Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
[   46.734033] Call trace:
[   46.734036]  show_stack+0x18/0x24 (C)
[   46.734070]  dump_stack_lvl+0x90/0xd0
[   46.734080]  dump_stack+0x18/0x24
[   46.734107]  print_deadlock_bug+0x260/0x350
[   46.734114]  __lock_acquire+0xf28/0x2088
[   46.734120]  lock_acquire+0x19c/0x33c
[   46.734126]  __mutex_lock+0x84/0x530
[   46.734135]  mutex_lock_nested+0x24/0x30
[   46.734155]  regmap_irq_lock+0x18/0x24
[   46.734161]  __irq_get_desc_lock+0x8c/0x9c
[   46.734170]  irq_set_irq_wake+0x5c/0x1ac	<== I guess IRQ 94
[   46.734176]  regmap_irq_sync_unlock+0x314/0x4f4
[   46.734182]  __irq_put_desc_unlock+0x48/0x4c
[   46.734190]  irq_set_irq_wake+0x88/0x1ac	<== I guess IRQ 95
[   46.734195]  max77686_rtc_suspend+0x34/0x74
[   46.734206]  platform_pm_suspend+0x2c/0x6c
[   46.734214]  dpm_run_callback+0xa4/0x218
[   46.734221]  device_suspend+0x200/0x630
[   46.734227]  dpm_suspend+0x17c/0x2d0
[   46.734233]  dpm_suspend_start+0x74/0x7c
[   46.734240]  suspend_devices_and_enter+0x104/0x618
[   46.734247]  pm_suspend+0x1b4/0x300
[   46.734254]  state_store+0x8c/0x110
[   46.734261]  kobj_attr_store+0x18/0x2c
[   46.734268]  sysfs_kf_write+0x50/0x7c
[   46.734275]  kernfs_fop_write_iter+0x134/0x1c8
[   46.734282]  vfs_write+0x24c/0x350
[   46.734289]  ksys_write+0x58/0xec
[   46.734295]  __arm64_sys_write+0x1c/0x28
[   46.734302]  invoke_syscall.constprop.0+0x50/0xe4
[   46.734311]  do_el0_svc+0x40/0xc8
[   46.734318]  el0_svc+0x44/0x148
[   46.734327]  el0t_64_sync_handler+0xc8/0xcc
[   46.734333]  el0t_64_sync+0x19c/0x1a0

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-30 17:58 [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc Russell King (Oracle)
@ 2025-07-30 19:43 ` Krzysztof Kozlowski
  2025-07-31 12:18   ` Russell King (Oracle)
  0 siblings, 1 reply; 12+ messages in thread
From: Krzysztof Kozlowski @ 2025-07-30 19:43 UTC (permalink / raw)
  To: Russell King (Oracle), Chanwoo Choi
  Cc: Alexandre Belloni, Linus Walleij, Bartosz Golaszewski, linux-gpio,
	linux-rtc, linux-kernel, Thierry Reding

On 30/07/2025 19:58, Russell King (Oracle) wrote:
> Hi,
> 
> First, I'm not sure who is responsible for the max77620-gpio driver

77620 is only for nvidia platforms and nvidia was upstreaming it,
although it shares the RTC driver part with max77686. You should Cc
nvidia SoC maintainers, maybe Thierry has someone around who could
investigate it.

> (it's not in MAINTAINERS) but this bug points towards a problem with
> one or other of these drivers.
> 
> Here is /proc/interrupts which may help debug this:
> 
>            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
>  94:          1          0          0          0          0          0 max77620-top   4 Edge      max77686-rtc
>  95:          1          0          0          0          0          0 max77686-rtc   1 Edge      rtc-alarm1
> 
> While running 6.16-rc7 on the Jetson Xavier NX platform, upon suspend,
> I receive the following lockdep splat. I've added some instrumentation
> into irq_set_irq_wake() which appears twice in the calltrace to print
> the IRQ number and the "on" parameter to locate which interrupts are
> involved in this splat. This splat is 100% reproducible.
> 
> [   46.721367] irq_set_irq_wake: irq=95 on=1
> [   46.722067] irq_set_irq_wake: irq=94 on=1
> [   46.722181] ============================================
> [   46.722578] WARNING: possible recursive locking detected
> [   46.722852] 6.16.0-rc7-net-next+ #432 Not tainted
> [   46.722965] --------------------------------------------
> [   46.723127] rtcwake/3984 is trying to acquire lock:
> [   46.723235] ffff0000813b2c68 (&d->lock){+.+.}-{4:4}, at: regmap_irq_lock+0x18/0x24
> [   46.723452]
>                but task is already holding lock:
> [   46.723556] ffff00008504dc68 (&d->lock){+.+.}-{4:4}, at: regmap_irq_lock+0x18/0x24
> [   46.723780]
>                other info that might help us debug this:
> [   46.723903]  Possible unsafe locking scenario:
> 
> [   46.724015]        CPU0
> [   46.724067]        ----
> [   46.724119]   lock(&d->lock);
> [   46.724212]   lock(&d->lock);
> [   46.724282]
>                 *** DEADLOCK ***
> 
> [   46.724348]  May be due to missing lock nesting notation
> 
> [   46.724492] 6 locks held by rtcwake/3984:
> [   46.724576]  #0: ffff0000825693f8 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x184/0x350
> [   46.724902]  #1: ffff00008fd7fa88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x104/0x1c8
> [   46.725258]  #2: ffff000080a64588 (kn->active#87){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x10c/0x1c8
> [   46.725609]  #3: ffff8000815d4fb8 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0x220/0x300
> [   46.725897]  #4: ffff00008500a8f8 (&dev->mutex){....}-{4:4}, at: device_suspend+0x1d8/0x630
> [   46.726173]  #5: ffff00008504dc68 (&d->lock){+.+.}-{4:4}, at: regmap_irq_lock+0x18/0x24


max77686 only disables/enables interrupts in the suspend path, whereas
max77620 also does I2C transfers. However, the lock above is
regmap_irq_lock, not the regmap lock. Maybe this is not really
max77620/77686 related at all? Neither driver does anything weird (or
different from the last 5 years), so the missing nesting annotation could
be the result of changes in other parts...


> [   46.732435]
>                stack backtrace:
> [   46.734019] CPU: 3 UID: 0 PID: 3984 Comm: rtcwake Not tainted 6.16.0-rc7-net-next+ #432 PREEMPT
> [   46.734029] Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
> [   46.734033] Call trace:
> [   46.734036]  show_stack+0x18/0x24 (C)
> [   46.734070]  dump_stack_lvl+0x90/0xd0
> [   46.734080]  dump_stack+0x18/0x24
> [   46.734107]  print_deadlock_bug+0x260/0x350
> [   46.734114]  __lock_acquire+0xf28/0x2088
> [   46.734120]  lock_acquire+0x19c/0x33c
> [   46.734126]  __mutex_lock+0x84/0x530
> [   46.734135]  mutex_lock_nested+0x24/0x30
> [   46.734155]  regmap_irq_lock+0x18/0x24
> [   46.734161]  __irq_get_desc_lock+0x8c/0x9c
> [   46.734170]  irq_set_irq_wake+0x5c/0x1ac	<== I guess IRQ 94

...like changes in irqchip.

> [   46.734176]  regmap_irq_sync_unlock+0x314/0x4f4
> [   46.734182]  __irq_put_desc_unlock+0x48/0x4c
> [   46.734190]  irq_set_irq_wake+0x88/0x1ac	<== I guess IRQ 95
> [   46.734195]  max77686_rtc_suspend+0x34/0x74


Because the part above has really been virtually unchanged for 10 years,
except for my commit d8f090dbeafdcc3d30761aa0062f19d1adf9ef08 (you can try
reverting it... but it could still be correct/needed, with irqchip having
just changed something around locking).

Best regards,
Krzysztof

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-30 19:43 ` Krzysztof Kozlowski
@ 2025-07-31 12:18   ` Russell King (Oracle)
  2025-07-31 12:31     ` Mark Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Russell King (Oracle) @ 2025-07-31 12:18 UTC (permalink / raw)
  To: Krzysztof Kozlowski, Mark Brown
  Cc: Chanwoo Choi, Alexandre Belloni, Linus Walleij,
	Bartosz Golaszewski, linux-gpio, linux-rtc, linux-kernel,
	Thierry Reding

On Wed, Jul 30, 2025 at 09:43:02PM +0200, Krzysztof Kozlowski wrote:
> On 30/07/2025 19:58, Russell King (Oracle) wrote:
> > Hi,
> > 
> > First, I'm not sure who is responsible for the max77620-gpio driver
> 
> 77620 is only for nvidia platforms and nvidia was upstreaming it,
> although it shares the RTC driver part with max77686. You should Cc
> nvidia SoC maintainers, maybe Thierry has someone around who could
> investigate it.
> 
> > (it's not in MAINTAINERS) but this bug points towards a problem with
> > one or other of these drivers.
> > 
> > Here is /proc/interrupts which may help debug this:
> > 
> >            CPU0       CPU1       CPU2       CPU3       CPU4       CPU5
> >  94:          1          0          0          0          0          0 max77620-top   4 Edge      max77686-rtc
> >  95:          1          0          0          0          0          0 max77686-rtc   1 Edge      rtc-alarm1
> > 
> > While running 6.16-rc7 on the Jetson Xavier NX platform, upon suspend,
> > I receive the following lockdep splat. I've added some instrumentation
> > into irq_set_irq_wake() which appears twice in the calltrace to print
> > the IRQ number and the "on" parameter to locate which interrupts are
> > involved in this splat. This splat is 100% reproducible.
> > 
> > [   46.721367] irq_set_irq_wake: irq=95 on=1
> > [   46.722067] irq_set_irq_wake: irq=94 on=1
> > [   46.722181] ============================================
> > [   46.722578] WARNING: possible recursive locking detected
> > [   46.722852] 6.16.0-rc7-net-next+ #432 Not tainted
> > [   46.722965] --------------------------------------------
> > [   46.723127] rtcwake/3984 is trying to acquire lock:
> > [   46.723235] ffff0000813b2c68 (&d->lock){+.+.}-{4:4}, at: regmap_irq_lock+0x18/0x24
> > [   46.723452]
> >                but task is already holding lock:
> > [   46.723556] ffff00008504dc68 (&d->lock){+.+.}-{4:4}, at: regmap_irq_lock+0x18/0x24
> > [   46.723780]
> >                other info that might help us debug this:
> > [   46.723903]  Possible unsafe locking scenario:
> > 
> > [   46.724015]        CPU0
> > [   46.724067]        ----
> > [   46.724119]   lock(&d->lock);
> > [   46.724212]   lock(&d->lock);
> > [   46.724282]
> >                 *** DEADLOCK ***
> > 
> > [   46.724348]  May be due to missing lock nesting notation
> > 
> > [   46.724492] 6 locks held by rtcwake/3984:
> > [   46.724576]  #0: ffff0000825693f8 (sb_writers#3){.+.+}-{0:0}, at: vfs_write+0x184/0x350
> > [   46.724902]  #1: ffff00008fd7fa88 (&of->mutex#2){+.+.}-{4:4}, at: kernfs_fop_write_iter+0x104/0x1c8
> > [   46.725258]  #2: ffff000080a64588 (kn->active#87){.+.+}-{0:0}, at: kernfs_fop_write_iter+0x10c/0x1c8
> > [   46.725609]  #3: ffff8000815d4fb8 (system_transition_mutex){+.+.}-{4:4}, at: pm_suspend+0x220/0x300
> > [   46.725897]  #4: ffff00008500a8f8 (&dev->mutex){....}-{4:4}, at: device_suspend+0x1d8/0x630
> > [   46.726173]  #5: ffff00008504dc68 (&d->lock){+.+.}-{4:4}, at: regmap_irq_lock+0x18/0x24
> 
> 
> max77686 only disables/enables interrupts in the suspend path, whereas
> max77620 also does I2C transfers. However, the lock above is
> regmap_irq_lock, not the regmap lock. Maybe this is not really
> max77620/77686 related at all? Neither driver does anything weird (or
> different from the last 5 years), so the missing nesting annotation could
> be the result of changes in other parts...
> 
> 
> > [   46.732435]
> >                stack backtrace:
> > [   46.734019] CPU: 3 UID: 0 PID: 3984 Comm: rtcwake Not tainted 6.16.0-rc7-net-next+ #432 PREEMPT
> > [   46.734029] Hardware name: NVIDIA NVIDIA Jetson Xavier NX Developer Kit/Jetson, BIOS 6.0-37391689 08/28/2024
> > [   46.734033] Call trace:
> > [   46.734036]  show_stack+0x18/0x24 (C)
> > [   46.734070]  dump_stack_lvl+0x90/0xd0
> > [   46.734080]  dump_stack+0x18/0x24
> > [   46.734107]  print_deadlock_bug+0x260/0x350
> > [   46.734114]  __lock_acquire+0xf28/0x2088
> > [   46.734120]  lock_acquire+0x19c/0x33c
> > [   46.734126]  __mutex_lock+0x84/0x530
> > [   46.734135]  mutex_lock_nested+0x24/0x30
> > [   46.734155]  regmap_irq_lock+0x18/0x24
> > [   46.734161]  __irq_get_desc_lock+0x8c/0x9c
> > [   46.734170]  irq_set_irq_wake+0x5c/0x1ac	<== I guess IRQ 94
> 
> ...like changes in irqchip.
> 
> > [   46.734176]  regmap_irq_sync_unlock+0x314/0x4f4
> > [   46.734182]  __irq_put_desc_unlock+0x48/0x4c
> > [   46.734190]  irq_set_irq_wake+0x88/0x1ac	<== I guess IRQ 95
> > [   46.734195]  max77686_rtc_suspend+0x34/0x74
> 
> 
> Because the part above has really been virtually unchanged for 10 years,
> except for my commit d8f090dbeafdcc3d30761aa0062f19d1adf9ef08 (you can try
> reverting it... but it could still be correct/needed, with irqchip having
> just changed something around locking).

Thanks. I can say that reverting this has no effect on lockdep's splat,
so your change is in the clear.

However, there's also regmap-irq stuff to consider in the backtrace,
and looking at this today, I can't see how regmap-irq can be nested.

drivers/rtc/rtc-max77686.c makes use of regmap-irq since commit
f3937549a975 ("rtc: max77686: move initialisation of rtc regmap, irq
chip locally") in 2016.

drivers/mfd/max77620.c also makes use of regmap-irq since commit
327156c59360 ("mfd: max77620: Add core driver for MAX77620/MAX20024")
also in 2016.

Looking at the regmap-irq code, not much has changed in
regmap_irq_lock() and regmap_irq_sync_unlock(). The same seems true of
the ordering in irq_set_irq_wake(). So, I don't think this is a
regression as such, but a latent bug that either no one has bothered
to report, or no one bothers to test with lockdep enabled anymore.

I think the sequence here is:

irq_set_irq_wake() for IRQ 95 (rtc-alarm1) is called.
 + regmap_irq_lock() is called, which takes d->lock for IRQ 95
 + ... irq_set_irq_wake() does stuff
 ` regmap_irq_sync_unlock() is then called
   + this synchronises the wake state with the parent by calling
   | disable_irq_wake() or enable_irq_wake() as appropriate.
   | This calls irq_set_irq_wake(), causing recursion but on a
   | different IRQ, which also uses regmap-irq.
   | + regmap_irq_lock() is called, which takes d->lock for IRQ 94
   | |    * SPLAT *
   | + ...
   ` d->lock is released

This highlights the problem with "d->lock" - using "d" for a variable
name, while short, doesn't actually tell us what lock it is - is it
the irqdesc lock in kernel/irq ? Is it the regmap_irq_chip_data mutex
called "lock" in regmap_irq? It looks to me like it's the mutex.

So, I'd like to start a campaign against single-letter variables,
especially when it comes to code that takes locks! We should have
something in the kernel coding style which prevents single-letter
variable names when locks are taken!

I can't see that anything has changed in the code with regards to the
locking, so I think this is a bug that's been present ever since these
drivers were introduced, and regmap-irq is deficient in that it causes
the same lockdep lock class to be taken recursively when the IRQ wake
state changes.
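
To make the recursion above concrete, here is a minimal userspace sketch
(a toy model, not kernel code). It assumes that lockdep's recursion check
compares lock classes rather than lock addresses, and that the mutexes of
both chips land in one class because they are initialised at the same
mutex_init() call site. Every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_HELD 8	/* deep enough for this sketch */

struct toy_lock {
	int class_id;			/* stands in for the lockdep class */
};

struct toy_chip {
	struct toy_lock lock;		/* models the chip's "d->lock" mutex */
	struct toy_chip *parent;	/* chip owning the parent IRQ, or NULL */
	int wake_enabled;
};

static const struct toy_lock *toy_held[TOY_MAX_HELD];
static int toy_held_count;
static int toy_warnings;

static void toy_lock_acquire(const struct toy_lock *l)
{
	assert(toy_held_count < TOY_MAX_HELD);
	/* Compare classes, not addresses: two distinct locks can clash. */
	for (int i = 0; i < toy_held_count; i++)
		if (toy_held[i]->class_id == l->class_id)
			toy_warnings++;
	toy_held[toy_held_count++] = l;
}

static void toy_lock_release(void)
{
	assert(toy_held_count > 0);
	toy_held_count--;
}

static void toy_set_wake(struct toy_chip *chip, int on)
{
	toy_lock_acquire(&chip->lock);		/* the bus-lock step */
	chip->wake_enabled = on;
	if (chip->parent)			/* sync step: propagate to parent */
		toy_set_wake(chip->parent, on);
	toy_lock_release();			/* dropped only after the sync */
}

/* Returns how many same-class recursions a child -> parent wake change hits. */
int toy_nested_wake_warnings(int parent_class, int child_class)
{
	struct toy_chip parent = { .lock = { parent_class }, .parent = NULL };
	struct toy_chip child = { .lock = { child_class }, .parent = &parent };

	toy_held_count = 0;
	toy_warnings = 0;
	toy_set_wake(&child, 1);
	return toy_warnings;
}
```

Under these assumptions, toy_nested_wake_warnings(1, 1) counts one
recursion for the shared-class case even though the two locks are
different objects, and toy_nested_wake_warnings(1, 2) counts none.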

From what I can see, irq wake support for regmap-irq was added in
commit a43fd50dc99a5 ("regmap: Implement support for wake IRQs") and
this is the only operation that is propagated to the parent
interrupt(s). Thus, the above splat is unlikely to occur unless one
makes use of wake support on a regmap-irq based interrupt whose
parent is also regmap-irq based.

So, adding Mark Brown.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-31 12:18   ` Russell King (Oracle)
@ 2025-07-31 12:31     ` Mark Brown
  2025-07-31 12:43       ` Russell King (Oracle)
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2025-07-31 12:31 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Krzysztof Kozlowski, Chanwoo Choi, Alexandre Belloni,
	Linus Walleij, Bartosz Golaszewski, linux-gpio, linux-rtc,
	linux-kernel, Thierry Reding

On Thu, Jul 31, 2025 at 01:18:19PM +0100, Russell King (Oracle) wrote:

> I can't see that anything has changed in the code with regards to the
> locking, so I think this is a bug that's been present ever since these
> drivers were introduced, and regmap-irq is deficient in that it causes
> the same lockdep lock class to be taken recursively when the IRQ wake
> state changes.

> From what I can see, irq wake support for regmap-irq was added in
> commit a43fd50dc99a5 ("regmap: Implement support for wake IRQs") and
> this is the only operation that is propagated to the parent
> interrupt(s). Thus, the above splat is unlikely to occur unless one
> makes use of wake support on a regmap-irq based interrupt whose
> parent is also regmap-irq based.

Yes, your analysis is right here - it's not come up before because it's
very rare to chain regmap-irq chips.

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-31 12:31     ` Mark Brown
@ 2025-07-31 12:43       ` Russell King (Oracle)
  2025-07-31 13:18         ` Mark Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Russell King (Oracle) @ 2025-07-31 12:43 UTC (permalink / raw)
  To: Mark Brown
  Cc: Krzysztof Kozlowski, Chanwoo Choi, Alexandre Belloni,
	Linus Walleij, Bartosz Golaszewski, linux-gpio, linux-rtc,
	linux-kernel, Thierry Reding

On Thu, Jul 31, 2025 at 01:31:32PM +0100, Mark Brown wrote:
> On Thu, Jul 31, 2025 at 01:18:19PM +0100, Russell King (Oracle) wrote:
> 
> > I can't see that anything has changed in the code with regards to the
> > locking, so I think this is a bug that's been present ever since these
> > drivers were introduced, and regmap-irq is deficient in that it causes
> > the same lockdep lock class to be taken recursively when the IRQ wake
> > state changes.
> 
> > From what I can see, irq wake support for regmap-irq was added in
> > commit a43fd50dc99a5 ("regmap: Implement support for wake IRQs") and
> > this is the only operation that is propagated to the parent
> > interrupt(s). Thus, the above splat is unlikely to occur unless one
> > makes use of wake support on a regmap-irq based interrupt whose
> > parent is also regmap-irq based.
> 
> Yes, your analysis is right here - it's not come up before because it's
> very rare to chain regmap-irq chips.

Yep, I just changed all the "d" variables in regmap-irq to "ricd"
(the first letter of each word of the struct name), and lockdep
confirms that it's the mutex.

I'm not familiar enough with lockdep to know how to fix this, so what's
the solution here?

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-31 12:43       ` Russell King (Oracle)
@ 2025-07-31 13:18         ` Mark Brown
  2025-07-31 15:57           ` Russell King (Oracle)
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2025-07-31 13:18 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Krzysztof Kozlowski, Chanwoo Choi, Alexandre Belloni,
	Linus Walleij, Bartosz Golaszewski, linux-gpio, linux-rtc,
	linux-kernel, Thierry Reding

On Thu, Jul 31, 2025 at 01:43:14PM +0100, Russell King (Oracle) wrote:
> On Thu, Jul 31, 2025 at 01:31:32PM +0100, Mark Brown wrote:

> > Yes, your analysis is right here - it's not come up before because it's
> > very rare to chain regmap-irq chips.

> Yep, I just changed all the "d" variables in regmap-irq to "ricd"
> (the first letter of each word of the struct name), and lockdep
> confirms that it's the mutex.

> I'm not familiar enough with lockdep to know how to fix this, so what's
> the solution here?

I *think* mutex_lock_nested() is what we're looking for here, with the
depth information from the irq_desc but I'm also not super familiar with
this stuff.
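
As a rough illustration of that idea, the userspace sketch below (a toy
model, not kernel code) treats a (class, subclass) pair as what lockdep
compares, which is the effect the subclass argument of mutex_lock_nested()
is meant to have. It assumes each chip somehow knows its depth in the
chain; where that depth would come from in the kernel is an open question
here, and every toy_* name is hypothetical.

```c
#include <assert.h>

#define TOY_MAX_HELD 8	/* small fixed depth, enough for this sketch */

struct toy_held {
	int class_id;
	int subclass;
};

static struct toy_held toy_stack[TOY_MAX_HELD];
static int toy_depth;

/* Returns 1 if a lock with the same (class, subclass) pair is already held. */
static int toy_acquire_nested(int class_id, int subclass)
{
	int clash = 0;

	assert(toy_depth < TOY_MAX_HELD);
	for (int i = 0; i < toy_depth; i++)
		if (toy_stack[i].class_id == class_id &&
		    toy_stack[i].subclass == subclass)
			clash = 1;
	toy_stack[toy_depth].class_id = class_id;
	toy_stack[toy_depth].subclass = subclass;
	toy_depth++;
	return clash;
}

static void toy_release(void)
{
	assert(toy_depth > 0);
	toy_depth--;
}

/*
 * Take the locks of a chain of `levels` chips that all share class 1,
 * child first. With use_depth set, the chain level is used as subclass.
 */
int toy_chain_clashes(int levels, int use_depth)
{
	int clashes = 0;

	toy_depth = 0;
	for (int level = 0; level < levels; level++)
		clashes += toy_acquire_nested(1, use_depth ? level : 0);
	for (int level = 0; level < levels; level++)
		toy_release();
	return clashes;
}
```

In this model a two-level chain clashes once without the depth annotation
and not at all with it. The approach only helps if a trustworthy depth
value is available at lock time.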

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-31 13:18         ` Mark Brown
@ 2025-07-31 15:57           ` Russell King (Oracle)
  2025-07-31 16:16             ` Mark Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Russell King (Oracle) @ 2025-07-31 15:57 UTC (permalink / raw)
  To: Mark Brown
  Cc: Krzysztof Kozlowski, Chanwoo Choi, Alexandre Belloni,
	Linus Walleij, Bartosz Golaszewski, linux-gpio, linux-rtc,
	linux-kernel, Thierry Reding

On Thu, Jul 31, 2025 at 02:18:24PM +0100, Mark Brown wrote:
> On Thu, Jul 31, 2025 at 01:43:14PM +0100, Russell King (Oracle) wrote:
> > On Thu, Jul 31, 2025 at 01:31:32PM +0100, Mark Brown wrote:
> 
> > > Yes, your analysis is right here - it's not come up before because it's
> > > very rare to chain regmap-irq chips.
> 
> > Yep, I just changed all the "d" variables in regmap-irq to "ricd"
> > (the first letter of each word of the struct name), and lockdep
> > confirms that it's the mutex.
> 
> > I'm not familiar enough with lockdep to know how to fix this, so what's
> > the solution here?
> 
> I *think* mutex_lock_nested() is what we're looking for here, with the
> depth information from the irq_desc but I'm also not super familiar with
> this stuff.

I'm not sure about that, because the irq_desc locks don't nest:

        raw_spin_lock_init(&desc->lock);
        lockdep_set_class(&desc->lock, &irq_desc_lock_class);

What saves irq_desc lock nesting in this case is that
__irq_put_desc_unlock() unlocks desc->lock before calling the
irq_bus_sync_unlock() method. So, I don't think we have anything at
the irq_desc level which deals with lock-nesting.

I guess I'll just ignore the lockdep warning or turn lockdep off;
one or the other is probably what everyone else does.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-31 15:57           ` Russell King (Oracle)
@ 2025-07-31 16:16             ` Mark Brown
  2025-07-31 16:28               ` Russell King (Oracle)
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2025-07-31 16:16 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Krzysztof Kozlowski, Chanwoo Choi, Alexandre Belloni,
	Linus Walleij, Bartosz Golaszewski, linux-gpio, linux-rtc,
	linux-kernel, Thierry Reding

On Thu, Jul 31, 2025 at 04:57:42PM +0100, Russell King (Oracle) wrote:
> On Thu, Jul 31, 2025 at 02:18:24PM +0100, Mark Brown wrote:

> > I *think* mutex_lock_nested() is what we're looking for here, with the
> > depth information from the irq_desc but I'm also not super familiar with
> > this stuff.

> I'm not sure about that, because the irq_desc locks don't nest:

>         raw_spin_lock_init(&desc->lock);
>         lockdep_set_class(&desc->lock, &irq_desc_lock_class);

> What saves irq_desc lock nesting in this case is that
> __irq_put_desc_unlock() unlocks desc->lock before calling the
> irq_bus_sync_unlock() method. So, I don't think we have anything at
> the irq_desc level which deals with lock-nesting.

Yeah, and that's all internals which we're not super encouraged to peer
at.  There should be something that'll give us a nesting level
somewhere...  

Lockdep's handling of nesting is generally fun.

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-31 16:16             ` Mark Brown
@ 2025-07-31 16:28               ` Russell King (Oracle)
  2025-07-31 17:03                 ` Mark Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Russell King (Oracle) @ 2025-07-31 16:28 UTC (permalink / raw)
  To: Mark Brown
  Cc: Krzysztof Kozlowski, Chanwoo Choi, Alexandre Belloni,
	Linus Walleij, Bartosz Golaszewski, linux-gpio, linux-rtc,
	linux-kernel, Thierry Reding

On Thu, Jul 31, 2025 at 05:16:13PM +0100, Mark Brown wrote:
> On Thu, Jul 31, 2025 at 04:57:42PM +0100, Russell King (Oracle) wrote:
> > On Thu, Jul 31, 2025 at 02:18:24PM +0100, Mark Brown wrote:
> 
> > > I *think* mutex_lock_nested() is what we're looking for here, with the
> > > depth information from the irq_desc but I'm also not super familiar with
> > > this stuff.
> 
> > I'm not sure about that, because the irq_desc locks don't nest:
> 
> >         raw_spin_lock_init(&desc->lock);
> >         lockdep_set_class(&desc->lock, &irq_desc_lock_class);
> 
> > What saves irq_desc lock nesting in this case is that
> > __irq_put_desc_unlock() unlocks desc->lock before calling the
> > irq_bus_sync_unlock() method. So, I don't think we have anything at
> > the irq_desc level which deals with lock-nesting.
> 
> Yeah, and that's all internals which we're not super encouraged to peer
> at.  There should be something that'll give us a nesting level
> somewhere...  
> 
> Lockdep's handling of nesting is generally fun.

As I said, I'm just going to disable lockdep to shut up the warning and
not spend any further time on this. If someone else cares about it
(which I doubt) they can try to come up with a solution. I suspect
nested regmap-irq is extremely rare.

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-31 16:28               ` Russell King (Oracle)
@ 2025-07-31 17:03                 ` Mark Brown
  2025-07-31 19:20                   ` Russell King (Oracle)
  0 siblings, 1 reply; 12+ messages in thread
From: Mark Brown @ 2025-07-31 17:03 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Krzysztof Kozlowski, Chanwoo Choi, Alexandre Belloni,
	Linus Walleij, Bartosz Golaszewski, linux-gpio, linux-rtc,
	linux-kernel, Thierry Reding

On Thu, Jul 31, 2025 at 05:28:39PM +0100, Russell King (Oracle) wrote:
> On Thu, Jul 31, 2025 at 05:16:13PM +0100, Mark Brown wrote:

> > Yeah, and that's all internals which we're not super encouraged to peer
> > at.  There should be something that'll give us a nesting level
> > somewhere...  

> > Lockdep's handling of nesting is generally fun.

> As I said, I'm just going to disable lockdep to shut up the warning and
> not spend any further time on this. If someone else cares about it
> (which I doubt) they can try to come up with a solution. I suspect
> nested regmap-irq is extremely rare.

I'm pretty sure it's extremely rare, and I'll have to construct a
virtual setup to actually test.  After poking at it some more I think
we're actually going to need an explicit lock_class_key for each
regmap-irq rather than relying on the default lockdep one.  I'll try to
send out a patch for that today or tomorrow but likely not really tested
- if you could find time to give it a spin on the affected system that'd
be good, but if not no worries.  Thanks for the report and analysis.
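
For illustration only, here is a userspace sketch (a toy model, not the
patch itself and not kernel code) of why a key per regmap-irq instance
would help. It assumes a lock's class is identified by the address of its
key object, so a key shared by all chips gives one class while a key
embedded in each chip gives one class per chip. Every toy_* name is
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a lock class key: only its address matters. */
struct toy_key {
	char unused;
};

struct toy_chip {
	struct toy_key own_key;			/* per-instance key */
	const struct toy_key *lock_class;	/* class of this chip's mutex */
	struct toy_chip *parent;		/* parent chip, or NULL */
};

/* One key shared by every chip, like a static key at a single init site. */
static struct toy_key toy_shared_key;

static void toy_chip_init(struct toy_chip *chip, struct toy_chip *parent,
			  int per_instance_key)
{
	assert(chip != NULL);
	chip->parent = parent;
	chip->lock_class = per_instance_key ? &chip->own_key : &toy_shared_key;
}

/*
 * Walk from the leaf towards the root, as a propagated wake change would,
 * and count ancestors whose class matches a lock taken further down.
 */
static int toy_count_class_clashes(const struct toy_chip *leaf)
{
	int clashes = 0;

	for (const struct toy_chip *c = leaf->parent; c; c = c->parent) {
		for (const struct toy_chip *held = leaf; held != c;
		     held = held->parent) {
			if (held->lock_class == c->lock_class) {
				clashes++;
				break;
			}
		}
	}
	return clashes;
}

/* Two chained chips, modelled loosely on the rtc chip under the top chip. */
int toy_two_level_clashes(int per_instance_key)
{
	struct toy_chip top, rtc;

	toy_chip_init(&top, NULL, per_instance_key);
	toy_chip_init(&rtc, &top, per_instance_key);
	return toy_count_class_clashes(&rtc);
}
```

In this model toy_two_level_clashes(0) finds one clash with the shared
key, and toy_two_level_clashes(1) finds none once each chip has its own.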

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-31 17:03                 ` Mark Brown
@ 2025-07-31 19:20                   ` Russell King (Oracle)
  2025-07-31 19:50                     ` Mark Brown
  0 siblings, 1 reply; 12+ messages in thread
From: Russell King (Oracle) @ 2025-07-31 19:20 UTC (permalink / raw)
  To: Mark Brown
  Cc: Krzysztof Kozlowski, Chanwoo Choi, Alexandre Belloni,
	Linus Walleij, Bartosz Golaszewski, linux-gpio, linux-rtc,
	linux-kernel, Thierry Reding

On Thu, Jul 31, 2025 at 06:03:43PM +0100, Mark Brown wrote:
> On Thu, Jul 31, 2025 at 05:28:39PM +0100, Russell King (Oracle) wrote:
> > On Thu, Jul 31, 2025 at 05:16:13PM +0100, Mark Brown wrote:
> 
> > > Yeah, and that's all internals which we're not super encouraged to peer
> > > at.  There should be something that'll give us a nesting level
> > > somewhere...  
> 
> > > Lockdep's handling of nesting is generally fun.
> 
> > As I said, I'm just going to disable lockdep to shut up the warning and
> > not spend any further time on this. If someone else cares about it
> > (which I doubt) they can try to come up with a solution. I suspect
> > nested regmap-irq is extremely rare.
> 
> I'm pretty sure it's extremely rare, and I'll have to construct a
> virtual setup to actually test.  After poking at it some more I think
> we're actually going to need an explicit lock_class_key for each
> regmap-irq rather than relying on the default lockdep one.  I'll try to
> send out a patch for that today or tomorrow but likely not really tested
> - if you could find time to give it a spin on the affected system that'd
> be good, but if not no worries.  Thanks for the report and analysis.

I hope we don't have too many regmap-irq's in a system - see the
section on "Troubleshooting" in the lockdep documentation. There's
a limit on the number of classes over the entire kernel.

For reference, on the platform which provokes this lockdep splat,
we already have 1518 lock classes:

# grep "lock-classes" /proc/lockdep_stats
 lock-classes:                         1518 [max: 8192]

As I understand from the documentation, lock classes are create-only,
there's no way of "freeing" them later, so we better not get into a
situation where the number of classes steadily increases while the
system is running!

-- 
RMK's Patch system: https://www.armlinux.org.uk/developer/patches/
FTTP is here! 80Mbps down 10Mbps up. Decent connectivity at last!

* Re: [BUG] 6.16-rc7: lockdep failure with max77620-gpio/max77686-rtc
  2025-07-31 19:20                   ` Russell King (Oracle)
@ 2025-07-31 19:50                     ` Mark Brown
  0 siblings, 0 replies; 12+ messages in thread
From: Mark Brown @ 2025-07-31 19:50 UTC (permalink / raw)
  To: Russell King (Oracle)
  Cc: Krzysztof Kozlowski, Chanwoo Choi, Alexandre Belloni,
	Linus Walleij, Bartosz Golaszewski, linux-gpio, linux-rtc,
	linux-kernel, Thierry Reding

On Thu, Jul 31, 2025 at 08:20:51PM +0100, Russell King (Oracle) wrote:
> On Thu, Jul 31, 2025 at 06:03:43PM +0100, Mark Brown wrote:

> > I'm pretty sure it's extremely rare, and I'll have to construct a
> > virtual setup to actually test.  After poking at it some more I think
> > we're actually going to need an explicit lock_class_key for each
> > regmap-irq rather than relying on the default lockdep one.  I'll try to

> I hope we don't have too many regmap-irq's in a system - see the
> section on "Troubleshooting" in the lockdep documentation. There's
> a limit on the number of classes over the entire kernel.

Yeah, we shouldn't I'd hope but obviously there could be some use case
I'm not aware of that results in huge numbers in normal operation.

> As I understand from the documentation, lock classes are create-only,
> there's no way of "freeing" them later, so we better not get into a
> situation where the number of classes steadily increases while the
> system is running!

There is a free function, and it does actually seem to do something
useful these days - looking at the code and changelog the documentation
is bitrotted there, dynamic keys were added in 2019.
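
A small userspace sketch (a toy model, not kernel code) of why a working
free path matters: with a bounded table, keys that are registered but
never released would eventually exhaust it, whereas releasing them on
teardown keeps the count flat. It simplifies by treating a registered key
as an occupied slot straight away, the table size of 4 is deliberately
tiny, and every toy_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_KEYS 4	/* tiny on purpose; the stats above show max 8192 */

static const void *toy_keys[TOY_MAX_KEYS];

/* Claim a free slot for the key; returns 0 on success, -1 if full. */
int toy_register_key(const void *key)
{
	assert(key != NULL);
	for (size_t i = 0; i < TOY_MAX_KEYS; i++) {
		if (toy_keys[i] == NULL) {
			toy_keys[i] = key;
			return 0;
		}
	}
	return -1;
}

/* Release the slot held by the key, if any, so it can be reused. */
void toy_unregister_key(const void *key)
{
	for (size_t i = 0; i < TOY_MAX_KEYS; i++)
		if (toy_keys[i] == key)
			toy_keys[i] = NULL;
}

/* Number of slots currently occupied. */
int toy_keys_in_use(void)
{
	int used = 0;

	for (size_t i = 0; i < TOY_MAX_KEYS; i++)
		if (toy_keys[i] != NULL)
			used++;
	return used;
}
```

Registering a fifth key fails while four are in use, and succeeds again
after one of them is unregistered, so a chip that releases its key when it
is torn down does not steadily consume the table.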
