Lockdep problem

All of lore.kernel.org
 help / color / mirror / Atom feed

* Lockdep problem
@ 2008-03-08 17:10 Dmitry
  2008-03-08 17:30 ` Peter Zijlstra
  0 siblings, 1 reply; 6+ messages in thread
From: Dmitry @ 2008-03-08 17:10 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel

Hi,

I've ran into the lockdep problem: I'm hitting the condition at the
lockdep.c:2437:
       if (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
                return 0;

What does it mean and how to overcome it? Dumbly increasing constants
at kernel/lockdep_internals.h didn't help.

The warning is pretty fatal for me because it happens when I'm locking
clocks lock, so when serial console output calls
clk_enable/clk_disable the kernel deadlocks.

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Lockdep problem
  2008-03-08 17:10 Dmitry
@ 2008-03-08 17:30 ` Peter Zijlstra
  2008-03-08 19:00   ` Dmitry
  0 siblings, 1 reply; 6+ messages in thread
From: Peter Zijlstra @ 2008-03-08 17:30 UTC (permalink / raw)
  To: Dmitry; +Cc: mingo, linux-kernel

On Sat, 2008-03-08 at 20:10 +0300, Dmitry wrote:
> Hi,
> 
> I've ran into the lockdep problem: I'm hitting the condition at the
> lockdep.c:2437:
>        if (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
>                 return 0;
> 
> What does it mean and how to overcome it? Dumbly increasing constants
> at kernel/lockdep_internals.h didn't help.

It means you have more lockdep classes (lock instantiation sites,
explicit class keys etc..) that we have static storage reserved for.

One way to quickly achieve this is load/unload modules. Zapping the
classes on module unload does not make the space available again.

Sadly most setups these days try to load every module under the sun,
wasting valuable boot time, and causing this head-ache.

Someone reported to me increasing MAX_LOCKDEP_KEYS_BITS to 15 (IIRC)
made his distro build work again. Of course, cycle reloading a module
will eventually run you out of space anyway.

> The warning is pretty fatal for me because it happens when I'm locking
> clocks lock, so when serial console output calls
> clk_enable/clk_disable the kernel deadlocks.

Not sure I fully understand what you mean here, but I've never seen this
result in anything but a warning, the kernel will just continue..

Or are you trying to debug a deadlock and can't because lockdep runs out
of classes before it gets to the offending code?

Something you can do is build a kernel without (or very few modules) - a
good idea anyway as turning off all that unused stuff will significantly
improve build times too :-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Lockdep problem
  2008-03-08 17:30 ` Peter Zijlstra
@ 2008-03-08 19:00   ` Dmitry
  0 siblings, 0 replies; 6+ messages in thread
From: Dmitry @ 2008-03-08 19:00 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: mingo, linux-kernel

Hi,

2008/3/8, Peter Zijlstra <peterz@infradead.org>:
>
>  On Sat, 2008-03-08 at 20:10 +0300, Dmitry wrote:
>  > Hi,
>  >
>  > I've ran into the lockdep problem: I'm hitting the condition at the
>  > lockdep.c:2437:
>  >        if (DEBUG_LOCKS_WARN_ON(id >= MAX_LOCKDEP_KEYS))
>  >                 return 0;
>  >
>  > What does it mean and how to overcome it? Dumbly increasing constants
>  > at kernel/lockdep_internals.h didn't help.
>
>
> It means you have more lockdep classes (lock instantiation sites,
>  explicit class keys etc..) that we have static storage reserved for.
>
>  One way to quickly achieve this is load/unload modules. Zapping the
>  classes on module unload does not make the space available again.
>
>  Sadly most setups these days try to load every module under the sun,
>  wasting valuable boot time, and causing this head-ache.

Unfortunately this happens before modules loading --- it's kernel itself.

>
>  Someone reported to me increasing MAX_LOCKDEP_KEYS_BITS to 15 (IIRC)
>  made his distro build work again. Of course, cycle reloading a module
>  will eventually run you out of space anyway.
>
>
>  > The warning is pretty fatal for me because it happens when I'm locking
>  > clocks lock, so when serial console output calls
>  > clk_enable/clk_disable the kernel deadlocks.
>
>
> Not sure I fully understand what you mean here, but I've never seen this
>  result in anything but a warning, the kernel will just continue..
>
>  Or are you trying to debug a deadlock and can't because lockdep runs out
>  of classes before it gets to the offending code?

Actually no and yes. IIUC, the call chain looks like this (it's with
modified clk's functions, so
it may be a bit hard to understand. If you wish, search for "clocklib"
patches posted to LKML
nearly a month ago).
kernel -> clk_enable() takes global clock spinlock -> my function
which calls another spin_lock which causes lockdep warning.
If I disable global clocks spinlock locking, I get the lockdep
warning. Otherwise, I see no output. This seems to be related to the
fact that I have pxa serial console enabled which in turn also uses
clk_enable/clk_disable to manage its serial port.


>
>  Something you can do is build a kernel without (or very few modules) - a
>  good idea anyway as turning off all that unused stuff will significantly
>  improve build times too :-)

I have a plenty of modules configured to build, but as I hope this
doesn't count, since they aren't loaded

-- 
With best wishes
Dmitry

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Lockdep problem
@ 2014-01-07 18:28 Larry Finger
  2014-01-09 16:24 ` Stanislaw Gruszka
  0 siblings, 1 reply; 6+ messages in thread
From: Larry Finger @ 2014-01-07 18:28 UTC (permalink / raw)
  To: linux-wireless, LKML

A locking dependency problem has been reported for b43 at 
https://bugzilla.kernel.org/show_bug.cgi?id=67561. I was able to duplicate with 
the following:

1. Set network under manual, not NetworkManager, control.
2. Start hostapd
3. Use 'rfkill block all' to disable the access point

As my skills in fixing locking dependencies are minimal, I am hoping that 
someone here can help me. The resulting splat is as follows:

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.13.0-rc6-wl+ #34 Tainted: G           O
  -------------------------------------------------------
  rfkill/15379 is trying to acquire lock:
   (rtnl_mutex){+.+.+.}, at: [<ffffffff813d3232>] rtnl_lock+0x12/0x20

but task is already holding lock:
   (rfkill_global_mutex){+.+.+.}, at: [<ffffffff8145dd5a>] 
rfkill_fop_write+0x6a/0x180

which lock already depends on the new lock.

the existing dependency chain (in reverse order) is:

-> #4 (rfkill_global_mutex){+.+.+.}:
         [<ffffffff810994ea>] lock_acquire+0x9a/0x1d0
         [<ffffffff8146cd12>] mutex_lock_nested+0x62/0x3f0
         [<ffffffff8145e4e9>] rfkill_fop_open+0x89/0x200
         [<ffffffff812d9753>] misc_open+0xb3/0x170
         [<ffffffff8116827a>] chrdev_open+0x9a/0x1d0
         [<ffffffff81161e92>] do_dentry_open.isra.18+0x1a2/0x2a0
         [<ffffffff8116208b>] finish_open+0x2b/0x40
         [<ffffffff81172792>] do_last+0x572/0xdd0
         [<ffffffff811730a6>] path_openat+0xb6/0x6d0
         [<ffffffff81174785>] do_filp_open+0x35/0x80
         [<ffffffff811631b9>] do_sys_open+0x129/0x220
         [<ffffffff811632c9>] SyS_open+0x19/0x20
         [<ffffffff81471b34>] tracesys+0xdd/0xe2

-> #3 (misc_mtx){+.+.+.}:
         [<ffffffff810994ea>] lock_acquire+0x9a/0x1d0
         [<ffffffff8146cd12>] mutex_lock_nested+0x62/0x3f0
         [<ffffffff812d94a4>] misc_register+0x24/0x120
         [<ffffffffa000e139>] hwrng_register+0x109/0x1d0 [rng_core]
         [<ffffffffa0719990>] b43_wireless_core_init+0xdb0/0x1270 [b43]
         [<ffffffffa071a6c0>] b43_op_start+0x210/0x230 [b43]
         [<ffffffffa065c210>] ieee80211_do_open+0x350/0x17a0 [mac80211]
         [<ffffffffa065d6c1>] ieee80211_open+0x61/0x70 [mac80211]
         [<ffffffff813c6d0f>] __dev_open+0xbf/0x140
         [<ffffffff813c6ff8>] __dev_change_flags+0x98/0x170
         [<ffffffff813c70f4>] dev_change_flags+0x24/0x60
         [<ffffffff813d6251>] do_setlink+0x321/0x9a0
         [<ffffffff813d6d6f>] rtnl_newlink+0x37f/0x560
         [<ffffffff813d5cac>] rtnetlink_rcv_msg+0x8c/0x240
         [<ffffffff813eecf9>] netlink_rcv_skb+0xa9/0xc0
         [<ffffffff813d5c15>] rtnetlink_rcv+0x25/0x30
         [<ffffffff813ee5da>] netlink_unicast+0x13a/0x200
         [<ffffffff813eea7b>] netlink_sendmsg+0x33b/0x410
         [<ffffffff813a6a8a>] sock_sendmsg+0x6a/0x90
         [<ffffffff813a94d9>] ___sys_sendmsg+0x389/0x3a0
         [<ffffffff813aa26d>] __sys_sendmsg+0x3d/0x80
         [<ffffffff813aa2bd>] SyS_sendmsg+0xd/0x20
         [<ffffffff81471b34>] tracesys+0xdd/0xe2

-> #2 (rng_mutex){+.+.+.}:
         [<ffffffff810994ea>] lock_acquire+0x9a/0x1d0
         [<ffffffff8146cd12>] mutex_lock_nested+0x62/0x3f0
         [<ffffffffa000e05f>] hwrng_register+0x2f/0x1d0 [rng_core]
         [<ffffffffa0719990>] b43_wireless_core_init+0xdb0/0x1270 [b43]
         [<ffffffffa071a6c0>] b43_op_start+0x210/0x230 [b43]
         [<ffffffffa065c210>] ieee80211_do_open+0x350/0x17a0 [mac80211]
         [<ffffffffa065d6c1>] ieee80211_open+0x61/0x70 [mac80211]
         [<ffffffff813c6d0f>] __dev_open+0xbf/0x140
         [<ffffffff813c6ff8>] __dev_change_flags+0x98/0x170
         [<ffffffff813c70f4>] dev_change_flags+0x24/0x60
         [<ffffffff813d6251>] do_setlink+0x321/0x9a0
         [<ffffffff813d6d6f>] rtnl_newlink+0x37f/0x560
         [<ffffffff813d5cac>] rtnetlink_rcv_msg+0x8c/0x240
         [<ffffffff813eecf9>] netlink_rcv_skb+0xa9/0xc0
         [<ffffffff813d5c15>] rtnetlink_rcv+0x25/0x30
         [<ffffffff813ee5da>] netlink_unicast+0x13a/0x200
         [<ffffffff813eea7b>] netlink_sendmsg+0x33b/0x410
         [<ffffffff813a6a8a>] sock_sendmsg+0x6a/0x90
         [<ffffffff813a94d9>] ___sys_sendmsg+0x389/0x3a0
         [<ffffffff813aa26d>] __sys_sendmsg+0x3d/0x80
         [<ffffffff813aa2bd>] SyS_sendmsg+0xd/0x20
         [<ffffffff81471b34>] tracesys+0xdd/0xe2

-> #1 (&wl->mutex){+.+.+.}:
         [<ffffffff810994ea>] lock_acquire+0x9a/0x1d0
         [<ffffffff8146cd12>] mutex_lock_nested+0x62/0x3f0
         [<ffffffffa071a5a8>] b43_op_start+0xf8/0x230 [b43]
         [<ffffffffa065c210>] ieee80211_do_open+0x350/0x17a0 [mac80211]
         [<ffffffffa065d6c1>] ieee80211_open+0x61/0x70 [mac80211]
         [<ffffffff813c6d0f>] __dev_open+0xbf/0x140
         [<ffffffff813c6ff8>] __dev_change_flags+0x98/0x170
         [<ffffffff813c70f4>] dev_change_flags+0x24/0x60
         [<ffffffff813d6251>] do_setlink+0x321/0x9a0
         [<ffffffff813d6d6f>] rtnl_newlink+0x37f/0x560
         [<ffffffff813d5cac>] rtnetlink_rcv_msg+0x8c/0x240
         [<ffffffff813eecf9>] netlink_rcv_skb+0xa9/0xc0
         [<ffffffff813d5c15>] rtnetlink_rcv+0x25/0x30
         [<ffffffff813ee5da>] netlink_unicast+0x13a/0x200
         [<ffffffff813eea7b>] netlink_sendmsg+0x33b/0x410
         [<ffffffff813a6a8a>] sock_sendmsg+0x6a/0x90
         [<ffffffff813a94d9>] ___sys_sendmsg+0x389/0x3a0
         [<ffffffff813aa26d>] __sys_sendmsg+0x3d/0x80
         [<ffffffff813aa2bd>] SyS_sendmsg+0xd/0x20
         [<ffffffff81471b34>] tracesys+0xdd/0xe2

-> #0 (rtnl_mutex){+.+.+.}:
         [<ffffffff8109883a>] __lock_acquire+0x1a3a/0x1e60
         [<ffffffff810994ea>] lock_acquire+0x9a/0x1d0
         [<ffffffff8146cd12>] mutex_lock_nested+0x62/0x3f0
         [<ffffffff813d3232>] rtnl_lock+0x12/0x20
         [<ffffffffa056bb88>] cfg80211_rfkill_set_block.part.11+0x18/0x80 [cfg80211]
         [<ffffffffa056bc09>] cfg80211_rfkill_set_block+0x19/0x20 [cfg80211]
         [<ffffffff8145d84b>] rfkill_set_block+0x8b/0x140
         [<ffffffff8145ddc7>] rfkill_fop_write+0xd7/0x180
         [<ffffffff81163cdb>] vfs_write+0xab/0x1c0
         [<ffffffff81164634>] SyS_write+0x44/0xa0
         [<ffffffff81471b34>] tracesys+0xdd/0xe2

other info that might help us debug this:

  Chain exists of:
   rtnl_mutex --> misc_mtx --> rfkill_global_mutex

   Possible unsafe locking scenario:

         CPU0                    CPU1
         ----                    ----
    lock(rfkill_global_mutex);
                                 lock(misc_mtx);
                                 lock(rfkill_global_mutex);
    lock(rtnl_mutex);

  *** DEADLOCK ***

  1 lock held by rfkill/15379:
   #0:  (rfkill_global_mutex){+.+.+.}, at: [<ffffffff8145dd5a>] 
rfkill_fop_write+0x6a/0x180

stack backtrace:
  CPU: 0 PID: 15379 Comm: rfkill Tainted: G           O 3.13.0-rc6-wl+ #34
  Hardware name: Hewlett-Packard HP Pavilion dv2700 Notebook PC/30D6, BIOS F.27 
11/27/2008
   ffffffff81f857a0 ffff88005df3bc88 ffffffff8146a1a4 ffffffff81f7dac0
   ffff88005df3bcc8 ffffffff814672e4 ffff88005df3bd20 ffff8800b560d340
   0000000000000000 ffff8800b560d318 ffff8800b560d340 ffff8800b560cce0
  Call Trace:
   [<ffffffff8146a1a4>] dump_stack+0x4e/0x7a
   [<ffffffff814672e4>] print_circular_bug+0x2b0/0x2bf
   [<ffffffff8109883a>] __lock_acquire+0x1a3a/0x1e60
   [<ffffffff810994ea>] lock_acquire+0x9a/0x1d0
   [<ffffffff813d3232>] ? rtnl_lock+0x12/0x20
   [<ffffffff813d3232>] ? rtnl_lock+0x12/0x20
   [<ffffffff8146cd12>] mutex_lock_nested+0x62/0x3f0
   [<ffffffff813d3232>] ? rtnl_lock+0x12/0x20
   [<ffffffff810968ba>] ? mark_held_locks+0x8a/0x130
   [<ffffffff813d3232>] rtnl_lock+0x12/0x20
   [<ffffffffa056bb88>] cfg80211_rfkill_set_block.part.11+0x18/0x80 [cfg80211]
   [<ffffffffa056bc09>] cfg80211_rfkill_set_block+0x19/0x20 [cfg80211]
   [<ffffffff8145d84b>] rfkill_set_block+0x8b/0x140
   [<ffffffff8145ddc7>] rfkill_fop_write+0xd7/0x180
   [<ffffffff81163cdb>] vfs_write+0xab/0x1c0
   [<ffffffff81183220>] ? fget_light+0x320/0x4b0
   [<ffffffff81164634>] SyS_write+0x44/0xa0
   [<ffffffff81471b34>] tracesys+0xdd/0xe2

Thanks,

Larry

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: Lockdep problem
  2014-01-07 18:28 Lockdep problem Larry Finger
@ 2014-01-09 16:24 ` Stanislaw Gruszka
  2014-01-09 19:14   ` Larry Finger
  0 siblings, 1 reply; 6+ messages in thread
From: Stanislaw Gruszka @ 2014-01-09 16:24 UTC (permalink / raw)
  To: Larry Finger; +Cc: linux-wireless, LKML

On Tue, Jan 07, 2014 at 12:28:26PM -0600, Larry Finger wrote:
>   rtnl_mutex --> misc_mtx --> rfkill_global_mutex
> 
>   Possible unsafe locking scenario:
> 
>         CPU0                    CPU1
>         ----                    ----
>    lock(rfkill_global_mutex);
>                                 lock(misc_mtx);
>                                 lock(rfkill_global_mutex);
>    lock(rtnl_mutex);
> 
>  *** DEADLOCK ***

There are 3 mutexes dependency. The deadlock can happen if on another
cpu, let say CPU2, there will be sequence:

lock(rtnl_mutex);
lock(misc_mtx);

Then on deadlock scenario:
CPU0 waits for rtnl_mutex to unlock, keep rfkill_global_mutex locked
CPU1 waits for rfkill_global_mutex to unlock, keep misc_mtx locked
CPU2 waits for misc_mtx to unlock, keep rtnl_mutex locked.

This dependency can be broken by moving b43_rng_init() outside from
rtnl_mutex scope, like on below patch. IIUC b43 random number generator
works only if we already started network connection, but this should
not be a problem, as b43_rng_read() do not return any data if device
is not prepared.

You could also remove whole b43 rnd. Everyone know, that HW vendors
are influenced by NSA and they random generators do not provide truly
random numbers (just kidding ;-)

Stanislaw

diff --git a/drivers/net/wireless/b43/main.c b/drivers/net/wireless/b43/main.c
index ccd24f0a..2c153f9 100644
--- a/drivers/net/wireless/b43/main.c
+++ b/drivers/net/wireless/b43/main.c
@@ -2475,6 +2475,10 @@ start_ieee80211:
 		goto err_one_core_detach;
 	wl->hw_registred = true;
 	b43_leds_register(wl->current_dev);
+
+	/* Register HW RNG driver */
+	b43_rng_init(wl);
+
 	goto out;
 
 err_one_core_detach:
@@ -4636,9 +4640,6 @@ static void b43_wireless_core_exit(struct b43_wldev *dev)
 	if (!dev || b43_status(dev) != B43_STAT_INITIALIZED)
 		return;
 
-	/* Unregister HW RNG driver */
-	b43_rng_exit(dev->wl);
-
 	b43_set_status(dev, B43_STAT_UNINIT);
 
 	/* Stop the microcode PSM. */
@@ -4795,9 +4796,6 @@ static int b43_wireless_core_init(struct b43_wldev *dev)
 
 	b43_set_status(dev, B43_STAT_INITIALIZED);
 
-	/* Register HW RNG driver */
-	b43_rng_init(dev->wl);
-
 out:
 	return err;
 
@@ -5464,6 +5462,9 @@ static void b43_bcma_remove(struct bcma_device *core)
 
 	b43_one_core_detach(wldev->dev);
 
+	/* Unregister HW RNG driver */
+	b43_rng_exit(wl);
+
 	b43_leds_unregister(wl);
 
 	ieee80211_free_hw(wl->hw);



^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: Lockdep problem
  2014-01-09 16:24 ` Stanislaw Gruszka
@ 2014-01-09 19:14   ` Larry Finger
  0 siblings, 0 replies; 6+ messages in thread
From: Larry Finger @ 2014-01-09 19:14 UTC (permalink / raw)
  To: Stanislaw Gruszka; +Cc: linux-wireless, LKML

On 01/09/2014 10:24 AM, Stanislaw Gruszka wrote:
> On Tue, Jan 07, 2014 at 12:28:26PM -0600, Larry Finger wrote:
>>    rtnl_mutex --> misc_mtx --> rfkill_global_mutex
>>
>>    Possible unsafe locking scenario:
>>
>>          CPU0                    CPU1
>>          ----                    ----
>>     lock(rfkill_global_mutex);
>>                                  lock(misc_mtx);
>>                                  lock(rfkill_global_mutex);
>>     lock(rtnl_mutex);
>>
>>   *** DEADLOCK ***
>
> There are 3 mutexes dependency. The deadlock can happen if on another
> cpu, let say CPU2, there will be sequence:
>
> lock(rtnl_mutex);
> lock(misc_mtx);
>
> Then on deadlock scenario:
> CPU0 waits for rtnl_mutex to unlock, keep rfkill_global_mutex locked
> CPU1 waits for rfkill_global_mutex to unlock, keep misc_mtx locked
> CPU2 waits for misc_mtx to unlock, keep rtnl_mutex locked.
>
> This dependency can be broken by moving b43_rng_init() outside from
> rtnl_mutex scope, like on below patch. IIUC b43 random number generator
> works only if we already started network connection, but this should
> not be a problem, as b43_rng_read() do not return any data if device
> is not prepared.
>
> You could also remove whole b43 rnd. Everyone know, that HW vendors
> are influenced by NSA and they random generators do not provide truly
> random numbers (just kidding ;-)

Well, I have no idea to what extent NSA affects the RNGs in computers; however, 
I do like having that additional source of entropy. :)

Your analysis was spot on, and I will be submitting a patch shortly with you as 
author. I'll give the reporter on b.k.o a chance to test it, but it removed the 
splat here.

Thanks,

Larry



^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2014-01-09 19:14 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-01-07 18:28 Lockdep problem Larry Finger
2014-01-09 16:24 ` Stanislaw Gruszka
2014-01-09 19:14   ` Larry Finger
  -- strict thread matches above, loose matches on Subject: below --
2008-03-08 17:10 Dmitry
2008-03-08 17:30 ` Peter Zijlstra
2008-03-08 19:00   ` Dmitry

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.