* Question on lockdep and MAX_LOCK_DEPTH
@ 2013-02-06 1:10 Ben Greear
2013-02-06 1:54 ` Steven Rostedt
2013-02-06 4:26 ` Steven Rostedt
0 siblings, 2 replies; 17+ messages in thread
From: Ben Greear @ 2013-02-06 1:10 UTC (permalink / raw)
To: Linux Kernel Mailing List
I'm debugging something that could be my own bug in my wanlink module (but then
again, haven't seen this on 3.5 kernels, and my module is
basically un-changed since then).
I'm using lockdep, and keep seeing "BUG: MAX_LOCK_DEPTH too low".
I tried changing the depth to 96 from 48, but it still hits.
I also added debug_show_all_locks(); when this error
case hits. I was assuming that if lockdep depth is large,
there would be lots of locks held..but that does not
seem to be the case??
===============================
[ INFO: suspicious RCU usage. ]
3.7.6+ #4 Tainted: G C O
-------------------------------
/home/greearb/git/linux-3.7.dev.y/kernel/rcutree.c:360 Illegal idle entry in RCU read-side critical section.!
other info that might help us debug this:
RCU used illegally from idle CPU!
rcu_scheduler_active = 1, debug_locks = 1
RCU used illegally from extended quiescent state!
1 lock held by swapper/1/0:
#0: (rcu_read_lock){.+.+..}, at: [<ffffffff81489632>] rcu_read_lock+0x0/0x6f
stack backtrace:
Pid: 0, comm: swapper/1 Tainted: G C O 3.7.6+ #4
Call Trace:
<IRQ> [<ffffffff810cecbd>] lockdep_rcu_suspicious+0xfc/0x105
[<ffffffff810fe146>] rcu_eqs_enter_common+0xef/0x178
[<ffffffff810fe24e>] rcu_irq_exit+0x7f/0xb0
[<ffffffff8108edf5>] irq_exit+0xc0/0xcc
[<ffffffff8100b635>] do_IRQ+0x97/0xae
[<ffffffff815543b2>] common_interrupt+0x72/0x72
<EOI> [<ffffffff81011082>] ? mwait_idle+0x77/0x9f
[<ffffffff81011079>] ? mwait_idle+0x6e/0x9f
[<ffffffff81011379>] cpu_idle+0x68/0xbc
[<ffffffff8154ce84>] start_secondary+0x1fc/0x203
Showing all locks held in the system:
1 lock held by bash/2481:
#0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785
1 lock held by mingetty/2853:
#0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785
1 lock held by mingetty/2857:
#0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785
1 lock held by mingetty/2862:
#0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785
1 lock held by mingetty/2865:
#0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785
1 lock held by mingetty/2867:
#0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785
1 lock held by mingetty/2869:
#0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785
1 lock held by bash/3017:
#0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785
=============================================
BUG: MAX_LOCK_DEPTH too low, depth: 96 max: 96!
turning off the locking correctness validator.
Pid: 0, comm: swapper/1 Tainted: G C O 3.7.6+ #4
Call Trace:
<IRQ> [<ffffffff810d1e31>] __lock_acquire+0xe80/0xf06
[<ffffffff810cfbbe>] ? check_irq_usage+0xad/0xbe
[<ffffffff810d1f1b>] lock_acquire+0x64/0x81
[<ffffffff810acbf3>] ? cpu_cgroup_can_attach+0x66/0x66
[<ffffffff810acc26>] rcu_read_lock+0x33/0x6f
[<ffffffff810acbf3>] ? cpu_cgroup_can_attach+0x66/0x66
[<ffffffff810ad1a8>] ttwu_stat+0x60/0x11c
[<ffffffff810b1cbb>] try_to_wake_up+0x1e6/0x20b
[<ffffffff810b1ced>] default_wake_function+0xd/0xf
[<ffffffff810ab4c1>] __wake_up_common+0x49/0x7f
[<ffffffff810ac4e5>] __wake_up+0x34/0x48
[<ffffffffa05fce11>] handleRcvWlp+0xf4d/0xfd5 [wanlink]
[<ffffffff810d133a>] ? __lock_acquire+0x389/0xf06
[<ffffffffa05fd325>] wp_handle_rcv_pkt+0x1f5/0x243 [wanlink]
[<ffffffffa05fd51c>] handle_rcv+0x1a9/0x1dd [wanlink]
[<ffffffff8148a5f1>] deliver_skb+0x58/0x60
[<ffffffff8148ab00>] __netif_receive_skb+0x507/0x63c
[<ffffffff810100b6>] ? read_tsc+0x9/0x1b
[<ffffffff8148c939>] netif_receive_skb+0x56/0x8b
[<ffffffff8148cf64>] ? __napi_gro_receive+0xf3/0x102
[<ffffffff8148ca51>] napi_skb_finish+0x24/0x57
[<ffffffff8148d199>] napi_gro_receive+0x91/0x96
[<ffffffffa0231d31>] e1000_receive_skb+0x50/0x5b [e1000e]
[<ffffffffa023500c>] e1000_clean_rx_irq+0x29f/0x341 [e1000e]
[<ffffffffa02355c6>] e1000e_poll+0x7d/0x239 [e1000e]
[<ffffffff8148d31f>] net_rx_action+0xb2/0x1e2
[<ffffffff8108eeea>] ? __do_softirq+0x6d/0x17b
[<ffffffff8108ef28>] __do_softirq+0xab/0x17b
[<ffffffff810a6ed6>] ? hrtimer_interrupt+0x127/0x1e2
[<ffffffff8155a7fc>] call_softirq+0x1c/0x30
[<ffffffff8100bd46>] do_softirq+0x46/0x9e
[<ffffffff8108ed83>] irq_exit+0x4e/0xcc
[<ffffffff81027f73>] smp_apic_timer_interrupt+0x85/0x93
[<ffffffff8155a0b2>] apic_timer_interrupt+0x72/0x80
<EOI> [<ffffffff81554477>] ? retint_restore_args+0x13/0x13
[<ffffffff81011082>] ? mwait_idle+0x77/0x9f
[<ffffffff81011079>] ? mwait_idle+0x6e/0x9f
[<ffffffff81011379>] cpu_idle+0x68/0xbc
[<ffffffff8154ce84>] start_secondary+0x1fc/0x203
BUG: sleeping function called from invalid context at /home/greearb/git/linux-3.7.dev.y/kernel/mutex.c:269
in_atomic(): 0, irqs_disabled(): 0, pid: 1375, name: rs:main Q:Reg
INFO: lockdep is turned off.
Pid: 1375, comm: rs:main Q:Reg Tainted: G C O 3.7.6+ #4
Call Trace:
[<ffffffff810adc9d>] __might_sleep+0x1a2/0x1ab
[<ffffffff81551969>] mutex_lock_nested+0x20/0x3b
[<ffffffff8133e243>] process_output+0x24/0x52
[<ffffffff8133fba3>] n_tty_write+0x273/0x356
[<ffffffff810b1ce0>] ? try_to_wake_up+0x20b/0x20b
[<ffffffff8133db38>] tty_write+0x19b/0x23a
[<ffffffff8133f930>] ? n_tty_ioctl+0xc6/0xc6
[<ffffffff8133dc61>] redirected_tty_write+0x8a/0x95
[<ffffffff81158d6c>] vfs_write+0xa9/0x105
[<ffffffff81158ead>] sys_write+0x54/0x89
[<ffffffff81559569>] system_call_fastpath+0x16/0x1b
--
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc http://www.candelatech.com
^ permalink raw reply [flat|nested] 17+ messages in thread* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 1:10 Question on lockdep and MAX_LOCK_DEPTH Ben Greear @ 2013-02-06 1:54 ` Steven Rostedt 2013-02-06 2:17 ` Ben Greear 2013-02-06 2:26 ` Ben Greear 2013-02-06 4:26 ` Steven Rostedt 1 sibling, 2 replies; 17+ messages in thread From: Steven Rostedt @ 2013-02-06 1:54 UTC (permalink / raw) To: Ben Greear; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On Tue, Feb 05, 2013 at 05:10:37PM -0800, Ben Greear wrote: > I'm debugging something that could be my own bug in my wanlink module (but then > again, haven't seen this on 3.5 kernels, and my module is > basically un-changed since then). > > I'm using lockdep, and keep seeing "BUG: MAX_LOCK_DEPTH too low". > I tried changing the depth to 96 from 48, but it still hits. > > I also added debug_show_all_locks(); when this error > case hits. I was assuming that if lockdep depth is large, > there would be lots of locks held..but that does not > seem to be the case?? > > > > =============================== > [ INFO: suspicious RCU usage. ] > 3.7.6+ #4 Tainted: G C O > ------------------------------- > /home/greearb/git/linux-3.7.dev.y/kernel/rcutree.c:360 Illegal idle entry in RCU read-side critical section.! > > other info that might help us debug this: > > > RCU used illegally from idle CPU! > rcu_scheduler_active = 1, debug_locks = 1 > RCU used illegally from extended quiescent state! > 1 lock held by swapper/1/0: > #0: (rcu_read_lock){.+.+..}, at: [<ffffffff81489632>] rcu_read_lock+0x0/0x6f > > stack backtrace: > Pid: 0, comm: swapper/1 Tainted: G C O 3.7.6+ #4 current is the swapper task. > Call Trace: > <IRQ> [<ffffffff810cecbd>] lockdep_rcu_suspicious+0xfc/0x105 > [<ffffffff810fe146>] rcu_eqs_enter_common+0xef/0x178 > [<ffffffff810fe24e>] rcu_irq_exit+0x7f/0xb0 > [<ffffffff8108edf5>] irq_exit+0xc0/0xcc > [<ffffffff8100b635>] do_IRQ+0x97/0xae > [<ffffffff815543b2>] common_interrupt+0x72/0x72 > <EOI> [<ffffffff81011082>] ? mwait_idle+0x77/0x9f > [<ffffffff81011079>] ? mwait_idle+0x6e/0x9f > [<ffffffff81011379>] cpu_idle+0x68/0xbc > [<ffffffff8154ce84>] start_secondary+0x1fc/0x203 > > Showing all locks held in the system: > 1 lock held by bash/2481: > #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785 > 1 lock held by mingetty/2853: > #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785 > 1 lock held by mingetty/2857: > #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785 > 1 lock held by mingetty/2862: > #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785 > 1 lock held by mingetty/2865: > #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785 > 1 lock held by mingetty/2867: > #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785 > 1 lock held by mingetty/2869: > #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785 > 1 lock held by bash/3017: > #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff813400a3>] n_tty_read+0x28d/0x785 I'm not sure the swapper task is part of the 'do_each_thread()' loop. Perhaps what you want to do is add a: lockdep_print_held_locks(current); > > ============================================= > > BUG: MAX_LOCK_DEPTH too low, depth: 96 max: 96! > turning off the locking correctness validator. > Pid: 0, comm: swapper/1 Tainted: G C O 3.7.6+ #4 > Call Trace: > <IRQ> [<ffffffff810d1e31>] __lock_acquire+0xe80/0xf06 > [<ffffffff810cfbbe>] ? check_irq_usage+0xad/0xbe > [<ffffffff810d1f1b>] lock_acquire+0x64/0x81 > [<ffffffff810acbf3>] ? cpu_cgroup_can_attach+0x66/0x66 > [<ffffffff810acc26>] rcu_read_lock+0x33/0x6f > [<ffffffff810acbf3>] ? cpu_cgroup_can_attach+0x66/0x66 > [<ffffffff810ad1a8>] ttwu_stat+0x60/0x11c > [<ffffffff810b1cbb>] try_to_wake_up+0x1e6/0x20b > [<ffffffff810b1ced>] default_wake_function+0xd/0xf > [<ffffffff810ab4c1>] __wake_up_common+0x49/0x7f > [<ffffffff810ac4e5>] __wake_up+0x34/0x48 > [<ffffffffa05fce11>] handleRcvWlp+0xf4d/0xfd5 [wanlink] > [<ffffffff810d133a>] ? __lock_acquire+0x389/0xf06 I'm curious. Does your code grab a read lock? If you grab the same read lock multiple times it adds to the list each time. Thus if you have anything like: for (i = 0; i < 100; i++ ) { read_lock(&lock); } for (i = 0; i < 100; i++) { read_unlock(&lock); } That will fill up the held locks quite a bit. The above code I showed is ridiculous and I doubt you have it, but if you have something that does lots of recursive reads for some reason, that could be an issue. -- Steve > [<ffffffffa05fd325>] wp_handle_rcv_pkt+0x1f5/0x243 [wanlink] > [<ffffffffa05fd51c>] handle_rcv+0x1a9/0x1dd [wanlink] > [<ffffffff8148a5f1>] deliver_skb+0x58/0x60 > [<ffffffff8148ab00>] __netif_receive_skb+0x507/0x63c > [<ffffffff810100b6>] ? read_tsc+0x9/0x1b > [<ffffffff8148c939>] netif_receive_skb+0x56/0x8b > [<ffffffff8148cf64>] ? __napi_gro_receive+0xf3/0x102 > [<ffffffff8148ca51>] napi_skb_finish+0x24/0x57 > [<ffffffff8148d199>] napi_gro_receive+0x91/0x96 > [<ffffffffa0231d31>] e1000_receive_skb+0x50/0x5b [e1000e] > [<ffffffffa023500c>] e1000_clean_rx_irq+0x29f/0x341 [e1000e] > [<ffffffffa02355c6>] e1000e_poll+0x7d/0x239 [e1000e] > [<ffffffff8148d31f>] net_rx_action+0xb2/0x1e2 > [<ffffffff8108eeea>] ? __do_softirq+0x6d/0x17b > [<ffffffff8108ef28>] __do_softirq+0xab/0x17b > [<ffffffff810a6ed6>] ? hrtimer_interrupt+0x127/0x1e2 > [<ffffffff8155a7fc>] call_softirq+0x1c/0x30 > [<ffffffff8100bd46>] do_softirq+0x46/0x9e > [<ffffffff8108ed83>] irq_exit+0x4e/0xcc > [<ffffffff81027f73>] smp_apic_timer_interrupt+0x85/0x93 > [<ffffffff8155a0b2>] apic_timer_interrupt+0x72/0x80 > <EOI> [<ffffffff81554477>] ? retint_restore_args+0x13/0x13 > [<ffffffff81011082>] ? mwait_idle+0x77/0x9f > [<ffffffff81011079>] ? mwait_idle+0x6e/0x9f > [<ffffffff81011379>] cpu_idle+0x68/0xbc > [<ffffffff8154ce84>] start_secondary+0x1fc/0x203 > BUG: sleeping function called from invalid context at /home/greearb/git/linux-3.7.dev.y/kernel/mutex.c:269 > in_atomic(): 0, irqs_disabled(): 0, pid: 1375, name: rs:main Q:Reg > INFO: lockdep is turned off. > Pid: 1375, comm: rs:main Q:Reg Tainted: G C O 3.7.6+ #4 > Call Trace: > [<ffffffff810adc9d>] __might_sleep+0x1a2/0x1ab > [<ffffffff81551969>] mutex_lock_nested+0x20/0x3b > [<ffffffff8133e243>] process_output+0x24/0x52 > [<ffffffff8133fba3>] n_tty_write+0x273/0x356 > [<ffffffff810b1ce0>] ? try_to_wake_up+0x20b/0x20b > [<ffffffff8133db38>] tty_write+0x19b/0x23a > [<ffffffff8133f930>] ? n_tty_ioctl+0xc6/0xc6 > [<ffffffff8133dc61>] redirected_tty_write+0x8a/0x95 > [<ffffffff81158d6c>] vfs_write+0xa9/0x105 > [<ffffffff81158ead>] sys_write+0x54/0x89 > [<ffffffff81559569>] system_call_fastpath+0x16/0x1b > > -- > Ben Greear <greearb@candelatech.com> > Candela Technologies Inc http://www.candelatech.com > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 1:54 ` Steven Rostedt @ 2013-02-06 2:17 ` Ben Greear 2013-02-06 2:26 ` Ben Greear 1 sibling, 0 replies; 17+ messages in thread From: Ben Greear @ 2013-02-06 2:17 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On 02/05/2013 05:54 PM, Steven Rostedt wrote: > I'm not sure the swapper task is part of the 'do_each_thread()' loop. > > Perhaps what you want to do is add a: > > lockdep_print_held_locks(current); I'll add that and test... > I'm curious. Does your code grab a read lock? If you grab the same read > lock multiple times it adds to the list each time. Thus if you have > anything like: > > for (i = 0; i < 100; i++ ) { > read_lock(&lock); > } > > for (i = 0; i < 100; i++) { > read_unlock(&lock); > } > > That will fill up the held locks quite a bit. > > The above code I showed is ridiculous and I doubt you have it, but if > you have something that does lots of recursive reads for some reason, > that could be an issue. I have only one read/write lock in my module, and it looks like I always lock it as a writer (will fix that soon for performance reasons, but it should be valid locking I think). I have no rcu locks at all in my module currently. I've seen similar lockups on another machine that does not use this module, but it uses a hacked up pktgen. I haven't found a test case that reproduces this on a clean upstream build, but I am still fairly suspicious that it isn't my code. Famous last words I'm sure :) Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 1:54 ` Steven Rostedt 2013-02-06 2:17 ` Ben Greear @ 2013-02-06 2:26 ` Ben Greear 2013-02-06 2:52 ` Steven Rostedt 1 sibling, 1 reply; 17+ messages in thread From: Ben Greear @ 2013-02-06 2:26 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On 02/05/2013 05:54 PM, Steven Rostedt wrote: > On Tue, Feb 05, 2013 at 05:10:37PM -0800, Ben Greear wrote: >> I'm debugging something that could be my own bug in my wanlink module (but then >> again, haven't seen this on 3.5 kernels, and my module is >> basically un-changed since then). >> >> I'm using lockdep, and keep seeing "BUG: MAX_LOCK_DEPTH too low". >> I tried changing the depth to 96 from 48, but it still hits. >> >> I also added debug_show_all_locks(); when this error >> case hits. I was assuming that if lockdep depth is large, >> there would be lots of locks held..but that does not >> seem to be the case?? > Perhaps what you want to do is add a: > > lockdep_print_held_locks(current); Well, here it is..something is calling rcu_read_lock lots and lots, it seems. Any way to get a better idea of where those calls are made? Showing all locks held in the system: 1 lock held by mingetty/2868: #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8132d503>] n_tty_read+0x28d/0x785 1 lock held by mingetty/2870: #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8132d503>] n_tty_read+0x28d/0x785 1 lock held by mingetty/2872: #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8132d503>] n_tty_read+0x28d/0x785 1 lock held by mingetty/2874: #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8132d503>] n_tty_read+0x28d/0x785 1 lock held by mingetty/2876: #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8132d503>] n_tty_read+0x28d/0x785 1 lock held by mingetty/2878: #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8132d503>] n_tty_read+0x28d/0x785 1 lock held by bash/3337: #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8132d503>] n_tty_read+0x28d/0x785 1 lock held by bash/3369: #0: (&tty->atomic_read_lock){+.+...}, at: [<ffffffff8132d503>] n_tty_read+0x28d/0x785 ============================================= 96 locks held by swapper/0/0: #0: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #1: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #2: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #3: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #4: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #5: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #6: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #7: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #8: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #9: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #10: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #11: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #12: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #13: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #14: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #15: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #16: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #17: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #18: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #19: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #20: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #21: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #22: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #23: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #24: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #25: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #26: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #27: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #28: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #29: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #30: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #31: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #32: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #33: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #34: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #35: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #36: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #37: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #38: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #39: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #40: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #41: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #42: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #43: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #44: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #45: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #46: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #47: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #48: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #49: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #50: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #51: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #52: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #53: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #54: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #55: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #56: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #57: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #58: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #59: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #60: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #61: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #62: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #63: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #64: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #65: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #66: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #67: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #68: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #69: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #70: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #71: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #72: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #73: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #74: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #75: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #76: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #77: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #78: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #79: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #80: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #81: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #82: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #83: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #84: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #85: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #86: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #87: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #88: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #89: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #90: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #91: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #92: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f #93: (&(&wl->cfg_spin_lock)->rlock){..-...}, at: [<ffffffffa07bc4d0>] handle_rcv+0x15d/0x1dd [wanlink] #94: (&wl_threads[q].my_wq){..-...}, at: [<ffffffff810abe4d>] __wake_up+0x1d/0x48 #95: (&p->pi_lock){-.-.-.}, at: [<ffffffff810b09c1>] try_to_wake_up+0x29/0x20b BUG: MAX_LOCK_DEPTH too low, depth: 96 max: 96! turning off the locking correctness validator. Pid: 0, comm: swapper/0 Tainted: G C O 3.7.6+ #6 Call Trace: <IRQ> [<ffffffff810cda86>] __lock_acquire+0xe8d/0xf13 [<ffffffff810cc3e8>] ? mark_lock+0x2d/0x231 [<ffffffff810cdb70>] lock_acquire+0x64/0x81 [<ffffffff810b09c1>] ? try_to_wake_up+0x29/0x20b [<ffffffff81540435>] _raw_spin_lock_irqsave+0x4e/0x60 [<ffffffff810b09c1>] ? try_to_wake_up+0x29/0x20b [<ffffffff810b09c1>] try_to_wake_up+0x29/0x20b [<ffffffff810b0bb0>] default_wake_function+0xd/0xf [<ffffffff810ab34d>] __wake_up_common+0x49/0x7f [<ffffffff810abe64>] __wake_up+0x34/0x48 [<ffffffffa07bbe11>] handleRcvWlp+0xf4d/0xfd5 [wanlink] [<ffffffff810ccf7c>] ? __lock_acquire+0x383/0xf13 [<ffffffffa07bc325>] wp_handle_rcv_pkt+0x1f5/0x243 [wanlink] [<ffffffffa07bc51c>] handle_rcv+0x1a9/0x1dd [wanlink] [<ffffffff814777f5>] deliver_skb+0x58/0x60 [<ffffffff81477d04>] __netif_receive_skb+0x507/0x63c [<ffffffff810100b6>] ? read_tsc+0x9/0x1b [<ffffffff81479b3d>] netif_receive_skb+0x56/0x8b [<ffffffff8147a168>] ? __napi_gro_receive+0xf3/0x102 [<ffffffff81479c55>] napi_skb_finish+0x24/0x57 [<ffffffff8147a39d>] napi_gro_receive+0x91/0x96 [<ffffffffa0218d31>] e1000_receive_skb+0x50/0x5b [e1000e] [<ffffffffa021c00c>] e1000_clean_rx_irq+0x29f/0x341 [e1000e] [<ffffffffa021c5c6>] e1000e_poll+0x7d/0x239 [e1000e] [<ffffffff8147a523>] net_rx_action+0xb2/0x1e2 [<ffffffff8108ee06>] ? __do_softirq+0x6d/0x17b [<ffffffff8108ee44>] __do_softirq+0xab/0x17b [<ffffffff815473bc>] call_softirq+0x1c/0x30 [<ffffffff8100bd46>] do_softirq+0x46/0x9e [<ffffffff8108ec9f>] irq_exit+0x4e/0xcc [<ffffffff8100b635>] do_IRQ+0x97/0xae [<ffffffff81540f72>] common_interrupt+0x72/0x72 <EOI> [<ffffffff81011082>] ? mwait_idle+0x77/0x9f [<ffffffff81011079>] ? mwait_idle+0x6e/0x9f [<ffffffff81011379>] cpu_idle+0x68/0xbc [<ffffffff81522b4c>] rest_init+0x130/0x137 [<ffffffff81522a1c>] ? csum_partial_copy_generic+0x16c/0x16c [<ffffffff81ac3c87>] start_kernel+0x37f/0x38c [<ffffffff81ac373d>] ? repair_env_string+0x58/0x58 [<ffffffff81ac32dd>] x86_64_start_reservations+0xb8/0xbd [<ffffffff81ac33e3>] x86_64_start_kernel+0x101/0x110 BUG: sleeping function called from invalid context at /home/greearb/git/linux-3.7.dev.y/kernel/mutex.c:269 in_atomic(): 0, irqs_disabled(): 0, pid: 2476, name: kwanlink/0 INFO: lockdep is turned off. Pid: 2476, comm: kwanlink/0 Tainted: G C O 3.7.6+ #6 Call Trace: [<ffffffff810ac9b0>] __might_sleep+0x1a2/0x1ab [<ffffffff8153e549>] mutex_lock_nested+0x20/0x3b [<ffffffffa07bd4ac>] do_task+0x6d/0x248 [wanlink] [<ffffffffa07bd7e0>] wl_thread_task+0x159/0x202 [wanlink] [<ffffffffa07bd687>] ? do_task+0x248/0x248 [wanlink] [<ffffffff810a350b>] kthread+0xc2/0xca [<ffffffff810a3449>] ? __init_kthread_worker+0x56/0x56 [<ffffffff8154607c>] ret_from_fork+0x7c/0xb0 [<ffffffff810a3449>] ? __init_kthread_worker+0x56/0x56 BUG: sleeping function called from invalid context at /home/greearb/git/linux-3.7.dev.y/kernel/mutex.c:269 -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 2:26 ` Ben Greear @ 2013-02-06 2:52 ` Steven Rostedt 2013-02-06 3:30 ` Ben Greear 0 siblings, 1 reply; 17+ messages in thread From: Steven Rostedt @ 2013-02-06 2:52 UTC (permalink / raw) To: Ben Greear; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On Tue, 2013-02-05 at 18:26 -0800, Ben Greear wrote: > Well, here it is..something is calling rcu_read_lock lots and lots, Or a bug in the way lockdep handles rcu mappings. > it seems. Any way to get a better idea of where those calls are > made? Yeah, with ftrace. > 96 locks held by swapper/0/0: > #0: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f > #1: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f [...] > #92: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f > #93: (&(&wl->cfg_spin_lock)->rlock){..-...}, at: [<ffffffffa07bc4d0>] handle_rcv+0x15d/0x1dd [wanlink] > #94: (&wl_threads[q].my_wq){..-...}, at: [<ffffffff810abe4d>] __wake_up+0x1d/0x48 > #95: (&p->pi_lock){-.-.-.}, at: [<ffffffff810b09c1>] try_to_wake_up+0x29/0x20b If you haven't already configured ftrace into your kernel, can you please do so. Specifically: CONFIG_FUNCTION_TRACER=y CONFIG_FUNCTION_GRAPH_TRACER=y CONFIG_DYNAMIC_FTRACE=y Then, before triggering this, run the following as root: # mount -t debugfs nodev /sys/kernel/debug # cd /sys/kernel/debug/tracing # echo net_rx_action > set_graph_function # echo function_graph > current_tracer In the kernel, where you added the above dump, before any of the printks happen, add this too: trace_printk("BUG\n"); tracing_off(); This will stop the trace at the point of the error. The trace_printk() is a nice way to see the trace too. Then after you trigger the bug, do the following: cat /sys/kernel/debug/tracing/trace and reply with that. To turn tracing on again (because the tracing_off() disabled it), you can do: echo 1 > /sys/kernel/debug/tracing/tracing_on Careful not to do "echo 1> ..." ;-) -- Steve ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 2:52 ` Steven Rostedt @ 2013-02-06 3:30 ` Ben Greear 2013-02-06 4:36 ` Steven Rostedt 0 siblings, 1 reply; 17+ messages in thread From: Ben Greear @ 2013-02-06 3:30 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On 02/05/2013 06:52 PM, Steven Rostedt wrote: > On Tue, 2013-02-05 at 18:26 -0800, Ben Greear wrote: >> Well, here it is..something is calling rcu_read_lock lots and lots, > > Or a bug in the way lockdep handles rcu mappings. > >> it seems. Any way to get a better idea of where those calls are >> made? > > Yeah, with ftrace. > >> 96 locks held by swapper/0/0: >> #0: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f >> #1: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f > [...] >> #92: (rcu_read_lock){.+.+..}, at: [<ffffffff81476836>] rcu_read_lock+0x0/0x6f >> #93: (&(&wl->cfg_spin_lock)->rlock){..-...}, at: [<ffffffffa07bc4d0>] handle_rcv+0x15d/0x1dd [wanlink] >> #94: (&wl_threads[q].my_wq){..-...}, at: [<ffffffff810abe4d>] __wake_up+0x1d/0x48 >> #95: (&p->pi_lock){-.-.-.}, at: [<ffffffff810b09c1>] try_to_wake_up+0x29/0x20b > > If you haven't already configured ftrace into your kernel, can you > please do so. Specifically: > > CONFIG_FUNCTION_TRACER=y > CONFIG_FUNCTION_GRAPH_TRACER=y > CONFIG_DYNAMIC_FTRACE=y > > Then, before triggering this, run the following as root: > > # mount -t debugfs nodev /sys/kernel/debug > # cd /sys/kernel/debug/tracing > # echo net_rx_action > set_graph_function > # echo function_graph > current_tracer > > In the kernel, where you added the above dump, before any of the printks > happen, add this too: > > trace_printk("BUG\n"); > tracing_off(); > > This will stop the trace at the point of the error. The trace_printk() > is a nice way to see the trace too. > > Then after you trigger the bug, do the following: > > cat /sys/kernel/debug/tracing/trace > > and reply with that. It's huge, so here's a link: http://www.candelatech.com/~greearb/debug.tgz Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 3:30 ` Ben Greear @ 2013-02-06 4:36 ` Steven Rostedt 2013-02-06 6:23 ` Ben Greear 0 siblings, 1 reply; 17+ messages in thread From: Steven Rostedt @ 2013-02-06 4:36 UTC (permalink / raw) To: Ben Greear; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On Tue, 2013-02-05 at 19:30 -0800, Ben Greear wrote: > It's huge, so here's a link: > > http://www.candelatech.com/~greearb/debug.tgz > The trace shows that __netif_receive_skb() is grabbing an rcu_read_lock() but never releasing it. But I don't see any possible way that can be true in the code. Can you apply the following patch and run the trace again. I'd like to see if the code is going in the path I expect it is. Thanks, -- Steve diff --git a/net/core/dev.c b/net/core/dev.c index e5942bf..5b12ff5 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3260,6 +3260,7 @@ static int __netif_receive_skb(struct sk_buff *skb) pt_prev = NULL; rcu_read_lock(); + trace_printk("lock\n"); another_round: skb->skb_iif = skb->dev->ifindex; @@ -3314,6 +3315,7 @@ ncls: } rx_handler = rcu_dereference(skb->dev->rx_handler); + trace_printk("handler %pS\n", rx_handler); if (rx_handler) { if (pt_prev) { ret = deliver_skb(skb, pt_prev, orig_dev); @@ -3359,6 +3361,7 @@ ncls: } else { drop: atomic_long_inc(&skb->dev->rx_dropped); + trace_printk("kfree\n"); kfree_skb(skb); /* Jamal, now you will not able to escape explaining * me how you were going to use this. :-) @@ -3367,6 +3370,7 @@ drop: } unlock: + trace_printk("unlock\n"); rcu_read_unlock(); out: tsk_restore_flags(current, pflags, PF_MEMALLOC); ^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 4:36 ` Steven Rostedt @ 2013-02-06 6:23 ` Ben Greear 2013-02-06 13:21 ` Steven Rostedt 0 siblings, 1 reply; 17+ messages in thread From: Ben Greear @ 2013-02-06 6:23 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On 02/05/2013 08:36 PM, Steven Rostedt wrote: > On Tue, 2013-02-05 at 19:30 -0800, Ben Greear wrote: > >> It's huge, so here's a link: >> >> http://www.candelatech.com/~greearb/debug.tgz >> > > The trace shows that __netif_receive_skb() is grabbing an > rcu_read_lock() but never releasing it. But I don't see any possible way > that can be true in the code. > > Can you apply the following patch and run the trace again. I'd like to > see if the code is going in the path I expect it is. I'll look at this in detail tomorrow. I have a few patches in dev.c code, maybe I merged something badly.... Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 6:23 ` Ben Greear @ 2013-02-06 13:21 ` Steven Rostedt 2013-02-06 15:56 ` Ben Greear 0 siblings, 1 reply; 17+ messages in thread From: Steven Rostedt @ 2013-02-06 13:21 UTC (permalink / raw) To: Ben Greear; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On Tue, 2013-02-05 at 22:23 -0800, Ben Greear wrote: > On 02/05/2013 08:36 PM, Steven Rostedt wrote: > > On Tue, 2013-02-05 at 19:30 -0800, Ben Greear wrote: > > > >> It's huge, so here's a link: > >> > >> http://www.candelatech.com/~greearb/debug.tgz > >> > > > > The trace shows that __netif_receive_skb() is grabbing an > > rcu_read_lock() but never releasing it. But I don't see any possible way > > that can be true in the code. > > > > Can you apply the following patch and run the trace again. I'd like to > > see if the code is going in the path I expect it is. > > I'll look at this in detail tomorrow. I have a few patches in dev.c code, maybe > I merged something badly.... I wish you told me that in the beginning. That you modified the core code. In that case I would have asked to see your modifications before anything else. I'm 99% sure that the bug is in your modifications. -- Steve ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 13:21 ` Steven Rostedt @ 2013-02-06 15:56 ` Ben Greear 2013-02-06 16:07 ` Steven Rostedt 0 siblings, 1 reply; 17+ messages in thread From: Ben Greear @ 2013-02-06 15:56 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On 02/06/2013 05:21 AM, Steven Rostedt wrote: > On Tue, 2013-02-05 at 22:23 -0800, Ben Greear wrote: >> On 02/05/2013 08:36 PM, Steven Rostedt wrote: >>> On Tue, 2013-02-05 at 19:30 -0800, Ben Greear wrote: >>> >>>> It's huge, so here's a link: >>>> >>>> http://www.candelatech.com/~greearb/debug.tgz >>>> >>> >>> The trace shows that __netif_receive_skb() is grabbing an >>> rcu_read_lock() but never releasing it. But I don't see any possible way >>> that can be true in the code. >>> >>> Can you apply the following patch and run the trace again. I'd like to >>> see if the code is going in the path I expect it is. >> >> I'll look at this in detail tomorrow. I have a few patches in dev.c code, maybe >> I merged something badly.... > > I wish you told me that in the beginning. That you modified the core > code. In that case I would have asked to see your modifications before > anything else. > > I'm 99% sure that the bug is in your modifications. I'm sorry, I tried to make that clear. My tree is here, minus a few debugging patches related to this bug: http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-3.7.dev.y/.git;a=summary It's rebased on top of 3.7.6. The wanlink module I'm using is outside the tree. I recently decided to release this as GPL, so ignore any conflicting copyright notices to the contrary in the code: http://www.candelatech.com/~greearb/wanlink.tgz The wanlink module is a network impairment tool that acts like a 2-port bridge. In my current test case, I'm bridging about 300Mbps of traffic across two ethernet ports. It blows up immediately after starting the emulation. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 15:56 ` Ben Greear @ 2013-02-06 16:07 ` Steven Rostedt 2013-02-06 16:39 ` Ingo Molnar 2013-02-06 16:58 ` Ben Greear 0 siblings, 2 replies; 17+ messages in thread From: Steven Rostedt @ 2013-02-06 16:07 UTC (permalink / raw) To: Ben Greear; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On Wed, 2013-02-06 at 07:56 -0800, Ben Greear wrote: > > I'm 99% sure that the bug is in your modifications. > > I'm sorry, I tried to make that clear. You said it was an out of tree module, I didn't realize it had changes to the core Linux as well. > My tree is here, minus a few debugging patches related to this bug: > > http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-3.7.dev.y/.git;a=summary Here's your bug: 3319 list_for_each_entry_rcu(ptype, &ptype_all, list) { 3320 if (!ptype->dev || ptype->dev == skb->dev) { 3321 if (pt_prev) { 3322 ret = deliver_skb(skb, pt_prev, orig_dev); 3323 if (ret == NET_RX_CONSUMED) { 3324 kfree_skb(skb); /* we made a copy in deliver_skb */ 3325 goto out; 3326 } 3327 } One of many. All those "goto out;"s skip the rcu_read_unlock(). -- Steve ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 16:07 ` Steven Rostedt @ 2013-02-06 16:39 ` Ingo Molnar 2013-02-06 16:46 ` Steven Rostedt 2013-02-06 16:58 ` Ben Greear 1 sibling, 1 reply; 17+ messages in thread From: Ingo Molnar @ 2013-02-06 16:39 UTC (permalink / raw) To: Steven Rostedt; +Cc: Ben Greear, Linux Kernel Mailing List, Peter Zijlstra * Steven Rostedt <rostedt@goodmis.org> wrote: > On Wed, 2013-02-06 at 07:56 -0800, Ben Greear wrote: > > > > I'm 99% sure that the bug is in your modifications. > > > > I'm sorry, I tried to make that clear. > > You said it was an out of tree module, I didn't realize it had changes > to the core Linux as well. > > > My tree is here, minus a few debugging patches related to > > this bug: > > > > http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-3.7.dev.y/.git;a=summary > > Here's your bug: > > 3319 list_for_each_entry_rcu(ptype, &ptype_all, list) { > 3320 if (!ptype->dev || ptype->dev == skb->dev) { > 3321 if (pt_prev) { > 3322 ret = deliver_skb(skb, pt_prev, orig_dev); > 3323 if (ret == NET_RX_CONSUMED) { > 3324 kfree_skb(skb); /* we made a copy in deliver_skb */ > 3325 goto out; > 3326 } > 3327 } > > One of many. All those "goto out;"s skip the rcu_read_unlock(). Wondering whether we could improve lockdep to detect 'excessive' lock nesting and treat it as a bug - the 'ran out of lockdep entries' assert might be too detached and it might even trigger in innocent, low lesting count contexts. Thanks, Ingo ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 16:39 ` Ingo Molnar @ 2013-02-06 16:46 ` Steven Rostedt 0 siblings, 0 replies; 17+ messages in thread From: Steven Rostedt @ 2013-02-06 16:46 UTC (permalink / raw) To: Ingo Molnar; +Cc: Ben Greear, Linux Kernel Mailing List, Peter Zijlstra On Wed, 2013-02-06 at 17:39 +0100, Ingo Molnar wrote: > * Steven Rostedt <rostedt@goodmis.org> wrote: > Wondering whether we could improve lockdep to detect 'excessive' > lock nesting and treat it as a bug - the 'ran out of lockdep > entries' assert might be too detached and it might even trigger > in innocent, low lesting count contexts. Probably adding a call to lockdep_printk_held_locks(current) when it triggers, would be a good idea. -- Steve ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 16:07 ` Steven Rostedt 2013-02-06 16:39 ` Ingo Molnar @ 2013-02-06 16:58 ` Ben Greear 2013-02-06 17:03 ` Steven Rostedt 1 sibling, 1 reply; 17+ messages in thread From: Ben Greear @ 2013-02-06 16:58 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On 02/06/2013 08:07 AM, Steven Rostedt wrote: > On Wed, 2013-02-06 at 07:56 -0800, Ben Greear wrote: > >>> I'm 99% sure that the bug is in your modifications. >> >> I'm sorry, I tried to make that clear. > > You said it was an out of tree module, I didn't realize it had changes > to the core Linux as well. > > >> My tree is here, minus a few debugging patches related to this bug: >> >> http://dmz2.candelatech.com/git/gitweb.cgi?p=linux-3.7.dev.y/.git;a=summary > > Here's your bug: > > 3319 list_for_each_entry_rcu(ptype, &ptype_all, list) { > 3320 if (!ptype->dev || ptype->dev == skb->dev) { > 3321 if (pt_prev) { > 3322 ret = deliver_skb(skb, pt_prev, orig_dev); > 3323 if (ret == NET_RX_CONSUMED) { > 3324 kfree_skb(skb); /* we made a copy in deliver_skb */ > 3325 goto out; > 3326 } > 3327 } > > One of many. All those "goto out;"s skip the rcu_read_unlock(). Thank you so much! I'll fix this and continue testing... I can also post a patch to print the held locks when the max-lock-depth overflows unless someone else prefers to... Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 16:58 ` Ben Greear @ 2013-02-06 17:03 ` Steven Rostedt 0 siblings, 0 replies; 17+ messages in thread From: Steven Rostedt @ 2013-02-06 17:03 UTC (permalink / raw) To: Ben Greear; +Cc: Linux Kernel Mailing List, Peter Zijlstra, Ingo Molnar On Wed, 2013-02-06 at 08:58 -0800, Ben Greear wrote: > > One of many. All those "goto out;"s skip the rcu_read_unlock(). > > Thank you so much! You're welcome ;-) > > I'll fix this and continue testing... > > I can also post a patch to print the held locks when > the max-lock-depth overflows unless someone else prefers > to... Sure, go ahead. Send it to LKML and Cc Peter, Ingo and myself. Thanks, -- Steve ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 1:10 Question on lockdep and MAX_LOCK_DEPTH Ben Greear 2013-02-06 1:54 ` Steven Rostedt @ 2013-02-06 4:26 ` Steven Rostedt 2013-02-06 6:20 ` Ben Greear 1 sibling, 1 reply; 17+ messages in thread From: Steven Rostedt @ 2013-02-06 4:26 UTC (permalink / raw) To: Ben Greear; +Cc: Linux Kernel Mailing List On Tue, Feb 05, 2013 at 05:10:37PM -0800, Ben Greear wrote: > > > =============================== > [ INFO: suspicious RCU usage. ] > 3.7.6+ #4 Tainted: G C O > ------------------------------- > /home/greearb/git/linux-3.7.dev.y/kernel/rcutree.c:360 Illegal idle entry in RCU read-side critical section.! > > other info that might help us debug this: > > > RCU used illegally from idle CPU! > rcu_scheduler_active = 1, debug_locks = 1 > RCU used illegally from extended quiescent state! > 1 lock held by swapper/1/0: > #0: (rcu_read_lock){.+.+..}, at: [<ffffffff81489632>] rcu_read_lock+0x0/0x6f > > stack backtrace: > Pid: 0, comm: swapper/1 Tainted: G C O 3.7.6+ #4 > Call Trace: > <IRQ> [<ffffffff810cecbd>] lockdep_rcu_suspicious+0xfc/0x105 > [<ffffffff810fe146>] rcu_eqs_enter_common+0xef/0x178 > [<ffffffff810fe24e>] rcu_irq_exit+0x7f/0xb0 > [<ffffffff8108edf5>] irq_exit+0xc0/0xcc > [<ffffffff8100b635>] do_IRQ+0x97/0xae > [<ffffffff815543b2>] common_interrupt+0x72/0x72 > <EOI> [<ffffffff81011082>] ? mwait_idle+0x77/0x9f > [<ffffffff81011079>] ? mwait_idle+0x6e/0x9f > [<ffffffff81011379>] cpu_idle+0x68/0xbc > [<ffffffff8154ce84>] start_secondary+0x1fc/0x203 BTW, did the above error only happen when you added the dump of the locks, or was it always there? Is this the bug you were trying to debug? -- Steve ^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: Question on lockdep and MAX_LOCK_DEPTH 2013-02-06 4:26 ` Steven Rostedt @ 2013-02-06 6:20 ` Ben Greear 0 siblings, 0 replies; 17+ messages in thread From: Ben Greear @ 2013-02-06 6:20 UTC (permalink / raw) To: Steven Rostedt; +Cc: Linux Kernel Mailing List On 02/05/2013 08:26 PM, Steven Rostedt wrote: > On Tue, Feb 05, 2013 at 05:10:37PM -0800, Ben Greear wrote: >> >> >> =============================== >> [ INFO: suspicious RCU usage. ] >> 3.7.6+ #4 Tainted: G C O >> ------------------------------- >> /home/greearb/git/linux-3.7.dev.y/kernel/rcutree.c:360 Illegal idle entry in RCU read-side critical section.! >> >> other info that might help us debug this: >> >> >> RCU used illegally from idle CPU! >> rcu_scheduler_active = 1, debug_locks = 1 >> RCU used illegally from extended quiescent state! >> 1 lock held by swapper/1/0: >> #0: (rcu_read_lock){.+.+..}, at: [<ffffffff81489632>] rcu_read_lock+0x0/0x6f >> >> stack backtrace: >> Pid: 0, comm: swapper/1 Tainted: G C O 3.7.6+ #4 >> Call Trace: >> <IRQ> [<ffffffff810cecbd>] lockdep_rcu_suspicious+0xfc/0x105 >> [<ffffffff810fe146>] rcu_eqs_enter_common+0xef/0x178 >> [<ffffffff810fe24e>] rcu_irq_exit+0x7f/0xb0 >> [<ffffffff8108edf5>] irq_exit+0xc0/0xcc >> [<ffffffff8100b635>] do_IRQ+0x97/0xae >> [<ffffffff815543b2>] common_interrupt+0x72/0x72 >> <EOI> [<ffffffff81011082>] ? mwait_idle+0x77/0x9f >> [<ffffffff81011079>] ? mwait_idle+0x6e/0x9f >> [<ffffffff81011379>] cpu_idle+0x68/0xbc >> [<ffffffff8154ce84>] start_secondary+0x1fc/0x203 > > BTW, did the above error only happen when you added the dump of the > locks, or was it always there? Is this the bug you were trying to debug? It's often there, at least..and was from the beginning, and I am interested in debugging it. Thanks, Ben -- Ben Greear <greearb@candelatech.com> Candela Technologies Inc http://www.candelatech.com ^ permalink raw reply [flat|nested] 17+ messages in thread
end of thread, other threads:[~2013-02-06 17:03 UTC | newest] Thread overview: 17+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2013-02-06 1:10 Question on lockdep and MAX_LOCK_DEPTH Ben Greear 2013-02-06 1:54 ` Steven Rostedt 2013-02-06 2:17 ` Ben Greear 2013-02-06 2:26 ` Ben Greear 2013-02-06 2:52 ` Steven Rostedt 2013-02-06 3:30 ` Ben Greear 2013-02-06 4:36 ` Steven Rostedt 2013-02-06 6:23 ` Ben Greear 2013-02-06 13:21 ` Steven Rostedt 2013-02-06 15:56 ` Ben Greear 2013-02-06 16:07 ` Steven Rostedt 2013-02-06 16:39 ` Ingo Molnar 2013-02-06 16:46 ` Steven Rostedt 2013-02-06 16:58 ` Ben Greear 2013-02-06 17:03 ` Steven Rostedt 2013-02-06 4:26 ` Steven Rostedt 2013-02-06 6:20 ` Ben Greear
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox