* [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
@ 2010-08-16 17:07 Steven Rostedt
2010-08-16 17:31 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2010-08-16 17:07 UTC (permalink / raw)
To: netdev; +Cc: LKML, David S. Miller, Eric Dumazet, Patrick McHardy
Hi, I hit this when booting 2.6.36-rc1:
=================================
[ INFO: inconsistent lock state ]
2.6.36-rc1 #2937
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ifup-eth/3288 [HC0[0]:SC1[2]:HE1:SE0] takes:
(&(&lock->lock)->rlock){+.?...}, at: [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables]
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff8107b08e>] __lock_acquire+0x756/0x93c
[<ffffffff8107b374>] lock_acquire+0x100/0x12d
[<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73
[<ffffffffa01664b1>] get_counters+0xb2/0x168 [ip6_tables]
[<ffffffffa01665a3>] alloc_counters+0x3c/0x47 [ip6_tables]
[<ffffffffa0167a7b>] do_ip6t_get_ctl+0x10c/0x363 [ip6_tables]
[<ffffffff813863a2>] nf_sockopt+0x5a/0x86
[<ffffffff813863e6>] nf_getsockopt+0x18/0x1a
[<ffffffffa034c1ff>] ipv6_getsockopt+0x84/0xba [ipv6]
[<ffffffffa0353289>] rawv6_getsockopt+0x42/0x4b [ipv6]
[<ffffffff81355571>] sock_common_getsockopt+0x14/0x16
[<ffffffff813525bb>] sys_getsockopt+0x7a/0x9b
[<ffffffff8100ad32>] system_call_fastpath+0x16/0x1b
irq event stamp: 40
hardirqs last enabled at (40): [<ffffffff813f5ad5>] _raw_spin_unlock_irqrestore+0x47/0x79
hardirqs last disabled at (39): [<ffffffff813f5036>] _raw_spin_lock_irqsave+0x2b/0x92
softirqs last enabled at (0): [<ffffffff8104975a>] copy_process+0x40e/0x11ce
softirqs last disabled at (9): [<ffffffff8100bc9c>] call_softirq+0x1c/0x30
other info that might help us debug this:
3 locks held by ifup-eth/3288:
#0: (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff8105841c>] run_timer_softirq+0x1f5/0x3e6
#1: (rcu_read_lock){.+.+..}, at: [<ffffffffa03578ca>] mld_sendpack+0x0/0x3ab [ipv6]
#2: (rcu_read_lock){.+.+..}, at: [<ffffffff81384f83>] nf_hook_slow+0x0/0x119
stack backtrace:
Pid: 3288, comm: ifup-eth Not tainted 2.6.36-rc1 #2937
Call Trace:
<IRQ> [<ffffffff81077ae6>] print_usage_bug+0x1a4/0x1b5
[<ffffffff810164fa>] ? save_stack_trace+0x2f/0x4c
[<ffffffff8106cd0c>] ? local_clock+0x40/0x59
[<ffffffff810786b6>] ? check_usage_forwards+0x0/0xcf
[<ffffffff81077de1>] mark_lock+0x2ea/0x51f
[<ffffffff8107b014>] __lock_acquire+0x6dc/0x93c
[<ffffffff8106cd0c>] ? local_clock+0x40/0x59
[<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
[<ffffffff8107b374>] lock_acquire+0x100/0x12d
[<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
[<ffffffff81011149>] ? sched_clock+0x9/0xd
[<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73
[<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
[<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables]
[<ffffffff810771db>] ? trace_hardirqs_off_caller+0x1f/0x9e
[<ffffffff81384f83>] ? nf_hook_slow+0x0/0x119
[<ffffffffa010601c>] ip6table_filter_hook+0x1c/0x20 [ip6table_filter]
[<ffffffff81384f40>] nf_iterate+0x46/0x89
[<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6]
[<ffffffff8138501b>] nf_hook_slow+0x98/0x119
[<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6]
[<ffffffffa0349557>] ? icmp6_dst_alloc+0x0/0x1b2 [ipv6]
[<ffffffffa0357b01>] mld_sendpack+0x237/0x3ab [ipv6]
[<ffffffff81051475>] ? local_bh_enable_ip+0xc7/0xeb
[<ffffffffa0358390>] mld_ifc_timer_expire+0x254/0x28d [ipv6]
[<ffffffff810584ed>] run_timer_softirq+0x2c6/0x3e6
[<ffffffff8105841c>] ? run_timer_softirq+0x1f5/0x3e6
[<ffffffffa035813c>] ? mld_ifc_timer_expire+0x0/0x28d [ipv6]
[<ffffffff8105169e>] ? __do_softirq+0x79/0x247
[<ffffffff81051763>] __do_softirq+0x13e/0x247
[<ffffffff8100bc9c>] call_softirq+0x1c/0x30
[<ffffffff8100d32f>] do_softirq+0x4b/0xa3
[<ffffffff8105120a>] irq_exit+0x4a/0x95
[<ffffffff813fc185>] smp_apic_timer_interrupt+0x8c/0x9a
[<ffffffff8100b753>] apic_timer_interrupt+0x13/0x20
<EOI> [<ffffffff8102c72a>] ? native_flush_tlb_global+0x2b/0x32
[<ffffffff81031fec>] kernel_map_pages+0x12c/0x142
[<ffffffff810cc89a>] free_pages_prepare+0x14c/0x15d
[<ffffffff810cc9c8>] free_hot_cold_page+0x2d/0x165
[<ffffffff810ccb2b>] __free_pages+0x2b/0x34
[<ffffffff810ccb7d>] free_pages+0x49/0x4e
[<ffffffff81033871>] pgd_free+0x71/0x79
[<ffffffff81048b1b>] __mmdrop+0x27/0x54
[<ffffffff81042c62>] finish_task_switch+0xb4/0xe4
[<ffffffff81042bae>] ? finish_task_switch+0x0/0xe4
[<ffffffff810096e7>] ? __switch_to+0x1a9/0x297
[<ffffffff81042df3>] schedule_tail+0x30/0xa7
[<ffffffff8100ac33>] ret_from_fork+0x13/0x80
I noticed in net/ipv6/netfilter/ip6_tables.c in get_counters() as with
other "get_counters()" functions do not block bottom halves anymore as
to this commit:
commit 24b36f0193467fa727b85b4c004016a8dae999b9
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon Aug 2 16:49:01 2010 +0200
netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary
We currently disable BH for the whole duration of get_counters()
Now we take xt_info_wrlock(cpu) lock out of BH disabling. And that lock
even has the following comment:
/*
* The "writer" side needs to get exclusive access to the lock,
* regardless of readers. This must be called with bottom half
* processing (and thus also preemption) disabled.
*/
static inline void xt_info_wrlock(unsigned int cpu)
As lockdep has proven, this is not satisfied.
-- Steve
^ permalink raw reply [flat|nested] 13+ messages in thread* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 17:07 [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? Steven Rostedt @ 2010-08-16 17:31 ` Eric Dumazet 2010-08-16 17:55 ` Steven Rostedt 0 siblings, 1 reply; 13+ messages in thread From: Eric Dumazet @ 2010-08-16 17:31 UTC (permalink / raw) To: Steven Rostedt; +Cc: netdev, LKML, David S. Miller, Patrick McHardy Le lundi 16 août 2010 à 13:07 -0400, Steven Rostedt a écrit : > Hi, I hit this when booting 2.6.36-rc1: > > ================================= > [ INFO: inconsistent lock state ] > 2.6.36-rc1 #2937 > --------------------------------- > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. > ifup-eth/3288 [HC0[0]:SC1[2]:HE1:SE0] takes: > (&(&lock->lock)->rlock){+.?...}, at: [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables] > {SOFTIRQ-ON-W} state was registered at: > [<ffffffff8107b08e>] __lock_acquire+0x756/0x93c > [<ffffffff8107b374>] lock_acquire+0x100/0x12d > [<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73 > [<ffffffffa01664b1>] get_counters+0xb2/0x168 [ip6_tables] > [<ffffffffa01665a3>] alloc_counters+0x3c/0x47 [ip6_tables] > [<ffffffffa0167a7b>] do_ip6t_get_ctl+0x10c/0x363 [ip6_tables] > [<ffffffff813863a2>] nf_sockopt+0x5a/0x86 > [<ffffffff813863e6>] nf_getsockopt+0x18/0x1a > [<ffffffffa034c1ff>] ipv6_getsockopt+0x84/0xba [ipv6] > [<ffffffffa0353289>] rawv6_getsockopt+0x42/0x4b [ipv6] > [<ffffffff81355571>] sock_common_getsockopt+0x14/0x16 > [<ffffffff813525bb>] sys_getsockopt+0x7a/0x9b > [<ffffffff8100ad32>] system_call_fastpath+0x16/0x1b > irq event stamp: 40 > hardirqs last enabled at (40): [<ffffffff813f5ad5>] _raw_spin_unlock_irqrestore+0x47/0x79 > hardirqs last disabled at (39): [<ffffffff813f5036>] _raw_spin_lock_irqsave+0x2b/0x92 > softirqs last enabled at (0): [<ffffffff8104975a>] copy_process+0x40e/0x11ce > softirqs last disabled at (9): [<ffffffff8100bc9c>] call_softirq+0x1c/0x30 > > other info that might help us debug this: > 3 locks held by ifup-eth/3288: > #0: (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff8105841c>] run_timer_softirq+0x1f5/0x3e6 > #1: (rcu_read_lock){.+.+..}, at: [<ffffffffa03578ca>] mld_sendpack+0x0/0x3ab [ipv6] > #2: (rcu_read_lock){.+.+..}, at: [<ffffffff81384f83>] nf_hook_slow+0x0/0x119 > > stack backtrace: > Pid: 3288, comm: ifup-eth Not tainted 2.6.36-rc1 #2937 > Call Trace: > <IRQ> [<ffffffff81077ae6>] print_usage_bug+0x1a4/0x1b5 > [<ffffffff810164fa>] ? save_stack_trace+0x2f/0x4c > [<ffffffff8106cd0c>] ? local_clock+0x40/0x59 > [<ffffffff810786b6>] ? check_usage_forwards+0x0/0xcf > [<ffffffff81077de1>] mark_lock+0x2ea/0x51f > [<ffffffff8107b014>] __lock_acquire+0x6dc/0x93c > [<ffffffff8106cd0c>] ? local_clock+0x40/0x59 > [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables] > [<ffffffff8107b374>] lock_acquire+0x100/0x12d > [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables] > [<ffffffff81011149>] ? sched_clock+0x9/0xd > [<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73 > [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables] > [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables] > [<ffffffff810771db>] ? trace_hardirqs_off_caller+0x1f/0x9e > [<ffffffff81384f83>] ? nf_hook_slow+0x0/0x119 > [<ffffffffa010601c>] ip6table_filter_hook+0x1c/0x20 [ip6table_filter] > [<ffffffff81384f40>] nf_iterate+0x46/0x89 > [<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6] > [<ffffffff8138501b>] nf_hook_slow+0x98/0x119 > [<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6] > [<ffffffffa0349557>] ? icmp6_dst_alloc+0x0/0x1b2 [ipv6] > [<ffffffffa0357b01>] mld_sendpack+0x237/0x3ab [ipv6] > [<ffffffff81051475>] ? local_bh_enable_ip+0xc7/0xeb > [<ffffffffa0358390>] mld_ifc_timer_expire+0x254/0x28d [ipv6] > [<ffffffff810584ed>] run_timer_softirq+0x2c6/0x3e6 > [<ffffffff8105841c>] ? run_timer_softirq+0x1f5/0x3e6 > [<ffffffffa035813c>] ? mld_ifc_timer_expire+0x0/0x28d [ipv6] > [<ffffffff8105169e>] ? __do_softirq+0x79/0x247 > [<ffffffff81051763>] __do_softirq+0x13e/0x247 > [<ffffffff8100bc9c>] call_softirq+0x1c/0x30 > [<ffffffff8100d32f>] do_softirq+0x4b/0xa3 > [<ffffffff8105120a>] irq_exit+0x4a/0x95 > [<ffffffff813fc185>] smp_apic_timer_interrupt+0x8c/0x9a > [<ffffffff8100b753>] apic_timer_interrupt+0x13/0x20 > <EOI> [<ffffffff8102c72a>] ? native_flush_tlb_global+0x2b/0x32 > [<ffffffff81031fec>] kernel_map_pages+0x12c/0x142 > [<ffffffff810cc89a>] free_pages_prepare+0x14c/0x15d > [<ffffffff810cc9c8>] free_hot_cold_page+0x2d/0x165 > [<ffffffff810ccb2b>] __free_pages+0x2b/0x34 > [<ffffffff810ccb7d>] free_pages+0x49/0x4e > [<ffffffff81033871>] pgd_free+0x71/0x79 > [<ffffffff81048b1b>] __mmdrop+0x27/0x54 > [<ffffffff81042c62>] finish_task_switch+0xb4/0xe4 > [<ffffffff81042bae>] ? finish_task_switch+0x0/0xe4 > [<ffffffff810096e7>] ? __switch_to+0x1a9/0x297 > [<ffffffff81042df3>] schedule_tail+0x30/0xa7 > [<ffffffff8100ac33>] ret_from_fork+0x13/0x80 > > I noticed in net/ipv6/netfilter/ip6_tables.c in get_counters() as with > other "get_counters()" functions do not block bottom halves anymore as > to this commit: > > commit 24b36f0193467fa727b85b4c004016a8dae999b9 > Author: Eric Dumazet <eric.dumazet@gmail.com> > Date: Mon Aug 2 16:49:01 2010 +0200 > > netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary > > We currently disable BH for the whole duration of get_counters() > > Now we take xt_info_wrlock(cpu) lock out of BH disabling. And that lock > even has the following comment: > > /* > * The "writer" side needs to get exclusive access to the lock, > * regardless of readers. This must be called with bottom half > * processing (and thus also preemption) disabled. > */ > static inline void xt_info_wrlock(unsigned int cpu) > > > As lockdep has proven, this is not satisfied. > > -- Steve > > This is a false positive, and a patch was sent yesterday http://patchwork.ozlabs.org/patch/61750/ Thanks ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 17:31 ` Eric Dumazet @ 2010-08-16 17:55 ` Steven Rostedt 2010-08-16 18:16 ` Steven Rostedt 2010-08-16 19:44 ` David Miller 0 siblings, 2 replies; 13+ messages in thread From: Steven Rostedt @ 2010-08-16 17:55 UTC (permalink / raw) To: Eric Dumazet Cc: netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar, Peter Zijlstra On Mon, 2010-08-16 at 19:31 +0200, Eric Dumazet wrote: > Le lundi 16 août 2010 à 13:07 -0400, Steven Rostedt a écrit : > > Hi, I hit this when booting 2.6.36-rc1: > > > > ================================= > > [ INFO: inconsistent lock state ] > > 2.6.36-rc1 #2937 > > --------------------------------- > > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. > > ifup-eth/3288 [HC0[0]:SC1[2]:HE1:SE0] takes: > > (&(&lock->lock)->rlock){+.?...}, at: [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables] > > {SOFTIRQ-ON-W} state was registered at: > > [<ffffffff8107b08e>] __lock_acquire+0x756/0x93c > > [<ffffffff8107b374>] lock_acquire+0x100/0x12d > > [<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73 > > [<ffffffffa01664b1>] get_counters+0xb2/0x168 [ip6_tables] > > [<ffffffffa01665a3>] alloc_counters+0x3c/0x47 [ip6_tables] > > [<ffffffffa0167a7b>] do_ip6t_get_ctl+0x10c/0x363 [ip6_tables] > > [<ffffffff813863a2>] nf_sockopt+0x5a/0x86 > > [<ffffffff813863e6>] nf_getsockopt+0x18/0x1a > > [<ffffffffa034c1ff>] ipv6_getsockopt+0x84/0xba [ipv6] > > [<ffffffffa0353289>] rawv6_getsockopt+0x42/0x4b [ipv6] > > [<ffffffff81355571>] sock_common_getsockopt+0x14/0x16 > > [<ffffffff813525bb>] sys_getsockopt+0x7a/0x9b > > [<ffffffff8100ad32>] system_call_fastpath+0x16/0x1b > > irq event stamp: 40 > > hardirqs last enabled at (40): [<ffffffff813f5ad5>] _raw_spin_unlock_irqrestore+0x47/0x79 > > hardirqs last disabled at (39): [<ffffffff813f5036>] _raw_spin_lock_irqsave+0x2b/0x92 > > softirqs last enabled at (0): [<ffffffff8104975a>] copy_process+0x40e/0x11ce > > softirqs last disabled at (9): [<ffffffff8100bc9c>] call_softirq+0x1c/0x30 > > > > other info that might help us debug this: > > 3 locks held by ifup-eth/3288: > > #0: (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff8105841c>] run_timer_softirq+0x1f5/0x3e6 > > #1: (rcu_read_lock){.+.+..}, at: [<ffffffffa03578ca>] mld_sendpack+0x0/0x3ab [ipv6] > > #2: (rcu_read_lock){.+.+..}, at: [<ffffffff81384f83>] nf_hook_slow+0x0/0x119 > > > > stack backtrace: > > Pid: 3288, comm: ifup-eth Not tainted 2.6.36-rc1 #2937 > > Call Trace: > > <IRQ> [<ffffffff81077ae6>] print_usage_bug+0x1a4/0x1b5 > > [<ffffffff810164fa>] ? save_stack_trace+0x2f/0x4c > > [<ffffffff8106cd0c>] ? local_clock+0x40/0x59 > > [<ffffffff810786b6>] ? check_usage_forwards+0x0/0xcf > > [<ffffffff81077de1>] mark_lock+0x2ea/0x51f > > [<ffffffff8107b014>] __lock_acquire+0x6dc/0x93c > > [<ffffffff8106cd0c>] ? local_clock+0x40/0x59 > > [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables] > > [<ffffffff8107b374>] lock_acquire+0x100/0x12d > > [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables] > > [<ffffffff81011149>] ? sched_clock+0x9/0xd > > [<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73 > > [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables] > > [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables] > > [<ffffffff810771db>] ? trace_hardirqs_off_caller+0x1f/0x9e > > [<ffffffff81384f83>] ? nf_hook_slow+0x0/0x119 > > [<ffffffffa010601c>] ip6table_filter_hook+0x1c/0x20 [ip6table_filter] > > [<ffffffff81384f40>] nf_iterate+0x46/0x89 > > [<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6] > > [<ffffffff8138501b>] nf_hook_slow+0x98/0x119 > > [<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6] > > [<ffffffffa0349557>] ? icmp6_dst_alloc+0x0/0x1b2 [ipv6] > > [<ffffffffa0357b01>] mld_sendpack+0x237/0x3ab [ipv6] > > [<ffffffff81051475>] ? local_bh_enable_ip+0xc7/0xeb > > [<ffffffffa0358390>] mld_ifc_timer_expire+0x254/0x28d [ipv6] > > [<ffffffff810584ed>] run_timer_softirq+0x2c6/0x3e6 > > [<ffffffff8105841c>] ? run_timer_softirq+0x1f5/0x3e6 > > [<ffffffffa035813c>] ? mld_ifc_timer_expire+0x0/0x28d [ipv6] > > [<ffffffff8105169e>] ? __do_softirq+0x79/0x247 > > [<ffffffff81051763>] __do_softirq+0x13e/0x247 > > [<ffffffff8100bc9c>] call_softirq+0x1c/0x30 > > [<ffffffff8100d32f>] do_softirq+0x4b/0xa3 > > [<ffffffff8105120a>] irq_exit+0x4a/0x95 > > [<ffffffff813fc185>] smp_apic_timer_interrupt+0x8c/0x9a > > [<ffffffff8100b753>] apic_timer_interrupt+0x13/0x20 > > <EOI> [<ffffffff8102c72a>] ? native_flush_tlb_global+0x2b/0x32 > > [<ffffffff81031fec>] kernel_map_pages+0x12c/0x142 > > [<ffffffff810cc89a>] free_pages_prepare+0x14c/0x15d > > [<ffffffff810cc9c8>] free_hot_cold_page+0x2d/0x165 > > [<ffffffff810ccb2b>] __free_pages+0x2b/0x34 > > [<ffffffff810ccb7d>] free_pages+0x49/0x4e > > [<ffffffff81033871>] pgd_free+0x71/0x79 > > [<ffffffff81048b1b>] __mmdrop+0x27/0x54 > > [<ffffffff81042c62>] finish_task_switch+0xb4/0xe4 > > [<ffffffff81042bae>] ? finish_task_switch+0x0/0xe4 > > [<ffffffff810096e7>] ? __switch_to+0x1a9/0x297 > > [<ffffffff81042df3>] schedule_tail+0x30/0xa7 > > [<ffffffff8100ac33>] ret_from_fork+0x13/0x80 > > > > I noticed in net/ipv6/netfilter/ip6_tables.c in get_counters() as with > > other "get_counters()" functions do not block bottom halves anymore as > > to this commit: > > > > commit 24b36f0193467fa727b85b4c004016a8dae999b9 > > Author: Eric Dumazet <eric.dumazet@gmail.com> > > Date: Mon Aug 2 16:49:01 2010 +0200 > > > > netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary > > > > We currently disable BH for the whole duration of get_counters() > > > > Now we take xt_info_wrlock(cpu) lock out of BH disabling. And that lock > > even has the following comment: > > > > /* > > * The "writer" side needs to get exclusive access to the lock, > > * regardless of readers. This must be called with bottom half > > * processing (and thus also preemption) disabled. > > */ > > static inline void xt_info_wrlock(unsigned int cpu) > > > > > > As lockdep has proven, this is not satisfied. > > > > -- Steve > > > > > > > This is a false positive, and a patch was sent yesterday > > http://patchwork.ozlabs.org/patch/61750/ > I still do not see how this is a false positive. Per-cpu locks do not solve the issue. Please tell me what prevents an interrupt going off after we grab the xt_info_wrlock(cpu) in get_counters(). IOW, what prevents this: get_counters() { xt_info_wrlock(cpu); <interrupt> --> softirq xt_info_rblock_bh(); /* which grabs the writer lock */ DEADLOCK!! -- Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 17:55 ` Steven Rostedt @ 2010-08-16 18:16 ` Steven Rostedt 2010-08-16 18:36 ` Peter Zijlstra 2010-08-16 18:48 ` Eric Dumazet 2010-08-16 19:44 ` David Miller 1 sibling, 2 replies; 13+ messages in thread From: Steven Rostedt @ 2010-08-16 18:16 UTC (permalink / raw) To: Eric Dumazet Cc: netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar, Peter Zijlstra On Mon, 2010-08-16 at 13:55 -0400, Steven Rostedt wrote: Re-looking at the code, I think I figured it out. But it should really be documented better. > > Please tell me what prevents an interrupt going off after we grab the > xt_info_wrlock(cpu) in get_counters(). > > IOW, what prevents this: > > get_counters() { I left out here: for_each_possible_cpu(cpu) { if (cpu == curcpu) continue; which means that we are grabbing the lock for other CPUs, and that a softirq would not be a problem. > xt_info_wrlock(cpu); > > <interrupt> --> softirq > > xt_info_rblock_bh(); > /* which grabs the writer lock */ > DEADLOCK!! > Your patchwork patch has: @@ -729,8 +729,10 @@ static void get_counters(const struct xt_table_info *t, local_bh_enable(); /* Processing counters from other cpus, we can let bottom half enabled, * (preemption is disabled) + * We must turn off lockdep to avoid a false positive. */ + lockdep_off(); for_each_possible_cpu(cpu) { We need a better comment than that. Could that be changed to something like: /* * lockdep tests if we grab a lock and can be preempted by * a softirq and that softirq grabs the same lock causing a * deadlock. * This is a special case because this is a per-cpu lock, * and we are only grabbing the lock for other CPUs. A softirq * will only takes its local CPU lock thus, if we are preempted * by a softirq, then it will grab the current CPU lock which * we do not take here. * * Simply disable lockdep here until it can handle this situation. */ Thanks, -- Steve ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 18:16 ` Steven Rostedt @ 2010-08-16 18:36 ` Peter Zijlstra 2010-08-16 18:48 ` Eric Dumazet 2010-08-16 18:48 ` Eric Dumazet 1 sibling, 1 reply; 13+ messages in thread From: Peter Zijlstra @ 2010-08-16 18:36 UTC (permalink / raw) To: Steven Rostedt Cc: Eric Dumazet, netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar On Mon, 2010-08-16 at 14:16 -0400, Steven Rostedt wrote: > @@ -729,8 +729,10 @@ static void get_counters(const struct > xt_table_info *t, > local_bh_enable(); > /* Processing counters from other cpus, we can let bottom half > enabled, > * (preemption is disabled) > + * We must turn off lockdep to avoid a false positive. > */ > > + lockdep_off(); > for_each_possible_cpu(cpu) { > nack! ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 18:36 ` Peter Zijlstra @ 2010-08-16 18:48 ` Eric Dumazet 2010-08-16 19:16 ` Peter Zijlstra 0 siblings, 1 reply; 13+ messages in thread From: Eric Dumazet @ 2010-08-16 18:48 UTC (permalink / raw) To: Peter Zijlstra Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar Le lundi 16 août 2010 à 20:36 +0200, Peter Zijlstra a écrit : > On Mon, 2010-08-16 at 14:16 -0400, Steven Rostedt wrote: > > @@ -729,8 +729,10 @@ static void get_counters(const struct > > xt_table_info *t, > > local_bh_enable(); > > /* Processing counters from other cpus, we can let bottom half > > enabled, > > * (preemption is disabled) > > + * We must turn off lockdep to avoid a false positive. > > */ > > > > + lockdep_off(); > > for_each_possible_cpu(cpu) { > > > nack! Interesting. Care to elaborate ? ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 18:48 ` Eric Dumazet @ 2010-08-16 19:16 ` Peter Zijlstra 2010-08-16 19:35 ` Eric Dumazet 0 siblings, 1 reply; 13+ messages in thread From: Peter Zijlstra @ 2010-08-16 19:16 UTC (permalink / raw) To: Eric Dumazet Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar On Mon, 2010-08-16 at 20:48 +0200, Eric Dumazet wrote: > Le lundi 16 août 2010 à 20:36 +0200, Peter Zijlstra a écrit : > > On Mon, 2010-08-16 at 14:16 -0400, Steven Rostedt wrote: > > > @@ -729,8 +729,10 @@ static void get_counters(const struct > > > xt_table_info *t, > > > local_bh_enable(); > > > /* Processing counters from other cpus, we can let bottom half > > > enabled, > > > * (preemption is disabled) > > > + * We must turn off lockdep to avoid a false positive. > > > */ > > > > > > + lockdep_off(); > > > for_each_possible_cpu(cpu) { > > > > > nack! > > > Interesting. > > Care to elaborate ? Adding lockdep_off() is just plain wrong, if you cannot describe the locking there's a fair chance its wrong anyway. As it stands there's only a single lockdep_off(), and that lives in NTFS it looks like it could be annotated differently, but then, who cares about NTFS anyway ;-) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 19:16 ` Peter Zijlstra @ 2010-08-16 19:35 ` Eric Dumazet 2010-08-16 20:01 ` Peter Zijlstra 0 siblings, 1 reply; 13+ messages in thread From: Eric Dumazet @ 2010-08-16 19:35 UTC (permalink / raw) To: Peter Zijlstra Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar Le lundi 16 août 2010 à 21:16 +0200, Peter Zijlstra a écrit : > Adding lockdep_off() is just plain wrong, if you cannot describe the > locking there's a fair chance its wrong anyway. > I see. I described the fine locking after Steven comment, adding a long Changelog. http://patchwork.ozlabs.org/patch/61827/ If someone thinks this locking is buggy, please speak now ;) ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 19:35 ` Eric Dumazet @ 2010-08-16 20:01 ` Peter Zijlstra 2010-08-16 20:17 ` Eric Dumazet 0 siblings, 1 reply; 13+ messages in thread From: Peter Zijlstra @ 2010-08-16 20:01 UTC (permalink / raw) To: Eric Dumazet Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar On Mon, 2010-08-16 at 21:35 +0200, Eric Dumazet wrote: > Le lundi 16 août 2010 à 21:16 +0200, Peter Zijlstra a écrit : > > > Adding lockdep_off() is just plain wrong, if you cannot describe the > > locking there's a fair chance its wrong anyway. > > > > I see. > > I described the fine locking after Steven comment, adding a long > Changelog. > > http://patchwork.ozlabs.org/patch/61827/ > > If someone thinks this locking is buggy, please speak now ;) Urgh,.. I think it might be correct, but wtf! Wasn't this originally RCU code, why not go back to using RCU now that we have synchronize_rcu_expedited()? As to the original issue, why not keep that bh stuff disabled for CONFIG_PROVE_LOCKING instead, that will at least let you keep lock coverage, adding lockdep_off() will hide any cycles that would involve this lock (even though its currently a leaf lock, you never know what creative things the future brings). This fancy open coded lock looks like utter fail for -rt though.. please use RCU if at all possible. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 20:01 ` Peter Zijlstra @ 2010-08-16 20:17 ` Eric Dumazet 0 siblings, 0 replies; 13+ messages in thread From: Eric Dumazet @ 2010-08-16 20:17 UTC (permalink / raw) To: Peter Zijlstra Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar Le lundi 16 août 2010 à 22:01 +0200, Peter Zijlstra a écrit : > On Mon, 2010-08-16 at 21:35 +0200, Eric Dumazet wrote: > > Le lundi 16 août 2010 à 21:16 +0200, Peter Zijlstra a écrit : > > > > > Adding lockdep_off() is just plain wrong, if you cannot describe the > > > locking there's a fair chance its wrong anyway. > > > > > > > I see. > > > > I described the fine locking after Steven comment, adding a long > > Changelog. > > > > http://patchwork.ozlabs.org/patch/61827/ > > > > If someone thinks this locking is buggy, please speak now ;) > > Urgh,.. I think it might be correct, but wtf! Wasn't this originally RCU > code, why not go back to using RCU now that we have > synchronize_rcu_expedited()? > > As to the original issue, why not keep that bh stuff disabled for > CONFIG_PROVE_LOCKING instead, that will at least let you keep lock > coverage, adding lockdep_off() will hide any cycles that would involve > this lock (even though its currently a leaf lock, you never know what > creative things the future brings). > > This fancy open coded lock looks like utter fail for -rt though.. please > use RCU if at all possible. > > Please read history of why RCU failed in this context. I believe I did most of RCU conversions in kernel, you dont need to shout on me. And its a bit late in 2.6.36 to even think about it. I am happy that you volunteer for next RCU conversion, thanks Peter ! In the mean time, we just are going to disable BH again, I'll post a patch. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 18:16 ` Steven Rostedt 2010-08-16 18:36 ` Peter Zijlstra @ 2010-08-16 18:48 ` Eric Dumazet 1 sibling, 0 replies; 13+ messages in thread From: Eric Dumazet @ 2010-08-16 18:48 UTC (permalink / raw) To: Steven Rostedt Cc: netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar, Peter Zijlstra Le lundi 16 août 2010 à 14:16 -0400, Steven Rostedt a écrit : > We need a better comment than that. Could that be changed to something like: > > /* > * lockdep tests if we grab a lock and can be preempted by > * a softirq and that softirq grabs the same lock causing a > * deadlock. > * This is a special case because this is a per-cpu lock, > * and we are only grabbing the lock for other CPUs. A softirq > * will only takes its local CPU lock thus, if we are preempted > * by a softirq, then it will grab the current CPU lock which > * we do not take here. > * > * Simply disable lockdep here until it can handle this situation. > */ > You mean duplicating this long comment in three files, or only once in Changelog ? My choice was to document the lockdep_off() use (very seldom used in kernel) in the Changelog. Hopefully, people messing with this code know about git ;) I agree I didnt document how netfilter locks work in this Changelog. And in original commit (24b36f019) I forgot to state that get_counters() is guarded by a mutex, so that no more than one cpu runs in get_counters(). What about following ? [PATCH] netfilter: {ip,ip6,arp}_tables: avoid lockdep false positive After commit 24b36f019 (netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary), lockdep can raise a warning because we attempt to lock a spinlock with BH enabled, while the same lock is usually locked by another cpu in a softirq context. In this use case, the lockdep splat is a false positive, because the BH disabling only matters for one cpu and its associated. 1) We use one spinlock per cpu. 2) A softirq will only lock the lock associated to current cpu. 3) get_counters() disables sofirqs while fetching data of current cpu. (to avoid a deadlock if a softirq comes and try to lock same lock) 4) other locks are locked without blocking softirq (as a softirq will lock another lock) 5) get_counter() calls are serialized by a mutex. This to avoid a deadlock if two cpus were doing : CPU1 CPU2 lock lock#1 lock lock#2 copy data#1 copy data#2 unlock lock#1 unlock lock#2 lock#2 lock#1 softirq lock#1 softirq, attempt to lock lock#2 <deadlock> <deadlock> Use lockdep_off()/lockdep_on() around the problematic section to avoid the splat. Reported-by: Linus Torvalds <torvalds@linux-foundation.org> Diagnosed-by: David S. Miller <davem@davemloft.net> Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com> CC: Patrick McHardy <kaber@trash.net> --- net/ipv4/netfilter/arp_tables.c | 3 +++ net/ipv4/netfilter/ip_tables.c | 3 +++ net/ipv6/netfilter/ip6_tables.c | 3 +++ 3 files changed, 9 insertions(+) diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c index 6bccba3..b4f7ebf 100644 --- a/net/ipv4/netfilter/arp_tables.c +++ b/net/ipv4/netfilter/arp_tables.c @@ -729,8 +729,10 @@ static void get_counters(const struct xt_table_info *t, local_bh_enable(); /* Processing counters from other cpus, we can let bottom half enabled, * (preemption is disabled) + * We must turn off lockdep to avoid a false positive. */ + lockdep_off(); for_each_possible_cpu(cpu) { if (cpu == curcpu) continue; @@ -743,6 +745,7 @@ static void get_counters(const struct xt_table_info *t, } xt_info_wrunlock(cpu); } + lockdep_on(); put_cpu(); } diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c index c439721..dc5b2fd 100644 --- a/net/ipv4/netfilter/ip_tables.c +++ b/net/ipv4/netfilter/ip_tables.c @@ -903,8 +903,10 @@ get_counters(const struct xt_table_info *t, local_bh_enable(); /* Processing counters from other cpus, we can let bottom half enabled, * (preemption is disabled) + * We must turn off lockdep to avoid a false positive. */ + lockdep_off(); for_each_possible_cpu(cpu) { if (cpu == curcpu) continue; @@ -917,6 +919,7 @@ get_counters(const struct xt_table_info *t, } xt_info_wrunlock(cpu); } + lockdep_on(); put_cpu(); } diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c index 5359ef4..fb55443 100644 --- a/net/ipv6/netfilter/ip6_tables.c +++ b/net/ipv6/netfilter/ip6_tables.c @@ -916,8 +916,10 @@ get_counters(const struct xt_table_info *t, local_bh_enable(); /* Processing counters from other cpus, we can let bottom half enabled, * (preemption is disabled) + * We must turn off lockdep to avoid a false positive. */ + lockdep_off(); for_each_possible_cpu(cpu) { if (cpu == curcpu) continue; @@ -930,6 +932,7 @@ get_counters(const struct xt_table_info *t, } xt_info_wrunlock(cpu); } + lockdep_on(); put_cpu(); } ^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 17:55 ` Steven Rostedt 2010-08-16 18:16 ` Steven Rostedt @ 2010-08-16 19:44 ` David Miller 2010-08-16 20:04 ` Peter Zijlstra 1 sibling, 1 reply; 13+ messages in thread From: David Miller @ 2010-08-16 19:44 UTC (permalink / raw) To: rostedt; +Cc: eric.dumazet, netdev, linux-kernel, kaber, mingo, peterz From: Steven Rostedt <rostedt@goodmis.org> Date: Mon, 16 Aug 2010 13:55:01 -0400 > Please tell me what prevents an interrupt going off after we grab the > xt_info_wrlock(cpu) in get_counters(). He's only accessing the per-cpu counter locks of other cpus. The per-cpu lock is only locally accessed by a cpu in software interrupt context. That is why his transformation is legal. Lockdep simply hasn't been informed of this invariant and has to assume the worst. ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? 2010-08-16 19:44 ` David Miller @ 2010-08-16 20:04 ` Peter Zijlstra 0 siblings, 0 replies; 13+ messages in thread From: Peter Zijlstra @ 2010-08-16 20:04 UTC (permalink / raw) To: David Miller; +Cc: rostedt, eric.dumazet, netdev, linux-kernel, kaber, mingo On Mon, 2010-08-16 at 12:44 -0700, David Miller wrote: > He's only accessing the per-cpu counter locks of other cpus. > > The per-cpu lock is only locally accessed by a cpu in software > interrupt context. > > That is why his transformation is legal. > > Lockdep simply hasn't been informed of this invariant and has > to assume the worst. Something like the below will keep lockdep coverage, still going back to RCU sounds like the best option. --- include/linux/netfilter/x_tables.h | 9 +++++++++ 1 files changed, 9 insertions(+), 0 deletions(-) diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h index 24e5d01..a195feb 100644 --- a/include/linux/netfilter/x_tables.h +++ b/include/linux/netfilter/x_tables.h @@ -511,12 +511,21 @@ static inline void xt_info_rdunlock_bh(void) */ static inline void xt_info_wrlock(unsigned int cpu) { +#ifdef CONFIG_PROVE_LOCKING + /* + * XXX foo + */ + local_bh_disable(); +#endif spin_lock(&per_cpu(xt_info_locks, cpu).lock); } static inline void xt_info_wrunlock(unsigned int cpu) { spin_unlock(&per_cpu(xt_info_locks, cpu).lock); +#ifdef CONFIG_PROVE_LOCKING + local_bh_enable(); +#endif } /* ^ permalink raw reply related [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-08-16 20:17 UTC | newest] Thread overview: 13+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2010-08-16 17:07 [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? Steven Rostedt 2010-08-16 17:31 ` Eric Dumazet 2010-08-16 17:55 ` Steven Rostedt 2010-08-16 18:16 ` Steven Rostedt 2010-08-16 18:36 ` Peter Zijlstra 2010-08-16 18:48 ` Eric Dumazet 2010-08-16 19:16 ` Peter Zijlstra 2010-08-16 19:35 ` Eric Dumazet 2010-08-16 20:01 ` Peter Zijlstra 2010-08-16 20:17 ` Eric Dumazet 2010-08-16 18:48 ` Eric Dumazet 2010-08-16 19:44 ` David Miller 2010-08-16 20:04 ` Peter Zijlstra
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).