* [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
@ 2010-08-16 17:07 Steven Rostedt
2010-08-16 17:31 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2010-08-16 17:07 UTC (permalink / raw)
To: netdev; +Cc: LKML, David S. Miller, Eric Dumazet, Patrick McHardy
Hi, I hit this when booting 2.6.36-rc1:
=================================
[ INFO: inconsistent lock state ]
2.6.36-rc1 #2937
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ifup-eth/3288 [HC0[0]:SC1[2]:HE1:SE0] takes:
(&(&lock->lock)->rlock){+.?...}, at: [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables]
{SOFTIRQ-ON-W} state was registered at:
[<ffffffff8107b08e>] __lock_acquire+0x756/0x93c
[<ffffffff8107b374>] lock_acquire+0x100/0x12d
[<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73
[<ffffffffa01664b1>] get_counters+0xb2/0x168 [ip6_tables]
[<ffffffffa01665a3>] alloc_counters+0x3c/0x47 [ip6_tables]
[<ffffffffa0167a7b>] do_ip6t_get_ctl+0x10c/0x363 [ip6_tables]
[<ffffffff813863a2>] nf_sockopt+0x5a/0x86
[<ffffffff813863e6>] nf_getsockopt+0x18/0x1a
[<ffffffffa034c1ff>] ipv6_getsockopt+0x84/0xba [ipv6]
[<ffffffffa0353289>] rawv6_getsockopt+0x42/0x4b [ipv6]
[<ffffffff81355571>] sock_common_getsockopt+0x14/0x16
[<ffffffff813525bb>] sys_getsockopt+0x7a/0x9b
[<ffffffff8100ad32>] system_call_fastpath+0x16/0x1b
irq event stamp: 40
hardirqs last enabled at (40): [<ffffffff813f5ad5>] _raw_spin_unlock_irqrestore+0x47/0x79
hardirqs last disabled at (39): [<ffffffff813f5036>] _raw_spin_lock_irqsave+0x2b/0x92
softirqs last enabled at (0): [<ffffffff8104975a>] copy_process+0x40e/0x11ce
softirqs last disabled at (9): [<ffffffff8100bc9c>] call_softirq+0x1c/0x30
other info that might help us debug this:
3 locks held by ifup-eth/3288:
#0: (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff8105841c>] run_timer_softirq+0x1f5/0x3e6
#1: (rcu_read_lock){.+.+..}, at: [<ffffffffa03578ca>] mld_sendpack+0x0/0x3ab [ipv6]
#2: (rcu_read_lock){.+.+..}, at: [<ffffffff81384f83>] nf_hook_slow+0x0/0x119
stack backtrace:
Pid: 3288, comm: ifup-eth Not tainted 2.6.36-rc1 #2937
Call Trace:
<IRQ> [<ffffffff81077ae6>] print_usage_bug+0x1a4/0x1b5
[<ffffffff810164fa>] ? save_stack_trace+0x2f/0x4c
[<ffffffff8106cd0c>] ? local_clock+0x40/0x59
[<ffffffff810786b6>] ? check_usage_forwards+0x0/0xcf
[<ffffffff81077de1>] mark_lock+0x2ea/0x51f
[<ffffffff8107b014>] __lock_acquire+0x6dc/0x93c
[<ffffffff8106cd0c>] ? local_clock+0x40/0x59
[<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
[<ffffffff8107b374>] lock_acquire+0x100/0x12d
[<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
[<ffffffff81011149>] ? sched_clock+0x9/0xd
[<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73
[<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
[<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables]
[<ffffffff810771db>] ? trace_hardirqs_off_caller+0x1f/0x9e
[<ffffffff81384f83>] ? nf_hook_slow+0x0/0x119
[<ffffffffa010601c>] ip6table_filter_hook+0x1c/0x20 [ip6table_filter]
[<ffffffff81384f40>] nf_iterate+0x46/0x89
[<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6]
[<ffffffff8138501b>] nf_hook_slow+0x98/0x119
[<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6]
[<ffffffffa0349557>] ? icmp6_dst_alloc+0x0/0x1b2 [ipv6]
[<ffffffffa0357b01>] mld_sendpack+0x237/0x3ab [ipv6]
[<ffffffff81051475>] ? local_bh_enable_ip+0xc7/0xeb
[<ffffffffa0358390>] mld_ifc_timer_expire+0x254/0x28d [ipv6]
[<ffffffff810584ed>] run_timer_softirq+0x2c6/0x3e6
[<ffffffff8105841c>] ? run_timer_softirq+0x1f5/0x3e6
[<ffffffffa035813c>] ? mld_ifc_timer_expire+0x0/0x28d [ipv6]
[<ffffffff8105169e>] ? __do_softirq+0x79/0x247
[<ffffffff81051763>] __do_softirq+0x13e/0x247
[<ffffffff8100bc9c>] call_softirq+0x1c/0x30
[<ffffffff8100d32f>] do_softirq+0x4b/0xa3
[<ffffffff8105120a>] irq_exit+0x4a/0x95
[<ffffffff813fc185>] smp_apic_timer_interrupt+0x8c/0x9a
[<ffffffff8100b753>] apic_timer_interrupt+0x13/0x20
<EOI> [<ffffffff8102c72a>] ? native_flush_tlb_global+0x2b/0x32
[<ffffffff81031fec>] kernel_map_pages+0x12c/0x142
[<ffffffff810cc89a>] free_pages_prepare+0x14c/0x15d
[<ffffffff810cc9c8>] free_hot_cold_page+0x2d/0x165
[<ffffffff810ccb2b>] __free_pages+0x2b/0x34
[<ffffffff810ccb7d>] free_pages+0x49/0x4e
[<ffffffff81033871>] pgd_free+0x71/0x79
[<ffffffff81048b1b>] __mmdrop+0x27/0x54
[<ffffffff81042c62>] finish_task_switch+0xb4/0xe4
[<ffffffff81042bae>] ? finish_task_switch+0x0/0xe4
[<ffffffff810096e7>] ? __switch_to+0x1a9/0x297
[<ffffffff81042df3>] schedule_tail+0x30/0xa7
[<ffffffff8100ac33>] ret_from_fork+0x13/0x80
I noticed in net/ipv6/netfilter/ip6_tables.c in get_counters() as with
other "get_counters()" functions do not block bottom halves anymore as
to this commit:
commit 24b36f0193467fa727b85b4c004016a8dae999b9
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon Aug 2 16:49:01 2010 +0200
netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary
We currently disable BH for the whole duration of get_counters()
Now we take xt_info_wrlock(cpu) lock out of BH disabling. And that lock
even has the following comment:
/*
* The "writer" side needs to get exclusive access to the lock,
* regardless of readers. This must be called with bottom half
* processing (and thus also preemption) disabled.
*/
static inline void xt_info_wrlock(unsigned int cpu)
As lockdep has proven, this is not satisfied.
-- Steve
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 17:07 [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? Steven Rostedt
@ 2010-08-16 17:31 ` Eric Dumazet
2010-08-16 17:55 ` Steven Rostedt
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2010-08-16 17:31 UTC (permalink / raw)
To: Steven Rostedt; +Cc: netdev, LKML, David S. Miller, Patrick McHardy
Le lundi 16 août 2010 à 13:07 -0400, Steven Rostedt a écrit :
> Hi, I hit this when booting 2.6.36-rc1:
>
> =================================
> [ INFO: inconsistent lock state ]
> 2.6.36-rc1 #2937
> ---------------------------------
> inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> ifup-eth/3288 [HC0[0]:SC1[2]:HE1:SE0] takes:
> (&(&lock->lock)->rlock){+.?...}, at: [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables]
> {SOFTIRQ-ON-W} state was registered at:
> [<ffffffff8107b08e>] __lock_acquire+0x756/0x93c
> [<ffffffff8107b374>] lock_acquire+0x100/0x12d
> [<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73
> [<ffffffffa01664b1>] get_counters+0xb2/0x168 [ip6_tables]
> [<ffffffffa01665a3>] alloc_counters+0x3c/0x47 [ip6_tables]
> [<ffffffffa0167a7b>] do_ip6t_get_ctl+0x10c/0x363 [ip6_tables]
> [<ffffffff813863a2>] nf_sockopt+0x5a/0x86
> [<ffffffff813863e6>] nf_getsockopt+0x18/0x1a
> [<ffffffffa034c1ff>] ipv6_getsockopt+0x84/0xba [ipv6]
> [<ffffffffa0353289>] rawv6_getsockopt+0x42/0x4b [ipv6]
> [<ffffffff81355571>] sock_common_getsockopt+0x14/0x16
> [<ffffffff813525bb>] sys_getsockopt+0x7a/0x9b
> [<ffffffff8100ad32>] system_call_fastpath+0x16/0x1b
> irq event stamp: 40
> hardirqs last enabled at (40): [<ffffffff813f5ad5>] _raw_spin_unlock_irqrestore+0x47/0x79
> hardirqs last disabled at (39): [<ffffffff813f5036>] _raw_spin_lock_irqsave+0x2b/0x92
> softirqs last enabled at (0): [<ffffffff8104975a>] copy_process+0x40e/0x11ce
> softirqs last disabled at (9): [<ffffffff8100bc9c>] call_softirq+0x1c/0x30
>
> other info that might help us debug this:
> 3 locks held by ifup-eth/3288:
> #0: (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff8105841c>] run_timer_softirq+0x1f5/0x3e6
> #1: (rcu_read_lock){.+.+..}, at: [<ffffffffa03578ca>] mld_sendpack+0x0/0x3ab [ipv6]
> #2: (rcu_read_lock){.+.+..}, at: [<ffffffff81384f83>] nf_hook_slow+0x0/0x119
>
> stack backtrace:
> Pid: 3288, comm: ifup-eth Not tainted 2.6.36-rc1 #2937
> Call Trace:
> <IRQ> [<ffffffff81077ae6>] print_usage_bug+0x1a4/0x1b5
> [<ffffffff810164fa>] ? save_stack_trace+0x2f/0x4c
> [<ffffffff8106cd0c>] ? local_clock+0x40/0x59
> [<ffffffff810786b6>] ? check_usage_forwards+0x0/0xcf
> [<ffffffff81077de1>] mark_lock+0x2ea/0x51f
> [<ffffffff8107b014>] __lock_acquire+0x6dc/0x93c
> [<ffffffff8106cd0c>] ? local_clock+0x40/0x59
> [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
> [<ffffffff8107b374>] lock_acquire+0x100/0x12d
> [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
> [<ffffffff81011149>] ? sched_clock+0x9/0xd
> [<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73
> [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
> [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables]
> [<ffffffff810771db>] ? trace_hardirqs_off_caller+0x1f/0x9e
> [<ffffffff81384f83>] ? nf_hook_slow+0x0/0x119
> [<ffffffffa010601c>] ip6table_filter_hook+0x1c/0x20 [ip6table_filter]
> [<ffffffff81384f40>] nf_iterate+0x46/0x89
> [<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6]
> [<ffffffff8138501b>] nf_hook_slow+0x98/0x119
> [<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6]
> [<ffffffffa0349557>] ? icmp6_dst_alloc+0x0/0x1b2 [ipv6]
> [<ffffffffa0357b01>] mld_sendpack+0x237/0x3ab [ipv6]
> [<ffffffff81051475>] ? local_bh_enable_ip+0xc7/0xeb
> [<ffffffffa0358390>] mld_ifc_timer_expire+0x254/0x28d [ipv6]
> [<ffffffff810584ed>] run_timer_softirq+0x2c6/0x3e6
> [<ffffffff8105841c>] ? run_timer_softirq+0x1f5/0x3e6
> [<ffffffffa035813c>] ? mld_ifc_timer_expire+0x0/0x28d [ipv6]
> [<ffffffff8105169e>] ? __do_softirq+0x79/0x247
> [<ffffffff81051763>] __do_softirq+0x13e/0x247
> [<ffffffff8100bc9c>] call_softirq+0x1c/0x30
> [<ffffffff8100d32f>] do_softirq+0x4b/0xa3
> [<ffffffff8105120a>] irq_exit+0x4a/0x95
> [<ffffffff813fc185>] smp_apic_timer_interrupt+0x8c/0x9a
> [<ffffffff8100b753>] apic_timer_interrupt+0x13/0x20
> <EOI> [<ffffffff8102c72a>] ? native_flush_tlb_global+0x2b/0x32
> [<ffffffff81031fec>] kernel_map_pages+0x12c/0x142
> [<ffffffff810cc89a>] free_pages_prepare+0x14c/0x15d
> [<ffffffff810cc9c8>] free_hot_cold_page+0x2d/0x165
> [<ffffffff810ccb2b>] __free_pages+0x2b/0x34
> [<ffffffff810ccb7d>] free_pages+0x49/0x4e
> [<ffffffff81033871>] pgd_free+0x71/0x79
> [<ffffffff81048b1b>] __mmdrop+0x27/0x54
> [<ffffffff81042c62>] finish_task_switch+0xb4/0xe4
> [<ffffffff81042bae>] ? finish_task_switch+0x0/0xe4
> [<ffffffff810096e7>] ? __switch_to+0x1a9/0x297
> [<ffffffff81042df3>] schedule_tail+0x30/0xa7
> [<ffffffff8100ac33>] ret_from_fork+0x13/0x80
>
> I noticed in net/ipv6/netfilter/ip6_tables.c in get_counters() as with
> other "get_counters()" functions do not block bottom halves anymore as
> to this commit:
>
> commit 24b36f0193467fa727b85b4c004016a8dae999b9
> Author: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon Aug 2 16:49:01 2010 +0200
>
> netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary
>
> We currently disable BH for the whole duration of get_counters()
>
> Now we take xt_info_wrlock(cpu) lock out of BH disabling. And that lock
> even has the following comment:
>
> /*
> * The "writer" side needs to get exclusive access to the lock,
> * regardless of readers. This must be called with bottom half
> * processing (and thus also preemption) disabled.
> */
> static inline void xt_info_wrlock(unsigned int cpu)
>
>
> As lockdep has proven, this is not satisfied.
>
> -- Steve
>
>
This is a false positive, and a patch was sent yesterday
http://patchwork.ozlabs.org/patch/61750/
Thanks
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 17:31 ` Eric Dumazet
@ 2010-08-16 17:55 ` Steven Rostedt
2010-08-16 18:16 ` Steven Rostedt
2010-08-16 19:44 ` David Miller
0 siblings, 2 replies; 13+ messages in thread
From: Steven Rostedt @ 2010-08-16 17:55 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar,
Peter Zijlstra
On Mon, 2010-08-16 at 19:31 +0200, Eric Dumazet wrote:
> Le lundi 16 août 2010 à 13:07 -0400, Steven Rostedt a écrit :
> > Hi, I hit this when booting 2.6.36-rc1:
> >
> > =================================
> > [ INFO: inconsistent lock state ]
> > 2.6.36-rc1 #2937
> > ---------------------------------
> > inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
> > ifup-eth/3288 [HC0[0]:SC1[2]:HE1:SE0] takes:
> > (&(&lock->lock)->rlock){+.?...}, at: [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables]
> > {SOFTIRQ-ON-W} state was registered at:
> > [<ffffffff8107b08e>] __lock_acquire+0x756/0x93c
> > [<ffffffff8107b374>] lock_acquire+0x100/0x12d
> > [<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73
> > [<ffffffffa01664b1>] get_counters+0xb2/0x168 [ip6_tables]
> > [<ffffffffa01665a3>] alloc_counters+0x3c/0x47 [ip6_tables]
> > [<ffffffffa0167a7b>] do_ip6t_get_ctl+0x10c/0x363 [ip6_tables]
> > [<ffffffff813863a2>] nf_sockopt+0x5a/0x86
> > [<ffffffff813863e6>] nf_getsockopt+0x18/0x1a
> > [<ffffffffa034c1ff>] ipv6_getsockopt+0x84/0xba [ipv6]
> > [<ffffffffa0353289>] rawv6_getsockopt+0x42/0x4b [ipv6]
> > [<ffffffff81355571>] sock_common_getsockopt+0x14/0x16
> > [<ffffffff813525bb>] sys_getsockopt+0x7a/0x9b
> > [<ffffffff8100ad32>] system_call_fastpath+0x16/0x1b
> > irq event stamp: 40
> > hardirqs last enabled at (40): [<ffffffff813f5ad5>] _raw_spin_unlock_irqrestore+0x47/0x79
> > hardirqs last disabled at (39): [<ffffffff813f5036>] _raw_spin_lock_irqsave+0x2b/0x92
> > softirqs last enabled at (0): [<ffffffff8104975a>] copy_process+0x40e/0x11ce
> > softirqs last disabled at (9): [<ffffffff8100bc9c>] call_softirq+0x1c/0x30
> >
> > other info that might help us debug this:
> > 3 locks held by ifup-eth/3288:
> > #0: (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff8105841c>] run_timer_softirq+0x1f5/0x3e6
> > #1: (rcu_read_lock){.+.+..}, at: [<ffffffffa03578ca>] mld_sendpack+0x0/0x3ab [ipv6]
> > #2: (rcu_read_lock){.+.+..}, at: [<ffffffff81384f83>] nf_hook_slow+0x0/0x119
> >
> > stack backtrace:
> > Pid: 3288, comm: ifup-eth Not tainted 2.6.36-rc1 #2937
> > Call Trace:
> > <IRQ> [<ffffffff81077ae6>] print_usage_bug+0x1a4/0x1b5
> > [<ffffffff810164fa>] ? save_stack_trace+0x2f/0x4c
> > [<ffffffff8106cd0c>] ? local_clock+0x40/0x59
> > [<ffffffff810786b6>] ? check_usage_forwards+0x0/0xcf
> > [<ffffffff81077de1>] mark_lock+0x2ea/0x51f
> > [<ffffffff8107b014>] __lock_acquire+0x6dc/0x93c
> > [<ffffffff8106cd0c>] ? local_clock+0x40/0x59
> > [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
> > [<ffffffff8107b374>] lock_acquire+0x100/0x12d
> > [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
> > [<ffffffff81011149>] ? sched_clock+0x9/0xd
> > [<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73
> > [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
> > [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables]
> > [<ffffffff810771db>] ? trace_hardirqs_off_caller+0x1f/0x9e
> > [<ffffffff81384f83>] ? nf_hook_slow+0x0/0x119
> > [<ffffffffa010601c>] ip6table_filter_hook+0x1c/0x20 [ip6table_filter]
> > [<ffffffff81384f40>] nf_iterate+0x46/0x89
> > [<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6]
> > [<ffffffff8138501b>] nf_hook_slow+0x98/0x119
> > [<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6]
> > [<ffffffffa0349557>] ? icmp6_dst_alloc+0x0/0x1b2 [ipv6]
> > [<ffffffffa0357b01>] mld_sendpack+0x237/0x3ab [ipv6]
> > [<ffffffff81051475>] ? local_bh_enable_ip+0xc7/0xeb
> > [<ffffffffa0358390>] mld_ifc_timer_expire+0x254/0x28d [ipv6]
> > [<ffffffff810584ed>] run_timer_softirq+0x2c6/0x3e6
> > [<ffffffff8105841c>] ? run_timer_softirq+0x1f5/0x3e6
> > [<ffffffffa035813c>] ? mld_ifc_timer_expire+0x0/0x28d [ipv6]
> > [<ffffffff8105169e>] ? __do_softirq+0x79/0x247
> > [<ffffffff81051763>] __do_softirq+0x13e/0x247
> > [<ffffffff8100bc9c>] call_softirq+0x1c/0x30
> > [<ffffffff8100d32f>] do_softirq+0x4b/0xa3
> > [<ffffffff8105120a>] irq_exit+0x4a/0x95
> > [<ffffffff813fc185>] smp_apic_timer_interrupt+0x8c/0x9a
> > [<ffffffff8100b753>] apic_timer_interrupt+0x13/0x20
> > <EOI> [<ffffffff8102c72a>] ? native_flush_tlb_global+0x2b/0x32
> > [<ffffffff81031fec>] kernel_map_pages+0x12c/0x142
> > [<ffffffff810cc89a>] free_pages_prepare+0x14c/0x15d
> > [<ffffffff810cc9c8>] free_hot_cold_page+0x2d/0x165
> > [<ffffffff810ccb2b>] __free_pages+0x2b/0x34
> > [<ffffffff810ccb7d>] free_pages+0x49/0x4e
> > [<ffffffff81033871>] pgd_free+0x71/0x79
> > [<ffffffff81048b1b>] __mmdrop+0x27/0x54
> > [<ffffffff81042c62>] finish_task_switch+0xb4/0xe4
> > [<ffffffff81042bae>] ? finish_task_switch+0x0/0xe4
> > [<ffffffff810096e7>] ? __switch_to+0x1a9/0x297
> > [<ffffffff81042df3>] schedule_tail+0x30/0xa7
> > [<ffffffff8100ac33>] ret_from_fork+0x13/0x80
> >
> > I noticed in net/ipv6/netfilter/ip6_tables.c in get_counters() as with
> > other "get_counters()" functions do not block bottom halves anymore as
> > to this commit:
> >
> > commit 24b36f0193467fa727b85b4c004016a8dae999b9
> > Author: Eric Dumazet <eric.dumazet@gmail.com>
> > Date: Mon Aug 2 16:49:01 2010 +0200
> >
> > netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary
> >
> > We currently disable BH for the whole duration of get_counters()
> >
> > Now we take xt_info_wrlock(cpu) lock out of BH disabling. And that lock
> > even has the following comment:
> >
> > /*
> > * The "writer" side needs to get exclusive access to the lock,
> > * regardless of readers. This must be called with bottom half
> > * processing (and thus also preemption) disabled.
> > */
> > static inline void xt_info_wrlock(unsigned int cpu)
> >
> >
> > As lockdep has proven, this is not satisfied.
> >
> > -- Steve
> >
> >
>
>
> This is a false positive, and a patch was sent yesterday
>
> http://patchwork.ozlabs.org/patch/61750/
>
I still do not see how this is a false positive. Per-cpu locks do not
solve the issue.
Please tell me what prevents an interrupt going off after we grab the
xt_info_wrlock(cpu) in get_counters().
IOW, what prevents this:
get_counters() {
xt_info_wrlock(cpu);
<interrupt> --> softirq
xt_info_rblock_bh();
/* which grabs the writer lock */
DEADLOCK!!
-- Steve
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 17:55 ` Steven Rostedt
@ 2010-08-16 18:16 ` Steven Rostedt
2010-08-16 18:36 ` Peter Zijlstra
2010-08-16 18:48 ` Eric Dumazet
2010-08-16 19:44 ` David Miller
1 sibling, 2 replies; 13+ messages in thread
From: Steven Rostedt @ 2010-08-16 18:16 UTC (permalink / raw)
To: Eric Dumazet
Cc: netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar,
Peter Zijlstra
On Mon, 2010-08-16 at 13:55 -0400, Steven Rostedt wrote:
Re-looking at the code, I think I figured it out. But it should really
be documented better.
>
> Please tell me what prevents an interrupt going off after we grab the
> xt_info_wrlock(cpu) in get_counters().
>
> IOW, what prevents this:
>
> get_counters() {
I left out here:
for_each_possible_cpu(cpu) {
if (cpu == curcpu)
continue;
which means that we are grabbing the lock for other CPUs, and that a
softirq would not be a problem.
> xt_info_wrlock(cpu);
>
> <interrupt> --> softirq
>
> xt_info_rblock_bh();
> /* which grabs the writer lock */
> DEADLOCK!!
>
Your patchwork patch has:
@@ -729,8 +729,10 @@ static void get_counters(const struct
xt_table_info *t,
local_bh_enable();
/* Processing counters from other cpus, we can let bottom half enabled,
* (preemption is disabled)
+ * We must turn off lockdep to avoid a false positive.
*/
+ lockdep_off();
for_each_possible_cpu(cpu) {
We need a better comment than that. Could that be changed to something like:
/*
* lockdep tests if we grab a lock and can be preempted by
* a softirq and that softirq grabs the same lock causing a
* deadlock.
* This is a special case because this is a per-cpu lock,
* and we are only grabbing the lock for other CPUs. A softirq
* will only takes its local CPU lock thus, if we are preempted
* by a softirq, then it will grab the current CPU lock which
* we do not take here.
*
* Simply disable lockdep here until it can handle this situation.
*/
Thanks,
-- Steve
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 18:16 ` Steven Rostedt
@ 2010-08-16 18:36 ` Peter Zijlstra
2010-08-16 18:48 ` Eric Dumazet
2010-08-16 18:48 ` Eric Dumazet
1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2010-08-16 18:36 UTC (permalink / raw)
To: Steven Rostedt
Cc: Eric Dumazet, netdev, LKML, David S. Miller, Patrick McHardy,
Ingo Molnar
On Mon, 2010-08-16 at 14:16 -0400, Steven Rostedt wrote:
> @@ -729,8 +729,10 @@ static void get_counters(const struct
> xt_table_info *t,
> local_bh_enable();
> /* Processing counters from other cpus, we can let bottom half
> enabled,
> * (preemption is disabled)
> + * We must turn off lockdep to avoid a false positive.
> */
>
> + lockdep_off();
> for_each_possible_cpu(cpu) {
>
nack!
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 18:16 ` Steven Rostedt
2010-08-16 18:36 ` Peter Zijlstra
@ 2010-08-16 18:48 ` Eric Dumazet
1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2010-08-16 18:48 UTC (permalink / raw)
To: Steven Rostedt
Cc: netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar,
Peter Zijlstra
Le lundi 16 août 2010 à 14:16 -0400, Steven Rostedt a écrit :
> We need a better comment than that. Could that be changed to something like:
>
> /*
> * lockdep tests if we grab a lock and can be preempted by
> * a softirq and that softirq grabs the same lock causing a
> * deadlock.
> * This is a special case because this is a per-cpu lock,
> * and we are only grabbing the lock for other CPUs. A softirq
> * will only takes its local CPU lock thus, if we are preempted
> * by a softirq, then it will grab the current CPU lock which
> * we do not take here.
> *
> * Simply disable lockdep here until it can handle this situation.
> */
>
You mean duplicating this long comment in three files, or only once in
Changelog ?
My choice was to document the lockdep_off() use (very seldom used in
kernel) in the Changelog. Hopefully, people messing with this code know
about git ;)
I agree I didnt document how netfilter locks work in this Changelog.
And in original commit (24b36f019) I forgot to state that get_counters()
is guarded by a mutex, so that no more than one cpu runs in
get_counters().
What about following ?
[PATCH] netfilter: {ip,ip6,arp}_tables: avoid lockdep false positive
After commit 24b36f019 (netfilter: {ip,ip6,arp}_tables: dont block
bottom half more than necessary), lockdep can raise a warning
because we attempt to lock a spinlock with BH enabled, while
the same lock is usually locked by another cpu in a softirq context.
In this use case, the lockdep splat is a false positive, because
the BH disabling only matters for one cpu and its associated.
1) We use one spinlock per cpu.
2) A softirq will only lock the lock associated to current cpu.
3) get_counters() disables sofirqs while fetching data of current cpu.
(to avoid a deadlock if a softirq comes and try to lock same lock)
4) other locks are locked without blocking softirq
(as a softirq will lock another lock)
5) get_counter() calls are serialized by a mutex.
This to avoid a deadlock if two cpus were doing :
CPU1 CPU2
lock lock#1 lock lock#2
copy data#1 copy data#2
unlock lock#1 unlock lock#2
lock#2 lock#1
softirq lock#1 softirq, attempt to lock lock#2
<deadlock> <deadlock>
Use lockdep_off()/lockdep_on() around the problematic section to
avoid the splat.
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Diagnosed-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
---
net/ipv4/netfilter/arp_tables.c | 3 +++
net/ipv4/netfilter/ip_tables.c | 3 +++
net/ipv6/netfilter/ip6_tables.c | 3 +++
3 files changed, 9 insertions(+)
diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 6bccba3..b4f7ebf 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -729,8 +729,10 @@ static void get_counters(const struct xt_table_info *t,
local_bh_enable();
/* Processing counters from other cpus, we can let bottom half enabled,
* (preemption is disabled)
+ * We must turn off lockdep to avoid a false positive.
*/
+ lockdep_off();
for_each_possible_cpu(cpu) {
if (cpu == curcpu)
continue;
@@ -743,6 +745,7 @@ static void get_counters(const struct xt_table_info *t,
}
xt_info_wrunlock(cpu);
}
+ lockdep_on();
put_cpu();
}
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index c439721..dc5b2fd 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -903,8 +903,10 @@ get_counters(const struct xt_table_info *t,
local_bh_enable();
/* Processing counters from other cpus, we can let bottom half enabled,
* (preemption is disabled)
+ * We must turn off lockdep to avoid a false positive.
*/
+ lockdep_off();
for_each_possible_cpu(cpu) {
if (cpu == curcpu)
continue;
@@ -917,6 +919,7 @@ get_counters(const struct xt_table_info *t,
}
xt_info_wrunlock(cpu);
}
+ lockdep_on();
put_cpu();
}
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 5359ef4..fb55443 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -916,8 +916,10 @@ get_counters(const struct xt_table_info *t,
local_bh_enable();
/* Processing counters from other cpus, we can let bottom half enabled,
* (preemption is disabled)
+ * We must turn off lockdep to avoid a false positive.
*/
+ lockdep_off();
for_each_possible_cpu(cpu) {
if (cpu == curcpu)
continue;
@@ -930,6 +932,7 @@ get_counters(const struct xt_table_info *t,
}
xt_info_wrunlock(cpu);
}
+ lockdep_on();
put_cpu();
}
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 18:36 ` Peter Zijlstra
@ 2010-08-16 18:48 ` Eric Dumazet
2010-08-16 19:16 ` Peter Zijlstra
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2010-08-16 18:48 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy,
Ingo Molnar
Le lundi 16 août 2010 à 20:36 +0200, Peter Zijlstra a écrit :
> On Mon, 2010-08-16 at 14:16 -0400, Steven Rostedt wrote:
> > @@ -729,8 +729,10 @@ static void get_counters(const struct
> > xt_table_info *t,
> > local_bh_enable();
> > /* Processing counters from other cpus, we can let bottom half
> > enabled,
> > * (preemption is disabled)
> > + * We must turn off lockdep to avoid a false positive.
> > */
> >
> > + lockdep_off();
> > for_each_possible_cpu(cpu) {
> >
> nack!
Interesting.
Care to elaborate ?
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 18:48 ` Eric Dumazet
@ 2010-08-16 19:16 ` Peter Zijlstra
2010-08-16 19:35 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2010-08-16 19:16 UTC (permalink / raw)
To: Eric Dumazet
Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy,
Ingo Molnar
On Mon, 2010-08-16 at 20:48 +0200, Eric Dumazet wrote:
> Le lundi 16 août 2010 à 20:36 +0200, Peter Zijlstra a écrit :
> > On Mon, 2010-08-16 at 14:16 -0400, Steven Rostedt wrote:
> > > @@ -729,8 +729,10 @@ static void get_counters(const struct
> > > xt_table_info *t,
> > > local_bh_enable();
> > > /* Processing counters from other cpus, we can let bottom half
> > > enabled,
> > > * (preemption is disabled)
> > > + * We must turn off lockdep to avoid a false positive.
> > > */
> > >
> > > + lockdep_off();
> > > for_each_possible_cpu(cpu) {
> > >
> > nack!
>
>
> Interesting.
>
> Care to elaborate ?
Adding lockdep_off() is just plain wrong, if you cannot describe the
locking there's a fair chance its wrong anyway.
As it stands there's only a single lockdep_off(), and that lives in NTFS
it looks like it could be annotated differently, but then, who cares
about NTFS anyway ;-)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 19:16 ` Peter Zijlstra
@ 2010-08-16 19:35 ` Eric Dumazet
2010-08-16 20:01 ` Peter Zijlstra
0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2010-08-16 19:35 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy,
Ingo Molnar
Le lundi 16 août 2010 à 21:16 +0200, Peter Zijlstra a écrit :
> Adding lockdep_off() is just plain wrong, if you cannot describe the
> locking there's a fair chance its wrong anyway.
>
I see.
I described the fine locking after Steven comment, adding a long
Changelog.
http://patchwork.ozlabs.org/patch/61827/
If someone thinks this locking is buggy, please speak now ;)
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 17:55 ` Steven Rostedt
2010-08-16 18:16 ` Steven Rostedt
@ 2010-08-16 19:44 ` David Miller
2010-08-16 20:04 ` Peter Zijlstra
1 sibling, 1 reply; 13+ messages in thread
From: David Miller @ 2010-08-16 19:44 UTC (permalink / raw)
To: rostedt; +Cc: eric.dumazet, netdev, linux-kernel, kaber, mingo, peterz
From: Steven Rostedt <rostedt@goodmis.org>
Date: Mon, 16 Aug 2010 13:55:01 -0400
> Please tell me what prevents an interrupt going off after we grab the
> xt_info_wrlock(cpu) in get_counters().
He's only accessing the per-cpu counter locks of other cpus.
The per-cpu lock is only locally accessed by a cpu in software
interrupt context.
That is why his transformation is legal.
Lockdep simply hasn't been informed of this invariant and has
to assume the worst.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 19:35 ` Eric Dumazet
@ 2010-08-16 20:01 ` Peter Zijlstra
2010-08-16 20:17 ` Eric Dumazet
0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2010-08-16 20:01 UTC (permalink / raw)
To: Eric Dumazet
Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy,
Ingo Molnar
On Mon, 2010-08-16 at 21:35 +0200, Eric Dumazet wrote:
> Le lundi 16 août 2010 à 21:16 +0200, Peter Zijlstra a écrit :
>
> > Adding lockdep_off() is just plain wrong, if you cannot describe the
> > locking there's a fair chance its wrong anyway.
> >
>
> I see.
>
> I described the fine locking after Steven comment, adding a long
> Changelog.
>
> http://patchwork.ozlabs.org/patch/61827/
>
> If someone thinks this locking is buggy, please speak now ;)
Urgh,.. I think it might be correct, but wtf! Wasn't this originally RCU
code, why not go back to using RCU now that we have
synchronize_rcu_expedited()?
As to the original issue, why not keep that bh stuff disabled for
CONFIG_PROVE_LOCKING instead, that will at least let you keep lock
coverage, adding lockdep_off() will hide any cycles that would involve
this lock (even though its currently a leaf lock, you never know what
creative things the future brings).
This fancy open coded lock looks like utter fail for -rt though.. please
use RCU if at all possible.
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 19:44 ` David Miller
@ 2010-08-16 20:04 ` Peter Zijlstra
0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2010-08-16 20:04 UTC (permalink / raw)
To: David Miller; +Cc: rostedt, eric.dumazet, netdev, linux-kernel, kaber, mingo
On Mon, 2010-08-16 at 12:44 -0700, David Miller wrote:
> He's only accessing the per-cpu counter locks of other cpus.
>
> The per-cpu lock is only locally accessed by a cpu in software
> interrupt context.
>
> That is why his transformation is legal.
>
> Lockdep simply hasn't been informed of this invariant and has
> to assume the worst.
Something like the below will keep lockdep coverage, still going back to
RCU sounds like the best option.
---
include/linux/netfilter/x_tables.h | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)
diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 24e5d01..a195feb 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -511,12 +511,21 @@ static inline void xt_info_rdunlock_bh(void)
*/
static inline void xt_info_wrlock(unsigned int cpu)
{
+#ifdef CONFIG_PROVE_LOCKING
+ /*
+ * XXX foo
+ */
+ local_bh_disable();
+#endif
spin_lock(&per_cpu(xt_info_locks, cpu).lock);
}
static inline void xt_info_wrunlock(unsigned int cpu)
{
spin_unlock(&per_cpu(xt_info_locks, cpu).lock);
+#ifdef CONFIG_PROVE_LOCKING
+ local_bh_enable();
+#endif
}
/*
^ permalink raw reply related [flat|nested] 13+ messages in thread
* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
2010-08-16 20:01 ` Peter Zijlstra
@ 2010-08-16 20:17 ` Eric Dumazet
0 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2010-08-16 20:17 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy,
Ingo Molnar
Le lundi 16 août 2010 à 22:01 +0200, Peter Zijlstra a écrit :
> On Mon, 2010-08-16 at 21:35 +0200, Eric Dumazet wrote:
> > Le lundi 16 août 2010 à 21:16 +0200, Peter Zijlstra a écrit :
> >
> > > Adding lockdep_off() is just plain wrong, if you cannot describe the
> > > locking there's a fair chance its wrong anyway.
> > >
> >
> > I see.
> >
> > I described the fine locking after Steven comment, adding a long
> > Changelog.
> >
> > http://patchwork.ozlabs.org/patch/61827/
> >
> > If someone thinks this locking is buggy, please speak now ;)
>
> Urgh,.. I think it might be correct, but wtf! Wasn't this originally RCU
> code, why not go back to using RCU now that we have
> synchronize_rcu_expedited()?
>
> As to the original issue, why not keep that bh stuff disabled for
> CONFIG_PROVE_LOCKING instead, that will at least let you keep lock
> coverage, adding lockdep_off() will hide any cycles that would involve
> this lock (even though its currently a leaf lock, you never know what
> creative things the future brings).
>
> This fancy open coded lock looks like utter fail for -rt though.. please
> use RCU if at all possible.
>
>
Please read history of why RCU failed in this context.
I believe I did most of RCU conversions in kernel, you dont need to
shout on me.
And its a bit late in 2.6.36 to even think about it.
I am happy that you volunteer for next RCU conversion, thanks Peter !
In the mean time, we just are going to disable BH again, I'll post a
patch.
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2010-08-16 20:17 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-08-16 17:07 [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? Steven Rostedt
2010-08-16 17:31 ` Eric Dumazet
2010-08-16 17:55 ` Steven Rostedt
2010-08-16 18:16 ` Steven Rostedt
2010-08-16 18:36 ` Peter Zijlstra
2010-08-16 18:48 ` Eric Dumazet
2010-08-16 19:16 ` Peter Zijlstra
2010-08-16 19:35 ` Eric Dumazet
2010-08-16 20:01 ` Peter Zijlstra
2010-08-16 20:17 ` Eric Dumazet
2010-08-16 18:48 ` Eric Dumazet
2010-08-16 19:44 ` David Miller
2010-08-16 20:04 ` Peter Zijlstra
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).