[LOCKDEP BUG][2.6.36-rc1] xt_info


* [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
@ 2010-08-16 17:07 Steven Rostedt
  2010-08-16 17:31 ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Steven Rostedt @ 2010-08-16 17:07 UTC (permalink / raw)
  To: netdev; +Cc: LKML, David S. Miller, Eric Dumazet, Patrick McHardy

Hi, I hit this when booting 2.6.36-rc1:

=================================
[ INFO: inconsistent lock state ]
2.6.36-rc1 #2937
---------------------------------
inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage.
ifup-eth/3288 [HC0[0]:SC1[2]:HE1:SE0] takes:
 (&(&lock->lock)->rlock){+.?...}, at: [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables]
{SOFTIRQ-ON-W} state was registered at:
  [<ffffffff8107b08e>] __lock_acquire+0x756/0x93c
  [<ffffffff8107b374>] lock_acquire+0x100/0x12d
  [<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73
  [<ffffffffa01664b1>] get_counters+0xb2/0x168 [ip6_tables]
  [<ffffffffa01665a3>] alloc_counters+0x3c/0x47 [ip6_tables]
  [<ffffffffa0167a7b>] do_ip6t_get_ctl+0x10c/0x363 [ip6_tables]
  [<ffffffff813863a2>] nf_sockopt+0x5a/0x86
  [<ffffffff813863e6>] nf_getsockopt+0x18/0x1a
  [<ffffffffa034c1ff>] ipv6_getsockopt+0x84/0xba [ipv6]
  [<ffffffffa0353289>] rawv6_getsockopt+0x42/0x4b [ipv6]
  [<ffffffff81355571>] sock_common_getsockopt+0x14/0x16
  [<ffffffff813525bb>] sys_getsockopt+0x7a/0x9b
  [<ffffffff8100ad32>] system_call_fastpath+0x16/0x1b
irq event stamp: 40
hardirqs last  enabled at (40): [<ffffffff813f5ad5>] _raw_spin_unlock_irqrestore+0x47/0x79
hardirqs last disabled at (39): [<ffffffff813f5036>] _raw_spin_lock_irqsave+0x2b/0x92
softirqs last  enabled at (0): [<ffffffff8104975a>] copy_process+0x40e/0x11ce
softirqs last disabled at (9): [<ffffffff8100bc9c>] call_softirq+0x1c/0x30

other info that might help us debug this:
3 locks held by ifup-eth/3288:
 #0:  (&idev->mc_ifc_timer){+.-...}, at: [<ffffffff8105841c>] run_timer_softirq+0x1f5/0x3e6
 #1:  (rcu_read_lock){.+.+..}, at: [<ffffffffa03578ca>] mld_sendpack+0x0/0x3ab [ipv6]
 #2:  (rcu_read_lock){.+.+..}, at: [<ffffffff81384f83>] nf_hook_slow+0x0/0x119

stack backtrace:
Pid: 3288, comm: ifup-eth Not tainted 2.6.36-rc1 #2937
Call Trace:
 <IRQ>  [<ffffffff81077ae6>] print_usage_bug+0x1a4/0x1b5
 [<ffffffff810164fa>] ? save_stack_trace+0x2f/0x4c
 [<ffffffff8106cd0c>] ? local_clock+0x40/0x59
 [<ffffffff810786b6>] ? check_usage_forwards+0x0/0xcf
 [<ffffffff81077de1>] mark_lock+0x2ea/0x51f
 [<ffffffff8107b014>] __lock_acquire+0x6dc/0x93c
 [<ffffffff8106cd0c>] ? local_clock+0x40/0x59
 [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
 [<ffffffff8107b374>] lock_acquire+0x100/0x12d
 [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
 [<ffffffff81011149>] ? sched_clock+0x9/0xd
 [<ffffffff813f4ec3>] _raw_spin_lock+0x40/0x73
 [<ffffffffa0166eef>] ? ip6t_do_table+0x8a/0x3f1 [ip6_tables]
 [<ffffffffa0166eef>] ip6t_do_table+0x8a/0x3f1 [ip6_tables]
 [<ffffffff810771db>] ? trace_hardirqs_off_caller+0x1f/0x9e
 [<ffffffff81384f83>] ? nf_hook_slow+0x0/0x119
 [<ffffffffa010601c>] ip6table_filter_hook+0x1c/0x20 [ip6table_filter]
 [<ffffffff81384f40>] nf_iterate+0x46/0x89
 [<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6]
 [<ffffffff8138501b>] nf_hook_slow+0x98/0x119
 [<ffffffffa035627b>] ? dst_output+0x0/0x5c [ipv6]
 [<ffffffffa0349557>] ? icmp6_dst_alloc+0x0/0x1b2 [ipv6]
 [<ffffffffa0357b01>] mld_sendpack+0x237/0x3ab [ipv6]
 [<ffffffff81051475>] ? local_bh_enable_ip+0xc7/0xeb
 [<ffffffffa0358390>] mld_ifc_timer_expire+0x254/0x28d [ipv6]
 [<ffffffff810584ed>] run_timer_softirq+0x2c6/0x3e6
 [<ffffffff8105841c>] ? run_timer_softirq+0x1f5/0x3e6
 [<ffffffffa035813c>] ? mld_ifc_timer_expire+0x0/0x28d [ipv6]
 [<ffffffff8105169e>] ? __do_softirq+0x79/0x247
 [<ffffffff81051763>] __do_softirq+0x13e/0x247
 [<ffffffff8100bc9c>] call_softirq+0x1c/0x30
 [<ffffffff8100d32f>] do_softirq+0x4b/0xa3
 [<ffffffff8105120a>] irq_exit+0x4a/0x95
 [<ffffffff813fc185>] smp_apic_timer_interrupt+0x8c/0x9a
 [<ffffffff8100b753>] apic_timer_interrupt+0x13/0x20
 <EOI>  [<ffffffff8102c72a>] ? native_flush_tlb_global+0x2b/0x32
 [<ffffffff81031fec>] kernel_map_pages+0x12c/0x142
 [<ffffffff810cc89a>] free_pages_prepare+0x14c/0x15d
 [<ffffffff810cc9c8>] free_hot_cold_page+0x2d/0x165
 [<ffffffff810ccb2b>] __free_pages+0x2b/0x34
 [<ffffffff810ccb7d>] free_pages+0x49/0x4e
 [<ffffffff81033871>] pgd_free+0x71/0x79
 [<ffffffff81048b1b>] __mmdrop+0x27/0x54
 [<ffffffff81042c62>] finish_task_switch+0xb4/0xe4
 [<ffffffff81042bae>] ? finish_task_switch+0x0/0xe4
 [<ffffffff810096e7>] ? __switch_to+0x1a9/0x297
 [<ffffffff81042df3>] schedule_tail+0x30/0xa7
 [<ffffffff8100ac33>] ret_from_fork+0x13/0x80

I noticed that get_counters() in net/ipv6/netfilter/ip6_tables.c, like the
other get_counters() functions, no longer blocks bottom halves as of this
commit:

commit 24b36f0193467fa727b85b4c004016a8dae999b9
Author: Eric Dumazet <eric.dumazet@gmail.com>
Date:   Mon Aug 2 16:49:01 2010 +0200

    netfilter: {ip,ip6,arp}_tables: dont block bottom half more than necessary
    
    We currently disable BH for the whole duration of get_counters()

Now we take the xt_info_wrlock(cpu) lock outside the BH-disabled section, and
that lock even has the following comment:

/*
 * The "writer" side needs to get exclusive access to the lock,
 * regardless of readers.  This must be called with bottom half
 * processing (and thus also preemption) disabled.
 */
static inline void xt_info_wrlock(unsigned int cpu)


As lockdep has proven, this is not satisfied.

-- Steve
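
For reference, the report above can be read as follows: the lock class was
first seen being taken with softirqs enabled ({SOFTIRQ-ON-W}, the
get_counters() path) and is now being taken from softirq context
({IN-SOFTIRQ-W}, the ip6t_do_table() path). A minimal userspace sketch of
that kind of consistency check is below. It is only a toy model:
toy_lock_class and toy_mark_acquire() are hypothetical names, not lockdep's
real data structures, and real lockdep tracks many more states. As far as I
understand, the check is made per lock class, so the per-cpu instances of
the lock are not told apart.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of per-lock-class softirq usage bits (hypothetical names;
 * this is not lockdep's real bookkeeping). */
struct toy_lock_class {
	bool softirq_on_w;	/* ever write-acquired with softirqs enabled */
	bool in_softirq_w;	/* ever write-acquired from softirq context */
};

/* Record one acquisition of the class.
 * Returns true once both usages have been seen, i.e. the state that
 * the report calls "inconsistent". */
static bool toy_mark_acquire(struct toy_lock_class *c,
			     bool in_softirq, bool softirqs_enabled)
{
	if (in_softirq)
		c->in_softirq_w = true;
	else if (softirqs_enabled)
		c->softirq_on_w = true;

	return c->softirq_on_w && c->in_softirq_w;
}
```

Feeding it the two acquisitions from the report (process context with
softirqs enabled, then softirq context) makes the second call return true,
regardless of which cpu's lock instance was involved.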


* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 17:07 [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock? Steven Rostedt
@ 2010-08-16 17:31 ` Eric Dumazet
  2010-08-16 17:55   ` Steven Rostedt
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2010-08-16 17:31 UTC (permalink / raw)
  To: Steven Rostedt; +Cc: netdev, LKML, David S. Miller, Patrick McHardy

On Monday, 16 August 2010 at 13:07 -0400, Steven Rostedt wrote:
> Hi, I hit this when booting 2.6.36-rc1:
> [...]
> As lockdep has proven, this is not satisfied.


This is a false positive, and a patch was sent yesterday

http://patchwork.ozlabs.org/patch/61750/

Thanks


* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 17:31 ` Eric Dumazet
@ 2010-08-16 17:55   ` Steven Rostedt
  2010-08-16 18:16     ` Steven Rostedt
  2010-08-16 19:44     ` David Miller
  0 siblings, 2 replies; 13+ messages in thread
From: Steven Rostedt @ 2010-08-16 17:55 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar,
	Peter Zijlstra

On Mon, 2010-08-16 at 19:31 +0200, Eric Dumazet wrote:
> On Monday, 16 August 2010 at 13:07 -0400, Steven Rostedt wrote:
> > Hi, I hit this when booting 2.6.36-rc1:
> > [...]
> > As lockdep has proven, this is not satisfied.
> 
> This is a false positive, and a patch was sent yesterday
> 
> http://patchwork.ozlabs.org/patch/61750/
> 

I still do not see how this is a false positive. Per-cpu locks do not
solve the issue.


Please tell me what prevents an interrupt going off after we grab the
xt_info_wrlock(cpu) in get_counters().

IOW, what prevents this:

	get_counters() {
		xt_info_wrlock(cpu);

		<interrupt> --> softirq

			xt_info_rdlock_bh();
				/* which grabs the writer lock */
			DEADLOCK!!


-- Steve


* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 17:55   ` Steven Rostedt
@ 2010-08-16 18:16     ` Steven Rostedt
  2010-08-16 18:36       ` Peter Zijlstra
  2010-08-16 18:48       ` Eric Dumazet
  2010-08-16 19:44     ` David Miller
  1 sibling, 2 replies; 13+ messages in thread
From: Steven Rostedt @ 2010-08-16 18:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar,
	Peter Zijlstra

On Mon, 2010-08-16 at 13:55 -0400, Steven Rostedt wrote:

Re-looking at the code, I think I figured it out. But it should really
be documented better.


> 
> Please tell me what prevents an interrupt going off after we grab the
> xt_info_wrlock(cpu) in get_counters().
> 
> IOW, what prevents this:
> 
> 	get_counters() {

I left out here:

		for_each_possible_cpu(cpu) {
			if (cpu == curcpu)
				continue;

which means that we are grabbing the lock for other CPUs, and that a
softirq would not be a problem.

> 		xt_info_wrlock(cpu);
> 
> 		<interrupt> --> softirq
> 
> 			xt_info_rdlock_bh();
> 				/* which grabs the writer lock */
> 			DEADLOCK!!
> 

Your patchwork patch has:


@@ -729,8 +729,10 @@  static void get_counters(const struct xt_table_info *t,
 	local_bh_enable();
 	/* Processing counters from other cpus, we can let bottom half enabled,
 	 * (preemption is disabled)
+	 * We must turn off lockdep to avoid a false positive.
 	 */
 
+	lockdep_off();
 	for_each_possible_cpu(cpu) {


We need a better comment than that. Could that be changed to something like:

	/*
	 * lockdep tests if we grab a lock and can be preempted by 
	 * a softirq and that softirq grabs the same lock causing a
	 * deadlock.
	 * This is a special case because this is a per-cpu lock,
	 * and we are only grabbing the lock for other CPUs. A softirq
	 * will only take its local CPU lock; thus, if we are preempted
	 * by a softirq, then it will grab the current CPU lock which
	 * we do not take here.
	 *
	 * Simply disable lockdep here until it can handle this situation.
	 */

Thanks,

-- Steve
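
The argument above can be sketched in userspace as well. The following is a
single-threaded toy model, not the kernel code: toy_lock_held[],
toy_softirq_can_run() and toy_get_counters_is_safe() are hypothetical names,
each per-cpu lock is just a flag, and contention from other CPUs (ordinary
spinning that eventually ends) is deliberately ignored. It only checks the
one property under discussion: whether a softirq interrupting the walker on
its own CPU would find its local lock already held.

```c
#include <assert.h>
#include <stdbool.h>

#define TOY_NR_CPUS 4

/* One lock per cpu, modelled as a plain "held" flag (hypothetical). */
static bool toy_lock_held[TOY_NR_CPUS];

/* A softirq on @cpu only ever takes that cpu's own lock.
 * Returns false if that lock is held by the context it interrupted,
 * which on a real spinlock would mean spinning forever. */
static bool toy_softirq_can_run(int cpu)
{
	return !toy_lock_held[cpu];
}

/* Walk the per-cpu locks the way the remote loop in get_counters() does,
 * and at each step ask whether a softirq arriving on @curcpu could still
 * take its local lock. skip_curcpu == false models the broken variant
 * that would also take the local lock with BH enabled. */
static bool toy_get_counters_is_safe(int curcpu, bool skip_curcpu)
{
	bool safe = true;

	for (int cpu = 0; cpu < TOY_NR_CPUS; cpu++) {
		if (skip_curcpu && cpu == curcpu)
			continue;
		toy_lock_held[cpu] = true;		/* xt_info_wrlock(cpu) */
		if (!toy_softirq_can_run(curcpu))	/* <interrupt> --> softirq */
			safe = false;
		toy_lock_held[cpu] = false;		/* xt_info_wrunlock(cpu) */
	}
	return safe;
}
```

With the "cpu == curcpu" skip in place the walk is reported safe; without it
the model flags the self-deadlock from the earlier diagram.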


* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 18:16     ` Steven Rostedt
@ 2010-08-16 18:36       ` Peter Zijlstra
  2010-08-16 18:48         ` Eric Dumazet
  2010-08-16 18:48       ` Eric Dumazet
  1 sibling, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2010-08-16 18:36 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: Eric Dumazet, netdev, LKML, David S. Miller, Patrick McHardy,
	Ingo Molnar

On Mon, 2010-08-16 at 14:16 -0400, Steven Rostedt wrote:
> @@ -729,8 +729,10 @@  static void get_counters(const struct xt_table_info *t,
>         local_bh_enable();
>         /* Processing counters from other cpus, we can let bottom half enabled,
>          * (preemption is disabled)
> +        * We must turn off lockdep to avoid a false positive.
>          */
>  
> +       lockdep_off();
>         for_each_possible_cpu(cpu) {
> 
nack!


* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 18:36       ` Peter Zijlstra
@ 2010-08-16 18:48         ` Eric Dumazet
  2010-08-16 19:16           ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2010-08-16 18:48 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy,
	Ingo Molnar

On Monday, 16 August 2010 at 20:36 +0200, Peter Zijlstra wrote:
> On Mon, 2010-08-16 at 14:16 -0400, Steven Rostedt wrote:
> > @@ -729,8 +729,10 @@  static void get_counters(const struct xt_table_info *t,
> >         local_bh_enable();
> >         /* Processing counters from other cpus, we can let bottom half enabled,
> >          * (preemption is disabled)
> > +        * We must turn off lockdep to avoid a false positive.
> >          */
> >  
> > +       lockdep_off();
> >         for_each_possible_cpu(cpu) {
> > 
> nack!


Interesting.

Care to elaborate ?





* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 18:48         ` Eric Dumazet
@ 2010-08-16 19:16           ` Peter Zijlstra
  2010-08-16 19:35             ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2010-08-16 19:16 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy,
	Ingo Molnar

On Mon, 2010-08-16 at 20:48 +0200, Eric Dumazet wrote:
> On Monday, 16 August 2010 at 20:36 +0200, Peter Zijlstra wrote:
> > On Mon, 2010-08-16 at 14:16 -0400, Steven Rostedt wrote:
> > > @@ -729,8 +729,10 @@  static void get_counters(const struct xt_table_info *t,
> > >         local_bh_enable();
> > >         /* Processing counters from other cpus, we can let bottom half enabled,
> > >          * (preemption is disabled)
> > > +        * We must turn off lockdep to avoid a false positive.
> > >          */
> > >  
> > > +       lockdep_off();
> > >         for_each_possible_cpu(cpu) {
> > > 
> > nack!
> 
> 
> Interesting.
> 
> Care to elaborate ?

Adding lockdep_off() is just plain wrong; if you cannot describe the
locking, there's a fair chance it's wrong anyway.

As it stands there's only a single lockdep_off(), and that lives in NTFS.
It looks like it could be annotated differently, but then, who cares
about NTFS anyway ;-)


* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 19:16           ` Peter Zijlstra
@ 2010-08-16 19:35             ` Eric Dumazet
  2010-08-16 20:01               ` Peter Zijlstra
  0 siblings, 1 reply; 13+ messages in thread
From: Eric Dumazet @ 2010-08-16 19:35 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy,
	Ingo Molnar

On Monday, 16 August 2010 at 21:16 +0200, Peter Zijlstra wrote:

> Adding lockdep_off() is just plain wrong, if you cannot describe the
> locking there's a fair chance it's wrong anyway.
> 

I see.

I described the fine-grained locking after Steven's comment, adding a long
changelog.

http://patchwork.ozlabs.org/patch/61827/

If someone thinks this locking is buggy, please speak now ;)





* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 19:35             ` Eric Dumazet
@ 2010-08-16 20:01               ` Peter Zijlstra
  2010-08-16 20:17                 ` Eric Dumazet
  0 siblings, 1 reply; 13+ messages in thread
From: Peter Zijlstra @ 2010-08-16 20:01 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy,
	Ingo Molnar

On Mon, 2010-08-16 at 21:35 +0200, Eric Dumazet wrote:
> On Monday, 16 August 2010 at 21:16 +0200, Peter Zijlstra wrote:
> 
> > Adding lockdep_off() is just plain wrong, if you cannot describe the
> > locking there's a fair chance it's wrong anyway.
> > 
> 
> I see.
> 
> I described the fine-grained locking after Steven's comment, adding a long
> changelog.
> 
> http://patchwork.ozlabs.org/patch/61827/
> 
> If someone thinks this locking is buggy, please speak now ;)

Urgh... I think it might be correct, but wtf! Wasn't this originally RCU
code? Why not go back to using RCU now that we have
synchronize_rcu_expedited()?

As to the original issue, why not keep that BH stuff disabled for
CONFIG_PROVE_LOCKING instead? That will at least let you keep lock
coverage; adding lockdep_off() will hide any cycles that would involve
this lock (even though it's currently a leaf lock, you never know what
creative things the future brings).

This fancy open-coded lock looks like utter fail for -rt though. Please
use RCU if at all possible.
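
One way to read the CONFIG_PROVE_LOCKING suggestion above is sketched here.
This is an assumption about what is meant, not a posted patch: every TOY_ or
toy_ name is a hypothetical userspace stand-in (for the config option and for
local_bh_disable()/local_bh_enable()) so that the fragment compiles outside
the kernel.

```c
#include <assert.h>

/* Stand-in for CONFIG_PROVE_LOCKING; set to 0 to model a production build. */
#ifndef TOY_PROVE_LOCKING
#define TOY_PROVE_LOCKING 1
#endif

/* Userspace stand-ins for local_bh_disable()/local_bh_enable(). */
int toy_bh_disable_depth;
void toy_local_bh_disable(void) { toy_bh_disable_depth++; }
void toy_local_bh_enable(void)  { toy_bh_disable_depth--; }

/* Only pay for BH disabling around the remote-cpu walk when lock proving
 * is enabled, so the checker keeps full coverage of the lock. */
#if TOY_PROVE_LOCKING
#define toy_remote_section_begin()	toy_local_bh_disable()
#define toy_remote_section_end()	toy_local_bh_enable()
#else
#define toy_remote_section_begin()	do { } while (0)
#define toy_remote_section_end()	do { } while (0)
#endif

/* Returns the BH-disable depth seen while the remote locks would be held. */
int toy_walk_remote_cpus(void)
{
	int depth;

	toy_remote_section_begin();
	depth = toy_bh_disable_depth;	/* xt_info_wrlock(cpu) loop goes here */
	toy_remote_section_end();
	return depth;
}
```

The trade-off in this reading is that debug kernels keep the old, longer
BH-off window, while production kernels get the shorter one and the lock
stays visible to lockdep.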


* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 20:01               ` Peter Zijlstra
@ 2010-08-16 20:17                 ` Eric Dumazet
  0 siblings, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2010-08-16 20:17 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Steven Rostedt, netdev, LKML, David S. Miller, Patrick McHardy,
	Ingo Molnar

On Monday, 16 August 2010 at 22:01 +0200, Peter Zijlstra wrote:
> On Mon, 2010-08-16 at 21:35 +0200, Eric Dumazet wrote:
> > On Monday, 16 August 2010 at 21:16 +0200, Peter Zijlstra wrote:
> > 
> > > Adding lockdep_off() is just plain wrong, if you cannot describe the
> > > locking there's a fair chance it's wrong anyway.
> > > 
> > 
> > I see.
> > 
> > I described the fine-grained locking after Steven's comment, adding a long
> > changelog.
> > 
> > http://patchwork.ozlabs.org/patch/61827/
> > 
> > If someone thinks this locking is buggy, please speak now ;)
> 
> Urgh... I think it might be correct, but wtf! Wasn't this originally RCU
> code? Why not go back to using RCU now that we have
> synchronize_rcu_expedited()?
> 
> As to the original issue, why not keep that BH stuff disabled for
> CONFIG_PROVE_LOCKING instead? That will at least let you keep lock
> coverage; adding lockdep_off() will hide any cycles that would involve
> this lock (even though it's currently a leaf lock, you never know what
> creative things the future brings).
> 
> This fancy open-coded lock looks like utter fail for -rt though. Please
> use RCU if at all possible.
> 
> 

Please read the history of why RCU failed in this context.

I believe I did most of the RCU conversions in the kernel; you don't need to
shout at me.

And it's a bit late in 2.6.36 to even think about it.

I am happy that you volunteer for the next RCU conversion, thanks Peter!

In the meantime, we are just going to disable BH again. I'll post a
patch.


* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 18:16     ` Steven Rostedt
  2010-08-16 18:36       ` Peter Zijlstra
@ 2010-08-16 18:48       ` Eric Dumazet
  1 sibling, 0 replies; 13+ messages in thread
From: Eric Dumazet @ 2010-08-16 18:48 UTC (permalink / raw)
  To: Steven Rostedt
  Cc: netdev, LKML, David S. Miller, Patrick McHardy, Ingo Molnar,
	Peter Zijlstra

On Monday, 16 August 2010 at 14:16 -0400, Steven Rostedt wrote:

> We need a better comment than that. Could that be changed to something like:
> 
> 	/*
> 	 * lockdep tests if we grab a lock and can be preempted by 
> 	 * a softirq and that softirq grabs the same lock causing a
> 	 * deadlock.
> 	 * This is a special case because this is a per-cpu lock,
> 	 * and we are only grabbing the lock for other CPUs. A softirq
> 	 * will only take its local CPU lock; thus, if we are preempted
> 	 * by a softirq, then it will grab the current CPU lock which
> 	 * we do not take here.
> 	 *
> 	 * Simply disable lockdep here until it can handle this situation.
> 	 */
> 

Do you mean duplicating this long comment in three files, or putting it
once in the changelog?

My choice was to document the lockdep_off() use (very seldom used in
the kernel) in the changelog. Hopefully, people messing with this code know
about git ;)

I agree I didn't document how netfilter locks work in this changelog.

And in the original commit (24b36f019) I forgot to state that get_counters()
is guarded by a mutex, so that no more than one cpu runs in
get_counters() at a time.

What about the following?

[PATCH] netfilter: {ip,ip6,arp}_tables: avoid lockdep false positive

After commit 24b36f019 (netfilter: {ip,ip6,arp}_tables: dont block
bottom half more than necessary), lockdep can raise a warning
because we attempt to lock a spinlock with BH enabled, while
the same lock is usually locked by another cpu in a softirq context.

In this use case, the lockdep splat is a false positive, because
the BH disabling only matters for one cpu and its associated lock.

1) We use one spinlock per cpu.

2) A softirq will only lock the lock associated with the current cpu.

3) get_counters() disables softirqs while fetching data of the current cpu
   (to avoid a deadlock if a softirq comes and tries to lock the same lock).

4) Other locks are locked without blocking softirqs
   (as a softirq will lock another lock).

5) get_counters() calls are serialized by a mutex.

   This is to avoid a deadlock if two cpus were doing:

   CPU1                     CPU2
   lock lock#1              lock lock#2
   copy data#1              copy data#2
   unlock lock#1            unlock lock#2

   lock lock#2              lock lock#1
   softirq, tries lock#1    softirq, tries lock#2
   <deadlock>               <deadlock>

Use lockdep_off()/lockdep_on() around the problematic section to
avoid the splat.

Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Diagnosed-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
CC: Patrick McHardy <kaber@trash.net>
---
 net/ipv4/netfilter/arp_tables.c |    3 +++
 net/ipv4/netfilter/ip_tables.c  |    3 +++
 net/ipv6/netfilter/ip6_tables.c |    3 +++
 3 files changed, 9 insertions(+)


diff --git a/net/ipv4/netfilter/arp_tables.c b/net/ipv4/netfilter/arp_tables.c
index 6bccba3..b4f7ebf 100644
--- a/net/ipv4/netfilter/arp_tables.c
+++ b/net/ipv4/netfilter/arp_tables.c
@@ -729,8 +729,10 @@ static void get_counters(const struct xt_table_info *t,
 	local_bh_enable();
 	/* Processing counters from other cpus, we can let bottom half enabled,
 	 * (preemption is disabled)
+	 * We must turn off lockdep to avoid a false positive.
 	 */
 
+	lockdep_off();
 	for_each_possible_cpu(cpu) {
 		if (cpu == curcpu)
 			continue;
@@ -743,6 +745,7 @@ static void get_counters(const struct xt_table_info *t,
 		}
 		xt_info_wrunlock(cpu);
 	}
+	lockdep_on();
 	put_cpu();
 }
 
diff --git a/net/ipv4/netfilter/ip_tables.c b/net/ipv4/netfilter/ip_tables.c
index c439721..dc5b2fd 100644
--- a/net/ipv4/netfilter/ip_tables.c
+++ b/net/ipv4/netfilter/ip_tables.c
@@ -903,8 +903,10 @@ get_counters(const struct xt_table_info *t,
 	local_bh_enable();
 	/* Processing counters from other cpus, we can let bottom half enabled,
 	 * (preemption is disabled)
+	 * We must turn off lockdep to avoid a false positive.
 	 */
 
+	lockdep_off();
 	for_each_possible_cpu(cpu) {
 		if (cpu == curcpu)
 			continue;
@@ -917,6 +919,7 @@ get_counters(const struct xt_table_info *t,
 		}
 		xt_info_wrunlock(cpu);
 	}
+	lockdep_on();
 	put_cpu();
 }
 
diff --git a/net/ipv6/netfilter/ip6_tables.c b/net/ipv6/netfilter/ip6_tables.c
index 5359ef4..fb55443 100644
--- a/net/ipv6/netfilter/ip6_tables.c
+++ b/net/ipv6/netfilter/ip6_tables.c
@@ -916,8 +916,10 @@ get_counters(const struct xt_table_info *t,
 	local_bh_enable();
 	/* Processing counters from other cpus, we can let bottom half enabled,
 	 * (preemption is disabled)
+	 * We must turn off lockdep to avoid a false positive.
 	 */
 
+	lockdep_off();
 	for_each_possible_cpu(cpu) {
 		if (cpu == curcpu)
 			continue;
@@ -930,6 +932,7 @@ get_counters(const struct xt_table_info *t,
 		}
 		xt_info_wrunlock(cpu);
 	}
+	lockdep_on();
 	put_cpu();
 }
 


* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 17:55   ` Steven Rostedt
  2010-08-16 18:16     ` Steven Rostedt
@ 2010-08-16 19:44     ` David Miller
  2010-08-16 20:04       ` Peter Zijlstra
  1 sibling, 1 reply; 13+ messages in thread
From: David Miller @ 2010-08-16 19:44 UTC (permalink / raw)
  To: rostedt; +Cc: eric.dumazet, netdev, linux-kernel, kaber, mingo, peterz

From: Steven Rostedt <rostedt@goodmis.org>
Date: Mon, 16 Aug 2010 13:55:01 -0400

> Please tell me what prevents an interrupt going off after we grab the
> xt_info_wrlock(cpu) in get_counters().

He's only accessing the per-cpu counter locks of other cpus.

The per-cpu lock is only locally accessed by a cpu in software
interrupt context.

That is why his transformation is legal.

Lockdep simply hasn't been informed of this invariant and has
to assume the worst.
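
To illustrate why a class-based checker complains here, a toy model in C
follows. It is not lockdep's implementation; the struct, the flag names and
toy_acquire() are invented for this sketch. The only assumption taken from
lockdep is that usage state is tracked per lock class, so all the per-cpu
locks share one record and the checker cannot tell "my cpu's lock" from
"another cpu's lock".

```c
#include <stdbool.h>

/* Usage bits recorded per lock class, not per lock instance. */
enum {
	USED_IN_SOFTIRQ      = 1 << 0, /* taken from softirq context */
	USED_SOFTIRQ_ENABLED = 1 << 1, /* taken in process context, softirqs on */
};

/* Hypothetical record shared by every lock of the same class. */
struct toy_class {
	const char *name;
	unsigned int usage;
};

/*
 * Record one acquisition of any lock in class c.
 * Returns false once the class has been seen both in softirq context and
 * in process context with softirqs enabled, which is the combination the
 * report calls inconsistent.
 */
static bool toy_acquire(struct toy_class *c, bool in_softirq,
			bool softirqs_enabled)
{
	const unsigned int both = USED_IN_SOFTIRQ | USED_SOFTIRQ_ENABLED;

	if (in_softirq)
		c->usage |= USED_IN_SOFTIRQ;
	else if (softirqs_enabled)
		c->usage |= USED_SOFTIRQ_ENABLED;

	return (c->usage & both) != both;
}
```

In this model, taking another cpu's lock with softirqs enabled and then
taking the local lock from a softirq trips the check, even though the two
acquisitions can never hit the same lock instance. If every process-context
acquisition runs with softirqs disabled, the check never fires.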


* Re: [LOCKDEP BUG][2.6.36-rc1] xt_info_wrlock?
  2010-08-16 19:44     ` David Miller
@ 2010-08-16 20:04       ` Peter Zijlstra
  0 siblings, 0 replies; 13+ messages in thread
From: Peter Zijlstra @ 2010-08-16 20:04 UTC (permalink / raw)
  To: David Miller; +Cc: rostedt, eric.dumazet, netdev, linux-kernel, kaber, mingo

On Mon, 2010-08-16 at 12:44 -0700, David Miller wrote:
> He's only accessing the per-cpu counter locks of other cpus.
> 
> The per-cpu lock is only locally accessed by a cpu in software
> interrupt context.
> 
> That is why his transformation is legal.
> 
> Lockdep simply hasn't been informed of this invariant and has
> to assume the worst. 

Something like the below will keep lockdep coverage. Still, going back to
RCU sounds like the best option.

---
 include/linux/netfilter/x_tables.h |    9 +++++++++
 1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/include/linux/netfilter/x_tables.h b/include/linux/netfilter/x_tables.h
index 24e5d01..a195feb 100644
--- a/include/linux/netfilter/x_tables.h
+++ b/include/linux/netfilter/x_tables.h
@@ -511,12 +511,21 @@ static inline void xt_info_rdunlock_bh(void)
  */
 static inline void xt_info_wrlock(unsigned int cpu)
 {
+#ifdef CONFIG_PROVE_LOCKING
+	/*
+	 * XXX foo
+	 */
+	local_bh_disable();
+#endif
 	spin_lock(&per_cpu(xt_info_locks, cpu).lock);
 }
 
 static inline void xt_info_wrunlock(unsigned int cpu)
 {
 	spin_unlock(&per_cpu(xt_info_locks, cpu).lock);
+#ifdef CONFIG_PROVE_LOCKING
+	local_bh_enable();
+#endif
 }
 
 /*

