netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Locking validator output on DCCP
@ 2006-06-20 23:40 Ian McDonald
  2006-06-21  0:34 ` Herbert Xu
  0 siblings, 1 reply; 8+ messages in thread
From: Ian McDonald @ 2006-06-20 23:40 UTC (permalink / raw)
  To: netdev, DCCP Mailing List, Arnaldo Carvalho de Melo

Folks,

I am getting this when I am using DCCP with 2.6.17-rc6-mm2 with Ingo's
lock dependency patch:

Jun 21 09:38:58 localhost kernel: [  102.068588]
Jun 21 09:38:58 localhost kernel: [  102.068592]
=============================================
Jun 21 09:38:58 localhost kernel: [  102.068602] [ INFO: possible
recursive locking detected ]
Jun 21 09:38:58 localhost kernel: [  102.068608]
---------------------------------------------
Jun 21 09:38:58 localhost kernel: [  102.068615] idle/0 is trying to
acquire lock:
Jun 21 09:38:58 localhost kernel: [  102.068620]
(&sk->sk_lock.slock#3){-+..}, at: [<c0245511>] sk_clone+0x5a/0x190
Jun 21 09:38:58 localhost kernel: [  102.068644]
Jun 21 09:38:58 localhost kernel: [  102.068646] but task is already
holding lock:
Jun 21 09:38:58 localhost kernel: [  102.068651]
(&sk->sk_lock.slock#3){-+..}, at: [<c024650d>]
sk_receive_skb+0xe6/0xfe
Jun 21 09:38:58 localhost kernel: [  102.068668]
Jun 21 09:38:58 localhost kernel: [  102.068670] other info that might
help us debug this:
Jun 21 09:38:58 localhost kernel: [  102.068676] 2 locks held by idle/0:
Jun 21 09:38:58 localhost kernel: [  102.068679]  #0:
(&tp->rx_lock){-+..}, at: [<e087d915>] rtl8139_poll+0x42/0x41c
[8139too]
Jun 21 09:38:58 localhost kernel: [  102.068722]  #1:
(&sk->sk_lock.slock#3){-+..}, at: [<c024650d>]
sk_receive_skb+0xe6/0xfe
Jun 21 09:38:58 localhost kernel: [  102.068739]
Jun 21 09:38:58 localhost kernel: [  102.068741] stack backtrace:
Jun 21 09:38:58 localhost kernel: [  102.069053]  [<c0103a2a>]
show_trace_log_lvl+0x53/0xff
Jun 21 09:38:58 localhost kernel: [  102.069091]  [<c0104078>]
show_trace+0x16/0x19
Jun 21 09:38:58 localhost kernel: [  102.069121]  [<c010411e>]
dump_stack+0x1a/0x1f
Jun 21 09:38:58 localhost kernel: [  102.069151]  [<c012d6cb>]
__lock_acquire+0x8e6/0x902
Jun 21 09:38:58 localhost kernel: [  102.069363]  [<c012d879>]
lock_acquire+0x4e/0x66
Jun 21 09:38:58 localhost kernel: [  102.069562]  [<c029779d>]
_spin_lock+0x24/0x32
Jun 21 09:38:58 localhost kernel: [  102.069777]  [<c0245511>]
sk_clone+0x5a/0x190
Jun 21 09:38:58 localhost kernel: [  102.071244]  [<c026cc93>]
inet_csk_clone+0xf/0x67
Jun 21 09:38:58 localhost kernel: [  102.072932]  [<e0cc7f31>]
dccp_create_openreq_child+0x17/0x2fe [dccp]
Jun 21 09:38:58 localhost kernel: [  102.072993]  [<e0b48c9b>]
dccp_v4_request_recv_sock+0x47/0x260 [dccp_ipv4]
Jun 21 09:38:58 localhost kernel: [  102.073020]  [<e0cc8340>]
dccp_check_req+0x128/0x264 [dccp]
Jun 21 09:38:58 localhost kernel: [  102.073049]  [<e0b48a38>]
dccp_v4_do_rcv+0x74/0x290 [dccp_ipv4]
Jun 21 09:38:58 localhost kernel: [  102.073067]  [<c0246492>]
sk_receive_skb+0x6b/0xfe
Jun 21 09:38:58 localhost kernel: [  102.074607]  [<e0b49e9c>]
dccp_v4_rcv+0x4ea/0x66e [dccp_ipv4]
Jun 21 09:38:58 localhost kernel: [  102.074651]  [<c0265540>]
ip_local_deliver+0x159/0x1f1
Jun 21 09:38:58 localhost kernel: [  102.076322]  [<c02653ba>]
ip_rcv+0x3e9/0x416
Jun 21 09:38:58 localhost kernel: [  102.077995]  [<c024b768>]
netif_receive_skb+0x287/0x317
Jun 21 09:38:58 localhost kernel: [  102.079562]  [<e087db67>]
rtl8139_poll+0x294/0x41c [8139too]
Jun 21 09:38:58 localhost kernel: [  102.079610]  [<c024d149>]
net_rx_action+0x8b/0x17c
Jun 21 09:38:58 localhost kernel: [  102.081181]  [<c011adf6>]
__do_softirq+0x54/0xb3
Jun 21 09:38:58 localhost kernel: [  102.081357]  [<c011ae84>]
do_softirq+0x2f/0x47
Jun 21 09:38:58 localhost kernel: [  102.081482]  [<c011b0a5>]
irq_exit+0x39/0x46
Jun 21 09:38:58 localhost kernel: [  102.081608]  [<c0104f73>] do_IRQ+0x77/0x84
Jun 21 09:38:58 localhost kernel: [  102.081644]  [<c0103561>]
common_interrupt+0x25/0x2c
Jun 21 09:38:58 localhost kernel: [  154.463644] CCID: Registered CCID 3 (ccid3)

The code of sk_clone (net/core/sock.c) is:

struct sock *sk_clone(const struct sock *sk, const gfp_t priority)
{
	struct sock *newsk = sk_alloc(sk->sk_family, priority, sk->sk_prot, 0);

	if (newsk != NULL) {
		struct sk_filter *filter;

		memcpy(newsk, sk, sk->sk_prot->obj_size);

		/* SANITY */
		sk_node_init(&newsk->sk_node);
		sock_lock_init(newsk);

The relevant code is the sock_lock_init

The code of sk_receive_skb (net/core/sock.c) is:

int sk_receive_skb(struct sock *sk, struct sk_buff *skb)
{
	int rc = NET_RX_SUCCESS;

	if (sk_filter(sk, skb, 0))
		goto discard_and_relse;

	skb->dev = NULL;

	bh_lock_sock(sk);

The relevant code is the bh_lock_sock.

As I read this it is not a recursive lock as sk_clone is occurring
second and is actually creating a new socket so they are trying to
lock on different sockets.

Can someone tell me whether I am correct in my thinking or not? If I
am then I will work out how to tell the lock validator not to worry
about it.

Thanks,

Ian
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Locking validator output on DCCP
  2006-06-20 23:40 Locking validator output on DCCP Ian McDonald
@ 2006-06-21  0:34 ` Herbert Xu
  2006-06-21 10:18   ` Arjan van de Ven
  2006-06-21 10:52   ` Ingo Molnar
  0 siblings, 2 replies; 8+ messages in thread
From: Herbert Xu @ 2006-06-21  0:34 UTC (permalink / raw)
  To: Ian McDonald, Ingo Molnar
  Cc: netdev, DCCP Mailing List, Arnaldo Carvalho de Melo

On Tue, Jun 20, 2006 at 11:40:05PM +0000, Ian McDonald wrote:
> 
> I am getting this when I am using DCCP with 2.6.17-rc6-mm2 with Ingo's
> lock dependency patch:
> 
> Jun 21 09:38:58 localhost kernel: [  102.068588]
> Jun 21 09:38:58 localhost kernel: [  102.068592]
> =============================================
> Jun 21 09:38:58 localhost kernel: [  102.068602] [ INFO: possible
> recursive locking detected ]
> Jun 21 09:38:58 localhost kernel: [  102.068608]
> ---------------------------------------------
> Jun 21 09:38:58 localhost kernel: [  102.068615] idle/0 is trying to
> acquire lock:
> Jun 21 09:38:58 localhost kernel: [  102.068620]
> (&sk->sk_lock.slock#3){-+..}, at: [<c0245511>] sk_clone+0x5a/0x190
> Jun 21 09:38:58 localhost kernel: [  102.068644]
> Jun 21 09:38:58 localhost kernel: [  102.068646] but task is already
> holding lock:
> Jun 21 09:38:58 localhost kernel: [  102.068651]
> (&sk->sk_lock.slock#3){-+..}, at: [<c024650d>]
> sk_receive_skb+0xe6/0xfe

...
 
> As I read this it is not a recursive lock as sk_clone is occurring
> second and is actually creating a new socket so they are trying to
> lock on different sockets.
> 
> Can someone tell me whether I am correct in my thinking or not? If I
> am then I will work out how to tell the lock validator not to worry
> about it.

I agree, this looks bogus.  Ingo, could you please take a look?

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Locking validator output on DCCP
  2006-06-21  0:34 ` Herbert Xu
@ 2006-06-21 10:18   ` Arjan van de Ven
  2006-06-22  4:51     ` Ian McDonald
  2006-06-21 10:52   ` Ingo Molnar
  1 sibling, 1 reply; 8+ messages in thread
From: Arjan van de Ven @ 2006-06-21 10:18 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Arnaldo Carvalho de Melo, DCCP Mailing List, netdev, Ingo Molnar,
	Ian McDonald

On Wed, 2006-06-21 at 10:34 +1000, Herbert Xu wrote:
> > As I read this it is not a recursive lock as sk_clone is occurring
> > second and is actually creating a new socket so they are trying to
> > lock on different sockets.
> > 
> > Can someone tell me whether I am correct in my thinking or not? If I
> > am then I will work out how to tell the lock validator not to worry
> > about it.
> 
> I agree, this looks bogus.  Ingo, could you please take a look?

Fix is relatively easy:


sk_clone creates a new socket, and thus can never deadlock, and in fact
can be called with the original socket locked. This therefore is a
legitimate nesting case; mark it as such.

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>


---
 net/core/sock.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Index: linux-2.6.17-rc6-mm2/net/core/sock.c
===================================================================
--- linux-2.6.17-rc6-mm2.orig/net/core/sock.c
+++ linux-2.6.17-rc6-mm2/net/core/sock.c
@@ -846,7 +846,7 @@ struct sock *sk_clone(const struct sock 
 		/* SANITY */
 		sk_node_init(&newsk->sk_node);
 		sock_lock_init(newsk);
-		bh_lock_sock(newsk);
+		bh_lock_sock_nested(newsk);
 
 		atomic_set(&newsk->sk_rmem_alloc, 0);
 		atomic_set(&newsk->sk_wmem_alloc, 0);


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Locking validator output on DCCP
  2006-06-21  0:34 ` Herbert Xu
  2006-06-21 10:18   ` Arjan van de Ven
@ 2006-06-21 10:52   ` Ingo Molnar
  2006-06-22  4:55     ` Ian McDonald
  1 sibling, 1 reply; 8+ messages in thread
From: Ingo Molnar @ 2006-06-21 10:52 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Ian McDonald, netdev, DCCP Mailing List, Arnaldo Carvalho de Melo


* Herbert Xu <herbert@gondor.apana.org.au> wrote:

> > Can someone tell me whether I am correct in my thinking or not? If I 
> > am then I will work out how to tell the lock validator not to worry 
> > about it.
> 
> I agree, this looks bogus.  Ingo, could you please take a look?

sure - Ian, could you try Arjan's fix below?

	Ingo

------------------------------------------------------------
Subject: lock validator: annotate vlan "master" device locks
From: Arjan van de Ven <arjan@linux.intel.com>

vlan devices are "masters" of normal "slave" devices and thus need their 
own lock key. (this will be switched to the reinit_key APIs once they 
are available)

Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>

---
 net/8021q/vlan.c |    2 ++
 1 file changed, 2 insertions(+)

Index: linux-2.6.17-rc6-mm2/net/8021q/vlan.c
===================================================================
--- linux-2.6.17-rc6-mm2.orig/net/8021q/vlan.c
+++ linux-2.6.17-rc6-mm2/net/8021q/vlan.c
@@ -463,6 +463,8 @@ static struct net_device *register_vlan_
 	if (new_dev == NULL)
 		goto out_unlock;
 
+	spin_lock_init(&dev->xmit_lock);
+
 #ifdef VLAN_DEBUG
 	printk(VLAN_DBG "Allocated new name -:%s:-\n", new_dev->name);
 #endif


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Locking validator output on DCCP
  2006-06-21 10:18   ` Arjan van de Ven
@ 2006-06-22  4:51     ` Ian McDonald
  2006-06-23  2:38       ` Ian McDonald
  0 siblings, 1 reply; 8+ messages in thread
From: Ian McDonald @ 2006-06-22  4:51 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Herbert Xu, Arnaldo Carvalho de Melo, DCCP Mailing List, netdev,
	Ingo Molnar

On 6/21/06, Arjan van de Ven <arjan@linux.intel.com> wrote:
> On Wed, 2006-06-21 at 10:34 +1000, Herbert Xu wrote:
> > > As I read this it is not a recursive lock as sk_clone is occurring
> > > second and is actually creating a new socket so they are trying to
> > > lock on different sockets.
> > >
> > > Can someone tell me whether I am correct in my thinking or not? If I
> > > am then I will work out how to tell the lock validator not to worry
> > > about it.
> >
> > I agree, this looks bogus.  Ingo, could you please take a look?
>
> Fix is relatively easy:
>
>
> sk_clone creates a new socket, and thus can never deadlock, and in fact
> can be called with the original socket locked. This therefore is a
> legitimate nesting case; mark it as such.
>
> Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
>
>
> ---
>  net/core/sock.c |    2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> Index: linux-2.6.17-rc6-mm2/net/core/sock.c
> ===================================================================
> --- linux-2.6.17-rc6-mm2.orig/net/core/sock.c
> +++ linux-2.6.17-rc6-mm2/net/core/sock.c
> @@ -846,7 +846,7 @@ struct sock *sk_clone(const struct sock
>                 /* SANITY */
>                 sk_node_init(&newsk->sk_node);
>                 sock_lock_init(newsk);
> -               bh_lock_sock(newsk);
> +               bh_lock_sock_nested(newsk);
>
>                 atomic_set(&newsk->sk_rmem_alloc, 0);
>                 atomic_set(&newsk->sk_wmem_alloc, 0);
>
>
When I do this it now shifts around. I'll investigate further
(probably tomorrow).

Now get

Jun 22 14:20:48 localhost kernel: [ 1276.424531]
=============================================
Jun 22 14:20:48 localhost kernel: [ 1276.424541] [ INFO: possible
recursive locking detected ]
Jun 22 14:20:48 localhost kernel: [ 1276.424546]
---------------------------------------------
Jun 22 14:20:48 localhost kernel: [ 1276.424553] idle/0 is trying to
acquire lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424559]
(&sk->sk_lock.slock#5/1){-+..}, at: [<c024594e>] sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.424585]
Jun 22 14:20:48 localhost kernel: [ 1276.424587] but task is already
holding lock:
Jun 22 14:20:48 localhost kernel: [ 1276.424592]
(&sk->sk_lock.slock#5/1){-+..}, at: [<c027cd87>]
tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424616]
Jun 22 14:20:48 localhost kernel: [ 1276.424618] other info that might
help us debug this:
Jun 22 14:20:48 localhost kernel: [ 1276.424624] 2 locks held by idle/0:
Jun 22 14:20:48 localhost kernel: [ 1276.424628]  #0:
(&tp->rx_lock){-+..}, at: [<e0898915>] rtl8139_poll+0x42/0x41c
[8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.424666]  #1:
(&sk->sk_lock.slock#5/1){-+..}, at: [<c027cd87>]
tcp_v4_rcv+0x42e/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.424685]
Jun 22 14:20:48 localhost kernel: [ 1276.424686] stack backtrace:
Jun 22 14:20:48 localhost kernel: [ 1276.425002]  [<c0103a2a>]
show_trace_log_lvl+0x53/0xff
Jun 22 14:20:48 localhost kernel: [ 1276.425038]  [<c0104078>]
show_trace+0x16/0x19
Jun 22 14:20:48 localhost kernel: [ 1276.425068]  [<c010411e>]
dump_stack+0x1a/0x1f
Jun 22 14:20:48 localhost kernel: [ 1276.425099]  [<c012d6cb>]
__lock_acquire+0x8e6/0x902
Jun 22 14:20:48 localhost kernel: [ 1276.425311]  [<c012d879>]
lock_acquire+0x4e/0x66
Jun 22 14:20:48 localhost kernel: [ 1276.425510]  [<c02989e1>]
_spin_lock_nested+0x26/0x36
Jun 22 14:20:48 localhost kernel: [ 1276.425726]  [<c024594e>]
sk_clone+0x5f/0x195
Jun 22 14:20:48 localhost kernel: [ 1276.427191]  [<c026d10f>]
inet_csk_clone+0xf/0x67
Jun 22 14:20:48 localhost kernel: [ 1276.428879]  [<c027d3d0>]
tcp_create_openreq_child+0x15/0x32b
Jun 22 14:20:48 localhost kernel: [ 1276.430598]  [<c027b383>]
tcp_v4_syn_recv_sock+0x47/0x29c
Jun 22 14:20:48 localhost kernel: [ 1276.432313]  [<e0fcf440>]
tcp_v6_syn_recv_sock+0x37/0x534 [ipv6]
Jun 22 14:20:48 localhost kernel: [ 1276.432482]  [<c027d886>]
tcp_check_req+0x1a0/0x2db
Jun 22 14:20:48 localhost kernel: [ 1276.434198]  [<c027aecc>]
tcp_v4_do_rcv+0x9f/0x2fe
Jun 22 14:20:48 localhost kernel: [ 1276.435911]  [<c027d28b>]
tcp_v4_rcv+0x932/0x9b3
Jun 22 14:20:48 localhost kernel: [ 1276.437632]  [<c0265980>]
ip_local_deliver+0x159/0x1f1
Jun 22 14:20:48 localhost kernel: [ 1276.439305]  [<c02657fa>]
ip_rcv+0x3e9/0x416
Jun 22 14:20:48 localhost kernel: [ 1276.440977]  [<c024bba4>]
netif_receive_skb+0x287/0x317
Jun 22 14:20:48 localhost kernel: [ 1276.442542]  [<e0898b67>]
rtl8139_poll+0x294/0x41c [8139too]
Jun 22 14:20:48 localhost kernel: [ 1276.442590]  [<c024d585>]
net_rx_action+0x8b/0x17c
Jun 22 14:20:48 localhost kernel: [ 1276.444160]  [<c011adf6>]
__do_softirq+0x54/0xb3
Jun 22 14:20:48 localhost kernel: [ 1276.444335]  [<c011ae84>]
do_softirq+0x2f/0x47
Jun 22 14:20:48 localhost kernel: [ 1276.444460]  [<c011b0a5>]
irq_exit+0x39/0x46
Jun 22 14:20:48 localhost kernel: [ 1276.444585]  [<c0104f73>] do_IRQ+0x77/0x84
Jun 22 14:20:48 localhost kernel: [ 1276.444621]  [<c0103561>]
common_interrupt+0x25/0x2c



-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Locking validator output on DCCP
  2006-06-21 10:52   ` Ingo Molnar
@ 2006-06-22  4:55     ` Ian McDonald
  0 siblings, 0 replies; 8+ messages in thread
From: Ian McDonald @ 2006-06-22  4:55 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Herbert Xu, netdev, DCCP Mailing List, Arnaldo Carvalho de Melo

On 6/21/06, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Herbert Xu <herbert@gondor.apana.org.au> wrote:
>
> > > Can someone tell me whether I am correct in my thinking or not? If I
> > > am then I will work out how to tell the lock validator not to worry
> > > about it.
> >
> > I agree, this looks bogus.  Ingo, could you please take a look?
>
> sure - Ian, could you try Arjan's fix below?
>
>         Ingo
>
> ------------------------------------------------------------
> Subject: lock validator: annotate vlan "master" device locks
> From: Arjan van de Ven <arjan@linux.intel.com>
>
The fix you sent here was the incorrect one but I did test Arjan's as
per previous e-mail.

Real dumb question time. The lock validator is testing for recursive
lock holding. Given that this is a lock at a different address can we
eliminate all such cases? Or are you trying to detect code here that
keeps on locking same type of lock in case of error and we should
explicitly flag...

Ian
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Locking validator output on DCCP
  2006-06-22  4:51     ` Ian McDonald
@ 2006-06-23  2:38       ` Ian McDonald
  2006-06-23  7:45         ` Ingo Molnar
  0 siblings, 1 reply; 8+ messages in thread
From: Ian McDonald @ 2006-06-23  2:38 UTC (permalink / raw)
  To: Arjan van de Ven
  Cc: Herbert Xu, Arnaldo Carvalho de Melo, DCCP Mailing List, netdev,
	Ingo Molnar

On 6/22/06, Ian McDonald <ian.mcdonald@jandi.co.nz> wrote:
> On 6/21/06, Arjan van de Ven <arjan@linux.intel.com> wrote:
> > On Wed, 2006-06-21 at 10:34 +1000, Herbert Xu wrote:
> > > > As I read this it is not a recursive lock as sk_clone is occurring
> > > > second and is actually creating a new socket so they are trying to
> > > > lock on different sockets.
> > > >
> > > > Can someone tell me whether I am correct in my thinking or not? If I
> > > > am then I will work out how to tell the lock validator not to worry
> > > > about it.
> > >
> > > I agree, this looks bogus.  Ingo, could you please take a look?
> >
> > Fix is relatively easy:
> >
> >
> > sk_clone creates a new socket, and thus can never deadlock, and in fact
> > can be called with the original socket locked. This therefore is a
> > legitimate nesting case; mark it as such.
> >
> > Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
> >
> >
> > ---
> >  net/core/sock.c |    2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > Index: linux-2.6.17-rc6-mm2/net/core/sock.c
> > ===================================================================
> > --- linux-2.6.17-rc6-mm2.orig/net/core/sock.c
> > +++ linux-2.6.17-rc6-mm2/net/core/sock.c
> > @@ -846,7 +846,7 @@ struct sock *sk_clone(const struct sock
> >                 /* SANITY */
> >                 sk_node_init(&newsk->sk_node);
> >                 sock_lock_init(newsk);
> > -               bh_lock_sock(newsk);
> > +               bh_lock_sock_nested(newsk);
> >
> >                 atomic_set(&newsk->sk_rmem_alloc, 0);
> >                 atomic_set(&newsk->sk_wmem_alloc, 0);
> >
> >
> When I do this it now shifts around. I'll investigate further
> (probably tomorrow).
>
> Now get
>
> Jun 22 14:20:48 localhost kernel: [ 1276.424531]
> =============================================
> Jun 22 14:20:48 localhost kernel: [ 1276.424541] [ INFO: possible
> recursive locking detected ]
> Jun 22 14:20:48 localhost kernel: [ 1276.424546]
> ---------------------------------------------
> Jun 22 14:20:48 localhost kernel: [ 1276.424553] idle/0 is trying to
> acquire lock:
> Jun 22 14:20:48 localhost kernel: [ 1276.424559]
> (&sk->sk_lock.slock#5/1){-+..}, at: [<c024594e>] sk_clone+0x5f/0x195
> Jun 22 14:20:48 localhost kernel: [ 1276.424585]
> Jun 22 14:20:48 localhost kernel: [ 1276.424587] but task is already
> holding lock:
> Jun 22 14:20:48 localhost kernel: [ 1276.424592]
> (&sk->sk_lock.slock#5/1){-+..}, at: [<c027cd87>]
> tcp_v4_rcv+0x42e/0x9b3
> Jun 22 14:20:48 localhost kernel: [ 1276.424616]
> Jun 22 14:20:48 localhost kernel: [ 1276.424618] other info that might
> help us debug this:
> Jun 22 14:20:48 localhost kernel: [ 1276.424624] 2 locks held by idle/0:
> Jun 22 14:20:48 localhost kernel: [ 1276.424628]  #0:
> (&tp->rx_lock){-+..}, at: [<e0898915>] rtl8139_poll+0x42/0x41c
> [8139too]
> Jun 22 14:20:48 localhost kernel: [ 1276.424666]  #1:
> (&sk->sk_lock.slock#5/1){-+..}, at: [<c027cd87>]
> tcp_v4_rcv+0x42e/0x9b3
> Jun 22 14:20:48 localhost kernel: [ 1276.424685]
> Jun 22 14:20:48 localhost kernel: [ 1276.424686] stack backtrace:
> Jun 22 14:20:48 localhost kernel: [ 1276.425002]  [<c0103a2a>]
> show_trace_log_lvl+0x53/0xff
> Jun 22 14:20:48 localhost kernel: [ 1276.425038]  [<c0104078>]
> show_trace+0x16/0x19
> Jun 22 14:20:48 localhost kernel: [ 1276.425068]  [<c010411e>]
> dump_stack+0x1a/0x1f
> Jun 22 14:20:48 localhost kernel: [ 1276.425099]  [<c012d6cb>]
> __lock_acquire+0x8e6/0x902
> Jun 22 14:20:48 localhost kernel: [ 1276.425311]  [<c012d879>]
> lock_acquire+0x4e/0x66
> Jun 22 14:20:48 localhost kernel: [ 1276.425510]  [<c02989e1>]
> _spin_lock_nested+0x26/0x36
> Jun 22 14:20:48 localhost kernel: [ 1276.425726]  [<c024594e>]
> sk_clone+0x5f/0x195
> Jun 22 14:20:48 localhost kernel: [ 1276.427191]  [<c026d10f>]
> inet_csk_clone+0xf/0x67
> Jun 22 14:20:48 localhost kernel: [ 1276.428879]  [<c027d3d0>]
> tcp_create_openreq_child+0x15/0x32b
> Jun 22 14:20:48 localhost kernel: [ 1276.430598]  [<c027b383>]
> tcp_v4_syn_recv_sock+0x47/0x29c
> Jun 22 14:20:48 localhost kernel: [ 1276.432313]  [<e0fcf440>]
> tcp_v6_syn_recv_sock+0x37/0x534 [ipv6]
> Jun 22 14:20:48 localhost kernel: [ 1276.432482]  [<c027d886>]
> tcp_check_req+0x1a0/0x2db
> Jun 22 14:20:48 localhost kernel: [ 1276.434198]  [<c027aecc>]
> tcp_v4_do_rcv+0x9f/0x2fe
> Jun 22 14:20:48 localhost kernel: [ 1276.435911]  [<c027d28b>]
> tcp_v4_rcv+0x932/0x9b3
> Jun 22 14:20:48 localhost kernel: [ 1276.437632]  [<c0265980>]
> ip_local_deliver+0x159/0x1f1
> Jun 22 14:20:48 localhost kernel: [ 1276.439305]  [<c02657fa>]
> ip_rcv+0x3e9/0x416
> Jun 22 14:20:48 localhost kernel: [ 1276.440977]  [<c024bba4>]
> netif_receive_skb+0x287/0x317
> Jun 22 14:20:48 localhost kernel: [ 1276.442542]  [<e0898b67>]
> rtl8139_poll+0x294/0x41c [8139too]
> Jun 22 14:20:48 localhost kernel: [ 1276.442590]  [<c024d585>]
> net_rx_action+0x8b/0x17c
> Jun 22 14:20:48 localhost kernel: [ 1276.444160]  [<c011adf6>]
> __do_softirq+0x54/0xb3
> Jun 22 14:20:48 localhost kernel: [ 1276.444335]  [<c011ae84>]
> do_softirq+0x2f/0x47
> Jun 22 14:20:48 localhost kernel: [ 1276.444460]  [<c011b0a5>]
> irq_exit+0x39/0x46
> Jun 22 14:20:48 localhost kernel: [ 1276.444585]  [<c0104f73>] do_IRQ+0x77/0x84
> Jun 22 14:20:48 localhost kernel: [ 1276.444621]  [<c0103561>]
> common_interrupt+0x25/0x2c
>
OK. This is in net/ipv4/tcp_ipv4.c tcp_v4_rcv with the
bh_lock_sock_nested which I presume is clashing with the nested of
skb_clone....

Can we not do two levels nested?

Is there extra documentation for the locking validation suite so that
I can stop asking stupid questions? If not I'll just read more of the
source code.

Ian
-- 
Ian McDonald
Web: http://wand.net.nz/~iam4
Blog: http://imcdnzl.blogspot.com
WAND Network Research Group
Department of Computer Science
University of Waikato
New Zealand

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Locking validator output on DCCP
  2006-06-23  2:38       ` Ian McDonald
@ 2006-06-23  7:45         ` Ingo Molnar
  0 siblings, 0 replies; 8+ messages in thread
From: Ingo Molnar @ 2006-06-23  7:45 UTC (permalink / raw)
  To: Ian McDonald
  Cc: Arjan van de Ven, Herbert Xu, Arnaldo Carvalho de Melo,
	DCCP Mailing List, netdev


* Ian McDonald <ian.mcdonald@jandi.co.nz> wrote:

> OK. This is in net/ipv4/tcp_ipv4.c tcp_v4_rcv with the 
> bh_lock_sock_nested which I presume is clashing with the nested of 
> skb_clone....

yeah, that's exactly what happens.

> Can we not do two levels nested?

we can - but it needs to be thought through (and needs to be tested). 
But an initial glance seems to suggest that two levels, one used for 
clone and another one used for tcp_v4_rcv nesting ought to do the trick.

> Is there extra documentation for the locking validation suite so that 
> I can stop asking stupid questions? If not I'll just read more of the 
> source code.

i'd suggest to start with: Documentation/lockdep-design.txt, then with 
include/linux/lockdep.h and also look at some of the existing patches 
that introduce multiple nesting levels, such as:

    http://kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.17/2.6.17-mm1/broken-out/lock-validator-special-locking-dcache.patch

the best method is to find the proper central header for the level enum, 
and then make use of it. The enum should have a descriptive name, and 
the enum values too.

If one particular lock has multiple levels then SINGLE_DEPTH_NESTING 
should not be used. (to keep the levels architecture clean) I.e. 
possibly the best idea would be to extend bh_lock_sock_nested() with a 
'subtype' parameter, and to pass that into spin_lock_nested().

Or even better: define bh_lock_sock_nested() as an inline function with 
the right enum parameter, which will make sure only the right type of 
enum will be used. [if it's possible to do it as an inline there - it's 
a macro right now and i'm not sure all the types necessary for the 
inline are declared up to that point in the sock.h include file]

	Ingo

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2006-06-23  7:50 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2006-06-20 23:40 Locking validator output on DCCP Ian McDonald
2006-06-21  0:34 ` Herbert Xu
2006-06-21 10:18   ` Arjan van de Ven
2006-06-22  4:51     ` Ian McDonald
2006-06-23  2:38       ` Ian McDonald
2006-06-23  7:45         ` Ingo Molnar
2006-06-21 10:52   ` Ingo Molnar
2006-06-22  4:55     ` Ian McDonald

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).