Crash due to mutex genl_lock called from RCU context

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Crash due to mutex genl_lock called from RCU context
@ 2016-11-26  2:15 subashab
  2016-11-26  4:11 ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: subashab @ 2016-11-26  2:15 UTC (permalink / raw)
  To: tgraf, netdev; +Cc: eric.dumazet

We are seeing a crash due to gen_lock mutex being acquired in RCU 
context.
Crash is seen on a 4.4 based kernel ARM64 device. This occurred in a
regression rack, so unfortunately I don't have steps for a reproducer.

It looks like freeing socket in RCU was brought in through commit
21e4902aea80ef35afc00ee8d2abdea4f519b7f7 ("netlink: Lockless lookup with
RCU grace period in socket release").
I am not very familiar with generic netlink sockets so I am not sure
if there is any other way to fix this apart from reverting this patch.

Any pointers to debug this would be appreciated.

Here is the call stack -

BUG: sleeping function called from invalid context 
kernel/locking/mutex.c:98
in_atomic(): 1, irqs_disabled(): 0, pid: 16400, name: busybox
[<ffffff80080cad20>] ___might_sleep+0x134/0x144
[<ffffff80080cadac>] __might_sleep+0x7c/0x8c
[<ffffff8008ef09a8>] mutex_lock+0x2c/0x4c
[<ffffff8008d307f0>] genl_lock+0x1c/0x24
[<ffffff8008d30848>] genl_lock_done+0x2c/0x50
[<ffffff8008d2ccac>] netlink_sock_destruct+0x30/0x94
[<ffffff8008cdef44>] sk_destruct+0x2c/0x150
[<ffffff8008cdf104>] __sk_free+0x9c/0xc4
[<ffffff8008cdf16c>] sk_free+0x40/0x4c
[<ffffff8008d2c7fc>] deferred_put_nlk_sk+0x40/0x4c
[<ffffff800810b104>] rcu_process_callbacks+0x4d4/0x644
[<ffffff80080a6598>] __do_softirq+0x1b8/0x3c4
[<ffffff80080a6a60>] irq_exit+0x80/0xd4
[<ffffff800808e554>] handle_IPI+0x1c0/0x364
[<ffffff80080817f8>] gic_handle_irq+0x154/0x1a4

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-26  2:15 Crash due to mutex genl_lock called from RCU context subashab
@ 2016-11-26  4:11 ` Eric Dumazet
  2016-11-26  4:54   ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2016-11-26  4:11 UTC (permalink / raw)
  To: subashab; +Cc: tgraf, netdev

On Fri, 2016-11-25 at 19:15 -0700, subashab@codeaurora.org wrote:
> We are seeing a crash due to gen_lock mutex being acquired in RCU 
> context.
> Crash is seen on a 4.4 based kernel ARM64 device. This occurred in a
> regression rack, so unfortunately I don't have steps for a reproducer.
> 
> It looks like freeing socket in RCU was brought in through commit
> 21e4902aea80ef35afc00ee8d2abdea4f519b7f7 ("netlink: Lockless lookup with
> RCU grace period in socket release").
> I am not very familiar with generic netlink sockets so I am not sure
> if there is any other way to fix this apart from reverting this patch.
> 
> Any pointers to debug this would be appreciated.
> 
> Here is the call stack -
> 
> BUG: sleeping function called from invalid context 
> kernel/locking/mutex.c:98
> in_atomic(): 1, irqs_disabled(): 0, pid: 16400, name: busybox
> [<ffffff80080cad20>] ___might_sleep+0x134/0x144
> [<ffffff80080cadac>] __might_sleep+0x7c/0x8c
> [<ffffff8008ef09a8>] mutex_lock+0x2c/0x4c
> [<ffffff8008d307f0>] genl_lock+0x1c/0x24
> [<ffffff8008d30848>] genl_lock_done+0x2c/0x50
> [<ffffff8008d2ccac>] netlink_sock_destruct+0x30/0x94
> [<ffffff8008cdef44>] sk_destruct+0x2c/0x150
> [<ffffff8008cdf104>] __sk_free+0x9c/0xc4
> [<ffffff8008cdf16c>] sk_free+0x40/0x4c
> [<ffffff8008d2c7fc>] deferred_put_nlk_sk+0x40/0x4c
> [<ffffff800810b104>] rcu_process_callbacks+0x4d4/0x644
> [<ffffff80080a6598>] __do_softirq+0x1b8/0x3c4
> [<ffffff80080a6a60>] irq_exit+0x80/0xd4
> [<ffffff800808e554>] handle_IPI+0x1c0/0x364
> [<ffffff80080817f8>] gic_handle_irq+0x154/0x1a4

Right, Thomas commit looks buggy.

Unfortunately the proper infra was added later in commit 
a4298e4522d687a79af8 ("net: add SOCK_RCU_FREE socket flag")

I guess we should backport it, then apply following (untested) fix

Could you test this solution ? Thanks !

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 62bea4591054..fe0d43314198 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -456,8 +456,9 @@ static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid)
 
 	rcu_read_lock();
 	sk = __netlink_lookup(table, portid, net);
-	if (sk)
-		sock_hold(sk);
+	if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+		sk = NULL;
+
 	rcu_read_unlock();
 
 	return sk;
@@ -581,6 +582,7 @@ static int __netlink_create(struct net *net, struct socket *sock,
 	}
 	init_waitqueue_head(&nlk->wait);
 
+	sock_set_flag(sk, SOCK_RCU_FREE);
 	sk->sk_destruct = netlink_sock_destruct;
 	sk->sk_protocol = protocol;
 	return 0;
@@ -645,13 +647,6 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
 	goto out;
 }
 
-static void deferred_put_nlk_sk(struct rcu_head *head)
-{
-	struct netlink_sock *nlk = container_of(head, struct netlink_sock, rcu);
-
-	sock_put(&nlk->sk);
-}
-
 static int netlink_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
@@ -724,7 +719,7 @@ static int netlink_release(struct socket *sock)
 	local_bh_disable();
 	sock_prot_inuse_add(sock_net(sk), &netlink_proto, -1);
 	local_bh_enable();
-	call_rcu(&nlk->rcu, deferred_put_nlk_sk);
+	sock_put(sk);
 	return 0;
 }
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 3cfd6cc60504..5dc08a7b0a2b 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -32,7 +32,6 @@ struct netlink_sock {
 	struct module		*module;
 
 	struct rhash_head	node;
-	struct rcu_head		rcu;
 };
 
 static inline struct netlink_sock *nlk_sk(struct sock *sk)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-26  4:11 ` Eric Dumazet
@ 2016-11-26  4:54   ` Eric Dumazet
  2016-11-26  5:59     ` subashab
  2016-11-27  2:08     ` Cong Wang
  0 siblings, 2 replies; 12+ messages in thread
From: Eric Dumazet @ 2016-11-26  4:54 UTC (permalink / raw)
  To: subashab; +Cc: tgraf, netdev

On Fri, 2016-11-25 at 20:11 -0800, Eric Dumazet wrote:
> On Fri, 2016-11-25 at 19:15 -0700, subashab@codeaurora.org wrote:
> > We are seeing a crash due to gen_lock mutex being acquired in RCU 
> > context.
> > Crash is seen on a 4.4 based kernel ARM64 device. This occurred in a
> > regression rack, so unfortunately I don't have steps for a reproducer.
> > 
> > It looks like freeing socket in RCU was brought in through commit
> > 21e4902aea80ef35afc00ee8d2abdea4f519b7f7 ("netlink: Lockless lookup with
> > RCU grace period in socket release").
> > I am not very familiar with generic netlink sockets so I am not sure
> > if there is any other way to fix this apart from reverting this patch.
> > 
> > Any pointers to debug this would be appreciated.
> > 
> > Here is the call stack -
> > 
> > BUG: sleeping function called from invalid context 
> > kernel/locking/mutex.c:98
> > in_atomic(): 1, irqs_disabled(): 0, pid: 16400, name: busybox
> > [<ffffff80080cad20>] ___might_sleep+0x134/0x144
> > [<ffffff80080cadac>] __might_sleep+0x7c/0x8c
> > [<ffffff8008ef09a8>] mutex_lock+0x2c/0x4c
> > [<ffffff8008d307f0>] genl_lock+0x1c/0x24
> > [<ffffff8008d30848>] genl_lock_done+0x2c/0x50
> > [<ffffff8008d2ccac>] netlink_sock_destruct+0x30/0x94
> > [<ffffff8008cdef44>] sk_destruct+0x2c/0x150
> > [<ffffff8008cdf104>] __sk_free+0x9c/0xc4
> > [<ffffff8008cdf16c>] sk_free+0x40/0x4c
> > [<ffffff8008d2c7fc>] deferred_put_nlk_sk+0x40/0x4c
> > [<ffffff800810b104>] rcu_process_callbacks+0x4d4/0x644
> > [<ffffff80080a6598>] __do_softirq+0x1b8/0x3c4
> > [<ffffff80080a6a60>] irq_exit+0x80/0xd4
> > [<ffffff800808e554>] handle_IPI+0x1c0/0x364
> > [<ffffff80080817f8>] gic_handle_irq+0x154/0x1a4
> 
> Right, Thomas commit looks buggy.
> 
> Unfortunately the proper infra was added later in commit 
> a4298e4522d687a79af8 ("net: add SOCK_RCU_FREE socket flag")
> 
> I guess we should backport it, then apply following (untested) fix
> 
> Could you test this solution ? Thanks !
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 62bea4591054..fe0d43314198 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -456,8 +456,9 @@ static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid)
>  
>  	rcu_read_lock();
>  	sk = __netlink_lookup(table, portid, net);
> -	if (sk)
> -		sock_hold(sk);
> +	if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
> +		sk = NULL;
> +
>  	rcu_read_unlock();
>  
>  	return sk;
> @@ -581,6 +582,7 @@ static int __netlink_create(struct net *net, struct socket *sock,
>  	}
>  	init_waitqueue_head(&nlk->wait);
>  
> +	sock_set_flag(sk, SOCK_RCU_FREE);
>  	sk->sk_destruct = netlink_sock_destruct;
>  	sk->sk_protocol = protocol;

Oh well, this wont work, since sk->sk_destruct will be called from RCU
callback.

Grabbing the mutex should not be done from netlink_sock_destruct() but
from netlink_release()

Maybe this patch would be better :

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 62bea4591054..cce10e3c9b68 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -324,16 +324,6 @@ static void netlink_skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
 
 static void netlink_sock_destruct(struct sock *sk)
 {
-	struct netlink_sock *nlk = nlk_sk(sk);
-
-	if (nlk->cb_running) {
-		if (nlk->cb.done)
-			nlk->cb.done(&nlk->cb);
-
-		module_put(nlk->cb.module);
-		kfree_skb(nlk->cb.skb);
-	}
-
 	skb_queue_purge(&sk->sk_receive_queue);
 
 	if (!sock_flag(sk, SOCK_DEAD)) {
@@ -456,8 +446,9 @@ static struct sock *netlink_lookup(struct net *net, int protocol, u32 portid)
 
 	rcu_read_lock();
 	sk = __netlink_lookup(table, portid, net);
-	if (sk)
-		sock_hold(sk);
+	if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
+		sk = NULL;
+
 	rcu_read_unlock();
 
 	return sk;
@@ -581,6 +572,7 @@ static int __netlink_create(struct net *net, struct socket *sock,
 	}
 	init_waitqueue_head(&nlk->wait);
 
+	sock_set_flag(sk, SOCK_RCU_FREE);
 	sk->sk_destruct = netlink_sock_destruct;
 	sk->sk_protocol = protocol;
 	return 0;
@@ -645,13 +637,6 @@ static int netlink_create(struct net *net, struct socket *sock, int protocol,
 	goto out;
 }
 
-static void deferred_put_nlk_sk(struct rcu_head *head)
-{
-	struct netlink_sock *nlk = container_of(head, struct netlink_sock, rcu);
-
-	sock_put(&nlk->sk);
-}
-
 static int netlink_release(struct socket *sock)
 {
 	struct sock *sk = sock->sk;
@@ -724,7 +709,19 @@ static int netlink_release(struct socket *sock)
 	local_bh_disable();
 	sock_prot_inuse_add(sock_net(sk), &netlink_proto, -1);
 	local_bh_enable();
-	call_rcu(&nlk->rcu, deferred_put_nlk_sk);
+	if (nlk->cb_running) {
+		mutex_lock(nlk->cb_mutex);
+		if (nlk->cb_running) {
+			if (nlk->cb.done)
+				nlk->cb.done(&nlk->cb);
+	
+			module_put(nlk->cb.module);
+			kfree_skb(nlk->cb.skb);
+			nlk->cb_running = false;
+		}
+		mutex_unlock(nlk->cb_mutex);
+	}
+	sock_put(sk);
 	return 0;
 }
 
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 3cfd6cc60504..5dc08a7b0a2b 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -32,7 +32,6 @@ struct netlink_sock {
 	struct module		*module;
 
 	struct rhash_head	node;
-	struct rcu_head		rcu;
 };
 
 static inline struct netlink_sock *nlk_sk(struct sock *sk)

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-26  4:54   ` Eric Dumazet
@ 2016-11-26  5:59     ` subashab
  2016-11-27  2:08     ` Cong Wang
  1 sibling, 0 replies; 12+ messages in thread
From: subashab @ 2016-11-26  5:59 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: tgraf, netdev, netdev-owner

> Oh well, this wont work, since sk->sk_destruct will be called from RCU
> callback.
> 
> Grabbing the mutex should not be done from netlink_sock_destruct() but
> from netlink_release()
> 
> Maybe this patch would be better :
> 
> diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
> index 62bea4591054..cce10e3c9b68 100644
> --- a/net/netlink/af_netlink.c
> +++ b/net/netlink/af_netlink.c
> @@ -324,16 +324,6 @@ static void netlink_skb_set_owner_r(struct
> sk_buff *skb, struct sock *sk)
> 
>  static void netlink_sock_destruct(struct sock *sk)
>  {
> -	struct netlink_sock *nlk = nlk_sk(sk);
> -
> -	if (nlk->cb_running) {
> -		if (nlk->cb.done)
> -			nlk->cb.done(&nlk->cb);
> -
> -		module_put(nlk->cb.module);
> -		kfree_skb(nlk->cb.skb);
> -	}
> -
>  	skb_queue_purge(&sk->sk_receive_queue);
> 
>  	if (!sock_flag(sk, SOCK_DEAD)) {
> @@ -456,8 +446,9 @@ static struct sock *netlink_lookup(struct net
> *net, int protocol, u32 portid)
> 
>  	rcu_read_lock();
>  	sk = __netlink_lookup(table, portid, net);
> -	if (sk)
> -		sock_hold(sk);
> +	if (sk && !atomic_inc_not_zero(&sk->sk_refcnt))
> +		sk = NULL;
> +
>  	rcu_read_unlock();
> 
>  	return sk;
> @@ -581,6 +572,7 @@ static int __netlink_create(struct net *net,
> struct socket *sock,
>  	}
>  	init_waitqueue_head(&nlk->wait);
> 
> +	sock_set_flag(sk, SOCK_RCU_FREE);
>  	sk->sk_destruct = netlink_sock_destruct;
>  	sk->sk_protocol = protocol;
>  	return 0;
> @@ -645,13 +637,6 @@ static int netlink_create(struct net *net, struct
> socket *sock, int protocol,
>  	goto out;
>  }
> 
> -static void deferred_put_nlk_sk(struct rcu_head *head)
> -{
> -	struct netlink_sock *nlk = container_of(head, struct netlink_sock, 
> rcu);
> -
> -	sock_put(&nlk->sk);
> -}
> -
>  static int netlink_release(struct socket *sock)
>  {
>  	struct sock *sk = sock->sk;
> @@ -724,7 +709,19 @@ static int netlink_release(struct socket *sock)
>  	local_bh_disable();
>  	sock_prot_inuse_add(sock_net(sk), &netlink_proto, -1);
>  	local_bh_enable();
> -	call_rcu(&nlk->rcu, deferred_put_nlk_sk);
> +	if (nlk->cb_running) {
> +		mutex_lock(nlk->cb_mutex);
> +		if (nlk->cb_running) {
> +			if (nlk->cb.done)
> +				nlk->cb.done(&nlk->cb);
> +
> +			module_put(nlk->cb.module);
> +			kfree_skb(nlk->cb.skb);
> +			nlk->cb_running = false;
> +		}
> +		mutex_unlock(nlk->cb_mutex);
> +	}
> +	sock_put(sk);
>  	return 0;
>  }
> 
> diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
> index 3cfd6cc60504..5dc08a7b0a2b 100644
> --- a/net/netlink/af_netlink.h
> +++ b/net/netlink/af_netlink.h
> @@ -32,7 +32,6 @@ struct netlink_sock {
>  	struct module		*module;
> 
>  	struct rhash_head	node;
> -	struct rcu_head		rcu;
>  };
> 
>  static inline struct netlink_sock *nlk_sk(struct sock *sk)

Thanks Eric! I'll try this and get back with results over this weekend.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-26  4:54   ` Eric Dumazet
  2016-11-26  5:59     ` subashab
@ 2016-11-27  2:08     ` Cong Wang
  2016-11-27  2:26       ` Eric Dumazet
  1 sibling, 1 reply; 12+ messages in thread
From: Cong Wang @ 2016-11-27  2:08 UTC (permalink / raw)
  To: Eric Dumazet; +Cc: subashab, Thomas Graf, Linux Kernel Network Developers

On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Oh well, this wont work, since sk->sk_destruct will be called from RCU
> callback.
>
> Grabbing the mutex should not be done from netlink_sock_destruct() but
> from netlink_release()

But you also change the behavior of cb.done(), currently it is called when the
last sock ref is gone, with your patch it is called when the first
sock is closed.
No?

I don't see why we need to get genl_lock in ->done() here, because we are
already the last sock using it and module ref protects the ops from being
removed via module, seems we are pretty safe without any lock.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-27  2:08     ` Cong Wang
@ 2016-11-27  2:26       ` Eric Dumazet
  2016-11-27  6:28         ` Cong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2016-11-27  2:26 UTC (permalink / raw)
  To: Cong Wang; +Cc: subashab, Thomas Graf, Linux Kernel Network Developers

On Sat, 2016-11-26 at 18:08 -0800, Cong Wang wrote:
> On Fri, Nov 25, 2016 at 8:54 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Oh well, this wont work, since sk->sk_destruct will be called from RCU
> > callback.
> >
> > Grabbing the mutex should not be done from netlink_sock_destruct() but
> > from netlink_release()
> 
> But you also change the behavior of cb.done(), currently it is called when the
> last sock ref is gone, with your patch it is called when the first
> sock is closed.

No. It will be called when last refcount on the socket is released,
sk_refcnt transitions to final 0.


My patch changes the sock_hold() to the variant that makes sure
sk_refcnt is not 0 before increase, otherwise a race can happen and
release could be called twice.

Classic refcounting stuff coupled to rcu rules.

> No?


Are you telling me inet_release() is called when we close() the first
file descriptor ?

fd1 = socket()
fd2 = dup(fd1);
close(fd2) -> release() ???


> 
> I don't see why we need to get genl_lock in ->done() here, because we are
> already the last sock using it and module ref protects the ops from being
> removed via module, seems we are pretty safe without any lock.

Well, at least this exposes a real bug in Thomas patch.

Removing the lock might be done for net-next, not stable branches.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-27  2:26       ` Eric Dumazet
@ 2016-11-27  6:28         ` Cong Wang
  2016-11-27 16:23           ` Eric Dumazet
  0 siblings, 1 reply; 12+ messages in thread
From: Cong Wang @ 2016-11-27  6:28 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Subash Abhinov Kasiviswanathan, Thomas Graf,
	Linux Kernel Network Developers

On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
> Are you telling me inet_release() is called when we close() the first
> file descriptor ?
>
> fd1 = socket()
> fd2 = dup(fd1);
> close(fd2) -> release() ???

Sorry, I didn't express myself clearly, I meant your change,
if exclude the SOCK_RCU_FREE part, basically reverts this commit:

commit 3f660d66dfbc13ea4b61d3865851b348444c24b4
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Thu May 3 03:17:14 2007 -0700

    [NETLINK]: Kill CB only when socket is unused

IOW, ->release() is called when the last sock fd ref is gone, but ->destructor()
is called with the last sock ref is gone. They are very different.


>> I don't see why we need to get genl_lock in ->done() here, because we are
>> already the last sock using it and module ref protects the ops from being
>> removed via module, seems we are pretty safe without any lock.
>
> Well, at least this exposes a real bug in Thomas patch.
>
> Removing the lock might be done for net-next, not stable branches.

I am confused, what Subash reported is a kernel warning which can
surely be fixed by removing genl lock (if it is correct, I need to double
check), so why for net-next?

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-27  6:28         ` Cong Wang
@ 2016-11-27 16:23           ` Eric Dumazet
  2016-11-28  6:53             ` Cong Wang
  0 siblings, 1 reply; 12+ messages in thread
From: Eric Dumazet @ 2016-11-27 16:23 UTC (permalink / raw)
  To: Cong Wang
  Cc: Subash Abhinov Kasiviswanathan, Thomas Graf,
	Linux Kernel Network Developers, Herbert Xu

On Sat, 2016-11-26 at 22:28 -0800, Cong Wang wrote:
> On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >
> > Are you telling me inet_release() is called when we close() the first
> > file descriptor ?
> >
> > fd1 = socket()
> > fd2 = dup(fd1);
> > close(fd2) -> release() ???
> 
> Sorry, I didn't express myself clearly, I meant your change,
> if exclude the SOCK_RCU_FREE part, basically reverts this commit:
> 
> commit 3f660d66dfbc13ea4b61d3865851b348444c24b4
> Author: Herbert Xu <herbert@gondor.apana.org.au>
> Date:   Thu May 3 03:17:14 2007 -0700
> 
>     [NETLINK]: Kill CB only when socket is unused
> 
> IOW, ->release() is called when the last sock fd ref is gone, but ->destructor()
> is called with the last sock ref is gone. They are very different.

Hmm...


> I am confused, what Subash reported is a kernel warning which can
> surely be fixed by removing genl lock (if it is correct, I need to double
> check), so why for net-next?

Because Subash pointed to a buggy commit.

We want to fix all issues bring by this commit, not only the immediate
problem about mutex.

I have no idea if we can safely remove the mutex from genl_lock_done() :

The genl_lock() is not only protecting the socket itself, it might
protect global data as well, or protect some kind of lock ordering among
multiple mutexes.

Have you checked all genl users, down to linux-4.0 , point where commit
21e4902aea80ef35a was added ?

Herbert, Thomas, your help would be appreciated, thanks.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-27 16:23           ` Eric Dumazet
@ 2016-11-28  6:53             ` Cong Wang
  2016-11-28 11:22               ` Herbert Xu
  0 siblings, 1 reply; 12+ messages in thread
From: Cong Wang @ 2016-11-28  6:53 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Subash Abhinov Kasiviswanathan, Thomas Graf,
	Linux Kernel Network Developers, Herbert Xu

On Sun, Nov 27, 2016 at 8:23 AM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
> On Sat, 2016-11-26 at 22:28 -0800, Cong Wang wrote:
>> On Sat, Nov 26, 2016 at 6:26 PM, Eric Dumazet <eric.dumazet@gmail.com> wrote:
>> >
>> > Are you telling me inet_release() is called when we close() the first
>> > file descriptor ?
>> >
>> > fd1 = socket()
>> > fd2 = dup(fd1);
>> > close(fd2) -> release() ???
>>
>> Sorry, I didn't express myself clearly, I meant your change,
>> if exclude the SOCK_RCU_FREE part, basically reverts this commit:
>>
>> commit 3f660d66dfbc13ea4b61d3865851b348444c24b4
>> Author: Herbert Xu <herbert@gondor.apana.org.au>
>> Date:   Thu May 3 03:17:14 2007 -0700
>>
>>     [NETLINK]: Kill CB only when socket is unused
>>
>> IOW, ->release() is called when the last sock fd ref is gone, but ->destructor()
>> is called with the last sock ref is gone. They are very different.
>
> Hmm...
>
>
>> I am confused, what Subash reported is a kernel warning which can
>> surely be fixed by removing genl lock (if it is correct, I need to double
>> check), so why for net-next?
>
> Because Subash pointed to a buggy commit.
>
> We want to fix all issues bring by this commit, not only the immediate
> problem about mutex.
>
> I have no idea if we can safely remove the mutex from genl_lock_done() :

I meant removing it only for the destructor case, we definitely can't remove
it for the dump case.

>
> The genl_lock() is not only protecting the socket itself, it might
> protect global data as well, or protect some kind of lock ordering among
> multiple mutexes.
>
> Have you checked all genl users, down to linux-4.0 , point where commit
> 21e4902aea80ef35a was added ?
>

I just took a deeper look, some user calls rhashtable_destroy() in ->done(),
so even removing that genl lock is not enough, perhaps we should just
move it to a work struct like what Daniel does for the tcf_proto, but that is
ugly... I don't know if RCU provides any API to execute the callback in process
context.

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-28  6:53             ` Cong Wang
@ 2016-11-28 11:22               ` Herbert Xu
  2016-11-29  4:33                 ` Cong Wang
  2016-11-30  0:49                 ` David Miller
  0 siblings, 2 replies; 12+ messages in thread
From: Herbert Xu @ 2016-11-28 11:22 UTC (permalink / raw)
  To: Cong Wang
  Cc: Eric Dumazet, Subash Abhinov Kasiviswanathan, Thomas Graf,
	Linux Kernel Network Developers

On Sun, Nov 27, 2016 at 10:53:21PM -0800, Cong Wang wrote:
>
> I just took a deeper look, some user calls rhashtable_destroy() in ->done(),
> so even removing that genl lock is not enough, perhaps we should just
> move it to a work struct like what Daniel does for the tcf_proto, but that is
> ugly... I don't know if RCU provides any API to execute the callback in process
> context.

I looked into doing it without a worker struct, but basically it
means that we'd have to add more crap to the common code path for
what is essentially a very rare case.

So I think we should go with the worker struct because all we lose
is a bit of memory.

---8<---
netlink: Call cb->done from a worker thread

The cb->done interface expects to be called in process context.
This was broken by the netlink RCU conversion.  This patch fixes
it by adding a worker struct to make the cb->done call where
necessary.

Fixes: 21e4902aea80 ("netlink: Lockless lookup with RCU grace...")
Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index 62bea45..602e5eb 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -322,14 +322,11 @@ static void netlink_skb_set_owner_r(struct sk_buff *skb, struct sock *sk)
 	sk_mem_charge(sk, skb->truesize);
 }
 
-static void netlink_sock_destruct(struct sock *sk)
+static void __netlink_sock_destruct(struct sock *sk)
 {
 	struct netlink_sock *nlk = nlk_sk(sk);
 
 	if (nlk->cb_running) {
-		if (nlk->cb.done)
-			nlk->cb.done(&nlk->cb);
-
 		module_put(nlk->cb.module);
 		kfree_skb(nlk->cb.skb);
 	}
@@ -346,6 +343,28 @@ static void netlink_sock_destruct(struct sock *sk)
 	WARN_ON(nlk_sk(sk)->groups);
 }
 
+static void netlink_sock_destruct_work(struct work_struct *work)
+{
+	struct netlink_sock *nlk = container_of(work, struct netlink_sock,
+						work);
+
+	nlk->cb.done(&nlk->cb);
+	__netlink_sock_destruct(&nlk->sk);
+}
+
+static void netlink_sock_destruct(struct sock *sk)
+{
+	struct netlink_sock *nlk = nlk_sk(sk);
+
+	if (nlk->cb_running && nlk->cb.done) {
+		INIT_WORK(&nlk->work, netlink_sock_destruct_work);
+		schedule_work(&nlk->work);
+		return;
+	}
+
+	__netlink_sock_destruct(sk);
+}
+
 /* This lock without WQ_FLAG_EXCLUSIVE is good on UP and it is _very_ bad on
  * SMP. Look, when several writers sleep and reader wakes them up, all but one
  * immediately hit write lock and grab all the cpus. Exclusive sleep solves
diff --git a/net/netlink/af_netlink.h b/net/netlink/af_netlink.h
index 3cfd6cc..4fdb383 100644
--- a/net/netlink/af_netlink.h
+++ b/net/netlink/af_netlink.h
@@ -3,6 +3,7 @@
 
 #include <linux/rhashtable.h>
 #include <linux/atomic.h>
+#include <linux/workqueue.h>
 #include <net/sock.h>
 
 #define NLGRPSZ(x)	(ALIGN(x, sizeof(unsigned long) * 8) / 8)
@@ -33,6 +34,7 @@ struct netlink_sock {
 
 	struct rhash_head	node;
 	struct rcu_head		rcu;
+	struct work_struct	work;
 };
 
 static inline struct netlink_sock *nlk_sk(struct sock *sk)

-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-28 11:22               ` Herbert Xu
@ 2016-11-29  4:33                 ` Cong Wang
  2016-11-30  0:49                 ` David Miller
  1 sibling, 0 replies; 12+ messages in thread
From: Cong Wang @ 2016-11-29  4:33 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Eric Dumazet, Subash Abhinov Kasiviswanathan, Thomas Graf,
	Linux Kernel Network Developers

On Mon, Nov 28, 2016 at 3:22 AM, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> netlink: Call cb->done from a worker thread
>
> The cb->done interface expects to be called in process context.
> This was broken by the netlink RCU conversion.  This patch fixes
> it by adding a worker struct to make the cb->done call where
> necessary.
>
> Fixes: 21e4902aea80 ("netlink: Lockless lookup with RCU grace...")
> Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>


Looks good,

Acked-by: Cong Wang <xiyou.wangcong@gmail.com>

Thanks!

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: Crash due to mutex genl_lock called from RCU context
  2016-11-28 11:22               ` Herbert Xu
  2016-11-29  4:33                 ` Cong Wang
@ 2016-11-30  0:49                 ` David Miller
  1 sibling, 0 replies; 12+ messages in thread
From: David Miller @ 2016-11-30  0:49 UTC (permalink / raw)
  To: herbert; +Cc: xiyou.wangcong, eric.dumazet, subashab, tgraf, netdev

From: Herbert Xu <herbert@gondor.apana.org.au>
Date: Mon, 28 Nov 2016 19:22:12 +0800

> netlink: Call cb->done from a worker thread
> 
> The cb->done interface expects to be called in process context.
> This was broken by the netlink RCU conversion.  This patch fixes
> it by adding a worker struct to make the cb->done call where
> necessary.
> 
> Fixes: 21e4902aea80 ("netlink: Lockless lookup with RCU grace...")
> Reported-by: Subash Abhinov Kasiviswanathan <subashab@codeaurora.org>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

Applied and queued up for -stable, thanks Herbert.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2016-11-30  0:49 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-26  2:15 Crash due to mutex genl_lock called from RCU context subashab
2016-11-26  4:11 ` Eric Dumazet
2016-11-26  4:54   ` Eric Dumazet
2016-11-26  5:59     ` subashab
2016-11-27  2:08     ` Cong Wang
2016-11-27  2:26       ` Eric Dumazet
2016-11-27  6:28         ` Cong Wang
2016-11-27 16:23           ` Eric Dumazet
2016-11-28  6:53             ` Cong Wang
2016-11-28 11:22               ` Herbert Xu
2016-11-29  4:33                 ` Cong Wang
2016-11-30  0:49                 ` David Miller

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).