netdev.vger.kernel.org archive mirror
* [RFC PATCH net-next] netlink: avoid namespace change while creating socket
From: Ying Xue @ 2015-05-04  9:22 UTC
  To: herbert, xemul; +Cc: den, davem, avagin, netdev

Commit 23fe18669e7f ("[NETNS]: Fix race between put_net() and
netlink_kernel_create().") attempts to fix the following race
scenario:

put_net()
  if (atomic_dec_and_test(&net->refcnt))
    /* true */
      __put_net(net);
        queue_work(...);

/*
 * note: the net now has refcnt 0, but still in
 * the global list of net namespaces
 */

== re-schedule ==

register_pernet_subsys(&some_ops);
  register_pernet_operations(&some_ops);
    (*some_ops)->init(net);
      /*
       * we call netlink_kernel_create() here
       * in some places
       */
      netlink_kernel_create();
         sk_alloc();
            get_net(net); /* refcnt = 1 */
         /*
          * now we drop the net refcount not to
          * block the net namespace exit in the
          * future (or this can be done on the
          * error path)
          */
         put_net(sk->sk_net);
             if (atomic_dec_and_test(&...))
                   /*
                    * true. BOOOM! The net is
                    * scheduled for release twice
                    */

To prevent this race, the commit adopted the following solution:
create the netlink socket inside the init_net namespace and re-attach
it to the desired namespace right after the socket is created;
similarly, when closing the socket, first move it back to init_net so
that the socket is destroyed in the same context in which it was
created.

However, that approach makes the whole thing needlessly complex.
There is a simpler way to avoid the double release of the net: if the
net reference counter has already reached zero before sk_alloc() would
increment it, we know that the net namespace exit running in the
workqueue has not finished yet. In that case sk_alloc() should fail
immediately. This method is simple and easy to understand, and it also
avoids the redundant namespace change. After this patch, a netlink
socket is always created and destroyed in its own namespace.

Signed-off-by: Ying Xue <ying.xue@windriver.com>
---
 net/core/sock.c          |    7 ++++++-
 net/netlink/af_netlink.c |   14 ++------------
 2 files changed, 8 insertions(+), 13 deletions(-)

diff --git a/net/core/sock.c b/net/core/sock.c
index e891bcf..88fdf2c 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1411,7 +1411,12 @@ struct sock *sk_alloc(struct net *net, int family, gfp_t priority,
 		 */
 		sk->sk_prot = sk->sk_prot_creator = prot;
 		sock_lock_init(sk);
-		sock_net_set(sk, get_net(net));
+		net = maybe_get_net(net);
+		if (!net) {
+			sk_prot_free(prot, sk);
+			return NULL;
+		}
+		sock_net_set(sk, net);
 		atomic_set(&sk->sk_wmem_alloc, 1);
 
 		sock_update_classid(sk);
diff --git a/net/netlink/af_netlink.c b/net/netlink/af_netlink.c
index ec4adbd..3914662 100644
--- a/net/netlink/af_netlink.c
+++ b/net/netlink/af_netlink.c
@@ -2473,17 +2473,10 @@ __netlink_kernel_create(struct net *net, int unit, struct module *module,
 	if (sock_create_lite(PF_NETLINK, SOCK_DGRAM, unit, &sock))
 		return NULL;
 
-	/*
-	 * We have to just have a reference on the net from sk, but don't
-	 * get_net it. Besides, we cannot get and then put the net here.
-	 * So we create one inside init_net and the move it to net.
-	 */
-
-	if (__netlink_create(&init_net, sock, cb_mutex, unit) < 0)
+	if (__netlink_create(net, sock, cb_mutex, unit) < 0)
 		goto out_sock_release_nosk;
 
 	sk = sock->sk;
-	sk_change_net(sk, net);
 
 	if (!cfg || cfg->groups < 32)
 		groups = 32;
@@ -2527,9 +2520,6 @@ __netlink_kernel_create(struct net *net, int unit, struct module *module,
 
 out_sock_release:
 	kfree(listeners);
-	netlink_kernel_release(sk);
-	return NULL;
-
 out_sock_release_nosk:
 	sock_release(sock);
 	return NULL;
@@ -2539,7 +2529,7 @@ EXPORT_SYMBOL(__netlink_kernel_create);
 void
 netlink_kernel_release(struct sock *sk)
 {
-	sk_release_kernel(sk);
+	sock_release(sk->sk_socket);
 }
 EXPORT_SYMBOL(netlink_kernel_release);
 
-- 
1.7.9.5
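
The check that the patch adds to sk_alloc() relies on taking a
reference only while the count is still non-zero. A minimal userspace
sketch of that idea follows, using C11 atomics. The names net_model,
model_put_net() and model_maybe_get_net() are hypothetical stand-ins
and are not kernel code; the model is single-threaded apart from the
counter itself and only illustrates the counting rule.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace stand-in for struct net; only the counter matters. */
struct net_model {
	atomic_int refcnt;
	int release_scheduled;	/* how many times cleanup was queued */
};

/* Models put_net(): queue cleanup when the last reference is dropped. */
static void model_put_net(struct net_model *net)
{
	int old = atomic_fetch_sub(&net->refcnt, 1);

	assert(old > 0);
	if (old == 1)
		net->release_scheduled++;
}

/* Models maybe_get_net(): take a reference only if the count is non-zero. */
static struct net_model *model_maybe_get_net(struct net_model *net)
{
	int old = atomic_load(&net->refcnt);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&net->refcnt, &old, old + 1))
			return net;
	}
	return NULL;	/* the namespace is already on its way out */
}
```

In this model, a caller that gets NULL back simply fails, which mirrors
the early return from sk_alloc() in the diff above; the count is never
raised from zero, so cleanup cannot be queued a second time.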

* Re: [RFC PATCH net-next] netlink: avoid namespace change while creating socket
From: Herbert Xu @ 2015-05-05  1:52 UTC
  To: Ying Xue; +Cc: xemul, den, davem, avagin, netdev

On Mon, May 04, 2015 at 05:22:19PM +0800, Ying Xue wrote:
> Commit 23fe18669e7f ("[NETNS]: Fix race between put_net() and
> netlink_kernel_create().") attempts to fix the following race
> scenario:
> 
> put_net()
>   if (atomic_dec_and_test(&net->refcnt))
>     /* true */
>       __put_net(net);
>         queue_work(...);
> 
> /*
>  * note: the net now has refcnt 0, but still in
>  * the global list of net namespaces
>  */
> 
> == re-schedule ==
> 
> register_pernet_subsys(&some_ops);
>   register_pernet_operations(&some_ops);
>     (*some_ops)->init(net);
>       /*
>        * we call netlink_kernel_create() here
>        * in some places
>        */
>       netlink_kernel_create();
>          sk_alloc();
>             get_net(net); /* refcnt = 1 */
>          /*
>           * now we drop the net refcount not to
>           * block the net namespace exit in the
>           * future (or this can be done on the
>           * error path)
>           */
>          put_net(sk->sk_net);
>              if (atomic_dec_and_test(&...))
>                    /*
>                     * true. BOOOM! The net is
>                     * scheduled for release twice
>                     */

Surely the problem here is that the caller of netlink_kernel_create
should hold a ref count on net, so why doesn't it?

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

* Re: [RFC PATCH net-next] netlink: avoid namespace change while creating socket
From: Ying Xue @ 2015-05-05  2:25 UTC
  To: Herbert Xu; +Cc: xemul, den, davem, avagin, netdev

On 05/05/2015 09:52 AM, Herbert Xu wrote:
> On Mon, May 04, 2015 at 05:22:19PM +0800, Ying Xue wrote:
>> Commit 23fe18669e7f ("[NETNS]: Fix race between put_net() and
>> netlink_kernel_create().") attempts to fix the following race
>> scenario:
>>
>> put_net()
>>   if (atomic_dec_and_test(&net->refcnt))
>>     /* true */
>>       __put_net(net);
>>         queue_work(...);
>>
>> /*
>>  * note: the net now has refcnt 0, but still in
>>  * the global list of net namespaces
>>  */
>>
>> == re-schedule ==
>>
>> register_pernet_subsys(&some_ops);
>>   register_pernet_operations(&some_ops);
>>     (*some_ops)->init(net);
>>       /*
>>        * we call netlink_kernel_create() here
>>        * in some places
>>        */
>>       netlink_kernel_create();
>>          sk_alloc();
>>             get_net(net); /* refcnt = 1 */
>>          /*
>>           * now we drop the net refcount not to
>>           * block the net namespace exit in the
>>           * future (or this can be done on the
>>           * error path)
>>           */
>>          put_net(sk->sk_net);
>>              if (atomic_dec_and_test(&...))
>>                    /*
>>                     * true. BOOOM! The net is
>>                     * scheduled for release twice
>>                     */
> 
> Surely the problem here is that the caller of netlink_kernel_create
> should hold a ref count on net, so why doesn't it?
> 

I guess the main reason is that netlink_kernel_create() is called on the
path of registering a network namespace subsystem. When
__register_pernet_operations() iterates the global list of net namespaces
(i.e., net_namespace_list) with for_each_net() and calls ops_init(ops, net)
for each net, it assumes that touching a net instance without holding its
refcount is safe because net_namespace_list is protected by the net_mutex
lock. More importantly, even after the net refcount has been decremented to
0 in put_net(), the net is still on the global list of net namespaces
(i.e., net_namespace_list). This is why the race happens.
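
The window described above can be sketched with a small single-threaded
model in C. It assumes, as the description says, that dropping the last
reference only lowers the count and that unlinking from the list happens
later in deferred work. All names here (listed_net, listed_put(),
cleanup_work(), count_dead_but_listed()) are hypothetical and do not
exist in the kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one entry on net_namespace_list. */
struct listed_net {
	int refcnt;
	int on_list;	/* cleared only by the deferred cleanup work */
};

/* Models put_net(): the count drops now, list removal is deferred. */
static void listed_put(struct listed_net *net)
{
	assert(net->refcnt > 0);
	net->refcnt--;
}

/* Models the cleanup work item that later unlinks dead namespaces. */
static void cleanup_work(struct listed_net *nets, size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (nets[i].refcnt == 0)
			nets[i].on_list = 0;
}

/* Models for_each_net(): count listed entries whose refcount is already 0. */
static size_t count_dead_but_listed(const struct listed_net *nets, size_t n)
{
	size_t dead = 0;

	for (size_t i = 0; i < n; i++)
		if (nets[i].on_list && nets[i].refcnt == 0)
			dead++;
	return dead;
}
```

Between listed_put() and cleanup_work(), an iteration over the list still
visits the entry whose count is zero, which is the state an ->init()
callback can run into.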

Regards,
Ying


> Cheers,
> 

* Re: [RFC PATCH net-next] netlink: avoid namespace change while creating socket
From: Ying Xue @ 2015-05-05  2:38 UTC
  To: Herbert Xu; +Cc: xemul, den, davem, avagin, netdev

On 05/05/2015 09:52 AM, Herbert Xu wrote:
> On Mon, May 04, 2015 at 05:22:19PM +0800, Ying Xue wrote:
>> Commit 23fe18669e7f ("[NETNS]: Fix race between put_net() and
>> netlink_kernel_create().") attempts to fix the following race
>> scenario:
>>
>> put_net()
>>   if (atomic_dec_and_test(&net->refcnt))
>>     /* true */
>>       __put_net(net);
>>         queue_work(...);
>>
>> /*
>>  * note: the net now has refcnt 0, but still in
>>  * the global list of net namespaces
>>  */
>>
>> == re-schedule ==
>>
>> register_pernet_subsys(&some_ops);
>>   register_pernet_operations(&some_ops);
>>     (*some_ops)->init(net);
>>       /*
>>        * we call netlink_kernel_create() here
>>        * in some places
>>        */
>>       netlink_kernel_create();
>>          sk_alloc();
>>             get_net(net); /* refcnt = 1 */
>>          /*
>>           * now we drop the net refcount not to
>>           * block the net namespace exit in the
>>           * future (or this can be done on the
>>           * error path)
>>           */
>>          put_net(sk->sk_net);
>>              if (atomic_dec_and_test(&...))
>>                    /*
>>                     * true. BOOOM! The net is
>>                     * scheduled for release twice
>>                     */
> 
> Surely the problem here is that the caller of netlink_kernel_create
> should hold a ref count on net, so why doesn't it?
> 

In addition, even if the caller of netlink_kernel_create() takes the net
refcount again, that still cannot prevent the net from being released twice.
The net's refcount has already dropped to 0 in put_net(), which means the
net will be destroyed in the future whether or not we take its refcount
again. In other words, once the refcount reaches zero, we absolutely should
not touch the net again.
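
To make the counting concrete, here is a small userspace replay of the
interleaving from the commit message, written with C11 atomics. It is
only a sketch: ns_model, ns_get(), ns_put() and replay_race() are
hypothetical names, and "scheduling a release" is reduced to bumping a
counter.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for struct net. */
struct ns_model {
	atomic_int refcnt;
	int release_scheduled;	/* how many times cleanup was queued */
};

/* Models get_net(): unconditional increment, even from zero. */
static void ns_get(struct ns_model *net)
{
	atomic_fetch_add(&net->refcnt, 1);
}

/* Models put_net(): queue cleanup whenever the count drops to zero. */
static void ns_put(struct ns_model *net)
{
	int old = atomic_fetch_sub(&net->refcnt, 1);

	assert(old > 0);
	if (old == 1)
		net->release_scheduled++;
}

/* Replays the race in order; returns how many times release was queued. */
static int replay_race(void)
{
	struct ns_model net = { .release_scheduled = 0 };

	atomic_init(&net.refcnt, 1);
	ns_put(&net);	/* last user goes away: 1 -> 0, release queued */
	ns_get(&net);	/* a late get (as in sk_alloc()): 0 -> 1 */
	ns_put(&net);	/* matching put: 1 -> 0, release queued again */
	return net.release_scheduled;
}
```

In this model replay_race() returns 2: any get that starts from zero is
eventually paired with a put that reaches zero a second time, no matter
which caller took the reference.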

Regards,
Ying

> Cheers,
> 
