From: Ying Xue <ying.xue@windriver.com>
To: "Eric W. Biederman" <ebiederm@xmission.com>
Cc: <netdev@vger.kernel.org>, <cwang@twopensource.com>,
	<herbert@gondor.apana.org.au>, <xemul@openvz.org>,
	<davem@davemloft.net>, <eric.dumazet@gmail.com>,
	<maxk@qti.qualcomm.com>, <stephen@networkplumber.org>,
	<tgraf@suug.ch>, <nicolas.dichtel@6wind.com>,
	<tom@herbertland.com>, <jchapman@katalix.com>,
	<erik.hugne@ericsson.com>, <jon.maloy@ericsson.com>,
	<horms@verge.net.au>
Subject: Re: [RFC PATCH net-next 00/11] netns: don't switch namespace while creating kernel sockets
Date: Fri, 8 May 2015 17:25:01 +0800
Message-ID: <554C80ED.1070603@windriver.com>
In-Reply-To: <554C78BE.100@windriver.com>

On 05/08/2015 04:50 PM, Ying Xue wrote:
> On 05/08/2015 12:14 AM, Eric W. Biederman wrote:
>> I agree that commit 23fe18669e7f ("[NETNS]: Fix race between put_net()
>> and netlink_kernel_create().") was a hack.
>>
> 
> Thanks for the agreement :)
> 
>> However it is not appropriate to call get_net on a network namespace
>> whose count might be zero.
> 
> I will explain in another mail why this is still safe for us even if we do it.
> 
>   I believe all of your patches rely on that
>> currently.
> 
> Yes, you are right.
> 
>   Instead we need to build something like sk_release_kernel
>> that does not increase the network namespace reference count
> 
> Please refer to above comments.
> 
>  if you are
>> going to avoid changing the network namespace on a socket (a worthy
>> goal).
>>
> 
> Thanks for confirming that the effort is worthwhile!
> 
>> The following change shows how it is possible to always know that your
>> network namespace has a non-zero reference count in the network
>> namespace initialization methods.  My implementation of
>> lock_network_namespaces is problematic in that it does not sleep
>> while network namespaces are unregistering.  But it is enough to show
>> how the locking and reference counting can be fixed.
>>
> 
> If I understand your proposal correctly, register_pernet_subsys() will return
> an error code to its caller if lock_network_namespaces() finds a net in
> net_namespace_list whose refcount is zero, which means the per-namespace
> operations are not registered at all in that situation. But in practice we
> should not reject the registration just because a dead net is still linked
> in the global net_namespace_list.
> 

Please consider the TIPC case below:
tipc_init()
  register_pernet_subsys(&tipc_net_ops)
    lock_network_namespaces();
       for_each_net(net) {
         if (!maybe_get_net(net))
            goto undo; /* the entire registration is abandoned here */
       }

This is obviously unreasonable for us: the TIPC module cannot be loaded
successfully as long as there is a dead net in net_namespace_list.
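
To make the failing step concrete, here is a minimal userspace sketch of one
pass over the namespace list: take a reference on every entry, and drop the
references already taken as soon as one entry turns out to be dead. It is only
a model, not kernel code. struct fake_net, fake_maybe_get() and try_get_all()
are hypothetical names, the list is a plain array, and there is no locking.
It covers a single pass only; in the quoted patch that pass sits inside a
for (;;) loop under net_mutex.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct net: only the reference count matters. */
struct fake_net {
	const char *name;
	int count;
};

/* Models maybe_get_net(): take a reference only if the count is non-zero. */
static bool fake_maybe_get(struct fake_net *net)
{
	if (net->count == 0)
		return false;	/* dead namespace, no reference taken */
	net->count++;
	return true;
}

/* One pass: reference every namespace, or roll back the references
 * already taken and report failure.
 */
bool try_get_all(struct fake_net *nets, size_t n)
{
	size_t i, j;

	assert(nets != NULL || n == 0);

	for (i = 0; i < n; i++) {
		if (!fake_maybe_get(&nets[i])) {
			/* Undo: drop the references taken before the dead entry. */
			for (j = 0; j < i; j++)
				nets[j].count--;
			return false;
		}
	}
	return true;
}
```

With a list such as { init_net: 1, dying: 0, other: 2 }, the pass fails at
"dying" and leaves every count unchanged, which is the situation described
above.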

Regards,
Ying

> Regards,
> Ying
> 
>> Eric
>>
>>
>> diff --git a/net/core/net_namespace.c b/net/core/net_namespace.c
>> index a3abb719221f..81c53ccc5764 100644
>> --- a/net/core/net_namespace.c
>> +++ b/net/core/net_namespace.c
>> @@ -822,6 +822,49 @@ static void unregister_pernet_operations(struct pernet_operations *ops)
>>  		ida_remove(&net_generic_ids, *ops->id);
>>  }
>>  
>> +static void unlock_network_namespaces(void)
>> +{
>> +	/* Drop the reference count to every network namespace
>> +	 * and then release the net_mutex.
>> +	 */
>> +	struct net *net;
>> +
>> +	for_each_net(net)
>> +		put_net(net);
>> +
>> +	mutex_unlock(&net_mutex);
>> +}
>> +
>> +static void lock_network_namespaces(void)
>> +{
>> +	/* Take the mutex lock ensuring no new network namespaces
>> +	 * and take a reference on all existing network namespaces
>> +	 * allowing network namespace initialization code to take
>> +	 * further references
>> +	 */
>> +	for (;;) {
>> +		struct net *net, *stop;
>> +
>> +		mutex_lock(&net_mutex);
>> +		for_each_net(net) {
>> +			if (!maybe_get_net(net))
>> +				goto undo;
>> +		}
>> +		return;
>> +undo:
>> +		/* Remember the network namespace whose reference
>> +		 * count was not acquired. */
>> +		stop = net;
>> +		for_each_net(net) {
>> +			if (net_eq(net, stop))
>> +				goto undone;
>> +			put_net(net);
>> +		}
>> +undone:
>> +		mutex_unlock(&net_mutex);
>> +	}
>> +}
>> +
>>  /**
>>   *      register_pernet_subsys - register a network namespace subsystem
>>   *	@ops:  pernet operations structure for the subsystem
>> @@ -844,9 +887,9 @@ static void unregister_pernet_operations(struct pernet_operations *ops)
>>  int register_pernet_subsys(struct pernet_operations *ops)
>>  {
>>  	int error;
>> -	mutex_lock(&net_mutex);
>> +	lock_network_namespaces();
>>  	error =  register_pernet_operations(first_device, ops);
>> -	mutex_unlock(&net_mutex);
>> +	unlock_network_namespaces();
>>  	return error;
>>  }
>>  EXPORT_SYMBOL_GPL(register_pernet_subsys);
>> @@ -890,11 +933,11 @@ EXPORT_SYMBOL_GPL(unregister_pernet_subsys);
>>  int register_pernet_device(struct pernet_operations *ops)
>>  {
>>  	int error;
>> -	mutex_lock(&net_mutex);
>> +	lock_network_namespaces();
>>  	error = register_pernet_operations(&pernet_list, ops);
>>  	if (!error && (first_device == &pernet_list))
>>  		first_device = &ops->list;
>> -	mutex_unlock(&net_mutex);
>> +	unlock_network_namespaces();
>>  	return error;
>>  }
>>  EXPORT_SYMBOL_GPL(register_pernet_device);
> 
