* [PATCH for 2.6.32 (untested)] netns: Add quota for number of NET_NS instances.
From: Tetsuo Handa @ 2011-11-20  7:22 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev

To solve the problems below, can we add a sysctl variable that restricts
the number of NET_NS instances?
--------------------------------------------------
[PATCH for 2.6.32 (untested)] netns: Add quota for number of NET_NS instances.

CONFIG_NET_NS support in 2.6.32 has a problem that can trigger the OOM
killer when clone(CLONE_NEWNET) is called repeatedly in rapid succession.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/720095
But disabling CONFIG_NET_NS broke lxc containers.
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/790863

This patch introduces a /proc/sys/net/core/netns_max interface that limits
the maximum number of network namespace instances.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 include/net/sock.h         |    4 ++++
 net/core/net_namespace.c   |    9 +++++++++
 net/core/sysctl_net_core.c |   10 ++++++++++
 3 files changed, 23 insertions(+)

--- linux-2.6.32.48.orig/include/net/sock.h
+++ linux-2.6.32.48/include/net/sock.h
@@ -1598,4 +1598,8 @@ extern int sysctl_optmem_max;
 extern __u32 sysctl_wmem_default;
 extern __u32 sysctl_rmem_default;
 
+#ifdef CONFIG_NET_NS
+extern int max_netns_count;
+#endif
+
 #endif	/* _SOCK_H */
--- linux-2.6.32.48.orig/net/core/net_namespace.c
+++ linux-2.6.32.48/net/core/net_namespace.c
@@ -81,12 +81,18 @@ static struct net_generic *net_alloc_gen
 #ifdef CONFIG_NET_NS
 static struct kmem_cache *net_cachep;
 static struct workqueue_struct *netns_wq;
+static atomic_t used_netns_count = ATOMIC_INIT(0);
+int max_netns_count;
 
 static struct net *net_alloc(void)
 {
 	struct net *net = NULL;
 	struct net_generic *ng;
 
+	atomic_inc(&used_netns_count);
+	if (atomic_read(&used_netns_count) > max_netns_count)
+		goto out;
+
 	ng = net_alloc_generic();
 	if (!ng)
 		goto out;
@@ -96,7 +102,9 @@ static struct net *net_alloc(void)
 		goto out_free;
 
 	rcu_assign_pointer(net->gen, ng);
+	return net;
 out:
+	atomic_dec(&used_netns_count);
 	return net;
 
 out_free:
@@ -115,6 +123,7 @@ static void net_free(struct net *net)
 #endif
 	kfree(net->gen);
 	kmem_cache_free(net_cachep, net);
+	atomic_dec(&used_netns_count);
 }
 
 static struct net *net_create(void)
--- linux-2.6.32.48.orig/net/core/sysctl_net_core.c
+++ linux-2.6.32.48/net/core/sysctl_net_core.c
@@ -89,6 +89,16 @@ static struct ctl_table net_core_table[]
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+#ifdef CONFIG_NET_NS
+	{
+		.ctl_name       = CTL_UNNUMBERED,
+		.procname       = "netns_max",
+		.data           = &max_netns_count,
+		.maxlen         = sizeof(int),
+		.mode           = 0644,
+		.proc_handler   = proc_dointvec,
+	},
+#endif
 #endif /* CONFIG_NET */
 	{
 		.ctl_name	= NET_CORE_BUDGET,
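
A note on the counter check in net_alloc() above: the patch increments
used_netns_count and then reads it back as two separate operations, so the
value read can include increments made by other CPUs in between. The sketch
below is a userspace model (C11 atomics, not kernel code) of doing the
increment and the comparison in one step; in the kernel the equivalent
primitive would be atomic_inc_return(). The names used_count, max_count,
quota_reserve() and quota_release() are hypothetical stand-ins, and the
quota value of 2 is arbitrary.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins for used_netns_count and max_netns_count. */
static atomic_int used_count = 0;
static int max_count = 2;	/* arbitrary quota for illustration */

/* Reserve one slot: increment and test in a single atomic step. */
static bool quota_reserve(void)
{
	/* atomic_fetch_add returns the old value, so old + 1 is our count. */
	if (atomic_fetch_add(&used_count, 1) + 1 > max_count) {
		atomic_fetch_sub(&used_count, 1);	/* over quota: undo */
		return false;
	}
	return true;
}

/* Give the slot back, as net_free() does in the patch. */
static void quota_release(void)
{
	atomic_fetch_sub(&used_count, 1);
}
```

With a quota of 2, two reservations succeed, a third fails and leaves the
counter at 2, and a reservation succeeds again after one release. Note also
that in the patch as posted max_netns_count starts at 0, so every
clone(CLONE_NEWNET) would fail until the sysctl is set.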


* Re: [PATCH for 2.6.32 (untested)] netns: Add quota for number of NET_NS instances.
From: Eric W. Biederman @ 2011-11-20 23:13 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: netdev

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> To solve the problems below, can we add a sysctl variable that restricts
> the number of NET_NS instances?

I don't have any particular problems with the patch, but I don't think it
will result in a working system that is easy to keep working.  Tuning
static limits can be fickle.

Simply throttling the number of processes, as anything reasonable will do,
should keep the problem in check.  The practical issue is that we have
a huge buildup of network namespaces that don't get cleaned up.

My inclination is that in this case the practical fix is for network
namespace allocation to take a look at the cleanup_list, see that there
is ongoing cleanup activity, and wait until at least one network
namespace has been cleaned up, perhaps by creating a work struct and
waiting for it to cycle through the netns workqueue.

That should throttle network namespace creation to the same speed as
network namespace deletion and prevent the problem of too many
dead network namespaces building up and taking resources.
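
The throttling effect can be illustrated with a toy model. This is a
single-threaded userspace simulation, not kernel code: it only tracks
counts, and every name in it (struct netns_model, model_create(),
model_put(), model_run()) is hypothetical. It assumes the worst case in
which the cleanup worker makes no progress on its own, so the backlog
shrinks only when a creator waits for one cleanup to finish.

```c
#include <stddef.h>

/* Hypothetical model: counts only, no real namespaces. */
struct netns_model {
	size_t live;		/* namespaces currently in use */
	size_t backlog;		/* dead namespaces awaiting cleanup */
	size_t max_backlog;	/* high-water mark of backlog */
};

/* Releasing a namespace only queues it for cleanup. */
static void model_put(struct netns_model *m)
{
	m->live--;
	m->backlog++;
	if (m->backlog > m->max_backlog)
		m->max_backlog = m->backlog;
}

/* Stand-in for the cleanup worker finishing one namespace. */
static void model_cleanup_one(struct netns_model *m)
{
	if (m->backlog > 0)
		m->backlog--;
}

/* With throttling, creation "waits" for one cleanup if a backlog exists. */
static void model_create(struct netns_model *m, int throttle)
{
	if (throttle && m->backlog > 0)
		model_cleanup_one(m);
	m->live++;
}

/* Run n create/put cycles and return the largest backlog seen. */
static size_t model_run(size_t n, int throttle)
{
	struct netns_model m = { 0, 0, 0 };

	for (size_t i = 0; i < n; i++) {
		model_create(&m, throttle);
		model_put(&m);
	}
	return m.max_backlog;
}
```

In this model, 1000 create/put cycles with throttling never leave more than
one dead namespace queued, while the same run without throttling ends with
all 1000 still waiting for cleanup.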

Eric




* Re: [PATCH for 2.6.32 (untested)] netns: Add quota for number of NET_NS instances.
From: Tetsuo Handa @ 2011-11-21  1:57 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev

Eric W. Biederman wrote:
> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:
> 
> > To solve the problems below, can we add a sysctl variable that
> > restricts the number of NET_NS instances?
> 
> I don't have any particular problems with the patch, but I don't think it
> will result in a working system that is easy to keep working.  Tuning
> static limits can be fickle.

What I worry about is that, although clone() is an operation that is allowed
to sleep, waiting too long might be annoying for users, especially when the
user cannot easily send Ctrl-C or SIGKILL. (I think an ftp client is an
example.)

> My inclination is that in this case the practical fix is for network
> namespace allocation to take a look at the cleanup_list, see that there
> is ongoing cleanup activity, and wait until at least one network
> namespace has been cleaned up, perhaps by creating a work struct and
> waiting for it to cycle through the netns workqueue.

Are you suggesting that we should wait only when "the number of NET_NS
instances exceeded quota" and "there is a dead NET_NS instance"?
In other words, let clone() fail immediately if "the number of NET_NS
instances exceeded quota" but "cleanup_list is empty"?

If you are suggesting that we should always wait until "the number of NET_NS
instances becomes smaller than quota", clone() might sleep too long when the
user cannot easily send signals.


* Re: [PATCH for 2.6.32 (untested)] netns: Add quota for number of NET_NS instances.
From: Eric W. Biederman @ 2011-11-21  2:45 UTC (permalink / raw)
  To: Tetsuo Handa; +Cc: netdev

Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:

> Eric W. Biederman wrote:
>> Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> writes:
>> 
>> > To solve the problems below, can we add a sysctl variable that
>> > restricts the number of NET_NS instances?
>> 
>> I don't have any particular problems with the patch, but I don't think it
>> will result in a working system that is easy to keep working.  Tuning
>> static limits can be fickle.
>
> What I worry about is that, although clone() is an operation that is
> allowed to sleep, waiting too long might be annoying for users, especially
> when the user cannot easily send Ctrl-C or SIGKILL. (I think an ftp client
> is an example.)

An ftp client can always close the connection.  We already have to
contend for the net_mutex when both creating and destroying network
namespaces so I would be surprised if it is actually a problem.

But the reality is that under high connection load if we actually want
to use network namespaces we have to wait for previous network
namespaces to clean up.  So I am not particularly worried.  Especially
since most of the cleanup speed issues when there is a backlog have
been fixed in more recent kernels.

>> My inclination is that in this case the practical fix is for network
>> namespace allocation to take a look at the cleanup_list, see that there
>> is ongoing cleanup activity, and wait until at least one network
>> namespace has been cleaned up, perhaps by creating a work struct and
>> waiting for it to cycle through the netns workqueue.
>
> Are you suggesting that we should wait only when "the number of NET_NS
> instances exceeded quota" and "there is a dead NET_NS instance"?
> In other words, let clone() fail immediately if "the number of NET_NS
> instances exceeded quota" but "cleanup_list is empty"?
>
> If you are suggesting that we should always wait until "the number of NET_NS
> instances becomes smaller than quota", clone() might sleep too long when the
> user cannot easily send signals.

I am suggesting that if a netns instance is being cleaned up we should
wait for one netns instance to be cleaned up.  A single netns instance
does not take long to clean up (in general).  But a lot of netns
instances do take a while.

By waiting for one netns instance to be cleaned up we should be able
to guarantee that we don't develop a substantial backlog of network
namespaces to be cleaned up.  And that was the problem.

I don't expect we need to do anything if there are no network namespaces
being cleaned up.

There is of course Debian's solution, which was to simply tweak vsftp
to not use network namespaces on 2.6.32 and only enable the feature
on later kernels.  But you seem to want to do something a little
more substantial than that.

Eric


* Re: [PATCH for 2.6.32 (untested)] netns: Add quota for number of NET_NS instances.
From: Tetsuo Handa @ 2011-11-21 13:07 UTC (permalink / raw)
  To: ebiederm; +Cc: netdev

Eric W. Biederman wrote:
> I am suggesting that if a netns instance is being cleaned up we should
> wait for one netns instance to be cleaned up.  A single netns instance
> does not take long to clean up (in general).  But a lot of netns
> instances do take a while.
> 
> By waiting for one netns instance to be cleaned up we should be able
> to guarantee that we don't develop a substantial backlog of network
> namespaces to be cleaned up.  And that was the problem.

Ah, you are suggesting that we wait for the cleanup to complete.
But unfortunately, we cannot sleep in __put_net().

> But the reality is that under high connection load if we actually want
> to use network namespaces we have to wait for previous network
> namespaces to clean up.

Right. Is a quota with a wait queue, like the one below, what is wanted?
----------
This patch introduces a /proc/sys/net/core/netns_max interface that limits
the maximum number of network namespace instances. When the number of
network namespace instances exceeds this quota, the process is blocked in a
killable state rather than returning immediately. Setting this quota to 0
means nobody can create new network namespace instances.

Signed-off-by: Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp>
---
 include/net/sock.h         |    4 ++++
 net/core/net_namespace.c   |   15 +++++++++++++++
 net/core/sysctl_net_core.c |   10 ++++++++++
 3 files changed, 29 insertions(+)

--- linux-2.6.32.48.orig/include/net/sock.h
+++ linux-2.6.32.48/include/net/sock.h
@@ -1598,4 +1598,8 @@ extern int sysctl_optmem_max;
 extern __u32 sysctl_wmem_default;
 extern __u32 sysctl_rmem_default;
 
+#ifdef CONFIG_NET_NS
+extern int max_netns_count;
+#endif
+
 #endif	/* _SOCK_H */
--- linux-2.6.32.48.orig/net/core/net_namespace.c
+++ linux-2.6.32.48/net/core/net_namespace.c
@@ -81,12 +81,22 @@ static struct net_generic *net_alloc_gen
 #ifdef CONFIG_NET_NS
 static struct kmem_cache *net_cachep;
 static struct workqueue_struct *netns_wq;
+static atomic_t used_netns_count = ATOMIC_INIT(0);
+static DECLARE_WAIT_QUEUE_HEAD(netns_alloc_wait);
+int max_netns_count;
 
 static struct net *net_alloc(void)
 {
 	struct net *net = NULL;
 	struct net_generic *ng;
 
+	atomic_inc(&used_netns_count);
+	wait_event_killable(netns_alloc_wait,
+			    !max_netns_count ||
+			    atomic_read(&used_netns_count) <= max_netns_count);
+	if (atomic_read(&used_netns_count) > max_netns_count)
+		goto out;
+
 	ng = net_alloc_generic();
 	if (!ng)
 		goto out;
@@ -96,7 +106,10 @@ static struct net *net_alloc(void)
 		goto out_free;
 
 	rcu_assign_pointer(net->gen, ng);
+	return net;
 out:
+	atomic_dec(&used_netns_count);
+	wake_up(&netns_alloc_wait);
 	return net;
 
 out_free:
@@ -115,6 +128,8 @@ static void net_free(struct net *net)
 #endif
 	kfree(net->gen);
 	kmem_cache_free(net_cachep, net);
+	atomic_dec(&used_netns_count);
+	wake_up(&netns_alloc_wait);
 }
 
 static struct net *net_create(void)
--- linux-2.6.32.48.orig/net/core/sysctl_net_core.c
+++ linux-2.6.32.48/net/core/sysctl_net_core.c
@@ -89,6 +89,16 @@ static struct ctl_table net_core_table[]
 		.mode		= 0644,
 		.proc_handler	= proc_dointvec
 	},
+#ifdef CONFIG_NET_NS
+	{
+		.ctl_name       = CTL_UNNUMBERED,
+		.procname       = "netns_max",
+		.data           = &max_netns_count,
+		.maxlen         = sizeof(int),
+		.mode           = 0644,
+		.proc_handler   = proc_dointvec,
+	},
+#endif
 #endif /* CONFIG_NET */
 	{
 		.ctl_name	= NET_CORE_BUDGET,
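
The admission logic of the patched net_alloc() above can be summarised as a
small decision function. This is a userspace model for illustration only;
enum admit and netns_admit() are hypothetical names, not part of the patch.
It takes the counter value after the caller's own increment and the current
quota, and reports what the caller would do next.

```c
/* Possible outcomes for a caller of the patched net_alloc(). */
enum admit { ADMIT_PROCEED, ADMIT_FAIL, ADMIT_WAIT };

/*
 * used: counter value after this caller's own increment (so used >= 1).
 * max:  current value of the netns_max quota.
 */
static enum admit netns_admit(int used, int max)
{
	if (max == 0)
		return ADMIT_FAIL;	/* quota 0: nobody may create */
	if (used <= max)
		return ADMIT_PROCEED;	/* within quota: allocate */
	return ADMIT_WAIT;		/* over quota: sleep until a slot frees */
}
```

A caller in the waiting state is re-evaluated on each wake_up(); if it is
killed while still over quota, the check after wait_event_killable() makes
it fail. Because every caller increments the counter before waiting, the
counter includes sleeping callers as well as live namespaces.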

> There is of course Debian's solution, which was to simply tweak vsftp
> to not use network namespaces on 2.6.32 and only enable the feature
> on later kernels.

Thank you for the info. I added a link to this thread to #790863 and would
like to wait and see how they want to handle this problem. (Maybe they won't
re-enable NET_NS; maybe they will re-enable NET_NS with some quota.)

Thanks.
