* [PATCH] ns: move free_nsproxy() out of do_exit() path
@ 2012-07-13 11:48 Kirill A. Shutemov
2012-07-13 21:08 ` Andrew Morton
0 siblings, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2012-07-13 11:48 UTC (permalink / raw)
To: Andrew Morton, Serge Hallyn
Cc: KOSAKI Motohiro, Al Viro, Dmitry V. Levin, Pavel Emelyanov,
Kirill A. Shutemov, Doug Ledford, linux-kernel, containers,
Kirill A. Shutemov
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
free_nsproxy() is too heavy to be on exit path. Let's free namespaces
asynchronously to not block exit_group() syscall.
Microbenchmark:
: #define _GNU_SOURCE
: #include <unistd.h>
: #include <sched.h>
: #include <stdlib.h>
: #include <sys/wait.h>
:
: int
: main(void)
: {
: int i;
: for (i = 0; i < 1024; i++) {
: if (fork()) {
: wait(NULL);
: continue;
: }
: unshare(CLONE_NEWIPC);
: exit(0);
: }
: return 0;
: }
Before the patch:
real 0m8.335s
user 0m0.000s
sys 0m0.265s
After:
real 0m0.569s
user 0m0.001s
sys 0m0.154s
The patch also fixes bug with free namespace without synchronize_rcu() through
put_nsproxy().
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
---
Serge, I've changed the patch a bit, but I hope I can reuse your Ack.
---
include/linux/nsproxy.h | 1 +
kernel/nsproxy.c | 34 +++++++++++++++++++++++-----------
2 files changed, 24 insertions(+), 11 deletions(-)
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index cc37a55..1d26be7 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -24,6 +24,7 @@ struct fs_struct;
*/
struct nsproxy {
atomic_t count;
+ struct work_struct free_nsproxy_work;
struct uts_namespace *uts_ns;
struct ipc_namespace *ipc_ns;
struct mnt_namespace *mnt_ns;
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index b576f7f..ebc7d40 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -41,13 +41,17 @@ struct nsproxy init_nsproxy = {
#endif
};
+static void free_nsproxy_work(struct work_struct *work);
+
static inline struct nsproxy *create_nsproxy(void)
{
struct nsproxy *nsproxy;
nsproxy = kmem_cache_alloc(nsproxy_cachep, GFP_KERNEL);
- if (nsproxy)
+ if (nsproxy) {
atomic_set(&nsproxy->count, 1);
+ INIT_WORK(&nsproxy->free_nsproxy_work, free_nsproxy_work);
+ }
return nsproxy;
}
@@ -166,6 +170,14 @@ out:
void free_nsproxy(struct nsproxy *ns)
{
+ /*
+ * wait for others to get what they want from this nsproxy.
+ *
+ * cannot release this nsproxy via the call_rcu() since
+ * put_mnt_ns() will want to sleep
+ */
+ synchronize_rcu();
+
if (ns->mnt_ns)
put_mnt_ns(ns->mnt_ns);
if (ns->uts_ns)
@@ -178,6 +190,14 @@ void free_nsproxy(struct nsproxy *ns)
kmem_cache_free(nsproxy_cachep, ns);
}
+static void free_nsproxy_work(struct work_struct *work)
+{
+ struct nsproxy *ns = container_of(work, struct nsproxy,
+ free_nsproxy_work);
+
+ free_nsproxy(ns);
+}
+
/*
* Called from unshare. Unshare all the namespaces part of nsproxy.
* On success, returns the new nsproxy.
@@ -215,16 +235,8 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
rcu_assign_pointer(p->nsproxy, new);
- if (ns && atomic_dec_and_test(&ns->count)) {
- /*
- * wait for others to get what they want from this nsproxy.
- *
- * cannot release this nsproxy via the call_rcu() since
- * put_mnt_ns() will want to sleep
- */
- synchronize_rcu();
- free_nsproxy(ns);
- }
+ if (ns && atomic_dec_and_test(&ns->count))
+ schedule_work(&ns->free_nsproxy_work);
}
void exit_task_namespaces(struct task_struct *p)
--
1.7.7.6
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] ns: move free_nsproxy() out of do_exit() path
2012-07-13 11:48 [PATCH] ns: move free_nsproxy() out of do_exit() path Kirill A. Shutemov
@ 2012-07-13 21:08 ` Andrew Morton
2012-07-13 22:23 ` Kirill A. Shutemov
2012-07-16 15:09 ` [PATCH v2] ns: do not block exit_task_namespaces() for a long time Kirill A. Shutemov
0 siblings, 2 replies; 11+ messages in thread
From: Andrew Morton @ 2012-07-13 21:08 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Serge Hallyn, KOSAKI Motohiro, Al Viro, Dmitry V. Levin,
Pavel Emelyanov, Kirill A. Shutemov, Doug Ledford, linux-kernel,
containers
On Fri, 13 Jul 2012 14:48:08 +0300
"Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> free_nsproxy() is too heavy to be on exit path. Let's free namespaces
> asynchronously to not block exit_group() syscall.
Please be specific, and complete.
Why is it "too heavy"? Where is the time being spent? Is it spent in
D state or is it spent burning CPU cycles? Does the patch simply
offload the work into kernel threads, providing no net gain?
> The patch also fixes bug with free namespace without synchronize_rcu() through
> put_nsproxy().
I just don't understand this description. Please send a new one which
includes all details about the bug, including a description of
the user-visible effects of the bug.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] ns: move free_nsproxy() out of do_exit() path
2012-07-13 21:08 ` Andrew Morton
@ 2012-07-13 22:23 ` Kirill A. Shutemov
2012-07-16 15:09 ` [PATCH v2] ns: do not block exit_task_namespaces() for a long time Kirill A. Shutemov
1 sibling, 0 replies; 11+ messages in thread
From: Kirill A. Shutemov @ 2012-07-13 22:23 UTC (permalink / raw)
To: Andrew Morton, Pavel Emelyanov
Cc: Kirill A. Shutemov, Serge Hallyn, KOSAKI Motohiro, Al Viro,
Dmitry V. Levin, Doug Ledford, linux-kernel, containers
On Fri, Jul 13, 2012 at 02:08:06PM -0700, Andrew Morton wrote:
> On Fri, 13 Jul 2012 14:48:08 +0300
> "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> wrote:
>
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >
> > free_nsproxy() is too heavy to be on exit path. Let's free namespaces
> > asynchronously to not block exit_group() syscall.
>
> Please be specific, and complete.
>
> Why is it "too heavy"? Where is the time being spent? Is it spent in
> D state or is it spent burning CPU cycles? Does the patch simply
> offload the work into kernel threads, providing no net gain?
Unpatched switch_task_namespaces() takes 0.010 - 0.011 seconds on my
machine. About 0.008 of the time is synchronize_rcu().
So it's mostly waiting with wait_for_completion() in wait_rcu_gp().
It means D state.
> > The patch also fixes bug with free namespace without synchronize_rcu() through
> > put_nsproxy().
>
> I just don't understand this description.
IIUC current locking model requires synchronize_rcu() before
free_nsproxy(). put_nsproxy() calls free_nsproxy() without
synchronize_rcu(). So it's racy.
I guess it was missed during switch to RCU (see cf7b708).
Pavel, am I right?
> Please send a new one which
> includes all details about the bug, including a description of
> the user-visible effects of the bug.
Okay, I will.
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 11+ messages in thread
* [PATCH v2] ns: do not block exit_task_namespaces() for a long time
2012-07-13 21:08 ` Andrew Morton
2012-07-13 22:23 ` Kirill A. Shutemov
@ 2012-07-16 15:09 ` Kirill A. Shutemov
2012-07-16 15:39 ` Myklebust, Trond
` (2 more replies)
1 sibling, 3 replies; 11+ messages in thread
From: Kirill A. Shutemov @ 2012-07-16 15:09 UTC (permalink / raw)
To: Andrew Morton, Pavel Emelyanov
Cc: Serge Hallyn, KOSAKI Motohiro, Al Viro, Dmitry V. Levin,
Kirill A. Shutemov, Doug Ledford, linux-kernel, containers,
Kirill A. Shutemov
From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
On exiting of the last task in a namespace we need to trigger freeing of
the namespace. Currently, we call synchronize_rcu() and free_nsproxy()
directly on do_exit() path.
On my machine synchronize_rcu() blocks for about 0.01 seconds. For
comparing: normal exit_group() syscall takes less than 0.0003 seconds.
Let's offload synchronize_rcu() and free_nsproxy() to a workqueue.
I also move synchronize_rcu() inside free_nsproxy(). It fixes racy
put_nsproxy() which calls free_nsproxy() without synchronize_rcu().
I guess it was missed during switch to RCU (see cf7b708).
Microbenchmark:
: #define _GNU_SOURCE
: #include <unistd.h>
: #include <sched.h>
: #include <stdlib.h>
: #include <sys/wait.h>
:
: int
: main(void)
: {
: int i;
: for (i = 0; i < 1024; i++) {
: if (fork()) {
: wait(NULL);
: continue;
: }
: unshare(CLONE_NEWIPC);
: exit(0);
: }
: return 0;
: }
Before the patch:
real 0m8.335s
user 0m0.000s
sys 0m0.265s
After:
real 0m0.569s
user 0m0.001s
sys 0m0.154s
Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
---
v2:
- Updated description.
---
include/linux/nsproxy.h | 1 +
kernel/nsproxy.c | 34 +++++++++++++++++++++++-----------
2 files changed, 24 insertions(+), 11 deletions(-)
diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
index cc37a55..1d26be7 100644
--- a/include/linux/nsproxy.h
+++ b/include/linux/nsproxy.h
@@ -24,6 +24,7 @@ struct fs_struct;
*/
struct nsproxy {
atomic_t count;
+ struct work_struct free_nsproxy_work;
struct uts_namespace *uts_ns;
struct ipc_namespace *ipc_ns;
struct mnt_namespace *mnt_ns;
diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
index b576f7f..ebc7d40 100644
--- a/kernel/nsproxy.c
+++ b/kernel/nsproxy.c
@@ -41,13 +41,17 @@ struct nsproxy init_nsproxy = {
#endif
};
+static void free_nsproxy_work(struct work_struct *work);
+
static inline struct nsproxy *create_nsproxy(void)
{
struct nsproxy *nsproxy;
nsproxy = kmem_cache_alloc(nsproxy_cachep, GFP_KERNEL);
- if (nsproxy)
+ if (nsproxy) {
atomic_set(&nsproxy->count, 1);
+ INIT_WORK(&nsproxy->free_nsproxy_work, free_nsproxy_work);
+ }
return nsproxy;
}
@@ -166,6 +170,14 @@ out:
void free_nsproxy(struct nsproxy *ns)
{
+ /*
+ * wait for others to get what they want from this nsproxy.
+ *
+ * cannot release this nsproxy via the call_rcu() since
+ * put_mnt_ns() will want to sleep
+ */
+ synchronize_rcu();
+
if (ns->mnt_ns)
put_mnt_ns(ns->mnt_ns);
if (ns->uts_ns)
@@ -178,6 +190,14 @@ void free_nsproxy(struct nsproxy *ns)
kmem_cache_free(nsproxy_cachep, ns);
}
+static void free_nsproxy_work(struct work_struct *work)
+{
+ struct nsproxy *ns = container_of(work, struct nsproxy,
+ free_nsproxy_work);
+
+ free_nsproxy(ns);
+}
+
/*
* Called from unshare. Unshare all the namespaces part of nsproxy.
* On success, returns the new nsproxy.
@@ -215,16 +235,8 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
rcu_assign_pointer(p->nsproxy, new);
- if (ns && atomic_dec_and_test(&ns->count)) {
- /*
- * wait for others to get what they want from this nsproxy.
- *
- * cannot release this nsproxy via the call_rcu() since
- * put_mnt_ns() will want to sleep
- */
- synchronize_rcu();
- free_nsproxy(ns);
- }
+ if (ns && atomic_dec_and_test(&ns->count))
+ schedule_work(&ns->free_nsproxy_work);
}
void exit_task_namespaces(struct task_struct *p)
--
1.7.7.6
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH v2] ns: do not block exit_task_namespaces() for a long time
2012-07-16 15:09 ` [PATCH v2] ns: do not block exit_task_namespaces() for a long time Kirill A. Shutemov
@ 2012-07-16 15:39 ` Myklebust, Trond
2012-07-16 16:39 ` Kirill A. Shutemov
2012-07-16 16:53 ` Al Viro
2012-07-16 21:05 ` Kirill A. Shutemov
2 siblings, 1 reply; 11+ messages in thread
From: Myklebust, Trond @ 2012-07-16 15:39 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Pavel Emelyanov, Serge Hallyn, KOSAKI Motohiro,
Al Viro, Dmitry V. Levin, Kirill A. Shutemov, Doug Ledford,
linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 4400 bytes --]
On Mon, 2012-07-16 at 18:09 +0300, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> On exiting of the last task in a namespace we need to trigger freeing of
> the namespace. Currently, we call synchronize_rcu() and free_nsproxy()
> directly on do_exit() path.
>
> On my machine synchronize_rcu() blocks for about 0.01 seconds. For
> comparing: normal exit_group() syscall takes less than 0.0003 seconds.
>
> Let's offload synchronize_rcu() and free_nsproxy() to a workqueue.
>
> I also move synchronize_rcu() inside free_nsproxy(). It fixes racy
> put_nsproxy() which calls free_nsproxy() without synchronize_rcu().
> I guess it was missed during switch to RCU (see cf7b708).
>
> Microbenchmark:
>
> : #define _GNU_SOURCE
> : #include <unistd.h>
> : #include <sched.h>
> : #include <stdlib.h>
> : #include <sys/wait.h>
> :
> : int
> : main(void)
> : {
> : int i;
> : for (i = 0; i < 1024; i++) {
> : if (fork()) {
> : wait(NULL);
> : continue;
> : }
> : unshare(CLONE_NEWIPC);
> : exit(0);
> : }
> : return 0;
> : }
>
> Before the patch:
>
> real 0m8.335s
> user 0m0.000s
> sys 0m0.265s
>
> After:
>
> real 0m0.569s
> user 0m0.001s
> sys 0m0.154s
>
> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> ---
>
> v2:
> - Updated description.
>
> ---
> include/linux/nsproxy.h | 1 +
> kernel/nsproxy.c | 34 +++++++++++++++++++++++-----------
> 2 files changed, 24 insertions(+), 11 deletions(-)
>
> diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
> index cc37a55..1d26be7 100644
> --- a/include/linux/nsproxy.h
> +++ b/include/linux/nsproxy.h
> @@ -24,6 +24,7 @@ struct fs_struct;
> */
> struct nsproxy {
> atomic_t count;
> + struct work_struct free_nsproxy_work;
> struct uts_namespace *uts_ns;
> struct ipc_namespace *ipc_ns;
> struct mnt_namespace *mnt_ns;
> diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> index b576f7f..ebc7d40 100644
> --- a/kernel/nsproxy.c
> +++ b/kernel/nsproxy.c
> @@ -41,13 +41,17 @@ struct nsproxy init_nsproxy = {
> #endif
> };
>
> +static void free_nsproxy_work(struct work_struct *work);
> +
> static inline struct nsproxy *create_nsproxy(void)
> {
> struct nsproxy *nsproxy;
>
> nsproxy = kmem_cache_alloc(nsproxy_cachep, GFP_KERNEL);
> - if (nsproxy)
> + if (nsproxy) {
> atomic_set(&nsproxy->count, 1);
> + INIT_WORK(&nsproxy->free_nsproxy_work, free_nsproxy_work);
> + }
> return nsproxy;
> }
>
> @@ -166,6 +170,14 @@ out:
>
> void free_nsproxy(struct nsproxy *ns)
> {
> + /*
> + * wait for others to get what they want from this nsproxy.
> + *
> + * cannot release this nsproxy via the call_rcu() since
> + * put_mnt_ns() will want to sleep
> + */
> + synchronize_rcu();
> +
> if (ns->mnt_ns)
> put_mnt_ns(ns->mnt_ns);
> if (ns->uts_ns)
> @@ -178,6 +190,14 @@ void free_nsproxy(struct nsproxy *ns)
> kmem_cache_free(nsproxy_cachep, ns);
> }
>
> +static void free_nsproxy_work(struct work_struct *work)
> +{
> + struct nsproxy *ns = container_of(work, struct nsproxy,
> + free_nsproxy_work);
> +
> + free_nsproxy(ns);
> +}
> +
> /*
> * Called from unshare. Unshare all the namespaces part of nsproxy.
> * On success, returns the new nsproxy.
> @@ -215,16 +235,8 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
>
> rcu_assign_pointer(p->nsproxy, new);
>
> - if (ns && atomic_dec_and_test(&ns->count)) {
> - /*
> - * wait for others to get what they want from this nsproxy.
> - *
> - * cannot release this nsproxy via the call_rcu() since
> - * put_mnt_ns() will want to sleep
> - */
> - synchronize_rcu();
> - free_nsproxy(ns);
> - }
> + if (ns && atomic_dec_and_test(&ns->count))
> + schedule_work(&ns->free_nsproxy_work);
What's wrong with using call_rcu()? The above will cause a workqueue
thread to block for no good reason.
> }
>
> void exit_task_namespaces(struct task_struct *p)
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] ns: do not block exit_task_namespaces() for a long time
2012-07-16 15:39 ` Myklebust, Trond
@ 2012-07-16 16:39 ` Kirill A. Shutemov
2012-07-16 16:53 ` Myklebust, Trond
0 siblings, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2012-07-16 16:39 UTC (permalink / raw)
To: Myklebust, Trond
Cc: Andrew Morton, Pavel Emelyanov, Serge Hallyn, KOSAKI Motohiro,
Al Viro, Dmitry V. Levin, Kirill A. Shutemov, Doug Ledford,
linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org
[-- Attachment #1: Type: text/plain, Size: 4787 bytes --]
On Mon, Jul 16, 2012 at 03:39:36PM +0000, Myklebust, Trond wrote:
> On Mon, 2012-07-16 at 18:09 +0300, Kirill A. Shutemov wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >
> > On exiting of the last task in a namespace we need to trigger freeing of
> > the namespace. Currently, we call synchronize_rcu() and free_nsproxy()
> > directly on do_exit() path.
> >
> > On my machine synchronize_rcu() blocks for about 0.01 seconds. For
> > comparing: normal exit_group() syscall takes less than 0.0003 seconds.
> >
> > Let's offload synchronize_rcu() and free_nsproxy() to a workqueue.
> >
> > I also move synchronize_rcu() inside free_nsproxy(). It fixes racy
> > put_nsproxy() which calls free_nsproxy() without synchronize_rcu().
> > I guess it was missed during switch to RCU (see cf7b708).
> >
> > Microbenchmark:
> >
> > : #define _GNU_SOURCE
> > : #include <unistd.h>
> > : #include <sched.h>
> > : #include <stdlib.h>
> > : #include <sys/wait.h>
> > :
> > : int
> > : main(void)
> > : {
> > : int i;
> > : for (i = 0; i < 1024; i++) {
> > : if (fork()) {
> > : wait(NULL);
> > : continue;
> > : }
> > : unshare(CLONE_NEWIPC);
> > : exit(0);
> > : }
> > : return 0;
> > : }
> >
> > Before the patch:
> >
> > real 0m8.335s
> > user 0m0.000s
> > sys 0m0.265s
> >
> > After:
> >
> > real 0m0.569s
> > user 0m0.001s
> > sys 0m0.154s
> >
> > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> > ---
> >
> > v2:
> > - Updated description.
> >
> > ---
> > include/linux/nsproxy.h | 1 +
> > kernel/nsproxy.c | 34 +++++++++++++++++++++++-----------
> > 2 files changed, 24 insertions(+), 11 deletions(-)
> >
> > diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
> > index cc37a55..1d26be7 100644
> > --- a/include/linux/nsproxy.h
> > +++ b/include/linux/nsproxy.h
> > @@ -24,6 +24,7 @@ struct fs_struct;
> > */
> > struct nsproxy {
> > atomic_t count;
> > + struct work_struct free_nsproxy_work;
> > struct uts_namespace *uts_ns;
> > struct ipc_namespace *ipc_ns;
> > struct mnt_namespace *mnt_ns;
> > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> > index b576f7f..ebc7d40 100644
> > --- a/kernel/nsproxy.c
> > +++ b/kernel/nsproxy.c
> > @@ -41,13 +41,17 @@ struct nsproxy init_nsproxy = {
> > #endif
> > };
> >
> > +static void free_nsproxy_work(struct work_struct *work);
> > +
> > static inline struct nsproxy *create_nsproxy(void)
> > {
> > struct nsproxy *nsproxy;
> >
> > nsproxy = kmem_cache_alloc(nsproxy_cachep, GFP_KERNEL);
> > - if (nsproxy)
> > + if (nsproxy) {
> > atomic_set(&nsproxy->count, 1);
> > + INIT_WORK(&nsproxy->free_nsproxy_work, free_nsproxy_work);
> > + }
> > return nsproxy;
> > }
> >
> > @@ -166,6 +170,14 @@ out:
> >
> > void free_nsproxy(struct nsproxy *ns)
> > {
> > + /*
> > + * wait for others to get what they want from this nsproxy.
> > + *
> > + * cannot release this nsproxy via the call_rcu() since
> > + * put_mnt_ns() will want to sleep
> > + */
> > + synchronize_rcu();
> > +
> > if (ns->mnt_ns)
> > put_mnt_ns(ns->mnt_ns);
> > if (ns->uts_ns)
> > @@ -178,6 +190,14 @@ void free_nsproxy(struct nsproxy *ns)
> > kmem_cache_free(nsproxy_cachep, ns);
> > }
> >
> > +static void free_nsproxy_work(struct work_struct *work)
> > +{
> > + struct nsproxy *ns = container_of(work, struct nsproxy,
> > + free_nsproxy_work);
> > +
> > + free_nsproxy(ns);
> > +}
> > +
> > /*
> > * Called from unshare. Unshare all the namespaces part of nsproxy.
> > * On success, returns the new nsproxy.
> > @@ -215,16 +235,8 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
> >
> > rcu_assign_pointer(p->nsproxy, new);
> >
> > - if (ns && atomic_dec_and_test(&ns->count)) {
> > - /*
> > - * wait for others to get what they want from this nsproxy.
> > - *
> > - * cannot release this nsproxy via the call_rcu() since
> > - * put_mnt_ns() will want to sleep
> > - */
> > - synchronize_rcu();
> > - free_nsproxy(ns);
> > - }
> > + if (ns && atomic_dec_and_test(&ns->count))
> > + schedule_work(&ns->free_nsproxy_work);
>
> What's wrong with using call_rcu()? The above will cause a workqueue
> thread to block for no good reason.
See comment to synchronize_rcu(). free_nsproxy() might sleep.
call_rcu() callback invocation might happen from either softirq or process
context, so we can't use it.
--
Kirill A. Shutemov
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] ns: do not block exit_task_namespaces() for a long time
2012-07-16 16:39 ` Kirill A. Shutemov
@ 2012-07-16 16:53 ` Myklebust, Trond
0 siblings, 0 replies; 11+ messages in thread
From: Myklebust, Trond @ 2012-07-16 16:53 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Pavel Emelyanov, Serge Hallyn, KOSAKI Motohiro,
Al Viro, Dmitry V. Levin, Kirill A. Shutemov, Doug Ledford,
linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain; charset="utf-8", Size: 5376 bytes --]
On Mon, 2012-07-16 at 19:39 +0300, Kirill A. Shutemov wrote:
> On Mon, Jul 16, 2012 at 03:39:36PM +0000, Myklebust, Trond wrote:
> > On Mon, 2012-07-16 at 18:09 +0300, Kirill A. Shutemov wrote:
> > > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > >
> > > On exiting of the last task in a namespace we need to trigger freeing of
> > > the namespace. Currently, we call synchronize_rcu() and free_nsproxy()
> > > directly on do_exit() path.
> > >
> > > On my machine synchronize_rcu() blocks for about 0.01 seconds. For
> > > comparing: normal exit_group() syscall takes less than 0.0003 seconds.
> > >
> > > Let's offload synchronize_rcu() and free_nsproxy() to a workqueue.
> > >
> > > I also move synchronize_rcu() inside free_nsproxy(). It fixes racy
> > > put_nsproxy() which calls free_nsproxy() without synchronize_rcu().
> > > I guess it was missed during switch to RCU (see cf7b708).
> > >
> > > Microbenchmark:
> > >
> > > : #define _GNU_SOURCE
> > > : #include <unistd.h>
> > > : #include <sched.h>
> > > : #include <stdlib.h>
> > > : #include <sys/wait.h>
> > > :
> > > : int
> > > : main(void)
> > > : {
> > > : int i;
> > > : for (i = 0; i < 1024; i++) {
> > > : if (fork()) {
> > > : wait(NULL);
> > > : continue;
> > > : }
> > > : unshare(CLONE_NEWIPC);
> > > : exit(0);
> > > : }
> > > : return 0;
> > > : }
> > >
> > > Before the patch:
> > >
> > > real 0m8.335s
> > > user 0m0.000s
> > > sys 0m0.265s
> > >
> > > After:
> > >
> > > real 0m0.569s
> > > user 0m0.001s
> > > sys 0m0.154s
> > >
> > > Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
> > > Acked-by: Serge E. Hallyn <serge.hallyn@ubuntu.com>
> > > ---
> > >
> > > v2:
> > > - Updated description.
> > >
> > > ---
> > > include/linux/nsproxy.h | 1 +
> > > kernel/nsproxy.c | 34 +++++++++++++++++++++++-----------
> > > 2 files changed, 24 insertions(+), 11 deletions(-)
> > >
> > > diff --git a/include/linux/nsproxy.h b/include/linux/nsproxy.h
> > > index cc37a55..1d26be7 100644
> > > --- a/include/linux/nsproxy.h
> > > +++ b/include/linux/nsproxy.h
> > > @@ -24,6 +24,7 @@ struct fs_struct;
> > > */
> > > struct nsproxy {
> > > atomic_t count;
> > > + struct work_struct free_nsproxy_work;
> > > struct uts_namespace *uts_ns;
> > > struct ipc_namespace *ipc_ns;
> > > struct mnt_namespace *mnt_ns;
> > > diff --git a/kernel/nsproxy.c b/kernel/nsproxy.c
> > > index b576f7f..ebc7d40 100644
> > > --- a/kernel/nsproxy.c
> > > +++ b/kernel/nsproxy.c
> > > @@ -41,13 +41,17 @@ struct nsproxy init_nsproxy = {
> > > #endif
> > > };
> > >
> > > +static void free_nsproxy_work(struct work_struct *work);
> > > +
> > > static inline struct nsproxy *create_nsproxy(void)
> > > {
> > > struct nsproxy *nsproxy;
> > >
> > > nsproxy = kmem_cache_alloc(nsproxy_cachep, GFP_KERNEL);
> > > - if (nsproxy)
> > > + if (nsproxy) {
> > > atomic_set(&nsproxy->count, 1);
> > > + INIT_WORK(&nsproxy->free_nsproxy_work, free_nsproxy_work);
> > > + }
> > > return nsproxy;
> > > }
> > >
> > > @@ -166,6 +170,14 @@ out:
> > >
> > > void free_nsproxy(struct nsproxy *ns)
> > > {
> > > + /*
> > > + * wait for others to get what they want from this nsproxy.
> > > + *
> > > + * cannot release this nsproxy via the call_rcu() since
> > > + * put_mnt_ns() will want to sleep
> > > + */
> > > + synchronize_rcu();
> > > +
> > > if (ns->mnt_ns)
> > > put_mnt_ns(ns->mnt_ns);
> > > if (ns->uts_ns)
> > > @@ -178,6 +190,14 @@ void free_nsproxy(struct nsproxy *ns)
> > > kmem_cache_free(nsproxy_cachep, ns);
> > > }
> > >
> > > +static void free_nsproxy_work(struct work_struct *work)
> > > +{
> > > + struct nsproxy *ns = container_of(work, struct nsproxy,
> > > + free_nsproxy_work);
> > > +
> > > + free_nsproxy(ns);
> > > +}
> > > +
> > > /*
> > > * Called from unshare. Unshare all the namespaces part of nsproxy.
> > > * On success, returns the new nsproxy.
> > > @@ -215,16 +235,8 @@ void switch_task_namespaces(struct task_struct *p, struct nsproxy *new)
> > >
> > > rcu_assign_pointer(p->nsproxy, new);
> > >
> > > - if (ns && atomic_dec_and_test(&ns->count)) {
> > > - /*
> > > - * wait for others to get what they want from this nsproxy.
> > > - *
> > > - * cannot release this nsproxy via the call_rcu() since
> > > - * put_mnt_ns() will want to sleep
> > > - */
> > > - synchronize_rcu();
> > > - free_nsproxy(ns);
> > > - }
> > > + if (ns && atomic_dec_and_test(&ns->count))
> > > + schedule_work(&ns->free_nsproxy_work);
> >
> > What's wrong with using call_rcu()? The above will cause a workqueue
> > thread to block for no good reason.
>
> See comment to synchronize_rcu(). free_nsproxy() might sleep.
> call_rcu() callback invocation might happen from either softirq or process
> context, so we can't use it.
But call_rcu() should be allowed to call schedule_work(). At least you'd
be able to get rid of the 0.01s synchronize_rcu() sleep inside keventd.
--
Trond Myklebust
Linux NFS client maintainer
NetApp
Trond.Myklebust@netapp.com
www.netapp.com
ÿôèº{.nÇ+·®+%Ëÿ±éݶ\x17¥wÿº{.nÇ+·¥{±þG«éÿ{ayº\x1dÊÚë,j\a¢f£¢·hïêÿêçz_è®\x03(éÝ¢j"ú\x1a¶^[m§ÿÿ¾\a«þG«éÿ¢¸?¨èÚ&£ø§~á¶iOæ¬z·vØ^\x14\x04\x1a¶^[m§ÿÿÃ\fÿ¶ìÿ¢¸?I¥
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] ns: do not block exit_task_namespaces() for a long time
2012-07-16 15:09 ` [PATCH v2] ns: do not block exit_task_namespaces() for a long time Kirill A. Shutemov
2012-07-16 15:39 ` Myklebust, Trond
@ 2012-07-16 16:53 ` Al Viro
2012-07-16 17:16 ` Kirill A. Shutemov
2012-07-16 21:05 ` Kirill A. Shutemov
2 siblings, 1 reply; 11+ messages in thread
From: Al Viro @ 2012-07-16 16:53 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Andrew Morton, Pavel Emelyanov, Serge Hallyn, KOSAKI Motohiro,
Dmitry V. Levin, Kirill A. Shutemov, Doug Ledford, linux-kernel,
containers
On Mon, Jul 16, 2012 at 06:09:24PM +0300, Kirill A. Shutemov wrote:
> From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
>
> On exiting of the last task in a namespace we need to trigger freeing of
> the namespace. Currently, we call synchronize_rcu() and free_nsproxy()
> directly on do_exit() path.
>
> On my machine synchronize_rcu() blocks for about 0.01 seconds. For
> comparing: normal exit_group() syscall takes less than 0.0003 seconds.
>
> Let's offload synchronize_rcu() and free_nsproxy() to a workqueue.
>
> I also move synchronize_rcu() inside free_nsproxy(). It fixes racy
> put_nsproxy() which calls free_nsproxy() without synchronize_rcu().
> I guess it was missed during switch to RCU (see cf7b708).
NAK. Making final umounts of anything in that namespace asynchronous,
even though nothing is holding the stuff on them busy is simply
wrong. Note that they can take a _long_ time, so we are talking about
minutes worth of delay in the worst case. It's user-visible and
it's a serious potential for trouble.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] ns: do not block exit_task_namespaces() for a long time
2012-07-16 16:53 ` Al Viro
@ 2012-07-16 17:16 ` Kirill A. Shutemov
2012-07-16 17:22 ` Al Viro
0 siblings, 1 reply; 11+ messages in thread
From: Kirill A. Shutemov @ 2012-07-16 17:16 UTC (permalink / raw)
To: Al Viro
Cc: Kirill A. Shutemov, Andrew Morton, Pavel Emelyanov, Serge Hallyn,
KOSAKI Motohiro, Dmitry V. Levin, Doug Ledford, linux-kernel,
containers
On Mon, Jul 16, 2012 at 05:53:01PM +0100, Al Viro wrote:
> On Mon, Jul 16, 2012 at 06:09:24PM +0300, Kirill A. Shutemov wrote:
> > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> >
> > On exiting of the last task in a namespace we need to trigger freeing of
> > the namespace. Currently, we call synchronize_rcu() and free_nsproxy()
> > directly on do_exit() path.
> >
> > On my machine synchronize_rcu() blocks for about 0.01 seconds. For
> > comparing: normal exit_group() syscall takes less than 0.0003 seconds.
> >
> > Let's offload synchronize_rcu() and free_nsproxy() to a workqueue.
> >
> > I also move synchronize_rcu() inside free_nsproxy(). It fixes racy
> > put_nsproxy() which calls free_nsproxy() without synchronize_rcu().
> > I guess it was missed during switch to RCU (see cf7b708).
>
> NAK. Making final umounts of anything in that namespace asynchronous,
> even though nothing is holding the stuff on them busy is simply
> wrong. Note that they can take a _long_ time, so we are talking about
> minutes worth of delay in the worst case. It's user-visible and
> it's a serious potential for trouble.
Good point.
Now in worst case we have a process which hang for a few minutes in
exit_group() syscall in D state, right? Why is that any better?
Does it provide better user experience or better accounting or what?
--
Kirill A. Shutemov
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] ns: do not block exit_task_namespaces() for a long time
2012-07-16 17:16 ` Kirill A. Shutemov
@ 2012-07-16 17:22 ` Al Viro
0 siblings, 0 replies; 11+ messages in thread
From: Al Viro @ 2012-07-16 17:22 UTC (permalink / raw)
To: Kirill A. Shutemov
Cc: Kirill A. Shutemov, Andrew Morton, Pavel Emelyanov, Serge Hallyn,
KOSAKI Motohiro, Dmitry V. Levin, Doug Ledford, linux-kernel,
containers
On Mon, Jul 16, 2012 at 08:16:34PM +0300, Kirill A. Shutemov wrote:
> On Mon, Jul 16, 2012 at 05:53:01PM +0100, Al Viro wrote:
> > On Mon, Jul 16, 2012 at 06:09:24PM +0300, Kirill A. Shutemov wrote:
> > > From: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
> > >
> > > On exiting of the last task in a namespace we need to trigger freeing of
> > > the namespace. Currently, we call synchronize_rcu() and free_nsproxy()
> > > directly on do_exit() path.
> > >
> > > On my machine synchronize_rcu() blocks for about 0.01 seconds. For
> > > comparing: normal exit_group() syscall takes less than 0.0003 seconds.
> > >
> > > Let's offload synchronize_rcu() and free_nsproxy() to a workqueue.
> > >
> > > I also move synchronize_rcu() inside free_nsproxy(). It fixes racy
> > > put_nsproxy() which calls free_nsproxy() without synchronize_rcu().
> > > I guess it was missed during switch to RCU (see cf7b708).
> >
> > NAK. Making final umounts of anything in that namespace asynchronous,
> > even though nothing is holding the stuff on them busy is simply
> > wrong. Note that they can take a _long_ time, so we are talking about
> > minutes worth of delay in the worst case. It's user-visible and
> > it's a serious potential for trouble.
>
> Good point.
>
> Now in worst case we have a process which hang for a few minutes in
> exit_group() syscall in D state, right? Why is that any better?
> Does it provide better user experience or better accounting or what?
"Session that was using that USB stick has still not finished exiting;
might be still busy writing stuff there, so better not pull it out yet".
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH v2] ns: do not block exit_task_namespaces() for a long time
2012-07-16 15:09 ` [PATCH v2] ns: do not block exit_task_namespaces() for a long time Kirill A. Shutemov
2012-07-16 15:39 ` Myklebust, Trond
2012-07-16 16:53 ` Al Viro
@ 2012-07-16 21:05 ` Kirill A. Shutemov
2 siblings, 0 replies; 11+ messages in thread
From: Kirill A. Shutemov @ 2012-07-16 21:05 UTC (permalink / raw)
To: Andrew Morton, Pavel Emelyanov
Cc: Serge Hallyn, KOSAKI Motohiro, Al Viro, Dmitry V. Levin,
Kirill A. Shutemov, Doug Ledford, linux-kernel, containers
[-- Attachment #1: Type: text/plain, Size: 404 bytes --]
On Mon, Jul 16, 2012 at 06:09:24PM +0300, Kirill A. Shutemov wrote:
> I also move synchronize_rcu() inside free_nsproxy(). It fixes racy
> put_nsproxy() which calls free_nsproxy() without synchronize_rcu().
> I guess it was missed during switch to RCU (see cf7b708).
I was wrong here. No races. RCU protects task->ns_proxy, not ns_proxy
itself.
Sorry for misleading.
--
Kirill A. Shutemov
[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 836 bytes --]
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2012-07-16 21:03 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-07-13 11:48 [PATCH] ns: move free_nsproxy() out of do_exit() path Kirill A. Shutemov
2012-07-13 21:08 ` Andrew Morton
2012-07-13 22:23 ` Kirill A. Shutemov
2012-07-16 15:09 ` [PATCH v2] ns: do not block exit_task_namespaces() for a long time Kirill A. Shutemov
2012-07-16 15:39 ` Myklebust, Trond
2012-07-16 16:39 ` Kirill A. Shutemov
2012-07-16 16:53 ` Myklebust, Trond
2012-07-16 16:53 ` Al Viro
2012-07-16 17:16 ` Kirill A. Shutemov
2012-07-16 17:22 ` Al Viro
2012-07-16 21:05 ` Kirill A. Shutemov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox