[PATCH 1/5] cgroup: fix broken css_has_online

cgroups.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* [PATCH 1/5] cgroup: fix broken css_has_online_children()
@ 2014-06-12  6:31 Li Zefan
       [not found] ` <53994943.60703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
                   ` (2 more replies)
  0 siblings, 3 replies; 15+ messages in thread
From: Li Zefan @ 2014-06-12  6:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

After running:

  # mount -t cgroup cpu xxx /cgroup && mkdir /cgroup/sub && \
    rmdir /cgroup/sub && umount /cgroup

I found the cgroup root still existed:

  # cat /proc/cgroups
  #subsys_name    hierarchy       num_cgroups     enabled
  cpuset  0       1       1
  cpu     1       1       1
  ...

It turned out css_has_online_children() is broken.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 kernel/cgroup.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 05b8ca4..1c65f24 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -3327,7 +3327,7 @@ bool css_has_online_children(struct cgroup_subsys_state *css)
 
 	rcu_read_lock();
 	css_for_each_child(child, css) {
-		if (css->flags & CSS_ONLINE) {
+		if (child->flags & CSS_ONLINE) {
 			ret = true;
 			break;
 		}
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

[parent not found: <53994943.60703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]

* [PATCH 2/5] percpu-ref: introduce percpu_ref_alive()
       [not found] ` <53994943.60703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2014-06-12  6:31   ` Li Zefan
  2014-06-12  6:33   ` [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() Li Zefan
  2014-06-17 19:26   ` [PATCH 1/5] cgroup: fix broken css_has_online_children() Tejun Heo
  2 siblings, 0 replies; 15+ messages in thread
From: Li Zefan @ 2014-06-12  6:31 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

This is used to check if the percpu_ref has been killed.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
 include/linux/percpu-refcount.h | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/include/linux/percpu-refcount.h b/include/linux/percpu-refcount.h
index dba35c4..1d5f2b3 100644
--- a/include/linux/percpu-refcount.h
+++ b/include/linux/percpu-refcount.h
@@ -96,6 +96,17 @@ static inline void percpu_ref_kill(struct percpu_ref *ref)
 #define REF_STATUS(count)	(((unsigned long) count) & PCPU_STATUS_MASK)
 
 /**
+ * percpu_ref_alive - check if the ref has been killed
+ * @ref: percpu_ref to check
+ *
+ * Return true if percpu_ref_kill() has been called to drop the initial ref.
+ */
+static inline bool percpu_ref_alive(struct percpu_ref *ref)
+{
+	return !(REF_STATUS(ref->pcpu_count) == PCPU_REF_DEAD);
+}
+
+/**
  * percpu_ref_get - increment a percpu refcount
  * @ref: percpu_ref to get
  *
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
       [not found] ` <53994943.60703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2014-06-12  6:31   ` [PATCH 2/5] percpu-ref: introduce percpu_ref_alive() Li Zefan
@ 2014-06-12  6:33   ` Li Zefan
       [not found]     ` <539949A1.90301-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2014-06-17 19:26   ` [PATCH 1/5] cgroup: fix broken css_has_online_children() Tejun Heo
  2 siblings, 1 reply; 15+ messages in thread
From: Li Zefan @ 2014-06-12  6:33 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

We've converted cgroup to kernfs so cgroup won't be intertwined with
vfs objects and locking, but there are dark areas.

Run two instances of this script concurrently:

    for ((; ;))
    {
    	mount -t cgroup -o cpuacct xxx /cgroup
    	umount /cgroup
    }

After a while, I saw two mount processes were stuck at retrying, because
they were waiting for a subsystem to become free, but the root associated
with this subsystem never got freed.

This can happen, if thread A is in the process of killing superblock but
hasn't called percpu_ref_kill(), and at this time thread B is mounting
the same cgroup root and finds the root in the root list and performs
percpu_ref_try_get().

To fix this, we increase the refcnt of the superblock instead of increasing
the percpu refcnt of cgroup root.

Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---

A better fix is welcome!

---
 kernel/cgroup.c | 24 ++++++++++++++----------
 1 file changed, 14 insertions(+), 10 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index bd37e8d..94e1814 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1654,7 +1654,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	struct dentry *dentry;
 	int ret;
 	int i;
-	bool new_sb;
+	bool sb_pinned = false;
 
 	/*
 	 * The first time anyone tries to mount a cgroup, enable the list
@@ -1735,19 +1735,21 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		}
 
 		/*
-		 * A root's lifetime is governed by its root cgroup.
-		 * tryget_live failure indicate that the root is being
-		 * destroyed.  Wait for destruction to complete so that the
-		 * subsystems are free.  We can use wait_queue for the wait
-		 * but this path is super cold.  Let's just sleep for a bit
-		 * and retry.
+		 * This may fail for two reasons:
+		 * - A concurrent mount is in process. We wait for that mount
+		     to complete.
+		 * - The superblock is being destroyed. We wait for the
+		 *   desctruction to complete so that the subsystems are free.
+		 * We can use wait_queue for the wait but this path is super
+		 * cold.  Let's just sleep for a bit and retry.
 		 */
-		if (!percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
+		if (!kernfs_pin_sb(root->kf_root, NULL)) {
 			mutex_unlock(&cgroup_mutex);
 			msleep(10);
 			ret = restart_syscall();
 			goto out_free;
 		}
+		sb_pinned = true;
 
 		ret = 0;
 		goto out_unlock;
@@ -1784,8 +1786,10 @@ out_free:
 	if (ret)
 		return ERR_PTR(ret);
 
-	dentry = kernfs_mount(fs_type, flags, root->kf_root, &new_sb);
-	if (IS_ERR(dentry) || !new_sb)
+	dentry = kernfs_mount(fs_type, flags, root->kf_root, NULL);
+	if (sb_pinned)
+		kernfs_drop_sb(root->kf_root, NULL);
+	if (!sb_pinned && IS_ERR(dentry))
 		cgroup_put(&root->cgrp);
 	return dentry;
 }
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

[parent not found: <539949A1.90301-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
       [not found]     ` <539949A1.90301-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2014-06-20 19:35       ` Tejun Heo
       [not found]         ` <20140620193521.GB28324-9pTldWuhBndy/B6EtB590w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2014-06-20 19:35 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups

Hello, Li.

Sorry about the long delay.

On Thu, Jun 12, 2014 at 02:33:05PM +0800, Li Zefan wrote:
> We've converted cgroup to kernfs so cgroup won't be intertwined with
> vfs objects and locking, but there are dark areas.
> 
> Run two instances of this script concurrently:
> 
>     for ((; ;))
>     {
>     	mount -t cgroup -o cpuacct xxx /cgroup
>     	umount /cgroup
>     }
> 
> After a while, I saw two mount processes were stuck at retrying, because
> they were waiting for a subsystem to become free, but the root associated
> with this subsystem never got freed.
> 
> This can happen, if thread A is in the process of killing superblock but
> hasn't called percpu_ref_kill(), and at this time thread B is mounting
> the same cgroup root and finds the root in the root list and performs
> percpu_ref_try_get().
> 
> To fix this, we increase the refcnt of the superblock instead of increasing
> the percpu refcnt of cgroup root.

Ah, right.  Gees, I'm really hating the fact that we have ->mount but
not ->umount.  However, can't we make it a bit simpler by just
introducing a mutex protecting looking up and refing up an existing
root and a sb going away?  The only problem is that the refcnt being
killed isn't atomic w.r.t. new live ref coming up, right?  Why not
just add a mutex around them so that they can't race?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <20140620193521.GB28324-9pTldWuhBndy/B6EtB590w@public.gmane.org>]

* Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
       [not found]         ` <20140620193521.GB28324-9pTldWuhBndy/B6EtB590w@public.gmane.org>
@ 2014-06-24  1:22           ` Li Zefan
       [not found]             ` <53A8D2B8.4080107-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Li Zefan @ 2014-06-24  1:22 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

On 2014/6/21 3:35, Tejun Heo wrote:
> Hello, Li.
> 
> Sorry about the long delay.
> 
> On Thu, Jun 12, 2014 at 02:33:05PM +0800, Li Zefan wrote:
>> We've converted cgroup to kernfs so cgroup won't be intertwined with
>> vfs objects and locking, but there are dark areas.
>>
>> Run two instances of this script concurrently:
>>
>>     for ((; ;))
>>     {
>>     	mount -t cgroup -o cpuacct xxx /cgroup
>>     	umount /cgroup
>>     }
>>
>> After a while, I saw two mount processes were stuck at retrying, because
>> they were waiting for a subsystem to become free, but the root associated
>> with this subsystem never got freed.
>>
>> This can happen, if thread A is in the process of killing superblock but
>> hasn't called percpu_ref_kill(), and at this time thread B is mounting
>> the same cgroup root and finds the root in the root list and performs
>> percpu_ref_try_get().
>>
>> To fix this, we increase the refcnt of the superblock instead of increasing
>> the percpu refcnt of cgroup root.
> 
> Ah, right.  Gees, I'm really hating the fact that we have ->mount but
> not ->umount.  However, can't we make it a bit simpler by just
> introducing a mutex protecting looking up and refing up an existing
> root and a sb going away?  The only problem is that the refcnt being
> killed isn't atomic w.r.t. new live ref coming up, right?  Why not
> just add a mutex around them so that they can't race?
> 

Well, kill_sb() is called with sb->s_umount held, while kernfs_mount()
returned with sb->s_umount held, so adding a mutex will lead to ABBA
deadlock.

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <53A8D2B8.4080107-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
       [not found]             ` <53A8D2B8.4080107-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2014-06-24 21:01               ` Tejun Heo
       [not found]                 ` <20140624210119.GC14909-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2014-06-24 21:01 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups

Hello, Li.

On Tue, Jun 24, 2014 at 09:22:00AM +0800, Li Zefan wrote:
> > Ah, right.  Gees, I'm really hating the fact that we have ->mount but
> > not ->umount.  However, can't we make it a bit simpler by just
> > introducing a mutex protecting looking up and refing up an existing
> > root and a sb going away?  The only problem is that the refcnt being
> > killed isn't atomic w.r.t. new live ref coming up, right?  Why not
> > just add a mutex around them so that they can't race?
> 
> Well, kill_sb() is called with sb->s_umount held, while kernfs_mount()
> returned with sb->s_umount held, so adding a mutex will lead to ABBA
> deadlock.

Hmmm?  Why does that matter?  The only region in cgroup_mount() which
needs to be put inside such mutex would be root lookup, no?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <20140624210119.GC14909-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]

* Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
       [not found]                 ` <20140624210119.GC14909-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2014-06-25  1:56                   ` Li Zefan
  2014-06-25 15:00                     ` Tejun Heo
  0 siblings, 1 reply; 15+ messages in thread
From: Li Zefan @ 2014-06-25  1:56 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

On 2014/6/25 5:01, Tejun Heo wrote:
> Hello, Li.
> 
> On Tue, Jun 24, 2014 at 09:22:00AM +0800, Li Zefan wrote:
>>> Ah, right.  Gees, I'm really hating the fact that we have ->mount but
>>> not ->umount.  However, can't we make it a bit simpler by just
>>> introducing a mutex protecting looking up and refing up an existing
>>> root and a sb going away?  The only problem is that the refcnt being
>>> killed isn't atomic w.r.t. new live ref coming up, right?  Why not
>>> just add a mutex around them so that they can't race?
>>
>> Well, kill_sb() is called with sb->s_umount held, while kernfs_mount()
>> returned with sb->s_umount held, so adding a mutex will lead to ABBA
>> deadlock.
> 
> Hmmm?  Why does that matter?  The only region in cgroup_mount() which
> needs to be put inside such mutex would be root lookup, no?
> 

unfortunately that won't help. I think what you suggest is:

cgroup_mount()
{
	mutex_lock();
	lookup_cgroup_root();
	mutex_unlock();
	kernfs_mount();
}

cgroup_kill_sb()
{
	mutex_lock();
	percpu_ref_kill();
	mutex_Unlock();
	kernfs_kill_sb();
}

See, we may still be destroying the superblock after we've succeeded
in getting the refcnt of cgroup root.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
  2014-06-25  1:56                   ` Li Zefan
@ 2014-06-25 15:00                     ` Tejun Heo
       [not found]                       ` <20140625150053.GE26883-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2014-06-25 15:00 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups

Hey,

On Wed, Jun 25, 2014 at 09:56:31AM +0800, Li Zefan wrote:
> > Hmmm?  Why does that matter?  The only region in cgroup_mount() which
> > needs to be put inside such mutex would be root lookup, no?
> 
> unfortunately that won't help. I think what you suggest is:
> 
> cgroup_mount()
> {
> 	mutex_lock();
> 	lookup_cgroup_root();
> 	mutex_unlock();
> 	kernfs_mount();
> }
> 
> cgroup_kill_sb()
> {
> 	mutex_lock();
> 	percpu_ref_kill();
> 	mutex_Unlock();
> 	kernfs_kill_sb();
> }
> 
> See, we may still be destroying the superblock after we've succeeded
> in getting the refcnt of cgroup root.

Sure, but now the decision to kill is synchronized so the other side
can interlock with it.  e.g.

cgroup_mount()
{
	mutex_lock();
	lookup_cgroup_root();
	if (root isn't killed yet)
		root->this_better_stay_alive++;
	mutex_unlock();
	kernfs_mount();
}

cgroup_kill_sb()
{
	mutex_lock();
	if (check whether root can be killed)
		percpu_ref_kill();
	mutex_unlock();
	if (the above condition was true)
		kernfs_kill_sb();
}

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <20140625150053.GE26883-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]

* Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
       [not found]                       ` <20140625150053.GE26883-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2014-06-27  6:32                         ` Li Zefan
       [not found]                           ` <53AD1001.4090405-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  0 siblings, 1 reply; 15+ messages in thread
From: Li Zefan @ 2014-06-27  6:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

On 2014/6/25 23:00, Tejun Heo wrote:
> Hey,
> 
> On Wed, Jun 25, 2014 at 09:56:31AM +0800, Li Zefan wrote:
>>> Hmmm?  Why does that matter?  The only region in cgroup_mount() which
>>> needs to be put inside such mutex would be root lookup, no?
>>
>> unfortunately that won't help. I think what you suggest is:
>>
>> cgroup_mount()
>> {
>> 	mutex_lock();
>> 	lookup_cgroup_root();
>> 	mutex_unlock();
>> 	kernfs_mount();
>> }
>>
>> cgroup_kill_sb()
>> {
>> 	mutex_lock();
>> 	percpu_ref_kill();
>> 	mutex_Unlock();
>> 	kernfs_kill_sb();
>> }
>>
>> See, we may still be destroying the superblock after we've succeeded
>> in getting the refcnt of cgroup root.
> 
> Sure, but now the decision to kill is synchronized so the other side
> can interlock with it.  e.g.
> 
> cgroup_mount()
> {
> 	mutex_lock();
> 	lookup_cgroup_root();
> 	if (root isn't killed yet)
> 		root->this_better_stay_alive++;
> 	mutex_unlock();
> 	kernfs_mount();
> }
> 
> cgroup_kill_sb()
> {
> 	mutex_lock();
> 	if (check whether root can be killed)
> 		percpu_ref_kill();
> 	mutex_unlock();
> 	if (the above condition was true)
> 		kernfs_kill_sb();
> }
> 

This looks nasty, and I don't think it's correct. If we skip the call
to kernfs_kill_sb(), kernfs_super_info won't be freed but super_block
will, so we will end up either leaking memory or accessing invalid
memory. There are other problems like returning with sb->s_umount still
held.

^ permalink raw reply	[flat|nested] 15+ messages in thread

[parent not found: <53AD1001.4090405-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
       [not found]                           ` <53AD1001.4090405-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2014-06-27 15:00                             ` Tejun Heo
  0 siblings, 0 replies; 15+ messages in thread
From: Tejun Heo @ 2014-06-27 15:00 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups

On Fri, Jun 27, 2014 at 02:32:33PM +0800, Li Zefan wrote:
> > cgroup_mount()
> > {
> > 	mutex_lock();
> > 	lookup_cgroup_root();
> > 	if (root isn't killed yet)
> > 		root->this_better_stay_alive++;
> > 	mutex_unlock();
> > 	kernfs_mount();
> > }
> > 
> > cgroup_kill_sb()
> > {
> > 	mutex_lock();
> > 	if (check whether root can be killed)
> > 		percpu_ref_kill();
> > 	mutex_unlock();
> > 	if (the above condition was true)
> > 		kernfs_kill_sb();
> > }
> > 
> 
> This looks nasty, and I don't think it's correct. If we skip the call
> to kernfs_kill_sb(), kernfs_super_info won't be freed but super_block
> will, so we will end up either leaking memory or accessing invalid
> memory. There are other problems like returning with sb->s_umount still
> held.

Yeah, right, the conditional shouldn't be on kernfs_kill_sb().  It
should only be on percpu_ref_kill().  kernfs mount code will wait out
the dead sb and create a new one; however, this is still not feasible
because we don't have a place to dec ->this_better_stay_alive as
there's no umount callback.  :(

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 1/5] cgroup: fix broken css_has_online_children()
       [not found] ` <53994943.60703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2014-06-12  6:31   ` [PATCH 2/5] percpu-ref: introduce percpu_ref_alive() Li Zefan
  2014-06-12  6:33   ` [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() Li Zefan
@ 2014-06-17 19:26   ` Tejun Heo
  2 siblings, 0 replies; 15+ messages in thread
From: Tejun Heo @ 2014-06-17 19:26 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups

On Thu, Jun 12, 2014 at 02:31:31PM +0800, Li Zefan wrote:
> After running:
> 
>   # mount -t cgroup cpu xxx /cgroup && mkdir /cgroup/sub && \
>     rmdir /cgroup/sub && umount /cgroup
> 
> I found the cgroup root still existed:
> 
>   # cat /proc/cgroups
>   #subsys_name    hierarchy       num_cgroups     enabled
>   cpuset  0       1       1
>   cpu     1       1       1
>   ...
> 
> It turned out css_has_online_children() is broken.
> 
> Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>

Applied to cgroup/for-3.16-fixes.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 3/5] cgroup: fix mount failure in a corner case
  2014-06-12  6:31 [PATCH 1/5] cgroup: fix broken css_has_online_children() Li Zefan
       [not found] ` <53994943.60703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2014-06-12  6:32 ` Li Zefan
       [not found]   ` <5399496D.6060003-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2014-06-12  6:32 ` [PATCH 4/5] kernfs: introduce kernfs_pin_sb() and kernfs_drop_sb() Li Zefan
  2 siblings, 1 reply; 15+ messages in thread
From: Li Zefan @ 2014-06-12  6:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

  # cat test.sh
  #! /bin/bash

  mount -t cgroup -o cpu xxx /cgroup
  umount /cgroup

  mount -t cgroup -o cpu,cpuacct xxx /cgroup
  umount /cgroup
  # ./test.sh
  mount: xxx already mounted or /cgroup busy
  mount: according to mtab, xxx is already mounted on /cgroup

It's because the cgroupfs_root of the first mount was under destruction
asynchronously.

Fix this by delaying and then retrying mount in this case.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1c65f24..bd37e8d 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1648,10 +1648,12 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 			 int flags, const char *unused_dev_name,
 			 void *data)
 {
+	struct cgroup_subsys *ss;
 	struct cgroup_root *root;
 	struct cgroup_sb_opts opts;
 	struct dentry *dentry;
 	int ret;
+	int i;
 	bool new_sb;
 
 	/*
@@ -1677,6 +1679,22 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		goto out_unlock;
 	}
 
+	/*
+	 * If some subsystems have been bound to existing cgroup hierarchies,
+	 * but those hierachies are being destroyed, let's wait a little bit
+	 * and retry.
+	 */
+	for_each_subsys(ss, i) {
+		if (!(opts.subsys_mask & (1 << i)))
+			continue;
+		if (!percpu_ref_alive(&ss->root->cgrp.self.refcnt)) {
+			mutex_unlock(&cgroup_mutex);
+			msleep(10);
+			ret = restart_syscall();
+			goto out_free;
+		}
+	}
+
 	for_each_root(root) {
 		bool name_match = false;
 
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

[parent not found: <5399496D.6060003-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]

* Re: [PATCH 3/5] cgroup: fix mount failure in a corner case
       [not found]   ` <5399496D.6060003-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
@ 2014-06-20 19:10     ` Tejun Heo
  2014-06-24  1:15       ` Li Zefan
  0 siblings, 1 reply; 15+ messages in thread
From: Tejun Heo @ 2014-06-20 19:10 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups

On Thu, Jun 12, 2014 at 02:32:13PM +0800, Li Zefan wrote:
> @@ -1677,6 +1679,22 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
>  		goto out_unlock;
>  	}
>  
> +	/*
> +	 * If some subsystems have been bound to existing cgroup hierarchies,
> +	 * but those hierachies are being destroyed, let's wait a little bit
> +	 * and retry.
> +	 */
> +	for_each_subsys(ss, i) {
> +		if (!(opts.subsys_mask & (1 << i)))
> +			continue;
> +		if (!percpu_ref_alive(&ss->root->cgrp.self.refcnt)) {

Can't we just do tryget_live() instead and then put before retrying?
It's not exactly a hot path and the operations are dirt cheap anyway.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH 3/5] cgroup: fix mount failure in a corner case
  2014-06-20 19:10     ` Tejun Heo
@ 2014-06-24  1:15       ` Li Zefan
  0 siblings, 0 replies; 15+ messages in thread
From: Li Zefan @ 2014-06-24  1:15 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

On 2014/6/21 3:10, Tejun Heo wrote:
> On Thu, Jun 12, 2014 at 02:32:13PM +0800, Li Zefan wrote:
>> @@ -1677,6 +1679,22 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
>>  		goto out_unlock;
>>  	}
>>  
>> +	/*
>> +	 * If some subsystems have been bound to existing cgroup hierarchies,
>> +	 * but those hierachies are being destroyed, let's wait a little bit
>> +	 * and retry.
>> +	 */
>> +	for_each_subsys(ss, i) {
>> +		if (!(opts.subsys_mask & (1 << i)))
>> +			continue;
>> +		if (!percpu_ref_alive(&ss->root->cgrp.self.refcnt)) {
> 
> Can't we just do tryget_live() instead and then put before retrying?
> It's not exactly a hot path and the operations are dirt cheap anyway.
> 

No much difference, though would be a bit more code. I can do that.

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [PATCH 4/5] kernfs: introduce kernfs_pin_sb() and kernfs_drop_sb()
  2014-06-12  6:31 [PATCH 1/5] cgroup: fix broken css_has_online_children() Li Zefan
       [not found] ` <53994943.60703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
  2014-06-12  6:32 ` [PATCH 3/5] cgroup: fix mount failure in a corner case Li Zefan
@ 2014-06-12  6:32 ` Li Zefan
  2 siblings, 0 replies; 15+ messages in thread
From: Li Zefan @ 2014-06-12  6:32 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups


kernfs_pin_sb() tries to get a refcnt of the superblock, while
kernfs_drop_sb() drops this refcnt.

This will be used by cgroupfs.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 fs/kernfs/mount.c      | 45 +++++++++++++++++++++++++++++++++++++++++++++
 include/linux/kernfs.h |  3 +++
 2 files changed, 48 insertions(+)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index f25a7c0..4f924e0 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -210,6 +210,51 @@ void kernfs_kill_sb(struct super_block *sb)
 	kernfs_put(root_kn);
 }
 
+/**
+ * kernfs_pin_sb: try to pin the superblock associated with a kernfs_root
+ * @kernfs_root: the kernfs_root in question
+ * @ns: the namespace tag
+ *
+ * Pin the superblock so the superblock won't be destroyed in subsequent
+ * operations.
+ */
+bool kernfs_pin_sb(struct kernfs_root *root, const void *ns)
+{
+	struct kernfs_super_info *info;
+	int ret = false;
+
+	mutex_lock(&kernfs_mutex);
+	list_for_each_entry(info, &root->supers, node) {
+		if (info->ns == ns) {
+			ret = atomic_inc_not_zero(&info->sb->s_active);
+			break;
+		}
+	}
+	mutex_unlock(&kernfs_mutex);
+	return ret;
+}
+
+/**
+ * kernfs_drop_sb: drop the refcnt that we got by kernfs_pin_sb()
+ * @root: the kernfs_root in question
+ * @ns: the namespace tag
+ *
+ * This must be paired with kernfs_pin_sb(). It will require sb->u_mount
+ * if the refcnt reaches zero.
+ */
+void kernfs_drop_sb(struct kernfs_root *root, const void *ns)
+{
+	struct kernfs_super_info *info;
+
+	mutex_lock(&kernfs_mutex);
+	list_for_each_entry(info, &root->supers, node) {
+		if (info->ns == ns)
+			break;
+	}
+	mutex_unlock(&kernfs_mutex);
+	deactivate_super(info->sb);
+}
+
 void __init kernfs_init(void)
 {
 	kernfs_node_cache = kmem_cache_create("kernfs_node_cache",
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 589318b..1958017 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -288,6 +288,9 @@ struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
 			       const void *ns);
 void kernfs_kill_sb(struct super_block *sb);
 
+bool kernfs_pin_sb(struct kernfs_root *root, const void *ns);
+void kernfs_drop_sb(struct kernfs_root *root, const void *ns);
+
 void kernfs_init(void);
 
 #else	/* CONFIG_KERNFS */
-- 
1.8.0.2

^ permalink raw reply related	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2014-06-27 15:00 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-12  6:31 [PATCH 1/5] cgroup: fix broken css_has_online_children() Li Zefan
     [not found] ` <53994943.60703-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-06-12  6:31   ` [PATCH 2/5] percpu-ref: introduce percpu_ref_alive() Li Zefan
2014-06-12  6:33   ` [PATCH 5/5] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() Li Zefan
     [not found]     ` <539949A1.90301-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-06-20 19:35       ` Tejun Heo
     [not found]         ` <20140620193521.GB28324-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2014-06-24  1:22           ` Li Zefan
     [not found]             ` <53A8D2B8.4080107-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-06-24 21:01               ` Tejun Heo
     [not found]                 ` <20140624210119.GC14909-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-06-25  1:56                   ` Li Zefan
2014-06-25 15:00                     ` Tejun Heo
     [not found]                       ` <20140625150053.GE26883-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-06-27  6:32                         ` Li Zefan
     [not found]                           ` <53AD1001.4090405-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-06-27 15:00                             ` Tejun Heo
2014-06-17 19:26   ` [PATCH 1/5] cgroup: fix broken css_has_online_children() Tejun Heo
2014-06-12  6:32 ` [PATCH 3/5] cgroup: fix mount failure in a corner case Li Zefan
     [not found]   ` <5399496D.6060003-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-06-20 19:10     ` Tejun Heo
2014-06-24  1:15       ` Li Zefan
2014-06-12  6:32 ` [PATCH 4/5] kernfs: introduce kernfs_pin_sb() and kernfs_drop_sb() Li Zefan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).