linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 1/3] cgroup: fix mount failure in a corner case
@ 2014-06-27  7:10 Li Zefan
  2014-06-27  7:10 ` [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() Li Zefan
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Li Zefan @ 2014-06-27  7:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

  # cat test.sh
  #! /bin/bash

  mount -t cgroup -o cpu xxx /cgroup
  umount /cgroup

  mount -t cgroup -o cpu,cpuacct xxx /cgroup
  umount /cgroup
  # ./test.sh
  mount: xxx already mounted or /mnt busy
  mount: according to mtab, xxx is already mounted on /mnt

It's because the cgroupfs_root of the first mount was under destruction
asynchronously.

Fix this by delaying and then retrying mount for this case.

v2:
- use percpu_ref_tryget_live() rather that introducing
  percpu_ref_alive(). (Tejun)
- adjust comment.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 27 +++++++++++++++++++++++++++
 1 file changed, 27 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1c65f24..ae2b382 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1648,10 +1648,12 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 			 int flags, const char *unused_dev_name,
 			 void *data)
 {
+	struct cgroup_subsys *ss;
 	struct cgroup_root *root;
 	struct cgroup_sb_opts opts;
 	struct dentry *dentry;
 	int ret;
+	int i, j = -1;
 	bool new_sb;
 
 	/*
@@ -1677,6 +1679,23 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		goto out_unlock;
 	}
 
+	/*
+	 * Destruction of cgroup root is asynchronous, so we may fail to
+	 * mount a cgroupfs if it immediately follows a umount. Let's wait
+	 * a little bit and retry.
+	 */
+	for_each_subsys(ss, i) {
+		if (!(opts.subsys_mask & (1 << i)))
+			continue;
+		if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
+			mutex_unlock(&cgroup_mutex);
+			msleep(10);
+			ret = restart_syscall

^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb()
  2014-06-27  7:10 [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan
@ 2014-06-27  7:10 ` Li Zefan
  2014-06-27 15:01   ` Tejun Heo
  2014-06-27  7:11 ` [PATCH v2 3/3] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() Li Zefan
  2014-06-27  8:16 ` [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan
  2 siblings, 1 reply; 9+ messages in thread
From: Li Zefan @ 2014-06-27  7:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Greg Kroah-Hartman

kernfs_pin_sb() tries to get a refcnt of the superblock.

This will be used by cgroupfs.

v2:
- make kernfs_pin_sb() return pointer to the superblock.
- drop kernfs_drop_sb().

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 fs/kernfs/mount.c      | 27 +++++++++++++++++++++++++++
 include/linux/kernfs.h |  1 +
 2 files changed, 28 insertions(+)

diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c
index f25a7c0..616c5c4 100644
--- a/fs/kernfs/mount.c
+++ b/fs/kernfs/mount.c
@@ -210,6 +210,33 @@ void kernfs_kill_sb(struct super_block *sb)
 	kernfs_put(root_kn);
 }
 
+/**
+ * kernfs_pin_sb: try to pin the superblock associated with a kernfs_root
+ * @kernfs_root: the kernfs_root in question
+ * @ns: the namespace tag
+ *
+ * Pin the superblock so the superblock won't be destroyed in subsequent
+ * operations. Return NULL if there's no superblock associated to this
+ * kernfs_root, or -EINVAL if the superblock is being freed.
+ */
+struct super_block *kernfs_pin_sb(struct kernfs_root *root, const void *ns)
+{
+	struct kernfs_super_info *info;
+	struct super_block *sb = NULL;
+
+	mutex_lock(&kernfs_mutex);
+	list_for_each_entry(info, &root->supers, node) {
+		if (info->ns == ns) {
+			sb = info->sb;
+			if (!atomic_inc_not_zero(&info->sb->s_active))
+				sb = ERR_PTR(-EINVAL);
+			break;
+		}
+	}
+	mutex_unlock(&kernfs_mutex);
+	return sb;
+}
+
 void __init kernfs_init(void)
 {
 	kernfs_node_cache = kmem_cache_create("kernfs_node_cache",
diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h
index 589318b..9096296 100644
--- a/include/linux/kernfs.h
+++ b/include/linux/kernfs.h
@@ -287,6 +287,7 @@ struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags,
 			       struct kernfs_root *root, bool *new_sb_created,
 			       const void *ns);
 void kernfs_kill_sb(struct super_block *sb);
+struct super_block *kernfs_pin_sb(struct kernfs_root *root, const void *ns);
 
 void kernfs_init(void);
 
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH v2 3/3] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb()
  2014-06-27  7:10 [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan
  2014-06-27  7:10 ` [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() Li Zefan
@ 2014-06-27  7:11 ` Li Zefan
  2014-06-27  8:16 ` [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan
  2 siblings, 0 replies; 9+ messages in thread
From: Li Zefan @ 2014-06-27  7:11 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups, Greg Kroah-Hartman

We've converted cgroup to kernfs so cgroup won't be intertwined with
vfs objects and locking, but there are dark areas.

Run two instances of this script concurrently:

    for ((; ;))
    {
    	mount -t cgroup -o cpuacct xxx /cgroup
    	umount /cgroup
    }

After a while, I saw two mount processes were stuck at retrying, because
they were waiting for a subsystem to become free, but the root associated
with this subsystem never got freed.

This can happen, if thread A is in the process of killing superblock but
hasn't called percpu_ref_kill(), and at this time thread B is mounting
the same cgroup root and finds the root in the root list and performs
percpu_ref_try_get().

To fix this, we try to increase both the refcnt of the superblock and the
percpu refcnt of cgroup root.

v2:
- we should try to get both the superblock refcnt and cgroup_root refcnt,
  because cgroup_root may have no superblock assosiated with it.
- adjust/add comments.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 28 ++++++++++++++++++++++------
 1 file changed, 22 insertions(+), 6 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index ae2b382..111b7c3 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1655,6 +1655,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 	int ret;
 	int i, j = -1;
 	bool new_sb;
+	struct super_block *sb = NULL;
 
 	/*
 	 * The first time anyone tries to mount a cgroup, enable the list
@@ -1737,14 +1738,18 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 
 		/*
 		 * A root's lifetime is governed by its root cgroup.
-		 * tryget_live failure indicate that the root is being
-		 * destroyed.  Wait for destruction to complete so that the
-		 * subsystems are free.  We can use wait_queue for the wait
-		 * but this path is super cold.  Let's just sleep for a bit
-		 * and retry.
+		 * pin_sb and tryget_live failure indicate that the root is
+		 * being destroyed.  Wait for destruction to complete so that
+		 * the subsystems are free.  We can use wait_queue for the
+		 * wait but this path is super cold.  Let's just sleep for
+		 * a bit and retry.
 		 */
-		if (!percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
+		sb = kernfs_pin_sb(root->kf_root, NULL);
+		if (IS_ERR(sb) ||
+		    !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) {
 			mutex_unlock(&cgroup_mutex);
+			if (!IS_ERR_OR_NULL(sb))
+				deactivate_super(sb);
 			msleep(10);
 			ret = restart_syscall();
 			goto out_free;
@@ -1796,6 +1801,17 @@ out_free:
 	dentry = kernfs_mount(fs_type, flags, root->kf_root, &new_sb);
 	if (IS_ERR(dentry) || !new_sb)
 		cgroup_put(&root->cgrp);
+
+	if (sb) {
+		/*
+		 * On success kernfs_mount() returns with sb->s_umount held,
+		 * but kernfs_mount() also increases the superblock's refcnt,
+		 * so calling deactivate_super() to drop the refcnt we got when
+		 * looking up cgroup root won't acquire sb->s_umount again.
+		 */
+		WARN_ON(new_sb);
+		deactivate_super(sb);
+	}
 	return dentry;
 }
 
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] cgroup: fix mount failure in a corner case
  2014-06-27  7:10 [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan
  2014-06-27  7:10 ` [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() Li Zefan
  2014-06-27  7:11 ` [PATCH v2 3/3] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() Li Zefan
@ 2014-06-27  8:16 ` Li Zefan
  2014-06-27  9:13   ` Li Zefan
  2 siblings, 1 reply; 9+ messages in thread
From: Li Zefan @ 2014-06-27  8:16 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

Oh sorry the cut&paste was incomplete. Here's the complete one:

================

From: Li Zefan <lizefan@huawei.com>
Date: Thu, 12 Jun 2014 09:11:00 +0800
Subject: [PATCH v2 1/3] cgroup: fix mount failure in a corner case

  # cat test.sh
  #! /bin/bash

  mount -t cgroup -o cpu xxx /cgroup
  umount /cgroup

  mount -t cgroup -o cpu,cpuacct xxx /cgroup
  umount /cgroup
  # ./test.sh
  mount: xxx already mounted or /cgroup busy
  mount: according to mtab, xxx is already mounted on /cgroup

It's because the cgroupfs_root of the first mount was under destruction
asynchronously.

Fix this by delaying and then retrying mount for this case.

v2:
- use percpu_ref_tryget_live() rather that introducing
  percpu_ref_alive(). (Tejun)
- adjust comment.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1c65f24..6826227 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1648,10 +1648,12 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 			 int flags, const char *unused_dev_name,
 			 void *data)
 {
+	struct cgroup_subsys *ss;
 	struct cgroup_root *root;
 	struct cgroup_sb_opts opts;
 	struct dentry *dentry;
 	int ret;
+	int i, j = -1;
 	bool new_sb;
 
 	/*
@@ -1677,6 +1679,25 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		goto out_unlock;
 	}
 
+	/*
+	 * Destruction of cgroup root is asynchronous, so we may fail to
+	 * mount a cgroupfs if it immediately follows a umount. Let's wait
+	 * a little bit and retry.
+	 */
+	for_each_subsys(ss, i) {
+		if (!(opts.subsys_mask & (1 << i)) ||
+		    ss->root === &cgrp_dfl_root)
+			continue;
+
+		if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
+			mutex_unlock(&cgroup_mutex);
+			msleep(10);
+			ret = restart_syscall();
+			goto out_free;
+		}
+		j = i;
+	}
+
 	for_each_root(root) {
 		bool name_match = false;
 
@@ -1763,6 +1784,14 @@ out_free:
 	kfree(opts.release_agent);
 	kfree(opts.name);
 
+	for_each_subsys(ss, i) {
+		if (i > j)
+			break;
+		if (!(opts.subsys_mask & (1 << i)))
+			continue;
+		cgroup_put(&ss->root->cgrp);
+	}
+
 	if (ret)
 		return ERR_PTR(ret);
 
-- 
1.8.0.2



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] cgroup: fix mount failure in a corner case
  2014-06-27  8:16 ` [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan
@ 2014-06-27  9:13   ` Li Zefan
  2014-06-28 11:58     ` Tejun Heo
  0 siblings, 1 reply; 9+ messages in thread
From: Li Zefan @ 2014-06-27  9:13 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

Made a mistake again.. :(


==============

From: Li Zefan <lizefan@huawei.com>
Subject: [PATCH 1/3] cgroup: fix mount failure in a corner case

  # cat test.sh
  #! /bin/bash

  mount -t cgroup -o cpu xxx /cgroup
  umount /cgroup

  mount -t cgroup -o cpu,cpuacct xxx /cgroup
  umount /cgroup
  # ./test.sh
  mount: xxx already mounted or /cgroup busy
  mount: according to mtab, xxx is already mounted on /cgroup

It's because the cgroupfs_root of the first mount was under destruction
asynchronously.

Fix this by delaying and then retrying mount for this case.

v2:
- use percpu_ref_tryget_live() rather that introducing
  percpu_ref_alive(). (Tejun)
- adjust comment.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1c65f24..b94449f 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1648,10 +1648,12 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 			 int flags, const char *unused_dev_name,
 			 void *data)
 {
+	struct cgroup_subsys *ss;
 	struct cgroup_root *root;
 	struct cgroup_sb_opts opts;
 	struct dentry *dentry;
 	int ret;
+	int i, j = -1;
 	bool new_sb;
 
 	/*
@@ -1677,6 +1679,25 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
 		goto out_unlock;
 	}
 
+	/*
+	 * Destruction of cgroup root is asynchronous, so we may fail to
+	 * mount a cgroupfs if it immediately follows a umount. Let's wait
+	 * a little bit and retry.
+	 */
+	for_each_subsys(ss, i) {
+		if (!(opts.subsys_mask & (1 << i)) ||
+		    ss->root == &cgrp_dfl_root)
+			continue;
+
+		if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
+			mutex_unlock(&cgroup_mutex);
+			msleep(10);
+			ret = restart_syscall();
+			goto out_free;
+		}
+		j = i;
+	}
+
 	for_each_root(root) {
 		bool name_match = false;
 
@@ -1763,6 +1784,14 @@ out_free:
 	kfree(opts.release_agent);
 	kfree(opts.name);
 
+	for_each_subsys(ss, i) {
+		if (i > j)
+			break;
+		if (!(opts.subsys_mask & (1 << i)))
+			continue;
+		cgroup_put(&ss->root->cgrp);
+	}
+
 	if (ret)
 		return ERR_PTR(ret);
 
-- 
1.8.0.2


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb()
  2014-06-27  7:10 ` [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() Li Zefan
@ 2014-06-27 15:01   ` Tejun Heo
  2014-06-27 17:48     ` Greg Kroah-Hartman
  0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2014-06-27 15:01 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups, Greg Kroah-Hartman

On Fri, Jun 27, 2014 at 03:10:48PM +0800, Li Zefan wrote:
> kernfs_pin_sb() tries to get a refcnt of the superblock.
> 
> This will be used by cgroupfs.

Greg, this is pretty much cgroup specific due to the way cgroup
dynamically manages multiple hierarchies.  Can I route this through
cgroup/for-3.16-fixes w/ stable cc'd?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb()
  2014-06-27 15:01   ` Tejun Heo
@ 2014-06-27 17:48     ` Greg Kroah-Hartman
  0 siblings, 0 replies; 9+ messages in thread
From: Greg Kroah-Hartman @ 2014-06-27 17:48 UTC (permalink / raw)
  To: Tejun Heo; +Cc: Li Zefan, LKML, Cgroups

On Fri, Jun 27, 2014 at 11:01:41AM -0400, Tejun Heo wrote:
> On Fri, Jun 27, 2014 at 03:10:48PM +0800, Li Zefan wrote:
> > kernfs_pin_sb() tries to get a refcnt of the superblock.
> > 
> > This will be used by cgroupfs.
> 
> Greg, this is pretty much cgroup specific due to the way cgroup
> dynamically manages multiple hierarchies.  Can I route this through
> cgroup/for-3.16-fixes w/ stable cc'd?

Please do:

Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] cgroup: fix mount failure in a corner case
  2014-06-27  9:13   ` Li Zefan
@ 2014-06-28 11:58     ` Tejun Heo
  2014-06-30  1:41       ` Li Zefan
  0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2014-06-28 11:58 UTC (permalink / raw)
  To: Li Zefan; +Cc: LKML, Cgroups

Hello, Li.

On Fri, Jun 27, 2014 at 05:13:12PM +0800, Li Zefan wrote:
> +	for_each_subsys(ss, i) {
> +		if (!(opts.subsys_mask & (1 << i)) ||
> +		    ss->root == &cgrp_dfl_root)
> +			continue;
> +
> +		if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
> +			mutex_unlock(&cgroup_mutex);
> +			msleep(10);
> +			ret = restart_syscall();
> +			goto out_free;
> +		}

Why not just put it immediately?  We know that it's not gonna be
destroyed while holding cgroup_mutex.  It may look a bit weird but
this is a pretty special case anyway and deferring put doesn't buy
anything.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH v2 1/3] cgroup: fix mount failure in a corner case
  2014-06-28 11:58     ` Tejun Heo
@ 2014-06-30  1:41       ` Li Zefan
  0 siblings, 0 replies; 9+ messages in thread
From: Li Zefan @ 2014-06-30  1:41 UTC (permalink / raw)
  To: Tejun Heo; +Cc: LKML, Cgroups

On 2014/6/28 19:58, Tejun Heo wrote:
> Hello, Li.
> 
> On Fri, Jun 27, 2014 at 05:13:12PM +0800, Li Zefan wrote:
>> +	for_each_subsys(ss, i) {
>> +		if (!(opts.subsys_mask & (1 << i)) ||
>> +		    ss->root == &cgrp_dfl_root)
>> +			continue;
>> +
>> +		if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
>> +			mutex_unlock(&cgroup_mutex);
>> +			msleep(10);
>> +			ret = restart_syscall();
>> +			goto out_free;
>> +		}
> 
> Why not just put it immediately?  We know that it's not gonna be
> destroyed while holding cgroup_mutex.  It may look a bit weird but
> this is a pretty special case anyway and deferring put doesn't buy
> anything.
> 

Yeah, this is better. :)


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2014-06-30  1:41 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-27  7:10 [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan
2014-06-27  7:10 ` [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() Li Zefan
2014-06-27 15:01   ` Tejun Heo
2014-06-27 17:48     ` Greg Kroah-Hartman
2014-06-27  7:11 ` [PATCH v2 3/3] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() Li Zefan
2014-06-27  8:16 ` [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan
2014-06-27  9:13   ` Li Zefan
2014-06-28 11:58     ` Tejun Heo
2014-06-30  1:41       ` Li Zefan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).