* [PATCH v2 1/3] cgroup: fix mount failure in a corner case
@ 2014-06-27 7:10 Li Zefan
2014-06-27 7:10 ` [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() Li Zefan
` (2 more replies)
0 siblings, 3 replies; 9+ messages in thread
From: Li Zefan @ 2014-06-27 7:10 UTC (permalink / raw)
To: Tejun Heo; +Cc: LKML, Cgroups
# cat test.sh
#! /bin/bash
mount -t cgroup -o cpu xxx /cgroup
umount /cgroup
mount -t cgroup -o cpu,cpuacct xxx /cgroup
umount /cgroup
# ./test.sh
mount: xxx already mounted or /mnt busy
mount: according to mtab, xxx is already mounted on /mnt
It's because the cgroupfs_root of the first mount was under destruction
asynchronously.
Fix this by delaying and then retrying mount for this case.
v2:
- use percpu_ref_tryget_live() rather that introducing
percpu_ref_alive(). (Tejun)
- adjust comment.
Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
---
kernel/cgroup.c | 27 +++++++++++++++++++++++++++
1 file changed, 27 insertions(+)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 1c65f24..ae2b382 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1648,10 +1648,12 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
int flags, const char *unused_dev_name,
void *data)
{
+ struct cgroup_subsys *ss;
struct cgroup_root *root;
struct cgroup_sb_opts opts;
struct dentry *dentry;
int ret;
+ int i, j = -1;
bool new_sb;
/*
@@ -1677,6 +1679,23 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type,
goto out_unlock;
}
+ /*
+ * Destruction of cgroup root is asynchronous, so we may fail to
+ * mount a cgroupfs if it immediately follows a umount. Let's wait
+ * a little bit and retry.
+ */
+ for_each_subsys(ss, i) {
+ if (!(opts.subsys_mask & (1 << i)))
+ continue;
+ if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) {
+ mutex_unlock(&cgroup_mutex);
+ msleep(10);
+ ret = restart_syscall
^ permalink raw reply related [flat|nested] 9+ messages in thread* [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() 2014-06-27 7:10 [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan @ 2014-06-27 7:10 ` Li Zefan [not found] ` <53AD18F8.4050501-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> [not found] ` <53AD18D0.3090100-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> 2014-06-27 8:16 ` [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan 2 siblings, 1 reply; 9+ messages in thread From: Li Zefan @ 2014-06-27 7:10 UTC (permalink / raw) To: Tejun Heo; +Cc: LKML, Cgroups, Greg Kroah-Hartman kernfs_pin_sb() tries to get a refcnt of the superblock. This will be used by cgroupfs. v2: - make kernfs_pin_sb() return pointer to the superblock. - drop kernfs_drop_sb(). Signed-off-by: Li Zefan <lizefan@huawei.com> --- fs/kernfs/mount.c | 27 +++++++++++++++++++++++++++ include/linux/kernfs.h | 1 + 2 files changed, 28 insertions(+) diff --git a/fs/kernfs/mount.c b/fs/kernfs/mount.c index f25a7c0..616c5c4 100644 --- a/fs/kernfs/mount.c +++ b/fs/kernfs/mount.c @@ -210,6 +210,33 @@ void kernfs_kill_sb(struct super_block *sb) kernfs_put(root_kn); } +/** + * kernfs_pin_sb: try to pin the superblock associated with a kernfs_root + * @kernfs_root: the kernfs_root in question + * @ns: the namespace tag + * + * Pin the superblock so the superblock won't be destroyed in subsequent + * operations. Return NULL if there's no superblock associated to this + * kernfs_root, or -EINVAL if the superblock is being freed. + */ +struct super_block *kernfs_pin_sb(struct kernfs_root *root, const void *ns) +{ + struct kernfs_super_info *info; + struct super_block *sb = NULL; + + mutex_lock(&kernfs_mutex); + list_for_each_entry(info, &root->supers, node) { + if (info->ns == ns) { + sb = info->sb; + if (!atomic_inc_not_zero(&info->sb->s_active)) + sb = ERR_PTR(-EINVAL); + break; + } + } + mutex_unlock(&kernfs_mutex); + return sb; +} + void __init kernfs_init(void) { kernfs_node_cache = kmem_cache_create("kernfs_node_cache", diff --git a/include/linux/kernfs.h b/include/linux/kernfs.h index 589318b..9096296 100644 --- a/include/linux/kernfs.h +++ b/include/linux/kernfs.h @@ -287,6 +287,7 @@ struct dentry *kernfs_mount_ns(struct file_system_type *fs_type, int flags, struct kernfs_root *root, bool *new_sb_created, const void *ns); void kernfs_kill_sb(struct super_block *sb); +struct super_block *kernfs_pin_sb(struct kernfs_root *root, const void *ns); void kernfs_init(void); -- 1.8.0.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
[parent not found: <53AD18F8.4050501-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() [not found] ` <53AD18F8.4050501-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> @ 2014-06-27 15:01 ` Tejun Heo [not found] ` <20140627150141.GB4044-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 0 siblings, 1 reply; 9+ messages in thread From: Tejun Heo @ 2014-06-27 15:01 UTC (permalink / raw) To: Li Zefan; +Cc: LKML, Cgroups, Greg Kroah-Hartman On Fri, Jun 27, 2014 at 03:10:48PM +0800, Li Zefan wrote: > kernfs_pin_sb() tries to get a refcnt of the superblock. > > This will be used by cgroupfs. Greg, this is pretty much cgroup specific due to the way cgroup dynamically manages multiple hierarchies. Can I route this through cgroup/for-3.16-fixes w/ stable cc'd? Thanks. -- tejun ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <20140627150141.GB4044-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]
* Re: [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() [not found] ` <20140627150141.GB4044-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> @ 2014-06-27 17:48 ` Greg Kroah-Hartman 0 siblings, 0 replies; 9+ messages in thread From: Greg Kroah-Hartman @ 2014-06-27 17:48 UTC (permalink / raw) To: Tejun Heo; +Cc: Li Zefan, LKML, Cgroups On Fri, Jun 27, 2014 at 11:01:41AM -0400, Tejun Heo wrote: > On Fri, Jun 27, 2014 at 03:10:48PM +0800, Li Zefan wrote: > > kernfs_pin_sb() tries to get a refcnt of the superblock. > > > > This will be used by cgroupfs. > > Greg, this is pretty much cgroup specific due to the way cgroup > dynamically manages multiple hierarchies. Can I route this through > cgroup/for-3.16-fixes w/ stable cc'd? Please do: Acked-by: Greg Kroah-Hartman <gregkh-hQyY1W1yCW8ekmWlsbkhG0B+6BGkLq7r@public.gmane.org> ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <53AD18D0.3090100-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]
* [PATCH v2 3/3] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() [not found] ` <53AD18D0.3090100-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> @ 2014-06-27 7:11 ` Li Zefan 0 siblings, 0 replies; 9+ messages in thread From: Li Zefan @ 2014-06-27 7:11 UTC (permalink / raw) To: Tejun Heo; +Cc: LKML, Cgroups, Greg Kroah-Hartman We've converted cgroup to kernfs so cgroup won't be intertwined with vfs objects and locking, but there are dark areas. Run two instances of this script concurrently: for ((; ;)) { mount -t cgroup -o cpuacct xxx /cgroup umount /cgroup } After a while, I saw two mount processes were stuck at retrying, because they were waiting for a subsystem to become free, but the root associated with this subsystem never got freed. This can happen, if thread A is in the process of killing superblock but hasn't called percpu_ref_kill(), and at this time thread B is mounting the same cgroup root and finds the root in the root list and performs percpu_ref_try_get(). To fix this, we try to increase both the refcnt of the superblock and the percpu refcnt of cgroup root. v2: - we should try to get both the superblock refcnt and cgroup_root refcnt, because cgroup_root may have no superblock assosiated with it. - adjust/add comments. Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> --- kernel/cgroup.c | 28 ++++++++++++++++++++++------ 1 file changed, 22 insertions(+), 6 deletions(-) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index ae2b382..111b7c3 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1655,6 +1655,7 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, int ret; int i, j = -1; bool new_sb; + struct super_block *sb = NULL; /* * The first time anyone tries to mount a cgroup, enable the list @@ -1737,14 +1738,18 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, /* * A root's lifetime is governed by its root cgroup. - * tryget_live failure indicate that the root is being - * destroyed. Wait for destruction to complete so that the - * subsystems are free. We can use wait_queue for the wait - * but this path is super cold. Let's just sleep for a bit - * and retry. + * pin_sb and tryget_live failure indicate that the root is + * being destroyed. Wait for destruction to complete so that + * the subsystems are free. We can use wait_queue for the + * wait but this path is super cold. Let's just sleep for + * a bit and retry. */ - if (!percpu_ref_tryget_live(&root->cgrp.self.refcnt)) { + sb = kernfs_pin_sb(root->kf_root, NULL); + if (IS_ERR(sb) || + !percpu_ref_tryget_live(&root->cgrp.self.refcnt)) { mutex_unlock(&cgroup_mutex); + if (!IS_ERR_OR_NULL(sb)) + deactivate_super(sb); msleep(10); ret = restart_syscall(); goto out_free; @@ -1796,6 +1801,17 @@ out_free: dentry = kernfs_mount(fs_type, flags, root->kf_root, &new_sb); if (IS_ERR(dentry) || !new_sb) cgroup_put(&root->cgrp); + + if (sb) { + /* + * On success kernfs_mount() returns with sb->s_umount held, + * but kernfs_mount() also increases the superblock's refcnt, + * so calling deactivate_super() to drop the refcnt we got when + * looking up cgroup root won't acquire sb->s_umount again. + */ + WARN_ON(new_sb); + deactivate_super(sb); + } return dentry; } -- 1.8.0.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH v2 1/3] cgroup: fix mount failure in a corner case 2014-06-27 7:10 [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan 2014-06-27 7:10 ` [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() Li Zefan [not found] ` <53AD18D0.3090100-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> @ 2014-06-27 8:16 ` Li Zefan [not found] ` <53AD2852.2060304-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> 2 siblings, 1 reply; 9+ messages in thread From: Li Zefan @ 2014-06-27 8:16 UTC (permalink / raw) To: Tejun Heo; +Cc: LKML, Cgroups Oh sorry the cut&paste was incomplete. Here's the complete one: ================ From: Li Zefan <lizefan@huawei.com> Date: Thu, 12 Jun 2014 09:11:00 +0800 Subject: [PATCH v2 1/3] cgroup: fix mount failure in a corner case # cat test.sh #! /bin/bash mount -t cgroup -o cpu xxx /cgroup umount /cgroup mount -t cgroup -o cpu,cpuacct xxx /cgroup umount /cgroup # ./test.sh mount: xxx already mounted or /cgroup busy mount: according to mtab, xxx is already mounted on /cgroup It's because the cgroupfs_root of the first mount was under destruction asynchronously. Fix this by delaying and then retrying mount for this case. v2: - use percpu_ref_tryget_live() rather that introducing percpu_ref_alive(). (Tejun) - adjust comment. Signed-off-by: Li Zefan <lizefan@huawei.com> --- kernel/cgroup.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 1c65f24..6826227 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1648,10 +1648,12 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, int flags, const char *unused_dev_name, void *data) { + struct cgroup_subsys *ss; struct cgroup_root *root; struct cgroup_sb_opts opts; struct dentry *dentry; int ret; + int i, j = -1; bool new_sb; /* @@ -1677,6 +1679,25 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, goto out_unlock; } + /* + * Destruction of cgroup root is asynchronous, so we may fail to + * mount a cgroupfs if it immediately follows a umount. Let's wait + * a little bit and retry. + */ + for_each_subsys(ss, i) { + if (!(opts.subsys_mask & (1 << i)) || + ss->root === &cgrp_dfl_root) + continue; + + if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) { + mutex_unlock(&cgroup_mutex); + msleep(10); + ret = restart_syscall(); + goto out_free; + } + j = i; + } + for_each_root(root) { bool name_match = false; @@ -1763,6 +1784,14 @@ out_free: kfree(opts.release_agent); kfree(opts.name); + for_each_subsys(ss, i) { + if (i > j) + break; + if (!(opts.subsys_mask & (1 << i))) + continue; + cgroup_put(&ss->root->cgrp); + } + if (ret) return ERR_PTR(ret); -- 1.8.0.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
[parent not found: <53AD2852.2060304-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2 1/3] cgroup: fix mount failure in a corner case [not found] ` <53AD2852.2060304-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> @ 2014-06-27 9:13 ` Li Zefan [not found] ` <53AD35A8.7030908-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> 0 siblings, 1 reply; 9+ messages in thread From: Li Zefan @ 2014-06-27 9:13 UTC (permalink / raw) To: Tejun Heo; +Cc: LKML, Cgroups Made a mistake again.. :( ============== From: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> Subject: [PATCH 1/3] cgroup: fix mount failure in a corner case # cat test.sh #! /bin/bash mount -t cgroup -o cpu xxx /cgroup umount /cgroup mount -t cgroup -o cpu,cpuacct xxx /cgroup umount /cgroup # ./test.sh mount: xxx already mounted or /cgroup busy mount: according to mtab, xxx is already mounted on /cgroup It's because the cgroupfs_root of the first mount was under destruction asynchronously. Fix this by delaying and then retrying mount for this case. v2: - use percpu_ref_tryget_live() rather that introducing percpu_ref_alive(). (Tejun) - adjust comment. Signed-off-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> --- kernel/cgroup.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/kernel/cgroup.c b/kernel/cgroup.c index 1c65f24..b94449f 100644 --- a/kernel/cgroup.c +++ b/kernel/cgroup.c @@ -1648,10 +1648,12 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, int flags, const char *unused_dev_name, void *data) { + struct cgroup_subsys *ss; struct cgroup_root *root; struct cgroup_sb_opts opts; struct dentry *dentry; int ret; + int i, j = -1; bool new_sb; /* @@ -1677,6 +1679,25 @@ static struct dentry *cgroup_mount(struct file_system_type *fs_type, goto out_unlock; } + /* + * Destruction of cgroup root is asynchronous, so we may fail to + * mount a cgroupfs if it immediately follows a umount. Let's wait + * a little bit and retry. + */ + for_each_subsys(ss, i) { + if (!(opts.subsys_mask & (1 << i)) || + ss->root == &cgrp_dfl_root) + continue; + + if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) { + mutex_unlock(&cgroup_mutex); + msleep(10); + ret = restart_syscall(); + goto out_free; + } + j = i; + } + for_each_root(root) { bool name_match = false; @@ -1763,6 +1784,14 @@ out_free: kfree(opts.release_agent); kfree(opts.name); + for_each_subsys(ss, i) { + if (i > j) + break; + if (!(opts.subsys_mask & (1 << i))) + continue; + cgroup_put(&ss->root->cgrp); + } + if (ret) return ERR_PTR(ret); -- 1.8.0.2 ^ permalink raw reply related [flat|nested] 9+ messages in thread
[parent not found: <53AD35A8.7030908-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>]
* Re: [PATCH v2 1/3] cgroup: fix mount failure in a corner case [not found] ` <53AD35A8.7030908-hv44wF8Li93QT0dZR+AlfA@public.gmane.org> @ 2014-06-28 11:58 ` Tejun Heo [not found] ` <20140628115743.GB10829-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> 0 siblings, 1 reply; 9+ messages in thread From: Tejun Heo @ 2014-06-28 11:58 UTC (permalink / raw) To: Li Zefan; +Cc: LKML, Cgroups Hello, Li. On Fri, Jun 27, 2014 at 05:13:12PM +0800, Li Zefan wrote: > + for_each_subsys(ss, i) { > + if (!(opts.subsys_mask & (1 << i)) || > + ss->root == &cgrp_dfl_root) > + continue; > + > + if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) { > + mutex_unlock(&cgroup_mutex); > + msleep(10); > + ret = restart_syscall(); > + goto out_free; > + } Why not just put it immediately? We know that it's not gonna be destroyed while holding cgroup_mutex. It may look a bit weird but this is a pretty special case anyway and deferring put doesn't buy anything. Thanks. -- tejun ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <20140628115743.GB10829-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>]
* Re: [PATCH v2 1/3] cgroup: fix mount failure in a corner case [not found] ` <20140628115743.GB10829-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org> @ 2014-06-30 1:41 ` Li Zefan 0 siblings, 0 replies; 9+ messages in thread From: Li Zefan @ 2014-06-30 1:41 UTC (permalink / raw) To: Tejun Heo; +Cc: LKML, Cgroups On 2014/6/28 19:58, Tejun Heo wrote: > Hello, Li. > > On Fri, Jun 27, 2014 at 05:13:12PM +0800, Li Zefan wrote: >> + for_each_subsys(ss, i) { >> + if (!(opts.subsys_mask & (1 << i)) || >> + ss->root == &cgrp_dfl_root) >> + continue; >> + >> + if (!percpu_ref_tryget_live(&ss->root->cgrp.self.refcnt)) { >> + mutex_unlock(&cgroup_mutex); >> + msleep(10); >> + ret = restart_syscall(); >> + goto out_free; >> + } > > Why not just put it immediately? We know that it's not gonna be > destroyed while holding cgroup_mutex. It may look a bit weird but > this is a pretty special case anyway and deferring put doesn't buy > anything. > Yeah, this is better. :) ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2014-06-30 1:41 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2014-06-27 7:10 [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan
2014-06-27 7:10 ` [PATCH v2 2/3] kernfs: introduce kernfs_pin_sb() Li Zefan
[not found] ` <53AD18F8.4050501-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-06-27 15:01 ` Tejun Heo
[not found] ` <20140627150141.GB4044-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-06-27 17:48 ` Greg Kroah-Hartman
[not found] ` <53AD18D0.3090100-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-06-27 7:11 ` [PATCH v2 3/3] cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() Li Zefan
2014-06-27 8:16 ` [PATCH v2 1/3] cgroup: fix mount failure in a corner case Li Zefan
[not found] ` <53AD2852.2060304-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-06-27 9:13 ` Li Zefan
[not found] ` <53AD35A8.7030908-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2014-06-28 11:58 ` Tejun Heo
[not found] ` <20140628115743.GB10829-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2014-06-30 1:41 ` Li Zefan
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox