[PATCH 1/2] cgroup: no need to check css refs for release notification

* [PATCH 1/2] cgroup: no need to check css refs for release notification
@ 2013-03-01  7:06 Li Zefan
  2013-03-01  7:06 ` [PATCH 2/2] cgroup: avoid accessing modular cgroup subsys structure without locking Li Zefan
  2013-03-04 18:05 ` [PATCH 1/2] cgroup: no need to check css refs for release notification Tejun Heo
  0 siblings, 2 replies; 4+ messages in thread
From: Li Zefan @ 2013-03-01  7:06 UTC
  To: Tejun Heo; +Cc: LKML, cgroups

We no longer fail rmdir() when there are still css refs, so we don't
need to check css refs in check_for_release().

This also avoids a bug. cgroup_has_css_refs() accesses subsys[i]
without holding cgroup_mutex, so it can race with cgroup_unload_subsys().

cgroup_has_css_refs()
...
  if (ss == NULL || ss->root != cgrp->root)

If ss points to net_cls_subsys and the cls_cgroup module is unloaded
right after the former check but before the latter, the memory in which
net_cls_subsys resides has already become invalid by the time it is
dereferenced.

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 kernel/cgroup.c | 67 +++++++--------------------------------------------------
 1 file changed, 8 insertions(+), 59 deletions(-)

diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index 43ff59e..f4554cc 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4343,47 +4343,6 @@ static int cgroup_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
 	return cgroup_create(c_parent, dentry, mode | S_IFDIR);
 }
 
-/*
- * Check the reference count on each subsystem. Since we already
- * established that there are no tasks in the cgroup, if the css refcount
- * is also 1, then there should be no outstanding references, so the
- * subsystem is safe to destroy. We scan across all subsystems rather than
- * using the per-hierarchy linked list of mounted subsystems since we can
- * be called via check_for_release() with no synchronization other than
- * RCU, and the subsystem linked list isn't RCU-safe.
- */
-static int cgroup_has_css_refs(struct cgroup *cgrp)
-{
-	int i;
-
-	/*
-	 * We won't need to lock the subsys array, because the subsystems
-	 * we're concerned about aren't going anywhere since our cgroup root
-	 * has a reference on them.
-	 */
-	for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
-		struct cgroup_subsys *ss = subsys[i];
-		struct cgroup_subsys_state *css;
-
-		/* Skip subsystems not present or not in this hierarchy */
-		if (ss == NULL || ss->root != cgrp->root)
-			continue;
-
-		css = cgrp->subsys[ss->subsys_id];
-		/*
-		 * When called from check_for_release() it's possible
-		 * that by this point the cgroup has been removed
-		 * and the css deleted. But a false-positive doesn't
-		 * matter, since it can only happen if the cgroup
-		 * has been deleted and hence no longer needs the
-		 * release agent to be called anyway.
-		 */
-		if (css && css_refcnt(css) > 1)
-			return 1;
-	}
-	return 0;
-}
-
 static int cgroup_destroy_locked(struct cgroup *cgrp)
 	__releases(&cgroup_mutex) __acquires(&cgroup_mutex)
 {
@@ -5112,12 +5071,15 @@ static void check_for_release(struct cgroup *cgrp)
 {
 	/* All of these checks rely on RCU to keep the cgroup
 	 * structure alive */
-	if (cgroup_is_releasable(cgrp) && !atomic_read(&cgrp->count)
-	    && list_empty(&cgrp->children) && !cgroup_has_css_refs(cgrp)) {
-		/* Control Group is currently removeable. If it's not
+	if (cgroup_is_releasable(cgrp) &&
+	    !atomic_read(&cgrp->count) && list_empty(&cgrp->children)) {
+		/*
+		 * Control Group is currently removeable. If it's not
 		 * already queued for a userspace notification, queue
-		 * it now */
+		 * it now
+		 */
 		int need_schedule_work = 0;
+
 		raw_spin_lock(&release_list_lock);
 		if (!cgroup_is_removed(cgrp) &&
 		    list_empty(&cgrp->release_list)) {
@@ -5150,24 +5112,11 @@ EXPORT_SYMBOL_GPL(__css_tryget);
 /* Caller must verify that the css is not for root cgroup */
 void __css_put(struct cgroup_subsys_state *css)
 {
-	struct cgroup *cgrp = css->cgroup;
 	int v;
 
-	rcu_read_lock();
 	v = css_unbias_refcnt(atomic_dec_return(&css->refcnt));
-
-	switch (v) {
-	case 1:
-		if (notify_on_release(cgrp)) {
-			set_bit(CGRP_RELEASABLE, &cgrp->flags);
-			check_for_release(cgrp);
-		}
-		break;
-	case 0:
+	if (v == 0)
 		schedule_work(&css->dput_work);
-		break;
-	}
-	rcu_read_unlock();
 }
 EXPORT_SYMBOL_GPL(__css_put);
 
-- 
1.8.0.2
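
A note on the race described in the commit message above: the hazard is
a NULL check followed by a dereference of the same array slot, with
nothing stopping the slot's owner from being torn down in between. The
patch sidesteps this by deleting cgroup_has_css_refs() altogether. The
userspace sketch below is not kernel code. It only illustrates, under
assumed names (sketch_subsys, sketch_mutex, sketch_load, sketch_unload
and sketch_in_root are all hypothetical), what a reader would have to do
to make the two-step check safe: hold the lock that the unload path
takes across both the check and the dereference.

```c
#include <pthread.h>
#include <stddef.h>

#define SKETCH_SUBSYS_COUNT 4

/* Hypothetical stand-in for struct cgroup_subsys. */
struct sketch_subsys {
	int root_id;	/* stand-in for ss->root */
};

static struct sketch_subsys *sketch_subsys[SKETCH_SUBSYS_COUNT];
static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Publish a subsystem in slot i under the lock. */
static void sketch_load(int i, struct sketch_subsys *ss)
{
	pthread_mutex_lock(&sketch_mutex);
	sketch_subsys[i] = ss;
	pthread_mutex_unlock(&sketch_mutex);
}

/*
 * Clear slot i under the lock and hand the old pointer back.
 * Only after this returns may the caller free the object.
 */
static struct sketch_subsys *sketch_unload(int i)
{
	struct sketch_subsys *ss;

	pthread_mutex_lock(&sketch_mutex);
	ss = sketch_subsys[i];
	sketch_subsys[i] = NULL;
	pthread_mutex_unlock(&sketch_mutex);
	return ss;
}

/*
 * Safe reader: the NULL check and the dereference sit in one
 * critical section, so an unload cannot slip in between them.
 */
static int sketch_in_root(int i, int root_id)
{
	int ret = 0;

	pthread_mutex_lock(&sketch_mutex);
	if (sketch_subsys[i] != NULL && sketch_subsys[i]->root_id == root_id)
		ret = 1;
	pthread_mutex_unlock(&sketch_mutex);
	return ret;
}
```

In this sketch, once sketch_unload() has returned, every later call to
sketch_in_root() sees NULL, and any call that was already inside the
critical section finished before the slot was cleared. The unlocked
version in the removed function had no such guarantee.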

* [PATCH 2/2] cgroup: avoid accessing modular cgroup subsys structure without locking
  2013-03-01  7:06 [PATCH 1/2] cgroup: no need to check css refs for release notification Li Zefan
@ 2013-03-01  7:06 ` Li Zefan
  2013-03-04 18:04   ` Tejun Heo
  2013-03-04 18:05 ` [PATCH 1/2] cgroup: no need to check css refs for release notification Tejun Heo
  1 sibling, 1 reply; 4+ messages in thread
From: Li Zefan @ 2013-03-01  7:06 UTC
  To: Tejun Heo; +Cc: LKML, cgroups

subsys[i] is set to NULL in cgroup_unload_subsys() at module unload,
under cgroup_mutex, and the memory in which *subsys[i] resides is then
freed.

So this is unsafe without any locking:

  if (!ss || ss->module)
  ...

Signed-off-by: Li Zefan <lizefan@huawei.com>
---
 include/linux/cgroup.h | 11 +++++++++--
 kernel/cgroup.c        | 32 ++++++++++++++++++--------------
 2 files changed, 27 insertions(+), 16 deletions(-)

diff --git a/include/linux/cgroup.h b/include/linux/cgroup.h
index 75c6ec1..3ac6bb0 100644
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -46,12 +46,19 @@ extern const struct file_operations proc_cgroup_operations;
 
 /* Define the enumeration of all builtin cgroup subsystems */
 #define SUBSYS(_x) _x ## _subsys_id,
-#define IS_SUBSYS_ENABLED(option) IS_ENABLED(option)
 enum cgroup_subsys_id {
+#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
 #include <linux/cgroup_subsys.h>
+#undef IS_SUBSYS_ENABLED
+	CGROUP_BUILTIN_SUBSYS_COUNT,
+
+	__CGROUP_SUBSYS_TEMP_PLACEHOLDER = CGROUP_BUILTIN_SUBSYS_COUNT - 1,
+
+#define IS_SUBSYS_ENABLED(option) IS_MODULE(option)
+#include <linux/cgroup_subsys.h>
+#undef IS_SUBSYS_ENABLED
 	CGROUP_SUBSYS_COUNT,
 };
-#undef IS_SUBSYS_ENABLED
 #undef SUBSYS
 
 /* Per-subsystem/per-cgroup state maintained by the system. */
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f4554cc..29273db 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -4944,17 +4944,17 @@ void cgroup_post_fork(struct task_struct *child)
 	 * and addition to css_set.
 	 */
 	if (need_forkexit_callback) {
-		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+		/*
+		 * fork/exit callbacks are supported only for builtin
+		 * subsystems, and the builtin section of the subsys
+		 * array is immutable, so we don't need to lock the
+		 * subsys array here. On the other hand, modular section
+		 * of the array can be freed at module unload, so we
+		 * can't touch that.
+		 */
+		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
 
-			/*
-			 * fork/exit callbacks are supported only for
-			 * builtin subsystems and we don't need further
-			 * synchronization as they never go away.
-			 */
-			if (!ss || ss->module)
-				continue;
-
 			if (ss->fork)
 				ss->fork(child);
 		}
@@ -5019,13 +5019,17 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
 	tsk->cgroups = &init_css_set;
 
 	if (run_callbacks && need_forkexit_callback) {
-		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
+		/*
+		 * fork/exit callbacks are supported only for builtin
+		 * subsystems, and the builtin section of the subsys
+		 * array is immutable, so we don't need to lock the
+		 * subsys array here. On the other hand, modular section
+		 * of the array can be freed at module unload, so we
+		 * can't touch that.
+		 */
+		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {
 			struct cgroup_subsys *ss = subsys[i];
 
-			/* modular subsystems can't use callbacks */
-			if (!ss || ss->module)
-				continue;
-
 			if (ss->exit) {
 				struct cgroup *old_cgrp =
 					rcu_dereference_raw(cg->subsys[i])->cgroup;
-- 
1.8.0.2
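
A note on the enum layout in the include/linux/cgroup.h hunk above: the
subsystem list is expanded twice, first for builtin subsystems and then
for modular ones, so that all builtin IDs come first and the loops in
cgroup_post_fork() and cgroup_exit() can stop at
CGROUP_BUILTIN_SUBSYS_COUNT. The placeholder enumerator rewinds the
counter by one so that the first modular ID equals the builtin count.
The standalone sketch below shows the same trick with X-macro lists
instead of a re-included header. All names (SKETCH_*, *_sketch_id) and
the split of subsystems into builtin and modular are hypothetical and
chosen only for illustration.

```c
/*
 * Hypothetical lists; in the kernel these come from including
 * <linux/cgroup_subsys.h> twice with different filters.
 */
#define SKETCH_BUILTIN_LIST(X)	X(cpuset) X(cpuacct)
#define SKETCH_MODULE_LIST(X)	X(net_cls) X(blkio)

/* Turn a subsystem name into an enumerator. */
#define SKETCH_ID(name)	name##_sketch_id,

enum sketch_subsys_id {
	/* Builtin subsystems take IDs 0 .. SKETCH_BUILTIN_COUNT - 1. */
	SKETCH_BUILTIN_LIST(SKETCH_ID)
	SKETCH_BUILTIN_COUNT,

	/*
	 * Rewind by one: the next enumerator then gets the value
	 * SKETCH_BUILTIN_COUNT, so modular IDs continue without a gap.
	 */
	SKETCH_PLACEHOLDER = SKETCH_BUILTIN_COUNT - 1,

	/* Modular subsystems take the IDs after the builtin ones. */
	SKETCH_MODULE_LIST(SKETCH_ID)
	SKETCH_COUNT,
};

/* An ID below the builtin count names a slot that is never unloaded. */
static int sketch_is_builtin(enum sketch_subsys_id id)
{
	return id < SKETCH_BUILTIN_COUNT;
}
```

With this layout, a loop bounded by SKETCH_BUILTIN_COUNT never touches a
slot whose backing memory could disappear at module unload, which is the
property the patch relies on to drop the per-iteration checks.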


* Re: [PATCH 2/2] cgroup: avoid accessing modular cgroup subsys structure without locking
  2013-03-01  7:06 ` [PATCH 2/2] cgroup: avoid accessing modular cgroup subsys structure without locking Li Zefan
@ 2013-03-04 18:04   ` Tejun Heo
  0 siblings, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2013-03-04 18:04 UTC
  To: Li Zefan; +Cc: LKML, cgroups

Hello, Li.

On Fri, Mar 01, 2013 at 03:06:36PM +0800, Li Zefan wrote:
>  /* Define the enumeration of all builtin cgroup subsystems */
>  #define SUBSYS(_x) _x ## _subsys_id,
> -#define IS_SUBSYS_ENABLED(option) IS_ENABLED(option)
>  enum cgroup_subsys_id {
> +#define IS_SUBSYS_ENABLED(option) IS_BUILTIN(option)
>  #include <linux/cgroup_subsys.h>
> +#undef IS_SUBSYS_ENABLED
> +	CGROUP_BUILTIN_SUBSYS_COUNT,
> +
> +	__CGROUP_SUBSYS_TEMP_PLACEHOLDER = CGROUP_BUILTIN_SUBSYS_COUNT - 1,
> +
> +#define IS_SUBSYS_ENABLED(option) IS_MODULE(option)
> +#include <linux/cgroup_subsys.h>
> +#undef IS_SUBSYS_ENABLED
>  	CGROUP_SUBSYS_COUNT,
>  };
> -#undef IS_SUBSYS_ENABLED
>  #undef SUBSYS

Arghh.... can we at least have a comment explaining what we're doing
here?  It's ugly and confusing.

> @@ -5019,13 +5019,17 @@ void cgroup_exit(struct task_struct *tsk, int run_callbacks)
>  	tsk->cgroups = &init_css_set;
>  
>  	if (run_callbacks && need_forkexit_callback) {
> -		for (i = 0; i < CGROUP_SUBSYS_COUNT; i++) {
> +		/*
> +		 * fork/exit callbacks are supported only for builtin
> +		 * subsystems, and the builtin section of the subsys
> +		 * array is immutable, so we don't need to lock the
> +		 * subsys array here. On the other hand, modular section
> +		 * of the array can be freed at module unload, so we
> +		 * can't touch that.
> +		 */
> +		for (i = 0; i < CGROUP_BUILTIN_SUBSYS_COUNT; i++) {

Probably enough to say "fork/exit callbacks are supported only for
builtin subsys, see cgroup_for() for details"?

Thanks.

-- 
tejun

* Re: [PATCH 1/2] cgroup: no need to check css refs for release notification
  2013-03-01  7:06 [PATCH 1/2] cgroup: no need to check css refs for release notification Li Zefan
  2013-03-01  7:06 ` [PATCH 2/2] cgroup: avoid accessing modular cgroup subsys structure without locking Li Zefan
@ 2013-03-04 18:05 ` Tejun Heo
  1 sibling, 0 replies; 4+ messages in thread
From: Tejun Heo @ 2013-03-04 18:05 UTC
  To: Li Zefan; +Cc: LKML, cgroups

On Fri, Mar 01, 2013 at 03:06:07PM +0800, Li Zefan wrote:
> We no longer fail rmdir() when there are still css refs, so we don't
> need to check css refs in check_for_release().
> 
> This also avoids a bug. cgroup_has_css_refs() accesses subsys[i]
> without holding cgroup_mutex, so it can race with cgroup_unload_subsys().
> 
> cgroup_has_css_refs()
> ...
>   if (ss == NULL || ss->root != cgrp->root)
> 
> If ss points to net_cls_subsys and the cls_cgroup module is unloaded
> right after the former check but before the latter, the memory in which
> net_cls_subsys resides has already become invalid by the time it is
> dereferenced.
> 
> Signed-off-by: Li Zefan <lizefan@huawei.com>

Applied to cgroup/for-3.10.

Thanks.

-- 
tejun
