* [PATCH cgroup/for-3.11 2/3] cgroup: fix RCU accesses around task->cgroups
[not found] ` <20130621225116.GC3949-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
@ 2013-06-21 22:52 ` Tejun Heo
[not found] ` <20130621225204.GD3949-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
[not found] ` <20130621225233.GE3949@htj.dyndns.org>
2013-06-25 1:55 ` [PATCH cgroup/for-3.11 1/3] cgroup: fix RCU accesses to task->cgroups Li Zefan
` (2 subsequent siblings)
3 siblings, 2 replies; 10+ messages in thread
From: Tejun Heo @ 2013-06-21 22:52 UTC (permalink / raw)
To: Li Zefan
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Fengguang Wu
There are several places in kernel/cgroup.c where task->cgroups is
accessed and modified without going through proper RCU accessors.
None is broken as they're all lock protected accesses; however, this
still triggers sparse RCU address space warnings.
* Consistently use task_css_set() for task->cgroups dereferencing.
* Use RCU_INIT_POINTER() to clear task->cgroups to &init_css_set on
exit.
* Remove unnecessary rcu_dereference_raw() from cset->subsys[]
dereference in cgroup_exit().
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Reported-by: Fengguang Wu <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
---
kernel/cgroup.c | 24 +++++++++++++-----------
1 file changed, 13 insertions(+), 11 deletions(-)
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -704,7 +704,7 @@ static struct cgroup *task_cgroup_from_r
* task can't change groups, so the only thing that can happen
* is that it exits and its css is set back to init_css_set.
*/
- cset = task->cgroups;
+ cset = task_css_set(task);
if (cset == &init_css_set) {
res = &root->top_cgroup;
} else {
@@ -1948,7 +1948,7 @@ static void cgroup_task_migrate(struct c
* css_set to init_css_set and dropping the old one.
*/
WARN_ON_ONCE(tsk->flags & PF_EXITING);
- old_cset = tsk->cgroups;
+ old_cset = task_css_set(tsk);
task_lock(tsk);
rcu_assign_pointer(tsk->cgroups, new_cset);
@@ -2071,8 +2071,11 @@ static int cgroup_attach_task(struct cgr
* we use find_css_set, which allocates a new one if necessary.
*/
for (i = 0; i < group_size; i++) {
+ struct css_set *old_cset;
+
tc = flex_array_get(group, i);
- tc->cg = find_css_set(tc->task->cgroups, cgrp);
+ old_cset = task_css_set(tc->task);
+ tc->cg = find_css_set(old_cset, cgrp);
if (!tc->cg) {
retval = -ENOMEM;
goto out_put_css_set_refs;
@@ -2989,7 +2992,7 @@ static void cgroup_enable_task_cg_lists(
* entry won't be deleted though the process has exited.
*/
if (!(p->flags & PF_EXITING) && list_empty(&p->cg_list))
- list_add(&p->cg_list, &p->cgroups->tasks);
+ list_add(&p->cg_list, &task_css_set(p)->tasks);
task_unlock(p);
} while_each_thread(g, p);
read_unlock(&tasklist_lock);
@@ -5046,8 +5049,8 @@ static const struct file_operations proc
void cgroup_fork(struct task_struct *child)
{
task_lock(current);
+ get_css_set(task_css_set(current));
child->cgroups = current->cgroups;
- get_css_set(child->cgroups);
task_unlock(current);
INIT_LIST_HEAD(&child->cg_list);
}
@@ -5081,7 +5084,7 @@ void cgroup_post_fork(struct task_struct
write_lock(&css_set_lock);
task_lock(child);
if (list_empty(&child->cg_list))
- list_add(&child->cg_list, &child->cgroups->tasks);
+ list_add(&child->cg_list, &task_css_set(child)->tasks);
task_unlock(child);
write_unlock(&css_set_lock);
}
@@ -5163,8 +5166,8 @@ void cgroup_exit(struct task_struct *tsk
/* Reassign the task to the init_css_set. */
task_lock(tsk);
- cset = tsk->cgroups;
- tsk->cgroups = &init_css_set;
+ cset = task_css_set(tsk);
+ RCU_INIT_POINTER(tsk->cgroups, &init_css_set);
if (run_callbacks && need_forkexit_callback) {
/*
@@ -5175,8 +5178,7 @@ void cgroup_exit(struct task_struct *tsk
struct cgroup_subsys *ss = subsys[i];
if (ss->exit) {
- struct cgroup *old_cgrp =
- rcu_dereference_raw(cset->subsys[i])->cgroup;
+ struct cgroup *old_cgrp = cset->subsys[i]->cgroup;
struct cgroup *cgrp = task_cgroup(tsk, i);
ss->exit(cgrp, old_cgrp, tsk);
}
@@ -5546,7 +5548,7 @@ static u64 current_css_set_refcount_read
u64 count;
rcu_read_lock();
- count = atomic_read(¤t->cgroups->refcount);
+ count = atomic_read(&task_css_set(current)->refcount);
rcu_read_unlock();
return count;
}
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH cgroup/for-3.11 1/3] cgroup: fix RCU accesses to task->cgroups
[not found] ` <20130621225116.GC3949-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-21 22:52 ` [PATCH cgroup/for-3.11 2/3] cgroup: fix RCU accesses around task->cgroups Tejun Heo
@ 2013-06-25 1:55 ` Li Zefan
2013-06-25 18:48 ` [PATCH v2 " Tejun Heo
2013-06-26 17:49 ` [PATCH " Tejun Heo
3 siblings, 0 replies; 10+ messages in thread
From: Li Zefan @ 2013-06-25 1:55 UTC (permalink / raw)
To: Tejun Heo
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Fengguang Wu
On 2013/6/22 6:51, Tejun Heo wrote:
> task->cgroups is a RCU pointer pointing to struct css_set. A task
> switches to a different css_set on cgroup migration but a css_set
> doesn't change once created and its pointers to cgroup_subsys_states
> aren't RCU protected.
>
> task_subsys_state[_check]() is the macro to acquire css given a task
> and subsys_id pair. It RCU-dereferences task->cgroups->subsys[] not
> task->cgroups, so the RCU pointer task->cgroups ends up being
> dereferenced without read_barrier_depends() after it. It's broken.
>
> Fix it by introducing task_css_set[_check]() which does
> RCU-dereference on task->cgroups. task_subsys_state[_check]() is
> reimplemented to directly dereference ->subsys[] of the css_set
> returned from task_css_set[_check]().
>
> This removes some of sparse RCU warnings in cgroup.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Reported-by: Fengguang Wu <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
> ---
> Hello,
>
> Three RCU fixe patches. The first one fixes an actual bug. The other
> two add missing annoations so that sparse doesn't generate spurious
> RCU address space warnings.
>
> Thanks!
>
> include/linux/cgroup.h | 58 ++++++++++++++++++++++++++++++++++++++++---------
> 1 file changed, 48 insertions(+), 10 deletions(-)
>
> --- a/include/linux/cgroup.h
> +++ b/include/linux/cgroup.h
> @@ -638,22 +638,60 @@ static inline struct cgroup_subsys_state
> return cgrp->subsys[subsys_id];
> }
>
> -/*
> - * function to get the cgroup_subsys_state which allows for extra
> - * rcu_dereference_check() conditions, such as locks used during the
> - * cgroup_subsys::attach() methods.
> +/**
> + * task_css_set_check - obtain a task's css_set with extra access conditions
> + * @task: the task to obtain css_set for
> + * @__c: extra condition expression to be passed to rcu_dereference_check()
> + *
> + * A task's css_set is RCU protected, initialized and exited while holding
> + * task_lock(), and can only be modified while holding both cgroup_mutex
> + * and task_lock() while the task is alive. This macro verifies that the
> + * caller is inside proper critical section and returns @task's css_set.
> + *
> + * The caller can also specify additional allowed conditions via @__c, such
> + * as locks used during the cgroup_subsys::attach() methods.
> */
> #ifdef CONFIG_PROVE_RCU
> extern struct mutex cgroup_mutex;
> -#define task_subsys_state_check(task, subsys_id, __c) \
> - rcu_dereference_check((task)->cgroups->subsys[(subsys_id)], \
> - lockdep_is_held(&(task)->alloc_lock) || \
> - lockdep_is_held(&cgroup_mutex) || (__c))
> +#define task_css_set_check(task, __c) \
> + rcu_dereference_check((task)->cgroups, \
> + lockdep_is_held(&(task)->alloc_lock) || \
> + lockdep_is_held(&cgroup_mutex) || (__c))
> #else
> -#define task_subsys_state_check(task, subsys_id, __c) \
> - rcu_dereference((task)->cgroups->subsys[(subsys_id)])
> +#define task_css_set_check(task, __c) \
> + rcu_dereference_raw((task)->cgroups
parenthesis unmatched.
and why not just use rcu_dereference()? I guess it should be equivalent to
rcu_derference_raw() if CONFIG_PROVE_RCU=n ?
> #endif
>
^ permalink raw reply [flat|nested] 10+ messages in thread* [PATCH v2 cgroup/for-3.11 1/3] cgroup: fix RCU accesses to task->cgroups
[not found] ` <20130621225116.GC3949-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-06-21 22:52 ` [PATCH cgroup/for-3.11 2/3] cgroup: fix RCU accesses around task->cgroups Tejun Heo
2013-06-25 1:55 ` [PATCH cgroup/for-3.11 1/3] cgroup: fix RCU accesses to task->cgroups Li Zefan
@ 2013-06-25 18:48 ` Tejun Heo
[not found] ` <20130625184832.GC20051-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-06-26 17:49 ` [PATCH " Tejun Heo
3 siblings, 1 reply; 10+ messages in thread
From: Tejun Heo @ 2013-06-25 18:48 UTC (permalink / raw)
To: Tejun Heo
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Fengguang Wu
task->cgroups is a RCU pointer pointing to struct css_set. A task
switches to a different css_set on cgroup migration but a css_set
doesn't change once created and its pointers to cgroup_subsys_states
aren't RCU protected.
task_subsys_state[_check]() is the macro to acquire css given a task
and subsys_id pair. It RCU-dereferences task->cgroups->subsys[] not
task->cgroups, so the RCU pointer task->cgroups ends up being
dereferenced without read_barrier_depends() after it. It's broken.
Fix it by introducing task_css_set[_check]() which does
RCU-dereference on task->cgroups. task_subsys_state[_check]() is
reimplemented to directly dereference ->subsys[] of the css_set
returned from task_css_set[_check]().
This removes some of sparse RCU warnings in cgroup.
v2: Fixed unbalanced parenthsis and there's no need to use
rcu_dereference_raw() when !CONFIG_PROVE_RCU. Both spotted by Li.
Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Reported-by: Fengguang Wu <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
---
include/linux/cgroup.h | 58 ++++++++++++++++++++++++++++++++++++++++---------
1 file changed, 48 insertions(+), 10 deletions(-)
--- a/include/linux/cgroup.h
+++ b/include/linux/cgroup.h
@@ -634,22 +634,60 @@ static inline struct cgroup_subsys_state
return cgrp->subsys[subsys_id];
}
-/*
- * function to get the cgroup_subsys_state which allows for extra
- * rcu_dereference_check() conditions, such as locks used during the
- * cgroup_subsys::attach() methods.
+/**
+ * task_css_set_check - obtain a task's css_set with extra access conditions
+ * @task: the task to obtain css_set for
+ * @__c: extra condition expression to be passed to rcu_dereference_check()
+ *
+ * A task's css_set is RCU protected, initialized and exited while holding
+ * task_lock(), and can only be modified while holding both cgroup_mutex
+ * and task_lock() while the task is alive. This macro verifies that the
+ * caller is inside proper critical section and returns @task's css_set.
+ *
+ * The caller can also specify additional allowed conditions via @__c, such
+ * as locks used during the cgroup_subsys::attach() methods.
*/
#ifdef CONFIG_PROVE_RCU
extern struct mutex cgroup_mutex;
-#define task_subsys_state_check(task, subsys_id, __c) \
- rcu_dereference_check((task)->cgroups->subsys[(subsys_id)], \
- lockdep_is_held(&(task)->alloc_lock) || \
- lockdep_is_held(&cgroup_mutex) || (__c))
+#define task_css_set_check(task, __c) \
+ rcu_dereference_check((task)->cgroups, \
+ lockdep_is_held(&(task)->alloc_lock) || \
+ lockdep_is_held(&cgroup_mutex) || (__c))
#else
-#define task_subsys_state_check(task, subsys_id, __c) \
- rcu_dereference((task)->cgroups->subsys[(subsys_id)])
+#define task_css_set_check(task, __c) \
+ rcu_dereference((task)->cgroups)
#endif
+/**
+ * task_subsys_state_check - obtain css for (task, subsys) w/ extra access conds
+ * @task: the target task
+ * @subsys_id: the target subsystem ID
+ * @__c: extra condition expression to be passed to rcu_dereference_check()
+ *
+ * Return the cgroup_subsys_state for the (@task, @subsys_id) pair. The
+ * synchronization rules are the same as task_css_set_check().
+ */
+#define task_subsys_state_check(task, subsys_id, __c) \
+ task_css_set_check((task), (__c))->subsys[(subsys_id)]
+
+/**
+ * task_css_set - obtain a task's css_set
+ * @task: the task to obtain css_set for
+ *
+ * See task_css_set_check().
+ */
+static inline struct css_set *task_css_set(struct task_struct *task)
+{
+ return task_css_set_check(task, false);
+}
+
+/**
+ * task_subsys_state - obtain css for (task, subsys)
+ * @task: the target task
+ * @subsys_id: the target subsystem ID
+ *
+ * See task_subsys_state_check().
+ */
static inline struct cgroup_subsys_state *
task_subsys_state(struct task_struct *task, int subsys_id)
{
^ permalink raw reply [flat|nested] 10+ messages in thread* Re: [PATCH cgroup/for-3.11 1/3] cgroup: fix RCU accesses to task->cgroups
[not found] ` <20130621225116.GC3949-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
` (2 preceding siblings ...)
2013-06-25 18:48 ` [PATCH v2 " Tejun Heo
@ 2013-06-26 17:49 ` Tejun Heo
3 siblings, 0 replies; 10+ messages in thread
From: Tejun Heo @ 2013-06-26 17:49 UTC (permalink / raw)
To: Tejun Heo
Cc: cgroups-u79uwXL29TY76Z2rM5mHXA,
containers-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
Fengguang Wu
On Fri, Jun 21, 2013 at 03:51:17PM -0700, Tejun Heo wrote:
> task->cgroups is a RCU pointer pointing to struct css_set. A task
> switches to a different css_set on cgroup migration but a css_set
> doesn't change once created and its pointers to cgroup_subsys_states
> aren't RCU protected.
>
> task_subsys_state[_check]() is the macro to acquire css given a task
> and subsys_id pair. It RCU-dereferences task->cgroups->subsys[] not
> task->cgroups, so the RCU pointer task->cgroups ends up being
> dereferenced without read_barrier_depends() after it. It's broken.
>
> Fix it by introducing task_css_set[_check]() which does
> RCU-dereference on task->cgroups. task_subsys_state[_check]() is
> reimplemented to directly dereference ->subsys[] of the css_set
> returned from task_css_set[_check]().
>
> This removes some of sparse RCU warnings in cgroup.
>
> Signed-off-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
> Reported-by: Fengguang Wu <fengguang.wu-ral2JQCrhuEAvxtiuMwx3w@public.gmane.org>
> Cc: stable-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
Applied 1-3 to cgroup/for-3.11.
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 10+ messages in thread