From: Tejun Heo <tj@kernel.org>
To: paul@paulmenage.org, rjw@sisk.pl, lizf@cn.fujitsu.com
Cc: linux-pm@lists.linux-foundation.org,
linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org, fweisbec@gmail.com,
matthltc@us.ibm.com, akpm@linux-foundation.org, oleg@redhat.com,
kamezawa.hiroyu@Jp.fujitsu.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 04/10] cgroup: always lock threadgroup during migration
Date: Tue, 1 Nov 2011 16:46:27 -0700 [thread overview]
Message-ID: <1320191193-8110-5-git-send-email-tj@kernel.org> (raw)
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
Update cgroup to take advantage of the fack that threadgroup_lock()
guarantees stable threadgroup.
* Lock threadgroup even if the target is a single task. This
guarantees that when the target tasks stay stable during migration
regardless of the target type.
* Remove PF_EXITING early exit optimization from attach_task_by_pid()
and check it in cgroup_task_migrate() instead. The optimization was
for rather cold path to begin with and PF_EXITING state can be
trusted throughout migration by checking it after locking
threadgroup.
* Don't add PF_EXITING tasks to target task array in
cgroup_attach_proc(). This ensures that task migration is performed
only for live tasks.
* Remove -ESRCH failure path from cgroup_task_migrate(). With the
above changes, it's guaranteed to be called only for live tasks.
After the changes, only live tasks are migrated and they're guaranteed
to stay alive until migration is complete. This removes problems
caused by exec and exit racing against cgroup migration including
symmetry among cgroup attach methods and different cgroup methods
racing each other.
v2: Oleg pointed out that one more PF_EXITING check can be removed
from cgroup_attach_proc(). Removed.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
kernel/cgroup.c | 51 +++++++++++++++++++++++----------------------------
1 files changed, 23 insertions(+), 28 deletions(-)
diff --git a/kernel/cgroup.c b/kernel/cgroup.c
index f0e099f..83e10f9 100644
--- a/kernel/cgroup.c
+++ b/kernel/cgroup.c
@@ -1762,7 +1762,7 @@ EXPORT_SYMBOL_GPL(cgroup_path);
*
* 'guarantee' is set if the caller promises that a new css_set for the task
* will already exist. If not set, this function might sleep, and can fail with
- * -ENOMEM. Otherwise, it can only fail with -ESRCH.
+ * -ENOMEM. Must be called with cgroup_mutex and threadgroup locked.
*/
static int cgroup_task_migrate(struct cgroup *cgrp, struct cgroup *oldcgrp,
struct task_struct *tsk, bool guarantee)
@@ -1800,13 +1800,9 @@ static int cgroup_task_migrate(struct cgroup *cgrp, struct cgroup *oldcgrp,
}
put_css_set(oldcg);
- /* if PF_EXITING is set, the tsk->cgroups pointer is no longer safe. */
+ /* @tsk can't exit as its threadgroup is locked */
task_lock(tsk);
- if (tsk->flags & PF_EXITING) {
- task_unlock(tsk);
- put_css_set(newcg);
- return -ESRCH;
- }
+ WARN_ON_ONCE(tsk->flags & PF_EXITING);
rcu_assign_pointer(tsk->cgroups, newcg);
task_unlock(tsk);
@@ -1832,8 +1828,8 @@ static int cgroup_task_migrate(struct cgroup *cgrp, struct cgroup *oldcgrp,
* @cgrp: the cgroup the task is attaching to
* @tsk: the task to be attached
*
- * Call holding cgroup_mutex. May take task_lock of
- * the task 'tsk' during call.
+ * Call with cgroup_mutex and threadgroup locked. May take task_lock of
+ * @tsk during call.
*/
int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
{
@@ -1842,6 +1838,10 @@ int cgroup_attach_task(struct cgroup *cgrp, struct task_struct *tsk)
struct cgroup *oldcgrp;
struct cgroupfs_root *root = cgrp->root;
+ /* @tsk either already exited or can't exit until the end */
+ if (tsk->flags & PF_EXITING)
+ return -ESRCH;
+
/* Nothing to do if the task is already in that cgroup */
oldcgrp = task_cgroup_from_root(tsk, root);
if (cgrp == oldcgrp)
@@ -2062,6 +2062,10 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
tsk = leader;
i = 0;
do {
+ /* @tsk either already exited or can't exit until the end */
+ if (tsk->flags & PF_EXITING)
+ continue;
+
/* as per above, nr_threads may decrease, but not increase. */
BUG_ON(i >= group_size);
get_task_struct(tsk);
@@ -2116,11 +2120,6 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
continue;
/* get old css_set pointer */
task_lock(tsk);
- if (tsk->flags & PF_EXITING) {
- /* ignore this task if it's going away */
- task_unlock(tsk);
- continue;
- }
oldcg = tsk->cgroups;
get_css_set(oldcg);
task_unlock(tsk);
@@ -2158,9 +2157,8 @@ int cgroup_attach_proc(struct cgroup *cgrp, struct task_struct *leader)
if (ss->attach_task)
ss->attach_task(cgrp, tsk);
}
- /* if the thread is PF_EXITING, it can just get skipped. */
retval = cgroup_task_migrate(cgrp, oldcgrp, tsk, true);
- BUG_ON(retval != 0 && retval != -ESRCH);
+ BUG_ON(retval != 0);
}
/* nothing is sensitive to fork() after this point. */
@@ -2212,8 +2210,8 @@ out_free_group_list:
/*
* Find the task_struct of the task to attach by vpid and pass it along to the
- * function to attach either it or all tasks in its threadgroup. Will take
- * cgroup_mutex; may take task_lock of task.
+ * function to attach either it or all tasks in its threadgroup. Will lock
+ * cgroup_mutex and threadgroup; may take task_lock of task.
*/
static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
{
@@ -2240,11 +2238,6 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
* detect it later.
*/
tsk = tsk->group_leader;
- } else if (tsk->flags & PF_EXITING) {
- /* optimization for the single-task-only case */
- rcu_read_unlock();
- cgroup_unlock();
- return -ESRCH;
}
/*
* even if we're attaching all tasks in the thread group, we
@@ -2268,13 +2261,15 @@ static int attach_task_by_pid(struct cgroup *cgrp, u64 pid, bool threadgroup)
get_task_struct(tsk);
}
- if (threadgroup) {
- threadgroup_lock(tsk);
+ threadgroup_lock(tsk);
+
+ if (threadgroup)
ret = cgroup_attach_proc(cgrp, tsk);
- threadgroup_unlock(tsk);
- } else {
+ else
ret = cgroup_attach_task(cgrp, tsk);
- }
+
+ threadgroup_unlock(tsk);
+
put_task_struct(tsk);
cgroup_unlock();
return ret;
--
1.7.3.1
next prev parent reply other threads:[~2011-11-01 23:48 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-11-01 23:46 [PATCHSET] cgroup: stable threadgroup during attach & subsys methods consolidation Tejun Heo
2011-11-01 23:46 ` [PATCH 01/10] cgroup: add cgroup_root_mutex Tejun Heo
2011-11-04 8:38 ` KAMEZAWA Hiroyuki
2011-11-01 23:46 ` [PATCH 02/10] threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem Tejun Heo
2011-11-04 8:40 ` KAMEZAWA Hiroyuki
2011-11-04 15:16 ` Tejun Heo
2011-11-01 23:46 ` [PATCH 03/10] threadgroup: extend threadgroup_lock() to cover exit and exec Tejun Heo
2011-11-04 8:45 ` KAMEZAWA Hiroyuki
2011-11-13 16:44 ` Frederic Weisbecker
2011-11-14 13:54 ` Frederic Weisbecker
2011-11-21 22:03 ` Tejun Heo
2011-11-23 14:34 ` Frederic Weisbecker
2011-11-21 21:58 ` Tejun Heo
2011-11-23 14:02 ` Frederic Weisbecker
2011-11-24 21:22 ` Tejun Heo
2011-11-13 18:20 ` Frederic Weisbecker
2011-11-24 22:50 ` [PATCH UPDATED " Tejun Heo
2011-11-25 4:02 ` Linus Torvalds
2011-11-27 19:21 ` Tejun Heo
2011-11-27 21:25 ` Tejun Heo
2011-12-01 19:29 ` Tejun Heo
2011-11-25 14:01 ` Frederic Weisbecker
2011-11-27 19:30 ` Tejun Heo
2011-12-02 16:28 ` Frederic Weisbecker
2011-12-05 18:43 ` Tejun Heo
2011-12-07 15:30 ` Frederic Weisbecker
2011-12-07 18:22 ` Tejun Heo
2011-12-08 20:50 ` [PATCH UPDATED AGAIN " Tejun Heo
2011-12-09 23:42 ` Frederic Weisbecker
2011-12-13 1:33 ` Tejun Heo
2011-12-13 2:17 ` Tejun Heo
2011-11-01 23:46 ` Tejun Heo [this message]
2011-11-04 8:54 ` [PATCH 04/10] cgroup: always lock threadgroup during migration KAMEZAWA Hiroyuki
2011-11-04 15:21 ` Tejun Heo
2011-11-14 18:46 ` Frederic Weisbecker
2011-11-14 18:52 ` Frederic Weisbecker
2011-11-21 22:05 ` Tejun Heo
2011-11-01 23:46 ` [PATCH 05/10] cgroup: subsys->attach_task() should be called after migration Tejun Heo
2011-11-14 20:06 ` Frederic Weisbecker
2011-11-21 22:04 ` Tejun Heo
2011-11-01 23:46 ` [PATCH 06/10] cgroup: improve old cgroup handling in cgroup_attach_proc() Tejun Heo
2011-11-14 20:37 ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 07/10] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach() Tejun Heo
2011-11-14 21:16 ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 08/10] cgroup: don't use subsys->can_attach_task() or ->attach_task() Tejun Heo
2011-11-04 9:08 ` KAMEZAWA Hiroyuki
2011-11-14 23:54 ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 09/10] cgroup, cpuset: don't use ss->pre_attach() Tejun Heo
2011-11-15 0:51 ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 10/10] cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task() Tejun Heo
2011-11-04 9:10 ` KAMEZAWA Hiroyuki
2011-11-15 0:54 ` Frederic Weisbecker
2011-11-21 22:07 ` [PATCHSET] cgroup: stable threadgroup during attach & subsys methods consolidation Tejun Heo
2011-11-22 2:27 ` Li Zefan
2011-11-22 16:20 ` Tejun Heo
2011-11-24 22:51 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1320191193-8110-5-git-send-email-tj@kernel.org \
--to=tj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=containers@lists.linux-foundation.org \
--cc=fweisbec@gmail.com \
--cc=kamezawa.hiroyu@Jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@lists.linux-foundation.org \
--cc=lizf@cn.fujitsu.com \
--cc=matthltc@us.ibm.com \
--cc=oleg@redhat.com \
--cc=paul@paulmenage.org \
--cc=rjw@sisk.pl \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox