From: Tejun Heo <tj@kernel.org>
To: paul@paulmenage.org, rjw@sisk.pl, lizf@cn.fujitsu.com
Cc: linux-pm@lists.linux-foundation.org,
linux-kernel@vger.kernel.org,
containers@lists.linux-foundation.org, fweisbec@gmail.com,
matthltc@us.ibm.com, akpm@linux-foundation.org, oleg@redhat.com,
kamezawa.hiroyu@Jp.fujitsu.com, Tejun Heo <tj@kernel.org>
Subject: [PATCH 03/10] threadgroup: extend threadgroup_lock() to cover exit and exec
Date: Tue, 1 Nov 2011 16:46:26 -0700 [thread overview]
Message-ID: <1320191193-8110-4-git-send-email-tj@kernel.org> (raw)
In-Reply-To: <1320191193-8110-1-git-send-email-tj@kernel.org>
threadgroup_lock() protected only protected against new addition to
the threadgroup, which was inherently somewhat incomplete and
problematic for its only user cgroup. On-going migration could race
against exec and exit leading to interesting problems - the symmetry
between various attach methods, task exiting during method execution,
->exit() racing against attach methods, migrating task switching basic
properties during exec and so on.
This patch extends threadgroup_lock() such that it protects against
all three threadgroup altering operations - fork, exit and exec. For
exit, threadgroup_change_begin/end() calls are added to exit path.
For exec, threadgroup_[un]lock() are updated to also grab and release
cred_guard_mutex.
With this change, threadgroup_lock() guarantees that the target
threadgroup will remain stable - no new task will be added, no new
PF_EXITING will be set and exec won't happen.
The next patch will update cgroup so that it can take full advantage
of this change.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Paul Menage <paul@paulmenage.org>
Cc: Li Zefan <lizf@cn.fujitsu.com>
---
include/linux/sched.h | 36 ++++++++++++++++++++++++++++++------
kernel/exit.c | 17 +++++++++++++----
2 files changed, 43 insertions(+), 10 deletions(-)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index aa47d0f..6fb546e 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -635,11 +635,12 @@ struct signal_struct {
#endif
#ifdef CONFIG_CGROUPS
/*
- * The group_rwsem prevents threads from forking with
- * CLONE_THREAD while held for writing. Use this for fork-sensitive
- * threadgroup-wide operations. It's taken for reading in fork.c in
- * copy_process().
- * Currently only needed write-side by cgroups.
+ * group_rwsem prevents new tasks from entering the threadgroup and
+ * member tasks from exiting. fork and exit paths are protected
+ * with this rwsem using threadgroup_change_begin/end(). Users
+ * which require threadgroup to remain stable should use
+ * threadgroup_[un]lock() which also takes care of exec path.
+ * Currently, cgroup is the only user.
*/
struct rw_semaphore group_rwsem;
#endif
@@ -2367,7 +2368,6 @@ static inline void unlock_task_sighand(struct task_struct *tsk,
spin_unlock_irqrestore(&tsk->sighand->siglock, *flags);
}
-/* See the declaration of group_rwsem in signal_struct. */
#ifdef CONFIG_CGROUPS
static inline void threadgroup_change_begin(struct task_struct *tsk)
{
@@ -2377,13 +2377,37 @@ static inline void threadgroup_change_done(struct task_struct *tsk)
{
up_read(&tsk->signal->group_rwsem);
}
+
+/**
+ * threadgroup_lock - lock threadgroup
+ * @tsk: member task of the threadgroup to lock
+ *
+ * Lock the threadgroup @tsk belongs to. No new task is allowed to enter
+ * and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
+ * perform exec. This is useful for cases where the threadgroup needs to
+ * stay stable across blockable operations.
+ *
+ * fork and exit explicitly call threadgroup_change_{begin|end}() for
+ * synchronization. exec is already synchronized with cread_guard_mutex
+ * which we grab here.
+ */
static inline void threadgroup_lock(struct task_struct *tsk)
{
+ /* exec uses exit for de-threading, grab cred_guard_mutex first */
+ mutex_lock(&tsk->signal->cred_guard_mutex);
down_write(&tsk->signal->group_rwsem);
}
+
+/**
+ * threadgroup_unlock - unlock threadgroup
+ * @tsk: member task of the threadgroup to unlock
+ *
+ * Reverse threadgroup_lock().
+ */
static inline void threadgroup_unlock(struct task_struct *tsk)
{
up_write(&tsk->signal->group_rwsem);
+ mutex_unlock(&tsk->signal->cred_guard_mutex);
}
#else
static inline void threadgroup_change_begin(struct task_struct *tsk) {}
diff --git a/kernel/exit.c b/kernel/exit.c
index 2913b35..3565f00 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -938,6 +938,12 @@ NORET_TYPE void do_exit(long code)
schedule();
}
+ /*
+ * @tsk's threadgroup is going through changes - lock out users
+ * which expect stable threadgroup.
+ */
+ threadgroup_change_begin(tsk);
+
exit_irq_thread();
exit_signals(tsk); /* sets PF_EXITING */
@@ -1020,10 +1026,6 @@ NORET_TYPE void do_exit(long code)
kfree(current->pi_state_cache);
#endif
/*
- * Make sure we are holding no locks:
- */
- debug_check_no_locks_held(tsk);
- /*
* We can do this unlocked here. The futex code uses this flag
* just to verify whether the pi state cleanup has been done
* or not. In the worst case it loops once more.
@@ -1040,6 +1042,13 @@ NORET_TYPE void do_exit(long code)
preempt_disable();
exit_rcu();
+
+ /*
+ * Release threadgroup and make sure we are holding no locks.
+ */
+ threadgroup_change_done(tsk);
+ debug_check_no_locks_held(tsk);
+
/* causes final put_task_struct in finish_task_switch(). */
tsk->state = TASK_DEAD;
schedule();
--
1.7.3.1
next prev parent reply other threads:[~2011-11-01 23:46 UTC|newest]
Thread overview: 56+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-11-01 23:46 [PATCHSET] cgroup: stable threadgroup during attach & subsys methods consolidation Tejun Heo
2011-11-01 23:46 ` [PATCH 01/10] cgroup: add cgroup_root_mutex Tejun Heo
2011-11-04 8:38 ` KAMEZAWA Hiroyuki
2011-11-01 23:46 ` [PATCH 02/10] threadgroup: rename signal->threadgroup_fork_lock to ->group_rwsem Tejun Heo
2011-11-04 8:40 ` KAMEZAWA Hiroyuki
2011-11-04 15:16 ` Tejun Heo
2011-11-01 23:46 ` Tejun Heo [this message]
2011-11-04 8:45 ` [PATCH 03/10] threadgroup: extend threadgroup_lock() to cover exit and exec KAMEZAWA Hiroyuki
2011-11-13 16:44 ` Frederic Weisbecker
2011-11-14 13:54 ` Frederic Weisbecker
2011-11-21 22:03 ` Tejun Heo
2011-11-23 14:34 ` Frederic Weisbecker
2011-11-21 21:58 ` Tejun Heo
2011-11-23 14:02 ` Frederic Weisbecker
2011-11-24 21:22 ` Tejun Heo
2011-11-13 18:20 ` Frederic Weisbecker
2011-11-24 22:50 ` [PATCH UPDATED " Tejun Heo
2011-11-25 4:02 ` Linus Torvalds
2011-11-27 19:21 ` Tejun Heo
2011-11-27 21:25 ` Tejun Heo
2011-12-01 19:29 ` Tejun Heo
2011-11-25 14:01 ` Frederic Weisbecker
2011-11-27 19:30 ` Tejun Heo
2011-12-02 16:28 ` Frederic Weisbecker
2011-12-05 18:43 ` Tejun Heo
2011-12-07 15:30 ` Frederic Weisbecker
2011-12-07 18:22 ` Tejun Heo
2011-12-08 20:50 ` [PATCH UPDATED AGAIN " Tejun Heo
2011-12-09 23:42 ` Frederic Weisbecker
2011-12-13 1:33 ` Tejun Heo
2011-12-13 2:17 ` Tejun Heo
2011-11-01 23:46 ` [PATCH 04/10] cgroup: always lock threadgroup during migration Tejun Heo
2011-11-04 8:54 ` KAMEZAWA Hiroyuki
2011-11-04 15:21 ` Tejun Heo
2011-11-14 18:46 ` Frederic Weisbecker
2011-11-14 18:52 ` Frederic Weisbecker
2011-11-21 22:05 ` Tejun Heo
2011-11-01 23:46 ` [PATCH 05/10] cgroup: subsys->attach_task() should be called after migration Tejun Heo
2011-11-14 20:06 ` Frederic Weisbecker
2011-11-21 22:04 ` Tejun Heo
2011-11-01 23:46 ` [PATCH 06/10] cgroup: improve old cgroup handling in cgroup_attach_proc() Tejun Heo
2011-11-14 20:37 ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 07/10] cgroup: introduce cgroup_taskset and use it in subsys->can_attach(), cancel_attach() and attach() Tejun Heo
2011-11-14 21:16 ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 08/10] cgroup: don't use subsys->can_attach_task() or ->attach_task() Tejun Heo
2011-11-04 9:08 ` KAMEZAWA Hiroyuki
2011-11-14 23:54 ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 09/10] cgroup, cpuset: don't use ss->pre_attach() Tejun Heo
2011-11-15 0:51 ` Frederic Weisbecker
2011-11-01 23:46 ` [PATCH 10/10] cgroup: kill subsys->can_attach_task(), pre_attach() and attach_task() Tejun Heo
2011-11-04 9:10 ` KAMEZAWA Hiroyuki
2011-11-15 0:54 ` Frederic Weisbecker
2011-11-21 22:07 ` [PATCHSET] cgroup: stable threadgroup during attach & subsys methods consolidation Tejun Heo
2011-11-22 2:27 ` Li Zefan
2011-11-22 16:20 ` Tejun Heo
2011-11-24 22:51 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1320191193-8110-4-git-send-email-tj@kernel.org \
--to=tj@kernel.org \
--cc=akpm@linux-foundation.org \
--cc=containers@lists.linux-foundation.org \
--cc=fweisbec@gmail.com \
--cc=kamezawa.hiroyu@Jp.fujitsu.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-pm@lists.linux-foundation.org \
--cc=lizf@cn.fujitsu.com \
--cc=matthltc@us.ibm.com \
--cc=oleg@redhat.com \
--cc=paul@paulmenage.org \
--cc=rjw@sisk.pl \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox