From: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
To: Andrew Morton <akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
Cc: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>,
Dave Jones <davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>,
Linux Kernel
<linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org>,
cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org,
Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Subject: [PATCH] do not abuse ->cred_guard_mutex in threadgroup_lock()
Date: Thu, 21 Mar 2013 17:21:38 +0100 [thread overview]
Message-ID: <20130321162138.GA21859@redhat.com> (raw)
In-Reply-To: <20130309200046.GA8149-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable. This doesn't look nice, the scope
of this lock in do_execve() is huge.
And as Dave pointed out this can lead to deadlock, we have the
following dependencies:
do_execve: cred_guard_mutex -> i_mutex
cgroup_mount: i_mutex -> cgroup_mutex
attach_task_by_pid: cgroup_mutex -> cred_guard_mutex
Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.
Note that de_thread() can't sleep with ->group_rwsem held, this
can obviously deadlock with the exiting leader if the writer is
active, so it does threadgroup_change_end() before schedule().
Reported-by: Dave Jones <davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
Acked-by: Tejun Heo <tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org>
Acked-by: Li Zefan <lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
Signed-off-by: Oleg Nesterov <oleg-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
fs/exec.c | 3 +++
include/linux/sched.h | 18 ++++--------------
2 files changed, 7 insertions(+), 14 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 20df02c..bea2f7d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -898,11 +898,13 @@ static int de_thread(struct task_struct *tsk)
sig->notify_count = -1; /* for exit_notify() */
for (;;) {
+ threadgroup_change_begin(tsk);
write_lock_irq(&tasklist_lock);
if (likely(leader->exit_state))
break;
__set_current_state(TASK_KILLABLE);
write_unlock_irq(&tasklist_lock);
+ threadgroup_change_end(tsk);
schedule();
if (unlikely(__fatal_signal_pending(tsk)))
goto killed;
@@ -960,6 +962,7 @@ static int de_thread(struct task_struct *tsk)
if (unlikely(leader->ptrace))
__wake_up_parent(leader, leader->parent);
write_unlock_irq(&tasklist_lock);
+ threadgroup_change_end(tsk);
release_task(leader);
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 932a90c..67cfdb5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2486,27 +2486,18 @@ static inline void threadgroup_change_end(struct task_struct *tsk)
*
* Lock the threadgroup @tsk belongs to. No new task is allowed to enter
* and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
- * perform exec. This is useful for cases where the threadgroup needs to
- * stay stable across blockable operations.
+ * change ->group_leader/pid. This is useful for cases where the threadgroup
+ * needs to stay stable across blockable operations.
*
* fork and exit paths explicitly call threadgroup_change_{begin|end}() for
* synchronization. While held, no new task will be added to threadgroup
* and no existing live task will have its PF_EXITING set.
*
- * During exec, a task goes and puts its thread group through unusual
- * changes. After de-threading, exclusive access is assumed to resources
- * which are usually shared by tasks in the same group - e.g. sighand may
- * be replaced with a new one. Also, the exec'ing task takes over group
- * leader role including its pid. Exclude these changes while locked by
- * grabbing cred_guard_mutex which is used to synchronize exec path.
+ * de_thread() does threadgroup_change_{begin|end}() when a non-leader
+ * sub-thread becomes a new leader.
*/
static inline void threadgroup_lock(struct task_struct *tsk)
{
- /*
- * exec uses exit for de-threading nesting group_rwsem inside
- * cred_guard_mutex. Grab cred_guard_mutex first.
- */
- mutex_lock(&tsk->signal->cred_guard_mutex);
down_write(&tsk->signal->group_rwsem);
}
@@ -2519,7 +2510,6 @@ static inline void threadgroup_lock(struct task_struct *tsk)
static inline void threadgroup_unlock(struct task_struct *tsk)
{
up_write(&tsk->signal->group_rwsem);
- mutex_unlock(&tsk->signal->cred_guard_mutex);
}
#else
static inline void threadgroup_change_begin(struct task_struct *tsk) {}
--
1.5.5.1
WARNING: multiple messages have this Message-ID (diff)
From: Oleg Nesterov <oleg@redhat.com>
To: Andrew Morton <akpm@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>, Dave Jones <davej@redhat.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
cgroups@vger.kernel.org, Li Zefan <lizefan@huawei.com>
Subject: [PATCH] do not abuse ->cred_guard_mutex in threadgroup_lock()
Date: Thu, 21 Mar 2013 17:21:38 +0100 [thread overview]
Message-ID: <20130321162138.GA21859@redhat.com> (raw)
In-Reply-To: <20130309200046.GA8149@redhat.com>
threadgroup_lock() takes signal->cred_guard_mutex to ensure that
thread_group_leader() is stable. This doesn't look nice, the scope
of this lock in do_execve() is huge.
And as Dave pointed out this can lead to deadlock, we have the
following dependencies:
do_execve: cred_guard_mutex -> i_mutex
cgroup_mount: i_mutex -> cgroup_mutex
attach_task_by_pid: cgroup_mutex -> cred_guard_mutex
Change de_thread() to take threadgroup_change_begin() around the
switch-the-leader code and change threadgroup_lock() to avoid
->cred_guard_mutex.
Note that de_thread() can't sleep with ->group_rwsem held, this
can obviously deadlock with the exiting leader if the writer is
active, so it does threadgroup_change_end() before schedule().
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Tejun Heo <tj@kernel.org>
Acked-by: Li Zefan <lizefan@huawei.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
fs/exec.c | 3 +++
include/linux/sched.h | 18 ++++--------------
2 files changed, 7 insertions(+), 14 deletions(-)
diff --git a/fs/exec.c b/fs/exec.c
index 20df02c..bea2f7d 100644
--- a/fs/exec.c
+++ b/fs/exec.c
@@ -898,11 +898,13 @@ static int de_thread(struct task_struct *tsk)
sig->notify_count = -1; /* for exit_notify() */
for (;;) {
+ threadgroup_change_begin(tsk);
write_lock_irq(&tasklist_lock);
if (likely(leader->exit_state))
break;
__set_current_state(TASK_KILLABLE);
write_unlock_irq(&tasklist_lock);
+ threadgroup_change_end(tsk);
schedule();
if (unlikely(__fatal_signal_pending(tsk)))
goto killed;
@@ -960,6 +962,7 @@ static int de_thread(struct task_struct *tsk)
if (unlikely(leader->ptrace))
__wake_up_parent(leader, leader->parent);
write_unlock_irq(&tasklist_lock);
+ threadgroup_change_end(tsk);
release_task(leader);
}
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 932a90c..67cfdb5 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2486,27 +2486,18 @@ static inline void threadgroup_change_end(struct task_struct *tsk)
*
* Lock the threadgroup @tsk belongs to. No new task is allowed to enter
* and member tasks aren't allowed to exit (as indicated by PF_EXITING) or
- * perform exec. This is useful for cases where the threadgroup needs to
- * stay stable across blockable operations.
+ * change ->group_leader/pid. This is useful for cases where the threadgroup
+ * needs to stay stable across blockable operations.
*
* fork and exit paths explicitly call threadgroup_change_{begin|end}() for
* synchronization. While held, no new task will be added to threadgroup
* and no existing live task will have its PF_EXITING set.
*
- * During exec, a task goes and puts its thread group through unusual
- * changes. After de-threading, exclusive access is assumed to resources
- * which are usually shared by tasks in the same group - e.g. sighand may
- * be replaced with a new one. Also, the exec'ing task takes over group
- * leader role including its pid. Exclude these changes while locked by
- * grabbing cred_guard_mutex which is used to synchronize exec path.
+ * de_thread() does threadgroup_change_{begin|end}() when a non-leader
+ * sub-thread becomes a new leader.
*/
static inline void threadgroup_lock(struct task_struct *tsk)
{
- /*
- * exec uses exit for de-threading nesting group_rwsem inside
- * cred_guard_mutex. Grab cred_guard_mutex first.
- */
- mutex_lock(&tsk->signal->cred_guard_mutex);
down_write(&tsk->signal->group_rwsem);
}
@@ -2519,7 +2510,6 @@ static inline void threadgroup_lock(struct task_struct *tsk)
static inline void threadgroup_unlock(struct task_struct *tsk)
{
up_write(&tsk->signal->group_rwsem);
- mutex_unlock(&tsk->signal->cred_guard_mutex);
}
#else
static inline void threadgroup_change_begin(struct task_struct *tsk) {}
--
1.5.5.1
next prev parent reply other threads:[~2013-03-21 16:21 UTC|newest]
Thread overview: 45+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-03-06 22:36 lockdep trace from prepare_bprm_creds Dave Jones
[not found] ` <20130306223657.GA7392-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-07 17:25 ` Oleg Nesterov
2013-03-07 17:25 ` Oleg Nesterov
2013-03-07 18:01 ` Tejun Heo
[not found] ` <20130307180139.GD29601-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-03-07 18:03 ` Tejun Heo
2013-03-07 18:03 ` Tejun Heo
[not found] ` <20130307180332.GE29601-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-03-07 19:12 ` Oleg Nesterov
2013-03-07 19:12 ` Oleg Nesterov
[not found] ` <20130307191242.GA18265-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-07 19:38 ` Tejun Heo
2013-03-07 19:38 ` Tejun Heo
[not found] ` <20130307193820.GB3209-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-03-09 2:11 ` Li Zefan
2013-03-09 2:11 ` Li Zefan
[not found] ` <513A9A67.60909-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-03-09 3:29 ` Tejun Heo
2013-03-09 3:29 ` Tejun Heo
[not found] ` <20130309032936.GT14556-9pTldWuhBndy/B6EtB590w@public.gmane.org>
2013-03-09 7:47 ` Li Zefan
2013-03-09 7:47 ` Li Zefan
[not found] ` <513AE918.7020704-hv44wF8Li93QT0dZR+AlfA@public.gmane.org>
2013-03-09 20:00 ` [PATCH 0/1] do not abuse ->cred_guard_mutex in threadgroup_lock() Oleg Nesterov
2013-03-09 20:00 ` Oleg Nesterov
2013-03-09 20:01 ` [PATCH 1/1] " Oleg Nesterov
[not found] ` <20130309200106.GB8149-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-09 20:15 ` Tejun Heo
2013-03-09 20:15 ` Tejun Heo
2013-03-11 1:50 ` Li Zefan
2013-03-11 1:50 ` Li Zefan
[not found] ` <20130309200046.GA8149-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-21 16:21 ` Oleg Nesterov [this message]
2013-03-21 16:21 ` [PATCH] " Oleg Nesterov
[not found] ` <20130321162138.GA21859-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-21 22:06 ` Andrew Morton
2013-03-21 22:06 ` Andrew Morton
[not found] ` <20130321150626.a7934d989fb80d835fa92255-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org>
2013-03-22 13:20 ` Oleg Nesterov
2013-03-22 13:20 ` Oleg Nesterov
2013-03-19 22:02 ` [PATCH cgroup/for-3.10] cgroup: make cgroup_mutex outer to threadgroup_lock Tejun Heo
[not found] ` <20130319220246.GR3042-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-03-20 0:58 ` Li Zefan
2013-03-20 0:58 ` Li Zefan
2013-03-20 15:03 ` Tejun Heo
[not found] ` <20130320150351.GW3042-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-03-20 18:35 ` Oleg Nesterov
2013-03-20 18:35 ` Oleg Nesterov
[not found] ` <20130320183523.GA29365-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-20 18:42 ` Tejun Heo
2013-03-20 18:42 ` Tejun Heo
[not found] ` <CAOS58YPxGXt+iq1GZ4hryqm1Z_p+r7eRRC7ruUDDd=LQrWtAxg-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2013-03-21 16:17 ` Oleg Nesterov
2013-03-21 16:17 ` Oleg Nesterov
2013-03-07 18:21 ` lockdep trace from prepare_bprm_creds Tejun Heo
2013-03-07 18:21 ` Tejun Heo
[not found] ` <20130307182140.GF29601-Gd/HAXX7CRxy/B6EtB590w@public.gmane.org>
2013-03-07 18:32 ` Oleg Nesterov
2013-03-07 18:32 ` Oleg Nesterov
[not found] ` <20130307183213.GA18022-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2013-03-07 19:33 ` Tejun Heo
2013-03-07 19:33 ` Tejun Heo
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20130321162138.GA21859@redhat.com \
--to=oleg-h+wxahxf7alqt0dzr+alfa@public.gmane.org \
--cc=akpm-de/tnXTf+JLsfHDXvbKv3WD2FQJk+8+b@public.gmane.org \
--cc=cgroups-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=davej-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org \
--cc=linux-kernel-u79uwXL29TY76Z2rM5mHXA@public.gmane.org \
--cc=lizefan-hv44wF8Li93QT0dZR+AlfA@public.gmane.org \
--cc=tj-DgEjT+Ai2ygdnm+yROfE0A@public.gmane.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.