* [PATCH 0/2] sched/autogroup: use-after-free fixes
@ 2016-11-14 18:45 Oleg Nesterov
2016-11-14 18:46 ` [PATCH 1/2] sched/autogroup: autogroup_move_group() must never skip sched_move_task() Oleg Nesterov
2016-11-14 18:46 ` [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg Oleg Nesterov
0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2016-11-14 18:45 UTC (permalink / raw)
To: Ingo Molnar, Linus Torvalds, Mike Galbraith, Peter Zijlstra
Cc: hartsjc, vbendel, vlovejoy, linux-kernel
Two simple fixes for -stable.
Note that this logic was wrong from the very beginning and even if sysctl
doesn't change, see 2/2.
And I think it needs some cleanups in any case, but lets do later/separately.
Please review,
Oleg.
include/linux/sched.h | 2 ++
kernel/exit.c | 1 +
kernel/sched/auto_group.c | 36 ++++++++++++++++++++++++++++--------
3 files changed, 31 insertions(+), 8 deletions(-)
^ permalink raw reply [flat|nested] 6+ messages in thread
* [PATCH 1/2] sched/autogroup: autogroup_move_group() must never skip sched_move_task()
2016-11-14 18:45 [PATCH 0/2] sched/autogroup: use-after-free fixes Oleg Nesterov
@ 2016-11-14 18:46 ` Oleg Nesterov
2016-11-22 12:28 ` [tip:sched/urgent] sched/autogroup: Fix autogroup_move_group() to " tip-bot for Oleg Nesterov
2016-11-14 18:46 ` [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg Oleg Nesterov
1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2016-11-14 18:46 UTC (permalink / raw)
To: Ingo Molnar, Linus Torvalds, Mike Galbraith, Peter Zijlstra
Cc: hartsjc, vbendel, vlovejoy, linux-kernel
The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
it, but see the next patch.
However the comment is correct in that autogroup_move_group() must always
change task_group() for every thread so the sysctl_ check is very wrong;
we can race with cgroups and even sys_setsid() is not safe because a task
running with task_group() == ag->tg must participate in refcounting:
int main(void)
{
int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);
assert(sctl > 0);
if (fork()) {
wait(NULL); // destroy the child's ag/tg
pause();
}
assert(pwrite(sctl, "1\n", 2, 0) == 2);
assert(setsid() > 0);
if (fork())
pause();
kill(getppid(), SIGKILL);
sleep(1);
// The child has gone, the grandchild runs with kref == 1
assert(pwrite(sctl, "0\n", 2, 0) == 2);
assert(setsid() > 0);
// runs with the freed ag/tg
for (;;)
sleep(1);
return 0;
}
crashes the kernel. It doesn't really need sleep(1), it doesn't matter if
autogroup_move_group() actually frees the task_group or this happens later.
Reported-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
---
kernel/sched/auto_group.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index a5d966c..ad2b19a 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -111,14 +111,11 @@ bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
{
if (tg != &root_task_group)
return false;
-
/*
- * We can only assume the task group can't go away on us if
- * autogroup_move_group() can see us on ->thread_group list.
+ * If we race with autogroup_move_group() the caller can use the old
+ * value of signal->autogroup but in this case sched_move_task() will
+ * be called again before autogroup_kref_put().
*/
- if (p->flags & PF_EXITING)
- return false;
-
return true;
}
@@ -138,13 +135,17 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
}
p->signal->autogroup = autogroup_kref_get(ag);
-
- if (!READ_ONCE(sysctl_sched_autogroup_enabled))
- goto out;
-
+ /*
+ * We can't avoid sched_move_task() after we changed signal->autogroup,
+ * this process can already run with task_group() == prev->tg or we can
+ * race with cgroup code which can read autogroup = prev under rq->lock.
+ * In the latter case for_each_thread() can not miss a migrating thread,
+ * cpu_cgroup_attach() must not be possible after cgroup_exit() and it
+ * can't be removed from thread list, we hold ->siglock.
+ */
for_each_thread(p, t)
sched_move_task(t);
-out:
+
unlock_task_sighand(p, &flags);
autogroup_kref_put(prev);
}
--
2.5.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg
2016-11-14 18:45 [PATCH 0/2] sched/autogroup: use-after-free fixes Oleg Nesterov
2016-11-14 18:46 ` [PATCH 1/2] sched/autogroup: autogroup_move_group() must never skip sched_move_task() Oleg Nesterov
@ 2016-11-14 18:46 ` Oleg Nesterov
2016-11-14 18:58 ` Oleg Nesterov
2016-11-22 12:29 ` [tip:sched/urgent] sched/autogroup: Do not use autogroup->tg in zombie threads tip-bot for Oleg Nesterov
1 sibling, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2016-11-14 18:46 UTC (permalink / raw)
To: Ingo Molnar, Linus Torvalds, Mike Galbraith, Peter Zijlstra
Cc: hartsjc, vbendel, vlovejoy, linux-kernel
Exactly because for_each_thread() in autogroup_move_group() can't see it
and update its ->sched_task_group before _put() and possibly free().
So the exiting task needs another sched_move_task() before exit_notify()
and we need to re-introduce the PF_EXITING (or similar) check removed by
the previous change for another reason.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
---
include/linux/sched.h | 2 ++
kernel/exit.c | 1 +
kernel/sched/auto_group.c | 19 +++++++++++++++++++
3 files changed, 22 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..e9c009d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2567,6 +2567,7 @@ extern void sched_autogroup_create_attach(struct task_struct *p);
extern void sched_autogroup_detach(struct task_struct *p);
extern void sched_autogroup_fork(struct signal_struct *sig);
extern void sched_autogroup_exit(struct signal_struct *sig);
+extern void sched_autogroup_exit_task(struct task_struct *p);
#ifdef CONFIG_PROC_FS
extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
extern int proc_sched_autogroup_set_nice(struct task_struct *p, int nice);
@@ -2576,6 +2577,7 @@ static inline void sched_autogroup_create_attach(struct task_struct *p) { }
static inline void sched_autogroup_detach(struct task_struct *p) { }
static inline void sched_autogroup_fork(struct signal_struct *sig) { }
static inline void sched_autogroup_exit(struct signal_struct *sig) { }
+static inline void sched_autogroup_exit_task(struct task_struct *p) { }
#endif
extern int yield_to(struct task_struct *p, bool preempt);
diff --git a/kernel/exit.c b/kernel/exit.c
index f3dd46d..76e263e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -837,6 +837,7 @@ void __noreturn do_exit(long code)
*/
perf_event_exit_task(tsk);
+ sched_autogroup_exit_task(tsk);
cgroup_exit(tsk);
/*
diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index ad2b19a..f1c8fd5 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -115,10 +115,26 @@ bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
* If we race with autogroup_move_group() the caller can use the old
* value of signal->autogroup but in this case sched_move_task() will
* be called again before autogroup_kref_put().
+ *
+ * However, there is no way sched_autogroup_exit_task() could tell us
+ * to avoid autogroup->tg, so we abuse PF_EXITING flag for this case.
*/
+ if (p->flags & PF_EXITING)
+ return false;
+
return true;
}
+void sched_autogroup_exit_task(struct task_struct *p)
+{
+ /*
+ * We are going to call exit_notify() and autogroup_move_group() can't
+ * see this thread after that: we can no longer use signal->autogroup.
+ * See the PF_EXITING check in task_wants_autogroup().
+ */
+ sched_move_task(p);
+}
+
static void
autogroup_move_group(struct task_struct *p, struct autogroup *ag)
{
@@ -142,6 +158,9 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
* In the latter case for_each_thread() can not miss a migrating thread,
* cpu_cgroup_attach() must not be possible after cgroup_exit() and it
* can't be removed from thread list, we hold ->siglock.
+ *
+ * If an exiting thread was already removed from thread list we rely on
+ * sched_autogroup_exit_task().
*/
for_each_thread(p, t)
sched_move_task(t);
--
2.5.0
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg
2016-11-14 18:46 ` [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg Oleg Nesterov
@ 2016-11-14 18:58 ` Oleg Nesterov
2016-11-22 12:29 ` [tip:sched/urgent] sched/autogroup: Do not use autogroup->tg in zombie threads tip-bot for Oleg Nesterov
1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2016-11-14 18:58 UTC (permalink / raw)
To: Ingo Molnar, Linus Torvalds, Mike Galbraith, Peter Zijlstra
Cc: hartsjc, vbendel, vlovejoy, linux-kernel
On 11/14, Oleg Nesterov wrote:
>
> +void sched_autogroup_exit_task(struct task_struct *p)
> +{
> + /*
> + * We are going to call exit_notify() and autogroup_move_group() can't
> + * see this thread after that: we can no longer use signal->autogroup.
> + * See the PF_EXITING check in task_wants_autogroup().
> + */
> + sched_move_task(p);
We only need it if task_group_is_autogroup(task_group(p)) but then this code
will need even more comments, lets keep it simple.
Oleg.
^ permalink raw reply [flat|nested] 6+ messages in thread
* [tip:sched/urgent] sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
2016-11-14 18:46 ` [PATCH 1/2] sched/autogroup: autogroup_move_group() must never skip sched_move_task() Oleg Nesterov
@ 2016-11-22 12:28 ` tip-bot for Oleg Nesterov
0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Oleg Nesterov @ 2016-11-22 12:28 UTC (permalink / raw)
To: linux-tip-commits
Cc: mingo, peterz, linux-kernel, vlovejoy, tglx, oleg, efault,
torvalds, hpa
Commit-ID: 18f649ef344127ef6de23a5a4272dbe2fdb73dde
Gitweb: http://git.kernel.org/tip/18f649ef344127ef6de23a5a4272dbe2fdb73dde
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Mon, 14 Nov 2016 19:46:09 +0100
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 22 Nov 2016 12:33:42 +0100
sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
it, but see the next patch.
However the comment is correct in that autogroup_move_group() must always
change task_group() for every thread so the sysctl_ check is very wrong;
we can race with cgroups and even sys_setsid() is not safe because a task
running with task_group() == ag->tg must participate in refcounting:
int main(void)
{
int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);
assert(sctl > 0);
if (fork()) {
wait(NULL); // destroy the child's ag/tg
pause();
}
assert(pwrite(sctl, "1\n", 2, 0) == 2);
assert(setsid() > 0);
if (fork())
pause();
kill(getppid(), SIGKILL);
sleep(1);
// The child has gone, the grandchild runs with kref == 1
assert(pwrite(sctl, "0\n", 2, 0) == 2);
assert(setsid() > 0);
// runs with the freed ag/tg
for (;;)
sleep(1);
return 0;
}
crashes the kernel. It doesn't really need sleep(1), it doesn't matter if
autogroup_move_group() actually frees the task_group or this happens later.
Reported-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hartsjc@redhat.com
Cc: vbendel@redhat.com
Link: http://lkml.kernel.org/r/20161114184609.GA15965@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
kernel/sched/auto_group.c | 23 ++++++++++++-----------
1 file changed, 12 insertions(+), 11 deletions(-)
diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index a5d966c..ad2b19a 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -111,14 +111,11 @@ bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
{
if (tg != &root_task_group)
return false;
-
/*
- * We can only assume the task group can't go away on us if
- * autogroup_move_group() can see us on ->thread_group list.
+ * If we race with autogroup_move_group() the caller can use the old
+ * value of signal->autogroup but in this case sched_move_task() will
+ * be called again before autogroup_kref_put().
*/
- if (p->flags & PF_EXITING)
- return false;
-
return true;
}
@@ -138,13 +135,17 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
}
p->signal->autogroup = autogroup_kref_get(ag);
-
- if (!READ_ONCE(sysctl_sched_autogroup_enabled))
- goto out;
-
+ /*
+ * We can't avoid sched_move_task() after we changed signal->autogroup,
+ * this process can already run with task_group() == prev->tg or we can
+ * race with cgroup code which can read autogroup = prev under rq->lock.
+ * In the latter case for_each_thread() can not miss a migrating thread,
+ * cpu_cgroup_attach() must not be possible after cgroup_exit() and it
+ * can't be removed from thread list, we hold ->siglock.
+ */
for_each_thread(p, t)
sched_move_task(t);
-out:
+
unlock_task_sighand(p, &flags);
autogroup_kref_put(prev);
}
^ permalink raw reply related [flat|nested] 6+ messages in thread
* [tip:sched/urgent] sched/autogroup: Do not use autogroup->tg in zombie threads
2016-11-14 18:46 ` [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg Oleg Nesterov
2016-11-14 18:58 ` Oleg Nesterov
@ 2016-11-22 12:29 ` tip-bot for Oleg Nesterov
1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Oleg Nesterov @ 2016-11-22 12:29 UTC (permalink / raw)
To: linux-tip-commits
Cc: efault, tglx, oleg, mingo, hpa, peterz, torvalds, linux-kernel
Commit-ID: 8e5bfa8c1f8471aa4a2d30be631ef2b50e10abaf
Gitweb: http://git.kernel.org/tip/8e5bfa8c1f8471aa4a2d30be631ef2b50e10abaf
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Mon, 14 Nov 2016 19:46:12 +0100
Committer: Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 22 Nov 2016 12:33:43 +0100
sched/autogroup: Do not use autogroup->tg in zombie threads
Exactly because for_each_thread() in autogroup_move_group() can't see it
and update its ->sched_task_group before _put() and possibly free().
So the exiting task needs another sched_move_task() before exit_notify()
and we need to re-introduce the PF_EXITING (or similar) check removed by
the previous change for another reason.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hartsjc@redhat.com
Cc: vbendel@redhat.com
Cc: vlovejoy@redhat.com
Link: http://lkml.kernel.org/r/20161114184612.GA15968@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
include/linux/sched.h | 2 ++
kernel/exit.c | 1 +
kernel/sched/auto_group.c | 19 +++++++++++++++++++
3 files changed, 22 insertions(+)
diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..e9c009d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2567,6 +2567,7 @@ extern void sched_autogroup_create_attach(struct task_struct *p);
extern void sched_autogroup_detach(struct task_struct *p);
extern void sched_autogroup_fork(struct signal_struct *sig);
extern void sched_autogroup_exit(struct signal_struct *sig);
+extern void sched_autogroup_exit_task(struct task_struct *p);
#ifdef CONFIG_PROC_FS
extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
extern int proc_sched_autogroup_set_nice(struct task_struct *p, int nice);
@@ -2576,6 +2577,7 @@ static inline void sched_autogroup_create_attach(struct task_struct *p) { }
static inline void sched_autogroup_detach(struct task_struct *p) { }
static inline void sched_autogroup_fork(struct signal_struct *sig) { }
static inline void sched_autogroup_exit(struct signal_struct *sig) { }
+static inline void sched_autogroup_exit_task(struct task_struct *p) { }
#endif
extern int yield_to(struct task_struct *p, bool preempt);
diff --git a/kernel/exit.c b/kernel/exit.c
index 9d68c45..3076f30 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -836,6 +836,7 @@ void __noreturn do_exit(long code)
*/
perf_event_exit_task(tsk);
+ sched_autogroup_exit_task(tsk);
cgroup_exit(tsk);
/*
diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index ad2b19a..f1c8fd5 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -115,10 +115,26 @@ bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
* If we race with autogroup_move_group() the caller can use the old
* value of signal->autogroup but in this case sched_move_task() will
* be called again before autogroup_kref_put().
+ *
+ * However, there is no way sched_autogroup_exit_task() could tell us
+ * to avoid autogroup->tg, so we abuse PF_EXITING flag for this case.
*/
+ if (p->flags & PF_EXITING)
+ return false;
+
return true;
}
+void sched_autogroup_exit_task(struct task_struct *p)
+{
+ /*
+ * We are going to call exit_notify() and autogroup_move_group() can't
+ * see this thread after that: we can no longer use signal->autogroup.
+ * See the PF_EXITING check in task_wants_autogroup().
+ */
+ sched_move_task(p);
+}
+
static void
autogroup_move_group(struct task_struct *p, struct autogroup *ag)
{
@@ -142,6 +158,9 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
* In the latter case for_each_thread() can not miss a migrating thread,
* cpu_cgroup_attach() must not be possible after cgroup_exit() and it
* can't be removed from thread list, we hold ->siglock.
+ *
+ * If an exiting thread was already removed from thread list we rely on
+ * sched_autogroup_exit_task().
*/
for_each_thread(p, t)
sched_move_task(t);
^ permalink raw reply related [flat|nested] 6+ messages in thread
end of thread, other threads:[~2016-11-22 12:31 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-14 18:45 [PATCH 0/2] sched/autogroup: use-after-free fixes Oleg Nesterov
2016-11-14 18:46 ` [PATCH 1/2] sched/autogroup: autogroup_move_group() must never skip sched_move_task() Oleg Nesterov
2016-11-22 12:28 ` [tip:sched/urgent] sched/autogroup: Fix autogroup_move_group() to " tip-bot for Oleg Nesterov
2016-11-14 18:46 ` [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg Oleg Nesterov
2016-11-14 18:58 ` Oleg Nesterov
2016-11-22 12:29 ` [tip:sched/urgent] sched/autogroup: Do not use autogroup->tg in zombie threads tip-bot for Oleg Nesterov
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).