linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] sched/autogroup: use-after-free fixes
@ 2016-11-14 18:45 Oleg Nesterov
  2016-11-14 18:46 ` [PATCH 1/2] sched/autogroup: autogroup_move_group() must never skip sched_move_task() Oleg Nesterov
  2016-11-14 18:46 ` [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg Oleg Nesterov
  0 siblings, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2016-11-14 18:45 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, Mike Galbraith, Peter Zijlstra
  Cc: hartsjc, vbendel, vlovejoy, linux-kernel

Two simple fixes for -stable.

Note that this logic was wrong from the very beginning and even if sysctl
doesn't change, see 2/2.

And I think it needs some cleanups in any case, but lets do later/separately.

Please review,

Oleg.

 include/linux/sched.h     |  2 ++
 kernel/exit.c             |  1 +
 kernel/sched/auto_group.c | 36 ++++++++++++++++++++++++++++--------
 3 files changed, 31 insertions(+), 8 deletions(-)

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH 1/2] sched/autogroup: autogroup_move_group() must never skip sched_move_task()
  2016-11-14 18:45 [PATCH 0/2] sched/autogroup: use-after-free fixes Oleg Nesterov
@ 2016-11-14 18:46 ` Oleg Nesterov
  2016-11-22 12:28   ` [tip:sched/urgent] sched/autogroup: Fix autogroup_move_group() to " tip-bot for Oleg Nesterov
  2016-11-14 18:46 ` [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg Oleg Nesterov
  1 sibling, 1 reply; 6+ messages in thread
From: Oleg Nesterov @ 2016-11-14 18:46 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, Mike Galbraith, Peter Zijlstra
  Cc: hartsjc, vbendel, vlovejoy, linux-kernel

The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
it, but see the next patch.

However the comment is correct in that autogroup_move_group() must always
change task_group() for every thread so the sysctl_ check is very wrong;
we can race with cgroups and even sys_setsid() is not safe because a task
running with task_group() == ag->tg must participate in refcounting:

	int main(void)
	{
		int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);

		assert(sctl > 0);
		if (fork()) {
			wait(NULL); // destroy the child's ag/tg
			pause();
		}

		assert(pwrite(sctl, "1\n", 2, 0) == 2);
		assert(setsid() > 0);
		if (fork())
			pause();

		kill(getppid(), SIGKILL);
		sleep(1);

		// The child has gone, the grandchild runs with kref == 1
		assert(pwrite(sctl, "0\n", 2, 0) == 2);
		assert(setsid() > 0);

		// runs with the freed ag/tg
		for (;;)
			sleep(1);

		return 0;
	}

crashes the kernel. It doesn't really need sleep(1), it doesn't matter if
autogroup_move_group() actually frees the task_group or this happens later.

Reported-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
---
 kernel/sched/auto_group.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index a5d966c..ad2b19a 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -111,14 +111,11 @@ bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
 {
 	if (tg != &root_task_group)
 		return false;
-
 	/*
-	 * We can only assume the task group can't go away on us if
-	 * autogroup_move_group() can see us on ->thread_group list.
+	 * If we race with autogroup_move_group() the caller can use the old
+	 * value of signal->autogroup but in this case sched_move_task() will
+	 * be called again before autogroup_kref_put().
 	 */
-	if (p->flags & PF_EXITING)
-		return false;
-
 	return true;
 }
 
@@ -138,13 +135,17 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
 	}
 
 	p->signal->autogroup = autogroup_kref_get(ag);
-
-	if (!READ_ONCE(sysctl_sched_autogroup_enabled))
-		goto out;
-
+	/*
+	 * We can't avoid sched_move_task() after we changed signal->autogroup,
+	 * this process can already run with task_group() == prev->tg or we can
+	 * race with cgroup code which can read autogroup = prev under rq->lock.
+	 * In the latter case for_each_thread() can not miss a migrating thread,
+	 * cpu_cgroup_attach() must not be possible after cgroup_exit() and it
+	 * can't be removed from thread list, we hold ->siglock.
+	 */
 	for_each_thread(p, t)
 		sched_move_task(t);
-out:
+
 	unlock_task_sighand(p, &flags);
 	autogroup_kref_put(prev);
 }
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg
  2016-11-14 18:45 [PATCH 0/2] sched/autogroup: use-after-free fixes Oleg Nesterov
  2016-11-14 18:46 ` [PATCH 1/2] sched/autogroup: autogroup_move_group() must never skip sched_move_task() Oleg Nesterov
@ 2016-11-14 18:46 ` Oleg Nesterov
  2016-11-14 18:58   ` Oleg Nesterov
  2016-11-22 12:29   ` [tip:sched/urgent] sched/autogroup: Do not use autogroup->tg in zombie threads tip-bot for Oleg Nesterov
  1 sibling, 2 replies; 6+ messages in thread
From: Oleg Nesterov @ 2016-11-14 18:46 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, Mike Galbraith, Peter Zijlstra
  Cc: hartsjc, vbendel, vlovejoy, linux-kernel

Exactly because for_each_thread() in autogroup_move_group() can't see it
and update its ->sched_task_group before _put() and possibly free().

So the exiting task needs another sched_move_task() before exit_notify()
and we need to re-introduce the PF_EXITING (or similar) check removed by
the previous change for another reason.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Cc: stable@vger.kernel.org
---
 include/linux/sched.h     |  2 ++
 kernel/exit.c             |  1 +
 kernel/sched/auto_group.c | 19 +++++++++++++++++++
 3 files changed, 22 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..e9c009d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2567,6 +2567,7 @@ extern void sched_autogroup_create_attach(struct task_struct *p);
 extern void sched_autogroup_detach(struct task_struct *p);
 extern void sched_autogroup_fork(struct signal_struct *sig);
 extern void sched_autogroup_exit(struct signal_struct *sig);
+extern void sched_autogroup_exit_task(struct task_struct *p);
 #ifdef CONFIG_PROC_FS
 extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
 extern int proc_sched_autogroup_set_nice(struct task_struct *p, int nice);
@@ -2576,6 +2577,7 @@ static inline void sched_autogroup_create_attach(struct task_struct *p) { }
 static inline void sched_autogroup_detach(struct task_struct *p) { }
 static inline void sched_autogroup_fork(struct signal_struct *sig) { }
 static inline void sched_autogroup_exit(struct signal_struct *sig) { }
+static inline void sched_autogroup_exit_task(struct task_struct *p) { }
 #endif
 
 extern int yield_to(struct task_struct *p, bool preempt);
diff --git a/kernel/exit.c b/kernel/exit.c
index f3dd46d..76e263e 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -837,6 +837,7 @@ void __noreturn do_exit(long code)
 	 */
 	perf_event_exit_task(tsk);
 
+	sched_autogroup_exit_task(tsk);
 	cgroup_exit(tsk);
 
 	/*
diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index ad2b19a..f1c8fd5 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -115,10 +115,26 @@ bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
 	 * If we race with autogroup_move_group() the caller can use the old
 	 * value of signal->autogroup but in this case sched_move_task() will
 	 * be called again before autogroup_kref_put().
+	 *
+	 * However, there is no way sched_autogroup_exit_task() could tell us
+	 * to avoid autogroup->tg, so we abuse PF_EXITING flag for this case.
 	 */
+	if (p->flags & PF_EXITING)
+		return false;
+
 	return true;
 }
 
+void sched_autogroup_exit_task(struct task_struct *p)
+{
+	/*
+	 * We are going to call exit_notify() and autogroup_move_group() can't
+	 * see this thread after that: we can no longer use signal->autogroup.
+	 * See the PF_EXITING check in task_wants_autogroup().
+	 */
+	sched_move_task(p);
+}
+
 static void
 autogroup_move_group(struct task_struct *p, struct autogroup *ag)
 {
@@ -142,6 +158,9 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
 	 * In the latter case for_each_thread() can not miss a migrating thread,
 	 * cpu_cgroup_attach() must not be possible after cgroup_exit() and it
 	 * can't be removed from thread list, we hold ->siglock.
+	 *
+	 * If an exiting thread was already removed from thread list we rely on
+	 * sched_autogroup_exit_task().
 	 */
 	for_each_thread(p, t)
 		sched_move_task(t);
-- 
2.5.0

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg
  2016-11-14 18:46 ` [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg Oleg Nesterov
@ 2016-11-14 18:58   ` Oleg Nesterov
  2016-11-22 12:29   ` [tip:sched/urgent] sched/autogroup: Do not use autogroup->tg in zombie threads tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: Oleg Nesterov @ 2016-11-14 18:58 UTC (permalink / raw)
  To: Ingo Molnar, Linus Torvalds, Mike Galbraith, Peter Zijlstra
  Cc: hartsjc, vbendel, vlovejoy, linux-kernel

On 11/14, Oleg Nesterov wrote:
>
> +void sched_autogroup_exit_task(struct task_struct *p)
> +{
> +	/*
> +	 * We are going to call exit_notify() and autogroup_move_group() can't
> +	 * see this thread after that: we can no longer use signal->autogroup.
> +	 * See the PF_EXITING check in task_wants_autogroup().
> +	 */
> +	sched_move_task(p);

We only need it if task_group_is_autogroup(task_group(p)) but then this code
will need even more comments, lets keep it simple.

Oleg.

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [tip:sched/urgent] sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()
  2016-11-14 18:46 ` [PATCH 1/2] sched/autogroup: autogroup_move_group() must never skip sched_move_task() Oleg Nesterov
@ 2016-11-22 12:28   ` tip-bot for Oleg Nesterov
  0 siblings, 0 replies; 6+ messages in thread
From: tip-bot for Oleg Nesterov @ 2016-11-22 12:28 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: mingo, peterz, linux-kernel, vlovejoy, tglx, oleg, efault,
	torvalds, hpa

Commit-ID:  18f649ef344127ef6de23a5a4272dbe2fdb73dde
Gitweb:     http://git.kernel.org/tip/18f649ef344127ef6de23a5a4272dbe2fdb73dde
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Mon, 14 Nov 2016 19:46:09 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 22 Nov 2016 12:33:42 +0100

sched/autogroup: Fix autogroup_move_group() to never skip sched_move_task()

The PF_EXITING check in task_wants_autogroup() is no longer needed. Remove
it, but see the next patch.

However the comment is correct in that autogroup_move_group() must always
change task_group() for every thread so the sysctl_ check is very wrong;
we can race with cgroups and even sys_setsid() is not safe because a task
running with task_group() == ag->tg must participate in refcounting:

	int main(void)
	{
		int sctl = open("/proc/sys/kernel/sched_autogroup_enabled", O_WRONLY);

		assert(sctl > 0);
		if (fork()) {
			wait(NULL); // destroy the child's ag/tg
			pause();
		}

		assert(pwrite(sctl, "1\n", 2, 0) == 2);
		assert(setsid() > 0);
		if (fork())
			pause();

		kill(getppid(), SIGKILL);
		sleep(1);

		// The child has gone, the grandchild runs with kref == 1
		assert(pwrite(sctl, "0\n", 2, 0) == 2);
		assert(setsid() > 0);

		// runs with the freed ag/tg
		for (;;)
			sleep(1);

		return 0;
	}

crashes the kernel. It doesn't really need sleep(1), it doesn't matter if
autogroup_move_group() actually frees the task_group or this happens later.

Reported-by: Vern Lovejoy <vlovejoy@redhat.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hartsjc@redhat.com
Cc: vbendel@redhat.com
Link: http://lkml.kernel.org/r/20161114184609.GA15965@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 kernel/sched/auto_group.c | 23 ++++++++++++-----------
 1 file changed, 12 insertions(+), 11 deletions(-)

diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index a5d966c..ad2b19a 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -111,14 +111,11 @@ bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
 {
 	if (tg != &root_task_group)
 		return false;
-
 	/*
-	 * We can only assume the task group can't go away on us if
-	 * autogroup_move_group() can see us on ->thread_group list.
+	 * If we race with autogroup_move_group() the caller can use the old
+	 * value of signal->autogroup but in this case sched_move_task() will
+	 * be called again before autogroup_kref_put().
 	 */
-	if (p->flags & PF_EXITING)
-		return false;
-
 	return true;
 }
 
@@ -138,13 +135,17 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
 	}
 
 	p->signal->autogroup = autogroup_kref_get(ag);
-
-	if (!READ_ONCE(sysctl_sched_autogroup_enabled))
-		goto out;
-
+	/*
+	 * We can't avoid sched_move_task() after we changed signal->autogroup,
+	 * this process can already run with task_group() == prev->tg or we can
+	 * race with cgroup code which can read autogroup = prev under rq->lock.
+	 * In the latter case for_each_thread() can not miss a migrating thread,
+	 * cpu_cgroup_attach() must not be possible after cgroup_exit() and it
+	 * can't be removed from thread list, we hold ->siglock.
+	 */
 	for_each_thread(p, t)
 		sched_move_task(t);
-out:
+
 	unlock_task_sighand(p, &flags);
 	autogroup_kref_put(prev);
 }

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* [tip:sched/urgent] sched/autogroup: Do not use autogroup->tg in zombie threads
  2016-11-14 18:46 ` [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg Oleg Nesterov
  2016-11-14 18:58   ` Oleg Nesterov
@ 2016-11-22 12:29   ` tip-bot for Oleg Nesterov
  1 sibling, 0 replies; 6+ messages in thread
From: tip-bot for Oleg Nesterov @ 2016-11-22 12:29 UTC (permalink / raw)
  To: linux-tip-commits
  Cc: efault, tglx, oleg, mingo, hpa, peterz, torvalds, linux-kernel

Commit-ID:  8e5bfa8c1f8471aa4a2d30be631ef2b50e10abaf
Gitweb:     http://git.kernel.org/tip/8e5bfa8c1f8471aa4a2d30be631ef2b50e10abaf
Author:     Oleg Nesterov <oleg@redhat.com>
AuthorDate: Mon, 14 Nov 2016 19:46:12 +0100
Committer:  Ingo Molnar <mingo@kernel.org>
CommitDate: Tue, 22 Nov 2016 12:33:43 +0100

sched/autogroup: Do not use autogroup->tg in zombie threads

Exactly because for_each_thread() in autogroup_move_group() can't see it
and update its ->sched_task_group before _put() and possibly free().

So the exiting task needs another sched_move_task() before exit_notify()
and we need to re-introduce the PF_EXITING (or similar) check removed by
the previous change for another reason.

Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: hartsjc@redhat.com
Cc: vbendel@redhat.com
Cc: vlovejoy@redhat.com
Link: http://lkml.kernel.org/r/20161114184612.GA15968@redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
---
 include/linux/sched.h     |  2 ++
 kernel/exit.c             |  1 +
 kernel/sched/auto_group.c | 19 +++++++++++++++++++
 3 files changed, 22 insertions(+)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 348f51b..e9c009d 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -2567,6 +2567,7 @@ extern void sched_autogroup_create_attach(struct task_struct *p);
 extern void sched_autogroup_detach(struct task_struct *p);
 extern void sched_autogroup_fork(struct signal_struct *sig);
 extern void sched_autogroup_exit(struct signal_struct *sig);
+extern void sched_autogroup_exit_task(struct task_struct *p);
 #ifdef CONFIG_PROC_FS
 extern void proc_sched_autogroup_show_task(struct task_struct *p, struct seq_file *m);
 extern int proc_sched_autogroup_set_nice(struct task_struct *p, int nice);
@@ -2576,6 +2577,7 @@ static inline void sched_autogroup_create_attach(struct task_struct *p) { }
 static inline void sched_autogroup_detach(struct task_struct *p) { }
 static inline void sched_autogroup_fork(struct signal_struct *sig) { }
 static inline void sched_autogroup_exit(struct signal_struct *sig) { }
+static inline void sched_autogroup_exit_task(struct task_struct *p) { }
 #endif
 
 extern int yield_to(struct task_struct *p, bool preempt);
diff --git a/kernel/exit.c b/kernel/exit.c
index 9d68c45..3076f30 100644
--- a/kernel/exit.c
+++ b/kernel/exit.c
@@ -836,6 +836,7 @@ void __noreturn do_exit(long code)
 	 */
 	perf_event_exit_task(tsk);
 
+	sched_autogroup_exit_task(tsk);
 	cgroup_exit(tsk);
 
 	/*
diff --git a/kernel/sched/auto_group.c b/kernel/sched/auto_group.c
index ad2b19a..f1c8fd5 100644
--- a/kernel/sched/auto_group.c
+++ b/kernel/sched/auto_group.c
@@ -115,10 +115,26 @@ bool task_wants_autogroup(struct task_struct *p, struct task_group *tg)
 	 * If we race with autogroup_move_group() the caller can use the old
 	 * value of signal->autogroup but in this case sched_move_task() will
 	 * be called again before autogroup_kref_put().
+	 *
+	 * However, there is no way sched_autogroup_exit_task() could tell us
+	 * to avoid autogroup->tg, so we abuse PF_EXITING flag for this case.
 	 */
+	if (p->flags & PF_EXITING)
+		return false;
+
 	return true;
 }
 
+void sched_autogroup_exit_task(struct task_struct *p)
+{
+	/*
+	 * We are going to call exit_notify() and autogroup_move_group() can't
+	 * see this thread after that: we can no longer use signal->autogroup.
+	 * See the PF_EXITING check in task_wants_autogroup().
+	 */
+	sched_move_task(p);
+}
+
 static void
 autogroup_move_group(struct task_struct *p, struct autogroup *ag)
 {
@@ -142,6 +158,9 @@ autogroup_move_group(struct task_struct *p, struct autogroup *ag)
 	 * In the latter case for_each_thread() can not miss a migrating thread,
 	 * cpu_cgroup_attach() must not be possible after cgroup_exit() and it
 	 * can't be removed from thread list, we hold ->siglock.
+	 *
+	 * If an exiting thread was already removed from thread list we rely on
+	 * sched_autogroup_exit_task().
 	 */
 	for_each_thread(p, t)
 		sched_move_task(t);

^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2016-11-22 12:31 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-11-14 18:45 [PATCH 0/2] sched/autogroup: use-after-free fixes Oleg Nesterov
2016-11-14 18:46 ` [PATCH 1/2] sched/autogroup: autogroup_move_group() must never skip sched_move_task() Oleg Nesterov
2016-11-22 12:28   ` [tip:sched/urgent] sched/autogroup: Fix autogroup_move_group() to " tip-bot for Oleg Nesterov
2016-11-14 18:46 ` [PATCH 2/2] sched/autogroup: a zombie thread must not use autogroup->tg Oleg Nesterov
2016-11-14 18:58   ` Oleg Nesterov
2016-11-22 12:29   ` [tip:sched/urgent] sched/autogroup: Do not use autogroup->tg in zombie threads tip-bot for Oleg Nesterov

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).