From: <gregkh@linuxfoundation.org>
To: peterz@infradead.org, gregkh@linuxfoundation.org,
mingo@kernel.org, oleg@redhat.com, tglx@linutronix.de,
torvalds@linux-foundation.org
Cc: <stable@vger.kernel.org>, <stable-commits@vger.kernel.org>
Subject: Patch "sched/core: Fix TASK_DEAD race in finish_task_switch()" has been added to the 4.2-stable tree
Date: Tue, 13 Oct 2015 15:58:54 -0700 [thread overview]
Message-ID: <1444777134529@kroah.com> (raw)
This is a note to let you know that I've just added the patch titled
sched/core: Fix TASK_DEAD race in finish_task_switch()
to the 4.2-stable tree which can be found at:
http://www.kernel.org/git/?p=linux/kernel/git/stable/stable-queue.git;a=summary
The filename of the patch is:
sched-core-fix-task_dead-race-in-finish_task_switch.patch
and it can be found in the queue-4.2 subdirectory.
If you, or anyone else, feels it should not be added to the stable tree,
please let <stable@vger.kernel.org> know about it.
>From 95913d97914f44db2b81271c2e2ebd4d2ac2df83 Mon Sep 17 00:00:00 2001
From: Peter Zijlstra <peterz@infradead.org>
Date: Tue, 29 Sep 2015 14:45:09 +0200
Subject: sched/core: Fix TASK_DEAD race in finish_task_switch()
From: Peter Zijlstra <peterz@infradead.org>
commit 95913d97914f44db2b81271c2e2ebd4d2ac2df83 upstream.
So the problem this patch is trying to address is as follows:
CPU0 CPU1
context_switch(A, B)
ttwu(A)
LOCK A->pi_lock
A->on_cpu == 0
finish_task_switch(A)
prev_state = A->state <-.
WMB |
A->on_cpu = 0; |
UNLOCK rq0->lock |
| context_switch(C, A)
`-- A->state = TASK_DEAD
prev_state == TASK_DEAD
put_task_struct(A)
context_switch(A, C)
finish_task_switch(A)
A->state == TASK_DEAD
put_task_struct(A)
The argument being that the WMB will allow the load of A->state on CPU0
to cross over and observe CPU1's store of A->state, which will then
result in a double-drop and use-after-free.
Now the comment states (and this was true once upon a long time ago)
that we need to observe A->state while holding rq->lock because that
will order us against the wakeup; however the wakeup will not in fact
acquire (that) rq->lock; it takes A->pi_lock these days.
We can obviously fix this by upgrading the WMB to an MB, but that is
expensive, so we'd rather avoid that.
The alternative this patch takes is: smp_store_release(&A->on_cpu, 0),
which avoids the MB on some archs, but not important ones like ARM.
Reported-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: linux-kernel@vger.kernel.org
Cc: manfred@colorfullife.com
Cc: will.deacon@arm.com
Fixes: e4a52bcb9a18 ("sched: Remove rq->lock from the first half of ttwu()")
Link: http://lkml.kernel.org/r/20150929124509.GG3816@twins.programming.kicks-ass.net
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
---
kernel/sched/core.c | 10 +++++-----
kernel/sched/sched.h | 5 +++--
2 files changed, 8 insertions(+), 7 deletions(-)
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2461,11 +2461,11 @@ static struct rq *finish_task_switch(str
* If a task dies, then it sets TASK_DEAD in tsk->state and calls
* schedule one last time. The schedule call will never return, and
* the scheduled task must drop that reference.
- * The test for TASK_DEAD must occur while the runqueue locks are
- * still held, otherwise prev could be scheduled on another cpu, die
- * there before we look at prev->state, and then the reference would
- * be dropped twice.
- * Manfred Spraul <manfred@colorfullife.com>
+ *
+ * We must observe prev->state before clearing prev->on_cpu (in
+ * finish_lock_switch), otherwise a concurrent wakeup can get prev
+ * running on another CPU and we could rave with its RUNNING -> DEAD
+ * transition, resulting in a double drop.
*/
prev_state = prev->state;
vtime_task_switch(prev);
--- a/kernel/sched/sched.h
+++ b/kernel/sched/sched.h
@@ -1091,9 +1091,10 @@ static inline void finish_lock_switch(st
* After ->on_cpu is cleared, the task can be moved to a different CPU.
* We must ensure this doesn't happen until the switch is completely
* finished.
+ *
+ * Pairs with the control dependency and rmb in try_to_wake_up().
*/
- smp_wmb();
- prev->on_cpu = 0;
+ smp_store_release(&prev->on_cpu, 0);
#endif
#ifdef CONFIG_DEBUG_SPINLOCK
/* this is a valid case when another task releases the spinlock */
Patches currently in stable-queue which might be from peterz@infradead.org are
queue-4.2/time-fix-timekeeping_freqadjust-s-incorrect-use-of-abs-instead-of-abs64.patch
queue-4.2/x86-platform-fix-geode-lx-timekeeping-in-the-generic-x86-build.patch
queue-4.2/x86-efi-fix-boot-crash-by-mapping-efi-memmap-entries-bottom-up-at-runtime-instead-of-top-down.patch
queue-4.2/x86-mm-set-nx-on-gap-between-__ex_table-and-rodata.patch
queue-4.2/perf-x86-intel-fix-constraint-access.patch
queue-4.2/sched-access-local-runqueue-directly-in-single_task_running.patch
queue-4.2/locking-qspinlock-x86-only-emit-the-test-and-set-fallback-when-building-guest-support.patch
queue-4.2/revert-cgroup-simplify-threadgroup-locking.patch
queue-4.2/x86-kexec-fix-kexec-crash-in-syscall-kexec_file_load.patch
queue-4.2/revert-sched-cgroup-replace-signal_struct-group_rwsem-with-a-global-percpu_rwsem.patch
queue-4.2/locking-qspinlock-x86-fix-performance-regression-under-unaccelerated-vms.patch
queue-4.2/sched-core-fix-task_dead-race-in-finish_task_switch.patch
reply other threads:[~2015-10-13 22:59 UTC|newest]
Thread overview: [no followups] expand[flat|nested] mbox.gz Atom feed
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1444777134529@kroah.com \
--to=gregkh@linuxfoundation.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=stable-commits@vger.kernel.org \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=torvalds@linux-foundation.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.