From: Peter Zijlstra <peterz@infradead.org>
To: Oleg Nesterov <oleg@redhat.com>
Cc: Sasha Levin <sasha.levin@oracle.com>,
Ingo Molnar <mingo@kernel.org>,
John Stultz <john.stultz@linaro.org>,
Thomas Gleixner <tglx@linutronix.de>,
Frederic Weisbecker <fweisbec@gmail.com>,
LKML <linux-kernel@vger.kernel.org>,
Dave Jones <davej@redhat.com>,
Andrey Ryabinin <a.ryabinin@samsung.com>
Subject: Re: finish_task_switch && prev_state (Was: sched, timers: use after free in __lock_task_sighand when exiting a process)
Date: Tue, 29 Jul 2014 11:22:37 +0200 [thread overview]
Message-ID: <20140729092237.GU12054@laptop.lan> (raw)
In-Reply-To: <20140729091018.GT20603@laptop.programming.kicks-ass.net>
On Tue, Jul 29, 2014 at 11:10:18AM +0200, Peter Zijlstra wrote:
> On Tue, Jul 15, 2014 at 04:25:25PM +0200, Oleg Nesterov wrote:
>
> > And probably I missed something again, but it seems that this logic is broken
> > with __ARCH_WANT_UNLOCKED_CTXSW.
> >
> > Of course, even if I am right this is pure theoretical, but smp_wmb() before
> > "->on_cpu = 0" is not enough and we need a full barrier ?
>
> (long delay there, forgot about this thread, sorry)
>
> Yes, I think I see that.. but now I think the comment is further wrong.
>
> Its not rq->lock that is important, remember, a concurrent wakeup onto
> another CPU does not require our rq->lock at all.
>
> It is the ->on_cpu = 0 store that is important (for both the
> UNLOCKED_CTXSW cases). As soon as that store comes through the task can
> start running on the remote cpu.
>
> Now the below patch 'fixes' this but at the cost of adding a full
> barrier which is somewhat unfortunate to say the least.
>
> wmb's are free on x86 and generally cheaper than mbs, so it would to
> find another solution to this problem...
Something like so then?
---
kernel/sched/core.c | 36 ++++++++++++++++++++----------------
1 file changed, 20 insertions(+), 16 deletions(-)
diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2676866b4394..179390f7380d 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -2190,6 +2190,7 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
* finish_task_switch - clean up after a task-switch
* @rq: runqueue associated with task-switch
* @prev: the thread we just switched away from.
+ * @prev_state: the state of @prev before we switched away from it.
*
* finish_task_switch must be called after the context switch, paired
* with a prepare_task_switch call before the context switch.
@@ -2201,26 +2202,14 @@ prepare_task_switch(struct rq *rq, struct task_struct *prev,
* with the lock held can cause deadlocks; see schedule() for
* details.)
*/
-static void finish_task_switch(struct rq *rq, struct task_struct *prev)
+static void
+finish_task_switch(struct rq *rq, struct task_struct *prev, long prev_state)
__releases(rq->lock)
{
struct mm_struct *mm = rq->prev_mm;
- long prev_state;
rq->prev_mm = NULL;
- /*
- * A task struct has one reference for the use as "current".
- * If a task dies, then it sets TASK_DEAD in tsk->state and calls
- * schedule one last time. The schedule call will never return, and
- * the scheduled task must drop that reference.
- * The test for TASK_DEAD must occur while the runqueue locks are
- * still held, otherwise prev could be scheduled on another cpu, die
- * there before we look at prev->state, and then the reference would
- * be dropped twice.
- * Manfred Spraul <manfred@colorfullife.com>
- */
- prev_state = prev->state;
vtime_task_switch(prev);
finish_arch_switch(prev);
perf_event_task_sched_in(prev, current);
@@ -2279,7 +2268,7 @@ asmlinkage __visible void schedule_tail(struct task_struct *prev)
{
struct rq *rq = this_rq();
- finish_task_switch(rq, prev);
+ finish_task_switch(rq, prev, 0);
/*
* FIXME: do we need to worry about rq being invalidated by the
@@ -2304,6 +2293,21 @@ context_switch(struct rq *rq, struct task_struct *prev,
struct task_struct *next)
{
struct mm_struct *mm, *oldmm;
+ /*
+ * A task struct has one reference for the use as "current".
+ * If a task dies, then it sets TASK_DEAD in tsk->state and calls
+ * schedule one last time. The schedule call will never return, and
+ * the scheduled task must drop that reference.
+ *
+ * We must observe prev->state before clearing prev->on_cpu (in
+ * finish_lock_switch), otherwise a concurrent wakeup can get prev
+ * running on another CPU and we could race with its RUNNING -> DEAD
+ * transition, and then the reference would be dropped twice.
+ *
+ * We avoid the race by observing prev->state while it is still
+ * current.
+ */
+ long prev_state = prev->state;
prepare_task_switch(rq, prev, next);
@@ -2347,7 +2351,7 @@ context_switch(struct rq *rq, struct task_struct *prev,
* CPUs since it called schedule(), thus the 'rq' on its stack
* frame will be invalid.
*/
- finish_task_switch(this_rq(), prev);
+ finish_task_switch(this_rq(), prev, prev_state);
}
/*
next prev parent reply other threads:[~2014-07-29 9:22 UTC|newest]
Thread overview: 17+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-07-13 21:51 sched, timers: use after free in __lock_task_sighand when exiting a process Sasha Levin
2014-07-13 23:45 ` Sasha Levin
2014-07-14 9:04 ` Peter Zijlstra
2014-07-14 9:34 ` Andrey Ryabinin
2014-07-14 9:58 ` Peter Zijlstra
2014-07-14 10:25 ` Andrey Ryabinin
2014-07-14 14:49 ` Oleg Nesterov
2014-07-14 15:13 ` Oleg Nesterov
2014-07-14 15:31 ` Andrey Ryabinin
2014-07-14 16:01 ` finish_task_switch && prev_state (Was: sched, timers: use after free in __lock_task_sighand when exiting a process) Oleg Nesterov
2014-07-15 13:12 ` Oleg Nesterov
2014-07-15 13:23 ` Peter Zijlstra
2014-07-15 14:25 ` Oleg Nesterov
2014-07-29 9:10 ` Peter Zijlstra
2014-07-29 9:22 ` Peter Zijlstra [this message]
2014-07-29 15:53 ` Oleg Nesterov
2014-07-15 13:28 ` sched, timers: use after free in __lock_task_sighand when exiting a process Peter Zijlstra
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140729092237.GU12054@laptop.lan \
--to=peterz@infradead.org \
--cc=a.ryabinin@samsung.com \
--cc=davej@redhat.com \
--cc=fweisbec@gmail.com \
--cc=john.stultz@linaro.org \
--cc=linux-kernel@vger.kernel.org \
--cc=mingo@kernel.org \
--cc=oleg@redhat.com \
--cc=sasha.levin@oracle.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox