* [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
@ 2010-03-15 9:10 Oleg Nesterov
2010-03-24 15:41 ` Peter Zijlstra
2010-04-02 19:12 ` [tip:sched/core] sched: move_task_off_dead_cpu(): Take " tip-bot for Oleg Nesterov
0 siblings, 2 replies; 7+ messages in thread
From: Oleg Nesterov @ 2010-03-15 9:10 UTC (permalink / raw)
To: Peter Zijlstra, Ingo Molnar
Cc: Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan, Miao Xie,
Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
lockless. We can race with set_cpus_allowed() running in parallel.
Change it to take rq->lock around select_fallback_rq(). Note that it is not
trivial to move this spin_lock() into select_fallback_rq(), we must recheck
the task was not migrated after we take the lock and other callers do not
need this lock.
To avoid the races with other callers of select_fallback_rq() which rely on
TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
CPU anyway, and the subsequent __migrate_task() is just meaningless because
p->se.on_rq must be false.
Alternatively, we could change select_task_rq() to take rq->lock right
after it calls sched_class->select_task_rq(), but this looks a bit ugly.
Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
---
kernel/sched.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
--- 34-rc1/kernel/sched.c~2_MTODC_TAKE_RQ_LOCK 2010-03-15 09:40:16.000000000 +0100
+++ 34-rc1/kernel/sched.c 2010-03-15 09:40:31.000000000 +0100
@@ -5509,29 +5509,29 @@ static int migration_thread(void *data)
}
#ifdef CONFIG_HOTPLUG_CPU
-
-static int __migrate_task_irq(struct task_struct *p, int src_cpu, int dest_cpu)
-{
- int ret;
-
- local_irq_disable();
- ret = __migrate_task(p, src_cpu, dest_cpu);
- local_irq_enable();
- return ret;
-}
-
/*
* Figure out where task on dead CPU should go, use force if necessary.
*/
static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
{
- int dest_cpu;
-
+ struct rq *rq = cpu_rq(dead_cpu);
+ int needs_cpu, dest_cpu;
+ unsigned long flags;
again:
- dest_cpu = select_fallback_rq(dead_cpu, p);
+ local_irq_save(flags);
+
+ raw_spin_lock(&rq->lock);
+ needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
+ if (needs_cpu)
+ dest_cpu = select_fallback_rq(dead_cpu, p);
+ raw_spin_unlock(&rq->lock);
/* It can have affinity changed while we were choosing. */
- if (unlikely(!__migrate_task_irq(p, dead_cpu, dest_cpu)))
+ if (needs_cpu)
+ needs_cpu = !__migrate_task(p, dead_cpu, dest_cpu);
+ local_irq_restore(flags);
+
+ if (unlikely(needs_cpu))
goto again;
}
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
2010-03-15 9:10 [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq() Oleg Nesterov
@ 2010-03-24 15:41 ` Peter Zijlstra
2010-03-24 16:07 ` Oleg Nesterov
2010-04-02 19:12 ` [tip:sched/core] sched: move_task_off_dead_cpu(): Take " tip-bot for Oleg Nesterov
1 sibling, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2010-03-24 15:41 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On Mon, 2010-03-15 at 10:10 +0100, Oleg Nesterov wrote:
> static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> {
> + struct rq *rq = cpu_rq(dead_cpu);
> + int needs_cpu, dest_cpu;
> + unsigned long flags;
> again:
> + local_irq_save(flags);
> +
> + raw_spin_lock(&rq->lock);
> + needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
^
kernel/sched.c:5445: warning: ‘dest_cpu’ may be used uninitialized in this function
> + if (needs_cpu)
> + dest_cpu = select_fallback_rq(dead_cpu, p);
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
2010-03-24 15:41 ` Peter Zijlstra
@ 2010-03-24 16:07 ` Oleg Nesterov
2010-03-24 16:17 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2010-03-24 16:07 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On 03/24, Peter Zijlstra wrote:
>
> On Mon, 2010-03-15 at 10:10 +0100, Oleg Nesterov wrote:
> > static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> > {
> > + struct rq *rq = cpu_rq(dead_cpu);
> > + int needs_cpu, dest_cpu;
> > + unsigned long flags;
> > again:
> > + local_irq_save(flags);
> > +
> > + raw_spin_lock(&rq->lock);
> > + needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
>
> ^
> kernel/sched.c:5445: warning: ‘dest_cpu’ may be used uninitialized in this function
Hmm. looks like my gcc is more friendly...
OK. certainly I'll send the updated patch, if this series passes
your review otherwise.
Oleg.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
2010-03-24 16:07 ` Oleg Nesterov
@ 2010-03-24 16:17 ` Peter Zijlstra
2010-03-24 16:33 ` Oleg Nesterov
0 siblings, 1 reply; 7+ messages in thread
From: Peter Zijlstra @ 2010-03-24 16:17 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On Wed, 2010-03-24 at 17:07 +0100, Oleg Nesterov wrote:
> On 03/24, Peter Zijlstra wrote:
> >
> > On Mon, 2010-03-15 at 10:10 +0100, Oleg Nesterov wrote:
> > > static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
> > > {
> > > + struct rq *rq = cpu_rq(dead_cpu);
> > > + int needs_cpu, dest_cpu;
> > > + unsigned long flags;
> > > again:
> > > + local_irq_save(flags);
> > > +
> > > + raw_spin_lock(&rq->lock);
> > > + needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
> >
> > ^
> > kernel/sched.c:5445: warning: ‘dest_cpu’ may be used uninitialized in this function
>
> Hmm. looks like my gcc is more friendly...
Hrm, that and I'm apparently unable to read, it said dest_cpu, not
dead_cpu.. a well, I'll slam an __maybe_unused in.
> OK. certainly I'll send the updated patch, if this series passes
> your review otherwise.
Yeah, you made a few good points in 0/6, am now staring at the code on
how to close those holes, hope to post something sensible soon.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
2010-03-24 16:17 ` Peter Zijlstra
@ 2010-03-24 16:33 ` Oleg Nesterov
2010-03-26 9:06 ` Peter Zijlstra
0 siblings, 1 reply; 7+ messages in thread
From: Oleg Nesterov @ 2010-03-24 16:33 UTC (permalink / raw)
To: Peter Zijlstra
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On 03/24, Peter Zijlstra wrote:
>
> Yeah, you made a few good points in 0/6, am now staring at the code on
> how to close those holes, hope to post something sensible soon.
Yes, great.
Speaking of 0/6, I forgot to ask a couple more question...
try_to_wake_up() does task_rq_lock() which checks TASK_WAKING. Perhaps
it shouldn't ? I mean, perhaps try_to_wake_up() can take rq->lock without
checking task->state. It can never race with the owner of TASK_WAKING,
before anything else we check "p->state & state".
And. Without the change above, any owner of TASK_WAKING must disable
preemption and clear irqs.
What do you think?
And a stupid question. While doing these changes I was really, really
puzzled by task_rq_lock() which does
local_irq_save(*flags);
rq = task_rq(p);
raw_spin_lock(&rq->lock);
to the point, I even tried to read the comment which says:
Note the ordering: we can safely lookup the task_rq without
explicitly disabling preemption.
Could you please explain what does this mean? IOW, why can't we do
rq = task_rq(p);
raw_spin_lock_irqsave(&rq->lock, flags);
instead?
Of course, this doesn't really matter, but I'd like to understand
what I have missed here.
Thanks,
Oleg.
^ permalink raw reply [flat|nested] 7+ messages in thread
* Re: [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq()
2010-03-24 16:33 ` Oleg Nesterov
@ 2010-03-26 9:06 ` Peter Zijlstra
0 siblings, 0 replies; 7+ messages in thread
From: Peter Zijlstra @ 2010-03-26 9:06 UTC (permalink / raw)
To: Oleg Nesterov
Cc: Ingo Molnar, Ben Blum, Jiri Slaby, Lai Jiangshan, Li Zefan,
Miao Xie, Paul Menage, Rafael J. Wysocki, Tejun Heo, linux-kernel
On Wed, 2010-03-24 at 17:33 +0100, Oleg Nesterov wrote:
> On 03/24, Peter Zijlstra wrote:
> >
> > Yeah, you made a few good points in 0/6, am now staring at the code on
> > how to close those holes, hope to post something sensible soon.
>
> Yes, great.
>
> Speaking of 0/6, I forgot to ask a couple more question...
>
> try_to_wake_up() does task_rq_lock() which checks TASK_WAKING. Perhaps
> it shouldn't ? I mean, perhaps try_to_wake_up() can take rq->lock without
> checking task->state. It can never race with the owner of TASK_WAKING,
> before anything else we check "p->state & state".
You're right, but creating a special task_rq_lock() for ttwu() went a
little far, but now that we can remove all that again, this too should
be good again.
> And a stupid question. While doing these changes I was really, really
> puzzled by task_rq_lock() which does
>
> local_irq_save(*flags);
> rq = task_rq(p);
> raw_spin_lock(&rq->lock);
>
> to the point, I even tried to read the comment which says:
>
> Note the ordering: we can safely lookup the task_rq without
> explicitly disabling preemption.
>
> Could you please explain what does this mean? IOW, why can't we do
>
> rq = task_rq(p);
> raw_spin_lock_irqsave(&rq->lock, flags);
>
> instead?
I'm not sure why that is the case, v2.6.14:kernel/sched.c already has
that. Ingo can you remember any reason for this or should we change the
code like Oleg suggests?
^ permalink raw reply [flat|nested] 7+ messages in thread
* [tip:sched/core] sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq()
2010-03-15 9:10 [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq() Oleg Nesterov
2010-03-24 15:41 ` Peter Zijlstra
@ 2010-04-02 19:12 ` tip-bot for Oleg Nesterov
1 sibling, 0 replies; 7+ messages in thread
From: tip-bot for Oleg Nesterov @ 2010-04-02 19:12 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, oleg, tglx, mingo
Commit-ID: 1445c08d06c5594895b4fae952ef8a457e89c390
Gitweb: http://git.kernel.org/tip/1445c08d06c5594895b4fae952ef8a457e89c390
Author: Oleg Nesterov <oleg@redhat.com>
AuthorDate: Mon, 15 Mar 2010 10:10:10 +0100
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Fri, 2 Apr 2010 20:12:01 +0200
sched: move_task_off_dead_cpu(): Take rq->lock around select_fallback_rq()
move_task_off_dead_cpu()->select_fallback_rq() reads/updates ->cpus_allowed
lockless. We can race with set_cpus_allowed() running in parallel.
Change it to take rq->lock around select_fallback_rq(). Note that it is not
trivial to move this spin_lock() into select_fallback_rq(), we must recheck
the task was not migrated after we take the lock and other callers do not
need this lock.
To avoid the races with other callers of select_fallback_rq() which rely on
TASK_WAKING, we also check p->state != TASK_WAKING and do nothing otherwise.
The owner of TASK_WAKING must update ->cpus_allowed and choose the correct
CPU anyway, and the subsequent __migrate_task() is just meaningless because
p->se.on_rq must be false.
Alternatively, we could change select_task_rq() to take rq->lock right
after it calls sched_class->select_task_rq(), but this looks a bit ugly.
Also, change it to not assume irqs are disabled and absorb __migrate_task_irq().
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100315091010.GA9131@redhat.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 30 +++++++++++++++---------------
1 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index c0b3ebc..27774b5 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -5448,29 +5448,29 @@ static int migration_thread(void *data)
}
#ifdef CONFIG_HOTPLUG_CPU
-
-static int __migrate_task_irq(struct task_struct *p, int src_cpu, int dest_cpu)
-{
- int ret;
-
- local_irq_disable();
- ret = __migrate_task(p, src_cpu, dest_cpu);
- local_irq_enable();
- return ret;
-}
-
/*
* Figure out where task on dead CPU should go, use force if necessary.
*/
static void move_task_off_dead_cpu(int dead_cpu, struct task_struct *p)
{
- int dest_cpu;
-
+ struct rq *rq = cpu_rq(dead_cpu);
+ int needs_cpu, uninitialized_var(dest_cpu);
+ unsigned long flags;
again:
- dest_cpu = select_fallback_rq(dead_cpu, p);
+ local_irq_save(flags);
+
+ raw_spin_lock(&rq->lock);
+ needs_cpu = (task_cpu(p) == dead_cpu) && (p->state != TASK_WAKING);
+ if (needs_cpu)
+ dest_cpu = select_fallback_rq(dead_cpu, p);
+ raw_spin_unlock(&rq->lock);
/* It can have affinity changed while we were choosing. */
- if (unlikely(!__migrate_task_irq(p, dead_cpu, dest_cpu)))
+ if (needs_cpu)
+ needs_cpu = !__migrate_task(p, dead_cpu, dest_cpu);
+ local_irq_restore(flags);
+
+ if (unlikely(needs_cpu))
goto again;
}
^ permalink raw reply related [flat|nested] 7+ messages in thread
end of thread, other threads:[~2010-04-02 19:12 UTC | newest]
Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-03-15 9:10 [PATCH 2/6] move_task_off_dead_cpu: take rq->lock around select_fallback_rq() Oleg Nesterov
2010-03-24 15:41 ` Peter Zijlstra
2010-03-24 16:07 ` Oleg Nesterov
2010-03-24 16:17 ` Peter Zijlstra
2010-03-24 16:33 ` Oleg Nesterov
2010-03-26 9:06 ` Peter Zijlstra
2010-04-02 19:12 ` [tip:sched/core] sched: move_task_off_dead_cpu(): Take " tip-bot for Oleg Nesterov
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.