* [PATCH 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers
@ 2009-11-13 9:33 Tejun Heo
2009-11-13 9:55 ` Ingo Molnar
2009-11-15 10:29 ` [tip:sched/urgent] sched, kvm: Fix " tip-bot for Tejun Heo
0 siblings, 2 replies; 9+ messages in thread
From: Tejun Heo @ 2009-11-13 9:33 UTC (permalink / raw)
To: Avi Kivity; +Cc: lkml, Ingo Molnar, Linus Torvalds
In finish_task_switch(), fire_sched_in_preempt_notifiers() is called
after finish_lock_switch(). However, depending on architecture,
preemption can be enabled after finish_lock_switch() which breaks the
semantics of preempt notifiers. Move it before finish_arch_switch().
This also makes in notifiers symmetric to out notifiers in terms of
locking - now both are called under rq lock.
NOT_SIGNED_OFF_YET
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Avi Kivity <avi@redhat.com>
---
Avi, I think kvm should be fine with this but I haven't tested it.
Does this look okay to you? If so, can you please route this through
kvm tree with my signoff?
Thanks.
kernel/sched.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
Index: work/kernel/sched.c
===================================================================
--- work.orig/kernel/sched.c
+++ work/kernel/sched.c
@@ -2751,9 +2751,9 @@ static void finish_task_switch(struct rq
prev_state = prev->state;
finish_arch_switch(prev);
perf_event_task_sched_in(current, cpu_of(rq));
+ fire_sched_in_preempt_notifiers(current);
finish_lock_switch(rq, prev);
- fire_sched_in_preempt_notifiers(current);
if (mm)
mmdrop(mm);
if (unlikely(prev_state == TASK_DEAD)) {
^ permalink raw reply [flat|nested] 9+ messages in thread* Re: [PATCH 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers
2009-11-13 9:33 [PATCH 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers Tejun Heo
@ 2009-11-13 9:55 ` Ingo Molnar
2009-11-14 13:06 ` Avi Kivity
2009-11-15 10:29 ` [tip:sched/urgent] sched, kvm: Fix " tip-bot for Tejun Heo
1 sibling, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2009-11-13 9:55 UTC (permalink / raw)
To: Tejun Heo; +Cc: Avi Kivity, lkml, Ingo Molnar, Linus Torvalds
* Tejun Heo <tj@kernel.org> wrote:
> In finish_task_switch(), fire_sched_in_preempt_notifiers() is called
> after finish_lock_switch(). However, depending on architecture,
> preemption can be enabled after finish_lock_switch() which breaks the
> semantics of preempt notifiers. Move it before finish_arch_switch().
> This also makes in notifiers symmetric to out notifiers in terms of
> locking - now both are called under rq lock.
>
> NOT_SIGNED_OFF_YET
> Cc: Ingo Molnar <mingo@redhat.com>
> Cc: Avi Kivity <avi@redhat.com>
> ---
> Avi, I think kvm should be fine with this but I haven't tested it.
> Does this look okay to you? If so, can you please route this through
> kvm tree with my signoff?
I'd like to have Avi's Ack for it, but we want to do sched.c changes via
the scheduler tree.
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers
2009-11-13 9:55 ` Ingo Molnar
@ 2009-11-14 13:06 ` Avi Kivity
2009-11-30 9:09 ` Tejun Heo
0 siblings, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2009-11-14 13:06 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Tejun Heo, lkml, Ingo Molnar, Linus Torvalds
On 11/13/2009 11:55 AM, Ingo Molnar wrote:
> * Tejun Heo<tj@kernel.org> wrote:
>
>
>> In finish_task_switch(), fire_sched_in_preempt_notifiers() is called
>> after finish_lock_switch(). However, depending on architecture,
>> preemption can be enabled after finish_lock_switch() which breaks the
>> semantics of preempt notifiers. Move it before finish_arch_switch().
>> This also makes in notifiers symmetric to out notifiers in terms of
>> locking - now both are called under rq lock.
>>
>>
> I'd like to have Avi's Ack for it,
Acked-by: Avi Kivity <avi@redhat.com>
> but we want to do sched.c changes via
> the scheduler tree.
>
Definitely.
--
Do not meddle in the internals of kernels, for they are subtle and quick to panic.
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers
2009-11-14 13:06 ` Avi Kivity
@ 2009-11-30 9:09 ` Tejun Heo
2009-11-30 9:45 ` Avi Kivity
0 siblings, 1 reply; 9+ messages in thread
From: Tejun Heo @ 2009-11-30 9:09 UTC (permalink / raw)
To: Avi Kivity; +Cc: Ingo Molnar, lkml, Ingo Molnar, Linus Torvalds
On 11/14/2009 10:06 PM, Avi Kivity wrote:
> On 11/13/2009 11:55 AM, Ingo Molnar wrote:
>> * Tejun Heo<tj@kernel.org> wrote:
>>> In finish_task_switch(), fire_sched_in_preempt_notifiers() is called
>>> after finish_lock_switch(). However, depending on architecture,
>>> preemption can be enabled after finish_lock_switch() which breaks the
>>> semantics of preempt notifiers. Move it before finish_arch_switch().
>>> This also makes in notifiers symmetric to out notifiers in terms of
>>> locking - now both are called under rq lock.
>>>
>>>
>> I'd like to have Avi's Ack for it,
>
> Acked-by: Avi Kivity <avi@redhat.com>
>
>> but we want to do sched.c changes via
>> the scheduler tree.
So, this one was a bust. Given that the only platform which might
have been affected by the original bug is ia64 and it's very unlikely
to happen. I think reverting this from sched/urgent would be the
right thing to do at this point if the branch is headed for another
push to Linus. Avi, what do you think?
Thanks.
--
tejun
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers
2009-11-30 9:09 ` Tejun Heo
@ 2009-11-30 9:45 ` Avi Kivity
2009-11-30 10:00 ` Ingo Molnar
0 siblings, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2009-11-30 9:45 UTC (permalink / raw)
To: Tejun Heo; +Cc: Ingo Molnar, lkml, Ingo Molnar, Linus Torvalds
On 11/30/2009 11:09 AM, Tejun Heo wrote:
>
>>
>>> but we want to do sched.c changes via
>>> the scheduler tree.
>>>
> So, this one was a bust. Given that the only platform which might
> have been affected by the original bug is ia64 and it's very unlikely
> to happen. I think reverting this from sched/urgent would be the
> right thing to do at this point if the branch is headed for another
> push to Linus. Avi, what do you think?
>
I agree.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers
2009-11-30 9:45 ` Avi Kivity
@ 2009-11-30 10:00 ` Ingo Molnar
2009-11-30 10:11 ` Avi Kivity
0 siblings, 1 reply; 9+ messages in thread
From: Ingo Molnar @ 2009-11-30 10:00 UTC (permalink / raw)
To: Avi Kivity; +Cc: Tejun Heo, lkml, Ingo Molnar, Linus Torvalds
* Avi Kivity <avi@redhat.com> wrote:
> On 11/30/2009 11:09 AM, Tejun Heo wrote:
> >
> >>
> >>> but we want to do sched.c changes via
> >>>the scheduler tree.
> >So, this one was a bust. Given that the only platform which might
> >have been affected by the original bug is ia64 and it's very unlikely
> >to happen. I think reverting this from sched/urgent would be the
> >right thing to do at this point if the branch is headed for another
> >push to Linus. Avi, what do you think?
>
> I agree.
that sched/urgent commit is not queued up for 2.6.32 (Linus indicated
there wont be an -rc9), it spills over to the 2.6.33 merge window.
Ingo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers
2009-11-30 10:00 ` Ingo Molnar
@ 2009-11-30 10:11 ` Avi Kivity
2009-11-30 11:41 ` [PATCH UPDATED " Tejun Heo
0 siblings, 1 reply; 9+ messages in thread
From: Avi Kivity @ 2009-11-30 10:11 UTC (permalink / raw)
To: Ingo Molnar; +Cc: Tejun Heo, lkml, Ingo Molnar, Linus Torvalds
On 11/30/2009 12:00 PM, Ingo Molnar wrote:
>
> that sched/urgent commit is not queued up for 2.6.32 (Linus indicated
> there wont be an -rc9), it spills over to the 2.6.33 merge window.
>
It shouldn't go into .33 either, until we figure out how to fix it.
--
error compiling committee.c: too many arguments to function
^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCH UPDATED 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers
2009-11-30 10:11 ` Avi Kivity
@ 2009-11-30 11:41 ` Tejun Heo
0 siblings, 0 replies; 9+ messages in thread
From: Tejun Heo @ 2009-11-30 11:41 UTC (permalink / raw)
To: Avi Kivity; +Cc: Ingo Molnar, lkml, Ingo Molnar, Linus Torvalds
498657a478c60be092208422fefa9c7b248729c2 incorrectly assumed that
preempt wasn't disabled around context_switch() and thus was fixing
imaginary problem. It also broke kvm because it depended on
->sched_in() to be called with irq enabled so that it can do smp calls
from there.
Revert the incorrect commit and add comment describing different
contexts under with the two callbacks are invoked.
Avi: spotted transposed in/out in the added comment.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Avi Kivity <avi@redhat.com>
---
Avi, thanks for spotting it.
include/linux/preempt.h | 5 +++++
kernel/sched.c | 2 +-
2 files changed, 6 insertions(+), 1 deletion(-)
diff --git a/include/linux/preempt.h b/include/linux/preempt.h
index 72b1a10..2e681d9 100644
--- a/include/linux/preempt.h
+++ b/include/linux/preempt.h
@@ -105,6 +105,11 @@ struct preempt_notifier;
* @sched_out: we've just been preempted
* notifier: struct preempt_notifier for the task being preempted
* next: the task that's kicking us out
+ *
+ * Please note that sched_in and out are called under different
+ * contexts. sched_out is called with rq lock held and irq disabled
+ * while sched_in is called without rq lock and irq enabled. This
+ * difference is intentional and depended upon by its users.
*/
struct preempt_ops {
void (*sched_in)(struct preempt_notifier *notifier, int cpu);
diff --git a/kernel/sched.c b/kernel/sched.c
index 3c91f11..e36c868 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2758,9 +2758,9 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
prev_state = prev->state;
finish_arch_switch(prev);
perf_event_task_sched_in(current, cpu_of(rq));
- fire_sched_in_preempt_notifiers(current);
finish_lock_switch(rq, prev);
+ fire_sched_in_preempt_notifiers(current);
if (mm)
mmdrop(mm);
if (unlikely(prev_state == TASK_DEAD)) {
^ permalink raw reply related [flat|nested] 9+ messages in thread
* [tip:sched/urgent] sched, kvm: Fix race condition involving sched_in_preempt_notifers
2009-11-13 9:33 [PATCH 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers Tejun Heo
2009-11-13 9:55 ` Ingo Molnar
@ 2009-11-15 10:29 ` tip-bot for Tejun Heo
1 sibling, 0 replies; 9+ messages in thread
From: tip-bot for Tejun Heo @ 2009-11-15 10:29 UTC (permalink / raw)
To: linux-tip-commits
Cc: linux-kernel, hpa, mingo, a.p.zijlstra, tj, tglx, avi, mingo
Commit-ID: 498657a478c60be092208422fefa9c7b248729c2
Gitweb: http://git.kernel.org/tip/498657a478c60be092208422fefa9c7b248729c2
Author: Tejun Heo <tj@kernel.org>
AuthorDate: Fri, 13 Nov 2009 18:33:53 +0900
Committer: Ingo Molnar <mingo@elte.hu>
CommitDate: Sun, 15 Nov 2009 09:59:54 +0100
sched, kvm: Fix race condition involving sched_in_preempt_notifers
In finish_task_switch(), fire_sched_in_preempt_notifiers() is
called after finish_lock_switch().
However, depending on architecture, preemption can be enabled after
finish_lock_switch() which breaks the semantics of preempt
notifiers.
So move it before finish_arch_switch(). This also makes the in-
notifiers symmetric to out- notifiers in terms of locking - now
both are called under rq lock.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Avi Kivity <avi@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4AFD2801.7020900@kernel.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
kernel/sched.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/kernel/sched.c b/kernel/sched.c
index 701eca4..cea2bea 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -2758,9 +2758,9 @@ static void finish_task_switch(struct rq *rq, struct task_struct *prev)
prev_state = prev->state;
finish_arch_switch(prev);
perf_event_task_sched_in(current, cpu_of(rq));
+ fire_sched_in_preempt_notifiers(current);
finish_lock_switch(rq, prev);
- fire_sched_in_preempt_notifiers(current);
if (mm)
mmdrop(mm);
if (unlikely(prev_state == TASK_DEAD)) {
^ permalink raw reply related [flat|nested] 9+ messages in thread
end of thread, other threads:[~2009-11-30 11:42 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2009-11-13 9:33 [PATCH 2.6.32-rc6] sched, kvm: fix race condition involving sched_in_preempt_notifers Tejun Heo
2009-11-13 9:55 ` Ingo Molnar
2009-11-14 13:06 ` Avi Kivity
2009-11-30 9:09 ` Tejun Heo
2009-11-30 9:45 ` Avi Kivity
2009-11-30 10:00 ` Ingo Molnar
2009-11-30 10:11 ` Avi Kivity
2009-11-30 11:41 ` [PATCH UPDATED " Tejun Heo
2009-11-15 10:29 ` [tip:sched/urgent] sched, kvm: Fix " tip-bot for Tejun Heo
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox