From: peterz@infradead.org (Peter Zijlstra)
To: linux-arm-kernel@lists.infradead.org
Subject: [BUG] "sched: Remove rq->lock from the first half of ttwu()" locks up on ARM
Date: Thu, 26 May 2011 14:21:51 +0200 [thread overview]
Message-ID: <1306412511.1200.90.camel@twins> (raw)
In-Reply-To: <1306409575.1200.71.camel@twins>
On Thu, 2011-05-26 at 13:32 +0200, Peter Zijlstra wrote:
>
> The bad news is of course that I've got a little more head-scratching to
> do, will keep you informed.
OK, that wasn't too hard.. (/me crosses fingers and prays Marc doesn't
find more funnies ;-).
Does the below cure all woes?
---
Subject: sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Thu May 26 14:21:33 CEST 2011
Marc reported that e4a52bcb9 (sched: Remove rq->lock from the first
half of ttwu()) broke his ARM-SMP machine. Now ARM is one of the few
__ARCH_WANT_INTERRUPTS_ON_CTXSW users, so that exception in the ttwu()
code was suspect.
Yong found that the interrupt could hit hits after context_switch() changes
current but before it clears p->on_cpu, if that interrupt were to
attempt a wake-up of p we would indeed find ourselves spinning in IRQ
context.
Sort this by reverting to the old behaviour for this situation and
perform a full remote wake-up.
Cc: Frank Rowand <frank.rowand@am.sony.com>
Cc: Yong Zhang <yong.zhang0@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reported-by: Marc Zyngier <Marc.Zyngier@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2573,7 +2573,26 @@ static void ttwu_queue_remote(struct tas
if (!next)
smp_send_reschedule(cpu);
}
-#endif
+
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+static int ttwu_activate_remote(struct task_struct *p, int wake_flags)
+{
+ struct rq *rq;
+ int ret = 0;
+
+ rq = __task_rq_lock(p);
+ if (p->on_cpu) {
+ ttwu_activate(rq, p, ENQUEUE_WAKEUP);
+ ttwu_do_wakeup(rq, p, wake_flags);
+ ret = 1;
+ }
+ __task_rq_unlock(rq);
+
+ return ret;
+
+}
+#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
+#endif /* CONFIG_SMP */
static void ttwu_queue(struct task_struct *p, int cpu)
{
@@ -2631,17 +2650,17 @@ try_to_wake_up(struct task_struct *p, un
while (p->on_cpu) {
#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
/*
- * If called from interrupt context we could have landed in the
- * middle of schedule(), in this case we should take care not
- * to spin on ->on_cpu if p is current, since that would
- * deadlock.
+ * In case the architecture enables interrupts in
+ * context_switch(), we cannot busy wait, since that
+ * would lead to live-locks when an interrupt hits and
+ * tries to wake up @prev. So bail and do a complete
+ * remote wakeup.
*/
- if (p == current) {
- ttwu_queue(p, cpu);
+ if (ttwu_activate_remote(p, wake_flags))
goto stat;
- }
-#endif
+#else
cpu_relax();
+#endif
}
/*
* Pairs with the smp_wmb() in finish_lock_switch().
WARNING: multiple messages have this Message-ID (diff)
From: Peter Zijlstra <peterz@infradead.org>
To: Marc Zyngier <Marc.Zyngier@arm.com>
Cc: Yong Zhang <yong.zhang0@gmail.com>, Ingo Molnar <mingo@elte.hu>,
Frank Rowand <frank.rowand@am.sony.com>,
linux-kernel@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: [BUG] "sched: Remove rq->lock from the first half of ttwu()" locks up on ARM
Date: Thu, 26 May 2011 14:21:51 +0200 [thread overview]
Message-ID: <1306412511.1200.90.camel@twins> (raw)
In-Reply-To: <1306409575.1200.71.camel@twins>
On Thu, 2011-05-26 at 13:32 +0200, Peter Zijlstra wrote:
>
> The bad news is of course that I've got a little more head-scratching to
> do, will keep you informed.
OK, that wasn't too hard.. (/me crosses fingers and prays Marc doesn't
find more funnies ;-).
Does the below cure all woes?
---
Subject: sched: Fix ttwu() for __ARCH_WANT_INTERRUPTS_ON_CTXSW
From: Peter Zijlstra <a.p.zijlstra@chello.nl>
Date: Thu May 26 14:21:33 CEST 2011
Marc reported that e4a52bcb9 (sched: Remove rq->lock from the first
half of ttwu()) broke his ARM-SMP machine. Now ARM is one of the few
__ARCH_WANT_INTERRUPTS_ON_CTXSW users, so that exception in the ttwu()
code was suspect.
Yong found that the interrupt could hit hits after context_switch() changes
current but before it clears p->on_cpu, if that interrupt were to
attempt a wake-up of p we would indeed find ourselves spinning in IRQ
context.
Sort this by reverting to the old behaviour for this situation and
perform a full remote wake-up.
Cc: Frank Rowand <frank.rowand@am.sony.com>
Cc: Yong Zhang <yong.zhang0@gmail.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Reported-by: Marc Zyngier <Marc.Zyngier@arm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
---
kernel/sched.c | 37 ++++++++++++++++++++++++++++---------
1 file changed, 28 insertions(+), 9 deletions(-)
Index: linux-2.6/kernel/sched.c
===================================================================
--- linux-2.6.orig/kernel/sched.c
+++ linux-2.6/kernel/sched.c
@@ -2573,7 +2573,26 @@ static void ttwu_queue_remote(struct tas
if (!next)
smp_send_reschedule(cpu);
}
-#endif
+
+#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
+static int ttwu_activate_remote(struct task_struct *p, int wake_flags)
+{
+ struct rq *rq;
+ int ret = 0;
+
+ rq = __task_rq_lock(p);
+ if (p->on_cpu) {
+ ttwu_activate(rq, p, ENQUEUE_WAKEUP);
+ ttwu_do_wakeup(rq, p, wake_flags);
+ ret = 1;
+ }
+ __task_rq_unlock(rq);
+
+ return ret;
+
+}
+#endif /* __ARCH_WANT_INTERRUPTS_ON_CTXSW */
+#endif /* CONFIG_SMP */
static void ttwu_queue(struct task_struct *p, int cpu)
{
@@ -2631,17 +2650,17 @@ try_to_wake_up(struct task_struct *p, un
while (p->on_cpu) {
#ifdef __ARCH_WANT_INTERRUPTS_ON_CTXSW
/*
- * If called from interrupt context we could have landed in the
- * middle of schedule(), in this case we should take care not
- * to spin on ->on_cpu if p is current, since that would
- * deadlock.
+ * In case the architecture enables interrupts in
+ * context_switch(), we cannot busy wait, since that
+ * would lead to live-locks when an interrupt hits and
+ * tries to wake up @prev. So bail and do a complete
+ * remote wakeup.
*/
- if (p == current) {
- ttwu_queue(p, cpu);
+ if (ttwu_activate_remote(p, wake_flags))
goto stat;
- }
-#endif
+#else
cpu_relax();
+#endif
}
/*
* Pairs with the smp_wmb() in finish_lock_switch().
next prev parent reply other threads:[~2011-05-26 12:21 UTC|newest]
Thread overview: 102+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-05-24 18:13 [BUG] "sched: Remove rq->lock from the first half of ttwu()" locks up on ARM Marc Zyngier
2011-05-24 18:13 ` Marc Zyngier
2011-05-24 21:32 ` Peter Zijlstra
2011-05-24 21:32 ` Peter Zijlstra
2011-05-24 21:39 ` Ingo Molnar
2011-05-24 21:39 ` Ingo Molnar
2011-05-25 12:23 ` Marc Zyngier
2011-05-25 12:23 ` Marc Zyngier
2011-05-25 17:08 ` Peter Zijlstra
2011-05-25 17:08 ` Peter Zijlstra
2011-05-25 21:15 ` Peter Zijlstra
2011-05-25 21:15 ` Peter Zijlstra
2011-05-26 7:29 ` Yong Zhang
2011-05-26 7:29 ` Yong Zhang
2011-05-26 10:32 ` Peter Zijlstra
2011-05-26 10:32 ` Peter Zijlstra
2011-05-26 11:02 ` Marc Zyngier
2011-05-26 11:02 ` Marc Zyngier
2011-05-26 11:32 ` Peter Zijlstra
2011-05-26 11:32 ` Peter Zijlstra
2011-05-26 12:21 ` Peter Zijlstra [this message]
2011-05-26 12:21 ` Peter Zijlstra
2011-05-26 12:26 ` Ingo Molnar
2011-05-26 12:26 ` Ingo Molnar
2011-05-26 12:31 ` Russell King - ARM Linux
2011-05-26 12:31 ` Russell King - ARM Linux
2011-05-26 12:37 ` Peter Zijlstra
2011-05-26 12:37 ` Peter Zijlstra
2011-05-26 12:50 ` Ingo Molnar
2011-05-26 12:50 ` Ingo Molnar
2011-05-26 13:36 ` Russell King - ARM Linux
2011-05-26 13:36 ` Russell King - ARM Linux
2011-05-26 14:45 ` Catalin Marinas
2011-05-26 14:45 ` Catalin Marinas
2011-05-27 12:06 ` Ingo Molnar
2011-05-27 12:06 ` Ingo Molnar
2011-05-27 17:55 ` Russell King - ARM Linux
2011-05-27 17:55 ` Russell King - ARM Linux
2011-05-27 19:41 ` Nicolas Pitre
2011-05-27 19:41 ` Nicolas Pitre
2011-05-27 20:52 ` Russell King - ARM Linux
2011-05-27 20:52 ` Russell King - ARM Linux
2011-05-28 13:13 ` Peter Zijlstra
2011-05-28 13:13 ` Peter Zijlstra
2011-05-31 11:08 ` Michal Simek
2011-05-31 11:08 ` Michal Simek
2011-05-31 13:22 ` Peter Zijlstra
2011-05-31 13:22 ` Peter Zijlstra
2011-05-31 13:37 ` Michal Simek
2011-05-31 13:37 ` Michal Simek
2011-05-31 13:52 ` Peter Zijlstra
2011-05-31 13:52 ` Peter Zijlstra
2011-05-31 14:08 ` Michal Simek
2011-05-31 14:08 ` Michal Simek
2011-05-31 14:29 ` Peter Zijlstra
2011-05-31 14:29 ` Peter Zijlstra
2011-05-29 10:21 ` Catalin Marinas
2011-05-29 10:21 ` Catalin Marinas
2011-05-29 10:26 ` Russell King - ARM Linux
2011-05-29 10:26 ` Russell King - ARM Linux
2011-05-29 12:01 ` Catalin Marinas
2011-05-29 12:01 ` Catalin Marinas
2011-05-29 13:19 ` Russell King - ARM Linux
2011-05-29 13:19 ` Russell King - ARM Linux
2011-05-29 21:21 ` Catalin Marinas
2011-05-29 21:21 ` Catalin Marinas
2011-05-29 9:51 ` Catalin Marinas
2011-05-29 9:51 ` Catalin Marinas
2011-06-06 10:29 ` Pavel Machek
2011-06-06 10:29 ` Pavel Machek
2011-05-26 14:56 ` Marc Zyngier
2011-05-26 14:56 ` Marc Zyngier
2011-05-26 15:45 ` Oleg Nesterov
2011-05-26 15:45 ` Oleg Nesterov
2011-05-26 15:59 ` Peter Zijlstra
2011-05-26 15:59 ` Peter Zijlstra
2011-05-26 16:09 ` Peter Zijlstra
2011-05-26 16:09 ` Peter Zijlstra
2011-05-26 16:20 ` Marc Zyngier
2011-05-26 16:20 ` Marc Zyngier
2011-05-26 16:32 ` Peter Zijlstra
2011-05-26 16:32 ` Peter Zijlstra
2011-05-27 8:01 ` Marc Zyngier
2011-05-27 8:01 ` Marc Zyngier
2011-05-26 16:22 ` Marc Zyngier
2011-05-26 16:22 ` Marc Zyngier
2011-05-26 17:04 ` Oleg Nesterov
2011-05-26 17:04 ` Oleg Nesterov
2011-05-26 17:17 ` Peter Zijlstra
2011-05-26 17:17 ` Peter Zijlstra
2011-05-26 17:23 ` Peter Zijlstra
2011-05-26 17:23 ` Peter Zijlstra
2011-05-26 17:49 ` Oleg Nesterov
2011-05-26 17:49 ` Oleg Nesterov
2011-05-27 7:01 ` Yong Zhang
2011-05-27 7:01 ` Yong Zhang
2011-05-27 15:23 ` Santosh Shilimkar
2011-05-27 15:23 ` Santosh Shilimkar
2011-05-27 15:29 ` Marc Zyngier
2011-05-27 15:29 ` Marc Zyngier
2011-05-27 15:30 ` Santosh Shilimkar
2011-05-27 15:30 ` Santosh Shilimkar
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1306412511.1200.90.camel@twins \
--to=peterz@infradead.org \
--cc=linux-arm-kernel@lists.infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.