Re: Intermittent early panic in try_to_wake_up

public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed

From: Mike Galbraith <efault@gmx.de>
To: Kevin Winchester <kjwinchester@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>,
	Peter Zijlstra <a.p.zijlstra@chello.nl>,
	LKML <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	Yinghai Lu <yinghai@kernel.org>
Subject: Re: Intermittent early panic in try_to_wake_up
Date: Fri, 06 Nov 2009 06:34:10 +0100	[thread overview]
Message-ID: <1257485650.6233.17.camel@marge.simson.net> (raw)
In-Reply-To: <4AF36364.9090004@gmail.com>

Hi Kevin,

On Thu, 2009-11-05 at 19:44 -0400, Kevin Winchester wrote:
> Mike Galbraith wrote:
> > On Thu, 2009-10-22 at 20:33 -0300, Kevin Winchester wrote:
> >> Hi,
> >>
> >> A week or two ago I saw a panic on boot in try_to_wake_up, but it was not
> >> reproducible and I had not written down any trace information.  This
> >> evening I saw it twice more, but then on the third boot things worked fine.
> >> This time I copied down the stack trace:
> >>
> >> try_to_wake_up+0x2e/0x102
> >> wake_up_process+0x10/0x12
> >> kthread_create+0x88/0x12c
> >> ?ksoftirqd+0x00/0xb7
> >> cpu_callback+0x42/0x8f
> >> ?spawn_ksoftirqd+0x0/0x39
> >> spawn_ksoftirqd+0x17/0x39
> >> do_one_initcall+0x58/0x147
> >>
> >> The first time it happened, I remember checking the git logs and it was
> >> shortly after:
> >>
> >> commit f5dc37530ba8a35aae0f7f4f13781d1904f71e94
> >> Author: Mike Galbraith <efault@gmx.de>
> >> Date:   Fri Oct 9 08:35:03 2009 +0200
> >>
> >>     sched: Update the clock of runqueue select_task_rq() selected
> >>     
> >>     In try_to_wake_up(), we update the runqueue clock, but
> >>     select_task_rq() may select a different runqueue than the one we
> >>     updated, leaving the new runqueue's clock stale for a bit.
> >>     
> >>     This patch cures occasional huge latencies reported by latencytop
> >>     when coming out of idle on a mostly idle NO_HZ box.
> >>     
> >>     Signed-off-by: Mike Galbraith <efault@gmx.de>
> >>     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >>     LKML-Reference: <1255070103.7639.30.camel@marge.simson.net>
> >>     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> >>
> >>
> >> ...so perhaps that has something to do with it.
> > 
> > I don't think that's very likely.  Box did explode near my grubby
> > fingerprints though.
> > 
> >> Config below.  Any help would be appreciated.
> > 
> > Building with your config, try_to_wake_up+0x2e is around..
> > 
> > (gdb) list *try_to_wake_up+0x2e
> > 0xffffffff81029107 is in try_to_wake_up (kernel/sched.c:2324).
> > 2319            this_cpu = get_cpu();
> > 2320
> > 2321            smp_wmb();
> > 2322            rq = orig_rq = task_rq_lock(p, &flags);
> > 2323            update_rq_clock(rq);
> > 2324            if (!(p->state & state))
> > 2325                    goto out;
> > 2326
> > 2327            if (p->se.on_rq)
> > 2328                    goto out_running;
> > 
> > I don't see how any of that can explode without something very bad
> > having happened to ksoftirqd before we tried to wake it.
> > 
> 
> I thought this problem had solved itself, but I've hit it again three times
> in the last few days.
> 
> I've expanded the CC list a little (based on get_maintainer for
> kernel/softirq.c, since that seems to be involved somehow, and Rafael since
> this definitely seems to be a regression), to see if anyone else has any
> ideas.

I may have found the bad thing that could have happened to ksoftirqd.

If you feel like testing, try the below.  We were altering the task
struct outside of locks, which is not interrupt etc safe.  It cures a
problem I ran into, and will hopefully cure yours as well.


sched: fix runqueue locking buglet.

Calling set_task_cpu() with the runqueue unlocked is unsafe.  Add cpu_rq_lock()
locking primitive, and lock the runqueue.  Also, update rq->clock before calling
set_task_cpu(), as it could be stale.

Running netperf UDP_STREAM with two pinned tasks with tip 1b9508f applied emitted
the thoroughly unbelievable result that ratelimiting newidle could produce twice
the throughput of the virgin kernel.  Reverting to locking the runqueue prior to
runqueue selection restored benchmarking sanity, as did this patchlet.

Signed-off-by: Mike Galbraith <efault@gmx.de>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <new-submission>

---
 kernel/sched.c |   32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

Index: linux-2.6.32.git/kernel/sched.c
===================================================================
--- linux-2.6.32.git.orig/kernel/sched.c
+++ linux-2.6.32.git/kernel/sched.c
@@ -1011,6 +1011,24 @@ static struct rq *this_rq_lock(void)
 	return rq;
 }
 
+/*
+ * cpu_rq_lock - lock the runqueue a given cpu and disable interrupts.
+ */
+static struct rq *cpu_rq_lock(int cpu, unsigned long *flags)
+	__acquires(rq->lock)
+{
+	struct rq *rq = cpu_rq(cpu);
+
+	spin_lock_irqsave(&rq->lock, *flags);
+	return rq;
+}
+
+static inline void cpu_rq_unlock(struct rq *rq, unsigned long *flags)
+	__releases(rq->lock)
+{
+	spin_unlock_irqrestore(&rq->lock, *flags);
+}
+
 #ifdef CONFIG_SCHED_HRTICK
 /*
  * Use HR-timers to deliver accurate preemption points.
@@ -2342,16 +2360,17 @@ static int try_to_wake_up(struct task_st
 	if (task_contributes_to_load(p))
 		rq->nr_uninterruptible--;
 	p->state = TASK_WAKING;
+	preempt_disable();
 	task_rq_unlock(rq, &flags);
 
 	cpu = p->sched_class->select_task_rq(p, SD_BALANCE_WAKE, wake_flags);
-	if (cpu != orig_cpu)
-		set_task_cpu(p, cpu);
-
-	rq = task_rq_lock(p, &flags);
-
-	if (rq != orig_rq)
+	if (cpu != orig_cpu) {
+		rq = cpu_rq_lock(cpu, &flags);
 		update_rq_clock(rq);
+		set_task_cpu(p, cpu);
+	} else
+		rq = task_rq_lock(p, &flags);
+	preempt_enable_no_resched();
 
 	if (rq->idle_stamp) {
 		u64 delta = rq->clock - rq->idle_stamp;
@@ -2365,7 +2384,6 @@ static int try_to_wake_up(struct task_st
 	}
 
 	WARN_ON(p->state != TASK_WAKING);
-	cpu = task_cpu(p);
 
 #ifdef CONFIG_SCHEDSTATS
 	schedstat_inc(rq, ttwu_count);

next prev parent reply	other threads:[~2009-11-06  5:34 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2009-10-22 23:33 Intermittent early panic in try_to_wake_up Kevin Winchester
2009-10-23  9:23 ` Mike Galbraith
2009-11-05 23:44   ` Kevin Winchester
2009-11-06  5:34     ` Mike Galbraith [this message]
2009-11-06 23:49       ` Kevin Winchester
2009-11-07  4:45         ` Mike Galbraith
2009-11-07 16:24           ` Kevin Winchester
2009-11-07 16:35             ` Peter Zijlstra
2009-11-07 19:01               ` Rafael J. Wysocki
2009-11-08 17:28                 ` Pavel Machek
2009-11-08 18:44                   ` Rafael J. Wysocki
2009-11-08  8:29               ` Con Kolivas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=1257485650.6233.17.camel@marge.simson.net \
    --to=efault@gmx.de \
    --cc=a.p.zijlstra@chello.nl \
    --cc=akpm@linux-foundation.org \
    --cc=kjwinchester@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@elte.hu \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=rjw@sisk.pl \
    --cc=rostedt@goodmis.org \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox