From: Con Kolivas <kernel@kolivas.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Kevin Winchester <kjwinchester@gmail.com>,
Mike Galbraith <efault@gmx.de>, Ingo Molnar <mingo@elte.hu>,
LKML <linux-kernel@vger.kernel.org>,
"Rafael J. Wysocki" <rjw@sisk.pl>,
Steven Rostedt <rostedt@goodmis.org>,
Andrew Morton <akpm@linux-foundation.org>,
"Paul E. McKenney <paulmck"@linux.vnet.i
Subject: Re: Intermittent early panic in try_to_wake_up
Date: Sun, 8 Nov 2009 19:29:53 +1100
Message-ID: <200911081929.53418.kernel@kolivas.org>
In-Reply-To: <1257611754.4108.12.camel@laptop>
On Sun, 8 Nov 2009 03:35:54 Peter Zijlstra wrote:
> On Sat, 2009-11-07 at 12:24 -0400, Kevin Winchester wrote:
> > Mike Galbraith wrote:
> > > On Fri, 2009-11-06 at 19:49 -0400, Kevin Winchester wrote:
> > >> The patch below does not apply to mainline, unless I'm doing something
> > >> wrong. It's against -tip, I assume? Is it just as applicable to
> > >> mainline?
> > >
> > > It was mainline, but I had the scheduler pull request and another in
> > > for testing as well. Linus has pulled, so it'll apply now, with
> > > offsets.
> >
> > It did end up applying, but did not have any effect. Looking at the
> > patch again, I see that it appears to only affect CONFIG_SMP, which I am
> > not running (and in fact it adds a build warning for the !SMP case). So
> > there was not much chance of it fixing anything, I suppose.
> >
> > Any other ideas? I don't have a serial console, and the trace scrolls
> > off my console, so I don't know if any debug printks would help. Would
> > it help if I copied the entire panic message entirely, including the Code
> > section? I can try that the next time it happens.
>
> Use vga=ask boot_delay=100 select the highest res possible.
>
> Possibly you could use a digital (video) camera to record the output.
>
For what it's worth, I've seen this on BFS and assumed it was a BFS issue until
I spotted this thread, so I'll tell you what I discovered when I was
investigating it. Unfortunately, I did not find the root cause.

Incredibly, the bug happened in try_to_wake_up, where the task struct passed
in to the function (p) gets dereferenced before the rq lock is grabbed. Then,
when the attempt is made to grab the rq lock, it has no p to reference.

Further investigation showed it was always ksoftirqd spawning at bootup, and
never any other situation. The common factor was that a conditional resched
always occurred, and that's how it would get lost. I tried stepping through
the boot process on kvm but always came up stumped as to how on earth it even
happened. The only common variable was that it -only- ever happened with
voluntary preempt enabled, and not with full preempt or no preempt.
cond_resched is called 2 or 3 times during the boot sequence via might_sleep
by that stage, but if I removed each might_sleep one at a time it would just
happen from a different might_sleep, suggesting we weren't sleeping when we
shouldn't. Since I'm no fan of voluntary preempt, I gave up trying to find the
root cause and put this nonsense workaround in __cond_resched:

static void __cond_resched(void)
{
	if (unlikely(system_state != SYSTEM_RUNNING))
		return;

It's still there in BFS and it fixes the problem, in case someone wants to use
voluntary preempt with BFS. I've long since lost the config that caused the
problem reliably and can't guarantee that it's the same thing happening on
mainline, but figured the information might be helpful.
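
To make the effect of that guard concrete, here is a rough user-space model.
It is not kernel code. Every model_* name, the need_resched_flag variable and
the schedule_calls counter are made up for illustration; only system_state,
SYSTEM_BOOTING, SYSTEM_RUNNING, might_sleep and __cond_resched are real kernel
names. It assumes might_sleep acts as a reschedule point under voluntary
preempt, and shows the guard holding off the reschedule until system_state
reaches SYSTEM_RUNNING.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-in for the kernel's system_state values. */
enum system_states { SYSTEM_BOOTING, SYSTEM_RUNNING };

static enum system_states system_state = SYSTEM_BOOTING;
static bool need_resched_flag;	/* hypothetical stand-in for need_resched() */
static int schedule_calls;	/* counts simulated context switches */

/* Simulated schedule(): only records that a switch would have happened. */
static void model_schedule(void)
{
	schedule_calls++;
	need_resched_flag = false;
}

/* Models __cond_resched() with the early-return guard from this mail. */
static void model_cond_resched_inner(void)
{
	if (system_state != SYSTEM_RUNNING)
		return;		/* workaround: never reschedule during boot */
	model_schedule();
}

/* Models cond_resched(): returns 1 if a reschedule took place, else 0. */
static int model_cond_resched(void)
{
	int before = schedule_calls;

	if (!need_resched_flag)
		return 0;
	model_cond_resched_inner();
	return schedule_calls != before;
}

/* Models might_sleep() as a reschedule point (voluntary preempt assumed). */
static void model_might_sleep(void)
{
	(void)model_cond_resched();
}

/* Drives the model through boot, then normal running; returns the count. */
static int model_demo(void)
{
	system_state = SYSTEM_BOOTING;
	need_resched_flag = true;
	schedule_calls = 0;

	model_might_sleep();		/* boot: the guard suppresses it */
	assert(schedule_calls == 0);

	system_state = SYSTEM_RUNNING;
	model_might_sleep();		/* running: the reschedule goes ahead */
	return schedule_calls;
}
```

In this model, model_demo() returns 1: the pending reschedule is skipped while
booting and only taken once the state is SYSTEM_RUNNING. That is all the
workaround does; it hides the problem without explaining it.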
Regards,
--
-ck