public inbox for linux-kernel@vger.kernel.org
From: Con Kolivas <kernel@kolivas.org>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Kevin Winchester <kjwinchester@gmail.com>,
	Mike Galbraith <efault@gmx.de>, Ingo Molnar <mingo@elte.hu>,
	LKML <linux-kernel@vger.kernel.org>,
	"Rafael J. Wysocki" <rjw@sisk.pl>,
	Steven Rostedt <rostedt@goodmis.org>,
	Andrew Morton <akpm@linux-foundation.org>,
	"Paul E. McKenney <paulmck"@linux.vnet.i
Subject: Re: Intermittent early panic in try_to_wake_up
Date: Sun, 8 Nov 2009 19:29:53 +1100
Message-ID: <200911081929.53418.kernel@kolivas.org>
In-Reply-To: <1257611754.4108.12.camel@laptop>

On Sun, 8 Nov 2009 03:35:54 Peter Zijlstra wrote:
> On Sat, 2009-11-07 at 12:24 -0400, Kevin Winchester wrote:
> > Mike Galbraith wrote:
> > > On Fri, 2009-11-06 at 19:49 -0400, Kevin Winchester wrote:
> > >> The patch below does not apply to mainline, unless I'm doing something
> > >> wrong. It's against -tip, I assume?  Is it just as applicable to
> > >> mainline?
> > >
> > > It was mainline, but I had the scheduler pull request and another in
> > > for testing as well.  Linus has pulled, so it'll apply now, with
> > > offsets.
> >
> > It did end up applying, but did not have any effect.  Looking at the
> > patch again, I see that it appears to only affect CONFIG_SMP, which I am
> > not running (and in fact it adds a build warning for the !SMP case).  So
> > there was not much chance of it fixing anything, I suppose.
> >
> > Any other ideas?  I don't have a serial console, and the trace scrolls
> > off my console, so I don't know if any debug printks would help.  Would
> > it help if I copied the entire panic message entirely, including the Code
> > section? I can try that the next time it happens.
> 
> Use vga=ask boot_delay=100 select the highest res possible.
> 
> Possibly you could use a digital (video) camera to record the output.
> 

For what it's worth, I've seen this on BFS and assumed it was a BFS issue
until I spotted this thread, so I'll tell you what I discovered when I was
investigating it. Unfortunately, I did not find the root cause.

Incredibly, the bug happened in try_to_wake_up, where the task struct passed
in to the function (p) gets dereferenced before the rq lock is grabbed. Then,
when it attempts to grab the rq lock, it has no p to reference.
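
To make that ordering concrete, here is a minimal single-threaded user-space
sketch. It is not the kernel code: every name in it (task_sketch, rq_sketch,
task_rq_lock_sketch and so on) is made up for illustration, the "lock" is a
plain flag, and the recheck under the lock is my assumption about how such a
lookup is usually written. The only point is that p has to be valid before
the lock is ever taken, because p is what tells us which rq to lock.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_NR_CPUS 2

/* Hypothetical stand-in for a runqueue: a flag "lock" plus a counter. */
struct rq_sketch {
	int locked;
	int nr_woken;
};

/* Hypothetical stand-in for a task: only the field used to find its rq. */
struct task_sketch {
	int cpu;
};

static struct rq_sketch runqueues[SKETCH_NR_CPUS];

/* Reads p->cpu with no lock held, so p must already be a valid pointer. */
static struct rq_sketch *task_rq_sketch(const struct task_sketch *p)
{
	assert(p != NULL);
	assert(p->cpu >= 0 && p->cpu < SKETCH_NR_CPUS);
	return &runqueues[p->cpu];
}

/* Look up the rq through p first, then lock it, then recheck under the lock. */
static struct rq_sketch *task_rq_lock_sketch(const struct task_sketch *p)
{
	for (;;) {
		struct rq_sketch *rq = task_rq_sketch(p);	/* deref before lock */

		assert(!rq->locked);
		rq->locked = 1;
		if (rq == task_rq_sketch(p))	/* task still on this rq? */
			return rq;
		rq->locked = 0;			/* it moved, retry */
	}
}

/* Model of a wakeup: lock the task's rq, count the wakeup, unlock. */
static int wake_up_sketch(const struct task_sketch *p)
{
	struct rq_sketch *rq = task_rq_lock_sketch(p);
	int woken = ++rq->nr_woken;

	rq->locked = 0;
	return woken;
}
```

In this model a bad p trips the assertions in task_rq_sketch() before any
lock is touched, which is the same order of events as the panic I was seeing.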

Further investigation showed it always to be ksoftirqd spawning on bootup,
and never any other situation. The common factor was that a conditional
resched always occurred, and that's how it would get lost. I tried stepping
through the boot process on kvm but always came up stumped as to how on earth
it even happened. The only common variable was that it -only- ever happened
with voluntary preempt enabled, and not with full preempt or no-preempt.
cond_resched is called 2 or 3 times via might_sleep by that stage of the boot
sequence, but if I removed each might_sleep one at a time it would just
happen from a different might_sleep, suggesting it wasn't a case of sleeping
when we shouldn't. Since I'm no fan of voluntary preempt, I gave up trying to
find the root cause and put this nonsense workaround in __cond_resched:

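static void __cond_resched(void)
{
	/* Workaround: skip voluntary rescheduling until boot has completed. */
	if (unlikely(system_state != SYSTEM_RUNNING))
		return;

	/* ... remainder of __cond_resched() as before ... */
}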

It's still there in BFS, and it fixes the problem, in case someone wants to
use voluntary preempt with BFS. I've long since lost the config that reliably
caused the problem and can't guarantee that the same thing is happening on
mainline, but figured the information might be helpful.
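
If it helps to see the effect of that guard in isolation, the following is a
rough user-space model of it. Again, this is only a sketch: the enum, the
need_resched flag and schedule_sketch() are hypothetical stand-ins I've
invented here, not the kernel's system_state, need_resched() or schedule().
It shows nothing more than the early return turning every voluntary
reschedule point into a no-op until the state flips to running.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's system_state values. */
enum system_state_sketch { SKETCH_BOOTING, SKETCH_RUNNING };

static enum system_state_sketch system_state_sketch = SKETCH_BOOTING;
static bool need_resched_sketch;	/* models a pending reschedule request */
static int schedule_calls;		/* how many times we really rescheduled */

/* Stand-in for schedule(): count the call and clear the request. */
static void schedule_sketch(void)
{
	/* With the guard in place this is never reached during boot. */
	assert(system_state_sketch == SKETCH_RUNNING);
	schedule_calls++;
	need_resched_sketch = false;
}

/* Voluntary preemption point; returns true if it actually rescheduled. */
static bool cond_resched_sketch(void)
{
	/* The workaround: do nothing at all until boot has finished. */
	if (system_state_sketch != SKETCH_RUNNING)
		return false;

	if (!need_resched_sketch)
		return false;

	schedule_sketch();
	return true;
}
```

With need_resched_sketch set while still booting, cond_resched_sketch()
returns false and leaves the request pending; once the state is
SKETCH_RUNNING the same call reschedules. That deferral is all the
workaround does, which is why I call it nonsense rather than a fix.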

Regards,
-- 
-ck
