public inbox for linux-kernel@vger.kernel.org
From: Jan Kara <jack@suse.cz>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Jan Kara <jack@suse.cz>, Peter Hurley <peter@hurleysoftware.com>,
	pmladek@suse.cz, Andrew Morton <akpm@linux-foundation.org>,
	Jet Chen <jet.chen@intel.com>,
	LKML <linux-kernel@vger.kernel.org>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: console: lockup on boot
Date: Thu, 12 Jun 2014 10:26:45 +0200
Message-ID: <20140612082645.GF9511@quack.suse.cz>
In-Reply-To: <53991958.4070106@oracle.com>

On Wed 11-06-14 23:07:04, Sasha Levin wrote:
> On 06/11/2014 05:31 PM, Jan Kara wrote:
> > On Wed 11-06-14 22:34:36, Jan Kara wrote:
> >> > On Wed 11-06-14 10:55:55, Sasha Levin wrote:
> >>> > > On 06/10/2014 11:59 AM, Peter Hurley wrote:
> >>>> > > > On 06/06/2014 03:05 PM, Sasha Levin wrote:
> >>>>> > > >> On 05/30/2014 10:07 AM, Jan Kara wrote:
> >>>>>> > > >>> On Fri 30-05-14 09:58:14, Peter Hurley wrote:
> >>>>>>>> > > >>>>> On 05/30/2014 09:11 AM, Sasha Levin wrote:
> >>>>>>>>>> > > >>>>>>> Hi all,
> >>>>>>>>>> > > >>>>>>>
> >>>>>>>>>> > > >>>>>>> I sometime see lockups when booting my KVM guest with the latest -next kernel,
> >>>>>>>>>> > > >>>>>>> it basically hangs right when it should start 'init', and after a while I get
> >>>>>>>>>> > > >>>>>>> the following spew:
> >>>>>>>>>> > > >>>>>>>
> >>>>>>>>>> > > >>>>>>> [   30.790833] BUG: spinlock lockup suspected on CPU#1, swapper/1/0
> >>>>>>>> > > >>>>>
> >>>>>>>> > > >>>>> Maybe related to this report: https://lkml.org/lkml/2014/5/30/26
> >>>>>>>> > > >>>>> from Jet Chen which was bisected to
> >>>>>>>> > > >>>>>
> >>>>>>>> > > >>>>> commit bafe980f5afc7ccc693fd8c81c8aa5a02fbb5ae0
> >>>>>>>> > > >>>>> Author:     Jan Kara <jack@suse.cz>
> >>>>>>>> > > >>>>> AuthorDate: Thu May 22 10:43:35 2014 +1000
> >>>>>>>> > > >>>>> Commit:     Stephen Rothwell <sfr@canb.auug.org.au>
> >>>>>>>> > > >>>>> CommitDate: Thu May 22 10:43:35 2014 +1000
> >>>>>>>> > > >>>>>
> >>>>>>>> > > >>>>>      printk: enable interrupts before calling console_trylock_for_printk()
> >>>>>>>> > > >>>>>
> >>>>>>>> > > >>>>>      We need interrupts disabled when calling console_trylock_for_printk() only
> >>>>>>>> > > >>>>>      so that cpu id we pass to can_use_console() remains valid (for other
> >>>>>>>> > > >>>>>      things console_sem provides all the exclusion we need and deadlocks on
> >>>>>>>> > > >>>>>      console_sem due to interrupts are impossible because we use
> >>>>>>>> > > >>>>>      down_trylock()).  However if we are rescheduled, we are guaranteed to run
> >>>>>>>> > > >>>>>      on an online cpu so we can easily just get the cpu id in
> >>>>>>>> > > >>>>>      can_use_console().
> >>>>>>>> > > >>>>>
> >>>>>>>> > > >>>>>      We can lose a bit of performance when we enable interrupts in
> >>>>>>>> > > >>>>>      vprintk_emit() and then disable them again in console_unlock() but OTOH it
> >>>>>>>> > > >>>>>      can somewhat reduce interrupt latency caused by console_unlock()
> >>>>>>>> > > >>>>>      especially since later in the patch series we will want to spin on
> >>>>>>>> > > >>>>>      console_sem in console_trylock_for_printk().
> >>>>>>>> > > >>>>>
> >>>>>>>> > > >>>>>      Signed-off-by: Jan Kara <jack@suse.cz>
> >>>>>>>> > > >>>>>      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> >>>>>>>> > > >>>>>
> >>>>>>>> > > >>>>> ?
> >>>>>> > > >>>    Yeah, very likely. I think I see the problem, I'll send the fix shortly.
> >>>>> > > >>
> >>>>> > > >> Hi Jan,
> >>>>> > > >>
> >>>>> > > >> It seems that the issue I'm seeing is different from the "[prink]  BUG: spinlock
> >>>>> > > >> lockup suspected on CPU#0, swapper/1".
> >>>>> > > >>
> >>>>> > > >> Is there anything else I could try here? The issue is very common during testing.
> >>>> > > > 
> >>>> > > > Sasha,
> >>>> > > > 
> >>>> > > > Is this bisectable? Maybe that's the best way forward here.
> >>> > > 
> >>> > > I've ran a bisection again and ended up at the same commit as Jet Chen
> >>> > > (the commit unfortunately already made it to Linus's tree).
> >>> > > 
> >>> > > Note that I did try Jan's proposed fix and that didn't solve the issue
> >>> > > for me, I believe we're seeing different issues caused by the same
> >>> > > commit.
> >> >   Sorry it has been busy time lately and I didn't have as much time to look
> >> > into this as would be needed.
> >   Oops, pressed send too early... So I have two debug patches for you. Can
> > you try whether the problem reproduces with the first one or with both of
> > them applied?
> 
> The first patch fixed it (I assumed that there's no need to try the second).
  Good. So that shows that it is the increased lockdep coverage which is
causing problems - with my patch, lockdep is able to identify some problem
because console drivers are now called with lockdep enabled. But because
the problem was found in some difficult context, lockdep just hung the
machine when trying to report it... Sadly the stacktraces you posted don't
tell us what lockdep found.

Adding Peter Zijlstra to CC. Peter, any idea how lockdep could report
problems when holding logbuf_lock? One possibility would be to extend
logbuf_cpu recursion logic to every holder of logbuf_lock. That will at
least avoid the spinlock recursion killing the machine but we won't be able
to see what lockdep found...
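
To make the idea of extending the recursion check to every holder concrete,
here is a minimal userspace sketch. It is only a model under stated
assumptions, not kernel code: struct owned_lock and its helpers are
hypothetical names, C11 atomics stand in for the kernel spinlock, and the
owner field merely plays the role that logbuf_cpu plays for logbuf_lock.
The point it shows is that an acquire attempt from the CPU that already
holds the lock is refused instead of spinning forever.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define NO_OWNER (-1)

/* Hypothetical stand-in for logbuf_lock plus an owner field (cf. logbuf_cpu). */
struct owned_lock {
	atomic_bool locked;	/* true while some CPU holds the lock */
	atomic_int owner;	/* id of the holding CPU, or NO_OWNER */
};

static void owned_lock_init(struct owned_lock *l)
{
	atomic_init(&l->locked, false);
	atomic_init(&l->owner, NO_OWNER);
}

/*
 * Try to take the lock on behalf of `cpu`. Returns false without spinning
 * if `cpu` already holds it (recursion); otherwise spins until the lock is
 * free, records the owner and returns true. Only the holder ever stores its
 * own id, so a CPU comparing the owner against itself cannot be fooled.
 */
static bool owned_lock_acquire(struct owned_lock *l, int cpu)
{
	if (atomic_load(&l->owner) == cpu)
		return false;	/* would self-deadlock: let the caller back off */

	while (atomic_exchange_explicit(&l->locked, true, memory_order_acquire))
		;		/* another CPU holds the lock: spin */

	atomic_store(&l->owner, cpu);
	return true;
}

/* Release the lock; only the recorded owner may do so. */
static void owned_lock_release(struct owned_lock *l, int cpu)
{
	assert(atomic_load(&l->owner) == cpu);
	atomic_store(&l->owner, NO_OWNER);
	atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

A caller that gets false back from owned_lock_acquire() knows it re-entered
on the same CPU and has to drop or defer its message rather than spin. That
keeps the machine alive, but as noted above it still would not reveal what
lockdep was trying to report.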

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR
