From: Marcin Slusarz <marcin.slusarz@gmail.com>
To: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: LKML <linux-kernel@vger.kernel.org>, Ingo Molnar <mingo@elte.hu>
Subject: Re: 2.6.25-rc: complete lockup on boot/start of X (bisected)
Date: Sun, 23 Mar 2008 19:46:10 +0100
Message-ID: <20080323184427.GA6691@joi>
In-Reply-To: <1206287670.6437.113.camel@lappy>

On Sun, Mar 23, 2008 at 04:54:29PM +0100, Peter Zijlstra wrote:
> On Sun, 2008-03-23 at 16:44 +0100, Marcin Slusarz wrote:
> > On Sun, Mar 02, 2008 at 08:58:37PM +0100, Peter Zijlstra wrote:
> > > 
> > > On Sun, 2008-03-02 at 20:47 +0100, Marcin Slusarz wrote:
> > > > On Sun, Mar 02, 2008 at 08:11:11PM +0100, Peter Zijlstra wrote:
> > > > > 
> > > > > On Sun, 2008-03-02 at 20:00 +0100, Marcin Slusarz wrote:
> > > > > > Hi
> > > > > > > Since the early 2.6.25 days I've been having a strange lockup on boot. As it happens
> > > > > > > rarely (in ~10% of boots), I couldn't bisect it. No kernel panic, SysRq
> > > > > > > didn't work, so I couldn't provide any useful information to the LK community.
> > > > > > > I hoped someone else would fix it... :)
> > > > > > > 
> > > > > > > It's rc3 now, so I decided to narrow it down myself. I enabled netconsole
> > > > > > > to see whether any other information is printed before the lockup.
> > > > > > > It didn't help, but I noticed that the lockup happens much more frequently! (~50%)
> > > > > > So I bisected it down to:
> > > > > > 
> > > > > > 8f4d37ec073c17e2d4aa8851df5837d798606d6f is first bad commit
> > > > > > commit 8f4d37ec073c17e2d4aa8851df5837d798606d6f
> > > > > > Author: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > > > Date:   Fri Jan 25 21:08:29 2008 +0100
> > > > > > 
> > > > > >     sched: high-res preemption tick
> > > > > > 
> > > > > >     Use HR-timers (when available) to deliver an accurate preemption tick.
> > > > > > 
> > > > > >     The regular scheduler tick that runs at 1/HZ can be too coarse when nice
> > > > > >     level are used. The fairness system will still keep the cpu utilisation 'fair'
> > > > > >     by then delaying the task that got an excessive amount of CPU time but try to
> > > > > >     minimize this by delivering preemption points spot-on.
> > > > > > 
> > > > > >     The average frequency of this extra interrupt is sched_latency / nr_latency.
> > > > > >     Which need not be higher than 1/HZ, its just that the distribution within the
> > > > > >     sched_latency period is important.
> > > > > > 
> > > > > >     Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > > >     Signed-off-by: Ingo Molnar <mingo@elte.hu>
> > > > > > 
> > > > > > :040000 040000 ab225228500f7a19d5ad20ca12ca3fc8ff5f5ad1 f1742e1d225a72aecea9d6961ed989b5943d31d8 M      arch
> > > > > > :040000 040000 25d85e4ef7a71b0cc76801a2526ebeb4dce180fe ae61510186b4fad708ef0211ac169decba16d4e5 M      include
> > > > > > :040000 040000 9247cec7dd506c648ac027c17e5a07145aa41b26 950832cc1dc4d30923f593ecec883a06b45d62e9 M      kernel
> > > > > > 
> > > > > > I can't revert it on top of rc3 because of conflicts.
> > > > > 
> > > > > This should do, I guess. Weird though, I haven't had trouble with this
> > > > > patch in a long long while. Nor I suppose has Ingo's QA setup.
> > > > Ok. It did the trick. But it's a temporary fix, right?
> > > 
> > > Yeah, for a proper fix I'd need to understand what goes wrong.. and that
> > > requires I get more information. Hopefully I can reproduce your issue.
> > > 
> > > > > Will try if I can reproduce using your .config.
> > > > I think this lockup might depend on use of dhcp and/or parallel
> > > > starting of services...
> > > 
> > > It _should_ not.. :-) I can try dhcp quite easily, if nothing comes up I
> > > can try installing gentoo on a test box, stage3 installs are easy
> > > enough.
> > 
> > I'm still having this lockup on 2.6.25-rc6 (028011e1391eab27e7bc113c2ac08d4f55584a75).
> > What information do you need?
> 
> Does the NMI watchdog (append nmi_watchdog=2) report anything?
> 
> I've never been able to reproduce myself :-/

4 different lockups:
http://alan.umcs.lublin.pl/~mslusarz/kernel/2008.03.23-lockup/ 

Are there any downsides to using nmi_watchdog=2 all the time?

PS: Documentation/nmi_watchdog.txt says: "Currently, local APIC mode
(nmi_watchdog=2) does not work on x86-64.". That's not true, so maybe someone
should update this file?
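
A minimal sketch of one way to check that the watchdog is really ticking
after booting with nmi_watchdog=2: read the per-CPU counters on the NMI line
of /proc/interrupts and see whether they keep growing. The function name
nmi_counts and the SAMPLE text are made up for illustration; the exact
/proc/interrupts layout differs between kernels, so treat the parsing as an
assumption.

```python
def nmi_counts(interrupts_text):
    """Return the per-CPU NMI counters found in /proc/interrupts-style text."""
    for line in interrupts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "NMI:":
            counts = []
            for field in fields[1:]:
                # Stop at the first non-numeric field: some kernels append
                # a description such as "Non-maskable interrupts".
                if not field.isdigit():
                    break
                counts.append(int(field))
            return counts
    return []  # no NMI line present


# Hypothetical sample, only here so the sketch runs without /proc.
SAMPLE = """\
           CPU0       CPU1
  0:        125          0   IO-APIC-edge      timer
NMI:        412        398   Non-maskable interrupts
LOC:      10234      10112   Local timer interrupts
"""

if __name__ == "__main__":
    print(nmi_counts(SAMPLE))
```

On a live box one would pass open("/proc/interrupts").read() instead of
SAMPLE and compare two readings taken a few seconds apart.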
