Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
       [not found] <506429.22669.qm@web82107.mail.mud.yahoo.com>
@ 2008-08-09  7:34 ` Peter Zijlstra
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2008-08-09  7:34 UTC (permalink / raw)
  To: David Witbrodt
  Cc: linux-kernel, Yinghai Lu, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Paul E. McKenney, netdev

On Fri, 2008-08-08 at 18:23 -0700, David Witbrodt wrote:
> I have tracked the regression down to an RCU problem.
> 
> I added some printk()'s to the function inet_register_protosw() in
> net/ipv4/af_inet.c, as seen in this diff:
> 
> ===== BEGIN DIFF ==========
>       * non-permanent entry.  This means that when we remove this entry, the
>       * system automatically returns to the old behavior.
>       */
> +    printk ("     Adding new protocol\n");
>      list_add_rcu(&p->list, last_perm);
> +
>  out:
> +    printk ("     Unlocking spinlock\n");
>      spin_unlock_bh(&inetsw_lock);
>  
> +    printk ("     Calling synchronize_net()\n");
>      synchronize_net();
>  
>      return;
> ===== END DIFF ==========
> 
> A kernel built with these changes freezes with "Calling synchronize_net()"
> as the last printed line.
> 
> I located the function synchronize_net() in net/core/dev.c, and it was easy
> to add some printk()'s there:
> 
> ===== BEGIN DIFF ==========
>  
> void synchronize_net(void)
>  {
> +    printk ("   synchronize_net(): calling might_sleep()\n");
>      might_sleep();
> +
> +    printk ("   synchronize_net(): calling synchronize_rcu()\n");
>      synchronize_rcu();
>  }
> ===== END DIFF ==========
> 
> The kernel built with these changes froze with "synchronize_net(): 
> calling synchronize_rcu()" as the last line on the screen.
> 
> After reading some documentation in Documentation/RCU/, it looks like 
> something is misusing RCU -- and, according to the Documentation, those kinds 
> of mistakes are easy to make.  Maybe necessary calls to
>  
>     rcu_read_lock()
>     rcu_read_unlock()
> 
> are missing, and something about my hardware is triggering a freeze that 
> doesn't occur on most hardware.
> 
> 
> For some reason, turning off the HPET by booting with "hpet=disabled" keeps
> the freeze from happening.  Just reading a couple of those docs about RCU
> made me dizzy, so I hope someone familiar with RCU issues will take a look
> at the code in the files I've listed.  Surely you guys can take it from here
> now?!
> 
> If not, just give me some experimental code changes to make to get my 2.6.26
> and 2.6.27 kernels working again without disabling HPET!!!


The typical way to deadlock like this is do something like:

 rcu_read_lock();

   synchronize_rcu();

 rcu_read_unlock();

While I cannot immediately see any such usage in the function you
quoted, it could be on of the callers.. let me browse some code..

Can't seem to find anything like that.

What's weird though - is that HPET makes any difference on these network
code paths.

Could we end up calling rcu too soon? I doubt we bring up ipv4 before
rcu..



^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
@ 2008-08-09 12:39 David Witbrodt
  2008-08-09 13:56 ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: David Witbrodt @ 2008-08-09 12:39 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: linux-kernel, Yinghai Lu, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Paul E. McKenney, netdev

> On Fri, 2008-08-08 at 18:23 -0700, David Witbrodt wrote:
> > I have tracked the regression down to an RCU problem.
> > [...]
> > After reading some documentation in Documentation/RCU/, it looks like 
> > something is misusing RCU -- and, according to the Documentation, those kinds 
> > of mistakes are easy to make.  Maybe necessary calls to
> >  
> >     rcu_read_lock()
> >     rcu_read_unlock()
> > 
> > are missing, and something about my hardware is triggering a freeze that 
> > doesn't occur on most hardware.
> > 
> > 
> > For some reason, turning off the HPET by booting with "hpet=disabled" keeps
> > the freeze from happening.  Just reading a couple of those docs about RCU
> > made me dizzy, so I hope someone familiar with RCU issues will take a look
> > at the code in the files I've listed.  Surely you guys can take it from here
> > now?!
> > 
> > If not, just give me some experimental code changes to make to get my 2.6.26
> > and 2.6.27 kernels working again without disabling HPET!!!
> 
> 
> The typical way to deadlock like this is do something like:
> 
> rcu_read_lock();
> 
>    synchronize_rcu();
> 
> rcu_read_unlock();
> 
> While I cannot immediately see any such usage in the function you
> quoted, it could be on of the callers.. let me browse some code..
> 
> Can't seem to find anything like that.
> 
> What's weird though - is that HPET makes any difference on these network
> code paths.
> 
> Could we end up calling rcu too soon? I doubt we bring up ipv4 before
> rcu..

I'm _way_ over my head in this discussion, but here's some more food
for thought.  Last weekend, when I first tried 2.6.26 and discovered the
freeze, I thought an error of my own in .config was causing it.  Before
I ever sought help, I made about a dozen experiments with different
.config files.

One series of those experiments involved turning off most of the kernel...
including CONFIG_INET.  The kernel still froze, but when entering 
pci_init().  (This info can be read in my original post to the Debian BTS,
which I have provided links for a couple of times in this LKML thread.  I
even went further and removed enough that the freeze was avoided, but so
much of the kernel was missing that my init scripts couldn't mount a hard
disk any more.  Trying to restore enough to allow HD mounting just brought
back the freeze.)

I am completely ignorant about how the kernel works, so any guesses I have
are probably worthless... but I'll throw some out anyway:

1.  Maybe HPET is used (if present) for timing by RCU, so disabling it
forces RCU to work differently.  (Pure guess here:  I know nothing about
RCU, and haven't even tried looking at its code.)

2.  Maybe my hardware is broken.  We need see one initcall return that
report over 280,000 msecs... when the entire boot->freeze time was about
3 secs.  On the other hand, 2.6.25 (and before) work just fine with HPET
enabled.

3. I was able to find the commit that introduced the freeze
(3def3d6ddf43dbe20c00c3cbc38dfacc8586998f), so there has to be a connection
between that commit and the RCU problem.  Is it possible that a prexisting
error or oversight in the code was merely exposed by that commit?  (And 
only on certain hardware?)  Or does that code itself contain the error?

4. Another bug has been posted on the Debian BTS, which is worked around
by disabling HPET.  The user provided some links to bugzilla.kernel.org
where David Brownell is fighting with some HPET/RTC issues (but no mention
of RCU):
http://bugzilla.kernel.org/show_bug.cgi?id=11111
http://bugzilla.kernel.org/show_bug.cgi?id=11153

I honestly don't know whether this is related to my problem or not.  :-(

If any has any test code I can run to detect massive HPET breakage on
these motherboards, I'll be glad to do so.  Or any other experimental
code changes, for that matter.

Thanks again,
Dave W.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-09 12:39 David Witbrodt
@ 2008-08-09 13:56 ` Paul E. McKenney
  2008-08-11 11:25   ` Ingo Molnar
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2008-08-09 13:56 UTC (permalink / raw)
  To: David Witbrodt
  Cc: Peter Zijlstra, linux-kernel, Yinghai Lu, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, netdev

On Sat, Aug 09, 2008 at 05:39:26AM -0700, David Witbrodt wrote:
> 
> 
> > On Fri, 2008-08-08 at 18:23 -0700, David Witbrodt wrote:
> > > I have tracked the regression down to an RCU problem.
> > > [...]
> > > After reading some documentation in Documentation/RCU/, it looks like 
> > > something is misusing RCU -- and, according to the Documentation, those kinds 
> > > of mistakes are easy to make.  Maybe necessary calls to
> > >  
> > >     rcu_read_lock()
> > >     rcu_read_unlock()
> > > 
> > > are missing, and something about my hardware is triggering a freeze that 
> > > doesn't occur on most hardware.
> > > 
> > > 
> > > For some reason, turning off the HPET by booting with "hpet=disabled" keeps
> > > the freeze from happening.  Just reading a couple of those docs about RCU
> > > made me dizzy, so I hope someone familiar with RCU issues will take a look
> > > at the code in the files I've listed.  Surely you guys can take it from here
> > > now?!
> > > 
> > > If not, just give me some experimental code changes to make to get my 2.6.26
> > > and 2.6.27 kernels working again without disabling HPET!!!
> > 
> > 
> > The typical way to deadlock like this is do something like:
> > 
> > rcu_read_lock();
> > 
> >    synchronize_rcu();
> > 
> > rcu_read_unlock();
> > 
> > While I cannot immediately see any such usage in the function you
> > quoted, it could be on of the callers.. let me browse some code..
> > 
> > Can't seem to find anything like that.
> > 
> > What's weird though - is that HPET makes any difference on these network
> > code paths.
> > 
> > Could we end up calling rcu too soon? I doubt we bring up ipv4 before
> > rcu..
> 
> I'm _way_ over my head in this discussion, but here's some more food
> for thought.  Last weekend, when I first tried 2.6.26 and discovered the
> freeze, I thought an error of my own in .config was causing it.  Before
> I ever sought help, I made about a dozen experiments with different
> .config files.
> 
> One series of those experiments involved turning off most of the kernel...
> including CONFIG_INET.  The kernel still froze, but when entering 
> pci_init().  (This info can be read in my original post to the Debian BTS,
> which I have provided links for a couple of times in this LKML thread.  I
> even went further and removed enough that the freeze was avoided, but so
> much of the kernel was missing that my init scripts couldn't mount a hard
> disk any more.  Trying to restore enough to allow HD mounting just brought
> back the freeze.)
> 
> I am completely ignorant about how the kernel works, so any guesses I have
> are probably worthless... but I'll throw some out anyway:
> 
> 1.  Maybe HPET is used (if present) for timing by RCU, so disabling it
> forces RCU to work differently.  (Pure guess here:  I know nothing about
> RCU, and haven't even tried looking at its code.)

RCU doesn't use HPET directly.  Most of its time-dependent behavior
comes from its being invoked from the scheduling-clock interrupt.

> 2.  Maybe my hardware is broken.  We need see one initcall return that
> report over 280,000 msecs... when the entire boot->freeze time was about
> 3 secs.  On the other hand, 2.6.25 (and before) work just fine with HPET
> enabled.

For CONFIG_CLASSIC_RCU and !CONFIG_PREEMPT, in-kernel infinite spin loops
will cause synchronize_rcu() to hang.  For other RCU configurations,
spinning with interrupts disabled will result in similar hangs.  Invoking
synchronize_rcu() very early in boot (before rcu_init() has been called)
will of course also hang.

Could you please let me know whether your config has CONFIG_CLASSIC_RCU
or CONFIG_PREEMPT_RCU?

> 3. I was able to find the commit that introduced the freeze
> (3def3d6ddf43dbe20c00c3cbc38dfacc8586998f), so there has to be a connection
> between that commit and the RCU problem.  Is it possible that a prexisting
> error or oversight in the code was merely exposed by that commit?  (And 
> only on certain hardware?)  Or does that code itself contain the error?

Thank you for finding the commit -- should be quite helpful!!!

A quick look reveals what appears to be reader-writer locking rather
than RCU.  It does run in early boot before rcu_init(), so if it managed
to call synchronize_rcu() somehow you indeed would see a hang.  I do
not see such a call, but then again, I don't know this code much at all.

This is the second time in as many days that motivated RCU's working
correctly before rcu_init()...  Hmmm...

> 4. Another bug has been posted on the Debian BTS, which is worked around
> by disabling HPET.  The user provided some links to bugzilla.kernel.org
> where David Brownell is fighting with some HPET/RTC issues (but no mention
> of RCU):
> http://bugzilla.kernel.org/show_bug.cgi?id=11111
> http://bugzilla.kernel.org/show_bug.cgi?id=11153
> 
> I honestly don't know whether this is related to my problem or not.  :-(

Nor me.

> If any has any test code I can run to detect massive HPET breakage on
> these motherboards, I'll be glad to do so.  Or any other experimental
> code changes, for that matter.

If you can answer my CONFIG_CLASSIC_RCU vs. CONFIG_PREEMPT_RCU question
above, I should be able to provide you a diagnostic patch that would say
which CPU RCU was waiting on.  At least assuming that at least one CPU
was still taking the scheduling-clock interrupt, that is.  ;-)

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
@ 2008-08-09 22:35 David Witbrodt
  2008-08-10 15:15 ` Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: David Witbrodt @ 2008-08-09 22:35 UTC (permalink / raw)
  To: paulmck
  Cc: Peter Zijlstra, linux-kernel, Yinghai Lu, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, netdev

OK, sorry for several hours of delay, but I had to work this
morning and just got home.

> > I am completely ignorant about how the kernel works, so any guesses I have
> > are probably worthless... but I'll throw some out anyway:
> > 
> > 1.  Maybe HPET is used (if present) for timing by RCU, so disabling it
> > forces RCU to work differently.  (Pure guess here:  I know nothing about
> > RCU, and haven't even tried looking at its code.)
> 
> RCU doesn't use HPET directly.  Most of its time-dependent behavior
> comes from its being invoked from the scheduling-clock interrupt.

  OK.  It was just a guess, anyway, but in my weak attempts to apply logic
to the problem I thought:  a locking issue would not go away merely by
disabling HPET, but if HPET touches the inner workings of RCU (or something
on which RCU depends) then it would make sense that disabling HPET causes
RCU to behave differently.
  I was just brainstorming, though....

> > 2.  Maybe my hardware is broken.  We need see one initcall return that
> > report over 280,000 msecs... when the entire boot->freeze time was about
> > 3 secs.  On the other hand, 2.6.25 (and before) work just fine with HPET
> > enabled.
> 
> For CONFIG_CLASSIC_RCU and !CONFIG_PREEMPT, in-kernel infinite spin loops
> will cause synchronize_rcu() to hang.  For other RCU configurations,
> spinning with interrupts disabled will result in similar hangs.  Invoking
> synchronize_rcu() very early in boot (before rcu_init() has been called)
> will of course also hang.
> 
> Could you please let me know whether your config has CONFIG_CLASSIC_RCU
> or CONFIG_PREEMPT_RCU?

[My apologies for the poor writing above.  The sentence "We need see one 
initcall return that report over 280,000 msecs..." was supposed to say
"We *DID* see one initcall return that *reported* over 280,000 msecs..."
In other words, something funky is going on with this machine's timers
in the crashing kernels.]

OK, I don't believe Paul was here for the beginning of this thread on
Monday, so before supplying the info requested I need to provide some
context on my situation.  I have one machine ("desktop") which works fine
with 2.6.2[67] kernels, with mboard = "Gigabyte GA-M59SLI-S5"; and I have
two machines ("fileserver", "webserver") on which 2.6.2[67] kernels freeze,
both with mboard = "ECS AMD690GM-M2".  I also am interested in getting the
Debian stock kernel working for their upcoming stable release, as well as
getting my own custom kernels working again.

First, here is the .config info for the Debian stock kernel called
"linux-image-2.6.26-1-amd64":
====================
$ egrep 'HPET|RCU|PREEMPT' config-2.6.26-1-amd64
CONFIG_PREEMPT_NOTIFIERS=y
CONFIG_CLASSIC_RCU=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_HPET=y
CONFIG_HPET_MMAP=y
# CONFIG_RCU_TORTURE_TEST is not set
====================
This kernel freezes on webserver/fileserver, but runs fine on desktop.  (The
binary is identical, having moved it from desktop to the others via NFS instead
of downloading a separate instance from the Debian repositories.)

Here is info from the custom .config for my FREEZING fileserver machine, which
is not the same as the desktop, and not the same as Debian stock:
====================
$ egrep 'HPET|RCU|PREEMPT' config-2.6.26-2s11950.080804.fileserver.uvesafb
CONFIG_CLASSIC_RCU=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_HPET=y
CONFIG_HPET_RTC_IRQ=y
CONFIG_HPET_MMAP=y
====================
This was derived from the working .config for 2.6.25 on fileserver:
====================
$ egrep 'HPET|RCU|PREEMPT' config-2.6.25-7.080720.fileserver.uvesafb
CONFIG_CLASSIC_RCU=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
CONFIG_PREEMPT_NONE=y
# CONFIG_PREEMPT_VOLUNTARY is not set
# CONFIG_PREEMPT is not set
CONFIG_HPET=y
# CONFIG_HPET_RTC_IRQ is not set
CONFIG_HPET_MMAP=y
====================

After reading Paul's email, but before replying, I applied the changes
to PREEMPT and PREEMPT_RCU and built 2.6.27-rc2 from my git tree on
fileserver.  This kernel FREEZES on fileserver, like the custom and
Debian stock 2.6.26 kernels mentioned above:
====================
$ egrep 'HPET|RCU|PREEMPT' config-2.6.27-rc2.080809.preempt+rcu
# CONFIG_CLASSIC_RCU is not set
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_TRACE=y
CONFIG_HPET=y
CONFIG_HPET_RTC_IRQ=y
CONFIG_HPET_MMAP=y
# CONFIG_PREEMPT_TRACER is not set
====================

Here is info from the custom .config for my WORKING desktop machine, which
is not the same as fileserver/webserver, and not the same as Debian stock:
====================
$ egrep 'HPET|RCU|PREEMPT' config-2.6.26-1.080801.desktop.uvesafb
CONFIG_CLASSIC_RCU=y
CONFIG_HPET_TIMER=y
CONFIG_HPET_EMULATE_RTC=y
# CONFIG_PREEMPT_NONE is not set
# CONFIG_PREEMPT_VOLUNTARY is not set
CONFIG_PREEMPT=y
# CONFIG_PREEMPT_RCU is not set
CONFIG_HPET=y
CONFIG_HPET_RTC_IRQ=y
CONFIG_HPET_MMAP=y
====================

(My custom configurations originated with the Debian stock config, but I
disabled drivers and features irrelevant for my hardware, then tweaked
each .config according to each machine's specific hardware and usage.
All machines work fine using my custom configs for 2.6.25 kernels and
earlier.)

> > 3. I was able to find the commit that introduced the freeze
> > (3def3d6ddf43dbe20c00c3cbc38dfacc8586998f), so there has to be a connection
> > between that commit and the RCU problem.  Is it possible that a prexisting
> > error or oversight in the code was merely exposed by that commit?  (And 
> > only on certain hardware?)  Or does that code itself contain the error?
> 
> Thank you for finding the commit -- should be quite helpful!!!
> 
> A quick look reveals what appears to be reader-writer locking rather
> than RCU.  It does run in early boot before rcu_init(), so if it managed
> to call synchronize_rcu() somehow you indeed would see a hang.  I do
> not see such a call, but then again, I don't know this code much at all.
> 
> This is the second time in as many days that motivated RCU's working
> correctly before rcu_init()...  Hmmm...

Again, I think Paul was not here for the previous messages in this thread.  A
bit of recap may be in order:

The commit that first causes the freeze (and I assume that no commits since
would also cause a freeze, but that is unknown at this point) touches 3
files:

arch/x86/kernel/e820_64.c:  Here, the algorithm was altered to remove
several calls to a function called request_resource(<args>), replacing them
with a single call to insert_resource(<args>).  I have no idea whether this
change is problematic, by I observe that "request" sounds read-only, while
"insert" implies read-write behavior.  (NB: this file no longer exists, and
its contents have been merged into 'e820.c'.)

arch/x86/kernel/setup_64.c:  Here, several calls of insert_resource(<args>)
are added in 2 functions.

include/asm-x86/e820_64.h:  Here, a function prototype is modified to reflect
changes made in 'e820_64.c'.

Booting the 2.6.26 kernels on fileserver with "debug initcall_debug" reveals
that the last function called before the freeze is called "inet_init()".
(The inet_init() function itself is not important here; one desperate 
experiment I tried, disabling most of the kernel... including CONFIG_NET...
caused the freeze to occur in pci_init() instead.)  The inet_init() function
is located in net/ipv4/af_inet.c, and freezes in a loop which calls
inet_register_protosw(<arg>):

===== BEGIN CODE EXCERPT ========
    /* Register the socket-side information for inet_create. */
    for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)
        INIT_LIST_HEAD(r);

    for (q = inetsw_array; q < &inetsw_array[INETSW_ARRAY_LEN]; ++q)
        inet_register_protosw(q);
===== END EXCERPT ========

The inet_register_protosw(<arg>) function calls list_add_rcu(<args>) in a
block of code enclosed between spin_lock_bh(<arg>) and spin_unlock_bh(<arg>).
Again, I don't know what I'm doing, but it looks like this is where
inet_init() touchs RCU features.  Just before inet_register_protosw() hits
"return;" it calls, synchronize_net(); this is a tiny function, which calls
might_sleep() and synchronize_rcu().

At synchronize_rcu(), the freeze occurs.  It occurs on the first iteration
of inet_register_protosw(<arg>) as well.

To quote Daffy Duck:  "Something's amiss here...."  I lack the knowledge
and skills to know whether commit 3def3d... is really to blame, or whether
the changes it made simply revealed breakage in the other code which was
already present.  Indeed, none of you seem to be having any problem at all;
nor am I, on my "desktop" machine!

> > If any has any test code I can run to detect massive HPET breakage on
> > these motherboards, I'll be glad to do so.  Or any other experimental
> > code changes, for that matter.
> 
> If you can answer my CONFIG_CLASSIC_RCU vs. CONFIG_PREEMPT_RCU question
> above, I should be able to provide you a diagnostic patch that would say
> which CPU RCU was waiting on.  At least assuming that at least one CPU
> was still taking the scheduling-clock interrupt, that is.  ;-)

[More poor grammar apologies:  "If any has any test code..." ==> 
"If *anyone has any test code..."]

Thank you for the help.  This problem is frustrating, but incredibly
interesting to me.  I have never had this sort of problem with any previous
kernel, so I have never had an opportunity to play bug-catcher before.  By
pursuing the matter this far, I have learned elementary usage of 'git', I
have had a chance to peek at the kernel source code itself, and have even
successfully inserted code (only harmless printk()'s, though) and built the
modified kernel without errors afterward!  Without this regression, I would
have had none of this fun!

A few closing comments, then:

1.  I don't think the PREEMPT options in .config are to blame.  The Debian
stock 2.6.26 kernel runs on "desktop", but freezes on "fileserver".  That
makes it look like a hardware issue, but 2.6.25 ran fine.  [init_headache()]

2.  Commit 3def3d... draws the line between 2.6.25 working on "fileserver"
and pre-2.6.26 not working on "fileserver".  The changes in e820.c seem to
modify a function called e820_reserve_resources() from requesting resources
to inserting resources.  (The changes in setup.c don't affect me, since the
additional call of insert_resource() is in a block depending on CONFIG_KEXEC,
which is disable in my custom kernels.)  Something about this commit causes
inet_init() -- which calls inet_register_protosw(), which calls 
synchronize_net(), which calls synchronize_rcu() -- to freeze.
[init_migraine()]

3.  Whatever the cause -- whether the commit is doing something wrong, or
whether it just exposed something else that wasn't right to begin with --
the problem can just be made to go away by using "hpet=disabled" as a boot
parameter.  [init_apoplexy()]

4.  The problem seems to only manifest itself on an ECS AMD690GM-M2
motherboard, since of the thousands of users of Debian Sid I am the only
one reporting a problem on the Debian BTS, and no one else on the LKML is
experiencing it either.  [init_fatal_aneurism()]

However, even though I am the only one plagued by this problem, it is clear
that this hardware ran 2.6.25 just fine.  Maybe the full extent of the
problem is yet to be seen, since the vast majority of Linux users run
distributions with older kernels.  So, I'm viewing this as a chance for
me to finally be able to contribute, until one of 3 things is discovered:
the problem is my fault, the problem is my hardware's fault, or the problem
is a bug in the kernel.

Thanks Paul (and Peter and Yinghai),
Dave W.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-09 22:35 HPET regression in 2.6.26 versus 2.6.25 -- RCU problem David Witbrodt
@ 2008-08-10 15:15 ` Paul E. McKenney
  2008-08-11  1:35   ` [PATCH diagnostic] " Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2008-08-10 15:15 UTC (permalink / raw)
  To: David Witbrodt
  Cc: Peter Zijlstra, linux-kernel, Yinghai Lu, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, netdev

On Sat, Aug 09, 2008 at 03:35:48PM -0700, David Witbrodt wrote:
> OK, sorry for several hours of delay, but I had to work this
> morning and just got home.
> 
> 
> 
> 
> > > I am completely ignorant about how the kernel works, so any guesses I have
> > > are probably worthless... but I'll throw some out anyway:
> > > 
> > > 1.  Maybe HPET is used (if present) for timing by RCU, so disabling it
> > > forces RCU to work differently.  (Pure guess here:  I know nothing about
> > > RCU, and haven't even tried looking at its code.)
> > 
> > RCU doesn't use HPET directly.  Most of its time-dependent behavior
> > comes from its being invoked from the scheduling-clock interrupt.
> 
>   OK.  It was just a guess, anyway, but in my weak attempts to apply logic
> to the problem I thought:  a locking issue would not go away merely by
> disabling HPET, but if HPET touches the inner workings of RCU (or something
> on which RCU depends) then it would make sense that disabling HPET causes
> RCU to behave differently.
>   I was just brainstorming, though....

One other possibility would be something like:

	rcu_read_lock();
	/* something that waits for the HPET. */
	rcu_read_unlock();

I don't know of any such code sequence, but if one did exist somewhere
in the kernel, then HPET failure could stall a synchronize_rcu().

> > > 2.  Maybe my hardware is broken.  We need see one initcall return that
> > > report over 280,000 msecs... when the entire boot->freeze time was about
> > > 3 secs.  On the other hand, 2.6.25 (and before) work just fine with HPET
> > > enabled.
> > 
> > For CONFIG_CLASSIC_RCU and !CONFIG_PREEMPT, in-kernel infinite spin loops
> > will cause synchronize_rcu() to hang.  For other RCU configurations,
> > spinning with interrupts disabled will result in similar hangs.  Invoking
> > synchronize_rcu() very early in boot (before rcu_init() has been called)
> > will of course also hang.
> > 
> > Could you please let me know whether your config has CONFIG_CLASSIC_RCU
> > or CONFIG_PREEMPT_RCU?
> 
> [My apologies for the poor writing above.  The sentence "We need see one 
> initcall return that report over 280,000 msecs..." was supposed to say
> "We *DID* see one initcall return that *reported* over 280,000 msecs..."
> In other words, something funky is going on with this machine's timers
> in the crashing kernels.]

No need to apologize -- I did understand your intent.

> OK, I don't believe Paul was here for the beginning of this thread on
> Monday, so before supplying the info requested I need to provide some
> context on my situation.  I have one machine ("desktop") which works fine
> with 2.6.2[67] kernels, with mboard = "Gigabyte GA-M59SLI-S5"; and I have
> two machines ("fileserver", "webserver") on which 2.6.2[67] kernels freeze,
> both with mboard = "ECS AMD690GM-M2".  I also am interested in getting the
> Debian stock kernel working for their upcoming stable release, as well as
> getting my own custom kernels working again.

OK, so at least the desktop machine is multi-CPU, and perhaps the
fileserver as well.

> First, here is the .config info for the Debian stock kernel called
> "linux-image-2.6.26-1-amd64":
> ====================
> $ egrep 'HPET|RCU|PREEMPT' config-2.6.26-1-amd64
> CONFIG_PREEMPT_NOTIFIERS=y
> CONFIG_CLASSIC_RCU=y

OK.  Classic RCU has not changed much recently.  This also indicates
an infinite loop in kernel code (or a CPU locking up completely, which
is quite rare, but can still happen).

You do try preemptable RCU, which is much more recent (and thus much
more subject to suspicion), but get the same result.

I will see about putting together a diagnostic patch for Classic RCU.
The approach will be to record jiffies (or some such) at the beginning
of the grace period (in rcu_start_batch()), then have
rcu_check_callbacks() complain if:

1.	it is running on a CPU that has holding up grace periods for
	a long time (say one second).  This will identify the culprit
	assuming that the culprit has not disabled hardware irqs,
	instruction execution, or some such.

2.	it is running on a CPU that is not holding up grace periods,
	but grace periods have been held up for an even longer time
	(say two seconds).

In either case, some sort of exponential backoff would be needed to
avoid multi-gigabyte log files.  Of course, all of this assumes that
the machine remains healthy enough to actually get any such messages
somewhere that you can see them, but so it goes...

							Thanx, Paul

> CONFIG_HPET_TIMER=y
> CONFIG_HPET_EMULATE_RTC=y
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set
> CONFIG_HPET=y
> CONFIG_HPET_MMAP=y
> # CONFIG_RCU_TORTURE_TEST is not set
> ====================
> This kernel freezes on webserver/fileserver, but runs fine on desktop.  (The
> binary is identical, having moved it from desktop to the others via NFS instead
> of downloading a separate instance from the Debian repositories.)
> 
> Here is info from the custom .config for my FREEZING fileserver machine, which
> is not the same as the desktop, and not the same as Debian stock:
> ====================
> $ egrep 'HPET|RCU|PREEMPT' config-2.6.26-2s11950.080804.fileserver.uvesafb
> CONFIG_CLASSIC_RCU=y
> CONFIG_HPET_TIMER=y
> CONFIG_HPET_EMULATE_RTC=y
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set
> CONFIG_HPET=y
> CONFIG_HPET_RTC_IRQ=y
> CONFIG_HPET_MMAP=y
> ====================
> This was derived from the working .config for 2.6.25 on fileserver:
> ====================
> $ egrep 'HPET|RCU|PREEMPT' config-2.6.25-7.080720.fileserver.uvesafb
> CONFIG_CLASSIC_RCU=y
> CONFIG_HPET_TIMER=y
> CONFIG_HPET_EMULATE_RTC=y
> CONFIG_PREEMPT_NONE=y
> # CONFIG_PREEMPT_VOLUNTARY is not set
> # CONFIG_PREEMPT is not set
> CONFIG_HPET=y
> # CONFIG_HPET_RTC_IRQ is not set
> CONFIG_HPET_MMAP=y
> ====================
> 
> After reading Paul's email, but before replying, I applied the changes
> to PREEMPT and PREEMPT_RCU and built 2.6.27-rc2 from my git tree on
> fileserver.  This kernel FREEZES on fileserver, like the custom and
> Debian stock 2.6.26 kernels mentioned above:
> ====================
> $ egrep 'HPET|RCU|PREEMPT' config-2.6.27-rc2.080809.preempt+rcu
> # CONFIG_CLASSIC_RCU is not set
> CONFIG_HPET_TIMER=y
> CONFIG_HPET_EMULATE_RTC=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> CONFIG_PREEMPT_RCU=y
> CONFIG_RCU_TRACE=y
> CONFIG_HPET=y
> CONFIG_HPET_RTC_IRQ=y
> CONFIG_HPET_MMAP=y
> # CONFIG_PREEMPT_TRACER is not set
> ====================
> 
> Here is info from the custom .config for my WORKING desktop machine, which
> is not the same as fileserver/webserver, and not the same as Debian stock:
> ====================
> $ egrep 'HPET|RCU|PREEMPT' config-2.6.26-1.080801.desktop.uvesafb
> CONFIG_CLASSIC_RCU=y
> CONFIG_HPET_TIMER=y
> CONFIG_HPET_EMULATE_RTC=y
> # CONFIG_PREEMPT_NONE is not set
> # CONFIG_PREEMPT_VOLUNTARY is not set
> CONFIG_PREEMPT=y
> # CONFIG_PREEMPT_RCU is not set
> CONFIG_HPET=y
> CONFIG_HPET_RTC_IRQ=y
> CONFIG_HPET_MMAP=y
> ====================
> 
> (My custom configurations originated with the Debian stock config, but I
> disabled drivers and features irrelevant for my hardware, then tweaked
> each .config according to each machine's specific hardware and usage.
> All machines work fine using my custom configs for 2.6.25 kernels and
> earlier.)
> 
> 
> > > 3. I was able to find the commit that introduced the freeze
> > > (3def3d6ddf43dbe20c00c3cbc38dfacc8586998f), so there has to be a connection
> > > between that commit and the RCU problem.  Is it possible that a prexisting
> > > error or oversight in the code was merely exposed by that commit?  (And 
> > > only on certain hardware?)  Or does that code itself contain the error?
> > 
> > Thank you for finding the commit -- should be quite helpful!!!
> > 
> > A quick look reveals what appears to be reader-writer locking rather
> > than RCU.  It does run in early boot before rcu_init(), so if it managed
> > to call synchronize_rcu() somehow you indeed would see a hang.  I do
> > not see such a call, but then again, I don't know this code much at all.
> > 
> > This is the second time in as many days that motivated RCU's working
> > correctly before rcu_init()...  Hmmm...
> 
> Again, I think Paul was not here for the previous messages in this thread.  A
> bit of recap may be in order:
> 
> The commit that first causes the freeze (and I assume that no commits since
> would also cause a freeze, but that is unknown at this point) touches 3
> files:
> 
> arch/x86/kernel/e820_64.c:  Here, the algorithm was altered to remove
> several calls to a function called request_resource(<args>), replacing them
> with a single call to insert_resource(<args>).  I have no idea whether this
> change is problematic, by I observe that "request" sounds read-only, while
> "insert" implies read-write behavior.  (NB: this file no longer exists, and
> its contents have been merged into 'e820.c'.)
> 
> arch/x86/kernel/setup_64.c:  Here, several calls of insert_resource(<args>)
> are added in 2 functions.
> 
> include/asm-x86/e820_64.h:  Here, a function prototype is modified to reflect
> changes made in 'e820_64.c'.
> 
> 
> Booting the 2.6.26 kernels on fileserver with "debug initcall_debug" reveals
> that the last function called before the freeze is called "inet_init()".
> (The inet_init() function itself is not important here; one desperate 
> experiment I tried, disabling most of the kernel... including CONFIG_NET...
> caused the freeze to occur in pci_init() instead.)  The inet_init() function
> is located in net/ipv4/af_inet.c, and freezes in a loop which calls
> inet_register_protosw(<arg>):
> 
> ===== BEGIN CODE EXCERPT ========
>     /* Register the socket-side information for inet_create. */
>     for (r = &inetsw[0]; r < &inetsw[SOCK_MAX]; ++r)
>         INIT_LIST_HEAD(r);
> 
>     for (q = inetsw_array; q < &inetsw_array[INETSW_ARRAY_LEN]; ++q)
>         inet_register_protosw(q);
> ===== END EXCERPT ========
> 
> The inet_register_protosw(<arg>) function calls list_add_rcu(<args>) in a
> block of code enclosed between spin_lock_bh(<arg>) and spin_unlock_bh(<arg>).
> Again, I don't know what I'm doing, but it looks like this is where
> inet_init() touchs RCU features.  Just before inet_register_protosw() hits
> "return;" it calls, synchronize_net(); this is a tiny function, which calls
> might_sleep() and synchronize_rcu().
> 
> At synchronize_rcu(), the freeze occurs.  It occurs on the first iteration
> of inet_register_protosw(<arg>) as well.
> 
> To quote Daffy Duck:  "Something's amiss here...."  I lack the knowledge
> and skills to know whether commit 3def3d... is really to blame, or whether
> the changes it made simply revealed breakage in the other code which was
> already present.  Indeed, none of you seem to be having any problem at all;
> nor am I, on my "desktop" machine!
> 
> 
> > > If any has any test code I can run to detect massive HPET breakage on
> > > these motherboards, I'll be glad to do so.  Or any other experimental
> > > code changes, for that matter.
> > 
> > If you can answer my CONFIG_CLASSIC_RCU vs. CONFIG_PREEMPT_RCU question
> > above, I should be able to provide you a diagnostic patch that would say
> > which CPU RCU was waiting on.  At least assuming that at least one CPU
> > was still taking the scheduling-clock interrupt, that is.  ;-)
> 
> [More poor grammar apologies:  "If any has any test code..." ==> 
> "If *anyone has any test code..."]
> 
> Thank you for the help.  This problem is frustrating, but incredibly
> interesting to me.  I have never had this sort of problem with any previous
> kernel, so I have never had an opportunity to play bug-catcher before.  By
> pursuing the matter this far, I have learned elementary usage of 'git', I
> have had a chance to peek at the kernel source code itself, and have even
> successfully inserted code (only harmless printk()'s, though) and built the
> modified kernel without errors afterward!  Without this regression, I would
> have had none of this fun!
> 
> A few closing comments, then:
> 
> 1.  I don't think the PREEMPT options in .config are to blame.  The Debian
> stock 2.6.26 kernel runs on "desktop", but freezes on "fileserver".  That
> makes it look like a hardware issue, but 2.6.25 ran fine.  [init_headache()]
> 
> 2.  Commit 3def3d... draws the line between 2.6.25 working on "fileserver"
> and pre-2.6.26 not working on "fileserver".  The changes in e820.c seem to
> modify a function called e820_reserve_resources() from requesting resources
> to inserting resources.  (The changes in setup.c don't affect me, since the
> additional call of insert_resource() is in a block depending on CONFIG_KEXEC,
> which is disable in my custom kernels.)  Something about this commit causes
> inet_init() -- which calls inet_register_protosw(), which calls 
> synchronize_net(), which calls synchronize_rcu() -- to freeze.
> [init_migraine()]
> 
> 3.  Whatever the cause -- whether the commit is doing something wrong, or
> whether it just exposed something else that wasn't right to begin with --
> the problem can just be made to go away by using "hpet=disabled" as a boot
> parameter.  [init_apoplexy()]
> 
> 4.  The problem seems to only manifest itself on an ECS AMD690GM-M2
> motherboard, since of the thousands of users of Debian Sid I am the only
> one reporting a problem on the Debian BTS, and no one else on the LKML is
> experiencing it either.  [init_fatal_aneurism()]
> 
> However, even though I am the only one plagued by this problem, it is clear
> that this hardware ran 2.6.25 just fine.  Maybe the full extent of the
> problem is yet to be seen, since the vast majority of Linux users run
> distributions with older kernels.  So, I'm viewing this as a chance for
> me to finally be able to contribute, until one of 3 things is discovered:
> the problem is my fault, the problem is my hardware's fault, or the problem
> is a bug in the kernel.
> 
> 
> Thanks Paul (and Peter and Yinghai),
> Dave W.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH diagnostic] Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-10 15:15 ` Paul E. McKenney
@ 2008-08-11  1:35   ` Paul E. McKenney
  2008-08-11  1:38     ` Paul E. McKenney
  2008-08-11 11:38     ` Ingo Molnar
  0 siblings, 2 replies; 19+ messages in thread
From: Paul E. McKenney @ 2008-08-11  1:35 UTC (permalink / raw)
  To: David Witbrodt
  Cc: Peter Zijlstra, linux-kernel, Yinghai Lu, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, netdev

On Sun, Aug 10, 2008 at 08:15:20AM -0700, Paul E. McKenney wrote:
> I will see about putting together a diagnostic patch for Classic RCU.
> The approach will be to record jiffies (or some such) at the beginning
> of the grace period (in rcu_start_batch()), then have
> rcu_check_callbacks() complain if:
> 
> 1.	it is running on a CPU that has holding up grace periods for
> 	a long time (say one second).  This will identify the culprit
> 	assuming that the culprit has not disabled hardware irqs,
> 	instruction execution, or some such.
> 
> 2.	it is running on a CPU that is not holding up grace periods,
> 	but grace periods have been held up for an even longer time
> 	(say two seconds).
> 
> In either case, some sort of exponential backoff would be needed to
> avoid multi-gigabyte log files.  Of course, all of this assumes that
> the machine remains healthy enough to actually get any such messages
> somewhere that you can see them, but so it goes...

And here is the patch.  It is still a bit raw, so the results should
be viewed with some suspicion.  It adds a default-off kernel parameter
CONFIG_RCU_CPU_STALL which must be enabled.

Rather than exponential backoff, it backs off to once per 30 seconds.
My feeling upon thinking on it was that if you have stalled RCU grace
periods for that long, a few extra printk() messages are probably the
least of your worries...

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 include/linux/rcuclassic.h |    3 +
 kernel/rcuclassic.c        |   80 +++++++++++++++++++++++++++++++++++++++++++++
 lib/Kconfig.debug          |   13 +++++++
 3 files changed, 96 insertions(+)

diff -urpNa -X dontdiff linux-2.6.27-rc1/include/linux/rcuclassic.h linux-2.6.27-rc1-cpustall/include/linux/rcuclassic.h
--- linux-2.6.27-rc1/include/linux/rcuclassic.h	2008-07-30 08:48:16.000000000 -0700
+++ linux-2.6.27-rc1-cpustall/include/linux/rcuclassic.h	2008-08-10 12:21:22.000000000 -0700
@@ -46,6 +46,9 @@ struct rcu_ctrlblk {
 	long	cur;		/* Current batch number.                      */
 	long	completed;	/* Number of the last completed batch         */
 	int	next_pending;	/* Is the next batch already waiting?         */
+#ifdef CONFIG_RCU_CPU_STALL
+	unsigned long gp_check;	/* Time grace period should end, in seconds.  */
+#endif /* #ifdef CONFIG_RCU_CPU_STALL */
 
 	int	signaled;
 
diff -urpNa -X dontdiff linux-2.6.27-rc1/kernel/rcuclassic.c linux-2.6.27-rc1-cpustall/kernel/rcuclassic.c
--- linux-2.6.27-rc1/kernel/rcuclassic.c	2008-07-30 08:48:17.000000000 -0700
+++ linux-2.6.27-rc1-cpustall/kernel/rcuclassic.c	2008-08-10 17:51:32.000000000 -0700
@@ -47,6 +47,7 @@
 #include <linux/notifier.h>
 #include <linux/cpu.h>
 #include <linux/mutex.h>
+#include <linux/time.h>
 
 #ifdef CONFIG_DEBUG_LOCK_ALLOC
 static struct lock_class_key rcu_lock_key;
@@ -269,6 +270,81 @@ static void rcu_do_batch(struct rcu_data
  *   rcu_check_quiescent_state calls rcu_start_batch(0) to start the next grace
  *   period (if necessary).
  */
+
+#ifdef CONFIG_RCU_CPU_STALL
+
+static inline void record_gp_check_time(struct rcu_ctrlblk *rcp)
+{
+	rcp->gp_check = get_seconds() + 3;
+}
+static void print_other_cpu_stall(struct rcu_ctrlblk *rcp)
+{
+	int cpu;
+	long delta;
+
+	/* Only let one CPU complain about others per time interval. */
+
+	spin_lock(&rcp->lock);
+	delta = get_seconds() - rcp->gp_check;
+	if (delta < 2L ||
+	    cpus_empty(rcp->cpumask)) {
+		spin_unlock(&rcp->lock);
+		return;
+	rcp->gp_check = get_seconds() + 30;
+	}
+	spin_unlock(&rcp->lock);
+
+	/* OK, time to rat on our buddy... */
+
+	printk(KERN_ERR "RCU detected CPU stalls:");
+	for_each_cpu_mask(cpu, rcp->cpumask)
+		printk(" %d", cpu);
+	printk(" (detected by %d, t=%lu/%lu)\n",
+	       smp_processor_id(), get_seconds(), rcp->gp_check);
+}
+static void print_cpu_stall(struct rcu_ctrlblk *rcp)
+{
+	printk(KERN_ERR "RCU detected CPU %d stall (t=%lu/%lu)\n",
+			smp_processor_id(), get_seconds(), rcp->gp_check);
+	dump_stack();
+	spin_lock(&rcp->lock);
+	if ((long)(get_seconds() - rcp->gp_check) >= 0L)
+		rcp->gp_check = get_seconds() + 30;
+	spin_unlock(&rcp->lock);
+}
+static inline void check_cpu_stall(struct rcu_ctrlblk *rcp,
+				   struct rcu_data *rdp)
+{
+	long delta;
+
+	delta = get_seconds() - rcp->gp_check;
+	if (cpu_isset(smp_processor_id(), rcp->cpumask) && delta >= 0L) {
+
+		/* We haven't checked in, so go dump stack. */
+
+		print_cpu_stall(rcp);
+
+	} else if (!cpus_empty(rcp->cpumask) && delta >= 2L) {
+
+		/* They had two seconds to dump stack, so complain. */
+
+		print_other_cpu_stall(rcp);
+
+	}
+}
+
+#else /* #ifdef CONFIG_RCU_CPU_STALL */
+
+static inline void record_gp_check_time(struct rcu_ctrlblk *rcp)
+{
+}
+static inline void check_cpu_stall(struct rcu_ctrlblk *rcp,
+				   struct rcu_data *rdp)
+{
+}
+
+#endif /* #else #ifdef CONFIG_RCU_CPU_STALL */
+
 /*
  * Register a new batch of callbacks, and start it up if there is currently no
  * active batch and the batch to be registered has not already occurred.
@@ -285,6 +361,7 @@ static void rcu_start_batch(struct rcu_c
 		 */
 		smp_wmb();
 		rcp->cur++;
+		record_gp_check_time(rcp);
 
 		/*
 		 * Accessing nohz_cpu_mask before incrementing rcp->cur needs a
@@ -468,6 +545,9 @@ static void rcu_process_callbacks(struct
 
 static int __rcu_pending(struct rcu_ctrlblk *rcp, struct rcu_data *rdp)
 {
+	/* Check for CPU stalls, if enabled. */
+	check_cpu_stall(rcp, rdp);
+
 	/* This cpu has pending rcu entries and the grace period
 	 * for them has completed.
 	 */
diff -urpNa -X dontdiff linux-2.6.27-rc1/lib/Kconfig.debug linux-2.6.27-rc1-cpustall/lib/Kconfig.debug
--- linux-2.6.27-rc1/lib/Kconfig.debug	2008-07-30 08:48:17.000000000 -0700
+++ linux-2.6.27-rc1-cpustall/lib/Kconfig.debug	2008-08-10 12:14:18.000000000 -0700
@@ -597,6 +597,19 @@ config RCU_TORTURE_TEST_RUNNABLE
 	  Say N here if you want the RCU torture tests to start only
 	  after being manually enabled via /proc.
 
+config RCU_CPU_STALL
+	bool "Check for stalled CPUs delaying RCU grace periods"
+	depends on CLASSIC_RCU
+	default n
+	help
+	  This option causes RCU to printk information on which
+	  CPUs are delaying the current grace period, but only when
+	  the grace period extends for excessive time periods.
+
+	  Say Y if you want RCU to perform such checks.
+
+	  Say N if you are unsure.
+
 config KPROBES_SANITY_TEST
 	bool "Kprobes sanity tests"
 	depends on DEBUG_KERNEL

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH diagnostic] Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-11  1:35   ` [PATCH diagnostic] " Paul E. McKenney
@ 2008-08-11  1:38     ` Paul E. McKenney
  2008-08-11 11:38     ` Ingo Molnar
  1 sibling, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2008-08-11  1:38 UTC (permalink / raw)
  To: David Witbrodt
  Cc: Peter Zijlstra, linux-kernel, Yinghai Lu, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, netdev

[-- Attachment #1: Type: text/plain, Size: 6949 bytes --]

On Sun, Aug 10, 2008 at 06:35:38PM -0700, Paul E. McKenney wrote:
> On Sun, Aug 10, 2008 at 08:15:20AM -0700, Paul E. McKenney wrote:
> > I will see about putting together a diagnostic patch for Classic RCU.
> > The approach will be to record jiffies (or some such) at the beginning
> > of the grace period (in rcu_start_batch()), then have
> > rcu_check_callbacks() complain if:
> > 
> > 1.	it is running on a CPU that has holding up grace periods for
> > 	a long time (say one second).  This will identify the culprit
> > 	assuming that the culprit has not disabled hardware irqs,
> > 	instruction execution, or some such.
> > 
> > 2.	it is running on a CPU that is not holding up grace periods,
> > 	but grace periods have been held up for an even longer time
> > 	(say two seconds).
> > 
> > In either case, some sort of exponential backoff would be needed to
> > avoid multi-gigabyte log files.  Of course, all of this assumes that
> > the machine remains healthy enough to actually get any such messages
> > somewhere that you can see them, but so it goes...

And attached is the kernel module I used to test the patch, for whatever
that might be worth.

							Thanx, Paul

> And here is the patch.  It is still a bit raw, so the results should
> be viewed with some suspicion.  It adds a default-off kernel parameter
> CONFIG_RCU_CPU_STALL which must be enabled.
> 
> Rather than exponential backoff, it backs off to once per 30 seconds.
> My feeling upon thinking on it was that if you have stalled RCU grace
> periods for that long, a few extra printk() messages are probably the
> least of your worries...
> 
> Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
> ---
> 
>  include/linux/rcuclassic.h |    3 +
>  kernel/rcuclassic.c        |   80 +++++++++++++++++++++++++++++++++++++++++++++
>  lib/Kconfig.debug          |   13 +++++++
>  3 files changed, 96 insertions(+)
> 
> diff -urpNa -X dontdiff linux-2.6.27-rc1/include/linux/rcuclassic.h linux-2.6.27-rc1-cpustall/include/linux/rcuclassic.h
> --- linux-2.6.27-rc1/include/linux/rcuclassic.h	2008-07-30 08:48:16.000000000 -0700
> +++ linux-2.6.27-rc1-cpustall/include/linux/rcuclassic.h	2008-08-10 12:21:22.000000000 -0700
> @@ -46,6 +46,9 @@ struct rcu_ctrlblk {
>  	long	cur;		/* Current batch number.                      */
>  	long	completed;	/* Number of the last completed batch         */
>  	int	next_pending;	/* Is the next batch already waiting?         */
> +#ifdef CONFIG_RCU_CPU_STALL
> +	unsigned long gp_check;	/* Time grace period should end, in seconds.  */
> +#endif /* #ifdef CONFIG_RCU_CPU_STALL */
>  
>  	int	signaled;
>  
> diff -urpNa -X dontdiff linux-2.6.27-rc1/kernel/rcuclassic.c linux-2.6.27-rc1-cpustall/kernel/rcuclassic.c
> --- linux-2.6.27-rc1/kernel/rcuclassic.c	2008-07-30 08:48:17.000000000 -0700
> +++ linux-2.6.27-rc1-cpustall/kernel/rcuclassic.c	2008-08-10 17:51:32.000000000 -0700
> @@ -47,6 +47,7 @@
>  #include <linux/notifier.h>
>  #include <linux/cpu.h>
>  #include <linux/mutex.h>
> +#include <linux/time.h>
>  
>  #ifdef CONFIG_DEBUG_LOCK_ALLOC
>  static struct lock_class_key rcu_lock_key;
> @@ -269,6 +270,81 @@ static void rcu_do_batch(struct rcu_data
>   *   rcu_check_quiescent_state calls rcu_start_batch(0) to start the next grace
>   *   period (if necessary).
>   */
> +
> +#ifdef CONFIG_RCU_CPU_STALL
> +
> +static inline void record_gp_check_time(struct rcu_ctrlblk *rcp)
> +{
> +	rcp->gp_check = get_seconds() + 3;
> +}
> +static void print_other_cpu_stall(struct rcu_ctrlblk *rcp)
> +{
> +	int cpu;
> +	long delta;
> +
> +	/* Only let one CPU complain about others per time interval. */
> +
> +	spin_lock(&rcp->lock);
> +	delta = get_seconds() - rcp->gp_check;
> +	if (delta < 2L ||
> +	    cpus_empty(rcp->cpumask)) {
> +		spin_unlock(&rcp->lock);
> +		return;
> +	rcp->gp_check = get_seconds() + 30;
> +	}
> +	spin_unlock(&rcp->lock);
> +
> +	/* OK, time to rat on our buddy... */
> +
> +	printk(KERN_ERR "RCU detected CPU stalls:");
> +	for_each_cpu_mask(cpu, rcp->cpumask)
> +		printk(" %d", cpu);
> +	printk(" (detected by %d, t=%lu/%lu)\n",
> +	       smp_processor_id(), get_seconds(), rcp->gp_check);
> +}
> +static void print_cpu_stall(struct rcu_ctrlblk *rcp)
> +{
> +	printk(KERN_ERR "RCU detected CPU %d stall (t=%lu/%lu)\n",
> +			smp_processor_id(), get_seconds(), rcp->gp_check);
> +	dump_stack();
> +	spin_lock(&rcp->lock);
> +	if ((long)(get_seconds() - rcp->gp_check) >= 0L)
> +		rcp->gp_check = get_seconds() + 30;
> +	spin_unlock(&rcp->lock);
> +}
> +static inline void check_cpu_stall(struct rcu_ctrlblk *rcp,
> +				   struct rcu_data *rdp)
> +{
> +	long delta;
> +
> +	delta = get_seconds() - rcp->gp_check;
> +	if (cpu_isset(smp_processor_id(), rcp->cpumask) && delta >= 0L) {
> +
> +		/* We haven't checked in, so go dump stack. */
> +
> +		print_cpu_stall(rcp);
> +
> +	} else if (!cpus_empty(rcp->cpumask) && delta >= 2L) {
> +
> +		/* They had two seconds to dump stack, so complain. */
> +
> +		print_other_cpu_stall(rcp);
> +
> +	}
> +}
> +
> +#else /* #ifdef CONFIG_RCU_CPU_STALL */
> +
> +static inline void record_gp_check_time(struct rcu_ctrlblk *rcp)
> +{
> +}
> +static inline void check_cpu_stall(struct rcu_ctrlblk *rcp,
> +				   struct rcu_data *rdp)
> +{
> +}
> +
> +#endif /* #else #ifdef CONFIG_RCU_CPU_STALL */
> +
>  /*
>   * Register a new batch of callbacks, and start it up if there is currently no
>   * active batch and the batch to be registered has not already occurred.
> @@ -285,6 +361,7 @@ static void rcu_start_batch(struct rcu_c
>  		 */
>  		smp_wmb();
>  		rcp->cur++;
> +		record_gp_check_time(rcp);
>  
>  		/*
>  		 * Accessing nohz_cpu_mask before incrementing rcp->cur needs a
> @@ -468,6 +545,9 @@ static void rcu_process_callbacks(struct
>  
>  static int __rcu_pending(struct rcu_ctrlblk *rcp, struct rcu_data *rdp)
>  {
> +	/* Check for CPU stalls, if enabled. */
> +	check_cpu_stall(rcp, rdp);
> +
>  	/* This cpu has pending rcu entries and the grace period
>  	 * for them has completed.
>  	 */
> diff -urpNa -X dontdiff linux-2.6.27-rc1/lib/Kconfig.debug linux-2.6.27-rc1-cpustall/lib/Kconfig.debug
> --- linux-2.6.27-rc1/lib/Kconfig.debug	2008-07-30 08:48:17.000000000 -0700
> +++ linux-2.6.27-rc1-cpustall/lib/Kconfig.debug	2008-08-10 12:14:18.000000000 -0700
> @@ -597,6 +597,19 @@ config RCU_TORTURE_TEST_RUNNABLE
>  	  Say N here if you want the RCU torture tests to start only
>  	  after being manually enabled via /proc.
>  
> +config RCU_CPU_STALL
> +	bool "Check for stalled CPUs delaying RCU grace periods"
> +	depends on CLASSIC_RCU
> +	default n
> +	help
> +	  This option causes RCU to printk information on which
> +	  CPUs are delaying the current grace period, but only when
> +	  the grace period extends for excessive time periods.
> +
> +	  Say Y if you want RCU to perform such checks.
> +
> +	  Say N if you are unsure.
> +
>  config KPROBES_SANITY_TEST
>  	bool "Kprobes sanity tests"
>  	depends on DEBUG_KERNEL

[-- Attachment #2: stall.c --]
[-- Type: text/x-csrc, Size: 1719 bytes --]

/*
 * Taken from Linux Device Drivers, 3rd Edition.
 * Corbet, Rubini, and Kroah-Hartman.
 *
 * Modified by Paul E. McKenney (c) 2008 to test CPU-stall detection.
 */

#include <linux/init.h>
#include <linux/module.h>
#include <linux/sched.h>
#include <linux/kthread.h>
#include <linux/err.h>
#include <linux/time.h>
MODULE_LICENSE("Dual BSD/GPL");

int stall_secs = 0;
module_param(stall_secs, int, 0);
MODULE_PARM_DESC(stall_secs, "Number of seconds to stall");

static struct task_struct *kstall_task;

static int stall_thread(void *arg)
{
	unsigned long stop_at;

	printk(KERN_ALERT "stall_thread %ld\n", (long)arg);
	do {
		stop_at = get_seconds();
		printk(KERN_ALERT "stall_thread at %ld seconds\n", stop_at);
		stop_at += stall_secs;
		printk(KERN_ALERT "stall_thread wait until %ld seconds\n", stop_at);
		while ((long)(get_seconds() - stop_at) < 0)
			continue;
		printk(KERN_ALERT "stall_thread ended wait at %ld seconds\n", get_seconds());
		set_current_state(TASK_UNINTERRUPTIBLE);
		schedule_timeout(10 * HZ);
		printk(KERN_ALERT "stall_thread %ld awakening\n", (long)arg);
	} while (!kthread_should_stop());
	printk(KERN_ALERT "stall_thread %ld exiting\n", (long)arg);
	return 0;
}

static int stall_init(void)
{
	int err;

	printk(KERN_ALERT "Hello, world!  stall_secs = %d\n", stall_secs);
	kstall_task = kthread_run(stall_thread, (void *)(long)stall_secs,
				  "stall_thread");
	if (IS_ERR(kstall_task)) {
		err = PTR_ERR(kstall_task);
		kstall_task = NULL;
		return err;
	}
	return 0;
}

static void stall_exit(void)
{
	if (kstall_task != NULL) {
		kthread_stop(kstall_task);
	}
	kstall_task = NULL;
	printk(KERN_ALERT "Goodbye, cruel world\n");
}

module_init(stall_init);
module_exit(stall_exit);

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-09 13:56 ` Paul E. McKenney
@ 2008-08-11 11:25   ` Ingo Molnar
  2008-08-11 16:15     ` Yinghai Lu
  0 siblings, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2008-08-11 11:25 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David Witbrodt, Peter Zijlstra, linux-kernel, Yinghai Lu,
	Thomas Gleixner, H. Peter Anvin, netdev

* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> > I'm _way_ over my head in this discussion, but here's some more food 
> > for thought.  Last weekend, when I first tried 2.6.26 and discovered 
> > the freeze, I thought an error of my own in .config was causing it.  
> > Before I ever sought help, I made about a dozen experiments with 
> > different .config files.
> > 
> > One series of those experiments involved turning off most of the 
> > kernel... including CONFIG_INET.  The kernel still froze, but when 
> > entering pci_init().  (This info can be read in my original post to 
> > the Debian BTS, which I have provided links for a couple of times in 
> > this LKML thread.  I even went further and removed enough that the 
> > freeze was avoided, but so much of the kernel was missing that my 
> > init scripts couldn't mount a hard disk any more.  Trying to restore 
> > enough to allow HD mounting just brought back the freeze.)
[...]
> 
> RCU doesn't use HPET directly.  Most of its time-dependent behavior 
> comes from its being invoked from the scheduling-clock interrupt.

such freezes frequently occur due to the plain lack of timer interrupts.

As networking's rcu_synchronize() is one of the first calls in the 
kernel that relies on a timer IRQ hitting the CPU, it would be the first 
one that "freezes". It's not a real freeze though: it's the lack of 
timer events breaking RCU completion. (RCU has an implicit and somewhat 
subtle dependency on timer irqs periodically hitting the CPU)

You can probably verify this by adding something like this to 
kernel/timer.c's do_timer() function:

   if (printk_ratelimit())
	printk("timer irq hit, jiffies: %ld\n", jiffies);

Yinghai, do you have any ideas about this particular problem? One theory 
would be that your e820 changes might have caused a shuffling of 
resources that made the hpet's timer IRQ generation inoperable.

David, it would be nice to check whether tip/master still locks up for 
you:

    http://people.redhat.com/mingo/tip.git/README

just to make sure no pending fix resolves your issue. (the bug is 
probably still present, but might be worth checking nevertheless.)

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH diagnostic] Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-11  1:35   ` [PATCH diagnostic] " Paul E. McKenney
  2008-08-11  1:38     ` Paul E. McKenney
@ 2008-08-11 11:38     ` Ingo Molnar
  2008-08-11 13:17       ` Paul E. McKenney
  1 sibling, 1 reply; 19+ messages in thread
From: Ingo Molnar @ 2008-08-11 11:38 UTC (permalink / raw)
  To: Paul E. McKenney
  Cc: David Witbrodt, Peter Zijlstra, linux-kernel, Yinghai Lu,
	Thomas Gleixner, H. Peter Anvin, netdev

* Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:

> And here is the patch.  It is still a bit raw, so the results should 
> be viewed with some suspicion.  It adds a default-off kernel parameter 
> CONFIG_RCU_CPU_STALL which must be enabled.
> 
> Rather than exponential backoff, it backs off to once per 30 seconds. 
> My feeling upon thinking on it was that if you have stalled RCU grace 
> periods for that long, a few extra printk() messages are probably the 
> least of your worries...

while this wont debug problems were timer irqs are genuinely stuck for 
long periods of time, it should find problems with RCU completion logic 
itself in the presence of correct timer irqs - and the lack of any 
messages from this debug option should point the finger more firmly in 
the direction of stalled timer irqs.

So i find this debug feature rather useful and have applied it to 
tip/core/rcu (and cleaned it up a bit). I renamed the config option to 
CONFIG_DEBUG_RCU_STALL to make it more in line with usual debug option 
names. Lets see whether -tip testing finds any false positives.

	Ingo

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH diagnostic] Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-11 11:38     ` Ingo Molnar
@ 2008-08-11 13:17       ` Paul E. McKenney
  2008-08-13  0:25         ` [PATCH diagnostic] Prevent console flood when one CPU sees another AWOL via RCU Paul E. McKenney
  0 siblings, 1 reply; 19+ messages in thread
From: Paul E. McKenney @ 2008-08-11 13:17 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Witbrodt, Peter Zijlstra, linux-kernel, Yinghai Lu,
	Thomas Gleixner, H. Peter Anvin, netdev

On Mon, Aug 11, 2008 at 01:38:17PM +0200, Ingo Molnar wrote:
> 
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> 
> > And here is the patch.  It is still a bit raw, so the results should 
> > be viewed with some suspicion.  It adds a default-off kernel parameter 
> > CONFIG_RCU_CPU_STALL which must be enabled.
> > 
> > Rather than exponential backoff, it backs off to once per 30 seconds. 
> > My feeling upon thinking on it was that if you have stalled RCU grace 
> > periods for that long, a few extra printk() messages are probably the 
> > least of your worries...
> 
> while this wont debug problems were timer irqs are genuinely stuck for 
> long periods of time, it should find problems with RCU completion logic 
> itself in the presence of correct timer irqs - and the lack of any 
> messages from this debug option should point the finger more firmly in 
> the direction of stalled timer irqs.
> 
> So i find this debug feature rather useful and have applied it to 
> tip/core/rcu (and cleaned it up a bit). I renamed the config option to 
> CONFIG_DEBUG_RCU_STALL to make it more in line with usual debug option 
> names. Lets see whether -tip testing finds any false positives.

Sounds good!

For whatever it is worth, this diagnostic can also locate latency issues
in non-CONFIG_PREEMPT kernels, even when those problems are outside of
preempt_disable() regions.  Latency tracer is of course a better tool
for things -inside- of preempt_disable() regions.

							Thanx, Paul

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-11 11:25   ` Ingo Molnar
@ 2008-08-11 16:15     ` Yinghai Lu
  0 siblings, 0 replies; 19+ messages in thread
From: Yinghai Lu @ 2008-08-11 16:15 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Paul E. McKenney, David Witbrodt, Peter Zijlstra, linux-kernel,
	Thomas Gleixner, H. Peter Anvin, netdev

On Mon, Aug 11, 2008 at 4:25 AM, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
>
>> > I'm _way_ over my head in this discussion, but here's some more food
>> > for thought.  Last weekend, when I first tried 2.6.26 and discovered
>> > the freeze, I thought an error of my own in .config was causing it.
>> > Before I ever sought help, I made about a dozen experiments with
>> > different .config files.
>> >
>> > One series of those experiments involved turning off most of the
>> > kernel... including CONFIG_INET.  The kernel still froze, but when
>> > entering pci_init().  (This info can be read in my original post to
>> > the Debian BTS, which I have provided links for a couple of times in
>> > this LKML thread.  I even went further and removed enough that the
>> > freeze was avoided, but so much of the kernel was missing that my
>> > init scripts couldn't mount a hard disk any more.  Trying to restore
>> > enough to allow HD mounting just brought back the freeze.)
> [...]
>>
>> RCU doesn't use HPET directly.  Most of its time-dependent behavior
>> comes from its being invoked from the scheduling-clock interrupt.
>
> such freezes frequently occur due to the plain lack of timer interrupts.
>
> As networking's rcu_synchronize() is one of the first calls in the
> kernel that relies on a timer IRQ hitting the CPU, it would be the first
> one that "freezes". It's not a real freeze though: it's the lack of
> timer events breaking RCU completion. (RCU has an implicit and somewhat
> subtle dependency on timer irqs periodically hitting the CPU)
>
> You can probably verify this by adding something like this to
> kernel/timer.c's do_timer() function:
>
>   if (printk_ratelimit())
>        printk("timer irq hit, jiffies: %ld\n", jiffies);
>
> Yinghai, do you have any ideas about this particular problem? One theory
> would be that your e820 changes might have caused a shuffling of
> resources that made the hpet's timer IRQ generation inoperable.

the hpet request_resource() calling fail?

YH

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
@ 2008-08-11 18:01 David Witbrodt
  2008-08-11 18:08 ` Yinghai Lu
  0 siblings, 1 reply; 19+ messages in thread
From: David Witbrodt @ 2008-08-11 18:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Paul E. McKenney, Peter Zijlstra, linux-kernel, Yinghai Lu,
	Thomas Gleixner, H. Peter Anvin, netdev

Thanks for the reply.  Let me address these out of order (cutting
your post):

> David, it would be nice to check whether tip/master still locks up for 
> you:
> 
>     http://people.redhat.com/mingo/tip.git/README
> 
> just to make sure no pending fix resolves your issue. (the bug is 
> probably still present, but might be worth checking nevertheless.)

Just to recap (not sure if you saw the earlier posts in the thread, going
back to Monday a week ago):

- kernel 2.6.25 worked for me without error

- kernel 2.6.26 locked up at boot when it finally became availble in
  Debian

- I was asked to grab Linus' git tree and try that, last Tuesday --
  v2.6.27-rc1 locked up like the 2.6.26 kernels.

OK, now grabbing "tip":
===== BEGIN SHELL STUFF =========
$ git remote add tip git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip.git
$ git remote update
$ git checkout tip/master

$ git show
commit 5cbd27ebcd387cbc48c47712eb671f35d85a575f
Merge: 2097987... 2ae111c...
Author: Ingo Molnar <mingo@elte.hu>
Date:   Mon Aug 11 16:43:42 2008 +0200

    Merge branch 'x86/core'
===== END SHELL STUFF =========

I copied into .config the configuration that I had prepared for 2.6.26
(which was prepared from my most recent version for 2.6.25) and ran
'make oldconfig'.  The kernel built fine, so I installed it and rebooted.

The kernel locked up as expected at inet_init().  Rebooting, and adding
"hpet=disable", allowed it to boot just fine... as expected.

I understand the need to do this, and was hoping that it would just
magically start working again.  But it didn't ... which means I don't
have to bisect again to find out when, where, and why!  ;)

> You can probably verify this by adding something like this to 
> kernel/timer.c's do_timer() function:
> 
>    if (printk_ratelimit())
>     printk("timer irq hit, jiffies: %ld\n", jiffies);

So I made this change to do_timer(), and rebuilt the kernel.

Unfortunately, I cannot report whether this change made a difference
in the output.  This is a 2.5 GHz AMD X2 4850e processor, and in the
few moments before freeze (maybe < 2 secs) the lines just scroll too
fast for me to read them.  Once the kernel locks up, that's it... no
more printk()'s.

I can report that the do_timer() change worked correctly, though.  I
rebooted with "hpet=disabled", and just happened to hit an automatic
fsck on a 460 GB partition.  The printk()'s interfere with the fsck 
progress indicator: about every 4 secs, ten of those printk()'s fire 
off very quickly.  (Pretty annoying, actually.)

> Yinghai, do you have any ideas about this particular problem? One theory 
> would be that your e820 changes might have caused a shuffling of 
> resources that made the hpet's timer IRQ generation inoperable.

It's so weird that this commit works on so many machines, but fails on
the 2 machines I have (with the same motherboard).  Of course, many
more people/machines might be affected by this issue, but they simply
aren't using 2.6.26 yet... and will find out later, the hard way!

Thanks,
Dave W.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-11 18:01 David Witbrodt
@ 2008-08-11 18:08 ` Yinghai Lu
  0 siblings, 0 replies; 19+ messages in thread
From: Yinghai Lu @ 2008-08-11 18:08 UTC (permalink / raw)
  To: David Witbrodt
  Cc: Ingo Molnar, Paul E. McKenney, Peter Zijlstra, linux-kernel,
	Thomas Gleixner, H. Peter Anvin, netdev

On Mon, Aug 11, 2008 at 11:01 AM, David Witbrodt <dawitbro@sbcglobal.net> wrote:
>
>
> Thanks for the reply.  Let me address these out of order (cutting
> your post):
>
>> David, it would be nice to check whether tip/master still locks up for
>> you:
>>
>>     http://people.redhat.com/mingo/tip.git/README
>>
>> just to make sure no pending fix resolves your issue. (the bug is
>> probably still present, but might be worth checking nevertheless.)
>
> Just to recap (not sure if you saw the earlier posts in the thread, going
> back to Monday a week ago):
>
> - kernel 2.6.25 worked for me without error

does hpet work at 2.6.25? or it is not enabled by default for the
famous chipset?

YH

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
@ 2008-08-11 19:05 David Witbrodt
  0 siblings, 0 replies; 19+ messages in thread
From: David Witbrodt @ 2008-08-11 19:05 UTC (permalink / raw)
  To: Yinghai Lu
  Cc: Ingo Molnar, Paul E. McKenney, Peter Zijlstra, linux-kernel,
	Thomas Gleixner, H. Peter Anvin, netdev

> > Just to recap (not sure if you saw the earlier posts in the thread, going
> > back to Monday a week ago):
> >
> > - kernel 2.6.25 worked for me without error
> 
> does hpet work at 2.6.25? or it is not enabled by default for the
> famous chipset?

Yes, 2.6.25 works perfectly.

I first discovered the problem (about 10 days ago) when compiling
my first 2.6.26 kernel, after Debian uploaded their first 2.6.26 kernel
sources (about 14 days ago).  My method is to begin with my previous 
working .config, and update it using 'make oldconfig'.  The output of 

  egrep "HPET|RCU|PREEMPT" .config

for several kernels, including my custom 2.6.25 and custom 2.6.26 kernels,
was posted in a message cc'd to you on Saturday, but you can see it here
if you no longer have it:

http://www.uwsg.indiana.edu/hypermail/linux/kernel/0808.1/0720.html

Your commit changes that produced my troubles were to e820_64.c and 
setup_64.c (in arch/x86/kernel).  These mostly had to do with adding
calls to insert_resource() where there had been none.

On one of my machines, this change causes no problem.  (That is also
true on your own machines, and the machines of everyone else on LKML.)
On my two machines which lock up because of the commit, the kernel dies
in inet_init().  The chain of calls is:

  inet_init() -->  inet_register_protosw() --> synchronize_net()
  --> synchronize_rcu()

Since inet_register_protosw() touches RCU features, like list_add_rcu(),
before calling synchronize_net(), I first guessed that the problem was
here.  Gut none of this seems to have any direct connection to the 
insert_resource() calls, or to HPET.

[Of course, since I am not a kernel programmer, I should not even be
guessing in the first place....  I really wish I could give more help.]

Thanks,
Dave W.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
@ 2008-08-12 15:17 David Witbrodt
  2008-08-12 16:03 ` Ray Lee
  0 siblings, 1 reply; 19+ messages in thread
From: David Witbrodt @ 2008-08-12 15:17 UTC (permalink / raw)
  To: linux-kernel
  Cc: Peter Zijlstra, Yinghai Lu, Ingo Molnar, Thomas Gleixner,
	H. Peter Anvin, Paul E. McKenney, netdev

BRAIN DAMAGE CONTROL:  the problem is only on my hardware, so no one 
on LKML can play with this hardware directly.  That makes _me_ the weak 
link.

1.  Can someone comment on whether I correctly identified the commit #
causing the issue for me.  Here is the 'git bisect' data from my first
post:

2.6.25, good
2.6.26-rc4, bad
10c993a6b5418cb1026775765ba4c70ffb70853d, bad
334d094504c2fe1c44211ecb49146ae6bca8c321, bad
eddeb0e2d863e3941d8768e70cb50c6120e61fa0, bad
77ad386e596c6b0930cc2e09e3cce485e3ee7f72, bad
ede1389f8ab4f3a1343e567133fa9720a054a3aa, bad
c048fdfe6178e082be918d4062c86d9764979112, bad
f73920cd63d316008738427a0df2caab6cc88ad7, bad
04aaa7ba096c707a8df337b29303f1a5a65f0462, good
8fa6878ffc6366f490e99a1ab31127fb599657c9, good
1180e01de50c0c7683c6648251f32957bc2d7850, good
1e934dda0c77c8ad13fdda02074f2cfcea118a56, bad
322850af8d93735f67b8ebf84bb1350639be3f34, good
3def3d6ddf43dbe20c00c3cbc38dfacc8586998f, bad
700efc1b9f6afe34caae231b87d129ad8ffb559f, good

I concluded that 3def3d... was causing the problem for me, but I didn't
actually pipe or redirect the output message from 'git bisect' when it
stated that.  Does that conclusion look OK?

2.  I have not tried different versions of gcc.  I did not think of
doing so because (a) I use the same version of gcc on all 3 machines,
(b) the kernel builds without error on all 3 machines, and (c) the
kernel runs on 1 machine ("desktop") but freezes on the other 2 
[which share the same mboard model as each other, but are different 
from the "desktop" mboard].  If gcc was bad, wouldn't the kernels
freeze on all the machines; and wouldn't the Debian BTS be full of
reports about kernel freezes with the recently released 2.6.26 line?

$ gcc -v
Using built-in specs.
Target: x86_64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Debian 4.3.1-8'
--with-bugurl=file:///usr/share/doc/gcc-4.3/README.Bugs
--enable-languages=c,c++,fortran,objc,obj-c++ --prefix=/usr
--enable-shared --with-system-zlib --libexecdir=/usr/lib
--without-included-gettext --enable-threads=posix --enable-nls
--with-gxx-include-dir=/usr/include/c++/4.3 --program-suffix=-4.3
--enable-clocale=gnu --enable-libstdcxx-debug --enable-objc-gc
--enable-mpfr --enable-cld --enable-checking=release
--build=x86_64-linux-gnu --host=x86_64-linux-gnu
--target=x86_64-linux-gnu
Thread model: posix
gcc version 4.3.1 (Debian 4.3.1-8) 

3.  I keep wanting to play with source code, but I keep repressing the
urge because I _know_ that I do not know what I'm doing.  I keep seeing
code that I want to alter, test, or otherwise play with.  For example:

A)  The commit above touches arch/x86/kernel/e820_64.c (now e820.c) in the
e820_reserve_resources() function this way:

@@ -245,21 +244,7 @@ 
         res->start = e820.map[i].addr;
         res->end = res->start + e820.map[i].size - 1;
         res->flags = IORESOURCE_MEM | IORESOURCE_BUSY;
-        request_resource(&iomem_resource, res);
-        if (e820.map[i].type == E820_RAM) {
-            /*
-             * We don't know which RAM region contains kernel data,
-             * so we try it repeatedly and let the resource manager
-             * test it.
-             */
-            request_resource(res, code_resource);
-            request_resource(res, data_resource);
-            request_resource(res, bss_resource);
-#ifdef CONFIG_KEXEC
-            if (crashk_res.start != crashk_res.end)
-                request_resource(res, &crashk_res);
-#endif
-        }
+        insert_resource(&iomem_resource, res);
     }
 }

I keep wondering whether my hardware needed something with the 
if(e820...) block that was removed (that the rest of the world does
not need).

B)  Since the commit mostly involved changes that add insert_resource()
calls, I look that that function in kernel/resource.c, and saw this
section:

    for (next = first; ; next = next->sibling) {
        /* Partial overlap? Bad, and unfixable */
        if (next->start < new->start || next->end > new->end)
            goto out;
        if (!next->sibling)
            break;
        if (next->sibling->start > new->end)
            break;
    }

Maybe the "partial overlap" is something that should never occur, and
occurs so rarely that most folks are never bitten.  Except me?

Chanting, "Every day, and in every way, I'm getting better and better..."
Dave W.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-12 15:17 David Witbrodt
@ 2008-08-12 16:03 ` Ray Lee
  0 siblings, 0 replies; 19+ messages in thread
From: Ray Lee @ 2008-08-12 16:03 UTC (permalink / raw)
  To: David Witbrodt
  Cc: linux-kernel, Peter Zijlstra, Yinghai Lu, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Paul E. McKenney, netdev

On Tue, Aug 12, 2008 at 8:17 AM, David Witbrodt <dawitbro@sbcglobal.net> wrote:
> BRAIN DAMAGE CONTROL:  the problem is only on my hardware, so no one
> on LKML can play with this hardware directly.  That makes _me_ the weak
> link.

Heh. Can I offer a suggestion here? You're trying to do two things at
once -- finding where the problem is, and also trying to understand
the problem at the same time. Speaking just for myself, I try to
either do one of those or the other, but not both at the same time
:-). Since you bisected it (seems like a good log when I view the
commit history, but I'm no git expert), let's just work with that.

> 1.  Can someone comment on whether I correctly identified the commit #
> causing the issue for me.  Here is the 'git bisect' data from my first
> post:
>
> 2.6.25, good
> 2.6.26-rc4, bad
> 10c993a6b5418cb1026775765ba4c70ffb70853d, bad
> 334d094504c2fe1c44211ecb49146ae6bca8c321, bad
> eddeb0e2d863e3941d8768e70cb50c6120e61fa0, bad
> 77ad386e596c6b0930cc2e09e3cce485e3ee7f72, bad
> ede1389f8ab4f3a1343e567133fa9720a054a3aa, bad
> c048fdfe6178e082be918d4062c86d9764979112, bad
> f73920cd63d316008738427a0df2caab6cc88ad7, bad
> 04aaa7ba096c707a8df337b29303f1a5a65f0462, good
> 8fa6878ffc6366f490e99a1ab31127fb599657c9, good
> 1180e01de50c0c7683c6648251f32957bc2d7850, good
> 1e934dda0c77c8ad13fdda02074f2cfcea118a56, bad
> 322850af8d93735f67b8ebf84bb1350639be3f34, good
> 3def3d6ddf43dbe20c00c3cbc38dfacc8586998f, bad
> 700efc1b9f6afe34caae231b87d129ad8ffb559f, good
>
> I concluded that 3def3d... was causing the problem for me, but I didn't
> actually pipe or redirect the output message from 'git bisect' when it
> stated that.  Does that conclusion look OK?

Git should have printed out "<SHA1> is first bad commit" Did you see
that? If not, you stopped the process too soon. Viewing the history
with gitk, though, it seems you fingered the right commit. Which leads
to the next step...

> 2.  I have not tried different versions of gcc.

Which is not this :-).

> 3.  I keep wanting to play with source code,

Or this :-).

Can you try reverting that commit against the top of the latest tree,
and see if the revert applies correctly? If it does, compile and boot
and see if it works. If it does, it'll be Yinghai's job to figure out
what went wrong, not yours (unless you're a real gluton for
punishment, and happen to know what was going on in Yinghai's head
when he decided that it was safe to make those changes).

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
@ 2008-08-12 17:29 David Witbrodt
  2008-08-12 17:38 ` Ray Lee
  0 siblings, 1 reply; 19+ messages in thread
From: David Witbrodt @ 2008-08-12 17:29 UTC (permalink / raw)
  To: Ray Lee
  Cc: linux-kernel, Peter Zijlstra, Yinghai Lu, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Paul E. McKenney, netdev

> > BRAIN DAMAGE CONTROL:  the problem is only on my hardware, so no one
> > on LKML can play with this hardware directly.  That makes _me_ the weak
> > link.
> 
> Heh. Can I offer a suggestion here? You're trying to do two things at
> once -- finding where the problem is, and also trying to understand
> the problem at the same time. Speaking just for myself, I try to
> either do one of those or the other, but not both at the same time
> :-). Since you bisected it (seems like a good log when I view the
> commit history, but I'm no git expert), let's just work with that.

I having nothing but gratitude for anyone who has any suggestions
whatsoever!

I do _want_ to do both of those things... but you are right that no
one should try to do them both at the same time.  BTW, the bisect data
was from my first post (trying to _find_ the problem) about 8 days ago.
Since that time, I have been _assuming_ I located the cause problem,
and had not thought about it again... until today.

Unfortunately, this kernel stuff can get very deep.  Just finding the
commit, where "before" works and "after" does not, doesn't necessarily
seem to mean that the commit is the problem.  It could also mean that
the problem was already present, and the commit exposed it.  So for me,
or anyone, to be going over that commit with a fine-toothed comb could
either be exactly what's needed or a complete waste of time!

> > 1.  Can someone comment on whether I correctly identified the commit #
> > causing the issue for me.  Here is the 'git bisect' data from my first
> > post:
> >
> > 2.6.25, good
> > 2.6.26-rc4, bad
> > 10c993a6b5418cb1026775765ba4c70ffb70853d, bad
[...]
> > 3def3d6ddf43dbe20c00c3cbc38dfacc8586998f, bad
> > 700efc1b9f6afe34caae231b87d129ad8ffb559f, good
> >
> > I concluded that 3def3d... was causing the problem for me, but I didn't
> > actually pipe or redirect the output message from 'git bisect' when it
> > stated that.  Does that conclusion look OK?
> 
> Git should have printed out "is first bad commit" Did you see
> that? If not, you stopped the process too soon. Viewing the history
> with gitk, though, it seems you fingered the right commit. Which leads
> to the next step...

It _did_ print that message.  My meaning here was that I did not have
the presence of mind to record that output in any way.  I have a memory
that the output indicated "3d...", but that could be a false memory.
(Thus the reference to brain damage.)

What I _was_ careful to record was the list posted above.  I used a
text editor to take notes after each stage of the bisecting process,
but I failed to actually take that last note on the finally output.  That
list documents all of iterations in the process, and I concluded that it
was identifying commit #3def..., and I just wanted to make sure everyone
agreed with that.

> > 2.  I have not tried different versions of gcc.
> 
> Which is not this :-).

  Just making sure... but thanks!

> > 3.  I keep wanting to play with source code,
> 
> Or this :-).

I thought so....

Since this _feels_ like my problem alone, then it _feels_ like I
should have to be the one to fix it.  I hate having to throw my arms
up and admit I am unable to do something about it....

> Can you try reverting that commit against the top of the latest tree,
> and see if the revert applies correctly? If it does, compile and boot
> and see if it works.

OK, I can give it a try.  I probably would have tried it already but I 
noticed that the file(s) touched by the commit had been merged with one
or more other files in the meantime, so I decided to leave it alone.  But
now that I think about it, what is the worst that can happen?  The kernel
will fail to build?  It will build, but it won't run?  That's happening
already, so I have nothing to lose.

Thanks!

> If it does, it'll be Yinghai's job to figure out
> what went wrong, not yours (unless you're a real gluton for
> punishment, and happen to know what was going on in Yinghai's head
> when he decided that it was safe to make those changes).

Well, it's his job only if his commit broke something.

Is 2.6.2[67] broken on your machine?  Or on anyone's machine on LKML?
These kernels even work on one of my machines!  So it's not clear that
Yinghai's commit is to blame:  maybe it is, maybe it isn't.  All that
we know is that commit triggered the problem, and we don't even know
whether the problem will affect a lot of hardware or just mine!

Anyway, I think you were really just trying to be nice, and I thank you
for it.  I do feel marginally better.  ;)

I'll really feel better if the reverting experiment works.

Thanks,
Dave W.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: HPET regression in 2.6.26 versus 2.6.25 -- RCU problem
  2008-08-12 17:29 HPET regression in 2.6.26 versus 2.6.25 -- RCU problem David Witbrodt
@ 2008-08-12 17:38 ` Ray Lee
  0 siblings, 0 replies; 19+ messages in thread
From: Ray Lee @ 2008-08-12 17:38 UTC (permalink / raw)
  To: David Witbrodt
  Cc: linux-kernel, Peter Zijlstra, Yinghai Lu, Ingo Molnar,
	Thomas Gleixner, H. Peter Anvin, Paul E. McKenney, netdev

On Tue, Aug 12, 2008 at 10:29 AM, David Witbrodt <dawitbro@sbcglobal.net> wrote:
> Since this _feels_ like my problem alone, then it _feels_ like I
> should have to be the one to fix it.

Hard data first, and then there will be plenty of time for blame
later. Not that there's any real blame for anyone here, we just have a
bug that needs to be found and fixed.

> I hate having to throw my arms
> up and admit I am unable to do something about it....

Finding the problem is over half the battle, so you *are* doing something.

>> Can you try reverting that commit against the top of the latest tree,
>> and see if the revert applies correctly? If it does, compile and boot
>> and see if it works.
>
> OK, I can give it a try.  I probably would have tried it already but I
> noticed that the file(s) touched by the commit had been merged with one
> or more other files in the meantime, so I decided to leave it alone.  But
> now that I think about it, what is the worst that can happen?  The kernel
> will fail to build?  It will build, but it won't run?  That's happening
> already, so I have nothing to lose.

Right.

>> If it does, it'll be Yinghai's job to figure out
>> what went wrong, not yours (unless you're a real gluton for
>> punishment, and happen to know what was going on in Yinghai's head
>> when he decided that it was safe to make those changes).
>
> Well, it's his job only if his commit broke something.

His commit may have uncovered a latent problem somewhere else, that
happens often. But if the commit really is the trouble one, then two
things happen: It's rc3 or rc4 now, so we just revert the damn thing,
and then (secondly) he works with you (by adding debugging or
whatever) to figure out where the problem actually is.

The point I'm trying to make here is when you take on too much for
yourself, then it slows down debugging the problem, and means whatever
issue is in the code will be in there longer, affecting more people.

> Is 2.6.2[67] broken on your machine?

2.6.26-rc9+ seems to work fine on my system. I haven't tried 2.6.26.0
or 2.6.27-rcX yet, I'm overloaded with actual work and other things
right now. But no one is ever alone in problems with the kernel, only
alone in reporting them. You're a canary in the coal mine.

> Or on anyone's machine on LKML?
> These kernels even work on one of my machines!  So it's not clear that
> Yinghai's commit is to blame:  maybe it is, maybe it isn't.  All that
> we know is that commit triggered the problem, and we don't even know
> whether the problem will affect a lot of hardware or just mine!

Yes, all of that is true, but changes nothing.

> Anyway, I think you were really just trying to be nice, and I thank you
> for it.  I do feel marginally better.  ;)

I *am* nice :-). Email is tricky sometimes, y'know?

> I'll really feel better if the reverting experiment works.

Good luck.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH diagnostic] Prevent console flood when one CPU sees another AWOL via RCU
  2008-08-11 13:17       ` Paul E. McKenney
@ 2008-08-13  0:25         ` Paul E. McKenney
  0 siblings, 0 replies; 19+ messages in thread
From: Paul E. McKenney @ 2008-08-13  0:25 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: David Witbrodt, Peter Zijlstra, linux-kernel, Yinghai Lu,
	Thomas Gleixner, H. Peter Anvin, netdev

On Mon, Aug 11, 2008 at 06:17:28AM -0700, Paul E. McKenney wrote:
> On Mon, Aug 11, 2008 at 01:38:17PM +0200, Ingo Molnar wrote:
> > 
> > * Paul E. McKenney <paulmck@linux.vnet.ibm.com> wrote:
> > 
> > > And here is the patch.  It is still a bit raw, so the results should 
> > > be viewed with some suspicion.  It adds a default-off kernel parameter 
> > > CONFIG_RCU_CPU_STALL which must be enabled.
> > > 
> > > Rather than exponential backoff, it backs off to once per 30 seconds. 
> > > My feeling upon thinking on it was that if you have stalled RCU grace 
> > > periods for that long, a few extra printk() messages are probably the 
> > > least of your worries...
> > 
> > while this wont debug problems were timer irqs are genuinely stuck for 
> > long periods of time, it should find problems with RCU completion logic 
> > itself in the presence of correct timer irqs - and the lack of any 
> > messages from this debug option should point the finger more firmly in 
> > the direction of stalled timer irqs.
> > 
> > So i find this debug feature rather useful and have applied it to 
> > tip/core/rcu (and cleaned it up a bit). I renamed the config option to 
> > CONFIG_DEBUG_RCU_STALL to make it more in line with usual debug option 
> > names. Lets see whether -tip testing finds any false positives.
> 
> Sounds good!
> 
> For whatever it is worth, this diagnostic can also locate latency issues
> in non-CONFIG_PREEMPT kernels, even when those problems are outside of
> preempt_disable() regions.  Latency tracer is of course a better tool
> for things -inside- of preempt_disable() regions.

One small change needed to keep from flooding the console when one
CPU notices that another is AWOL.  Unless I am missing something subtle.
Otherwise the cleanups look good!

Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
---

 rcuclassic.c |    1 +
 1 file changed, 1 insertion(+)

diff --git a/kernel/rcuclassic.c b/kernel/rcuclassic.c
index 56b8712..dab2676 100644
--- a/kernel/rcuclassic.c
+++ b/kernel/rcuclassic.c
@@ -308,6 +308,7 @@ static void print_other_cpu_stall(struct rcu_ctrlblk *rcp)
 		spin_unlock(&rcp->lock);
 		return;
 	}
+	rcp->gp_check = get_seconds() + 30;
 	spin_unlock(&rcp->lock);
 
 	/* OK, time to rat on our buddy... */

^ permalink raw reply related	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2008-08-13  0:25 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-08-09 22:35 HPET regression in 2.6.26 versus 2.6.25 -- RCU problem David Witbrodt
2008-08-10 15:15 ` Paul E. McKenney
2008-08-11  1:35   ` [PATCH diagnostic] " Paul E. McKenney
2008-08-11  1:38     ` Paul E. McKenney
2008-08-11 11:38     ` Ingo Molnar
2008-08-11 13:17       ` Paul E. McKenney
2008-08-13  0:25         ` [PATCH diagnostic] Prevent console flood when one CPU sees another AWOL via RCU Paul E. McKenney
  -- strict thread matches above, loose matches on Subject: below --
2008-08-12 17:29 HPET regression in 2.6.26 versus 2.6.25 -- RCU problem David Witbrodt
2008-08-12 17:38 ` Ray Lee
2008-08-12 15:17 David Witbrodt
2008-08-12 16:03 ` Ray Lee
2008-08-11 19:05 David Witbrodt
2008-08-11 18:01 David Witbrodt
2008-08-11 18:08 ` Yinghai Lu
2008-08-09 12:39 David Witbrodt
2008-08-09 13:56 ` Paul E. McKenney
2008-08-11 11:25   ` Ingo Molnar
2008-08-11 16:15     ` Yinghai Lu
     [not found] <506429.22669.qm@web82107.mail.mud.yahoo.com>
2008-08-09  7:34 ` Peter Zijlstra

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).