From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Thomas Gleixner <tglx@linutronix.de>,
Gerald Schaefer <gerald.schaefer@de.ibm.com>,
manfred@colorfullife.com, Ihno Krumreich <ihno@suse.de>,
Greg KH <gregkh@suse.de>
Subject: Re: [BUG] race of RCU vs NOHU
Date: Fri, 21 Aug 2009 08:54:18 -0700
Message-ID: <20090821155418.GB6735@linux.vnet.ibm.com>
In-Reply-To: <20090812093233.4006b9a1@skybase>

On Wed, Aug 12, 2009 at 09:32:33AM +0200, Martin Schwidefsky wrote:
> On Tue, 11 Aug 2009 11:04:07 -0700
> "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
>
> > On Tue, Aug 11, 2009 at 05:17:51PM +0200, Martin Schwidefsky wrote:
> > > On Tue, 11 Aug 2009 07:52:22 -0700
> > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > >
> > > > On Tue, Aug 11, 2009 at 12:56:53PM +0200, Martin Schwidefsky wrote:
> > > > > On Mon, 10 Aug 2009 08:08:07 -0700
> > > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > > >
> > > > > > On Mon, Aug 10, 2009 at 02:25:35PM +0200, Martin Schwidefsky wrote:
> > > > > > > On Fri, 7 Aug 2009 07:29:57 -0700
> > > > > > > "Paul E. McKenney" <paulmck@linux.vnet.ibm.com> wrote:
> > > > > > >
> > > > > > > > On Fri, Aug 07, 2009 at 03:15:29PM +0200, Martin Schwidefsky wrote:
> > > > > > > > > Hi Paul,
> > > > > > > > > I analysed a dump of a hanging 2.6.30 system and found what I think is
> > > > > > > > > > a bug of RCU vs NOHZ. There are a number of patches on top of that
> > > > > > > > > kernel but they should be independent of the bug.
> > > > > > > > >
> > > > > > > > > > The system has 4 cpus and uses classic RCU. cpus #0, #2 and #3 woke up
> > > > > > > > > recently, cpu #1 has been sleeping for 5 minutes, but there is a pending
> > > > > > > > > rcu batch. The timer wheel for cpu #1 is empty, it will continue to
> > > > > > > > > sleep for NEXT_TIMER_MAX_DELTA ticks.
> > > > > > > >
> > > > > > > > Congratulations, Martin! You have exercised what to date has been a
> > > > > > > > theoretical bug identified last year by Manfred Spraul. The fix is to
> > > > > > > > > switch from CONFIG_CLASSIC_RCU to CONFIG_TREE_RCU, which was added in
> > > > > > > > > 2.6.29.
> > > > > > > > >
> > > > > > > > > Of course, if you need to work with an old kernel version, you might
> > > > > > > > > still need a patch, perhaps for the various -stable versions. If so,
> > > > > > > > > please let me know -- otherwise, I will focus forward on CONFIG_TREE_RCU
> > > > > > > > > rather than backwards on CONFIG_CLASSIC_RCU.
> > > > > > >
> > > > > > > > SLES11 is 2.6.27 and uses classic RCU. The not-so-theoretical bug is
> > > > > > > present there and I think it needs to be fixed :-/
> > > > > >
> > > > > > I was afraid of that. ;-)
> > > > > >
> > > > > > Given that there are some other theoretical bugs in Classic RCU involving
> > > > > > interrupts and CONFIG_NO_HZ, would backporting CONFIG_TREE_RCU make more
> > > > > > sense than playing whack-a-mole on Classic RCU bugs?
> > > > >
> > > > > Fine with me but I don't know if SuSE/Novell is willing to accept such a
> > > > > big change for an existing distribution. I've put Ihno and Greg on Cc.
> > > >
> > > > Good point! While they are thinking about the tradeoff between
> > > > > whack-a-mole on Classic RCU and backporting CONFIG_TREE_RCU, if I were
> > > > > to send you a patch backporting CONFIG_TREE_RCU, exactly which kernel
> > > > > version(s) should I backport it to?
> > >
> > > We found the bug with kernel version 2.6.30 - the kernel on our test systems
> > > > still uses classic RCU. For us it is easy to switch to tree-RCU, no patch
> > > required.
> >
> > Ah! Could you please send me the test you use? My tests were
> > insufficient to force this problem to happen.
>
> There is no specific test, just a regular system boot. The boot did not
> finish and our tester took a dump. This boot failure seems to happen from
> time to time.

OK. Has CONFIG_TREE_RCU been working for you? If so, which variant
of 2.6.27 do you need a backport to?

							Thanx, Paul