From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
Thomas Gleixner <tglx@linutronix.de>,
Gerald Schaefer <gerald.schaefer@de.ibm.com>,
manfred@colorfullife.com
Subject: Re: [BUG] race of RCU vs NOHU
Date: Fri, 7 Aug 2009 07:29:57 -0700 [thread overview]
Message-ID: <20090807142957.GB6700@linux.vnet.ibm.com> (raw)
In-Reply-To: <20090807151529.2806b8b4@skybase>
On Fri, Aug 07, 2009 at 03:15:29PM +0200, Martin Schwidefsky wrote:
> Hi Paul,
> I analysed a dump of a hanging 2.6.30 system and found what I think is
> a bug of RCU vs NOHZ. There are a number of patches ontop of that
> kernel but they should be independent of the bug.
>
> The systems has 4 cpus and uses classic RCU. cpus #0, #2 and #3 woke up
> recently, cpu #1 has been sleeping for 5 minutes, but there is a pending
> rcu batch. The timer wheel for cpu #1 is empty, it will continue to
> sleep for NEXT_TIMER_MAX_DELTA ticks.
Congratulations, Martin! You have exercised what to date has been a
theoretical bug identified last year by Manfred Spraul. The fix is to
switch from CONFIG_RCU_CLASSIC to CONFIG_RCU_TREE, which was added in
2.6.29.
Of course, if you need to work with an old kernel version, you might
still need a patch, perhaps for the various -stable versions. If so,
please let me know -- otherwise, I will focus forward on CONFIG_RCU_TREE
rather than backwards on CONFIG_RCU_CLASSIC.
Thanx, Paul
> Now if I look at the RCU data structures I find this:
>
> rcu_ctrlblk
> >> px *(struct rcu_ctrlblk *) 0x810000
> struct rcu_ctrlblk {
> cur = 0xffffffffffffff99
> completed = 0xffffffffffffff98
> pending = 0xffffffffffffff99
> signaled = 0x0
> lock = spinlock_t {
> raw_lock = raw_spinlock_t {
> owner_cpu = 0x0
> }
> break_lock = 0x0
> magic = 0xdead4ead
> owner_cpu = 0xffffffff
> owner = 0xffffffffffffffff
> dep_map = struct lockdep_map {
> key = 0x810118
> class_cache = 0xcbcff0
> name = 0x63e944
> cpu = 0x0
> ip = 0x1a7f64
> }
> }
> cpumask = {
> [0] 0x2
> }
> }
>
> rcu_data cpu #0
> >> px *(struct rcu_data *) 0x872f8430
> struct rcu_data {
> quiescbatch = 0xffffffffffffff99
> passed_quiesc = 0x1
> qs_pending = 0x0
> batch = 0xffffffffffffff97
> nxtlist = (nil)
> nxttail = {
> [0] 0x872f8448
> [1] 0x872f8448
> [2] 0x872f8448
> }
> qlen = 0x0
> donelist = (nil)
> donetail = 0x872f8470
> blimit = 0xa
> cpu = 0x0
> barrier = struct rcu_head {
> next = (nil)
> func = 0x0
> }
> }
>
> rcu_data cpu #1
> >> px *(struct rcu_data *) 0x874be430
> struct rcu_data {
> quiescbatch = 0xffffffffffffff98
> passed_quiesc = 0x1
> qs_pending = 0x0
> batch = 0xffffffffffffff97
> nxtlist = (nil)
> nxttail = {
> [0] 0x874be448
> [1] 0x874be448
> [2] 0x874be448
> }
> qlen = 0x0
> donelist = (nil)
> donetail = 0x874be470
> blimit = 0xa
> cpu = 0x1
> barrier = struct rcu_head {
> next = (nil)
> func = 0x0
> }
> }
>
> rcu_data cpu #2
> >> px *(struct rcu_data *) 0x87684430
> struct rcu_data {
> quiescbatch = 0xffffffffffffff99
> passed_quiesc = 0x1
> qs_pending = 0x0
> batch = 0xffffffffffffff99
> nxtlist = 0xffc1fc18
> nxttail = {
> [0] 0x87684448
> [1] 0x87684448
> [2] 0xffc1fc18
> }
> qlen = 0x1
> donelist = (nil)
> donetail = 0x87684470
> blimit = 0xa
> cpu = 0x2
> barrier = struct rcu_head {
> next = (nil)
> func = 0x0
> }
> }
>
> rcu_data cpu #3
> >> px *(struct rcu_data *) 0x8784a430
> struct rcu_data {
> quiescbatch = 0xffffffffffffff99
> passed_quiesc = 0x1
> qs_pending = 0x0
> batch = 0xffffffffffffff63
> nxtlist = (nil)
> nxttail = {
> [0] 0x8784a448
> [1] 0x8784a448
> [2] 0x8784a448
> }
> qlen = 0x0
> donelist = (nil)
> donetail = 0x8784a470
> blimit = 0xa
> cpu = 0x3
> barrier = struct rcu_head {
> next = (nil)
> func = 0x0
> }
> }
>
> At the time cpu #1 went to sleep rcu_needs_cpu must have answered false,
> otherwise a 1 tick delay would have been programmed. rcu_pending compares
> rcu_ctrlblk.cur with rcu_data.quiescbatch for cpu #1. So these two must
> have been equal otherwise rcu_needs_cpu would have answered true.
> That means that the rcu_needs_cpu check has been completed before
> rcu_start_batch for batch 0xffffffffffffff99. The bit for cpu #1 is
> still set in the rcu_ctrlblk.cpumask, therefore the bit for cpu #1
> in nohz_cpu_mask can not have been set at the time rcu_start_batch has
> completed. That gives the following race (cpu 0 is starting the batch,
> cpu 1 is going to sleep):
>
> cpu 1: tick_nohz_stop_sched_tick: rcu_needs_cpu();
> cpu 0: rcu_start_batch: rcp->cur++;
> cpu 0: rcu_start_batch: cpumask_andnot(to_cpumask(rcp->cpumask),
> cpu_online_mask, nonz_cpu_mask);
> cpu 1: tick_nohz_stop_schedk_tick: cpumask_set_cpu(1, nohz_cpu_mask);
>
> The order of i) setting the bit in nohz_cpu_mask and ii) the rcu_needs_cpu()
> check in tick_nohz_stop_sched_tick is wrong, no? Or did I miss some suble
> check that comes afterwards ?
>
> --
> blue skies,
> Martin.
>
> "Reality continues to ruin my life." - Calvin.
>
next prev parent reply other threads:[~2009-08-07 14:30 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2009-08-07 13:15 [BUG] race of RCU vs NOHU Martin Schwidefsky
2009-08-07 14:29 ` Paul E. McKenney [this message]
2009-08-10 12:25 ` Martin Schwidefsky
2009-08-10 15:08 ` Paul E. McKenney
2009-08-11 10:56 ` Martin Schwidefsky
2009-08-11 14:52 ` Paul E. McKenney
2009-08-11 15:17 ` Martin Schwidefsky
2009-08-11 18:04 ` Paul E. McKenney
2009-08-12 7:32 ` Martin Schwidefsky
2009-08-21 15:54 ` Paul E. McKenney
2009-08-31 8:47 ` Martin Schwidefsky
2009-08-31 14:30 ` Paul E. McKenney
2009-08-11 16:58 ` Greg KH
2009-08-10 16:10 ` Pavel Machek
2009-08-11 21:23 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20090807142957.GB6700@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=gerald.schaefer@de.ibm.com \
--cc=linux-kernel@vger.kernel.org \
--cc=manfred@colorfullife.com \
--cc=mingo@elte.hu \
--cc=schwidefsky@de.ibm.com \
--cc=tglx@linutronix.de \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.