public inbox for linux-kernel@vger.kernel.org
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: linux-kernel@vger.kernel.org, Ingo Molnar <mingo@elte.hu>,
	Thomas Gleixner <tglx@linutronix.de>,
	Gerald Schaefer <gerald.schaefer@de.ibm.com>,
	manfred@colorfullife.com
Subject: Re: [BUG] race of RCU vs NOHZ
Date: Fri, 7 Aug 2009 07:29:57 -0700	[thread overview]
Message-ID: <20090807142957.GB6700@linux.vnet.ibm.com> (raw)
In-Reply-To: <20090807151529.2806b8b4@skybase>

On Fri, Aug 07, 2009 at 03:15:29PM +0200, Martin Schwidefsky wrote:
> Hi Paul,
> I analysed a dump of a hanging 2.6.30 system and found what I think is
> a bug in the interaction of RCU and NOHZ. There are a number of patches
> on top of that kernel, but they should be independent of the bug.
> 
> The system has 4 cpus and uses classic RCU. cpus #0, #2 and #3 woke up
> recently, cpu #1 has been sleeping for 5 minutes, but there is a pending
> rcu batch. The timer wheel for cpu #1 is empty, so it will continue to
> sleep for NEXT_TIMER_MAX_DELTA ticks.

Congratulations, Martin!  You have exercised what to date has been a
theoretical bug identified last year by Manfred Spraul.  The fix is to
switch from CONFIG_CLASSIC_RCU to CONFIG_TREE_RCU, which was added in
2.6.29.

Of course, if you need to work with an old kernel version, you might
still need a patch, perhaps for the various -stable versions.  If so,
please let me know -- otherwise, I will focus forward on CONFIG_TREE_RCU
rather than backwards on CONFIG_CLASSIC_RCU.
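
[For reference, the switch suggested here is a pure configuration change on a
2.6.29 or later tree; a .config fragment would look roughly like this, with
the option names as merged in mainline 2.6.29:]

```
# RCU subsystem (2.6.29+): replace classic RCU with hierarchical (tree) RCU
# CONFIG_CLASSIC_RCU is not set
CONFIG_TREE_RCU=y
```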

							Thanx, Paul

> Now if I look at the RCU data structures I find this:
> 
> rcu_ctrlblk
> >> px *(struct rcu_ctrlblk *) 0x810000
> struct rcu_ctrlblk {
>         cur = 0xffffffffffffff99
>         completed = 0xffffffffffffff98
>         pending = 0xffffffffffffff99
>         signaled = 0x0
>         lock = spinlock_t {
>                 raw_lock = raw_spinlock_t {
>                         owner_cpu = 0x0
>                 }
>                 break_lock = 0x0
>                 magic = 0xdead4ead
>                 owner_cpu = 0xffffffff
>                 owner = 0xffffffffffffffff
>                 dep_map = struct lockdep_map {
>                         key = 0x810118
>                         class_cache = 0xcbcff0
>                         name = 0x63e944
>                         cpu = 0x0
>                         ip = 0x1a7f64
>                 }
>         }
>         cpumask = {
>                 [0] 0x2
>         }
> }
> 
> rcu_data cpu #0
> >> px *(struct rcu_data *) 0x872f8430
> struct rcu_data {
>         quiescbatch = 0xffffffffffffff99
>         passed_quiesc = 0x1
>         qs_pending = 0x0
>         batch = 0xffffffffffffff97
>         nxtlist = (nil)
>         nxttail = {
>                 [0]         0x872f8448
>                 [1]         0x872f8448
>                 [2]         0x872f8448
>         }
>         qlen = 0x0
>         donelist = (nil)
>         donetail = 0x872f8470
>         blimit = 0xa
>         cpu = 0x0
>         barrier = struct rcu_head {
>                 next = (nil)
>                 func = 0x0
>         }
> }
> 
> rcu_data cpu #1
> >> px *(struct rcu_data *) 0x874be430
> struct rcu_data {
>         quiescbatch = 0xffffffffffffff98
>         passed_quiesc = 0x1
>         qs_pending = 0x0
>         batch = 0xffffffffffffff97
>         nxtlist = (nil)
>         nxttail = {
>                 [0]         0x874be448
>                 [1]         0x874be448
>                 [2]         0x874be448
>         }
>         qlen = 0x0
>         donelist = (nil)
>         donetail = 0x874be470
>         blimit = 0xa
>         cpu = 0x1
>         barrier = struct rcu_head {
>                 next = (nil)
>                 func = 0x0
>         }
> }
> 
> rcu_data cpu #2
> >> px *(struct rcu_data *) 0x87684430
> struct rcu_data {
>         quiescbatch = 0xffffffffffffff99
>         passed_quiesc = 0x1
>         qs_pending = 0x0
>         batch = 0xffffffffffffff99
>         nxtlist = 0xffc1fc18
>         nxttail = {
>                 [0]         0x87684448
>                 [1]         0x87684448
>                 [2]         0xffc1fc18
>         }
>         qlen = 0x1
>         donelist = (nil)
>         donetail = 0x87684470
>         blimit = 0xa
>         cpu = 0x2
>         barrier = struct rcu_head {
>                 next = (nil)
>                 func = 0x0
>         }
> }
> 
> rcu_data cpu #3
> >> px *(struct rcu_data *) 0x8784a430
> struct rcu_data {
>         quiescbatch = 0xffffffffffffff99
>         passed_quiesc = 0x1
>         qs_pending = 0x0
>         batch = 0xffffffffffffff63
>         nxtlist = (nil)
>         nxttail = {
>                 [0]         0x8784a448
>                 [1]         0x8784a448
>                 [2]         0x8784a448
>         }
>         qlen = 0x0
>         donelist = (nil)
>         donetail = 0x8784a470
>         blimit = 0xa
>         cpu = 0x3
>         barrier = struct rcu_head {
>                 next = (nil)
>                 func = 0x0
>         }
> }
> 
> At the time cpu #1 went to sleep rcu_needs_cpu must have answered false,
> otherwise a 1-tick delay would have been programmed. rcu_pending compares
> rcu_ctrlblk.cur with rcu_data.quiescbatch for cpu #1, so these two must
> have been equal, otherwise rcu_needs_cpu would have answered true.
> That means that the rcu_needs_cpu check has been completed before
> rcu_start_batch for batch 0xffffffffffffff99. The bit for cpu #1 is
> still set in the rcu_ctrlblk.cpumask, therefore the bit for cpu #1
> in nohz_cpu_mask can not have been set at the time rcu_start_batch has
> completed. That gives the following race (cpu 0 is starting the batch,
> cpu 1 is going to sleep):
> 
> cpu 1: tick_nohz_stop_sched_tick: rcu_needs_cpu();
> cpu 0: rcu_start_batch: rcp->cur++;
> cpu 0: rcu_start_batch: cpumask_andnot(to_cpumask(rcp->cpumask),
> 					cpu_online_mask, nohz_cpu_mask);
> cpu 1: tick_nohz_stop_sched_tick: cpumask_set_cpu(1, nohz_cpu_mask);
> 
> The order of i) setting the bit in nohz_cpu_mask and ii) the rcu_needs_cpu()
> check in tick_nohz_stop_sched_tick is wrong, no? Or did I miss some subtle
> check that comes afterwards?
> 
> -- 
> blue skies,
>    Martin.
> 
> "Reality continues to ruin my life." - Calvin.
> 


Thread overview: 15+ messages
2009-08-07 13:15 [BUG] race of RCU vs NOHZ Martin Schwidefsky
2009-08-07 14:29 ` Paul E. McKenney [this message]
2009-08-10 12:25   ` Martin Schwidefsky
2009-08-10 15:08     ` Paul E. McKenney
2009-08-11 10:56       ` Martin Schwidefsky
2009-08-11 14:52         ` Paul E. McKenney
2009-08-11 15:17           ` Martin Schwidefsky
2009-08-11 18:04             ` Paul E. McKenney
2009-08-12  7:32               ` Martin Schwidefsky
2009-08-21 15:54                 ` Paul E. McKenney
2009-08-31  8:47                   ` Martin Schwidefsky
2009-08-31 14:30                     ` Paul E. McKenney
2009-08-11 16:58         ` Greg KH
2009-08-10 16:10     ` Pavel Machek
2009-08-11 21:23       ` Paul E. McKenney
