From: Steffen Persvold <sp@numascale.com>
To: paulmck@linux.vnet.ibm.com
Cc: Daniel J Blueman <daniel@numascale-asia.com>,
	Dipankar Sarma <dipankar@in.ibm.com>,
	linux-kernel@vger.kernel.org, x86@kernel.org
Subject: Re: RCU qsmask !=0 warnings on large-SMP...
Date: Wed, 25 Jan 2012 21:35:15 +0100
Message-ID: <4F206783.2050901@numascale.com>
In-Reply-To: <20120125181441.GD2849@linux.vnet.ibm.com>

On 1/25/2012 19:14, Paul E. McKenney wrote:
[]
>> CONFIG_NO_HZ is not set, so it should not happen. We see the same
>> behavior with CONFIG_NO_HZ=y, though it usually takes longer to
>> reproduce.
>
> OK, the CONFIG_NO_HZ=n has the least code involved, so it would be best
> for debugging.

Good, that was also my thought when looking at the code. I'm now
reducing NR_CPUS to 512 to get a two-level tree for simpler debugging
(the issue is still present).

[]
>> Because the RCU tree has 3 levels, the printout function we added in
>> the patch gets called 3 times (in rcu_start_gp()), each time with the
>> same RDP but with a different RNP.
>
> Ah, good point.  Hmmm...
>
> Looking back at Daniel's original email, we have the root rcu_node
> structure with ->qsmask=0x1 (indicating first descendent), the next
> level having ->qsmask=0x8 (indicating fourth descendent) and the last
> level having ->qsmask=0x1, again indicating first descendent.  So,
> zero, 16, 32, 48.  Which agrees with the CPU number below that has not
> yet caught up to the current grace period.
>
> Another really odd thing...  If we are starting the new grace period,
> we should have incremented rsp->gpnum.  And in fact, we do have
> rsp->gpnum being one larger than rsp->completed, as expected.  But
> if we have only initialized the root rcu_node structures, how can
> the per-CPU rcu_data structures know about the new grace period yet?
>
> There was a time when the CPUs could find out early, but I think that
> was a long time ago.  Yes, check_for_new_grace_period() does compare
> rdp->gpnum against rsp->gpnum, but it calls note_new_gpnum() which
> acquires rnp->lock, and nothing happens unless __note_new_gpnum()
> sees that rnp->gpnum is different from rdp->gpnum.
>
> So, it would be very interesting to add the values rdp->mynode->gpnum
> and rdp->mynode->completed to your list, perhaps labeling them something
> like "rng" and "rnc" respectively.

I will add this to the printout.

>
> Of course, CPU 48 should not have decided that it was done with the
> old grace period before clearing its bit.  For that matter, some
> CPU somewhere decided that the grace period was done despite the
> root rcu_node's ->qsmask being non-zero, which should be prevented
> by the:
>
> 	if (rnp->qsmask != 0 || rcu_preempt_blocked_readers_cgp(rnp)) {
>
> line in rcu_report_qs_rnp().
>
> 3.2 has some event tracing that would be extremely helpful in tracking
> this down.  Are you able to run 3.2?

Yes, 3.2.1 is our debug target right now.

[]
>>> Same here, but most of the ql= values are larger.  Later printout?
>>
>> The loop in rcu_start_gp() releases the node lock each time before it
>> moves to a new level in the RCU tree (it has to):
>>
>> 	rcu_for_each_node_breadth_first(rsp, rnp) {
>> 		raw_spin_lock(&rnp->lock);	/* irqs already disabled. */
>> 		rcu_debug_print(rsp, rnp);
>>
>> so I guess this allows the ql= values to increase, no?
>
> The ql= levels can increase anyway -- those queues are only accessed by
> the corresponding CPU or from stop_machine context.  The small increases
> are entirely consistent with your having bits set at all levels of the
> rcu_node tree.  The reason I was surprised is that my earlier bugs (as
> in before the code hit mainline) only ever resulted in a single level
> having a stray bit.

Ok.

[]
>>
>> Thanks for looking into this, Paul. We'd be more than happy to test
>> out theories and patches.
>
> The event tracing, particularly the "rcu_grace_period" set, would be
> very helpful.

Are you talking about the data from /sys/kernel/debug/rcu/? I have
CONFIG_RCU_TRACE (and consequently CONFIG_TREE_RCU_TRACE) set. Is this
enough to get the event data you want?

Cheers,
-- 
Steffen Persvold, Chief Architect NumaChip
Numascale AS - www.numascale.com
Tel: +47 92 49 25 54 Skype: spersvold
