Re: rcu, sched: WARNING: CPU: 30 PID: 23771 at kernel/rcu/tree_plugin.h:337 rcu_read_unlock_special+0x369/0x550()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Sasha Levin <sasha.levin@oracle.com>
Cc: Lai Jiangshan <laijs@cn.fujitsu.com>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	LKML <linux-kernel@vger.kernel.org>,
	"davej@codemonkey.org.uk >> Dave Jones" <davej@codemonkey.org.uk>
Subject: Re: rcu, sched: WARNING: CPU: 30 PID: 23771 at kernel/rcu/tree_plugin.h:337 rcu_read_unlock_special+0x369/0x550()
Date: Fri, 23 Jan 2015 01:36:44 -0800	[thread overview]
Message-ID: <20150123093644.GA11175@linux.vnet.ibm.com> (raw)
In-Reply-To: <20150123065542.GQ9719@linux.vnet.ibm.com>

On Thu, Jan 22, 2015 at 10:55:42PM -0800, Paul E. McKenney wrote:
> On Thu, Jan 22, 2015 at 11:05:45PM -0500, Sasha Levin wrote:
> > On 01/22/2015 11:02 PM, Sasha Levin wrote:
> > > On 01/22/2015 10:51 PM, Paul E. McKenney wrote:
> > >> On Thu, Jan 22, 2015 at 10:29:01PM -0500, Sasha Levin wrote:
> > >>>> On 01/21/2015 07:43 PM, Paul E. McKenney wrote:
> > >>>>>> On Wed, Jan 21, 2015 at 10:44:57AM -0500, Sasha Levin wrote:
> > >>>>>>>> On 01/20/2015 09:57 PM, Paul E. McKenney wrote:
> > >>>>>>>>>>>>>> So RCU believes that an RCU read-side critical section that ended within
> > >>>>>>>>>>>>>>>>>> an interrupt handler (in this case, an hrtimer) somehow got preempted.
> > >>>>>>>>>>>>>>>>>> Which is not supposed to happen.
> > >>>>>>>>>>>>>>>>>>
> > >>>>>>>>>>>>>>>>>> Do you have CONFIG_PROVE_RCU enabled?  If not, could you please enable it
> > >>>>>>>>>>>>>>>>>> and retry?
> > >>>>>>>>>>>>>>
> > >>>>>>>>>>>>>> I did have CONFIG_PROVE_RCU, and didn't see anything else besides what I pasted here.
> > >>>>>>>>>> OK, fair enough.  I do have a stack of RCU CPU stall-warning changes on
> > >>>>>>>>>> their way in, please see v3.19-rc1..630181c4a915 in -rcu, which is at:
> > >>>>>>>>>>
> > >>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu.git
> > >>>>>>>>>>
> > >>>>>>>>>> These handle the problems that Dave Jones, yourself, and a few others
> > >>>>>>>>>> located this past December.  Could you please give them a spin?
> > >>>>>>>>
> > >>>>>>>> They seem to be a part of -next already, so this testing already includes them.
> > >>>>>>>>
> > >>>>>>>> I seem to be getting them about once a day, anything I can add to debug it?
> > >>>>>>
> > >>>>>> Could you please try reproducing with the following patch?
> > >>>>
> > >>>> Yes, and I've got mixed results. It reproduced, and all I got was:
> > >>>>
> > >>>> [  717.645572] ===============================
> > >>>> [  717.645572] [ INFO: suspicious RCU usage. ]
> > >>>> [  717.645572] 3.19.0-rc5-next-20150121-sasha-00064-g3c37e35-dirty #1809 Tainted: G        W
> > >>>> [  717.645572] -------------------------------
> > >>>> [  717.645572] kernel/rcu/tree_plugin.h:337 rcu_read_unlock() from irq or softirq with blocking in critical section!!!
> > >>>> [  717.645572] !
> > >>>> [  717.645572]
> > >>>> [  717.645572] other info that might help us debug this:
> > >>>> [  717.645572]
> > >>>> [  717.645572]
> > >>>> [  717.645572] rcu_scheduler_active = 1, debug_locks = 1
> > >>>> [  717.645572] 3 locks held by trinity-c29/16497:
> > >>>> [  717.645572]  #0:  (&sb->s_type->i_mutex_key){+.+.+.}, at: [<ffffffff81bec373>] lookup_slow+0xd3/0x420
> > >>>> [  717.645572]  #1:
> > >>>> [hang]
> > >>>>
> > >>>> So the rest of the locks/stack trace didn't get printed, nor the pr_alert() which
> > >>>> should follow that.
> > >>>>
> > >>>> I've removed the lockdep call and will re-run it.
> > >> Thank you!  You are keeping the pr_alert(), correct?
> > > 
> > > Yup, just the lockdep call goes away.
> > 
> > Okay, this reproduced faster than I anticipated:
> > 
> > [  786.160131] ->rcu_read_unlock_special: 0x100 (b: 0, nq: 1)
> > [  786.239513] ->rcu_read_unlock_special: 0x100 (b: 0, nq: 1)
> > [  786.240503] ->rcu_read_unlock_special: 0x100 (b: 0, nq: 1)
> > [  786.242575] ->rcu_read_unlock_special: 0x100 (b: 0, nq: 1)
> > [  786.243565] ->rcu_read_unlock_special: 0x100 (b: 0, nq: 1)
> > [  786.243565] ->rcu_read_unlock_special: 0x100 (b: 0, nq: 1)
> > [  786.243565] ->rcu_read_unlock_special: 0x100 (b: 0, nq: 1)
> > [  786.243565] ->rcu_read_unlock_special: 0x100 (b: 0, nq: 1)
> > [  786.243565] ->rcu_read_unlock_special: 0x100 (b: 0, nq: 1)
> > 
> > It seems like the WARN_ON_ONCE was hiding the fact it actually got hit couple
> > of times in a very short interval. Maybe that would also explain lockdep crapping
> > itself.
> 
> OK, that was what I thought was the situation.  I have not yet fully
> worked out how RCU gets into that state, but in the meantime, here
> is a patch that should prevent the splats.  (It requires a subtle
> interaction of quiescent-state detection and the scheduling-clock
> interrupt.)

And I did finally figure out how this can happen.  Please see below
for an updated patch with this information recorded in the commit log.
Sasha, I am impressed -- your testing not only located a true RCU bug,
but an RCU bug that can happen on a uniprocessor!  ;-)

As far as I know, the bug is harmless apart from the splat, but still...

							Thanx, Paul

------------------------------------------------------------------------

rcu: Clear need_qs flag to prevent splat

If the scheduling-clock interrupt sets the current tasks need_qs flag,
but if the current CPU passes through a quiescent state in the meantime,
then rcu_preempt_qs() will fail to clear the need_qs flag, which can fool
RCU into thinking that additional rcu_read_unlock_special() processing
is needed.  This commit therefore clears the need_qs flag before checking
for additional processing.

For this problem to occur, we need rcu_preempt_data.passed_quiesce equal
to true and current->rcu_read_unlock_special.b.need_qs also equal to true.
This condition can occur as follows:

1.	CPU 0 is aware of the current preemptible RCU grace period,
    	but has not yet passed through a quiescent state.  Among other
    	things, this means that rcu_preempt_data.passed_quiesce is false.
    
2.	Task A running on CPU 0 enters a preemptible RCU read-side
    	critical section.
    
3.	CPU 0 takes a scheduling-clock interrupt, which notices the
    	RCU read-side critical section and the need for a quiescent state,
    	and thus sets current->rcu_read_unlock_special.b.need_qs to true.
    
4.	Task A is preempted, enters the scheduler, eventually invoking
    	rcu_preempt_note_context_switch() which in turn invokes
    	rcu_preempt_qs().
    
    	Because rcu_preempt_data.passed_quiesce is false,
    	control enters the body of the "if" statement, which sets
    	rcu_preempt_data.passed_quiesce to true.
    
5.	At this point, CPU 0 takes an interrupt.  The interrupt
    	handler contains an RCU read-side critical section, and
    	the rcu_read_unlock() notes that current->rcu_read_unlock_special
    	is nonzero, and thus invokes rcu_read_unlock_special().
    
6.	Once in rcu_read_unlock_special(), the fact that
    	current->rcu_read_unlock_special.b.need_qs is true becomes
    	apparent, so rcu_read_unlock_special() invokes rcu_preempt_qs().
    	Recursively, given that we interrupted out of that same
    	function in the preceding step.
    
7.	Because rcu_preempt_data.passed_quiesce is now true,
    	rcu_preempt_qs() does nothing, and simply returns.
    
8.	Upon return to rcu_read_unlock_special(), it is noted that
    	current->rcu_read_unlock_special is still nonzero (because
    	the interrupted rcu_preempt_qs() had not yet gotten around
    	to clearing current->rcu_read_unlock_special.b.need_qs).
    
9.	Execution proceeds to the WARN_ON_ONCE(), which notes that
    	we are in an interrupt handler and thus duly splats.

The solution, as noted above, is to make rcu_read_unlock_special()
clear out current->rcu_read_unlock_special.b.need_qs after calling
rcu_preempt_qs().  The interrupted rcu_preempt_qs() will clear it again,
but this is harmless.  The worst that happens is that we clobber another
attempt to set this field, but this is not a problem because we just
got done reporting a quiescent state.

Reported-by: Sasha Levin <sasha.levin@oracle.com>
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>

diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h
index 8669de884445..ec99dc16aa38 100644
--- a/kernel/rcu/tree_plugin.h
+++ b/kernel/rcu/tree_plugin.h
@@ -322,6 +322,7 @@ void rcu_read_unlock_special(struct task_struct *t)
 	special = t->rcu_read_unlock_special;
 	if (special.b.need_qs) {
 		rcu_preempt_qs();
+		t->rcu_read_unlock_special.need_qs = false;
 		if (!t->rcu_read_unlock_special.s) {
 			local_irq_restore(flags);
 			return;

next prev parent reply	other threads:[~2015-01-23  9:36 UTC|newest]

Thread overview: 30+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-01-18 14:17 rcu, sched: WARNING: CPU: 30 PID: 23771 at kernel/rcu/tree_plugin.h:337 rcu_read_unlock_special+0x369/0x550() Sasha Levin
2015-01-18 23:22 ` Paul E. McKenney
2015-01-20 15:39   ` Sasha Levin
2015-01-21  2:57     ` Paul E. McKenney
2015-01-21 15:44       ` Sasha Levin
2015-01-22  0:43         ` Paul E. McKenney
2015-01-23  3:29           ` Sasha Levin
2015-01-23  3:51             ` Paul E. McKenney
2015-01-23  4:02               ` Sasha Levin
2015-01-23  4:05                 ` Sasha Levin
2015-01-23  6:55                   ` Paul E. McKenney
2015-01-23  8:41                     ` Lai Jiangshan
2015-01-23  9:38                       ` Paul E. McKenney
2015-01-23  9:16                     ` Lai Jiangshan
2015-01-23  9:48                       ` Paul E. McKenney
2015-01-23  9:36                     ` Paul E. McKenney [this message]
2015-01-23 21:51                       ` Sasha Levin
2015-01-23 22:54                         ` Paul E. McKenney
2015-01-23 23:05                           ` Sasha Levin
2015-01-23 23:16                             ` Paul E. McKenney
2015-01-24  2:18                       ` Lai Jiangshan
2015-01-24 21:18                         ` Paul E. McKenney
2015-01-26  2:08                           ` Lai Jiangshan
2015-01-27 22:03                             ` Paul E. McKenney
2015-01-27 22:08                               ` Sasha Levin
2015-01-27 23:16                                 ` Paul E. McKenney
2015-01-30 19:57                                   ` Sasha Levin
2015-01-30 20:33                                     ` Paul E. McKenney
2015-02-11 23:17                                       ` Sasha Levin
2015-02-12  0:42                                         ` Paul E. McKenney

find likely ancestor, descendant, or conflicting patches for this message:
( dfblob:8669de88444 dfblob:ec99dc16aa3 )
 OR (
bs:"Re: rcu, sched: WARNING: CPU: 30 PID: 23771 at kernel/rcu/tree_plugin.h:337 rcu_read_unlock_special+0x369/0x550()" )
	(help)

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20150123093644.GA11175@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=davej@codemonkey.org.uk \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    --cc=sasha.levin@oracle.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.