All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@codemonkey.org.uk>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	Ingo Molnar <mingo@redhat.com>,
	Stephane Eranian <eranian@gmail.com>,
	Andi Kleen <andi@firstfloor.org>
Subject: Re: perf related lockdep bug
Date: Wed, 4 Nov 2015 07:34:54 -0800	[thread overview]
Message-ID: <20151104153454.GU29027@linux.vnet.ibm.com> (raw)
In-Reply-To: <20151104142058.GX3604@twins.programming.kicks-ass.net>

On Wed, Nov 04, 2015 at 03:20:58PM +0100, Peter Zijlstra wrote:
> On Wed, Nov 04, 2015 at 05:48:38AM -0800, Paul E. McKenney wrote:
> > Ouch!!!  Thank you for the analysis, though I am very surprised that
> > my testing did not find this. 
> 
> Yeah, not sure how that ended up not triggering earlier.
> 
> I'm thinking of adding a might_wake(), much like we have might_fault()
> and add that to printk().

The idea being that might_wake() complains if a scheduler lock is held?
Sounds like a good idea to me.

> > But pulling all printk()s out from under
> > rnp->lock is going to re-introduce some stall-warning bugs.
> 
> figures :/
> 
> > So what other options do I have?
> 
> Kill printk() :-) Its unreliable garbage anyway ;-)

;-) ;-) ;-)

> > o	I could do raise_softirq(), then report the quiescent state in
> > 	the core RCU code, but I bet that raise_softirq()'s  wakeup gets
> > 	me into just as much trouble.
> 
> Yep..
> 
> > o	Ditto for workqueues, I suspect.
> 
> Yep..
> 
> > o	I cannot send an IPI because interrupts are disabled, and that
> > 	would be rather annoying from a real-time perspective in any
> > 	case.
> 
> Indeed.
> 
> > So this hit the code in perf_lock_task_context() that disables preemption
> > across an RCU read-side critical section, which previously sufficed to
> > prevent this scenario.  What happened this time is as follows:
> > 
> > o	CPU 0 entered perf_lock_task_context(), disabled preemption,
> > 	and entered its RCU read-side critical section.  Of course,
> > 	the whole point of disabling preemption is to prevent the
> > 	matching rcu_read_unlock() from grabbing locks.
> > 
> > o	CPU 1 started an expedited grace period.  It checked CPU
> > 	state, saw that CPU 0 was running in the kernel, and therefore
> > 	IPIed it.
> > 
> > o	The IPI handler running on CPU 0 saw that there was an
> > 	RCU read-side critical section in effect, so it set the
> > 	->exp_need_qs flag.
> > 
> > o	When the matching rcu_read_unlock() executes, it notes that
> > 	->exp_need_qs is set, and therefore grabs the locks that it
> > 	shouldn't, hence lockdep's complaints about deadlock.
> > 
> > This problem is caused by the IPI handler interrupting the RCU read-side
> > critical section.  One way to prevent the IPI from doing this is to
> > disable interrupts across the RCU read-side critical section instead
> > of merely disabling preemption.  This is a reasonable approach given
> > that acquiring the scheduler locks is going to disable interrupts
> > in any case.
> > 
> > The (untested) patch below takes this approach.
> > 
> > Thoughts?
> 
> Yes, this should work, but now I worry I need to go audit all of perf
> and sched for this :/

Could lockdep be convinced to do the auditing for you?

							Thanx, Paul


  reply	other threads:[~2015-11-04 15:35 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2015-11-04  5:17 perf related lockdep bug Dave Jones
2015-11-04 10:21 ` Peter Zijlstra
2015-11-04 10:28   ` Peter Zijlstra
2015-11-04 10:50     ` Peter Zijlstra
2015-11-04 13:48       ` Paul E. McKenney
2015-11-04 14:20         ` Peter Zijlstra
2015-11-04 15:34           ` Paul E. McKenney [this message]
2015-11-04 15:36           ` Peter Zijlstra
2015-11-04 15:51             ` Paul E. McKenney
2015-11-04 20:58         ` Andi Kleen
2015-11-05  0:55           ` Paul E. McKenney
2015-11-05  1:59             ` Paul E. McKenney
2015-11-05  2:46             ` Andi Kleen
2015-11-05 14:04               ` Paul E. McKenney
2015-11-11 13:29                 ` Paul E. McKenney
2015-11-10  6:39         ` [tip:perf/urgent] perf: Disable IRQs across RCU RS CS that acquires scheduler lock tip-bot for Paul E. McKenney
2015-11-04 14:01     ` perf related lockdep bug Paul E. McKenney
2015-11-04 14:34       ` Peter Zijlstra
2015-11-05  1:57         ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20151104153454.GU29027@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=andi@firstfloor.org \
    --cc=davej@codemonkey.org.uk \
    --cc=eranian@gmail.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.