linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	gregkh@linuxfoundation.org, peter@hurleysoftware.com
Subject: Re: tty^Wrcu/perf lockdep trace.
Date: Fri, 4 Oct 2013 09:15:48 -0700	[thread overview]
Message-ID: <20131004161548.GA19957@linux.vnet.ibm.com> (raw)
In-Reply-To: <20131004160352.GF5790@linux.vnet.ibm.com>

On Fri, Oct 04, 2013 at 09:03:52AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 04, 2013 at 08:58:35AM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 03, 2013 at 12:58:32PM -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 03, 2013 at 09:42:26PM +0200, Peter Zijlstra wrote:
> > > > 
> > > > That's not tty; that's RCU..
> > > > 
> > > > On Thu, Oct 03, 2013 at 03:08:30PM -0400, Dave Jones wrote:
> > > > >  ======================================================
> > > > >  [ INFO: possible circular locking dependency detected ]
> > > > >  3.12.0-rc3+ #92 Not tainted
> > > > >  -------------------------------------------------------
> > > > >  trinity-child2/15191 is trying to acquire lock:
> > > > >   (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
> > > > >  
> > > > > but task is already holding lock:
> > > > >   (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
> > > > > 
> > > > > which lock already depends on the new lock.
> > > > > 
> > > > > 
> > > > > the existing dependency chain (in reverse order) is:
> > > > > 
> > > > > -> #3 (&ctx->lock){-.-...}:
> > > > >  
> > > > > -> #2 (&rq->lock){-.-.-.}:
> > > > >  
> > > > > -> #1 (&p->pi_lock){-.-.-.}:
> > > > >  
> > > > > -> #0 (&rdp->nocb_wq){......}:
> > > 
> > > I suppose I could defer the ->nocb_wq wakeup until the next context switch
> > > or transition to idle/userspace, but it might be simpler for put_ctx()
> > > to maintain a per-CPU chain of callbacks which are kfree_rcu()ed when
> > > ctx->lock is dropped.  Also easier on the kernel/user and kernel/idle
> > > transition overhead/latency...
> > > 
> > > Other thoughts?
> > 
> > What's caused this? We've had that kfree_rcu() in there for ages. I need
> > to audit all the get/put_ctx calls anyway for an unrelated issue but I
> > fear its going to be messy to defer that kfree_rcu() call, but I can
> > try.
> 
> The problem exists, but NOCB made it much more probable.  With non-NOCB
> kernels, an irq-disabled call_rcu() invocation does a wake_up() only if
> there are more than 10,000 callbacks stacked up on the CPU.  With a NOCB
> kernel, the wake_up() happens on the first callback.
> 
> So let's look at what is required to solve this within RCU.  Currently,
> I cannot safely do any sort of wakeup or even a resched_cpu() from
> within an call_rcu() that is called with interrupts disabled because of
> this deadlock.  I could require that the rcu_nocb_poll sysfs parameter
> always be set, but the energy-efficiency guys are not going to be happy
> with the resulting wakeups on idle systems.
> 
> I could try defering the wake_up(), Lai Jiangshan style.  The question
> is then "to where do I defer it?"  The straightforward answer is to
> check on each context switch, each transition to RCU idle, and each
> scheduling-clock interrupt from userspace execution.  The scenario that
> defeats this is where the CPU has a single runnable task, but where that
> task spends much of its time in the kernel, so that the scheduling-clock
> interrupts always hit kernel-mode execution.  The callback is then
> deferred forever.

Ah, but it is safe to call wake_up() from a scheduling-clock interrupt,
because these cannot interrupt an irq-disabled lock critical section.
So maybe I can unconditionally defer to a scheduling-clock interrupt.

> We could keep Frederic Weisbecker's kernel/user transition hooks,
> currently in place only for NO_HZ_FULL, and propagate these to all
> architectures, and do the additional checking on those transitions.
> This would work, but is not an immediate solution.  And adds overhead
> that is not otherwise needed.

But if !NO_HZ_FULL, there will eventually be a scheduling-clock interrupt.

So maybe check on each context switch, transition to idle, subsequent
non-irq-disabled call_rcu(), and scheduling-clock interrupt?

Does that actually avoid this deadlock?

							Thanx, Paul

> Another approach that just now occurred to me is to do a mod_timer()
> each time the first callback is posted with irqs disabled, and to
> cancel that timer if the wake_up() gets done later.  (I can safely and
> unconditionally do a wake_up() from a timer handler, IIRC.)  So, does
> perf ever want to invoke call_rcu() holding a timer lock?
> 
> I am not too happy about the complexity of deferring, but maybe it is
> the right approach, at least assuming perf isn't going to whack me
> with a timer lock.  ;-)
> 
> Any other approaches that I am missing?
> 
> 							Thanx, Paul


  reply	other threads:[~2013-10-04 16:15 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-10-03 19:08 tty/perf lockdep trace Dave Jones
2013-10-03 19:42 ` tty^Wrcu/perf " Peter Zijlstra
2013-10-03 19:58   ` Paul E. McKenney
2013-10-04  6:58     ` Peter Zijlstra
2013-10-04 16:03       ` Paul E. McKenney
2013-10-04 16:15         ` Paul E. McKenney [this message]
2013-10-04 16:50         ` Peter Zijlstra
2013-10-04 17:09           ` Paul E. McKenney
2013-10-04 18:52             ` Peter Zijlstra
2013-10-04 21:25               ` Paul E. McKenney
2013-10-04 22:02                 ` Paul E. McKenney
2013-10-05  0:23                   ` Paul E. McKenney
2013-10-07 11:24                     ` Peter Zijlstra
2013-10-07 12:59                       ` Paul E. McKenney
2013-10-05 16:05                 ` Peter Zijlstra
2013-10-05 16:28                   ` Paul E. McKenney
2013-10-05 19:59                     ` Peter Zijlstra
2013-10-05 22:03                       ` Paul E. McKenney
2013-10-07  8:42                         ` Peter Zijlstra
2013-10-07 13:11                           ` Paul E. McKenney
2013-10-07 17:35                     ` Paul E. McKenney

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20131004161548.GA19957@linux.vnet.ibm.com \
    --to=paulmck@linux.vnet.ibm.com \
    --cc=davej@redhat.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=peter@hurleysoftware.com \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).