From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>,
Linux Kernel <linux-kernel@vger.kernel.org>,
gregkh@linuxfoundation.org, peter@hurleysoftware.com
Subject: Re: tty^Wrcu/perf lockdep trace.
Date: Fri, 4 Oct 2013 09:15:48 -0700 [thread overview]
Message-ID: <20131004161548.GA19957@linux.vnet.ibm.com> (raw)
In-Reply-To: <20131004160352.GF5790@linux.vnet.ibm.com>
On Fri, Oct 04, 2013 at 09:03:52AM -0700, Paul E. McKenney wrote:
> On Fri, Oct 04, 2013 at 08:58:35AM +0200, Peter Zijlstra wrote:
> > On Thu, Oct 03, 2013 at 12:58:32PM -0700, Paul E. McKenney wrote:
> > > On Thu, Oct 03, 2013 at 09:42:26PM +0200, Peter Zijlstra wrote:
> > > >
> > > > That's not tty; that's RCU..
> > > >
> > > > On Thu, Oct 03, 2013 at 03:08:30PM -0400, Dave Jones wrote:
> > > > > ======================================================
> > > > > [ INFO: possible circular locking dependency detected ]
> > > > > 3.12.0-rc3+ #92 Not tainted
> > > > > -------------------------------------------------------
> > > > > trinity-child2/15191 is trying to acquire lock:
> > > > > (&rdp->nocb_wq){......}, at: [<ffffffff8108ff43>] __wake_up+0x23/0x50
> > > > >
> > > > > but task is already holding lock:
> > > > > (&ctx->lock){-.-...}, at: [<ffffffff81154c19>] perf_event_exit_task+0x109/0x230
> > > > >
> > > > > which lock already depends on the new lock.
> > > > >
> > > > >
> > > > > the existing dependency chain (in reverse order) is:
> > > > >
> > > > > -> #3 (&ctx->lock){-.-...}:
> > > > >
> > > > > -> #2 (&rq->lock){-.-.-.}:
> > > > >
> > > > > -> #1 (&p->pi_lock){-.-.-.}:
> > > > >
> > > > > -> #0 (&rdp->nocb_wq){......}:
> > >
> > > I suppose I could defer the ->nocb_wq wakeup until the next context switch
> > > or transition to idle/userspace, but it might be simpler for put_ctx()
> > > to maintain a per-CPU chain of callbacks which are kfree_rcu()ed when
> > > ctx->lock is dropped. Also easier on the kernel/user and kernel/idle
> > > transition overhead/latency...
> > >
> > > Other thoughts?
> >
> > What's caused this? We've had that kfree_rcu() in there for ages. I need
> > to audit all the get/put_ctx calls anyway for an unrelated issue but I
> > fear its going to be messy to defer that kfree_rcu() call, but I can
> > try.
>
> The problem exists, but NOCB made it much more probable. With non-NOCB
> kernels, an irq-disabled call_rcu() invocation does a wake_up() only if
> there are more than 10,000 callbacks stacked up on the CPU. With a NOCB
> kernel, the wake_up() happens on the first callback.
>
> So let's look at what is required to solve this within RCU. Currently,
> I cannot safely do any sort of wakeup or even a resched_cpu() from
> within an call_rcu() that is called with interrupts disabled because of
> this deadlock. I could require that the rcu_nocb_poll sysfs parameter
> always be set, but the energy-efficiency guys are not going to be happy
> with the resulting wakeups on idle systems.
>
> I could try defering the wake_up(), Lai Jiangshan style. The question
> is then "to where do I defer it?" The straightforward answer is to
> check on each context switch, each transition to RCU idle, and each
> scheduling-clock interrupt from userspace execution. The scenario that
> defeats this is where the CPU has a single runnable task, but where that
> task spends much of its time in the kernel, so that the scheduling-clock
> interrupts always hit kernel-mode execution. The callback is then
> deferred forever.
Ah, but it is safe to call wake_up() from a scheduling-clock interrupt,
because these cannot interrupt an irq-disabled lock critical section.
So maybe I can unconditionally defer to a scheduling-clock interrupt.
> We could keep Frederic Weisbecker's kernel/user transition hooks,
> currently in place only for NO_HZ_FULL, and propagate these to all
> architectures, and do the additional checking on those transitions.
> This would work, but is not an immediate solution. And adds overhead
> that is not otherwise needed.
But if !NO_HZ_FULL, there will eventually be a scheduling-clock interrupt.
So maybe check on each context switch, transition to idle, subsequent
non-irq-disabled call_rcu(), and scheduling-clock interrupt?
Does that actually avoid this deadlock?
Thanx, Paul
> Another approach that just now occurred to me is to do a mod_timer()
> each time the first callback is posted with irqs disabled, and to
> cancel that timer if the wake_up() gets done later. (I can safely and
> unconditionally do a wake_up() from a timer handler, IIRC.) So, does
> perf ever want to invoke call_rcu() holding a timer lock?
>
> I am not too happy about the complexity of deferring, but maybe it is
> the right approach, at least assuming perf isn't going to whack me
> with a timer lock. ;-)
>
> Any other approaches that I am missing?
>
> Thanx, Paul
next prev parent reply other threads:[~2013-10-04 16:15 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-10-03 19:08 tty/perf lockdep trace Dave Jones
2013-10-03 19:42 ` tty^Wrcu/perf " Peter Zijlstra
2013-10-03 19:58 ` Paul E. McKenney
2013-10-04 6:58 ` Peter Zijlstra
2013-10-04 16:03 ` Paul E. McKenney
2013-10-04 16:15 ` Paul E. McKenney [this message]
2013-10-04 16:50 ` Peter Zijlstra
2013-10-04 17:09 ` Paul E. McKenney
2013-10-04 18:52 ` Peter Zijlstra
2013-10-04 21:25 ` Paul E. McKenney
2013-10-04 22:02 ` Paul E. McKenney
2013-10-05 0:23 ` Paul E. McKenney
2013-10-07 11:24 ` Peter Zijlstra
2013-10-07 12:59 ` Paul E. McKenney
2013-10-05 16:05 ` Peter Zijlstra
2013-10-05 16:28 ` Paul E. McKenney
2013-10-05 19:59 ` Peter Zijlstra
2013-10-05 22:03 ` Paul E. McKenney
2013-10-07 8:42 ` Peter Zijlstra
2013-10-07 13:11 ` Paul E. McKenney
2013-10-07 17:35 ` Paul E. McKenney
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20131004161548.GA19957@linux.vnet.ibm.com \
--to=paulmck@linux.vnet.ibm.com \
--cc=davej@redhat.com \
--cc=gregkh@linuxfoundation.org \
--cc=linux-kernel@vger.kernel.org \
--cc=peter@hurleysoftware.com \
--cc=peterz@infradead.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.