From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Peter Zijlstra <peterz@infradead.org>
Cc: Dave Jones <davej@redhat.com>,
	Linux Kernel <linux-kernel@vger.kernel.org>,
	gregkh@linuxfoundation.org, peter@hurleysoftware.com
Subject: Re: tty^Wrcu/perf lockdep trace.
Date: Mon, 7 Oct 2013 06:11:02 -0700	[thread overview]
Message-ID: <20131007131102.GY5790@linux.vnet.ibm.com> (raw)
In-Reply-To: <20131007084239.GX3081@twins.programming.kicks-ass.net>

On Mon, Oct 07, 2013 at 10:42:39AM +0200, Peter Zijlstra wrote:
> On Sat, Oct 05, 2013 at 03:03:11PM -0700, Paul E. McKenney wrote:
> > In theory, we could do that.  But in practice, what would wake us up
> > when the CPUs go non-idle?
> > 
> > 1.	We could do a wakeup on the idle-to-non-idle transition.  That
> > 	would increase idle-to-non-idle latency, defeating the purpose
> > 	of rcu_nocb_poll=y.  Plus there are workloads that enter and
> > 	exit idle extremely quickly, which would not be good for
> > 	performance, scalability, or energy efficiency.
> > 
> > 2.	We could have some other thread poll all the CPUs for activity,
> > 	for example, the RCU grace-period kthreads.  This might actually
> > 	work, but there are some really ugly races involving CPUs becoming
> > 	active just long enough to post a callback, going to sleep,
> > 	with no other RCU activity in the system.  This could easily
> > 	result in a system hang.
> > 
> > 3.	We could post a timeout to check for the corresponding CPU
> > 	being idle, but that just transfers the wakeups out of idle from
> > 	the rcuo kthreads to the other CPUs.
> > 
> > 4.	I remove rcu_nocb_poll and see if anyone complains.  That doesn't
> > 	solve the deadlock problem, but it does simplify RCU a bit.  ;-)
> > 
> > Other thoughts?
> 
> So we already move all the nocb rcuo threads over to the timekeeping
> cpu, right? That gives you n threads to wake and/or poll, which is
> expensive.

I don't pin the rcuo threads anywhere, though I would expect people
to move them to some set of housekeeping CPUs, the timekeeping CPU
being a good candidate.
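
For illustration, such moving can be done from user space with
sched_setaffinity() on the rcuo kthread's PID.  The sketch below is just
an example program, not anything shipped with the kernel, and the PID
and CPU numbers on its command line are whatever the administrator
chooses:

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>

/*
 * Pin a kthread (identified by PID, e.g. taken from "ps -e -o pid,comm")
 * to a set of housekeeping CPUs.
 * Usage: ./pin-rcuo <pid> <cpu> [<cpu>...]
 */
int main(int argc, char *argv[])
{
	cpu_set_t mask;
	int i;

	if (argc < 3) {
		fprintf(stderr, "Usage: %s <pid> <cpu> [<cpu>...]\n", argv[0]);
		return 1;
	}
	CPU_ZERO(&mask);
	for (i = 2; i < argc; i++)
		CPU_SET(atoi(argv[i]), &mask);
	if (sched_setaffinity((pid_t)atoi(argv[1]), sizeof(mask), &mask) != 0) {
		perror("sched_setaffinity");
		return 1;
	}
	return 0;
}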

> So why doesn't the time-keeping cpu, which is awake when at least one of
> the nocb cpus is awake, poll the nocb cpus' callback lists?

If !NO_HZ_FULL, there won't be a timekeeping CPU as such, if I remember
correctly.

> Arguably you don't want to do that from the old scheduler tick interrupt
> or softirq context thingy, but rather from a kthread, and you've already
> got all that around.

The polling happens in the grace-period kthread, but it is not guaranteed
to be happening unless NO_HZ_FULL_SYSIDLE, in which case the system
will generate artificial grace periods as needed to make the required
polling happen.  On the other hand, if !NO_HZ_FULL_SYSIDLE, there will
not be any polling if there is no RCU update activity.
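
Very roughly, that polling has the following shape.  Every name in this
sketch is invented for illustration and is not the actual RCU code:

#include <linux/compiler.h>
#include <linux/cpumask.h>
#include <linux/kthread.h>
#include <linux/percpu.h>
#include <linux/sched.h>

/* All names below are invented for this sketch. */
static DEFINE_PER_CPU(int, sketch_nocb_has_cbs);
static DEFINE_PER_CPU(struct task_struct *, sketch_nocb_kthread);
static struct cpumask sketch_nocb_mask;

static int sketch_gp_poll(void *unused)
{
	int cpu;

	while (!kthread_should_stop()) {
		for_each_cpu(cpu, &sketch_nocb_mask) {
			/* One remote read per offloaded CPU per pass. */
			if (ACCESS_ONCE(per_cpu(sketch_nocb_has_cbs, cpu)))
				wake_up_process(per_cpu(sketch_nocb_kthread, cpu));
		}
		schedule_timeout_interruptible(HZ);
	}
	return 0;
}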

> At that point; you've got a single kthread periodically being woken by
> the scheduler timer interrupt -- which still goes away when the entire
> machine goes idle -- which would do something like:
> 
> 
>   for_each_cpu(cpu, nocb_cpus_mask) {
>   	if (!list_empty_careful(&per_cpu(rcu_state, cpu)->callbacks))
> 		advance_cpu_callbacks(cpu);
>   }
> 
> 
> That fully preserves the !NOCB state of affairs while also dealing with
> the NOCB stuff. And the single remote read only gets really expensive
> once you go _very_ large or once the cpu in question actually touched
> the cacheline and moved it into exclusive mode due to writing to it; at
> which point you've saved yourself a wakeup and we're still faster.
> 
> It automatically deals with the full idle case, it basically gives you
> 'poll' behaviour for nr_running==1 and to me appears as the simplest and
> most straight fwd extension of the RCU model.
> 
> More importantly it does away with that wakeup that so often happens on
> nocb cpus. Although, rereading your email, I get the impression we do
> this wakeup even on !nocb cpus when CONFIG_NOCB=y, which seems another
> undesired feature.

The __call_rcu_nocb_enqueue() wakeup happens only when CONFIG_NOCB=y,
and even then only on CPUs that have actually been offloaded.
Now my patch does the checking even on non-offloaded CPUs, but this
still happens only when CONFIG_NOCB=y and amounts to nothing more than
a check of a per-CPU variable.
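
The offloaded enqueue path is essentially "append the callback, then
maybe wake the rcuo kthread".  A much-simplified sketch of that pattern,
using invented names rather than the actual __call_rcu_nocb_enqueue()
code:

#include <linux/rcupdate.h>
#include <linux/sched.h>

/* Illustrative only; struct, field, and function names are invented. */
struct sketch_nocb_data {
	struct rcu_head *cblist_head;
	struct rcu_head **cblist_tail;
	struct task_struct *nocb_kthread;
	bool kthread_sleeping;
};

static void sketch_nocb_enqueue(struct sketch_nocb_data *ncd,
				struct rcu_head *rhp, bool polling)
{
	/* Append the callback to this CPU's offloaded list. */
	rhp->next = NULL;
	*ncd->cblist_tail = rhp;
	ncd->cblist_tail = &rhp->next;

	/*
	 * With rcu_nocb_poll=y the rcuo kthread polls, so no wakeup is
	 * needed.  Otherwise, wake it if it might be asleep, and it is
	 * exactly this wakeup that is unsafe from some calling contexts.
	 */
	if (!polling && ncd->kthread_sleeping)
		wake_up_process(ncd->nocb_kthread);
}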

The other wakeups in __call_rcu_core() only happen in special cases,
which I believe avoid this deadlock condition.
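
One way to picture that avoidance is to defer the wakeup whenever the
calling context might already hold scheduler locks.  The following is
only a rough sketch of that idea, with invented names, and is not the
actual patch:

#include <linux/irqflags.h>
#include <linux/percpu.h>
#include <linux/sched.h>

/* Illustrative only; the flag and helper names are invented. */
static DEFINE_PER_CPU(bool, sketch_defer_wakeup);

static void sketch_wake_or_defer(struct task_struct *kt)
{
	/*
	 * wake_up_process() can take scheduler locks, so if the caller
	 * might already hold them (crudely approximated here by having
	 * interrupts disabled), just note that a wakeup is owed.
	 */
	if (irqs_disabled()) {
		this_cpu_write(sketch_defer_wakeup, true);
		return;
	}
	wake_up_process(kt);
}

/* Invoked later, from a context where the wakeup is known to be safe. */
static void sketch_do_deferred_wakeup(struct task_struct *kt)
{
	if (this_cpu_read(sketch_defer_wakeup)) {
		this_cpu_write(sketch_defer_wakeup, false);
		wake_up_process(kt);
	}
}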

> Maybe you've already thought of this and there's a very good reason
> things aren't like this; but like said, I've been away for a little
> while and need to catch up a bit.

From what I can see, what you suggest would work quite well in special
cases, but I still have to solve the general case.  If I solve the
general case, I don't believe I need to work on the special cases.

							Thanx, Paul

