public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Frederic Weisbecker <fweisbec@gmail.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>,
	"Paul E. McKenney" <paulmck@linux.vnet.ibm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Josh Triplett <josh@joshtriplett.org>,
	KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com>,
	rostedt <rostedt@goodmis.org>,
	Nicholas Miell <nmiell@comcast.net>,
	Ingo Molnar <mingo@redhat.com>,
	One Thousand Gnomes <gnomes@lxorguk.ukuu.org.uk>,
	Lai Jiangshan <laijs@cn.fujitsu.com>,
	Stephen Hemminger <stephen@networkplumber.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Peter Zijlstra <peterz@infradead.org>,
	David Howells <dhowells@redhat.com>,
	bobby prani <bobby.prani@gmail.com>,
	Michael Kerrisk <mtk.manpages@gmail.com>,
	Shuah Khan <shuahkh@osg.samsung.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: Re: [RFC PATCH] membarrier: handle nohz_full with expedited thread registration
Date: Tue, 17 Jan 2017 04:55:22 +0100	[thread overview]
Message-ID: <20170117035521.GB2787@lerouge> (raw)
In-Reply-To: <1587103499.5523.1484607367124.JavaMail.zimbra@efficios.com>

On Mon, Jan 16, 2017 at 10:56:07PM +0000, Mathieu Desnoyers wrote:
> ----- On Jan 16, 2017, at 3:15 PM, Linus Torvalds torvalds@linux-foundation.org wrote:
> 
> > Excuse my french, but this looks like incredible shit to me.
> 
> I'm currently trying to figure out how we can get membarrier
> to play nicely with recent no-hz kernel features. Indeed, my
> initial prototype is a mess. The good news is based on the number
> of flaws you found in this RFC, there is plenty of room for
> improvement. :)
> 
> > 
> > On Mon, Jan 16, 2017 at 11:51 AM, Mathieu Desnoyers
> > <mathieu.desnoyers@efficios.com> wrote:
> >> +
> >> +static int membarrier_register_expedited(struct task_struct *t)
> >> +{
> >> +       struct rq *rq;
> >> +
> >> +       if (t->membarrier_expedited == UINT_MAX)
> >> +               return -EOVERFLOW;
> >> +       rq = this_rq();
> >> +       raw_spin_lock(&rq->lock);
> >> +       t->membarrier_expedited++;
> >> +       raw_spin_unlock(&rq->lock);
> >> +       return 0;
> >> +}
> > 
> > Yeah, taking the rq lock with
> > 
> > (a) a random "rq" that isn't stable
> > 
> > (b) without disabling interrupts
> 
> So for both register and unregister functions, as well as the use in
> membarrier_nohz_full_expedited(), disabling interrupts around the rq
> lock should fix this. But perhaps it would be wiser trying not to use the
> rq lock at all.
> 
> > 
> > (c) using an internal scheduler helper that isn't supposed to be used
> > externally
> 
> Yeah, I hate doing that. Hence the TODO comment I've placed near the include:
> 
>  * TODO: private sched.h is needed for runqueue. Should we move the
>  * sched code under kernel/sched/ ?
> 
> I'm open to better ideas.

I haven't thought the problem very deep yet, now at a quick glance, it seems to me
that if we want to make sure we spare the IRQ for nohz_full CPUs unless they
run a task that carries that membarrier_expedited flag, then I fear we have to take
the target rq lock.

So either we do that and the rq related code should move to kernel/sched/, or we
can think of some alternative.

Here is one that is not very elegant but avoids the rq lock from the membarrier
request side.

We could do things the other way around: when the nohz_full task does
membarrier_register_expedited(), it increments a counter on all CPUs it is affine
to and sends a global IPI (or rather only the CPUs outside the nohz_full range + those with
that positive counter) so that all CPUs see that update immediately. Then when
a membarrier request happens somewhere we know where we are allowed to send the IPI.
We can maintain a cpumask based on top of those per-CPU counters.

Now that's a bit of a brute method because we blindly poke a range of CPUs where
a given task might run but it avoids the rq lock on the membarrier request side.

That idea can be utterly simplified if membarrier_register_expedited() were to be
called for a CPU instead of a task. Also we wouldn't bother about synchronization
against concurrent task affinity changes (which might otherwise require rq lock from the
expedited register path).

But well we may as well exit the CPU nohz_full state for the timeframe where we need
to care about membarrier.

In fact due to the complexity involved, I have to ask first if we really need this feature.
Typically nohz_full workloads don't want to be disturbed at all, so do we have real
and significant usecases of CPU isolation workloads that want to be concerned by
this membarrier so much that they can tolerate some random IRQ?

Thanks.

  parent reply	other threads:[~2017-01-17  3:55 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2017-01-16 19:51 [RFC PATCH] membarrier: handle nohz_full with expedited thread registration Mathieu Desnoyers
2017-01-16 20:15 ` Linus Torvalds
2017-01-16 22:56   ` Mathieu Desnoyers
2017-01-16 23:50     ` Linus Torvalds
2017-01-17  2:09       ` Mathieu Desnoyers
2017-01-17  3:55     ` Frederic Weisbecker [this message]
2017-01-17 20:53       ` Paul E. McKenney
2017-01-18 11:00         ` Peter Zijlstra
2017-01-19 22:01           ` Paul McKenney
2017-01-17 21:56       ` Mathieu Desnoyers
2017-01-17 23:29         ` Steven Rostedt

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20170117035521.GB2787@lerouge \
    --to=fweisbec@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=bobby.prani@gmail.com \
    --cc=dhowells@redhat.com \
    --cc=gnomes@lxorguk.ukuu.org.uk \
    --cc=josh@joshtriplett.org \
    --cc=kosaki.motohiro@jp.fujitsu.com \
    --cc=laijs@cn.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@redhat.com \
    --cc=mtk.manpages@gmail.com \
    --cc=nmiell@comcast.net \
    --cc=paulmck@linux.vnet.ibm.com \
    --cc=peterz@infradead.org \
    --cc=rostedt@goodmis.org \
    --cc=shuahkh@osg.samsung.com \
    --cc=stephen@networkplumber.org \
    --cc=tglx@linutronix.de \
    --cc=torvalds@linux-foundation.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox