Re: Alternative to signals/sys_membarrier() in liburcu


From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michael Sullivan <sully@msully.net>,
	lttng-dev@lists.lttng.org, LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: Alternative to signals/sys_membarrier() in liburcu
Date: Fri, 13 Mar 2015 07:18:53 -0700
Message-ID: <20150313141853.GE5412@linux.vnet.ibm.com>
In-Reply-To: <20150313080743.GA21156@gmail.com>

On Fri, Mar 13, 2015 at 09:07:43AM +0100, Ingo Molnar wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> > ----- Original Message -----
> > > From: "Linus Torvalds" <torvalds@linux-foundation.org>
> > > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > > Cc: "Michael Sullivan" <sully@msully.net>, lttng-dev@lists.lttng.org, "LKML" <linux-kernel@vger.kernel.org>, "Paul E.
> > > McKenney" <paulmck@linux.vnet.ibm.com>, "Peter Zijlstra" <peterz@infradead.org>, "Ingo Molnar" <mingo@kernel.org>,
> > > "Thomas Gleixner" <tglx@linutronix.de>, "Steven Rostedt" <rostedt@goodmis.org>
> > > Sent: Thursday, March 12, 2015 5:47:05 PM
> > > Subject: Re: Alternative to signals/sys_membarrier() in liburcu
> > > 
> > > On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
> > > <mathieu.desnoyers@efficios.com> wrote:
> > > >
> > > > So the question as it stands appears to be: would you be comfortable
> > > > having users abuse mprotect(), relying on its side-effect of issuing
> > > > a smp_mb() on each targeted CPU for the TLB shootdown, as
> > > > an effective implementation of a process-wide memory barrier?
> > > 
> > > Be *very* careful.
> > > 
> > > Just yesterday, in another thread (discussing the auto-numa TLB 
> > > performance regression), we were discussing skipping the TLB 
> > > invalidates entirely if the mprotect relaxes the protections.
> 
> We have such code already in mm/mprotect.c, introduced in:
> 
>   10c1045f28e8 mm: numa: avoid unnecessary TLB flushes when setting NUMA hinting entries
> 
> which does:
> 
>                                 /* Avoid TLB flush if possible */
>                                 if (pte_protnone(oldpte))
>                                         continue;
> 
> > > Because if you *used* to be read-only, and then mprotect()
> > > something so that it is read-write, there really is no need to 
> > > send a TLB invalidate, at least on x86. You can just change the 
> > > page tables, and *if* any entries are stale in the TLB they'll 
> > > take a microfault on access and then just reload the TLB.
> > > 
> > > So mprotect() to a more permissive mode is not necessarily 
> > > serializing.
> > 
> > The idea here is to always mprotect() to a more restrictive mode, 
> > which should trigger the TLB shootdown.
> 
> So what happens if a CPU comes around that integrates TLB shootdown 
> management into its cache coherency protocol? In such a case IPI 
> traffic can be skipped: the memory bus messages take care of TLB 
> flushes in most cases.
> 
> It's a natural optimization IMHO, because TLB flushes are conceptually 
> pretty close to the synchronization mechanisms inherent in data cache 
> coherency protocols:
> 
> This could be implemented for example by a CPU that knows about ptes 
> and handles their modification differently: when a pte is modified it 
> will broadcast a MESI invalidation message not just for the cacheline 
> belonging to the pte's physical address, but also an 'invalidate TLB' 
> MESI message for the pte value's page.
> 
> The TLB shootdown would either be guaranteed within the MESI
> transaction, or there would be a deterministic timing guarantee,
> or some explicit synchronization mechanism (new instruction) to
> make sure the remote TLB(s) got shot down.
> 
> Every form of this would be way faster than sending interrupts. New 
> OSs could support this by the hardware telling them in which cases the 
> TLBs are 'auto-flushed', while old OSs would still be compatible by 
> sending (now pointless) TLB shootdown IPIs.
> 
> So it's a relatively straightforward hardware optimization IMHO: 
> assuming TLB flushes are considered important enough to complicate the 
> cacheline state machine (which I think they currently aren't).
> 
> So in this case there's no interrupt and no other interruption of the 
> remote CPU's flow of execution in any fashion that could advance the 
> RCU state machine.
> 
> What do you think?

I agree -- there really have been systems able to flush remote TLBs
without interrupting the remote CPU.

So, given the fact that the userspace RCU library does now see
some real-world use, is it now time for Mathieu to resubmit his
sys_membarrier() patch?

							Thanx, Paul
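
For readers following along, the mprotect() trick debated above can be
sketched roughly as below. This is only an illustration under stated
assumptions, not liburcu code: the function name
mprotect_shootdown_sketch() is hypothetical, and the sketch merely
performs the "dirty a page, then tighten its protection" sequence that
Mathieu describes. As Linus and Ingo point out, nothing guarantees that
this sequence interrupts other CPUs or acts as a memory barrier on
them.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1   /* for MAP_ANONYMOUS on glibc in strict modes */
#endif
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of liburcu or any kernel API).
 *
 * Maps one anonymous page, writes to it so that a writable page-table
 * entry exists, then restricts it to read-only.  On current x86 Linux
 * this restriction is expected to trigger a TLB shootdown on the CPUs
 * running this process, which is the side effect the thread discusses
 * abusing.  ASSUMPTION: the shootdown is an implementation detail; the
 * kernel may skip it, and hardware may perform it without any IPI.
 *
 * Returns 0 if every call succeeded, -1 otherwise.
 */
int mprotect_shootdown_sketch(void)
{
    long page_size = sysconf(_SC_PAGESIZE);
    if (page_size <= 0)
        return -1;
    size_t len = (size_t)page_size;

    char *page = mmap(NULL, len, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (page == MAP_FAILED)
        return -1;

    /* Touch the page so a present, writable mapping is established. */
    *(volatile char *)page = 1;

    /* Move to a MORE restrictive mode; relaxing may skip the flush. */
    int rc = mprotect(page, len, PROT_READ);

    /* Release the page whether or not mprotect() succeeded. */
    if (munmap(page, len) != 0)
        rc = -1;

    return rc == 0 ? 0 : -1;
}
```

A return value of 0 only says that the system calls succeeded. It says
nothing about whether remote CPUs were interrupted, which is exactly
why an explicit sys_membarrier() looks preferable.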


