Re: Alternative to signals/sys_membarrier() in liburcu

From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Ingo Molnar <mingo@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>,
	Linus Torvalds <torvalds@linux-foundation.org>,
	Michael Sullivan <sully@msully.net>,
	lttng-dev@lists.lttng.org, LKML <linux-kernel@vger.kernel.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Thomas Gleixner <tglx@linutronix.de>,
	Steven Rostedt <rostedt@goodmis.org>
Subject: Re: Alternative to signals/sys_membarrier() in liburcu
Date: Fri, 13 Mar 2015 07:18:53 -0700
Message-ID: <20150313141853.GE5412@linux.vnet.ibm.com>
In-Reply-To: <20150313080743.GA21156@gmail.com>

On Fri, Mar 13, 2015 at 09:07:43AM +0100, Ingo Molnar wrote:
> 
> * Mathieu Desnoyers <mathieu.desnoyers@efficios.com> wrote:
> 
> > ----- Original Message -----
> > > From: "Linus Torvalds" <torvalds@linux-foundation.org>
> > > To: "Mathieu Desnoyers" <mathieu.desnoyers@efficios.com>
> > > Cc: "Michael Sullivan" <sully@msully.net>, lttng-dev@lists.lttng.org, "LKML" <linux-kernel@vger.kernel.org>, "Paul E.
> > > McKenney" <paulmck@linux.vnet.ibm.com>, "Peter Zijlstra" <peterz@infradead.org>, "Ingo Molnar" <mingo@kernel.org>,
> > > "Thomas Gleixner" <tglx@linutronix.de>, "Steven Rostedt" <rostedt@goodmis.org>
> > > Sent: Thursday, March 12, 2015 5:47:05 PM
> > > Subject: Re: Alternative to signals/sys_membarrier() in liburcu
> > > 
> > > On Thu, Mar 12, 2015 at 1:53 PM, Mathieu Desnoyers
> > > <mathieu.desnoyers@efficios.com> wrote:
> > > >
> > > > So the question as it stands appears to be: would you be comfortable
> > > > having users abuse mprotect(), relying on its side-effect of issuing
> > > > a smp_mb() on each targeted CPU for the TLB shootdown, as
> > > > an effective implementation of a process-wide memory barrier?
> > > 
> > > Be *very* careful.
> > > 
> > > Just yesterday, in another thread (discussing the auto-numa TLB 
> > > performance regression), we were discussing skipping the TLB 
> > > invalidates entirely if the mprotect relaxes the protections.
> 
> We have such code already in mm/mprotect.c, introduced in:
> 
>   10c1045f28e8 mm: numa: avoid unnecessary TLB flushes when setting NUMA hinting entries
> 
> which does:
> 
>                                 /* Avoid TLB flush if possible */
>                                 if (pte_protnone(oldpte))
>                                         continue;
> 
> > > Because if you *used* to be read-only, and then mprotect() 
> > > something so that it is read-write, there really is no need to 
> > > send a TLB invalidate, at least on x86. You can just change the 
> > > page tables, and *if* any entries are stale in the TLB they'll 
> > > take a microfault on access and then just reload the TLB.
> > > 
> > > So mprotect() to a more permissive mode is not necessarily 
> > > serializing.
> > 
> > The idea here is to always mprotect() to a more restrictive mode, 
> > which should trigger the TLB shootdown.
> 
> So what happens if a CPU comes around that integrates TLB shootdown 
> management into its cache coherency protocol? In such a case IPI 
> traffic can be skipped: the memory bus messages take care of TLB 
> flushes in most cases.
> 
> It's a natural optimization IMHO, because TLB flushes are conceptually 
> pretty close to the synchronization mechanisms inherent in data cache 
> coherency protocols:
> 
> This could be implemented for example by a CPU that knows about ptes 
> and handles their modification differently: when a pte is modified it 
> will broadcast a MESI invalidation message not just for the cacheline 
> belonging to the pte's physical address, but also an 'invalidate TLB' 
> MESI message for the pte value's page.
> 
> The TLB shootdown would either be guaranteed within the MESI 
> transaction, or there would be a deterministic timing 
> guarantee, or some explicit synchronization mechanism (new 
> instruction) to make sure the remote TLB(s) got shot down.
> 
> Every form of this would be way faster than sending interrupts. New 
> OSs could support this by the hardware telling them in which cases the 
> TLBs are 'auto-flushed', while old OSs would still be compatible by 
> sending (now pointless) TLB shootdown IPIs.
> 
> So it's a relatively straightforward hardware optimization IMHO: 
> assuming TLB flushes are considered important enough to complicate the 
> cacheline state machine (which I think they currently aren't).
> 
> So in this case there's no interrupt and no other interruption of the 
> remote CPU's flow of execution in any fashion that could advance the 
> RCU state machine.
> 
> What do you think?

I agree -- there really have been systems able to flush remote TLBs
without interrupting the remote CPU.
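
For concreteness, the mprotect() trick under discussion amounts to
roughly the sketch below.  This is only an illustration under stated
assumptions: the names fake_membarrier_init() and fake_membarrier()
are hypothetical (they are not liburcu or kernel APIs), and the whole
thing assumes that tightening a dirtied page from read-write to
read-only makes the kernel interrupt every CPU running a thread of
this process.  As noted above, nothing guarantees that side effect.

```c
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS on glibc */
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Hypothetical helper state, for illustration only. */
static void *barrier_page;
static size_t barrier_page_len;

/* Map one private anonymous page to serve as the mprotect() target. */
int fake_membarrier_init(void)
{
	long sz;
	void *p;

	assert(barrier_page == NULL);	/* initialize only once */
	sz = sysconf(_SC_PAGESIZE);
	if (sz <= 0)
		return -1;
	p = mmap(NULL, (size_t)sz, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;
	barrier_page = p;
	barrier_page_len = (size_t)sz;
	return 0;
}

/*
 * Attempt a process-wide barrier as a side effect of mprotect().
 * Returns 0 if the system calls succeeded, -1 otherwise.  Success
 * says nothing about whether remote CPUs were actually interrupted.
 */
int fake_membarrier(void)
{
	if (barrier_page == NULL)
		return -1;
	/* Relax first, then dirty the page so a writable pte exists. */
	if (mprotect(barrier_page, barrier_page_len,
		     PROT_READ | PROT_WRITE))
		return -1;
	*(volatile char *)barrier_page = 1;
	/* Tighten read-write -> read-only: the hoped-for shootdown. */
	if (mprotect(barrier_page, barrier_page_len, PROT_READ))
		return -1;
	return 0;
}
```

On hardware that flushes remote TLBs without interrupts, the final
mprotect() would still return zero, but no remote CPU would have
executed anything on our behalf, so the caller could not tell that
the barrier never happened.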

So, given the fact that the userspace RCU library does now see
some real-world use, is it now time for Mathieu to resubmit his
sys_membarrier() patch?

							Thanx, Paul
