From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jamie Lokier <jamie@shareable.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
benh <benh@kernel.crashing.org>, davem <davem@davemloft.net>,
"H. Peter Anvin" <hpa@zytor.com>,
Linux-Arch <linux-arch@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>, dhowells <dhowells@redhat.com>
Subject: Re: on memory barriers and cachelines
Date: Fri, 10 Feb 2012 08:32:52 -0800
Message-ID: <20120210163252.GG2458@linux.vnet.ibm.com>
In-Reply-To: <20120210025129.GA9135@jl-vm1.vm.bytemark.co.uk>
On Fri, Feb 10, 2012 at 02:51:29AM +0000, Jamie Lokier wrote:
> Paul E. McKenney wrote:
> > On Wed, Feb 01, 2012 at 10:33:58AM +0100, Peter Zijlstra wrote:
> > > Hi all,
> > >
> > > So I was talking to Paul yesterday and he mentioned how the SRCU sync
> > > primitive has to use extra synchronize_sched() calls in order to avoid
> > > smp_rmb() calls in the srcu_read_{un,}lock() calls.
> > >
> > > Now memory barriers are usually explained as observable order between
> > > two (or more) unrelated variables, as Documentation/memory-barriers.txt
> > > does in great detail.
> > >
> > > What I couldn't find in there though, is what happens when both
> > > variables are on the same cacheline. The "The effects of the CPU cache"
> > > and "Cache coherency" sections are closest but leave me wanting on this
> > > point.
> > >
> > > Can we get some implicit behaviour from being on the same cacheline? Or
> > > can this memory access queue still totally wreck the game?
> >
> > I don't know of any guarantees in this area, but am checking with
> > hardware architects for a couple of architectures.
>
> On a related note:
>
> - What's to stop the compiler optimising away a data dependency,
> converting it to a speculative control dependency? Here's a
> contrived example:
>
> ORIGINAL:
>
> int func(int *p)
> {
> 	int index = p[0], first = p[1];
> 	read_barrier_depends(); /* do..while(0) on most archs */
> 	return max(first, p[index]);
> }
>
> OPTIMISED:
>
> int func(int *p)
> {
> 	int index = p[0], val = p[1];
> 	if (index != 1)
> 		val = max(val, p[index]);
> 	return val;
> }
>
> A quick search of the GCC manual for "speculation" and
> "speculative" comes up with quite a few hits. I've no idea if
> they are relevant.
Well, that would be one reason why I did all that work to get
memory_order_consume into C++11. ;-)
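To illustrate what consume buys you, here is a minimal sketch using
C11's <stdatomic.h> spelling of the same primitive (the pointer gp and
struct foo below are made up for the example):

	#include <stdatomic.h>

	struct foo { int a; };

	_Atomic(struct foo *) gp;	/* pointer published by some updater */

	int reader(void)
	{
		/* The consume load carries a dependency into the dereference
		 * below, so compiler and hardware must preserve that order. */
		struct foo *q = atomic_load_explicit(&gp, memory_order_consume);

		return q ? q->a : -1;
	}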
More seriously, you can defeat some of the speculative optimizations
by using ACCESS_ONCE():
	int index = ACCESS_ONCE(p[0]), first = ACCESS_ONCE(p[1]);
This forces volatile accesses, which should make the compiler at least
a bit more reluctant to apply speculative optimizations. And in the
kernel, rcu_dereference_index_check() packages up the ACCESS_ONCE()
and the smp_read_barrier_depends() for you.
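As a rough, untested sketch of what I mean, applied to the contrived
func() above (kernel-context code; the header list here is approximate,
and max() is the usual kernel helper from linux/kernel.h):

	#include <linux/kernel.h>	/* max() */
	#include <linux/compiler.h>	/* ACCESS_ONCE() */

	int func(int *p)
	{
		/* Volatile accesses make the compiler less eager to speculate. */
		int index = ACCESS_ONCE(p[0]);
		int first = ACCESS_ONCE(p[1]);

		/* Order the dependent load against the load of index
		 * (a no-op except on Alpha). */
		smp_read_barrier_depends();

		return max(first, p[index]);
	}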
> - If I understood correctly, IA64 has explicit special registers to
> assist data-memory speculation by the compiler. These would be
> the ALAT registers. I don't know if they are used in a way that
> affects RCU, but they do appear in the GCC machine description,
> and in the manual some kinds of "data speculative scheduling" are
> enabled by default. But read_barrier_depends() is a do {} while
> on IA64.
As I understand it, the ALAT registers do respect dependency ordering.
But you would need to talk to an IA64 hardware architect and an IA64
compiler expert to get the whole story.
> - The GCC manual mentions data speculation in conjunction with
> Blackfin as well. I have no idea if it's relevant, but Blackfin
> does at least define read_barrier_depends() in an interesting way,
> sometimes.
Are there SMP Blackfin systems now? There were none last I checked,
and these issues matter only on SMP.
> - I read that ARM can do speculative memory loads these days. It
> complicates DMA. But are they implemented by speculative
> preloading into the cache, or by speculatively executing load
> instructions whose results are predicated on a control path
> taken? If the latter, is an empty read_barrier_depends() still
> ok on ARM?
But ARM does guarantee dependency ordering, so whatever it does to
speculate, it must validate -- the results must be as if the hardware
had done no speculation.
Thanx, Paul