From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
To: Jamie Lokier <jamie@shareable.org>
Cc: Peter Zijlstra <peterz@infradead.org>,
Linus Torvalds <torvalds@linux-foundation.org>,
benh <benh@kernel.crashing.org>, davem <davem@davemloft.net>,
"H. Peter Anvin" <hpa@zytor.com>,
Linux-Arch <linux-arch@vger.kernel.org>,
Ingo Molnar <mingo@elte.hu>, dhowells <dhowells@redhat.com>
Subject: Re: on memory barriers and cachelines
Date: Fri, 10 Feb 2012 08:32:52 -0800
Message-ID: <20120210163252.GG2458@linux.vnet.ibm.com>
In-Reply-To: <20120210025129.GA9135@jl-vm1.vm.bytemark.co.uk>
On Fri, Feb 10, 2012 at 02:51:29AM +0000, Jamie Lokier wrote:
> Paul E. McKenney wrote:
> > On Wed, Feb 01, 2012 at 10:33:58AM +0100, Peter Zijlstra wrote:
> > > Hi all,
> > >
> > > So I was talking to Paul yesterday and he mentioned how the SRCU sync
> > > primitive has to use extra synchronize_sched() calls in order to avoid
> > > smp_rmb() calls in the srcu_read_{un,}lock() calls.
> > >
> > > Now memory barriers are usually explained as observable order between
> > > two (or more) unrelated variables, as Documentation/memory-barriers.txt
> > > does in great detail.
> > >
> > > What I couldn't find in there though, is what happens when both
> > > variables are on the same cacheline. The "The effects of the CPU cache"
> > > and "Cache coherency" sections are closest but leave me wanting on this
> > > point.
> > >
> > > Can we get some implicit behaviour from being on the same cacheline? Or
> > > can this memory access queue still totally wreck the game?
> >
> > I don't know of any guarantees in this area, but am checking with
> > hardware architects for a couple of architectures.
>
> On a related note:
>
> - What's to stop the compiler optimising away a data dependency,
> converting it to a speculative control dependency? Here's a
> contrived example:
>
> ORIGINAL:
>
> int func(int *p)
> {
> 	int index = p[0], first = p[1];
> 	read_barrier_depends(); /* do..while(0) on most archs */
> 	return max(first, p[index]);
> }
>
> OPTIMISED:
>
> int func(int *p)
> {
> 	int index = p[0], val = p[1];
> 	if (index != 1)
> 		val = max(val, p[index]);
> 	return val;
> }
>
> A quick search of the GCC manual for "speculation" and
> "speculative" comes up with quite a few hits. I've no idea if
> they are relevant.
Well, that would be one reason why I did all that work to get
memory_order_consume into C++11. ;-)
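To illustrate what consume buys you, here is a minimal sketch using
C11's <stdatomic.h> spelling of the same primitive (the pointer gp and
struct foo below are made up for the example):

	#include <stdatomic.h>

	struct foo { int a; };

	_Atomic(struct foo *) gp;	/* pointer published by some updater */

	int reader(void)
	{
		/* The consume load carries a dependency into the dereference
		 * below, so compiler and hardware must preserve that order. */
		struct foo *q = atomic_load_explicit(&gp, memory_order_consume);

		return q ? q->a : -1;
	}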
More seriously, you can defeat some of the speculative optimizations
by using ACCESS_ONCE():
	int index = ACCESS_ONCE(p[0]), first = ACCESS_ONCE(p[1]);
This forces volatile accesses, which should make the compiler at least
a bit more reluctant to apply speculative optimizations. And in the
kernel, rcu_dereference_index_check() packages up the ACCESS_ONCE()
and the smp_read_barrier_depends() for you.
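As a rough, untested sketch of what I mean, applied to the contrived
func() above (kernel-context code; the header list here is approximate,
and max() is the usual kernel helper from linux/kernel.h):

	#include <linux/kernel.h>	/* max() */
	#include <linux/compiler.h>	/* ACCESS_ONCE() */

	int func(int *p)
	{
		/* Volatile accesses make the compiler less eager to speculate. */
		int index = ACCESS_ONCE(p[0]);
		int first = ACCESS_ONCE(p[1]);

		/* Order the dependent load against the load of index
		 * (a no-op except on Alpha). */
		smp_read_barrier_depends();

		return max(first, p[index]);
	}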
> - If I understood correctly, IA64 has explicit special registers to
> assist data-memory speculation by the compiler. These would be
> the ALAT registers. I don't know if they are used in a way that
> affects RCU, but they do appear in the GCC machine description,
> and in the manual some kinds of "data speculative scheduling" are
> enabled by default. But read_barrier_depends() is a do {} while
> on IA64.
As I understand it, the ALAT registers do respect dependency ordering.
But you would need to talk to an IA64 hardware architect and an IA64
compiler expert to get the whole story.
> - The GCC manual mentions data speculation in conjunction with
> Blackfin as well. I have no idea if it's relevant, but Blackfin
> does at least define read_barrier_depends() in an interesting way,
> sometimes.
Are there SMP Blackfin systems now? There were none last I checked,
and these issues matter only on SMP.
> - I read that ARM can do speculative memory loads these days. It
> complicates DMA. But are they implemented by speculative
> preloading into the cache, or by speculatively executing load
> instructions whose results are predicated on a control path
> taken? If the latter, is an empty read_barrier_depends() still
> ok on ARM?
But ARM does guarantee dependency ordering, so whatever it does to
speculate, it must validate -- the results must be as if the hardware
had done no speculation.
Thanx, Paul