From mboxrd@z Thu Jan  1 00:00:00 1970
From: "Paul E. McKenney" <paulmck@linux.vnet.ibm.com>
Subject: Re: [RFC][PATCH 0/5] arch: atomic rework
Date: Thu, 6 Feb 2014 14:11:18 -0800
Message-ID: <20140206221117.GJ4250@linux.vnet.ibm.com>
References: <20140206134825.305510953@infradead.org>
 <21984.1391711149@warthog.procyon.org.uk>
 <52F3DA85.1060209@arm.com>
 <20140206185910.GE27276@mudshark.cambridge.arm.com>
 <20140206192743.GH4250@linux.vnet.ibm.com>
 <1391721423.23421.3898.camel@triegel.csb>
Reply-To: paulmck@linux.vnet.ibm.com
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: QUOTED-PRINTABLE
Return-path: <linux-arch-owner@vger.kernel.org>
Received: from e34.co.us.ibm.com ([32.97.110.152]:36728 "EHLO
	e34.co.us.ibm.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1752497AbaBFWLW (ORCPT
	<rfc822;linux-arch@vger.kernel.org>); Thu, 6 Feb 2014 17:11:22 -0500
Received: from /spool/local
	by e34.co.us.ibm.com with IBM ESMTP SMTP Gateway: Authorized Use Only! Violators will be prosecuted
	for <linux-arch@vger.kernel.org> from <paulmck@linux.vnet.ibm.com>;
	Thu, 6 Feb 2014 15:11:22 -0700
Content-Disposition: inline
In-Reply-To: <1391721423.23421.3898.camel@triegel.csb>
Sender: linux-arch-owner@vger.kernel.org
List-ID: <linux-arch.vger.kernel.org>
To: Torvald Riegel <triegel@redhat.com>
Cc: Will Deacon <will.deacon@arm.com>, Ramana Radhakrishnan <Ramana.Radhakrishnan@arm.com>, David Howells <dhowells@redhat.com>, Peter Zijlstra <peterz@infradead.org>, "linux-arch@vger.kernel.org" <linux-arch@vger.kernel.org>, "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>, "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>, "akpm@linux-foundation.org" <akpm@linux-foundation.org>, "mingo@kernel.org" <mingo@kernel.org>, "gcc@gcc.gnu.org" <gcc@gcc.gnu.org>

On Thu, Feb 06, 2014 at 10:17:03PM +0100, Torvald Riegel wrote:
> On Thu, 2014-02-06 at 11:27 -0800, Paul E. McKenney wrote:
> > On Thu, Feb 06, 2014 at 06:59:10PM +0000, Will Deacon wrote:
> > > On Thu, Feb 06, 2014 at 06:55:01PM +0000, Ramana Radhakrishnan wrote:
> > > > On 02/06/14 18:25, David Howells wrote:
> > > > >
> > > > > Is it worth considering a move towards using C11 atomics and barriers and
> > > > > compiler intrinsics inside the kernel?  The compiler _ought_ to be able to do
> > > > > these.
> > > >
> > > >
> > > > It sounds interesting to me, if we can make it work properly and
> > > > reliably. + gcc@gcc.gnu.org for others in the GCC community to chip in.
> > >
> > > Given my (albeit limited) experience playing with the C11 spec and GCC, I
> > > really think this is a bad idea for the kernel. It seems that nobody really
> > > agrees on exactly how the C11 atomics map to real architectural
> > > instructions on anything but the trivial architectures. For example, should
> > > the following code fire the assert?
> > >
> > >
> > > extern atomic<int> foo, bar, baz;
> > >
> > > void thread1(void)
> > > {
> > > 	foo.store(42, memory_order_relaxed);
> > > 	bar.fetch_add(1, memory_order_seq_cst);
> > > 	baz.store(42, memory_order_relaxed);
> > > }
> > >
> > > void thread2(void)
> > > {
> > > 	while (baz.load(memory_order_seq_cst) != 42) {
> > > 		/* do nothing */
> > > 	}
> > >
> > > 	assert(foo.load(memory_order_seq_cst) == 42);
> > > }
> > >
> > >
> > > To answer that question, you need to go and look at the definitions of
> > > synchronises-with, happens-before, dependency_ordered_before and a whole
> > > pile of vaguely written waffle to realise that you don't know. Certainly,
> > > the code that arm64 GCC currently spits out would allow the assertion to fire
> > > on some microarchitectures.
> >
> > Yep!  I believe that a memory_order_seq_cst fence in combination with the
> > fetch_add() would do the trick on many architectures, however.  All of
> > this is one reason that any C11 definitions need to be individually
> > overridable by individual architectures.
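
To make the fence suggestion concrete, here is a minimal sketch of Will's
example with a memory_order_seq_cst fence added after the fetch_add().  It
only illustrates the language-level rules as I read them, and says nothing
about what any given compiler emits.  The function names follow the quoted
example, except that thread2() returns the value it read so the check can
live in the caller.

```cpp
#include <atomic>
#include <cassert>

std::atomic<int> foo{0}, bar{0}, baz{0};

void thread1()
{
	foo.store(42, std::memory_order_relaxed);
	bar.fetch_add(1, std::memory_order_seq_cst);
	/*
	 * Added fence.  A seq_cst fence is also a release fence, and it is
	 * sequenced before the store to baz, so it synchronizes with any
	 * acquire (or stronger) load that reads that store.
	 */
	std::atomic_thread_fence(std::memory_order_seq_cst);
	baz.store(42, std::memory_order_relaxed);
}

int thread2()
{
	/* Wait until the store to baz becomes visible. */
	while (baz.load(std::memory_order_seq_cst) != 42) {
		/* do nothing */
	}

	/* The store to foo happens before this load, so it returns 42. */
	return foo.load(std::memory_order_seq_cst);
}
```

With the fence in place, the fence-to-atomic synchronization rule orders the
store to foo before thread2()'s load of foo, so the value returned should
always be 42 when the two functions run concurrently.  Whether each
architecture's generated code honors that is a separate question.
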
>
> "Overridable" in which sense?  Do you want to change the semantics on
> the language level in the sense of altering the memory model, or rather
> use a different implementation under the hood to, for example, fix
> deficiencies in the compilers?

We need the architecture maintainer to be able to select either an
assembly-language implementation or a C11-atomics implementation for any
given Linux-kernel operation.  For example, a given architecture might
be able to use fetch_add(1, memory_order_relaxed) for atomic_inc() but
assembly for atomic_add_return().  This is because atomic_inc() is not
required to have any particular ordering properties, while as discussed
previously, atomic_add_return() requires tighter ordering than the C11
standard provides.
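
A rough sketch of what such a per-operation selection might look like
follows.  The switch ARCH_HAS_ASM_ATOMIC_ADD_RETURN is a made-up name for
illustration, not an existing kernel symbol.  The generic fallback for
atomic_add_return() brackets a relaxed fetch_add() with seq_cst fences
purely as an approximation; an architecture for which that is too weak
would define the switch and supply assembly instead.

```cpp
#include <atomic>
#include <cassert>

/*
 * Hypothetical switch: an architecture would define this when it wants
 * its own assembly-language atomic_add_return().
 */
/* #define ARCH_HAS_ASM_ATOMIC_ADD_RETURN */

struct atomic_t {
	std::atomic<int> counter{0};
};

/* atomic_inc() has no ordering requirements, so relaxed is sufficient. */
inline void atomic_inc(atomic_t *v)
{
	v->counter.fetch_add(1, std::memory_order_relaxed);
}

#ifdef ARCH_HAS_ASM_ATOMIC_ADD_RETURN
/* Declaration only; the definition would live in arch assembly. */
int atomic_add_return(int i, atomic_t *v);
#else
/*
 * Generic fallback: an approximation of fully ordered semantics using
 * fences on both sides of a relaxed read-modify-write.
 */
inline int atomic_add_return(int i, atomic_t *v)
{
	std::atomic_thread_fence(std::memory_order_seq_cst);
	int ret = v->counter.fetch_add(i, std::memory_order_relaxed) + i;
	std::atomic_thread_fence(std::memory_order_seq_cst);
	return ret;
}
#endif
```

The point of the split is that the decision is made per operation, so an
architecture can take the C11 version of atomic_inc() while still keeping
its assembly for atomic_add_return().
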

> > > There are also so many ways to blow your head off it's untrue. For example,
> > > cmpxchg takes a separate memory model parameter for failure and success, but
> > > then there are restrictions on the sets you can use for each. It's not hard
> > > to find well-known memory-ordering experts shouting "Just use
> > > memory_order_seq_cst for everything, it's too hard otherwise". Then there's
> > > the fun of load-consume vs load-acquire (arm64 GCC completely ignores consume
> > > atm and optimises all of the data dependencies away) as well as the definition
> > > of "data races", which seem to be used as an excuse to miscompile a program
> > > at the earliest opportunity.
> >
> > Trust me, rcu_dereference() is not going to be defined in terms of
> > memory_order_consume until the compilers implement it both correctly and
> > efficiently.  They are not there yet, and there is currently no shortage
> > of compiler writers who would prefer to ignore memory_order_consume.
>
> Do you have any input on
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=59448?  In particular, the
> language standard's definition of dependencies?

Let's see...  1.10p9 says that a dependency must be carried unless:

- B is an invocation of any specialization of std::kill_dependency (29.3), or
- A is the left operand of a built-in logical AND (&&, see 5.14) or logical OR
  (||, see 5.15) operator, or
- A is the left operand of a conditional (?:, see 5.16) operator, or
- A is the left operand of the built-in comma (,) operator (5.18);

So the use of "flag" before the "?" is ignored.  But the "flag - flag"
after the "?" will carry a dependency, so the code fragment in 59448
needs to do the ordering rather than just optimizing "flag - flag" out
of existence.  One way to do that on both ARM and Power is to actually
emit code for "flag - flag", but there are a number of other ways to
make that work.
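
For illustration, here is a small sketch of the kind of pattern in question.
It is my own reconstruction for discussion purposes, not the code from the
bug report, and the names data, flag, writer() and reader() are assumptions.

```cpp
#include <atomic>
#include <cassert>

int data[2] = {0, 0};
std::atomic<int> flag{0};

void writer()
{
	data[0] = 42;
	/* Publish data[0] by setting the flag with release semantics. */
	flag.store(1, std::memory_order_release);
}

int reader()
{
	int f;

	/* Spin until the writer's flag value is seen by a consume load. */
	while ((f = flag.load(std::memory_order_consume)) == 0) {
		/* do nothing */
	}

	/*
	 * The "f" before the "?" is the left operand of the conditional
	 * operator, so it carries no dependency into the result.  The
	 * "f - f" index is always zero, yet it still carries a dependency
	 * from the consume load, so the read of data[0] has to be ordered
	 * after that load.
	 */
	return f ? data[f - f] : 0;
}
```

A compiler that folds "f - f" to a constant zero has to preserve the
ordering some other way, for example by treating the consume load as an
acquire load.
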

BTW, there is some discussion on 1.10p9's handling of && and ||, and
that clause is likely to change.  And yes, I am behind on analyzing
usage in the Linux kernel to find out if Linux cares...

> > And rcu_dereference() will need per-arch overrides for some time during
> > any transition to memory_order_consume.
> >
> > > Trying to introduce system concepts (writes to devices, interrupts,
> > > non-coherent agents) into this mess is going to be an uphill battle IMHO. I'd
> > > just rather stick to the semantics we have and the asm volatile barriers.
> >
> > And barrier() isn't going to go away any time soon, either.  And
> > ACCESS_ONCE() needs to keep volatile semantics until there is some
> > memory_order_whatever that prevents loads and stores from being coalesced.
>
> I'd be happy to discuss something like this in ISO C++ SG1 (or has this
> been discussed in the past already?).  But it needs to have a paper I
> suppose.

The current position of the usual suspects other than me is that this
falls into the category of forward-progress guarantees, which are
considered (again, by the usual suspects other than me) to be out
of scope.
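
To show what "coalesced" means for ACCESS_ONCE(), here is a rough C++
sketch.  The helper access_once() is a hypothetical C++ rendering of the
idea (the kernel macro is C and uses typeof), shared_counter is a made-up
variable, and the barrier() shown relies on GCC/Clang inline-assembly
syntax.

```cpp
#include <cassert>

/*
 * Hypothetical C++ counterpart of ACCESS_ONCE(): view the object through
 * a volatile lvalue so that each use is one access the compiler must emit.
 */
template <typename T>
inline volatile T &access_once(T &x)
{
	return static_cast<volatile T &>(x);
}

/* Compiler-only barrier, GCC/Clang syntax; emits no instructions. */
inline void barrier()
{
	__asm__ __volatile__("" : : : "memory");
}

int shared_counter = 0;	/* made-up shared variable */

int sum_two_reads()
{
	/*
	 * With plain accesses the compiler could merge these into a single
	 * load.  Through access_once(), two loads must be performed.
	 */
	int a = access_once(shared_counter);
	int b = access_once(shared_counter);
	return a + b;
}

void store_twice()
{
	/* Both stores must be emitted; the first may not be dropped. */
	access_once(shared_counter) = 1;
	barrier();
	access_once(shared_counter) = 2;
}
```

Nothing in the current memory_order_* set expresses the "do not merge or
drop this access" property, which is why the volatile cast stays.
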

> Will you be in Issaquah for the C++ meeting next week?

Weather permitting, I will be there!

							Thanx, Paul