From mboxrd@z Thu Jan  1 00:00:00 1970
Sender: Paolo Bonzini <paolo.bonzini@gmail.com>
Message-ID: <51C2B50D.90807@redhat.com>
Date: Thu, 20 Jun 2013 09:53:49 +0200
From: Paolo Bonzini <pbonzini@redhat.com>
MIME-Version: 1.0
References: <1371381681-14252-1-git-send-email-pingfanl@linux.vnet.ibm.com>
	<1371381681-14252-2-git-send-email-pingfanl@linux.vnet.ibm.com>
	<51BF5C0F.6020209@twiddle.net> <51C05F88.2090308@redhat.com>
	<20130618145033.GN5146@linux.vnet.ibm.com>
	<51C085EF.1040303@redhat.com>
	<1371573518.16968.23603.camel@triegel.csb>
	<51C17A5D.909@redhat.com>
	<1371647713.16968.25060.camel@triegel.csb>
	<51C1CAE3.6050908@redhat.com>
	<1371673503.16968.25960.camel@triegel.csb>
In-Reply-To: <1371673503.16968.25960.camel@triegel.csb>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
Subject: Re: [Qemu-devel] Java volatile vs. C11 seq_cst (was Re: [PATCH v2
 1/2] add a header file for atomic operations)
To: Torvald Riegel <triegel@redhat.com>
Cc: Andrew Haley <aph@redhat.com>, qemu-devel@nongnu.org, Liu Ping Fan <qemulist@gmail.com>, Anthony Liguori <anthony@codemonkey.ws>, paulmck@linux.vnet.ibm.com, Richard Henderson <rth@twiddle.net>

On 19/06/2013 22:25, Torvald Riegel wrote:
> On Wed, 2013-06-19 at 17:14 +0200, Paolo Bonzini wrote:
>> (1) I don't care about relaxed RMW ops (loads/stores occur in hot paths,
>> but RMW shouldn't be that bad.  I don't care if reference counting is a
>> little slower than it could be, for example);
> 
> I doubt relaxed RMW ops are sufficient even for reference counting.

They are enough on the increment side, or so says boost...

http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html#boost_atomic.usage_examples.example_reference_counters
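
For concreteness, here is a rough C11 sketch of that pattern: a relaxed
increment, a release decrement, and an acquire fence before the object
is torn down.  The struct and function names are made up for
illustration (they are not QEMU's or Boost's), so take it as a sketch
under those assumptions rather than a reference implementation.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object; names are illustrative only. */
struct obj {
    atomic_int refcount;
    int payload;
};

static void obj_init(struct obj *o, int payload)
{
    atomic_init(&o->refcount, 1);   /* the creator holds the first reference */
    o->payload = payload;
}

/* The caller already holds a reference, so the increment needs no ordering. */
static void obj_ref(struct obj *o)
{
    atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
}

/* Returns true if the caller dropped the last reference and may free. */
static bool obj_unref(struct obj *o)
{
    /* Release: our earlier accesses to *o are ordered before the decrement. */
    int old = atomic_fetch_sub_explicit(&o->refcount, 1, memory_order_release);
    assert(old > 0);
    if (old == 1) {
        /* Acquire: see every other thread's accesses before tearing down. */
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}
```

The decrement side is where the ordering matters, which is presumably
what Torvald's doubt is about.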

>>    [An aside: Java guarantees that volatile stores are not reordered
>>    with volatile loads.  This is not guaranteed by just using release
>>    stores and acquire loads, and is why IIUC acq_rel < Java < seq_cst].
>
> Or maybe Java volatile is acq for loads and seq_cst for stores...

Perhaps (but I'm not 100% sure).

>> As long as you only have a producer and a consumer, C11 is fine, because
>> all you need is load-acquire/store-release.  In fact, if it weren't for
>> the experience factor, C11 is easier than manually placing acquire and
>> release barriers.  But as soon as two or more threads are reading _and_
>> writing the shared memory, it gets complicated and I want to provide
>> something simple that people can use.  This is the reason for (2) above.
> 
> I can't quite follow you here.  There is a total order for all
> modifications to a single variable, and if you use acq/rel combined with
> loads and stores on this variable, then you basically can make use of
> the total order.  (All loads that read-from a certain store get a
> synchronized-with (and thus happens-before edge) with the store, and the
> stores are in a total order.)  This is independent of the number of
> readers and writers.  The difference starts once you want to sync with
> more than one variable, and need to establish an order between those
> accesses.

You're right, of course.  More specifically, it gets complicated when a
single thread stores to some variables while loading from others.
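
For reference, the easy single-producer/single-consumer case would look
roughly like this in C11.  This is only a sketch; the variable and
function names are invented for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: one plain payload guarded by one atomic flag. */
static int data;
static atomic_bool ready;

/* Producer: write the payload, then publish it with a store-release. */
static void publish(int value)
{
    data = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: a load-acquire that reads 'true' synchronizes with the
 * store-release above, so the plain read of 'data' is race-free. */
static bool try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = data;
    return true;
}
```

Here each thread only stores to, or only loads from, the shared
variables, which is why acquire/release alone is enough.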

>> There will still be a few cases that need to be optimized, and here is
>> where the difficult requirements come in:
>>
>> (R1) the primitives *should* not be alien to people who know Linux.
>>
>> (R2) those optimizations *must* be easy to do and review; at least as
>> easy as these things go.
>>
>> The two are obviously related.  Ease of review is why it is important to
>> make things familiar to people who know Linux.
>>
>> In C11, relaxing SC loads and stores is complicated, and more
>> specifically hard to explain!
> 
> I can't see why that would be harder than reasoning about equally weaker
> Java semantics.  But you obviously know your community, and I don't :)

Because Java semantics are "almost" SC, and as Paul mentioned the
difference doesn't matter in practice (IRIW/RWC is where it matters, WRC
works even on Power; see
http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc051.html#toc5, row
WRC+lwsyncs).  It hasn't ever mattered for Linux, at least.

>> By contrast, Java volatile semantics are easily converted to a sequence
>> of relaxed loads, relaxed stores, and acq/rel/sc fences.
> 
> The same holds for C11/C++11.  If you look at either the standard or the
> Batty model, you'll see that for every pair like store(rel)--load(acq),
> there is also store(rel)--fence(acq)+load(relaxed),
> store(relaxed)+fence(rel)--fence(acq)+load(relaxed), etc. defined,
> giving the same semantics.  Likewise for SC.

Do you have a pointer to that?  It would help.
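
If I read the description right, the last variant Torvald lists,
store(relaxed)+fence(rel)--fence(acq)+load(relaxed), would be spelled
something like this in C11.  The names are made up for the example, and
this is my understanding of the fence rules rather than a quote from the
standard.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state for the fence-based variant. */
static int msg;
static atomic_bool msg_ready;

/* Writer: release fence followed by a relaxed store of the flag. */
static void publish_fenced(int value)
{
    msg = value;
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&msg_ready, true, memory_order_relaxed);
}

/* Reader: relaxed load of the flag followed by an acquire fence.
 * If the load saw 'true', the two fences synchronize and 'msg' is visible. */
static bool try_consume_fenced(int *out)
{
    if (!atomic_load_explicit(&msg_ready, memory_order_relaxed))
        return false;
    atomic_thread_fence(memory_order_acquire);
    *out = msg;
    return true;
}
```

This is the shape that maps most directly onto explicit barriers placed
around plain accesses.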

> You can also build Dekker with SC stores and acq loads, if I'm not
> mistaken.  Typically one would probably use SC fences and relaxed
> stores/loads.

Yes.
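
A stripped-down sketch of the fence-based flavor, for two threads
numbered 0 and 1.  This is only the entry handshake (the store-buffering
core of Dekker), without the turn variable that the full algorithm uses
to guarantee progress, and the names are invented for the example.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical per-thread "I want in" flags. */
static atomic_int wants[2];

/* Try to enter the critical section; 'me' is 0 or 1.
 * The SC fence between the store and the load forbids both threads
 * from reading 0, so at most one of two racing callers gets true. */
static bool try_enter(int me)
{
    atomic_store_explicit(&wants[me], 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    if (atomic_load_explicit(&wants[1 - me], memory_order_relaxed) == 0) {
        /* Pairs with the release store in leave() of the previous owner. */
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    /* Back off; a real Dekker would consult a turn variable here. */
    atomic_store_explicit(&wants[me], 0, memory_order_relaxed);
    return false;
}

/* Leave the critical section, publishing whatever was written inside it. */
static void leave(int me)
{
    atomic_store_explicit(&wants[me], 0, memory_order_release);
}
```

With only acquire/release on the flags, both threads could read 0 and
enter together; the store-then-load ordering is exactly what needs SC.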

>>> I guess so.  But you also have to consider the legacy that you create.
>>> I do think the C11/C++11 model will be used widely, and more and more
>>> people will get used to it.
>>
>> I don't think many people will learn how to use the various non-seqcst
>> modes...  At least so far I punted. :)
> 
> But you already use similarly weaker orderings that the other
> abstractions provide (e.g., Java), so you're half-way there :)

True.  On the other hand you can treat Java like "kinda SC but don't
worry, you won't see the difference".  It is both worrisome and appealing...

Paolo