From: Paolo Bonzini
Date: Thu, 20 Jun 2013 09:53:49 +0200
Message-ID: <51C2B50D.90807@redhat.com>
References: <1371381681-14252-1-git-send-email-pingfanl@linux.vnet.ibm.com>
 <1371381681-14252-2-git-send-email-pingfanl@linux.vnet.ibm.com>
 <51BF5C0F.6020209@twiddle.net> <51C05F88.2090308@redhat.com>
 <20130618145033.GN5146@linux.vnet.ibm.com> <51C085EF.1040303@redhat.com>
 <1371573518.16968.23603.camel@triegel.csb> <51C17A5D.909@redhat.com>
 <1371647713.16968.25060.camel@triegel.csb> <51C1CAE3.6050908@redhat.com>
 <1371673503.16968.25960.camel@triegel.csb>
In-Reply-To: <1371673503.16968.25960.camel@triegel.csb>
Subject: Re: [Qemu-devel] Java volatile vs. C11 seq_cst (was Re: [PATCH v2 1/2] add a header file for atomic operations)
To: Torvald Riegel
Cc: Andrew Haley, qemu-devel@nongnu.org, Liu Ping Fan, Anthony Liguori, paulmck@linux.vnet.ibm.com, Richard Henderson

On 19/06/2013 22:25, Torvald Riegel wrote:
> On Wed, 2013-06-19 at 17:14 +0200, Paolo Bonzini wrote:
>> (1) I don't care about relaxed RMW ops (loads/stores occur in hot paths,
>> but RMW shouldn't be that bad. I don't care if reference counting is a
>> little slower than it could be, for example);
>
> I doubt relaxed RMW ops are sufficient even for reference counting.

They are enough on the increment side, or so says boost...

http://www.chaoticmind.net/~hcb/projects/boost.atomic/doc/atomic/usage_examples.html#boost_atomic.usage_examples.example_reference_counters

>> [An aside: Java guarantees that volatile stores are not reordered
>> with volatile loads. This is not guaranteed by just using release
>> stores and acquire loads, and is why IIUC acq_rel < Java < seq_cst].
>
> Or maybe Java volatile is acq for loads and seq_cst for stores...

Perhaps (but I'm not 100% sure).

>> As long as you only have a producer and a consumer, C11 is fine, because
>> all you need is load-acquire/store-release. In fact, if it weren't for
>> the experience factor, C11 is easier than manually placing acquire and
>> release barriers. But as soon as two or more threads are reading _and_
>> writing the shared memory, it gets complicated and I want to provide
>> something simple that people can use. This is the reason for (2) above.
>
> I can't quite follow you here. There is a total order for all
> modifications to a single variable, and if you use acq/rel combined with
> loads and stores on this variable, then you basically can make use of
> the total order.
> (All loads that read-from a certain store get a
> synchronized-with (and thus happens-before) edge with the store, and the
> stores are in a total order.) This is independent of the number of
> readers and writers. The difference starts once you want to sync with
> more than one variable, and need to establish an order between those
> accesses.

You're right of course. More specifically, the difference shows up when
there is a thread where some variables are stored while others are loaded.

>> There will still be a few cases that need to be optimized, and here is
>> where the difficult requirements come in:
>>
>> (R1) the primitives *should* not be alien to people who know Linux.
>>
>> (R2) those optimizations *must* be easy to do and review; at least as
>> easy as these things go.
>>
>> The two are obviously related. Ease of review is why it is important to
>> make things familiar to people who know Linux.
>>
>> In C11, relaxing SC loads and stores is complicated, and more
>> specifically hard to explain!
>
> I can't see why that would be harder than reasoning about equally weaker
> Java semantics. But you obviously know your community, and I don't :)

Because Java semantics are "almost" SC, and as Paul mentioned the
difference doesn't matter in practice (IRIW/RWC is where it matters, WRC
works even on Power; see
http://www.cl.cam.ac.uk/~pes20/ppc-supplemental/ppc051.html#toc5, row
WRC+lwsyncs). It hasn't ever mattered for Linux, at least.

>> By contrast, Java volatile semantics are easily converted to a sequence
>> of relaxed loads, relaxed stores, and acq/rel/sc fences.
>
> The same holds for C11/C++11. If you look at either the standard or the
> Batty model, you'll see that for every pair like store(rel)--load(acq),
> there is also store(rel)--fence(acq)+load(relaxed),
> store(relaxed)+fence(rel)--fence(acq)+load(relaxed), etc. defined,
> giving the same semantics. Likewise for SC.

Do you have a pointer to that? It would help.
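
For concreteness, here is a minimal sketch of the three pairings listed in
the quoted paragraph, written as a message-passing test with C11 atomics.
It assumes POSIX threads are available; the names mp_state,
run_message_passing and the MP_* variants are made up for illustration and
come from neither QEMU nor the proposed header. In each variant the
consumer is expected to observe the payload once it has seen the flag.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
enum mp_variant {
    MP_REL_ACQ,             /* store(rel) -- load(acq) */
    MP_REL_FENCE_ACQ,       /* store(rel) -- load(relaxed) + fence(acq) */
    MP_FENCE_REL_FENCE_ACQ  /* fence(rel) + store(relaxed) -- load(relaxed) + fence(acq) */
};

struct mp_state {
    enum mp_variant variant;
    int payload;        /* plain (non-atomic) data published via 'ready' */
    atomic_int ready;   /* flag the consumer spins on */
    int seen;           /* payload value observed by the consumer */
};

static void *mp_producer(void *arg)
{
    struct mp_state *s = arg;

    s->payload = 42;
    if (s->variant == MP_FENCE_REL_FENCE_ACQ) {
        /* Release fence before a relaxed store of the flag. */
        atomic_thread_fence(memory_order_release);
        atomic_store_explicit(&s->ready, 1, memory_order_relaxed);
    } else {
        atomic_store_explicit(&s->ready, 1, memory_order_release);
    }
    return NULL;
}

static void *mp_consumer(void *arg)
{
    struct mp_state *s = arg;

    if (s->variant == MP_REL_ACQ) {
        while (!atomic_load_explicit(&s->ready, memory_order_acquire))
            ;
    } else {
        /* Relaxed load of the flag, then an acquire fence. */
        while (!atomic_load_explicit(&s->ready, memory_order_relaxed))
            ;
        atomic_thread_fence(memory_order_acquire);
    }
    s->seen = s->payload;   /* ordered after the producer's write */
    assert(s->seen == 42);
    return NULL;
}

/* Runs one producer/consumer pair and returns what the consumer read. */
int run_message_passing(enum mp_variant variant)
{
    struct mp_state s = { .variant = variant, .payload = 0, .seen = -1 };
    pthread_t producer, consumer;

    atomic_init(&s.ready, 0);
    pthread_create(&consumer, NULL, mp_consumer, &s);
    pthread_create(&producer, NULL, mp_producer, &s);
    pthread_join(producer, NULL);
    pthread_join(consumer, NULL);
    return s.seen;
}
```

A passing run does not prove the orderings are sufficient, of course; it
only shows how each pairing is spelled. The fence-based forms are the ones
that map most directly onto explicit barriers placed by hand.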
> You can also build Dekker with SC stores and acq loads, if I'm not
> mistaken. Typically one would probably use SC fences and relaxed
> stores/loads.

Yes.

>>> I guess so. But you also have to consider the legacy that you create.
>>> I do think the C11/C++11 model will be used widely, and more and more
>>> people will get used to it.
>>
>> I don't think many people will learn how to use the various non-seqcst
>> modes... At least so far I punted. :)
>
> But you already use similarly weaker orderings that the other
> abstractions provide (e.g., Java), so you're half-way there :)

True. On the other hand you can treat Java like "kinda SC but don't
worry, you won't see the difference". It is both worrisome and appealing...

Paolo
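
As an illustration of the Dekker remark above, the core of the pattern
(each thread stores its own flag, then loads the other's) might be sketched
with SC fences and relaxed accesses as below. This is only the
store-buffering handshake, not a full mutual-exclusion algorithm; it
assumes POSIX threads, and sb_thread and count_dekker_violations are
hypothetical names chosen for the example. With a seq_cst fence between
the store and the load in both threads, the outcome where both threads
read 0 should not occur.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical names, for illustration only. */
static atomic_int flag[2];
static int saw_other[2];

static void *sb_thread(void *arg)
{
    int me = *(int *)arg;

    /* Announce intent with a relaxed store... */
    atomic_store_explicit(&flag[me], 1, memory_order_relaxed);
    /* ...keep the store from being reordered after the load below... */
    atomic_thread_fence(memory_order_seq_cst);
    /* ...then check the other thread's flag with a relaxed load. */
    saw_other[me] = atomic_load_explicit(&flag[1 - me], memory_order_relaxed);
    return NULL;
}

/*
 * Runs the handshake 'rounds' times and returns how many rounds ended
 * with both threads reading 0, which the SC fences are meant to forbid.
 */
int count_dekker_violations(int rounds)
{
    int ids[2] = { 0, 1 };
    int violations = 0;

    for (int i = 0; i < rounds; i++) {
        pthread_t t[2];

        atomic_store(&flag[0], 0);
        atomic_store(&flag[1], 0);
        pthread_create(&t[0], NULL, sb_thread, &ids[0]);
        pthread_create(&t[1], NULL, sb_thread, &ids[1]);
        pthread_join(t[0], NULL);
        pthread_join(t[1], NULL);
        if (saw_other[0] == 0 && saw_other[1] == 0)
            violations++;
    }
    return violations;
}
```

Replacing the seq_cst fence with release/acquire ordering alone would let
the store be reordered after the load, which is the store/load case the
thread is concerned with.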