From: Sergey Fedorov
To: Pranith Kumar, Peter Maydell
Cc: Paolo Bonzini, "open list:All patches CC here", Alex Bennée,
    Eduardo Habkost, Richard Henderson
Subject: Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
Date: Wed, 22 Jun 2016 14:18:52 +0300
Message-ID: <576A741C.6070205@gmail.com>
References: <20160618040343.19517-1-bobby.prani@gmail.com>
    <20160618040343.19517-15-bobby.prani@gmail.com>
    <06f042ba-91b2-d751-e53e-6e5ca56081d3@redhat.com>

On 21/06/16 21:03, Pranith Kumar wrote:
> On Tue, Jun 21, 2016 at 1:54 PM, Peter Maydell wrote:
>> On 21 June 2016 at 18:28, Pranith Kumar wrote:
>>> Reg. the second point, I did consider this situation of running x86 on
>>> ARM where such barriers are necessary for correctness. But, I am
>>> really apprehensive of the cost it will impose. I am not sure if there
>>> are any alternative solutions to avoid generating barriers for each
>>> memory operation, but it would be great if we could reduce them.
>> I vaguely recall an idea that you could avoid needing
>> explicit barriers by turning all the guest load/stores into
>> host load-acquire/store-release, but I have no idea whether
>> that's (a) actually true (b) any better than piles of
>> explicit barriers.
> Yes, this is true for ARMv8(not sure about ia64). The
> load-acquire/store-release operations are sequentially consistent to
> each other. But this does not work for ARMv7 and as you said... I
> think the cost here too is really prohibitive.

As I understand it, there is no requirement for sequential consistency
even on systems with a fairly strong memory model such as x86. Because
of the store queue, earlier regular stores are allowed to complete after
the regular loads that follow them. If I understand correctly, this
relaxation breaks sequential consistency, since it allows a CPU to see
its own stores in a different order with respect to other CPUs' stores.

However, such a model matches acquire/release semantics perfectly, even
as defined by the Itanium memory model. Let's break it down:

 (1) a load-acquire must not be reordered with any subsequent loads and
     stores;
 (2) a store-release must not be reordered with any preceding loads and
     stores;
 (3) thus, if all loads are load-acquires and all stores are
     store-releases, the only possible reordering is a store-release
     reordered after the subsequent load-acquire.

Considering this, I think that strongly-ordered memory model semantics
(such as in the x86 memory model) can be translated directly into
relaxed acquire/release memory model semantics (as in the Itanium memory
model, or the slightly stronger ARMv8 one). And I believe this can
perform better than inserting separate memory barriers on those
architectures which provide acquire/release semantics, since it is more
relaxed and permits certain hardware optimizations, such as letting a
store complete after a later load.

Kind regards,
Sergey
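
P.S. Rules (1) to (3) above can be checked mechanically. What follows is
only a minimal sketch of that enumeration, not QEMU or TCG code: the
names LOAD_ACQUIRE, STORE_RELEASE, may_reorder and allowed_reorderings
are made up for illustration, and the model encodes nothing beyond rules
(1) and (2) as stated above, so it ignores any stronger guarantees a
particular host may give.

```python
from itertools import product

# Hypothetical labels for this sketch; not QEMU or TCG identifiers.
LOAD_ACQUIRE = "load-acquire"
STORE_RELEASE = "store-release"


def may_reorder(earlier, later):
    """Return True if `later` may be reordered before `earlier`.

    `earlier` precedes `later` in program order.
    """
    # Rule (1): a load-acquire is not reordered with any subsequent access.
    if earlier == LOAD_ACQUIRE:
        return False
    # Rule (2): a store-release is not reordered with any preceding access.
    if later == STORE_RELEASE:
        return False
    return True


def allowed_reorderings():
    """List every program-order pair (earlier, later) that may be reordered."""
    ops = (LOAD_ACQUIRE, STORE_RELEASE)
    return [(a, b) for a, b in product(ops, repeat=2) if may_reorder(a, b)]


if __name__ == "__main__":
    for earlier, later in allowed_reorderings():
        print(f"{earlier} may be reordered after the following {later}")
```

Under these assumptions the only pair that survives is a store-release
followed by a load-acquire, which is the same store-then-load relaxation
described above for x86.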