From: Alex Bennée
In-reply-to: <5751DF2D.5040709@gmail.com>
Date: Mon, 06 Jun 2016 17:19:13 +0100
Message-ID: <874m96i932.fsf@linaro.org>
Subject: Re: [Qemu-devel] [RFC v2 PATCH 01/13] Introduce TCGOpcode for memory barrier
To: Sergey Fedorov
Cc: Pranith Kumar, Richard Henderson, "open list:All patches CC here"

Sergey Fedorov writes:

> On 03/06/16 21:30, Pranith Kumar wrote:
>> On Thu, Jun 2, 2016 at 9:08 PM, Richard Henderson wrote:
>>> On 06/02/2016 02:37 PM, Sergey Fedorov wrote:
>>>>
>>>> It would give us three TCG operations for each memory operation instead
>>>> of one. But then we might like to combine these barrier operations back
>>>> with memory operations in each backend. If we propagate memory ordering
>>>> semantics up to the backend, it can decide itself what instructions are
>>>> best to generate.
>>>
>>> A strongly ordered target would generally only set BEFORE bits or AFTER
>>> bits, but not both (and I suggest we canonicalize on AFTER for all such
>>> targets). Thus a strongly ordered target would produce only 2 opcodes per
>>> memory op.
>>>
>>> I supplied both to make it easier to handle a weakly ordered target with
>>> acquire/release bits.
>>>
>>> I would *not* combine the barrier operations back with memory operations in
>>> the backend. Only armv8 and ia64 can do that, and given the optimization
>>> level at which we generate code, I doubt it would really make much
>>> difference above separate barriers.
>>>
>> On armv8, using load_acquire/store_release instructions makes a
>> significant difference in performance when compared to plain
>> dmb+memory instruction sequence. So I would really like to keep the
>> option of generating acq/rel instructions (by combining barrier and
>> memory or some other way) open.
>
> I'm not so sure about acq/rel flags. Is there any architecture which has
> explicit acq/rel barriers? I suppose acq/rel memory access instructions
> are always load-link and store-conditional and thus rely on exclusive
> memory monitor to support that "conditional" behaviour.

Nope, you can have acq/rel memory operations without exclusive
operations (see ARMv8 ldar and stlr). The exclusive operations also come
in ordered (ldaxr, stlxr) and non-ordered (ldxr, stxr) variants.

> To emulate this
> behaviour we need something more special like "Slow-path for atomic
> instruction translation" [1].
>
> [1] http://thread.gmane.org/gmane.comp.emulators.qemu/407419
>
> Kind regards,
> Sergey

--
Alex Bennée
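
To illustrate the ldar/stlr point, here is a minimal C11 sketch. It is not taken from the patch series, and the names payload, ready, publish and try_consume are made up for this example. A release store and an acquire load need no exclusive monitor; AArch64 compilers typically lower them to stlr and ldar respectively, though the exact instructions depend on the compiler and target options.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical message-passing pair, for illustration only. */
static int payload;        /* plain data, published through the flag */
static atomic_bool ready;  /* flag that carries the ordering */

/* Producer: the release store keeps the payload write ordered before
 * the flag write. No exclusive (load-link/store-conditional) access
 * is involved. */
static void publish(int value)
{
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Consumer: the acquire load keeps the flag read ordered before the
 * payload read. Returns false if nothing has been published yet. */
static bool try_consume(int *out)
{
    assert(out != NULL);
    if (!atomic_load_explicit(&ready, memory_order_acquire)) {
        return false;
    }
    *out = payload;
    return true;
}
```

With a separate-barrier scheme the same pair would instead be a plain store preceded by a dmb and a plain load followed by a dmb, which is the sequence Pranith compares against above.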