Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier


From: Sergey Fedorov <serge.fdrv@gmail.com>
To: Pranith Kumar <bobby.prani@gmail.com>
Cc: "Richard Henderson" <rth@twiddle.net>,
	"open list:All patches CC here" <qemu-devel@nongnu.org>,
	"Alex Bennée" <alex.bennee@linaro.org>
Subject: Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
Date: Wed, 22 Jun 2016 18:50:05 +0300
Message-ID: <576AB3AD.7050302@gmail.com>
In-Reply-To: <CAJhHMCB-jxvJK4EZvgcR6XWPRSDVUu0SknDV9isFkrdCW_Ydyg@mail.gmail.com>

On 21/06/16 17:52, Pranith Kumar wrote:
> Hi Sergey,
>
> On Mon, Jun 20, 2016 at 5:21 PM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>> On 18/06/16 07:03, Pranith Kumar wrote:
>>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>>> index db6a062..36feca9 100644
>>> --- a/tcg/tcg.h
>>> +++ b/tcg/tcg.h
>>> @@ -408,6 +408,20 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
>>>  #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
>>>  #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
>>>
>>> +typedef enum {
>>> +    TCG_MO_LD_LD    = 1,
>>> +    TCG_MO_ST_LD    = 2,
>>> +    TCG_MO_LD_ST    = 4,
>>> +    TCG_MO_ST_ST    = 8,
>>> +    TCG_MO_ALL      = 0xF, // OR of all above
>> So TCG_MO_ALL specifies a so called "full" memory barrier?
> This enum just specifies what loads and stores need to be ordered.
>
> TCG_MO_ALL specifies that we need to order both previous loads and
> stores with later loads and stores. To get a full memory barrier you
> will need to pair it with BAR_SC:
>
> TCG_MO_ALL | TCG_BAR_SC

If we define the semantics for the flags above as they are defined for
the corresponding Sparc MEMBAR instruction mmask bits (which I think
really makes sense), then the combination of all of these flags makes a
full memory barrier which guarantees transitivity, i.e. sequential
consistency. Let me just quote the [Sparc v9 manual] on the MEMBAR
instruction mmask encoding:

    #StoreStore The effects of all stores appearing prior to the MEMBAR
    instruction must be visible *to all processors* before the effect of
    any stores following the MEMBAR.
    #LoadStore  All loads appearing prior to the MEMBAR instruction must
    have been performed before the effect of any stores following the
    MEMBAR is visible *to any other processor*.
    #StoreLoad  The effects of all stores appearing prior to the MEMBAR
    instruction must be visible *to all processors* before loads
    following the MEMBAR may be performed.
    #LoadLoad   All loads appearing prior to the MEMBAR instruction must
    have been performed before any loads following the MEMBAR may be
    performed.

I'm emphasising "to all processors" and "to any other processor" here
because these expressions suggest transitivity, if I understand it
correctly.

[Sparc v9 manual] http://sparc.org/wp-content/uploads/2014/01/SPARCV9.pdf.gz
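To make this concrete, here is a minimal sketch assuming the TCG_MO_*
values from the patch. The SPARC_MMASK_* constants are hypothetical
names for the SPARC V9 MEMBAR mmask bits (#LoadLoad = 0x01, #StoreLoad =
0x02, #LoadStore = 0x04, #StoreStore = 0x08), and the two helpers are
illustrative only. With the Sparc-style semantics, the encodings line up
bit for bit, so a guest MEMBAR mask translates directly into a TCG
barrier mask, and only the mask with all four bits set is a full barrier.

```c
#include <stdbool.h>

/* Ordering flags as proposed in the patch: TCG_MO_A_B orders earlier
 * accesses of kind A with later accesses of kind B. */
typedef enum {
    TCG_MO_LD_LD = 1,
    TCG_MO_ST_LD = 2,
    TCG_MO_LD_ST = 4,
    TCG_MO_ST_ST = 8,
    TCG_MO_ALL   = 0xF,
} TCGOrder;

/* SPARC V9 MEMBAR mmask bits (constant names are hypothetical). */
enum {
    SPARC_MMASK_LOADLOAD   = 0x01,
    SPARC_MMASK_STORELOAD  = 0x02,
    SPARC_MMASK_LOADSTORE  = 0x04,
    SPARC_MMASK_STORESTORE = 0x08,
};

/* With Sparc-style semantics the encodings coincide; checked at compile time. */
_Static_assert(TCG_MO_LD_LD == SPARC_MMASK_LOADLOAD, "#LoadLoad");
_Static_assert(TCG_MO_ST_LD == SPARC_MMASK_STORELOAD, "#StoreLoad");
_Static_assert(TCG_MO_LD_ST == SPARC_MMASK_LOADSTORE, "#LoadStore");
_Static_assert(TCG_MO_ST_ST == SPARC_MMASK_STORESTORE, "#StoreStore");

/* Hypothetical helper: keep only the four mmask bits of a MEMBAR mask
 * field; the cmask bits (#Lookaside, #MemIssue, #Sync) are dropped. */
static inline unsigned mo_from_sparc_mmask(unsigned mmask)
{
    return mmask & TCG_MO_ALL;
}

/* Only the combination of all four flags is a full (SC) barrier. */
static inline bool mo_is_full_barrier(unsigned mo)
{
    return (mo & TCG_MO_ALL) == TCG_MO_ALL;
}
```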

>
>>> +} TCGOrder;
>>> +
>>> +typedef enum {
>>> +    TCG_BAR_ACQ     = 32,
>>> +    TCG_BAR_REL     = 64,
>> I'm convinced that the only practical way to represent a standalone
>> acquire memory barrier is to order all previous loads with all
>> subsequent loads and stores. Similarly, a standalone release memory
>> barrier would order all previous loads and stores with all subsequent
>> stores. [1]
> Yes, here acquire would be:
>
> (TCG_MO_LD_ST | TCG_MO_LD_LD) | TCG_BAR_ACQ
>
> and release would be:
>
> (TCG_MO_ST_ST | TCG_MO_LD_ST) | TCG_BAR_REL

Could you please explain the difference between:

(TCG_MO_LD_ST | TCG_MO_LD_LD) | TCG_BAR_ACQ

and

(TCG_MO_LD_ST | TCG_MO_LD_LD)

and

TCG_BAR_ACQ

or between:

(TCG_MO_ST_ST | TCG_MO_LD_ST) | TCG_BAR_REL

and

(TCG_MO_ST_ST | TCG_MO_LD_ST)

and

TCG_BAR_REL

?

(Please first consider the comments below and above.)

>
>> On the other hand, acquire or release semantic associated with a memory
>> operation itself can be directly mapped into e.g. AArch64's Load-Acquire
>> (LDAR) and Store-Release (STLR) instructions. A standalone barrier
>> adjacent to a memory operation shouldn't be mapped this way because it
>> should provide more strict guarantees than e.g. AArch64 instructions
>> mentioned above.
> You are right. That is why the load-acquire operation generates the
> stronger barrier:
>
> TCG_MO_ALL | TCG_BAR_ACQ and not the acquire barrier above. Similarly
> for store-release.

I meant that e.g. a Sparc TSO load could be efficiently mapped to an
ARMv8 or Itanium load-acquire, and a Sparc TSO store to an ARMv8 or
Itanium store-release. But a Sparc TSO load + membar #LoadLoad |
#LoadStore cannot be mapped this way, because ARMv8 or Itanium
load-acquire semantics are weaker than the corresponding standalone
barrier semantics.

>
>> Therefore, I advocate for clear distinction between standalone memory
>> barriers and implicit memory ordering semantics associated with memory
>> operations themselves.
> Any suggestions on how to make the distinction clearer? I will add a
> detailed comment like the above but please let me know if you have
> anything in mind.

My suggestion is to separate standalone memory barriers and implicit
ordering requirements of loads/stores.

Then a standalone guest memory barrier instruction will translate into a
standalone TCG memory barrier operation, which will translate into a
standalone host memory barrier instruction (possibly a no-op). We can
take the semantics of the Sparc v9 MEMBAR instruction mmask bits, as
described above, as the semantics of our standalone TCG memory barrier
operation. There seems to be no special "release" or "acquire" memory
barrier as a standalone instruction in our guest/host architectures.
I'm convinced that the best way to represent a [C11] acquire fence is a
combined memory barrier for load-load and load-store; the best way to
represent a [C11] release fence is a combined barrier for store-store
and load-store; and the best way to represent a [C11] sequential
consistency fence is a combined barrier for all four flags.
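The fence mapping proposed above, and its lowering to a host barrier,
could look roughly like this. It is only a sketch: the TCG_MO_* values
come from the patch, while mo_for_c11_fence(), the HostBarrier enum and
the two lowering helpers are hypothetical names. On an AArch64 host a
load-only mask can use the cheaper `dmb ishld` and a store-store-only
mask `dmb ishst`; on an x86 host only the store-load bit needs an
instruction at all, since TSO already forbids the other three
reorderings (a locked read-modify-write would work as well as `mfence`).

```c
#include <stdatomic.h>

typedef enum {
    TCG_MO_LD_LD = 1,
    TCG_MO_ST_LD = 2,
    TCG_MO_LD_ST = 4,
    TCG_MO_ST_ST = 8,
    TCG_MO_ALL   = 0xF,
} TCGOrder;

/* Hypothetical names for what a host backend might emit. */
typedef enum {
    HOST_BAR_NONE,      /* no instruction needed */
    HOST_BAR_DMB_ISHLD, /* AArch64: earlier loads before later loads/stores */
    HOST_BAR_DMB_ISHST, /* AArch64: earlier stores before later stores */
    HOST_BAR_DMB_ISH,   /* AArch64: full barrier */
    HOST_BAR_MFENCE,    /* x86: full barrier */
} HostBarrier;

/* Sketch of the proposed C11 fence -> TCG barrier mask mapping. */
static inline unsigned mo_for_c11_fence(memory_order order)
{
    switch (order) {
    case memory_order_relaxed:
        return 0;                                   /* no ordering */
    case memory_order_consume:                      /* C11: a consume fence is an acquire fence */
    case memory_order_acquire:
        return TCG_MO_LD_LD | TCG_MO_LD_ST;
    case memory_order_release:
        return TCG_MO_LD_ST | TCG_MO_ST_ST;
    case memory_order_acq_rel:
        return TCG_MO_LD_LD | TCG_MO_LD_ST | TCG_MO_ST_ST;
    case memory_order_seq_cst:
    default:
        return TCG_MO_ALL;                          /* full barrier */
    }
}

/* Hypothetical AArch64 lowering: pick the weakest DMB covering the mask. */
static inline HostBarrier aarch64_barrier_for(unsigned mo)
{
    mo &= TCG_MO_ALL;
    if (mo == 0) {
        return HOST_BAR_NONE;
    }
    if ((mo & (TCG_MO_ST_LD | TCG_MO_ST_ST)) == 0) {
        return HOST_BAR_DMB_ISHLD;  /* only earlier loads need ordering */
    }
    if (mo == TCG_MO_ST_ST) {
        return HOST_BAR_DMB_ISHST;
    }
    return HOST_BAR_DMB_ISH;
}

/* Hypothetical x86 lowering: TSO only lets a later load pass an earlier store. */
static inline HostBarrier x86_barrier_for(unsigned mo)
{
    return (mo & TCG_MO_ST_LD) ? HOST_BAR_MFENCE : HOST_BAR_NONE;
}
```

With this mapping an acquire fence costs a `dmb ishld` on AArch64 and
nothing on x86, while a release fence needs a full `dmb ish` on AArch64,
because `dmb ishld` does not order store-store and `dmb ishst` does not
order load-store.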

Orthogonally, we can annotate each guest memory load/store TCG operation
with "acquire", "release" and "sequentially consistent" flags, with the
semantics defined for the [C11] 'memory_order' enum constants. I think
we can skip "consume" semantics since they only make sense for the
ancient Alpha architecture, which we don't support as a QEMU host; let's
just require that each load is always a consume-load. Then e.g. a
regular x86 or Sparc TSO load instruction will translate into a TCG
guest memory load operation with the acquire flag set, which will
translate into e.g. an Itanium or ARMv8 load-acquire instruction; and
e.g. a regular x86 or Sparc TSO store instruction will translate into a
TCG guest memory store operation with the release flag set, which will
translate into e.g. an Itanium or ARMv8 store-release instruction. That
is supposed to be the most efficient way to support a strongly-ordered
guest on a weakly-ordered host. Even if we disable MTTCG for such
guest-host combinations, we may still want to support user-mode
emulation for them, which can be multi-threaded.
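A sketch of what such a per-access attribute could look like, with
hypothetical names throughout (TCGMemOrder, A64Insn and the helper
functions are illustrative and not part of the patch): a TSO guest
frontend tags regular loads as acquire and regular stores as release,
and an AArch64 backend then selects LDAR/STLR for those accesses instead
of emitting separate barriers.

```c
/* Hypothetical ordering attribute on a guest load/store TCG op, with the
 * meaning of the C11 memory_order constants ("consume" deliberately omitted). */
typedef enum {
    TCG_MEM_RELAXED,
    TCG_MEM_ACQUIRE,
    TCG_MEM_RELEASE,
    TCG_MEM_SEQ_CST,
} TCGMemOrder;

/* Hypothetical AArch64 instruction choices. */
typedef enum {
    A64_LDR,   /* plain load */
    A64_LDAR,  /* load-acquire */
    A64_STR,   /* plain store */
    A64_STLR,  /* store-release */
} A64Insn;

/* Frontend side: a strongly ordered (TSO) guest such as x86 or Sparc
 * tags every regular load as acquire and every regular store as release. */
static inline TCGMemOrder tso_guest_load_order(void)  { return TCG_MEM_ACQUIRE; }
static inline TCGMemOrder tso_guest_store_order(void) { return TCG_MEM_RELEASE; }

/* Backend side: on an AArch64 host, acquire/SC loads become LDAR and
 * release/SC stores become STLR (the usual C11 mapping for ARMv8). */
static inline A64Insn a64_load(TCGMemOrder o)
{
    return (o == TCG_MEM_ACQUIRE || o == TCG_MEM_SEQ_CST) ? A64_LDAR : A64_LDR;
}

static inline A64Insn a64_store(TCGMemOrder o)
{
    return (o == TCG_MEM_RELEASE || o == TCG_MEM_SEQ_CST) ? A64_STLR : A64_STR;
}
```

Keeping the attribute on the memory op, rather than encoding it as a
barrier next to it, is what lets the backend choose the address-specific
instruction.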

In both cases, (1) a regular load followed by a combined load-load and
load-store barrier vs. a load-acquire, and (2) a combined store-store
and load-store barrier followed by a regular store vs. a store-release,
the key difference is that a load-acquire/store-release is always
associated with the particular address loaded/stored, whereas a
standalone barrier is supposed to order *all* corresponding
loads/stores across the barrier. Thus a standalone barrier is stronger
than a load-acquire/store-release and cannot be an efficient
intermediate representation for these semantics.

See also this email:
http://thread.gmane.org/gmane.comp.emulators.qemu/420223/focus=421309
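A C11 analogue of the difference described above, as a sketch (the
variable and function names are illustrative only; build with -pthread).
Variant 1 attaches acquire ordering to the flag load itself, which on an
ARMv8 host typically compiles to LDAR; variant 2 uses a relaxed load
plus a standalone acquire fence, which orders every earlier load and so
typically needs a separate barrier such as `dmb ishld`. Both are correct
for message passing; the fence is simply the stronger primitive.

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

static int payload;       /* plain data published by the writer */
static atomic_int ready;  /* flag guarding the payload */

/* Writer: publish the payload, then set the flag with a store-release.
 * The ordering is attached to this one store (cf. STLR). */
static void *writer(void *arg)
{
    (void)arg;
    payload = 42;
    atomic_store_explicit(&ready, 1, memory_order_release);
    return NULL;
}

/* Variant 1: load-acquire on the flag itself (cf. LDAR). */
static int read_with_load_acquire(void)
{
    while (!atomic_load_explicit(&ready, memory_order_acquire)) {
        /* spin until the writer publishes */
    }
    return payload;
}

/* Variant 2: relaxed load, then a standalone acquire fence. The fence
 * orders *all* earlier loads, not just this one, with everything after it. */
static int read_with_acquire_fence(void)
{
    while (!atomic_load_explicit(&ready, memory_order_relaxed)) {
        /* spin until the writer publishes */
    }
    atomic_thread_fence(memory_order_acquire);
    return payload;
}

/* Runs one writer thread against one reader variant; returns what the
 * reader observed, or -1 if the thread could not be created. */
static int run_message_passing(int use_fence)
{
    pthread_t t;
    int seen;

    payload = 0;
    atomic_store(&ready, 0);
    if (pthread_create(&t, NULL, writer, NULL) != 0) {
        return -1;
    }
    seen = use_fence ? read_with_acquire_fence() : read_with_load_acquire();
    pthread_join(t, NULL);
    return seen;
}
```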

I would suggest dealing only with the translation of explicit standalone
memory barrier instructions in this series. After that, we could start
on the second part, implicit memory ordering requirements, which is a
more complex topic anyway.

[C11] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

>
>> [1] http://preshing.com/20130922/acquire-and-release-fences/
>>
>>> +    TCG_BAR_SC      = 128,
>> How's that different from TCG_MO_ALL?
> TCG_BAR_* tells us what ordering is enforced. TCG_MO_* tells on
> what operations the ordering is to be enforced.

It sounds like unnecessary duplication of the same information; see the
explanation above.

Kind regards,
Sergey
