[Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-20 21:21   ` Sergey Fedorov
                     ` (2 more replies)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 02/14] tcg/i386: Add support for fence Pranith Kumar
                   ` (13 subsequent siblings)
  14 siblings, 3 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Richard Henderson, open list:All patches CC here; +Cc: alex.bennee, serge.fdrv

This commit introduces the TCGOpcode for memory barrier instruction.

This opcode takes an argument which is the type of memory barrier
which should be generated.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/README    | 17 +++++++++++++++++
 tcg/tcg-op.c  | 11 +++++++++++
 tcg/tcg-op.h  |  2 ++
 tcg/tcg-opc.h |  2 ++
 tcg/tcg.h     | 14 ++++++++++++++
 5 files changed, 46 insertions(+)

diff --git a/tcg/README b/tcg/README
index ce8beba..1d48aa9 100644
--- a/tcg/README
+++ b/tcg/README
@@ -402,6 +402,23 @@ double-word product T0.  The later is returned in two single-word outputs.
 
 Similar to mulu2, except the two inputs T1 and T2 are signed.
 
+********* Memory Barrier support
+
+* mb <$arg>
+
+Generate a target memory barrier instruction to ensure memory ordering as being
+enforced by a corresponding guest memory barrier instruction. The ordering
+enforced by the backend may be stricter than the ordering required by the guest.
+It cannot be weaker. This opcode takes a constant argument which is required to
+generate the appropriate barrier instruction. The backend should take care to
+emit the target barrier instruction only when necessary i.e., for SMP guests and
+when MTTCG is enabled.
+
+The guest translators should generate this opcode for all guest instructions
+which have ordering side effects.
+
+Please see docs/atomics.txt for more information on memory barriers.
+
 ********* 64-bit guest on 32-bit host support
 
 The following opcodes are internal to TCG.  Thus they are to be implemented by
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 54c0277..08f7858 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -39,6 +39,8 @@ extern TCGv_i32 TCGV_HIGH_link_error(TCGv_i64);
 #define TCGV_HIGH TCGV_HIGH_link_error
 #endif
 
+extern int smp_cpus;
+
 /* Note that this is optimized for sequential allocation during translate.
    Up to and including filling in the forward link immediately.  We'll do
    proper termination of the end of the list after we finish translation.  */
@@ -146,6 +148,15 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
     tcg_emit_op(ctx, opc, pi);
 }
 
+void tcg_gen_mb(TCGArg mb_type)
+{
+#ifndef CONFIG_USER_ONLY
+    if (qemu_tcg_mttcg_enabled() && smp_cpus > 1) {
+        tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
+    }
+#endif
+}
+
 /* 32 bit ops */
 
 void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
index f217e80..41890cc 100644
--- a/tcg/tcg-op.h
+++ b/tcg/tcg-op.h
@@ -261,6 +261,8 @@ static inline void tcg_gen_br(TCGLabel *l)
     tcg_gen_op1(&tcg_ctx, INDEX_op_br, label_arg(l));
 }
 
+void tcg_gen_mb(TCGArg a);
+
 /* Helper calls. */
 
 /* 32 bit ops */
diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
index 6d0410c..45528d2 100644
--- a/tcg/tcg-opc.h
+++ b/tcg/tcg-opc.h
@@ -42,6 +42,8 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
 # define IMPL64  TCG_OPF_64BIT
 #endif
 
+DEF(mb, 0, 0, 1, 0)
+
 DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
 DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
 DEF(setcond_i32, 1, 2, 1, 0)
diff --git a/tcg/tcg.h b/tcg/tcg.h
index db6a062..36feca9 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -408,6 +408,20 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
 #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
 #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
 
+typedef enum {
+    TCG_MO_LD_LD    = 1,
+    TCG_MO_ST_LD    = 2,
+    TCG_MO_LD_ST    = 4,
+    TCG_MO_ST_ST    = 8,
+    TCG_MO_ALL      = 0xF, // OR of all above
+} TCGOrder;
+
+typedef enum {
+    TCG_BAR_ACQ     = 32,
+    TCG_BAR_REL     = 64,
+    TCG_BAR_SC      = 128,
+} TCGBar;
+
 /* Conditions.  Note that these are laid out for easy manipulation by
    the functions below:
      bit 0 is used for inverting;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier Pranith Kumar
@ 2016-06-20 21:21   ` Sergey Fedorov
  2016-06-21 14:52     ` Pranith Kumar
  2016-06-21  7:30   ` Paolo Bonzini
  2016-06-21 18:04   ` Alex Bennée
  2 siblings, 1 reply; 56+ messages in thread
From: Sergey Fedorov @ 2016-06-20 21:21 UTC (permalink / raw)
  To: Pranith Kumar, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee

On 18/06/16 07:03, Pranith Kumar wrote:
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index db6a062..36feca9 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -408,6 +408,20 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
>  #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
>  #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
>  
> +typedef enum {
> +    TCG_MO_LD_LD    = 1,
> +    TCG_MO_ST_LD    = 2,
> +    TCG_MO_LD_ST    = 4,
> +    TCG_MO_ST_ST    = 8,
> +    TCG_MO_ALL      = 0xF, // OR of all above

So TCG_MO_ALL specifies a so called "full" memory barrier?

> +} TCGOrder;
> +
> +typedef enum {
> +    TCG_BAR_ACQ     = 32,
> +    TCG_BAR_REL     = 64,

I'm convinced that the only practical way to represent a standalone
acquire memory barrier is to order all previous loads with all
subsequent loads and stores. Similarly, a standalone release memory
barrier would order all previous loads and stores with all subsequent
stores. [1]

On the other hand, acquire or release semantic associated with a memory
operation itself can be directly mapped into e.g. AArch64's Load-Acquire
(LDAR) and Store-Release (STLR) instructions. A standalone barrier
adjacent to a memory operation shouldn't be mapped this way because it
should provide more strict guarantees than e.g. AArch64 instructions
mentioned above.

Therefore, I advocate for clear distinction between standalone memory
barriers and implicit memory ordering semantics associated with memory
operations themselves.

[1] http://preshing.com/20130922/acquire-and-release-fences/

> +    TCG_BAR_SC      = 128,

How's that different from TCG_MO_ALL?

> +} TCGBar;
> +
>  /* Conditions.  Note that these are laid out for easy manipulation by
>     the functions below:
>       bit 0 is used for inverting;

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
  2016-06-20 21:21   ` Sergey Fedorov
@ 2016-06-21 14:52     ` Pranith Kumar
  2016-06-21 15:09       ` Alex Bennée
  2016-06-22 15:50       ` Sergey Fedorov
  0 siblings, 2 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-21 14:52 UTC (permalink / raw)
  To: Sergey Fedorov
  Cc: Richard Henderson, open list:All patches CC here,
	Alex Bennée

Hi Sergey,

On Mon, Jun 20, 2016 at 5:21 PM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
> On 18/06/16 07:03, Pranith Kumar wrote:
>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>> index db6a062..36feca9 100644
>> --- a/tcg/tcg.h
>> +++ b/tcg/tcg.h
>> @@ -408,6 +408,20 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
>>  #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
>>  #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
>>
>> +typedef enum {
>> +    TCG_MO_LD_LD    = 1,
>> +    TCG_MO_ST_LD    = 2,
>> +    TCG_MO_LD_ST    = 4,
>> +    TCG_MO_ST_ST    = 8,
>> +    TCG_MO_ALL      = 0xF, // OR of all above
>
> So TCG_MO_ALL specifies a so called "full" memory barrier?

This enum just specifies what loads and stores need to be ordered.

TCG_MO_ALL specifies that we need to order both previous loads and
stores with later loads and stores. To get a full memory barrier you
will need to pair it with BAR_SC:

TCG_MO_ALL | TCG_BAR_SC

>
>> +} TCGOrder;
>> +
>> +typedef enum {
>> +    TCG_BAR_ACQ     = 32,
>> +    TCG_BAR_REL     = 64,
>
> I'm convinced that the only practical way to represent a standalone
> acquire memory barrier is to order all previous loads with all
> subsequent loads and stores. Similarly, a standalone release memory
> barrier would order all previous loads and stores with all subsequent
> stores. [1]

Yes, here acquire would be:

(TCG_MO_LD_ST | TCG_MO_LD_LD) | TCG_BAR_ACQ

and release would be:

(TCG_MO_ST_ST | TCG_MO_LD_ST) | TCG_BAR_REL

>
> On the other hand, acquire or release semantic associated with a memory
> operation itself can be directly mapped into e.g. AArch64's Load-Acquire
> (LDAR) and Store-Release (STLR) instructions. A standalone barrier
> adjacent to a memory operation shouldn't be mapped this way because it
> should provide more strict guarantees than e.g. AArch64 instructions
> mentioned above.

You are right. That is why the load-acquire operation generates the
stronger barrier:

TCG_MO_ALL | TCG_BAR_ACQ and not the acquire barrier above. Similarly
for store-release.

>
> Therefore, I advocate for clear distinction between standalone memory
> barriers and implicit memory ordering semantics associated with memory
> operations themselves.

Any suggestions on how to make the distinction clearer? I will add a
detailed comment like the above but please let me know if you have
anything in mind.

>
> [1] http://preshing.com/20130922/acquire-and-release-fences/
>
>> +    TCG_BAR_SC      = 128,
>
> How's that different from TCG_MO_ALL?

TCG_BAR_* tells us what ordering is enforced. TCG_MO_* tells what on
what operations the ordering is to be enforced.

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
  2016-06-21 14:52     ` Pranith Kumar
@ 2016-06-21 15:09       ` Alex Bennée
  2016-06-21 18:06         ` Pranith Kumar
  2016-06-22 15:50       ` Sergey Fedorov
  1 sibling, 1 reply; 56+ messages in thread
From: Alex Bennée @ 2016-06-21 15:09 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Sergey Fedorov, Richard Henderson, open list:All patches CC here


Pranith Kumar <bobby.prani@gmail.com> writes:

> Hi Sergey,
>
> On Mon, Jun 20, 2016 at 5:21 PM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>> On 18/06/16 07:03, Pranith Kumar wrote:
>>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>>> index db6a062..36feca9 100644
>>> --- a/tcg/tcg.h
>>> +++ b/tcg/tcg.h
>>> @@ -408,6 +408,20 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
>>>  #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
>>>  #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
>>>
>>> +typedef enum {
>>> +    TCG_MO_LD_LD    = 1,
>>> +    TCG_MO_ST_LD    = 2,
>>> +    TCG_MO_LD_ST    = 4,
>>> +    TCG_MO_ST_ST    = 8,
>>> +    TCG_MO_ALL      = 0xF, // OR of all above
>>
>> So TCG_MO_ALL specifies a so called "full" memory barrier?
>
> This enum just specifies what loads and stores need to be ordered.
>
> TCG_MO_ALL specifies that we need to order both previous loads and
> stores with later loads and stores. To get a full memory barrier you
> will need to pair it with BAR_SC:
>
> TCG_MO_ALL | TCG_BAR_SC
>
>>
>>> +} TCGOrder;
>>> +
>>> +typedef enum {
>>> +    TCG_BAR_ACQ     = 32,
>>> +    TCG_BAR_REL     = 64,
>>
>> I'm convinced that the only practical way to represent a standalone
>> acquire memory barrier is to order all previous loads with all
>> subsequent loads and stores. Similarly, a standalone release memory
>> barrier would order all previous loads and stores with all subsequent
>> stores. [1]
>
> Yes, here acquire would be:
>
> (TCG_MO_LD_ST | TCG_MO_LD_LD) | TCG_BAR_ACQ
>
> and release would be:
>
> (TCG_MO_ST_ST | TCG_MO_LD_ST) | TCG_BAR_REL
>
>>
>> On the other hand, acquire or release semantic associated with a memory
>> operation itself can be directly mapped into e.g. AArch64's Load-Acquire
>> (LDAR) and Store-Release (STLR) instructions. A standalone barrier
>> adjacent to a memory operation shouldn't be mapped this way because it
>> should provide more strict guarantees than e.g. AArch64 instructions
>> mentioned above.
>
> You are right. That is why the load-acquire operation generates the
> stronger barrier:
>
> TCG_MO_ALL | TCG_BAR_ACQ and not the acquire barrier above. Similarly
> for store-release.
>
>>
>> Therefore, I advocate for clear distinction between standalone memory
>> barriers and implicit memory ordering semantics associated with memory
>> operations themselves.
>
> Any suggestions on how to make the distinction clearer? I will add a
> detailed comment like the above but please let me know if you have
> anything in mind.
>
>>
>> [1] http://preshing.com/20130922/acquire-and-release-fences/
>>
>>> +    TCG_BAR_SC      = 128,
>>
>> How's that different from TCG_MO_ALL?
>
> TCG_BAR_* tells us what ordering is enforced. TCG_MO_* tells what on
> what operations the ordering is to be enforced.

This would be worthwhile in the comments. I'm confused by the fact we
have two sets of enums that are going to be merged when building TCGOp
parameters.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
  2016-06-21 15:09       ` Alex Bennée
@ 2016-06-21 18:06         ` Pranith Kumar
  0 siblings, 0 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-21 18:06 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Sergey Fedorov, Richard Henderson, open list:All patches CC here

On Tue, Jun 21, 2016 at 11:09 AM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>>
>>>> +    TCG_BAR_SC      = 128,
>>>
>>> How's that different from TCG_MO_ALL?
>>
>> TCG_BAR_* tells us what ordering is enforced. TCG_MO_* tells what on
>> what operations the ordering is to be enforced.
>
> This would be worthwhile in the comments. I'm confused by the fact we
> have two sets of enums that are going to be merged when building TCGOp
> parameters.

OK, I will add these comments with the details.

-- 
Pranith

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
  2016-06-21 14:52     ` Pranith Kumar
  2016-06-21 15:09       ` Alex Bennée
@ 2016-06-22 15:50       ` Sergey Fedorov
  1 sibling, 0 replies; 56+ messages in thread
From: Sergey Fedorov @ 2016-06-22 15:50 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Richard Henderson, open list:All patches CC here,
	Alex Bennée

On 21/06/16 17:52, Pranith Kumar wrote:
> Hi Sergey,
>
> On Mon, Jun 20, 2016 at 5:21 PM, Sergey Fedorov <serge.fdrv@gmail.com> wrote:
>> On 18/06/16 07:03, Pranith Kumar wrote:
>>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>>> index db6a062..36feca9 100644
>>> --- a/tcg/tcg.h
>>> +++ b/tcg/tcg.h
>>> @@ -408,6 +408,20 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
>>>  #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
>>>  #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
>>>
>>> +typedef enum {
>>> +    TCG_MO_LD_LD    = 1,
>>> +    TCG_MO_ST_LD    = 2,
>>> +    TCG_MO_LD_ST    = 4,
>>> +    TCG_MO_ST_ST    = 8,
>>> +    TCG_MO_ALL      = 0xF, // OR of all above
>> So TCG_MO_ALL specifies a so called "full" memory barrier?
> This enum just specifies what loads and stores need to be ordered.
>
> TCG_MO_ALL specifies that we need to order both previous loads and
> stores with later loads and stores. To get a full memory barrier you
> will need to pair it with BAR_SC:
>
> TCG_MO_ALL | TCG_BAR_SC

If we define the semantics for the flags above as it is defined for
corresponding Sparc MEMBAR instruction mmask bits (which I think really
makes sense), then a combination of all of these flags makes a full
memory barrier which guarantees transitivity, i.e. sequential
consistency. Let me just quote [Sparc v9 manual] regarding MEMBAR
instruction mmask encoding:

    #StoreStore The effects of all stores appearing prior to the MEMBAR
    instruction must be visible *to all processors* before the effect of
    any stores following the MEMBAR.
    #LoadStore  All loads appearing prior to the MEMBAR instruction must
    have been performed before the effect of any stores following the
    MEMBAR is visible *to any other processor*.
    #StoreLoad  The effects of all stores appearing prior to the MEMBAR
    instruction must be visible *to all processors* before loads
    following the MEMBAR may be performed.
    #LoadLoad   All loads appearing prior to the MEMBAR instruction must
    have been performed before any loads following the MEMBAR may be
    performed.

I'm emphasising "to all processors" and "to any other processor" here
because these expressions suggest transitivity, if I understand it
correctly.

[Sparc v9 manual] http://sparc.org/wp-content/uploads/2014/01/SPARCV9.pdf.gz

>
>>> +} TCGOrder;
>>> +
>>> +typedef enum {
>>> +    TCG_BAR_ACQ     = 32,
>>> +    TCG_BAR_REL     = 64,
>> I'm convinced that the only practical way to represent a standalone
>> acquire memory barrier is to order all previous loads with all
>> subsequent loads and stores. Similarly, a standalone release memory
>> barrier would order all previous loads and stores with all subsequent
>> stores. [1]
> Yes, here acquire would be:
>
> (TCG_MO_LD_ST | TCG_MO_LD_LD) | TCG_BAR_ACQ
>
> and release would be:
>
> (TCG_MO_ST_ST | TCG_MO_LD_ST) | TCG_BAR_REL

Could you please explain the difference between:

(TCG_MO_LD_ST | TCG_MO_LD_LD) | TCG_BAR_ACQ

and

(TCG_MO_LD_ST | TCG_MO_LD_LD)

and

TCG_BAR_ACQ

or between:

(TCG_MO_ST_ST | TCG_MO_LD_ST) | TCG_BAR_REL

and

(TCG_MO_ST_ST | TCG_MO_LD_ST)

and

TCG_BAR_REL

?

(Please first consider the comments below and above.)

>
>> On the other hand, acquire or release semantic associated with a memory
>> operation itself can be directly mapped into e.g. AArch64's Load-Acquire
>> (LDAR) and Store-Release (STLR) instructions. A standalone barrier
>> adjacent to a memory operation shouldn't be mapped this way because it
>> should provide more strict guarantees than e.g. AArch64 instructions
>> mentioned above.
> You are right. That is why the load-acquire operation generates the
> stronger barrier:
>
> TCG_MO_ALL | TCG_BAR_ACQ and not the acquire barrier above. Similarly
> for store-release.

I meant that e.g. Sparc TSO load could be efficiently mapped to ARMv8 or
Itanium load-acquire and Sparc TSO store - to ARMv8 or Itanium
store-release. But Sparc TSO load + membar #LoadLoad | #LoadStore cannot
be mapped this way because ARMv8 or Itanium load-acquire semantics is
weaker than the corresponding standalone barrier semantics.

>
>> Therefore, I advocate for clear distinction between standalone memory
>> barriers and implicit memory ordering semantics associated with memory
>> operations themselves.
> Any suggestions on how to make the distinction clearer? I will add a
> detailed comment like the above but please let me know if you have
> anything in mind.

My suggestion is to separate standalone memory barriers and implicit
ordering requirements of loads/stores.

Then a standalone guest memory barrier instruction will translate into a
standalone TCG memory barrier operation which will translate into a
standalone host memory barrier instruction (possibly no-op). We can take
the semantic of Sparc v9 MEMBAR instruction mmask bits as the semantic
of our standalone TCG memory barrier operation as described above. There
seems to be no special "release" or "acquire" memory barrier as a
standalone instruction in our guest/host architectures. I'm convinced
that the best way to represent [C11] acquire fence is a combined memory
barrier for load-load and load-store; the best way to represent [C11]
release fence is a combined barrier for store-store and load-store; and
the best way to represent [C11] sequential consistency fence is a
combined barrier for all four flags.

Orthogonally, we can attribute each guest memory load/store TCG
operation with "acquire", "release" and "sequentially consistent" flags
with the semantics as defined for [C11] 'memory_order' enum constants. I
think we can skip "consume" semantics since it does only make sense for
the ancient Alpha architecture which we don't support as QEMU host;
let's just require that each load is always a consume-load. Then e.g.
x86 or Sparc TSO regular load instruction will translate into TCG guest
memory load operation with acquire flag set which will translate into
e.g. Itanium or ARMv8 load-acquire instruction; and e.g. x86 or Sparc
TSO regular store instruction will translate into TCG guest memory store
operation with release flag set which will translate into e.g. Itanium
or ARMv8 store-release instruction. That's supposed to be the most
efficient way to support a strongly-ordered guest on a weakly-ordered
host. Even if we disable MTTCG for such guest-host combinations, we may
still like to support user-mode emulation for them which can happen to
be multi-threaded.

The key difference between: (1) a regular load followed by a combined
memory barrier for load-load and load-store, and load-acquire; (2) a
combined barrier for store-store and load-store followed by a regular
store, and store-release - is that load-acquire/store-release is always
associated with a particular address loaded/stored whereas a standalone
barrier is supposed to order *all* corresponding loads/stores across the
barrier. Thus a standalone barrier are stronger then a
load-acquire/store-release and cannot be an efficient intermediate
representation for this semantics.

See also this email:
http://thread.gmane.org/gmane.comp.emulators.qemu/420223/focus=421309

I would suggest to deal only with explicit standalone memory barrier
instruction translation in this series. After that, we could start with
the second part of implicit memory ordering requirements which is more
complex topic anyway.

[C11] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf

>
>> [1] http://preshing.com/20130922/acquire-and-release-fences/
>>
>>> +    TCG_BAR_SC      = 128,
>> How's that different from TCG_MO_ALL?
> TCG_BAR_* tells us what ordering is enforced. TCG_MO_* tells what on
> what operations the ordering is to be enforced.

It sound like unnecessary duplication of the same information, see the
explanation above.

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier Pranith Kumar
  2016-06-20 21:21   ` Sergey Fedorov
@ 2016-06-21  7:30   ` Paolo Bonzini
  2016-06-21 18:04   ` Alex Bennée
  2 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2016-06-21  7:30 UTC (permalink / raw)
  To: Pranith Kumar, Richard Henderson, open list:All patches CC here
  Cc: serge.fdrv, alex.bennee



On 18/06/2016 06:03, Pranith Kumar wrote:
> +typedef enum {
> +    TCG_MO_LD_LD    = 1,
> +    TCG_MO_ST_LD    = 2,
> +    TCG_MO_LD_ST    = 4,
> +    TCG_MO_ST_ST    = 8,

I like the idea of making this a bitmask.  However, most of the code you
wrote for the backends looks at these as an enum.  For example,


+static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    switch (a0 & TCG_MO_ALL) {
+    case TCG_MO_LD_LD:
+        tcg_out32(s, DMB_ISH | DMB_LD);
+        break;
+    case TCG_MO_ST_ST:
+        tcg_out32(s, DMB_ISH | DMB_ST);
+        break;
+    default:
+        tcg_out32(s, DMB_ISH | DMB_LD | DMB_ST);
+        break;
+    }
+}

should rather be

   if (a0 & (ST_LD|LD_ST)) {
       output dmb ish
       return
   }
   if (a0 & LD_LD) {
       output dmb ishld
   }
   if (a0 & LD_ST) {
       output dmb ishst
   }

Paolo

> +    TCG_MO_ALL      = 0xF, // OR of all above
> +} TCGOrder;
> +
> +typedef enum {
> +    TCG_BAR_ACQ     = 32,
> +    TCG_BAR_REL     = 64,
> +    TCG_BAR_SC      = 128,
> +} TCGBar;

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier Pranith Kumar
  2016-06-20 21:21   ` Sergey Fedorov
  2016-06-21  7:30   ` Paolo Bonzini
@ 2016-06-21 18:04   ` Alex Bennée
  2016-06-21 18:09     ` Pranith Kumar
  2 siblings, 1 reply; 56+ messages in thread
From: Alex Bennée @ 2016-06-21 18:04 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Richard Henderson, open list:All patches CC here, serge.fdrv


Pranith Kumar <bobby.prani@gmail.com> writes:

> This commit introduces the TCGOpcode for memory barrier instruction.
>
> This opcode takes an argument which is the type of memory barrier
> which should be generated.
>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/README    | 17 +++++++++++++++++
>  tcg/tcg-op.c  | 11 +++++++++++
>  tcg/tcg-op.h  |  2 ++
>  tcg/tcg-opc.h |  2 ++
>  tcg/tcg.h     | 14 ++++++++++++++
>  5 files changed, 46 insertions(+)
>
> diff --git a/tcg/README b/tcg/README
> index ce8beba..1d48aa9 100644
> --- a/tcg/README
> +++ b/tcg/README
> @@ -402,6 +402,23 @@ double-word product T0.  The later is returned in two single-word outputs.
>
>  Similar to mulu2, except the two inputs T1 and T2 are signed.
>
> +********* Memory Barrier support
> +
> +* mb <$arg>
> +
> +Generate a target memory barrier instruction to ensure memory ordering as being
> +enforced by a corresponding guest memory barrier instruction. The ordering
> +enforced by the backend may be stricter than the ordering required by the guest.
> +It cannot be weaker. This opcode takes a constant argument which is required to
> +generate the appropriate barrier instruction. The backend should take care to
> +emit the target barrier instruction only when necessary i.e., for SMP guests and
> +when MTTCG is enabled.
> +
> +The guest translators should generate this opcode for all guest instructions
> +which have ordering side effects.
> +
> +Please see docs/atomics.txt for more information on memory barriers.
> +
>  ********* 64-bit guest on 32-bit host support
>
>  The following opcodes are internal to TCG.  Thus they are to be implemented by
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 54c0277..08f7858 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -39,6 +39,8 @@ extern TCGv_i32 TCGV_HIGH_link_error(TCGv_i64);
>  #define TCGV_HIGH TCGV_HIGH_link_error
>  #endif
>
> +extern int smp_cpus;
> +
>  /* Note that this is optimized for sequential allocation during translate.
>     Up to and including filling in the forward link immediately.  We'll do
>     proper termination of the end of the list after we finish translation.  */
> @@ -146,6 +148,15 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
>      tcg_emit_op(ctx, opc, pi);
>  }
>
> +void tcg_gen_mb(TCGArg mb_type)
> +{
> +#ifndef CONFIG_USER_ONLY
> +    if (qemu_tcg_mttcg_enabled() && smp_cpus > 1) {
> +        tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
> +    }
> +#endif
> +}
> +

This introduces a dependency on MTTCG which maybe is best handled in
that patch series. I get the feeling we might be able to merge this
series before MTTCG as barriers are still required for user-mode. Maybe
something like this makes more sense to keep the patch series
independent for now:

    void tcg_gen_mb(TCGArg mb_type)
    {
    #ifdef CONFIG_USER_ONLY
        const bool emit_barriers = true;
    #else
        /* TODO: When MTTCG is available for system mode
         * we will enable when qemu_tcg_mttcg_enabled() && smp_cpus > 1
         */
        bool emit_barriers = false;
    #endif
        if (emit_barriers) {
            tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
        }
    }

Maybe it is time to look at the user-mode memory model testers like:

  http://diy.inria.fr/

Rich,

What do you think? Do you think the memory barrier stuff should be kept
independent so it can be merged once ready?

>  /* 32 bit ops */
>
>  void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
> index f217e80..41890cc 100644
> --- a/tcg/tcg-op.h
> +++ b/tcg/tcg-op.h
> @@ -261,6 +261,8 @@ static inline void tcg_gen_br(TCGLabel *l)
>      tcg_gen_op1(&tcg_ctx, INDEX_op_br, label_arg(l));
>  }
>
> +void tcg_gen_mb(TCGArg a);
> +
>  /* Helper calls. */
>
>  /* 32 bit ops */
> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
> index 6d0410c..45528d2 100644
> --- a/tcg/tcg-opc.h
> +++ b/tcg/tcg-opc.h
> @@ -42,6 +42,8 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
>  # define IMPL64  TCG_OPF_64BIT
>  #endif
>
> +DEF(mb, 0, 0, 1, 0)
> +
>  DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
>  DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
>  DEF(setcond_i32, 1, 2, 1, 0)
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index db6a062..36feca9 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -408,6 +408,20 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
>  #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
>  #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
>
> +typedef enum {
> +    TCG_MO_LD_LD    = 1,
> +    TCG_MO_ST_LD    = 2,
> +    TCG_MO_LD_ST    = 4,
> +    TCG_MO_ST_ST    = 8,
> +    TCG_MO_ALL      = 0xF, // OR of all above
> +} TCGOrder;
> +
> +typedef enum {
> +    TCG_BAR_ACQ     = 32,
> +    TCG_BAR_REL     = 64,
> +    TCG_BAR_SC      = 128,
> +} TCGBar;

I mentioned my confusing in another email about having separate enums here.

> +
>  /* Conditions.  Note that these are laid out for easy manipulation by
>     the functions below:
>       bit 0 is used for inverting;


--
Alex Bennée

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
  2016-06-21 18:04   ` Alex Bennée
@ 2016-06-21 18:09     ` Pranith Kumar
  2016-06-21 18:23       ` Alex Bennée
  0 siblings, 1 reply; 56+ messages in thread
From: Pranith Kumar @ 2016-06-21 18:09 UTC (permalink / raw)
  To: Alex Bennée
  Cc: Richard Henderson, open list:All patches CC here, Sergey Fedorov

On Tue, Jun 21, 2016 at 2:04 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>
> Pranith Kumar <bobby.prani@gmail.com> writes:
>
>> This commit introduces the TCGOpcode for memory barrier instruction.
>>
>> This opcode takes an argument which is the type of memory barrier
>> which should be generated.
>>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>  tcg/README    | 17 +++++++++++++++++
>>  tcg/tcg-op.c  | 11 +++++++++++
>>  tcg/tcg-op.h  |  2 ++
>>  tcg/tcg-opc.h |  2 ++
>>  tcg/tcg.h     | 14 ++++++++++++++
>>  5 files changed, 46 insertions(+)
>>
>> diff --git a/tcg/README b/tcg/README
>> index ce8beba..1d48aa9 100644
>> --- a/tcg/README
>> +++ b/tcg/README
>> @@ -402,6 +402,23 @@ double-word product T0.  The later is returned in two single-word outputs.
>>
>>  Similar to mulu2, except the two inputs T1 and T2 are signed.
>>
>> +********* Memory Barrier support
>> +
>> +* mb <$arg>
>> +
>> +Generate a target memory barrier instruction to ensure memory ordering as being
>> +enforced by a corresponding guest memory barrier instruction. The ordering
>> +enforced by the backend may be stricter than the ordering required by the guest.
>> +It cannot be weaker. This opcode takes a constant argument which is required to
>> +generate the appropriate barrier instruction. The backend should take care to
>> +emit the target barrier instruction only when necessary i.e., for SMP guests and
>> +when MTTCG is enabled.
>> +
>> +The guest translators should generate this opcode for all guest instructions
>> +which have ordering side effects.
>> +
>> +Please see docs/atomics.txt for more information on memory barriers.
>> +
>>  ********* 64-bit guest on 32-bit host support
>>
>>  The following opcodes are internal to TCG.  Thus they are to be implemented by
>> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
>> index 54c0277..08f7858 100644
>> --- a/tcg/tcg-op.c
>> +++ b/tcg/tcg-op.c
>> @@ -39,6 +39,8 @@ extern TCGv_i32 TCGV_HIGH_link_error(TCGv_i64);
>>  #define TCGV_HIGH TCGV_HIGH_link_error
>>  #endif
>>
>> +extern int smp_cpus;
>> +
>>  /* Note that this is optimized for sequential allocation during translate.
>>     Up to and including filling in the forward link immediately.  We'll do
>>     proper termination of the end of the list after we finish translation.  */
>> @@ -146,6 +148,15 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
>>      tcg_emit_op(ctx, opc, pi);
>>  }
>>
>> +void tcg_gen_mb(TCGArg mb_type)
>> +{
>> +#ifndef CONFIG_USER_ONLY
>> +    if (qemu_tcg_mttcg_enabled() && smp_cpus > 1) {
>> +        tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
>> +    }
>> +#endif
>> +}
>> +
>
> This introduces a dependency on MTTCG which maybe is best handled in
> that patch series. I get the feeling we might be able to merge this
> series before MTTCG as barriers are still required for user-mode. Maybe
> something like this makes more sense to keep the patch series
> independent for now:
>
>     void tcg_gen_mb(TCGArg mb_type)
>     {
>     #ifdef CONFIG_USER_ONLY
>         const bool emit_barriers = true;
>     #else
>         /* TODO: When MTTCG is available for system mode
>          * we will enable when qemu_tcg_mttcg_enabled() && smp_cpus > 1
>          */
>         bool emit_barriers = false;
>     #endif
>         if (emit_barriers) {
>             tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
>         }
>     }

I was basing my patches on top of your mttcg base tree, since I can
test these patches only with MTTCG enabled :)

If you think I should rebase on mainine, I will do so with the changes
you suggested above.

>
> Maybe it is time to look at the user-mode memory model testers like:
>
>   http://diy.inria.fr/

Interesting. I will take a look. Thanks for the link.

>
> Rich,
>
> What do you think? Do you think the memory barrier stuff should be kept
> independent so it can be merged once ready?
>
>>  /* 32 bit ops */
>>
>>  void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
>> index f217e80..41890cc 100644
>> --- a/tcg/tcg-op.h
>> +++ b/tcg/tcg-op.h
>> @@ -261,6 +261,8 @@ static inline void tcg_gen_br(TCGLabel *l)
>>      tcg_gen_op1(&tcg_ctx, INDEX_op_br, label_arg(l));
>>  }
>>
>> +void tcg_gen_mb(TCGArg a);
>> +
>>  /* Helper calls. */
>>
>>  /* 32 bit ops */
>> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
>> index 6d0410c..45528d2 100644
>> --- a/tcg/tcg-opc.h
>> +++ b/tcg/tcg-opc.h
>> @@ -42,6 +42,8 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
>>  # define IMPL64  TCG_OPF_64BIT
>>  #endif
>>
>> +DEF(mb, 0, 0, 1, 0)
>> +
>>  DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
>>  DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
>>  DEF(setcond_i32, 1, 2, 1, 0)
>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>> index db6a062..36feca9 100644
>> --- a/tcg/tcg.h
>> +++ b/tcg/tcg.h
>> @@ -408,6 +408,20 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
>>  #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
>>  #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
>>
>> +typedef enum {
>> +    TCG_MO_LD_LD    = 1,
>> +    TCG_MO_ST_LD    = 2,
>> +    TCG_MO_LD_ST    = 4,
>> +    TCG_MO_ST_ST    = 8,
>> +    TCG_MO_ALL      = 0xF, // OR of all above
>> +} TCGOrder;
>> +
>> +typedef enum {
>> +    TCG_BAR_ACQ     = 32,
>> +    TCG_BAR_REL     = 64,
>> +    TCG_BAR_SC      = 128,
>> +} TCGBar;
>
> I mentioned my confusing in another email about having separate enums here.


Noted. I will add a comment explaining the split.

-- 
Pranith

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
  2016-06-21 18:09     ` Pranith Kumar
@ 2016-06-21 18:23       ` Alex Bennée
  2016-06-21 19:40         ` Richard Henderson
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Bennée @ 2016-06-21 18:23 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Richard Henderson, open list:All patches CC here, Sergey Fedorov


Pranith Kumar <bobby.prani@gmail.com> writes:

> On Tue, Jun 21, 2016 at 2:04 PM, Alex Bennée <alex.bennee@linaro.org> wrote:
>>
>> Pranith Kumar <bobby.prani@gmail.com> writes:
>>
>>> This commit introduces the TCGOpcode for memory barrier instruction.
>>>
>>> This opcode takes an argument which is the type of memory barrier
>>> which should be generated.
>>>
>>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>> ---
>>>  tcg/README    | 17 +++++++++++++++++
>>>  tcg/tcg-op.c  | 11 +++++++++++
>>>  tcg/tcg-op.h  |  2 ++
>>>  tcg/tcg-opc.h |  2 ++
>>>  tcg/tcg.h     | 14 ++++++++++++++
>>>  5 files changed, 46 insertions(+)
>>>
>>> diff --git a/tcg/README b/tcg/README
>>> index ce8beba..1d48aa9 100644
>>> --- a/tcg/README
>>> +++ b/tcg/README
>>> @@ -402,6 +402,23 @@ double-word product T0.  The later is returned in two single-word outputs.
>>>
>>>  Similar to mulu2, except the two inputs T1 and T2 are signed.
>>>
>>> +********* Memory Barrier support
>>> +
>>> +* mb <$arg>
>>> +
>>> +Generate a target memory barrier instruction to ensure memory ordering as being
>>> +enforced by a corresponding guest memory barrier instruction. The ordering
>>> +enforced by the backend may be stricter than the ordering required by the guest.
>>> +It cannot be weaker. This opcode takes a constant argument which is required to
>>> +generate the appropriate barrier instruction. The backend should take care to
>>> +emit the target barrier instruction only when necessary i.e., for SMP guests and
>>> +when MTTCG is enabled.
>>> +
>>> +The guest translators should generate this opcode for all guest instructions
>>> +which have ordering side effects.
>>> +
>>> +Please see docs/atomics.txt for more information on memory barriers.
>>> +
>>>  ********* 64-bit guest on 32-bit host support
>>>
>>>  The following opcodes are internal to TCG.  Thus they are to be implemented by
>>> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
>>> index 54c0277..08f7858 100644
>>> --- a/tcg/tcg-op.c
>>> +++ b/tcg/tcg-op.c
>>> @@ -39,6 +39,8 @@ extern TCGv_i32 TCGV_HIGH_link_error(TCGv_i64);
>>>  #define TCGV_HIGH TCGV_HIGH_link_error
>>>  #endif
>>>
>>> +extern int smp_cpus;
>>> +
>>>  /* Note that this is optimized for sequential allocation during translate.
>>>     Up to and including filling in the forward link immediately.  We'll do
>>>     proper termination of the end of the list after we finish translation.  */
>>> @@ -146,6 +148,15 @@ void tcg_gen_op6(TCGContext *ctx, TCGOpcode opc, TCGArg a1, TCGArg a2,
>>>      tcg_emit_op(ctx, opc, pi);
>>>  }
>>>
>>> +void tcg_gen_mb(TCGArg mb_type)
>>> +{
>>> +#ifndef CONFIG_USER_ONLY
>>> +    if (qemu_tcg_mttcg_enabled() && smp_cpus > 1) {
>>> +        tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
>>> +    }
>>> +#endif
>>> +}
>>> +
>>
>> This introduces a dependency on MTTCG which maybe is best handled in
>> that patch series. I get the feeling we might be able to merge this
>> series before MTTCG as barriers are still required for user-mode. Maybe
>> something like this makes more sense to keep the patch series
>> independent for now:
>>
>>     void tcg_gen_mb(TCGArg mb_type)
>>     {
>>     #ifdef CONFIG_USER_ONLY
>>         const bool emit_barriers = true;
>>     #else
>>         /* TODO: When MTTCG is available for system mode
>>          * we will enable when qemu_tcg_mttcg_enabled() && smp_cpus > 1
>>          */
>>         bool emit_barriers = false;
>>     #endif
>>         if (emit_barriers) {
>>             tcg_gen_op1(&tcg_ctx, INDEX_op_mb, mb_type);
>>         }
>>     }
>
> I was basing my patches on top of your mttcg base tree, since I can
> test these patches only with MTTCG enabled :)
>
> If you think I should rebase on mainine, I will do so with the changes
> you suggested above.

Well I'll be guided by what Richard thinks about the chances of this
merging ahead of the main MTTCG patches. You were mentioning the trouble
with testing with the kvm-unit-tests based tests which all need MTTCG
system mode to trigger problems. However user mode emulation "works" now
(even better since Peter's signal fixes) so it might be worth
investigating if we can leverage some user-mode testing instead.

Also generally keeping patch series clean and self contained stops too
large a pile of patches getting pilled up waiting for the controversial
bits to get reviewed and merged.

>
>>
>> Maybe it is time to look at the user-mode memory model testers like:
>>
>>   http://diy.inria.fr/
>
> Interesting. I will take a look. Thanks for the link.
>
>>
>> Rich,
>>
>> What do you think? Do you think the memory barrier stuff should be kept
>> independent so it can be merged once ready?
>>
>>>  /* 32 bit ops */
>>>
>>>  void tcg_gen_addi_i32(TCGv_i32 ret, TCGv_i32 arg1, int32_t arg2)
>>> diff --git a/tcg/tcg-op.h b/tcg/tcg-op.h
>>> index f217e80..41890cc 100644
>>> --- a/tcg/tcg-op.h
>>> +++ b/tcg/tcg-op.h
>>> @@ -261,6 +261,8 @@ static inline void tcg_gen_br(TCGLabel *l)
>>>      tcg_gen_op1(&tcg_ctx, INDEX_op_br, label_arg(l));
>>>  }
>>>
>>> +void tcg_gen_mb(TCGArg a);
>>> +
>>>  /* Helper calls. */
>>>
>>>  /* 32 bit ops */
>>> diff --git a/tcg/tcg-opc.h b/tcg/tcg-opc.h
>>> index 6d0410c..45528d2 100644
>>> --- a/tcg/tcg-opc.h
>>> +++ b/tcg/tcg-opc.h
>>> @@ -42,6 +42,8 @@ DEF(br, 0, 0, 1, TCG_OPF_BB_END)
>>>  # define IMPL64  TCG_OPF_64BIT
>>>  #endif
>>>
>>> +DEF(mb, 0, 0, 1, 0)
>>> +
>>>  DEF(mov_i32, 1, 1, 0, TCG_OPF_NOT_PRESENT)
>>>  DEF(movi_i32, 1, 0, 1, TCG_OPF_NOT_PRESENT)
>>>  DEF(setcond_i32, 1, 2, 1, 0)
>>> diff --git a/tcg/tcg.h b/tcg/tcg.h
>>> index db6a062..36feca9 100644
>>> --- a/tcg/tcg.h
>>> +++ b/tcg/tcg.h
>>> @@ -408,6 +408,20 @@ static inline intptr_t QEMU_ARTIFICIAL GET_TCGV_PTR(TCGv_ptr t)
>>>  #define TCG_CALL_DUMMY_TCGV     MAKE_TCGV_I32(-1)
>>>  #define TCG_CALL_DUMMY_ARG      ((TCGArg)(-1))
>>>
>>> +typedef enum {
>>> +    TCG_MO_LD_LD    = 1,
>>> +    TCG_MO_ST_LD    = 2,
>>> +    TCG_MO_LD_ST    = 4,
>>> +    TCG_MO_ST_ST    = 8,
>>> +    TCG_MO_ALL      = 0xF, // OR of all above
>>> +} TCGOrder;
>>> +
>>> +typedef enum {
>>> +    TCG_BAR_ACQ     = 32,
>>> +    TCG_BAR_REL     = 64,
>>> +    TCG_BAR_SC      = 128,
>>> +} TCGBar;
>>
>> I mentioned my confusing in another email about having separate enums here.
>
>
> Noted. I will add a comment explaining the split.


--
Alex Bennée

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier
  2016-06-21 18:23       ` Alex Bennée
@ 2016-06-21 19:40         ` Richard Henderson
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Henderson @ 2016-06-21 19:40 UTC (permalink / raw)
  To: Alex Bennée, Pranith Kumar
  Cc: open list:All patches CC here, Sergey Fedorov

On 06/21/2016 11:23 AM, Alex Bennée wrote:
>> If you think I should rebase on mainine, I will do so with the changes
>> you suggested above.
>
> Well I'll be guided by what Richard thinks about the chances of this
> merging ahead of the main MTTCG patches. You were mentioning the trouble
> with testing with the kvm-unit-tests based tests which all need MTTCG
> system mode to trigger problems. However user mode emulation "works" now
> (even better since Peter's signal fixes) so it might be worth
> investigating if we can leverage some user-mode testing instead.
>
> Also generally keeping patch series clean and self contained stops too
> large a pile of patches getting pilled up waiting for the controversial
> bits to get reviewed and merged.

I'm willing to merge this before mttcg.  I'm with Alex that we should minimize 
the amount of code that gets blocked by that patch set.


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 02/14] tcg/i386: Add support for fence
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-21  7:24   ` Paolo Bonzini
  2016-06-22 16:25   ` Alex Bennée
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 03/14] tcg/aarch64: " Pranith Kumar
                   ` (12 subsequent siblings)
  14 siblings, 2 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Richard Henderson, open list:i386 target; +Cc: alex.bennee, serge.fdrv

Generate mfence/sfence/lfence instruction on SSE2 enabled
processors. For older processors, generate a 'lock orl $0,0(%esp)'
instruction which has full ordering semantics.

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
[rth: Check for sse2, fallback to locked memory op otherwise.]
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.inc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 47 insertions(+)

diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 317484c..0748652 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -121,6 +121,16 @@ static bool have_cmov;
 # define have_cmov 0
 #endif
 
+/* For 32-bit, we are going to attempt to determine at runtime whether
+   sse2 support is available.  */
+#if TCG_TARGET_REG_BITS == 64 || defined(__SSE2__)
+# define have_sse2 1
+#elif defined(CONFIG_CPUID_H) && defined(bit_SSE2)
+static bool have_sse2;
+#else
+# define have_sse2 0
+#endif
+
 /* If bit_MOVBE is defined in cpuid.h (added in GCC version 4.6), we are
    going to attempt to determine at runtime whether movbe is available.  */
 #if defined(CONFIG_CPUID_H) && defined(bit_MOVBE)
@@ -686,6 +696,32 @@ static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
     }
 }
 
+static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    if (have_sse2) {
+        tcg_out16(s, 0xae0f);
+        switch (a0 & TCG_MO_ALL) {
+        case TCG_MO_LD_LD:
+            /* lfence */
+            tcg_out8(s, 0xe8);
+            break;
+        case TCG_MO_ST_ST:
+            /* sfence */
+            tcg_out8(s, 0xf8);
+            break;
+        default:
+            /* mfence */
+            tcg_out8(s, 0xf0);
+            break;
+        }
+    } else {
+        /* lock orl $0,0(%esp) */
+        tcg_out8(s, 0xf0);
+        tcg_out_modrm_offset(s, OPC_ARITH_EvIb, ARITH_OR, TCG_REG_ESP, 0);
+        tcg_out8(s, 0);
+    }
+}
+
 static inline void tcg_out_push(TCGContext *s, int reg)
 {
     tcg_out_opc(s, OPC_PUSH_r32 + LOWREGMASK(reg), 0, reg, 0);
@@ -2120,6 +2156,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         }
         break;
 
+    case INDEX_op_mb:
+        assert(args[0] != 0);
+        tcg_out_mb(s, args[0]);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2185,6 +2225,8 @@ static const TCGTargetOpDef x86_op_defs[] = {
     { INDEX_op_add2_i32, { "r", "r", "0", "1", "ri", "ri" } },
     { INDEX_op_sub2_i32, { "r", "r", "0", "1", "ri", "ri" } },
 
+    { INDEX_op_mb, { } },
+
 #if TCG_TARGET_REG_BITS == 32
     { INDEX_op_brcond2_i32, { "r", "r", "ri", "ri" } },
     { INDEX_op_setcond2_i32, { "r", "r", "r", "ri", "ri" } },
@@ -2362,6 +2404,11 @@ static void tcg_target_init(TCGContext *s)
            available, we'll use a small forward branch.  */
         have_cmov = (d & bit_CMOV) != 0;
 #endif
+#ifndef have_sse2
+        /* Likewise, almost all hardware supports SSE2, but we do
+           have a locked memory operation to use as a substitute.  */
+        have_sse2 = (d & bit_SSE2) != 0;
+#endif
 #ifndef have_movbe
         /* MOVBE is only available on Intel Atom and Haswell CPUs, so we
            need to probe for it.  */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 02/14] tcg/i386: Add support for fence
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 02/14] tcg/i386: Add support for fence Pranith Kumar
@ 2016-06-21  7:24   ` Paolo Bonzini
  2016-06-22 16:25   ` Alex Bennée
  1 sibling, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2016-06-21  7:24 UTC (permalink / raw)
  To: Pranith Kumar, Richard Henderson, open list:i386 target
  Cc: serge.fdrv, alex.bennee



On 18/06/2016 06:03, Pranith Kumar wrote:
> Generate mfence/sfence/lfence instruction on SSE2 enabled
> processors. For older processors, generate a 'lock orl $0,0(%esp)'
> instruction which has full ordering semantics.
> 
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> [rth: Check for sse2, fallback to locked memory op otherwise.]
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/i386/tcg-target.inc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
> 
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 317484c..0748652 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -121,6 +121,16 @@ static bool have_cmov;
>  # define have_cmov 0
>  #endif
>  
> +/* For 32-bit, we are going to attempt to determine at runtime whether
> +   sse2 support is available.  */
> +#if TCG_TARGET_REG_BITS == 64 || defined(__SSE2__)
> +# define have_sse2 1
> +#elif defined(CONFIG_CPUID_H) && defined(bit_SSE2)
> +static bool have_sse2;
> +#else
> +# define have_sse2 0
> +#endif
> +
>  /* If bit_MOVBE is defined in cpuid.h (added in GCC version 4.6), we are
>     going to attempt to determine at runtime whether movbe is available.  */
>  #if defined(CONFIG_CPUID_H) && defined(bit_MOVBE)
> @@ -686,6 +696,32 @@ static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
>      }
>  }
>  
> +static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
> +{
> +    if (have_sse2) {
> +        tcg_out16(s, 0xae0f);
> +        switch (a0 & TCG_MO_ALL) {
> +        case TCG_MO_LD_LD:
> +            /* lfence */
> +            tcg_out8(s, 0xe8);
> +            break;
> +        case TCG_MO_ST_ST:
> +            /* sfence */
> +            tcg_out8(s, 0xf8);
> +            break;

These two barriers are unnecessary on x86, and so is TCG_MO_LD_ST.

> +        default:
> +            /* mfence */
> +            tcg_out8(s, 0xf0);
> +            break;

Please use lock orl here too, it turns out to be faster.

> +        }
> +    } else {
> +        /* lock orl $0,0(%esp) */
> +        tcg_out8(s, 0xf0);
> +        tcg_out_modrm_offset(s, OPC_ARITH_EvIb, ARITH_OR, TCG_REG_ESP, 0);
> +        tcg_out8(s, 0);

This is only needed for TCG_MO_ST_LD.

Paolo

> +    }
> +}
> +
>  static inline void tcg_out_push(TCGContext *s, int reg)
>  {
>      tcg_out_opc(s, OPC_PUSH_r32 + LOWREGMASK(reg), 0, reg, 0);
> @@ -2120,6 +2156,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          }
>          break;
>  
> +    case INDEX_op_mb:
> +        assert(args[0] != 0);
> +        tcg_out_mb(s, args[0]);
> +        break;
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
>      case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> @@ -2185,6 +2225,8 @@ static const TCGTargetOpDef x86_op_defs[] = {
>      { INDEX_op_add2_i32, { "r", "r", "0", "1", "ri", "ri" } },
>      { INDEX_op_sub2_i32, { "r", "r", "0", "1", "ri", "ri" } },
>  
> +    { INDEX_op_mb, { } },
> +
>  #if TCG_TARGET_REG_BITS == 32
>      { INDEX_op_brcond2_i32, { "r", "r", "ri", "ri" } },
>      { INDEX_op_setcond2_i32, { "r", "r", "r", "ri", "ri" } },
> @@ -2362,6 +2404,11 @@ static void tcg_target_init(TCGContext *s)
>             available, we'll use a small forward branch.  */
>          have_cmov = (d & bit_CMOV) != 0;
>  #endif
> +#ifndef have_sse2
> +        /* Likewise, almost all hardware supports SSE2, but we do
> +           have a locked memory operation to use as a substitute.  */
> +        have_sse2 = (d & bit_SSE2) != 0;
> +#endif
>  #ifndef have_movbe
>          /* MOVBE is only available on Intel Atom and Haswell CPUs, so we
>             need to probe for it.  */
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 02/14] tcg/i386: Add support for fence
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 02/14] tcg/i386: Add support for fence Pranith Kumar
  2016-06-21  7:24   ` Paolo Bonzini
@ 2016-06-22 16:25   ` Alex Bennée
  2016-06-22 16:49     ` Richard Henderson
  1 sibling, 1 reply; 56+ messages in thread
From: Alex Bennée @ 2016-06-22 16:25 UTC (permalink / raw)
  To: Pranith Kumar; +Cc: Richard Henderson, open list:i386 target, serge.fdrv


Pranith Kumar <bobby.prani@gmail.com> writes:

> Generate mfence/sfence/lfence instruction on SSE2 enabled
> processors. For older processors, generate a 'lock orl $0,0(%esp)'
> instruction which has full ordering semantics.
>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> [rth: Check for sse2, fallback to locked memory op otherwise.]
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/i386/tcg-target.inc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>  1 file changed, 47 insertions(+)
>
> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
> index 317484c..0748652 100644
> --- a/tcg/i386/tcg-target.inc.c
> +++ b/tcg/i386/tcg-target.inc.c
> @@ -121,6 +121,16 @@ static bool have_cmov;
>  # define have_cmov 0
>  #endif
>
> +/* For 32-bit, we are going to attempt to determine at runtime whether
> +   sse2 support is available.  */
> +#if TCG_TARGET_REG_BITS == 64 || defined(__SSE2__)

Hmm checkpatch.pl warns against including architecture specific defines.
Is the || leg only going to trigger when building 32 bit x86 with custom
compiler flags to force SSE2 code? Perhaps it is worth just leaving this
case to the cpuid code?

> +# define have_sse2 1
> +#elif defined(CONFIG_CPUID_H) && defined(bit_SSE2)
> +static bool have_sse2;
> +#else
> +# define have_sse2 0
> +#endif

I was going to say the mixing of define and parameter seems a bit icky
but I see other code in this function does the same thing.

> +
>  /* If bit_MOVBE is defined in cpuid.h (added in GCC version 4.6), we are
>     going to attempt to determine at runtime whether movbe is available.  */
>  #if defined(CONFIG_CPUID_H) && defined(bit_MOVBE)
> @@ -686,6 +696,32 @@ static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
>      }
>  }
>
> +static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
> +{
> +    if (have_sse2) {
> +        tcg_out16(s, 0xae0f);
> +        switch (a0 & TCG_MO_ALL) {
> +        case TCG_MO_LD_LD:
> +            /* lfence */
> +            tcg_out8(s, 0xe8);
> +            break;
> +        case TCG_MO_ST_ST:
> +            /* sfence */
> +            tcg_out8(s, 0xf8);
> +            break;
> +        default:
> +            /* mfence */
> +            tcg_out8(s, 0xf0);
> +            break;
> +        }
> +    } else {
> +        /* lock orl $0,0(%esp) */
> +        tcg_out8(s, 0xf0);
> +        tcg_out_modrm_offset(s, OPC_ARITH_EvIb, ARITH_OR, TCG_REG_ESP, 0);
> +        tcg_out8(s, 0);
> +    }
> +}
> +
>  static inline void tcg_out_push(TCGContext *s, int reg)
>  {
>      tcg_out_opc(s, OPC_PUSH_r32 + LOWREGMASK(reg), 0, reg, 0);
> @@ -2120,6 +2156,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          }
>          break;
>
> +    case INDEX_op_mb:
> +        assert(args[0] != 0);

Please use tcg_debug_assert for this.

> +        tcg_out_mb(s, args[0]);
> +        break;
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
>      case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> @@ -2185,6 +2225,8 @@ static const TCGTargetOpDef x86_op_defs[] = {
>      { INDEX_op_add2_i32, { "r", "r", "0", "1", "ri", "ri" } },
>      { INDEX_op_sub2_i32, { "r", "r", "0", "1", "ri", "ri" } },
>
> +    { INDEX_op_mb, { } },
> +
>  #if TCG_TARGET_REG_BITS == 32
>      { INDEX_op_brcond2_i32, { "r", "r", "ri", "ri" } },
>      { INDEX_op_setcond2_i32, { "r", "r", "r", "ri", "ri" } },
> @@ -2362,6 +2404,11 @@ static void tcg_target_init(TCGContext *s)
>             available, we'll use a small forward branch.  */
>          have_cmov = (d & bit_CMOV) != 0;
>  #endif
> +#ifndef have_sse2
> +        /* Likewise, almost all hardware supports SSE2, but we do
> +           have a locked memory operation to use as a substitute.  */
> +        have_sse2 = (d & bit_SSE2) != 0;
> +#endif
>  #ifndef have_movbe
>          /* MOVBE is only available on Intel Atom and Haswell CPUs, so we
>             need to probe for it.  */


--
Alex Bennée

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 02/14] tcg/i386: Add support for fence
  2016-06-22 16:25   ` Alex Bennée
@ 2016-06-22 16:49     ` Richard Henderson
  2016-06-22 18:18       ` Alex Bennée
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Henderson @ 2016-06-22 16:49 UTC (permalink / raw)
  To: Alex Bennée, Pranith Kumar; +Cc: open list:i386 target, serge.fdrv

On 06/22/2016 09:25 AM, Alex Bennée wrote:
> 
> Pranith Kumar <bobby.prani@gmail.com> writes:
> 
>> Generate mfence/sfence/lfence instruction on SSE2 enabled
>> processors. For older processors, generate a 'lock orl $0,0(%esp)'
>> instruction which has full ordering semantics.
>>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> [rth: Check for sse2, fallback to locked memory op otherwise.]
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>  tcg/i386/tcg-target.inc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>>  1 file changed, 47 insertions(+)
>>
>> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
>> index 317484c..0748652 100644
>> --- a/tcg/i386/tcg-target.inc.c
>> +++ b/tcg/i386/tcg-target.inc.c
>> @@ -121,6 +121,16 @@ static bool have_cmov;
>>  # define have_cmov 0
>>  #endif
>>
>> +/* For 32-bit, we are going to attempt to determine at runtime whether
>> +   sse2 support is available.  */
>> +#if TCG_TARGET_REG_BITS == 64 || defined(__SSE2__)
> 
> Hmm checkpatch.pl warns against including architecture specific defines.
> Is the || leg only going to trigger when building 32 bit x86 with custom
> compiler flags to force SSE2 code?

Yes, e.g. -march=native.

I think checkpatch should be ignored in this situation.  There's precedent
elsewhere in the tcg backends.  And it's definitely architecture specific code.
 ;-)


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 02/14] tcg/i386: Add support for fence
  2016-06-22 16:49     ` Richard Henderson
@ 2016-06-22 18:18       ` Alex Bennée
  0 siblings, 0 replies; 56+ messages in thread
From: Alex Bennée @ 2016-06-22 18:18 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Pranith Kumar, open list:i386 target, serge.fdrv


Richard Henderson <rth@twiddle.net> writes:

> On 06/22/2016 09:25 AM, Alex Bennée wrote:
>>
>> Pranith Kumar <bobby.prani@gmail.com> writes:
>>
>>> Generate mfence/sfence/lfence instruction on SSE2 enabled
>>> processors. For older processors, generate a 'lock orl $0,0(%esp)'
>>> instruction which has full ordering semantics.
>>>
>>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>>> [rth: Check for sse2, fallback to locked memory op otherwise.]
>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>> ---
>>>  tcg/i386/tcg-target.inc.c | 47 +++++++++++++++++++++++++++++++++++++++++++++++
>>>  1 file changed, 47 insertions(+)
>>>
>>> diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
>>> index 317484c..0748652 100644
>>> --- a/tcg/i386/tcg-target.inc.c
>>> +++ b/tcg/i386/tcg-target.inc.c
>>> @@ -121,6 +121,16 @@ static bool have_cmov;
>>>  # define have_cmov 0
>>>  #endif
>>>
>>> +/* For 32-bit, we are going to attempt to determine at runtime whether
>>> +   sse2 support is available.  */
>>> +#if TCG_TARGET_REG_BITS == 64 || defined(__SSE2__)
>>
>> Hmm checkpatch.pl warns against including architecture specific defines.
>> Is the || leg only going to trigger when building 32 bit x86 with custom
>> compiler flags to force SSE2 code?
>
> Yes, e.g. -march=native.
>
> I think checkpatch should be ignored in this situation.  There's precedent
> elsewhere in the tcg backends.  And it's definitely architecture specific code.
>  ;-)

Fair enough, I bow to your maintainerly view ;-)

--
Alex Bennée

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 03/14] tcg/aarch64: Add support for fence
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier Pranith Kumar
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 02/14] tcg/i386: Add support for fence Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-23 16:18   ` Alex Bennée
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 04/14] tcg/arm: " Pranith Kumar
                   ` (11 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Claudio Fontana, Richard Henderson, open list:AArch64 target,
	open list:All patches CC here
  Cc: alex.bennee, serge.fdrv, Claudio Fontana

Cc: Claudio Fontana <claudio.fontana@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/aarch64/tcg-target.inc.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1447f7c..bc8ac9c 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -372,6 +372,11 @@ typedef enum {
     I3510_EOR       = 0x4a000000,
     I3510_EON       = 0x4a200000,
     I3510_ANDS      = 0x6a000000,
+
+    /* System instructions.  */
+    DMB_ISH         = 0xd50338bf,
+    DMB_LD          = 0x00000100,
+    DMB_ST          = 0x00000200,
 } AArch64Insn;
 
 static inline uint32_t tcg_in32(TCGContext *s)
@@ -971,6 +976,21 @@ static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
     tcg_out_mov(s, ext, orig_rl, rl);
 }
 
+static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    switch (a0 & TCG_MO_ALL) {
+    case TCG_MO_LD_LD:
+        tcg_out32(s, DMB_ISH | DMB_LD);
+        break;
+    case TCG_MO_ST_ST:
+        tcg_out32(s, DMB_ISH | DMB_ST);
+        break;
+    default:
+        tcg_out32(s, DMB_ISH | DMB_LD | DMB_ST);
+        break;
+    }
+}
+
 #ifdef CONFIG_SOFTMMU
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     TCGMemOpIdx oi, uintptr_t ra)
@@ -1637,6 +1657,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_insn(s, 3508, SMULH, TCG_TYPE_I64, a0, a1, a2);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, a0);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -1761,6 +1785,7 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
     { INDEX_op_muluh_i64, { "r", "r", "r" } },
     { INDEX_op_mulsh_i64, { "r", "r", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 03/14] tcg/aarch64: Add support for fence
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 03/14] tcg/aarch64: " Pranith Kumar
@ 2016-06-23 16:18   ` Alex Bennée
  2016-06-23 16:50     ` Richard Henderson
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Bennée @ 2016-06-23 16:18 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Claudio Fontana, Richard Henderson, open list:AArch64 target,
	open list:All patches CC here, serge.fdrv, Claudio Fontana


Pranith Kumar <bobby.prani@gmail.com> writes:

> Cc: Claudio Fontana <claudio.fontana@gmail.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  tcg/aarch64/tcg-target.inc.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
> index 1447f7c..bc8ac9c 100644
> --- a/tcg/aarch64/tcg-target.inc.c
> +++ b/tcg/aarch64/tcg-target.inc.c
> @@ -372,6 +372,11 @@ typedef enum {
>      I3510_EOR       = 0x4a000000,
>      I3510_EON       = 0x4a200000,
>      I3510_ANDS      = 0x6a000000,
> +
> +    /* System instructions.  */
> +    DMB_ISH         = 0xd50338bf,

As ISH is part of the CRm encoding I wonder if this should be split into
the main DMB encoding (0xd50330bf) and a separate set of CRm defines.

In fact the documentation of the struct above implies you should
probably have:

       I6260_DMB       = 0xd50330bf,

And then:

static void tcg_out_insn_6260(TCGContext *s, AArch64Insn insn, int crm);
{
    tcg_out32(s, insn | (crm & 0xf) << 8);
}


Claudio,

Does this actually gain anything over doing a direct tcg_out? I guess it
makes a bit more sense for more complex instruction encodings.

> +    DMB_LD          = 0x00000100,
> +    DMB_ST          = 0x00000200,
>  } AArch64Insn;
>
>  static inline uint32_t tcg_in32(TCGContext *s)
> @@ -971,6 +976,21 @@ static inline void tcg_out_addsub2(TCGContext *s, int ext, TCGReg rl,
>      tcg_out_mov(s, ext, orig_rl, rl);
>  }
>
> +static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
> +{
> +    switch (a0 & TCG_MO_ALL) {
> +    case TCG_MO_LD_LD:
> +        tcg_out32(s, DMB_ISH | DMB_LD);

This would then become:

    tcg_out_insn(s, 6260, DMB, R_LD)

> +        break;
> +    case TCG_MO_ST_ST:
> +        tcg_out32(s, DMB_ISH | DMB_ST);

    tcg_out_insn(s, 6260, DMB, R_ST)

> +        break;
> +    default:
> +        tcg_out32(s, DMB_ISH | DMB_LD | DMB_ST);

    tcg_out_insn(s, 6260, DMB, R_LD|R_ST)

> +        break;
> +    }
> +}
> +
>  #ifdef CONFIG_SOFTMMU
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     TCGMemOpIdx oi, uintptr_t ra)
> @@ -1637,6 +1657,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_insn(s, 3508, SMULH, TCG_TYPE_I64, a0, a1, a2);
>          break;
>
> +    case INDEX_op_mb:
> +        tcg_out_mb(s, a0);
> +        break;
> +
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
>      case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> @@ -1761,6 +1785,7 @@ static const TCGTargetOpDef aarch64_op_defs[] = {
>      { INDEX_op_muluh_i64, { "r", "r", "r" } },
>      { INDEX_op_mulsh_i64, { "r", "r", "r" } },
>
> +    { INDEX_op_mb, { } },
>      { -1 },
>  };


--
Alex Bennée

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 03/14] tcg/aarch64: Add support for fence
  2016-06-23 16:18   ` Alex Bennée
@ 2016-06-23 16:50     ` Richard Henderson
  2016-06-23 19:58       ` Alex Bennée
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Henderson @ 2016-06-23 16:50 UTC (permalink / raw)
  To: Alex Bennée, Pranith Kumar
  Cc: Claudio Fontana, open list:AArch64 target,
	open list:All patches CC here, serge.fdrv, Claudio Fontana

On 06/23/2016 09:18 AM, Alex Bennée wrote:
>> > +    /* System instructions.  */
>> > +    DMB_ISH         = 0xd50338bf,
> As ISH is part of the CRm encoding I wonder if this should be split into
> the main DMB encoding (0xd50330bf) and a separate set of CRm defines.
>
> In fact the documentation of the struct above implies you should
> probably have:
>
>        I6260_DMB       = 0xd50330bf,
>
> And then:
>
> static void tcg_out_insn_6260(TCGContext *s, AArch64Insn insn, int crm);
> {
>     tcg_out32(s, insn | (crm & 0xf) << 8);
> }

I don't see any benefit to doing this over or-ing in the pre-shifted values.


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 03/14] tcg/aarch64: Add support for fence
  2016-06-23 16:50     ` Richard Henderson
@ 2016-06-23 19:58       ` Alex Bennée
  0 siblings, 0 replies; 56+ messages in thread
From: Alex Bennée @ 2016-06-23 19:58 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Pranith Kumar, Claudio Fontana, open list:AArch64 target,
	open list:All patches CC here, serge.fdrv, Claudio Fontana


Richard Henderson <rth@twiddle.net> writes:

> On 06/23/2016 09:18 AM, Alex Bennée wrote:
>>> > +    /* System instructions.  */
>>> > +    DMB_ISH         = 0xd50338bf,
>> As ISH is part of the CRm encoding I wonder if this should be split into
>> the main DMB encoding (0xd50330bf) and a separate set of CRm defines.
>>
>> In fact the documentation of the struct above implies you should
>> probably have:
>>
>>        I6260_DMB       = 0xd50330bf,
>>
>> And then:
>>
>> static void tcg_out_insn_6260(TCGContext *s, AArch64Insn insn, int crm);
>> {
>>     tcg_out32(s, insn | (crm & 0xf) << 8);
>> }
>
> I don't see any benefit to doing this over or-ing in the pre-shifted
> values.

I'm mainly thinking about keeping in line with the layout of the rest of
the code. I would home the compiler would generate the same either way!

--
Alex Bennée

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 04/14] tcg/arm: Add support for fence
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (2 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 03/14] tcg/aarch64: " Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-23 16:30   ` Alex Bennée
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 05/14] tcg/ia64: " Pranith Kumar
                   ` (10 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Andrzej Zaborowski, Richard Henderson, open list:ARM target,
	open list:All patches CC here
  Cc: alex.bennee, serge.fdrv, Peter Maydell

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.inc.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index f9f54c6..1447aa8 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -313,6 +313,10 @@ typedef enum {
     INSN_LDRD_REG  = 0x000000d0,
     INSN_STRD_IMM  = 0x004000f0,
     INSN_STRD_REG  = 0x000000f0,
+
+    INSN_DMB_ISH   = 0x5bf07ff5,
+    INSN_DMB_MCR   = 0xba0f07ee,
+
 } ARMInsn;
 
 #define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
@@ -1066,6 +1070,15 @@ static inline void tcg_out_goto_label(TCGContext *s, int cond, TCGLabel *l)
     }
 }
 
+static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    if (use_armv7_instructions) {
+        tcg_out32(s, INSN_DMB_ISH);
+    } else if (use_armv6_instructions) {
+        tcg_out32(s, INSN_DMB_MCR);
+    }
+}
+
 #ifdef CONFIG_SOFTMMU
 /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
  *                                     int mmu_idx, uintptr_t ra)
@@ -1923,6 +1936,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_udiv(s, COND_AL, args[0], args[1], args[2]);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, args[0]);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
@@ -1997,6 +2014,7 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_div_i32, { "r", "r", "r" } },
     { INDEX_op_divu_i32, { "r", "r", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 04/14] tcg/arm: Add support for fence
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 04/14] tcg/arm: " Pranith Kumar
@ 2016-06-23 16:30   ` Alex Bennée
  2016-06-23 16:49     ` Richard Henderson
  0 siblings, 1 reply; 56+ messages in thread
From: Alex Bennée @ 2016-06-23 16:30 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Andrzej Zaborowski, Richard Henderson, open list:ARM target,
	open list:All patches CC here, serge.fdrv, Peter Maydell


Pranith Kumar <bobby.prani@gmail.com> writes:

> Cc: Andrzej Zaborowski <balrogg@gmail.com>
> Cc: Peter Maydell <peter.maydell@linaro.org>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/arm/tcg-target.inc.c | 18 ++++++++++++++++++
>  1 file changed, 18 insertions(+)
>
> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
> index f9f54c6..1447aa8 100644
> --- a/tcg/arm/tcg-target.inc.c
> +++ b/tcg/arm/tcg-target.inc.c
> @@ -313,6 +313,10 @@ typedef enum {
>      INSN_LDRD_REG  = 0x000000d0,
>      INSN_STRD_IMM  = 0x004000f0,
>      INSN_STRD_REG  = 0x000000f0,
> +
> +    INSN_DMB_ISH   = 0x5bf07ff5,
> +    INSN_DMB_MCR   = 0xba0f07ee,

Again I think you might want to split the instruction encoding. Also
where did you get these encoding from? Is it right the byte-order has
been reversed if it is being written out by endian aware helpers?

> +
>  } ARMInsn;
>
>  #define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
> @@ -1066,6 +1070,15 @@ static inline void tcg_out_goto_label(TCGContext *s, int cond, TCGLabel *l)
>      }
>  }
>
> +static inline void tcg_out_mb(TCGContext *s, TCGArg a0)
> +{
> +    if (use_armv7_instructions) {
> +        tcg_out32(s, INSN_DMB_ISH);
> +    } else if (use_armv6_instructions) {
> +        tcg_out32(s, INSN_DMB_MCR);
> +    }
> +}
> +
>  #ifdef CONFIG_SOFTMMU
>  /* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
>   *                                     int mmu_idx, uintptr_t ra)
> @@ -1923,6 +1936,10 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
>          tcg_out_udiv(s, COND_AL, args[0], args[1], args[2]);
>          break;
>
> +    case INDEX_op_mb:
> +        tcg_out_mb(s, args[0]);
> +        break;
> +
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
>      case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
> @@ -1997,6 +2014,7 @@ static const TCGTargetOpDef arm_op_defs[] = {
>      { INDEX_op_div_i32, { "r", "r", "r" } },
>      { INDEX_op_divu_i32, { "r", "r", "r" } },
>
> +    { INDEX_op_mb, { } },
>      { -1 },
>  };


--
Alex Bennée

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 04/14] tcg/arm: Add support for fence
  2016-06-23 16:30   ` Alex Bennée
@ 2016-06-23 16:49     ` Richard Henderson
  0 siblings, 0 replies; 56+ messages in thread
From: Richard Henderson @ 2016-06-23 16:49 UTC (permalink / raw)
  To: Alex Bennée, Pranith Kumar
  Cc: Andrzej Zaborowski, open list:ARM target,
	open list:All patches CC here, serge.fdrv, Peter Maydell

On 06/23/2016 09:30 AM, Alex Bennée wrote:
>
> Pranith Kumar <bobby.prani@gmail.com> writes:
>
>> Cc: Andrzej Zaborowski <balrogg@gmail.com>
>> Cc: Peter Maydell <peter.maydell@linaro.org>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> ---
>>  tcg/arm/tcg-target.inc.c | 18 ++++++++++++++++++
>>  1 file changed, 18 insertions(+)
>>
>> diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
>> index f9f54c6..1447aa8 100644
>> --- a/tcg/arm/tcg-target.inc.c
>> +++ b/tcg/arm/tcg-target.inc.c
>> @@ -313,6 +313,10 @@ typedef enum {
>>      INSN_LDRD_REG  = 0x000000d0,
>>      INSN_STRD_IMM  = 0x004000f0,
>>      INSN_STRD_REG  = 0x000000f0,
>> +
>> +    INSN_DMB_ISH   = 0x5bf07ff5,
>> +    INSN_DMB_MCR   = 0xba0f07ee,
>
> Again I think you might want to split the instruction encoding. Also
> where did you get these encoding from? Is it right the byte-order has
> been reversed if it is being written out by endian aware helpers?

I don't think there's any point in separating out ISH, or all the parts of the 
MCR move.

As for the endianness... I must have read the number out of objdump incorrectly.


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 05/14] tcg/ia64: Add support for fence
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (3 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 04/14] tcg/arm: " Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 06/14] tcg/mips: " Pranith Kumar
                   ` (9 subsequent siblings)
  14 siblings, 0 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Aurelien Jarno, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/ia64/tcg-target.inc.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index 395223e..7b220a7 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -247,6 +247,7 @@ enum {
     OPC_LD4_M3                = 0x0a080000000ull,
     OPC_LD8_M1                = 0x080c0000000ull,
     OPC_LD8_M3                = 0x0a0c0000000ull,
+    OPC_MF_M24                = 0x00110000000ull,
     OPC_MUX1_I3               = 0x0eca0000000ull,
     OPC_NOP_B9                = 0x04008000000ull,
     OPC_NOP_F16               = 0x00008000000ull,
@@ -2213,6 +2214,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tcg_out_qemu_st(s, args);
         break;
 
+    case INDEX_op_mb:
+        tcg_out_bundle(s, mmI, OPC_MF_M24, INSN_NOP_M, INSN_NOP_I);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2326,6 +2330,7 @@ static const TCGTargetOpDef ia64_op_defs[] = {
     { INDEX_op_qemu_st_i32, { "SZ", "r" } },
     { INDEX_op_qemu_st_i64, { "SZ", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 06/14] tcg/mips: Add support for fence
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (4 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 05/14] tcg/ia64: " Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 07/14] tcg/ppc: " Pranith Kumar
                   ` (8 subsequent siblings)
  14 siblings, 0 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Aurelien Jarno, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/mips/tcg-target.inc.c | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 50e98ea..fb6cb3e 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -292,6 +292,7 @@ typedef enum {
     OPC_JALR     = OPC_SPECIAL | 0x09,
     OPC_MOVZ     = OPC_SPECIAL | 0x0A,
     OPC_MOVN     = OPC_SPECIAL | 0x0B,
+    OPC_SYNC     = OPC_SPECIAL | 0x0F,
     OPC_MFHI     = OPC_SPECIAL | 0x10,
     OPC_MFLO     = OPC_SPECIAL | 0x12,
     OPC_MULT     = OPC_SPECIAL | 0x18,
@@ -1636,6 +1637,9 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
                         const_args[4], const_args[5], true);
         break;
 
+    case INDEX_op_mb:
+        tcg_out32(s, OPC_SYNC);
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
     case INDEX_op_call:     /* Always emitted via tcg_out_call.  */
@@ -1716,6 +1720,8 @@ static const TCGTargetOpDef mips_op_defs[] = {
     { INDEX_op_qemu_ld_i64, { "L", "L", "lZ", "lZ" } },
     { INDEX_op_qemu_st_i64, { "SZ", "SZ", "SZ", "SZ" } },
 #endif
+
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 07/14] tcg/ppc: Add support for fence
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (5 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 06/14] tcg/mips: " Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-22 19:50   ` Sergey Fedorov
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 08/14] tcg/s390: " Pranith Kumar
                   ` (7 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Vassili Karpov (malc), Richard Henderson,
	open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/ppc/tcg-target.inc.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)

diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index da10052..766848e 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -469,6 +469,10 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
 #define STHX   XO31(407)
 #define STWX   XO31(151)
 
+#define EIEIO  XO31(854)
+#define HWSYNC XO31(598)
+#define LWSYNC (HWSYNC | (1u << 21))
+
 #define SPR(a, b) ((((a)<<5)|(b))<<11)
 #define LR     SPR(8, 0)
 #define CTR    SPR(9, 0)
@@ -1237,6 +1241,21 @@ static void tcg_out_brcond2 (TCGContext *s, const TCGArg *args,
     tcg_out_bc(s, BC | BI(7, CR_EQ) | BO_COND_TRUE, arg_label(args[5]));
 }
 
+static void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    switch (a0 & TCG_MO_ALL) {
+    case TCG_MO_LD_LD:
+        tcg_out32(s, LWSYNC);
+        break;
+    case TCG_MO_ST_ST:
+        tcg_out32(s, EIEIO);
+        break;
+    default:
+        tcg_out32(s, HWSYNC);
+        break;
+    }
+}
+
 #ifdef __powerpc64__
 void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
 {
@@ -2439,6 +2458,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         tcg_out32(s, MULHD | TAB(args[0], args[1], args[2]));
         break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, args[0]);
+        break;
+
     case INDEX_op_mov_i32:   /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32:  /* Always emitted via tcg_out_movi.  */
@@ -2586,6 +2609,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_qemu_st_i64, { "S", "S", "S", "S" } },
 #endif
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 07/14] tcg/ppc: Add support for fence
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 07/14] tcg/ppc: " Pranith Kumar
@ 2016-06-22 19:50   ` Sergey Fedorov
  2016-06-22 20:21     ` Richard Henderson
  2016-06-23 14:42     ` Sergey Fedorov
  0 siblings, 2 replies; 56+ messages in thread
From: Sergey Fedorov @ 2016-06-22 19:50 UTC (permalink / raw)
  To: Pranith Kumar, Vassili Karpov (malc), Richard Henderson,
	open list:All patches CC here
  Cc: alex.bennee

On 18/06/16 07:03, Pranith Kumar wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  tcg/ppc/tcg-target.inc.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
> index da10052..766848e 100644
> --- a/tcg/ppc/tcg-target.inc.c
> +++ b/tcg/ppc/tcg-target.inc.c
> @@ -469,6 +469,10 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
>  #define STHX   XO31(407)
>  #define STWX   XO31(151)
>  
> +#define EIEIO  XO31(854)
> +#define HWSYNC XO31(598)
> +#define LWSYNC (HWSYNC | (1u << 21))
> +
>  #define SPR(a, b) ((((a)<<5)|(b))<<11)
>  #define LR     SPR(8, 0)
>  #define CTR    SPR(9, 0)
> @@ -1237,6 +1241,21 @@ static void tcg_out_brcond2 (TCGContext *s, const TCGArg *args,
>      tcg_out_bc(s, BC | BI(7, CR_EQ) | BO_COND_TRUE, arg_label(args[5]));
>  }
>  
> +static void tcg_out_mb(TCGContext *s, TCGArg a0)
> +{
> +    switch (a0 & TCG_MO_ALL) {
> +    case TCG_MO_LD_LD:
> +        tcg_out32(s, LWSYNC);

lwsync can be used for all cases except store-load which requires
hwsync. eieio is for synchronizing memory-mapped IO which is not used by
TCG at all. So there should be:

    switch (a0 & TCG_MO_ALL) {
    case TCG_MO_ST_LD:
        tcg_out32(s, HWSYNC);
        break;
    default:
        tcg_out32(s, LWSYNC);
        break;
    }

Kind regards,
Sergey

> +        break;
> +    case TCG_MO_ST_ST:
> +        tcg_out32(s, EIEIO);
> +        break;
> +    default:
> +        tcg_out32(s, HWSYNC);
> +        break;
> +    }
> +}
> +
>  #ifdef __powerpc64__
>  void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
>  {
> @@ -2439,6 +2458,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>          tcg_out32(s, MULHD | TAB(args[0], args[1], args[2]));
>          break;
>  
> +    case INDEX_op_mb:
> +        tcg_out_mb(s, args[0]);
> +        break;
> +
>      case INDEX_op_mov_i32:   /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
>      case INDEX_op_movi_i32:  /* Always emitted via tcg_out_movi.  */
> @@ -2586,6 +2609,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
>      { INDEX_op_qemu_st_i64, { "S", "S", "S", "S" } },
>  #endif
>  
> +    { INDEX_op_mb, { } },
>      { -1 },
>  };
>  

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 07/14] tcg/ppc: Add support for fence
  2016-06-22 19:50   ` Sergey Fedorov
@ 2016-06-22 20:21     ` Richard Henderson
  2016-06-22 20:27       ` Sergey Fedorov
  2016-06-23 14:42     ` Sergey Fedorov
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Henderson @ 2016-06-22 20:21 UTC (permalink / raw)
  To: Sergey Fedorov, Pranith Kumar, Vassili Karpov (malc),
	open list:All patches CC here
  Cc: alex.bennee

On 06/22/2016 12:50 PM, Sergey Fedorov wrote:
> On 18/06/16 07:03, Pranith Kumar wrote:
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> ---
>>  tcg/ppc/tcg-target.inc.c | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
>> index da10052..766848e 100644
>> --- a/tcg/ppc/tcg-target.inc.c
>> +++ b/tcg/ppc/tcg-target.inc.c
>> @@ -469,6 +469,10 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
>>  #define STHX   XO31(407)
>>  #define STWX   XO31(151)
>>  
>> +#define EIEIO  XO31(854)
>> +#define HWSYNC XO31(598)
>> +#define LWSYNC (HWSYNC | (1u << 21))
>> +
>>  #define SPR(a, b) ((((a)<<5)|(b))<<11)
>>  #define LR     SPR(8, 0)
>>  #define CTR    SPR(9, 0)
>> @@ -1237,6 +1241,21 @@ static void tcg_out_brcond2 (TCGContext *s, const TCGArg *args,
>>      tcg_out_bc(s, BC | BI(7, CR_EQ) | BO_COND_TRUE, arg_label(args[5]));
>>  }
>>  
>> +static void tcg_out_mb(TCGContext *s, TCGArg a0)
>> +{
>> +    switch (a0 & TCG_MO_ALL) {
>> +    case TCG_MO_LD_LD:
>> +        tcg_out32(s, LWSYNC);
> 
> lwsync can be used for all cases except store-load which requires
> hwsync. eieio is for synchronizing memory-mapped IO which is not used by
> TCG at all. 

Have a look through linux/arch/powerpc/include/asm/barrier.h wherein we find

  # The eieio instruction is a barrier providing an ordering ...
  # for (a) cacheable stores ....

  # However, on CPUs that don't support lwsync, lwsync actually maps to a
  # heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.

And elsewhere,

#if defined(__powerpc64__) || defined(CONFIG_PPC_E500MC)
#define __SUBARCH_HAS_LWSYNC
#endif

Which suggests that ppc64 should use lwsync and ppc32 should use eieio for ST_ST.


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 07/14] tcg/ppc: Add support for fence
  2016-06-22 20:21     ` Richard Henderson
@ 2016-06-22 20:27       ` Sergey Fedorov
  0 siblings, 0 replies; 56+ messages in thread
From: Sergey Fedorov @ 2016-06-22 20:27 UTC (permalink / raw)
  To: Richard Henderson, Pranith Kumar, Vassili Karpov (malc),
	open list:All patches CC here
  Cc: alex.bennee

On 22/06/16 23:21, Richard Henderson wrote:
> On 06/22/2016 12:50 PM, Sergey Fedorov wrote:
>> On 18/06/16 07:03, Pranith Kumar wrote:
>>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>>> ---
>>>  tcg/ppc/tcg-target.inc.c | 24 ++++++++++++++++++++++++
>>>  1 file changed, 24 insertions(+)
>>>
>>> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
>>> index da10052..766848e 100644
>>> --- a/tcg/ppc/tcg-target.inc.c
>>> +++ b/tcg/ppc/tcg-target.inc.c
>>> @@ -469,6 +469,10 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
>>>  #define STHX   XO31(407)
>>>  #define STWX   XO31(151)
>>>  
>>> +#define EIEIO  XO31(854)
>>> +#define HWSYNC XO31(598)
>>> +#define LWSYNC (HWSYNC | (1u << 21))
>>> +
>>>  #define SPR(a, b) ((((a)<<5)|(b))<<11)
>>>  #define LR     SPR(8, 0)
>>>  #define CTR    SPR(9, 0)
>>> @@ -1237,6 +1241,21 @@ static void tcg_out_brcond2 (TCGContext *s, const TCGArg *args,
>>>      tcg_out_bc(s, BC | BI(7, CR_EQ) | BO_COND_TRUE, arg_label(args[5]));
>>>  }
>>>  
>>> +static void tcg_out_mb(TCGContext *s, TCGArg a0)
>>> +{
>>> +    switch (a0 & TCG_MO_ALL) {
>>> +    case TCG_MO_LD_LD:
>>> +        tcg_out32(s, LWSYNC);
>> lwsync can be used for all cases except store-load which requires
>> hwsync. eieio is for synchronizing memory-mapped IO which is not used by
>> TCG at all. 
> Have a look through linux/arch/powerpc/include/asm/barrier.h wherein we find
>
>   # The eieio instruction is a barrier providing an ordering ...
>   # for (a) cacheable stores ....
>
>   # However, on CPUs that don't support lwsync, lwsync actually maps to a
>   # heavy-weight sync, so smp_wmb() can be a lighter-weight eieio.
>
> And elsewhere,
>
> #if defined(__powerpc64__) || defined(CONFIG_PPC_E500MC)
> #define __SUBARCH_HAS_LWSYNC
> #endif
>
> Which suggests that ppc64 should use lwsync and ppc32 should use eieio for ST_ST.

Hmm, it's always useful to look at Linux kernel :) Looks like we should
also make this check and use eieio in place of lwsync if it's not supported.

Thanks,
Sergey

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 07/14] tcg/ppc: Add support for fence
  2016-06-22 19:50   ` Sergey Fedorov
  2016-06-22 20:21     ` Richard Henderson
@ 2016-06-23 14:42     ` Sergey Fedorov
  1 sibling, 0 replies; 56+ messages in thread
From: Sergey Fedorov @ 2016-06-23 14:42 UTC (permalink / raw)
  To: Pranith Kumar, Vassili Karpov (malc), Richard Henderson,
	open list:All patches CC here
  Cc: alex.bennee

On 22/06/16 22:50, Sergey Fedorov wrote:
> On 18/06/16 07:03, Pranith Kumar wrote:
>> Signed-off-by: Richard Henderson <rth@twiddle.net>
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> ---
>>  tcg/ppc/tcg-target.inc.c | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
>> index da10052..766848e 100644
>> --- a/tcg/ppc/tcg-target.inc.c
>> +++ b/tcg/ppc/tcg-target.inc.c
>> @@ -469,6 +469,10 @@ static int tcg_target_const_match(tcg_target_long val, TCGType type,
>>  #define STHX   XO31(407)
>>  #define STWX   XO31(151)
>>  
>> +#define EIEIO  XO31(854)
>> +#define HWSYNC XO31(598)
>> +#define LWSYNC (HWSYNC | (1u << 21))
>> +
>>  #define SPR(a, b) ((((a)<<5)|(b))<<11)
>>  #define LR     SPR(8, 0)
>>  #define CTR    SPR(9, 0)
>> @@ -1237,6 +1241,21 @@ static void tcg_out_brcond2 (TCGContext *s, const TCGArg *args,
>>      tcg_out_bc(s, BC | BI(7, CR_EQ) | BO_COND_TRUE, arg_label(args[5]));
>>  }
>>  
>> +static void tcg_out_mb(TCGContext *s, TCGArg a0)
>> +{
>> +    switch (a0 & TCG_MO_ALL) {
>> +    case TCG_MO_LD_LD:
>> +        tcg_out32(s, LWSYNC);
> lwsync can be used for all cases except store-load which requires
> hwsync. eieio is for synchronizing memory-mapped IO which is not used by
> TCG at all. 

However, if we think about the possibility to make read HW passthrough
in TCG mode, that doesn't seem impossible. Probably we'd like to take
care of this and introduce some support of memory barrier which can
order IO memory access and normal guest memory access.

Kind regards,
Sergey

> So there should be:
>
>     switch (a0 & TCG_MO_ALL) {
>     case TCG_MO_ST_LD:
>         tcg_out32(s, HWSYNC);
>         break;
>     default:
>         tcg_out32(s, LWSYNC);
>         break;
>     }
>
> Kind regards,
> Sergey
>
>> +        break;
>> +    case TCG_MO_ST_ST:
>> +        tcg_out32(s, EIEIO);
>> +        break;
>> +    default:
>> +        tcg_out32(s, HWSYNC);
>> +        break;
>> +    }
>> +}
>> +
>>  #ifdef __powerpc64__
>>  void ppc_tb_set_jmp_target(uintptr_t jmp_addr, uintptr_t addr)
>>  {
>> @@ -2439,6 +2458,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>>          tcg_out32(s, MULHD | TAB(args[0], args[1], args[2]));
>>          break;
>>  
>> +    case INDEX_op_mb:
>> +        tcg_out_mb(s, args[0]);
>> +        break;
>> +
>>      case INDEX_op_mov_i32:   /* Always emitted via tcg_out_mov.  */
>>      case INDEX_op_mov_i64:
>>      case INDEX_op_movi_i32:  /* Always emitted via tcg_out_movi.  */
>> @@ -2586,6 +2609,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
>>      { INDEX_op_qemu_st_i64, { "S", "S", "S", "S" } },
>>  #endif
>>  
>> +    { INDEX_op_mb, { } },
>>      { -1 },
>>  };
>>  

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 08/14] tcg/s390: Add support for fence
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (6 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 07/14] tcg/ppc: " Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-21  7:26   ` Paolo Bonzini
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 09/14] tcg/sparc: " Pranith Kumar
                   ` (6 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Alexander Graf, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Cc: Alexander Graf <agraf@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/s390/tcg-target.inc.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index e0a60e6..b83b65b 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -343,6 +343,7 @@ static tcg_insn_unit *tb_ret_addr;
 #define FACILITY_EXT_IMM	(1ULL << (63 - 21))
 #define FACILITY_GEN_INST_EXT	(1ULL << (63 - 34))
 #define FACILITY_LOAD_ON_COND   (1ULL << (63 - 45))
+#define FACILITY_FAST_BCR_SER   FACILITY_LOAD_ON_COND
 
 static uint64_t facilities;
 
@@ -2165,6 +2166,13 @@ static inline void tcg_out_op(TCGContext *s, TCGOpcode opc,
         tgen_deposit(s, args[0], args[2], args[3], args[4]);
         break;
 
+    case INDEX_op_mb:
+        /* The host memory model is quite strong, we simply need to
+           serialize the instruction stream.  */
+        tcg_out_insn(s, RR, BCR,
+		     facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -2286,6 +2294,7 @@ static const TCGTargetOpDef s390_op_defs[] = {
     { INDEX_op_movcond_i64, { "r", "r", "rC", "r", "0" } },
     { INDEX_op_deposit_i64, { "r", "0", "r" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 08/14] tcg/s390: Add support for fence
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 08/14] tcg/s390: " Pranith Kumar
@ 2016-06-21  7:26   ` Paolo Bonzini
  0 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2016-06-21  7:26 UTC (permalink / raw)
  To: Pranith Kumar, Alexander Graf, Richard Henderson,
	open list:All patches CC here
  Cc: serge.fdrv, alex.bennee



On 18/06/2016 06:03, Pranith Kumar wrote:
> +    case INDEX_op_mb:
> +        /* The host memory model is quite strong, we simply need to
> +           serialize the instruction stream.  */
> +        tcg_out_insn(s, RR, BCR,
> +		     facilities & FACILITY_FAST_BCR_SER ? 14 : 15, 0);
> +        break;
> +

Only for st-ld here, too.

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 09/14] tcg/sparc: Add support for fence
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (7 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 08/14] tcg/s390: " Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-22 19:56   ` Sergey Fedorov
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 10/14] tcg/tci: " Pranith Kumar
                   ` (5 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Blue Swirl, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/sparc/tcg-target.inc.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 9938a50..af8a300 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -249,6 +249,8 @@ static const int tcg_target_call_oarg_regs[] = {
 #define STWA       (INSN_OP(3) | INSN_OP3(0x14))
 #define STXA       (INSN_OP(3) | INSN_OP3(0x1e))
 
+#define MEMBAR     (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(15) | (1 << 13))
+
 #ifndef ASI_PRIMARY_LITTLE
 #define ASI_PRIMARY_LITTLE 0x88
 #endif
@@ -825,6 +827,24 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *dest)
     tcg_out_nop(s);
 }
 
+static void tcg_out_mb(TCGContext *s, TCGArg a0)
+{
+    uint8_t bar_opcode;
+    switch (a0 & TCG_MO_ALL) {
+    case TCG_MO_LD_LD:
+        bar_opcode = 0x5; 
+        break;
+    case TCG_MO_ST_ST:
+        bar_opcode = 0xa;
+        break;
+    default:
+        /* #StoreLoad | #LoadStore */
+        bar_opcode = 0xf;
+        break;
+    }
+    tcg_out32(s, MEMBAR | bar_opcode);
+}
+
 #ifdef CONFIG_SOFTMMU
 static tcg_insn_unit *qemu_ld_trampoline[16];
 static tcg_insn_unit *qemu_st_trampoline[16];
@@ -1450,6 +1470,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
 	tcg_out_arithc(s, a0, TCG_REG_G0, a1, const_args[1], c);
 	break;
 
+    case INDEX_op_mb:
+        tcg_out_mb(s, a0);
+        break;
+
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
@@ -1551,6 +1575,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
     { INDEX_op_qemu_st_i32, { "sZ", "A" } },
     { INDEX_op_qemu_st_i64, { "SZ", "A" } },
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 09/14] tcg/sparc: Add support for fence
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 09/14] tcg/sparc: " Pranith Kumar
@ 2016-06-22 19:56   ` Sergey Fedorov
  0 siblings, 0 replies; 56+ messages in thread
From: Sergey Fedorov @ 2016-06-22 19:56 UTC (permalink / raw)
  To: Pranith Kumar, Blue Swirl, Richard Henderson,
	open list:All patches CC here
  Cc: alex.bennee

On 18/06/16 07:03, Pranith Kumar wrote:
> Cc: Blue Swirl <blauwirbel@gmail.com>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  tcg/sparc/tcg-target.inc.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
> index 9938a50..af8a300 100644
> --- a/tcg/sparc/tcg-target.inc.c
> +++ b/tcg/sparc/tcg-target.inc.c
> @@ -249,6 +249,8 @@ static const int tcg_target_call_oarg_regs[] = {
>  #define STWA       (INSN_OP(3) | INSN_OP3(0x14))
>  #define STXA       (INSN_OP(3) | INSN_OP3(0x1e))
>  
> +#define MEMBAR     (INSN_OP(2) | INSN_OP3(0x28) | INSN_RS1(15) | (1 << 13))
> +
>  #ifndef ASI_PRIMARY_LITTLE
>  #define ASI_PRIMARY_LITTLE 0x88
>  #endif
> @@ -825,6 +827,24 @@ static void tcg_out_call(TCGContext *s, tcg_insn_unit *dest)
>      tcg_out_nop(s);
>  }
>  
> +static void tcg_out_mb(TCGContext *s, TCGArg a0)
> +{
> +    uint8_t bar_opcode;
> +    switch (a0 & TCG_MO_ALL) {
> +    case TCG_MO_LD_LD:
> +        bar_opcode = 0x5; 
> +        break;
> +    case TCG_MO_ST_ST:
> +        bar_opcode = 0xa;
> +        break;
> +    default:
> +        /* #StoreLoad | #LoadStore */
> +        bar_opcode = 0xf;
> +        break;
> +    }
> +    tcg_out32(s, MEMBAR | bar_opcode);

Why don't translate each TCG mb flag to corresponding Sparc MEMBAR mmask
bit exactly here?

Kind regards,
Sergey

> +}
> +
>  #ifdef CONFIG_SOFTMMU
>  static tcg_insn_unit *qemu_ld_trampoline[16];
>  static tcg_insn_unit *qemu_st_trampoline[16];
> @@ -1450,6 +1470,10 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc,
>  	tcg_out_arithc(s, a0, TCG_REG_G0, a1, const_args[1], c);
>  	break;
>  
> +    case INDEX_op_mb:
> +        tcg_out_mb(s, a0);
> +        break;
> +
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
>      case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> @@ -1551,6 +1575,7 @@ static const TCGTargetOpDef sparc_op_defs[] = {
>      { INDEX_op_qemu_st_i32, { "sZ", "A" } },
>      { INDEX_op_qemu_st_i64, { "SZ", "A" } },
>  
> +    { INDEX_op_mb, { } },
>      { -1 },
>  };
>  

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 10/14] tcg/tci: Add support for fence
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (8 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 09/14] tcg/sparc: " Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-22 19:57   ` Sergey Fedorov
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 11/14] target-arm: Generate fences in ARMv7 frontend Pranith Kumar
                   ` (4 subsequent siblings)
  14 siblings, 1 reply; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Stefan Weil, Richard Henderson, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Cc: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 tcg/tci/tcg-target.inc.c | 3 +++
 tci.c                    | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index fa74d52..8e950df 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -255,6 +255,7 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
     { INDEX_op_bswap32_i32, { R, R } },
 #endif
 
+    { INDEX_op_mb, { } },
     { -1 },
 };
 
@@ -800,6 +801,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
         }
         tcg_out_i(s, *args++);
         break;
+    case INDEX_op_mb:
+        break;
     case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
     case INDEX_op_mov_i64:
     case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
diff --git a/tci.c b/tci.c
index b488c0d..4081e61 100644
--- a/tci.c
+++ b/tci.c
@@ -1236,6 +1236,9 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
                 tcg_abort();
             }
             break;
+        case INDEX_op_mb:
+            smp_mb();
+            break;
         default:
             TODO();
             break;
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 10/14] tcg/tci: Add support for fence
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 10/14] tcg/tci: " Pranith Kumar
@ 2016-06-22 19:57   ` Sergey Fedorov
  2016-06-22 20:25     ` Richard Henderson
  0 siblings, 1 reply; 56+ messages in thread
From: Sergey Fedorov @ 2016-06-22 19:57 UTC (permalink / raw)
  To: Pranith Kumar, Stefan Weil, Richard Henderson,
	open list:All patches CC here
  Cc: alex.bennee

On 18/06/16 07:03, Pranith Kumar wrote:
> Cc: Stefan Weil <sw@weilnetz.de>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  tcg/tci/tcg-target.inc.c | 3 +++
>  tci.c                    | 3 +++
>  2 files changed, 6 insertions(+)
>
> diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
> index fa74d52..8e950df 100644
> --- a/tcg/tci/tcg-target.inc.c
> +++ b/tcg/tci/tcg-target.inc.c
> @@ -255,6 +255,7 @@ static const TCGTargetOpDef tcg_target_op_defs[] = {
>      { INDEX_op_bswap32_i32, { R, R } },
>  #endif
>  
> +    { INDEX_op_mb, { } },
>      { -1 },
>  };
>  
> @@ -800,6 +801,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
>          }
>          tcg_out_i(s, *args++);
>          break;
> +    case INDEX_op_mb:
> +        break;
>      case INDEX_op_mov_i32:  /* Always emitted via tcg_out_mov.  */
>      case INDEX_op_mov_i64:
>      case INDEX_op_movi_i32: /* Always emitted via tcg_out_movi.  */
> diff --git a/tci.c b/tci.c
> index b488c0d..4081e61 100644
> --- a/tci.c
> +++ b/tci.c
> @@ -1236,6 +1236,9 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
>                  tcg_abort();
>              }
>              break;
> +        case INDEX_op_mb:
> +            smp_mb();

I think we could distinguish smp_rmb(), smp_wmb() and smp_mb() here.

Kind regards,
Sergey

> +            break;
>          default:
>              TODO();
>              break;

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 10/14] tcg/tci: Add support for fence
  2016-06-22 19:57   ` Sergey Fedorov
@ 2016-06-22 20:25     ` Richard Henderson
  2016-06-22 20:28       ` Sergey Fedorov
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Henderson @ 2016-06-22 20:25 UTC (permalink / raw)
  To: Sergey Fedorov, Pranith Kumar, Stefan Weil,
	open list:All patches CC here
  Cc: alex.bennee

On 06/22/2016 12:57 PM, Sergey Fedorov wrote:
>> > @@ -1236,6 +1236,9 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
>> >                  tcg_abort();
>> >              }
>> >              break;
>> > +        case INDEX_op_mb:
>> > +            smp_mb();
> I think we could distinguish smp_rmb(), smp_wmb() and smp_mb() here.

I don't.  The extra unpredictable branch is not likely to be worthwhile.

If we want anything finer grained for tci, would should revisit the complete
rewrite that I did a year to two ago.


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 10/14] tcg/tci: Add support for fence
  2016-06-22 20:25     ` Richard Henderson
@ 2016-06-22 20:28       ` Sergey Fedorov
  0 siblings, 0 replies; 56+ messages in thread
From: Sergey Fedorov @ 2016-06-22 20:28 UTC (permalink / raw)
  To: Richard Henderson, Pranith Kumar, Stefan Weil,
	open list:All patches CC here
  Cc: alex.bennee

On 22/06/16 23:25, Richard Henderson wrote:
> On 06/22/2016 12:57 PM, Sergey Fedorov wrote:
>>>> @@ -1236,6 +1236,9 @@ uintptr_t tcg_qemu_tb_exec(CPUArchState *env, uint8_t *tb_ptr)
>>>>                  tcg_abort();
>>>>              }
>>>>              break;
>>>> +        case INDEX_op_mb:
>>>> +            smp_mb();
>> I think we could distinguish smp_rmb(), smp_wmb() and smp_mb() here.
> I don't.  The extra unpredictable branch is not likely to be worthwhile.
>
> If we want anything finer grained for tci, would should revisit the complete
> rewrite that I did a year to two ago.
>

Fair enough.

Thanks,
Sergey

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 11/14] target-arm: Generate fences in ARMv7 frontend
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (9 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 10/14] tcg/tci: " Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 12/14] target-alpha: Generate fence op Pranith Kumar
                   ` (3 subsequent siblings)
  14 siblings, 0 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Peter Maydell, open list:ARM, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv, rth

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 target-arm/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-arm/translate.c b/target-arm/translate.c
index e525f1e..012e450 100644
--- a/target-arm/translate.c
+++ b/target-arm/translate.c
@@ -8054,7 +8054,7 @@ static void disas_arm_insn(DisasContext *s, unsigned int insn)
             case 4: /* dsb */
             case 5: /* dmb */
                 ARCH(7);
-                /* We don't emulate caches so these are a no-op.  */
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
                 return;
             case 6: /* isb */
                 /* We need to break the TB after this insn to execute
@@ -10403,7 +10403,7 @@ static int disas_thumb2_insn(CPUARMState *env, DisasContext *s, uint16_t insn_hw
                             break;
                         case 4: /* dsb */
                         case 5: /* dmb */
-                            /* These execute as NOPs.  */
+                            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
                             break;
                         case 6: /* isb */
                             /* We need to break the TB after this insn
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 12/14] target-alpha: Generate fence op
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (10 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 11/14] target-arm: Generate fences in ARMv7 frontend Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 13/14] aarch64: Generate fences for aarch64 Pranith Kumar
                   ` (2 subsequent siblings)
  14 siblings, 0 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Richard Henderson, open list:All patches CC here; +Cc: alex.bennee, serge.fdrv

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 target-alpha/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target-alpha/translate.c b/target-alpha/translate.c
index 76dab15..f0bba40 100644
--- a/target-alpha/translate.c
+++ b/target-alpha/translate.c
@@ -2334,11 +2334,11 @@ static ExitStatus translate_one(DisasContext *ctx, uint32_t insn)
             break;
         case 0x4000:
             /* MB */
-            /* No-op */
+            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
             break;
         case 0x4400:
             /* WMB */
-            /* No-op */
+            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
             break;
         case 0x8000:
             /* FETCH */
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 13/14] aarch64: Generate fences for aarch64
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (11 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 12/14] target-alpha: Generate fence op Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-24 16:17   ` Alex Bennée
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86 Pranith Kumar
  2016-06-18  4:08 ` [Qemu-devel] [RFC v3 PATCH 00/14] tcg: Add fence gen support Pranith Kumar
  14 siblings, 1 reply; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Peter Maydell, open list:ARM, open list:All patches CC here
  Cc: alex.bennee, serge.fdrv, rth

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 target-arm/translate-a64.c | 18 +++++++++++++-----
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
index ce8141a..fa24bf2 100644
--- a/target-arm/translate-a64.c
+++ b/target-arm/translate-a64.c
@@ -1250,7 +1250,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
         return;
     case 4: /* DSB */
     case 5: /* DMB */
-        /* We don't emulate caches so barriers are no-ops */
+        tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
         return;
     case 6: /* ISB */
         /* We need to break the TB after this insn to execute
@@ -1855,23 +1855,31 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
     }
     tcg_addr = read_cpu_reg_sp(s, rn, 1);
 
-    /* Note that since TCG is single threaded load-acquire/store-release
-     * semantics require no extra if (is_lasr) { ... } handling.
-     */
-
     if (is_excl) {
         if (!is_store) {
             s->is_ldex = true;
             gen_load_exclusive(s, rt, rt2, tcg_addr, size, is_pair);
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_ACQ);
+            }
         } else {
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_REL);
+            }
             gen_store_exclusive(s, rs, rt, rt2, tcg_addr, size, is_pair);
         }
     } else {
         TCGv_i64 tcg_rt = cpu_reg(s, rt);
         if (is_store) {
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_REL);
+            }
             do_gpr_st(s, tcg_rt, tcg_addr, size);
         } else {
             do_gpr_ld(s, tcg_rt, tcg_addr, size, false, false);
+            if (is_lasr) {
+                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_ACQ);
+            }
         }
     }
 }
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 13/14] aarch64: Generate fences for aarch64
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 13/14] aarch64: Generate fences for aarch64 Pranith Kumar
@ 2016-06-24 16:17   ` Alex Bennée
  0 siblings, 0 replies; 56+ messages in thread
From: Alex Bennée @ 2016-06-24 16:17 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Peter Maydell, open list:ARM, open list:All patches CC here,
	serge.fdrv, rth


Pranith Kumar <bobby.prani@gmail.com> writes:

> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  target-arm/translate-a64.c | 18 +++++++++++++-----
>  1 file changed, 13 insertions(+), 5 deletions(-)
>
> diff --git a/target-arm/translate-a64.c b/target-arm/translate-a64.c
> index ce8141a..fa24bf2 100644
> --- a/target-arm/translate-a64.c
> +++ b/target-arm/translate-a64.c
> @@ -1250,7 +1250,7 @@ static void handle_sync(DisasContext *s, uint32_t insn,
>          return;
>      case 4: /* DSB */
>      case 5: /* DMB */
> -        /* We don't emulate caches so barriers are no-ops */
> +        tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
>          return;
>      case 6: /* ISB */
>          /* We need to break the TB after this insn to execute
> @@ -1855,23 +1855,31 @@ static void disas_ldst_excl(DisasContext *s, uint32_t insn)
>      }
>      tcg_addr = read_cpu_reg_sp(s, rn, 1);
>
> -    /* Note that since TCG is single threaded load-acquire/store-release
> -     * semantics require no extra if (is_lasr) { ... } handling.
> -     */
> -
>      if (is_excl) {
>          if (!is_store) {
>              s->is_ldex = true;
>              gen_load_exclusive(s, rt, rt2, tcg_addr, size, is_pair);
> +            if (is_lasr) {
> +                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_ACQ);
> +            }
>          } else {
> +            if (is_lasr) {
> +                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_REL);
> +            }
>              gen_store_exclusive(s, rs, rt, rt2, tcg_addr, size, is_pair);
>          }
>      } else {
>          TCGv_i64 tcg_rt = cpu_reg(s, rt);
>          if (is_store) {
> +            if (is_lasr) {
> +                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_REL);
> +            }
>              do_gpr_st(s, tcg_rt, tcg_addr, size);
>          } else {
>              do_gpr_ld(s, tcg_rt, tcg_addr, size, false, false);
> +            if (is_lasr) {
> +                tcg_gen_mb(TCG_MO_ALL | TCG_BAR_ACQ);
> +            }

See the private email I sent you with the litmus tests. I think you'll
need to confirm this is working as expected.

>          }
>      }
>  }


--
Alex Bennée

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (12 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 13/14] aarch64: Generate fences for aarch64 Pranith Kumar
@ 2016-06-18  4:03 ` Pranith Kumar
  2016-06-18  5:48   ` Richard Henderson
  2016-06-21  7:28   ` Paolo Bonzini
  2016-06-18  4:08 ` [Qemu-devel] [RFC v3 PATCH 00/14] tcg: Add fence gen support Pranith Kumar
  14 siblings, 2 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:03 UTC (permalink / raw)
  To: Paolo Bonzini, Richard Henderson, Eduardo Habkost,
	open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
---
 target-i386/translate.c | 4 ++++
 1 file changed, 4 insertions(+)

diff --git a/target-i386/translate.c b/target-i386/translate.c
index bf33e6b..32b0f5c 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8012,13 +8012,17 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 || (prefixes & PREFIX_LOCK)) {
                 goto illegal_op;
             }
+            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
             break;
         case 0xe8 ... 0xef: /* lfence */
+            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
+            break;
         case 0xf0 ... 0xf7: /* mfence */
             if (!(s->cpuid_features & CPUID_SSE2)
                 || (prefixes & PREFIX_LOCK)) {
                 goto illegal_op;
             }
+            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
             break;
 
         default:
-- 
2.9.0

^ permalink raw reply related	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86 Pranith Kumar
@ 2016-06-18  5:48   ` Richard Henderson
  2016-06-20 15:05     ` Pranith Kumar
  2016-06-21  7:28   ` Paolo Bonzini
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Henderson @ 2016-06-18  5:48 UTC (permalink / raw)
  To: Pranith Kumar, Paolo Bonzini, Eduardo Habkost,
	open list:All patches CC here
  Cc: alex.bennee, serge.fdrv

On 06/17/2016 09:03 PM, Pranith Kumar wrote:
>          case 0xe8 ... 0xef: /* lfence */
> +            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
> +            break;
>          case 0xf0 ... 0xf7: /* mfence */
>              if (!(s->cpuid_features & CPUID_SSE2)
>                  || (prefixes & PREFIX_LOCK)) {
>                  goto illegal_op;
>              }
> +            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);

You need to duplicate the sse2 check if you split lfence from mfence.


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-18  5:48   ` Richard Henderson
@ 2016-06-20 15:05     ` Pranith Kumar
  0 siblings, 0 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-20 15:05 UTC (permalink / raw)
  To: Richard Henderson
  Cc: Paolo Bonzini, Eduardo Habkost, open list:All patches CC here,
	Alex Bennée, Sergey Fedorov

On Sat, Jun 18, 2016 at 1:48 AM, Richard Henderson <rth@twiddle.net> wrote:
> On 06/17/2016 09:03 PM, Pranith Kumar wrote:
>>
>>          case 0xe8 ... 0xef: /* lfence */
>> +            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
>> +            break;
>>          case 0xf0 ... 0xf7: /* mfence */
>>              if (!(s->cpuid_features & CPUID_SSE2)
>>                  || (prefixes & PREFIX_LOCK)) {
>>                  goto illegal_op;
>>              }
>> +            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
>
>
> You need to duplicate the sse2 check if you split lfence from mfence.
>

Yes, I missed this one. I will fix it.

Thanks for the review.

-- 
Pranith

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86 Pranith Kumar
  2016-06-18  5:48   ` Richard Henderson
@ 2016-06-21  7:28   ` Paolo Bonzini
  2016-06-21 15:57     ` Richard Henderson
  2016-06-21 17:28     ` Pranith Kumar
  1 sibling, 2 replies; 56+ messages in thread
From: Paolo Bonzini @ 2016-06-21  7:28 UTC (permalink / raw)
  To: Pranith Kumar, Richard Henderson, Eduardo Habkost,
	open list:All patches CC here
  Cc: serge.fdrv, alex.bennee



On 18/06/2016 06:03, Pranith Kumar wrote:
> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
> ---
>  target-i386/translate.c | 4 ++++
>  1 file changed, 4 insertions(+)
> 
> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index bf33e6b..32b0f5c 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -8012,13 +8012,17 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>                  || (prefixes & PREFIX_LOCK)) {
>                  goto illegal_op;
>              }
> +            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
>              break;
>          case 0xe8 ... 0xef: /* lfence */
> +            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
> +            break;

These are unnecessary.  On the other hand, _each and every load_ must be
followed by a LD_LD | LD_ST barrier, and each and every store must be
preceded by a LD_ST | ST_ST barrier.

Paolo

>          case 0xf0 ... 0xf7: /* mfence */
>              if (!(s->cpuid_features & CPUID_SSE2)
>                  || (prefixes & PREFIX_LOCK)) {
>                  goto illegal_op;
>              }
> +            tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
>              break;
>  
>          default:
> 

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-21  7:28   ` Paolo Bonzini
@ 2016-06-21 15:57     ` Richard Henderson
  2016-06-21 16:12       ` Paolo Bonzini
  2016-06-21 17:28     ` Pranith Kumar
  1 sibling, 1 reply; 56+ messages in thread
From: Richard Henderson @ 2016-06-21 15:57 UTC (permalink / raw)
  To: Paolo Bonzini, Pranith Kumar, Eduardo Habkost,
	open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 06/21/2016 12:28 AM, Paolo Bonzini wrote:
>
>
> On 18/06/2016 06:03, Pranith Kumar wrote:
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> ---
>>  target-i386/translate.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/target-i386/translate.c b/target-i386/translate.c
>> index bf33e6b..32b0f5c 100644
>> --- a/target-i386/translate.c
>> +++ b/target-i386/translate.c
>> @@ -8012,13 +8012,17 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>>                  || (prefixes & PREFIX_LOCK)) {
>>                  goto illegal_op;
>>              }
>> +            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
>>              break;
>>          case 0xe8 ... 0xef: /* lfence */
>> +            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
>> +            break;
>
> These are unnecessary.  On the other hand, _each and every load_ must be
> followed by a LD_LD | LD_ST barrier, and each and every store must be
> preceded by a LD_ST | ST_ST barrier.

They're not unnecessary if we (1) add those barriers for normal loads and 
stores and (2) omit them from the non-temporal loads and stores.

In the meantime they're good documentation.


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-21 15:57     ` Richard Henderson
@ 2016-06-21 16:12       ` Paolo Bonzini
  2016-06-21 16:23         ` Richard Henderson
  0 siblings, 1 reply; 56+ messages in thread
From: Paolo Bonzini @ 2016-06-21 16:12 UTC (permalink / raw)
  To: Richard Henderson, Pranith Kumar, Eduardo Habkost,
	open list:All patches CC here
  Cc: serge.fdrv, alex.bennee



On 21/06/2016 17:57, Richard Henderson wrote:
>>>
>>>                  || (prefixes & PREFIX_LOCK)) {
>>>                  goto illegal_op;
>>>              }
>>> +            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
>>>              break;
>>>          case 0xe8 ... 0xef: /* lfence */
>>> +            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
>>> +            break;
>>
>> These are unnecessary.  On the other hand, _each and every load_ must be
>> followed by a LD_LD | LD_ST barrier, and each and every store must be
>> preceded by a LD_ST | ST_ST barrier.
> 
> They're not unnecessary if we (1) add those barriers for normal loads
> and stores and (2) omit them from the non-temporal loads and stores.

When does TCG generate non-temporal loads and stores?

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-21 16:12       ` Paolo Bonzini
@ 2016-06-21 16:23         ` Richard Henderson
  2016-06-21 16:33           ` Paolo Bonzini
  0 siblings, 1 reply; 56+ messages in thread
From: Richard Henderson @ 2016-06-21 16:23 UTC (permalink / raw)
  To: Paolo Bonzini, Pranith Kumar, Eduardo Habkost,
	open list:All patches CC here
  Cc: serge.fdrv, alex.bennee

On 06/21/2016 09:12 AM, Paolo Bonzini wrote:
>
>
> On 21/06/2016 17:57, Richard Henderson wrote:
>>>>
>>>>                  || (prefixes & PREFIX_LOCK)) {
>>>>                  goto illegal_op;
>>>>              }
>>>> +            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
>>>>              break;
>>>>          case 0xe8 ... 0xef: /* lfence */
>>>> +            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
>>>> +            break;
>>>
>>> These are unnecessary.  On the other hand, _each and every load_ must be
>>> followed by a LD_LD | LD_ST barrier, and each and every store must be
>>> preceded by a LD_ST | ST_ST barrier.
>>
>> They're not unnecessary if we (1) add those barriers for normal loads
>> and stores and (2) omit them from the non-temporal loads and stores.
>
> When does TCG generate non-temporal loads and stores?

I was talking about the guest non-temporal loads and stores.


r~

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-21 16:23         ` Richard Henderson
@ 2016-06-21 16:33           ` Paolo Bonzini
  0 siblings, 0 replies; 56+ messages in thread
From: Paolo Bonzini @ 2016-06-21 16:33 UTC (permalink / raw)
  To: Richard Henderson, Pranith Kumar, Eduardo Habkost,
	open list:All patches CC here
  Cc: serge.fdrv, alex.bennee



On 21/06/2016 18:23, Richard Henderson wrote:
> On 06/21/2016 09:12 AM, Paolo Bonzini wrote:
>>
>>
>> On 21/06/2016 17:57, Richard Henderson wrote:
>>>>>
>>>>>                  || (prefixes & PREFIX_LOCK)) {
>>>>>                  goto illegal_op;
>>>>>              }
>>>>> +            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
>>>>>              break;
>>>>>          case 0xe8 ... 0xef: /* lfence */
>>>>> +            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
>>>>> +            break;
>>>>
>>>> These are unnecessary.  On the other hand, _each and every load_
>>>> must be
>>>> followed by a LD_LD | LD_ST barrier, and each and every store must be
>>>> preceded by a LD_ST | ST_ST barrier.
>>>
>>> They're not unnecessary if we (1) add those barriers for normal loads
>>> and stores and (2) omit them from the non-temporal loads and stores.
>>
>> When does TCG generate non-temporal loads and stores?
> 
> I was talking about the guest non-temporal loads and stores.

Oh, you're right---sorry, I confused target-i386 and tcg/i386.

Thanks,

Paolo

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-21  7:28   ` Paolo Bonzini
  2016-06-21 15:57     ` Richard Henderson
@ 2016-06-21 17:28     ` Pranith Kumar
  2016-06-21 17:54       ` Peter Maydell
  1 sibling, 1 reply; 56+ messages in thread
From: Pranith Kumar @ 2016-06-21 17:28 UTC (permalink / raw)
  To: Paolo Bonzini
  Cc: Richard Henderson, Eduardo Habkost, open list:All patches CC here,
	Sergey Fedorov, Alex Bennée

On Tue, Jun 21, 2016 at 3:28 AM, Paolo Bonzini <pbonzini@redhat.com> wrote:
>
>
> On 18/06/2016 06:03, Pranith Kumar wrote:
>> Signed-off-by: Pranith Kumar <bobby.prani@gmail.com>
>> ---
>>  target-i386/translate.c | 4 ++++
>>  1 file changed, 4 insertions(+)
>>
>> diff --git a/target-i386/translate.c b/target-i386/translate.c
>> index bf33e6b..32b0f5c 100644
>> --- a/target-i386/translate.c
>> +++ b/target-i386/translate.c
>> @@ -8012,13 +8012,17 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
>>                  || (prefixes & PREFIX_LOCK)) {
>>                  goto illegal_op;
>>              }
>> +            tcg_gen_mb(TCG_MO_ST_ST | TCG_BAR_SC);
>>              break;
>>          case 0xe8 ... 0xef: /* lfence */
>> +            tcg_gen_mb(TCG_MO_LD_LD | TCG_BAR_SC);
>> +            break;
>
> These are unnecessary.  On the other hand, _each and every load_ must be
> followed by a LD_LD | LD_ST barrier, and each and every store must be
> preceded by a LD_ST | ST_ST barrier.
>

Reg. the second point, I did consider this situation of running x86 on
ARM where such barriers are necessary for correctness. But, I am
really apprehensive of the cost it will impose. I am not sure if there
are any alternative solutions to avoid generating barriers for each
memory operation, but it would be great if we could reduce them.

Thanks,
-- 
Pranith

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-21 17:28     ` Pranith Kumar
@ 2016-06-21 17:54       ` Peter Maydell
  2016-06-21 18:03         ` Pranith Kumar
  0 siblings, 1 reply; 56+ messages in thread
From: Peter Maydell @ 2016-06-21 17:54 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Paolo Bonzini, open list:All patches CC here, Sergey Fedorov,
	Alex Bennée, Eduardo Habkost, Richard Henderson

On 21 June 2016 at 18:28, Pranith Kumar <bobby.prani@gmail.com> wrote:
> Reg. the second point, I did consider this situation of running x86 on
> ARM where such barriers are necessary for correctness. But, I am
> really apprehensive of the cost it will impose. I am not sure if there
> are any alternative solutions to avoid generating barriers for each
> memory operation, but it would be great if we could reduce them.

I vaguely recall an idea that you could avoid needing
explicit barriers by turning all the guest load/stores into
host load-acquire/store-release, but I have no idea whether
that's (a) actually true (b) any better than piles of
explicit barriers.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-21 17:54       ` Peter Maydell
@ 2016-06-21 18:03         ` Pranith Kumar
  2016-06-21 18:25           ` Alex Bennée
  2016-06-22 11:18           ` Sergey Fedorov
  0 siblings, 2 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-21 18:03 UTC (permalink / raw)
  To: Peter Maydell
  Cc: Paolo Bonzini, open list:All patches CC here, Sergey Fedorov,
	Alex Bennée, Eduardo Habkost, Richard Henderson

On Tue, Jun 21, 2016 at 1:54 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
> On 21 June 2016 at 18:28, Pranith Kumar <bobby.prani@gmail.com> wrote:
>> Reg. the second point, I did consider this situation of running x86 on
>> ARM where such barriers are necessary for correctness. But, I am
>> really apprehensive of the cost it will impose. I am not sure if there
>> are any alternative solutions to avoid generating barriers for each
>> memory operation, but it would be great if we could reduce them.
>
> I vaguely recall an idea that you could avoid needing
> explicit barriers by turning all the guest load/stores into
> host load-acquire/store-release, but I have no idea whether
> that's (a) actually true (b) any better than piles of
> explicit barriers.

Yes, this is true for ARMv8(not sure about ia64). The
load-acquire/store-release operations are sequentially consistent to
each other. But this does not work for ARMv7 and as you said... I
think the cost here too is really prohibitive.

-- 
Pranith

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-21 18:03         ` Pranith Kumar
@ 2016-06-21 18:25           ` Alex Bennée
  2016-06-22 11:18           ` Sergey Fedorov
  1 sibling, 0 replies; 56+ messages in thread
From: Alex Bennée @ 2016-06-21 18:25 UTC (permalink / raw)
  To: Pranith Kumar
  Cc: Peter Maydell, Paolo Bonzini, open list:All patches CC here,
	Sergey Fedorov, Eduardo Habkost, Richard Henderson


Pranith Kumar <bobby.prani@gmail.com> writes:

> On Tue, Jun 21, 2016 at 1:54 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 21 June 2016 at 18:28, Pranith Kumar <bobby.prani@gmail.com> wrote:
>>> Reg. the second point, I did consider this situation of running x86 on
>>> ARM where such barriers are necessary for correctness. But, I am
>>> really apprehensive of the cost it will impose. I am not sure if there
>>> are any alternative solutions to avoid generating barriers for each
>>> memory operation, but it would be great if we could reduce them.
>>
>> I vaguely recall an idea that you could avoid needing
>> explicit barriers by turning all the guest load/stores into
>> host load-acquire/store-release, but I have no idea whether
>> that's (a) actually true (b) any better than piles of
>> explicit barriers.
>
> Yes, this is true for ARMv8(not sure about ia64). The
> load-acquire/store-release operations are sequentially consistent to
> each other. But this does not work for ARMv7 and as you said... I
> think the cost here too is really prohibitive.

If the cost ends up being too prohibitive (as in -smp N runs slower and
slower) then we may just keep -accel tcg,thread=single the default for
that combination.

We need some hard numbers either way.

--
Alex Bennée

^ permalink raw reply	[flat|nested] 56+ messages in thread

* Re: [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86
  2016-06-21 18:03         ` Pranith Kumar
  2016-06-21 18:25           ` Alex Bennée
@ 2016-06-22 11:18           ` Sergey Fedorov
  1 sibling, 0 replies; 56+ messages in thread
From: Sergey Fedorov @ 2016-06-22 11:18 UTC (permalink / raw)
  To: Pranith Kumar, Peter Maydell
  Cc: Paolo Bonzini, open list:All patches CC here, Alex Bennée,
	Eduardo Habkost, Richard Henderson

On 21/06/16 21:03, Pranith Kumar wrote:
> On Tue, Jun 21, 2016 at 1:54 PM, Peter Maydell <peter.maydell@linaro.org> wrote:
>> On 21 June 2016 at 18:28, Pranith Kumar <bobby.prani@gmail.com> wrote:
>>> Reg. the second point, I did consider this situation of running x86 on
>>> ARM where such barriers are necessary for correctness. But, I am
>>> really apprehensive of the cost it will impose. I am not sure if there
>>> are any alternative solutions to avoid generating barriers for each
>>> memory operation, but it would be great if we could reduce them.
>> I vaguely recall an idea that you could avoid needing
>> explicit barriers by turning all the guest load/stores into
>> host load-acquire/store-release, but I have no idea whether
>> that's (a) actually true (b) any better than piles of
>> explicit barriers.
> Yes, this is true for ARMv8(not sure about ia64). The
> load-acquire/store-release operations are sequentially consistent to
> each other. But this does not work for ARMv7 and as you said... I
> think the cost here too is really prohibitive.

As I understand, there's no requirement for sequential consistency even
on a systems with pretty strong memory model such as x86. Due to the
presence of store queue, earlier regular stores are allowed to be
completed after the following regular loads. This relaxation breaks
sequential consistency requirement, if I understand correctly, since it
allows a CPU to see its own stores with respect to other CPU stores in
different order. However, such a model can perfectly match
acquire/release semantics, even as it is defined by Itanium memory
model. Lets break it down:
(1) if a load-acquire must not be reordered with any subsequent loads
and stores,
(2) and if a store-release must not be reordered with any preceding
loads and stores,
(3) thus if all loads are load-acquires and all stores are
store-releases, then the only possible reordering can be a store-release
reordered after the subsequent load-acquire.

Considering this, I think that strongly-ordered memory model semantics
such (as in x86 memory model) can be translated directly into relaxed
acquire/release memory model semantics (as in Itanium memory model or a
bit more strong ARMv8). And I believe this can perform better than
inserting separate memory barriers on those architectures which provide
acquire/release semantics since it is more relaxed and permit certain
hardware optimizations like store-after-load reordering.

Kind regards,
Sergey

^ permalink raw reply	[flat|nested] 56+ messages in thread

* [Qemu-devel] [RFC v3 PATCH 00/14] tcg: Add fence gen support
       [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
                   ` (13 preceding siblings ...)
  2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86 Pranith Kumar
@ 2016-06-18  4:08 ` Pranith Kumar
  14 siblings, 0 replies; 56+ messages in thread
From: Pranith Kumar @ 2016-06-18  4:08 UTC (permalink / raw)
  Cc: alex.bennee, serge.fdrv, rth, qemu-devel

Hello,

The following series adds fence instruction generation support to
TCG. Based on feedback to the last series, I added the four
combinations of orderings modeled after Sparc membar.

This has been tested and confirmed to fix ordering issues on
x86/armv7/aarch64 hosts with MTTCG enabled for an ARMv7 guest using
KVM unit tests.

TODO:

* The acquire/release order is not utilized yet. Currently we generate
  SC barriers even for acquire/release barriers. The idea is to write
  a pass which combines acquire/release barrier with its corresponding
  load/store operation to generate the load acquire/store release
  instruction on hosts which have such instruction(aarch64 for
  now). Also the pass should try to optimize some trivial scenarios
  like removing unnecessary barrier instructions.

* Complete support for fence generation in other architectures.

v3:

- Create different types of barriers. The barrier tcg opcode now takes
  an argument to generate the appropriate barrier instruction.
- Also add acquire/release/sc ordering flag to argument.

v2:

- Rebase on Richard's patches generating fences for other
  architectures.

v1:

- Initial version: Introduce memory barrier tcg opcode.

Pranith Kumar (14):
  Introduce TCGOpcode for memory barrier
  tcg/i386: Add support for fence
  tcg/aarch64: Add support for fence
  tcg/arm: Add support for fence
  tcg/ia64: Add support for fence
  tcg/mips: Add support for fence
  tcg/ppc: Add support for fence
  tcg/s390: Add support for fence
  tcg/sparc: Add support for fence
  tcg/tci: Add support for fence
  target-arm: Generate fences in ARMv7 frontend
  target-alpha: Generate fence op
  aarch64: Generate fences for aarch64
  target-i386: Generate fences for x86

 target-alpha/translate.c     |  4 ++--
 target-arm/translate-a64.c   | 18 ++++++++++++-----
 target-arm/translate.c       |  4 ++--
 target-i386/translate.c      |  4 ++++
 tcg/README                   | 17 ++++++++++++++++
 tcg/aarch64/tcg-target.inc.c | 25 +++++++++++++++++++++++
 tcg/arm/tcg-target.inc.c     | 18 +++++++++++++++++
 tcg/i386/tcg-target.inc.c    | 47 ++++++++++++++++++++++++++++++++++++++++++++
 tcg/ia64/tcg-target.inc.c    |  5 +++++
 tcg/mips/tcg-target.inc.c    |  6 ++++++
 tcg/ppc/tcg-target.inc.c     | 24 ++++++++++++++++++++++
 tcg/s390/tcg-target.inc.c    |  9 +++++++++
 tcg/sparc/tcg-target.inc.c   | 25 +++++++++++++++++++++++
 tcg/tcg-op.c                 | 11 +++++++++++
 tcg/tcg-op.h                 |  2 ++
 tcg/tcg-opc.h                |  2 ++
 tcg/tcg.h                    | 14 +++++++++++++
 tcg/tci/tcg-target.inc.c     |  3 +++
 tci.c                        |  3 +++
 19 files changed, 232 insertions(+), 9 deletions(-)

-- 
2.9.0

^ permalink raw reply	[flat|nested] 56+ messages in thread

end of thread, other threads:[~2016-06-24 16:17 UTC | newest]

Thread overview: 56+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20160618040343.19517-1-bobby.prani@gmail.com>
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 01/14] Introduce TCGOpcode for memory barrier Pranith Kumar
2016-06-20 21:21   ` Sergey Fedorov
2016-06-21 14:52     ` Pranith Kumar
2016-06-21 15:09       ` Alex Bennée
2016-06-21 18:06         ` Pranith Kumar
2016-06-22 15:50       ` Sergey Fedorov
2016-06-21  7:30   ` Paolo Bonzini
2016-06-21 18:04   ` Alex Bennée
2016-06-21 18:09     ` Pranith Kumar
2016-06-21 18:23       ` Alex Bennée
2016-06-21 19:40         ` Richard Henderson
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 02/14] tcg/i386: Add support for fence Pranith Kumar
2016-06-21  7:24   ` Paolo Bonzini
2016-06-22 16:25   ` Alex Bennée
2016-06-22 16:49     ` Richard Henderson
2016-06-22 18:18       ` Alex Bennée
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 03/14] tcg/aarch64: " Pranith Kumar
2016-06-23 16:18   ` Alex Bennée
2016-06-23 16:50     ` Richard Henderson
2016-06-23 19:58       ` Alex Bennée
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 04/14] tcg/arm: " Pranith Kumar
2016-06-23 16:30   ` Alex Bennée
2016-06-23 16:49     ` Richard Henderson
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 05/14] tcg/ia64: " Pranith Kumar
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 06/14] tcg/mips: " Pranith Kumar
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 07/14] tcg/ppc: " Pranith Kumar
2016-06-22 19:50   ` Sergey Fedorov
2016-06-22 20:21     ` Richard Henderson
2016-06-22 20:27       ` Sergey Fedorov
2016-06-23 14:42     ` Sergey Fedorov
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 08/14] tcg/s390: " Pranith Kumar
2016-06-21  7:26   ` Paolo Bonzini
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 09/14] tcg/sparc: " Pranith Kumar
2016-06-22 19:56   ` Sergey Fedorov
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 10/14] tcg/tci: " Pranith Kumar
2016-06-22 19:57   ` Sergey Fedorov
2016-06-22 20:25     ` Richard Henderson
2016-06-22 20:28       ` Sergey Fedorov
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 11/14] target-arm: Generate fences in ARMv7 frontend Pranith Kumar
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 12/14] target-alpha: Generate fence op Pranith Kumar
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 13/14] aarch64: Generate fences for aarch64 Pranith Kumar
2016-06-24 16:17   ` Alex Bennée
2016-06-18  4:03 ` [Qemu-devel] [RFC v3 PATCH 14/14] target-i386: Generate fences for x86 Pranith Kumar
2016-06-18  5:48   ` Richard Henderson
2016-06-20 15:05     ` Pranith Kumar
2016-06-21  7:28   ` Paolo Bonzini
2016-06-21 15:57     ` Richard Henderson
2016-06-21 16:12       ` Paolo Bonzini
2016-06-21 16:23         ` Richard Henderson
2016-06-21 16:33           ` Paolo Bonzini
2016-06-21 17:28     ` Pranith Kumar
2016-06-21 17:54       ` Peter Maydell
2016-06-21 18:03         ` Pranith Kumar
2016-06-21 18:25           ` Alex Bennée
2016-06-22 11:18           ` Sergey Fedorov
2016-06-18  4:08 ` [Qemu-devel] [RFC v3 PATCH 00/14] tcg: Add fence gen support Pranith Kumar

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).