* [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
In-Reply-To: [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
This allows faults from MO_ALIGN to have the same effect
as from gen_check_align.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/internal.h | 5 +++++
target/ppc/excp_helper.c | 18 +++++++++++++++++-
target/ppc/translate_init.inc.c | 1 +
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 1f441c6483..a9bcadff42 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -252,4 +252,9 @@ static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
void helper_compute_fprf_float32(CPUPPCState *env, float32 arg);
void helper_compute_fprf_float128(CPUPPCState *env, float128 arg);
+
+/* Raise a data fault alignment exception for the specified virtual address */
+void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
+ MMUAccessType access_type,
+ int mmu_idx, uintptr_t retaddr);
#endif /* PPC_INTERNAL_H */
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index c092fbead0..d6e97a90e0 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -22,7 +22,7 @@
#include "exec/helper-proto.h"
#include "exec/exec-all.h"
#include "exec/cpu_ldst.h"
-
+#include "internal.h"
#include "helper_regs.h"
//#define DEBUG_OP
@@ -1198,3 +1198,19 @@ void helper_book3s_msgsnd(target_ulong rb)
qemu_mutex_unlock_iothread();
}
#endif
+
+void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr vaddr,
+ MMUAccessType access_type,
+ int mmu_idx, uintptr_t retaddr)
+{
+ CPUPPCState *env = cs->env_ptr;
+ uint32_t insn;
+
+ /* Restore state and reload the insn we executed, for filling in DSISR. */
+ cpu_restore_state(cs, retaddr, true);
+ insn = cpu_ldl_code(env, env->nip);
+
+ cs->exception_index = POWERPC_EXCP_ALIGN;
+ env->error_code = insn & 0x03FF0000;
+ cpu_loop_exit(cs);
+}
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 76d6f3fd5e..7813b1b004 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -10457,6 +10457,7 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
cc->set_pc = ppc_cpu_set_pc;
cc->gdb_read_register = ppc_cpu_gdb_read_register;
cc->gdb_write_register = ppc_cpu_gdb_write_register;
+ cc->do_unaligned_access = ppc_cpu_do_unaligned_access;
#ifdef CONFIG_USER_ONLY
cc->handle_mmu_fault = ppc_cpu_handle_mmu_fault;
#else
--
2.17.1
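For context, the hook added above is invoked from the generic softmmu
slow path when a memop carries alignment bits. A simplified sketch of
that dispatch (not part of the patch; names follow accel/tcg of this
era and details may differ):

    /* Sketch: inside the softmmu load/store helper.  When the memop
     * carries alignment bits (MO_ALIGN*), a misaligned address is
     * routed to the per-CPU hook rather than handled inline. */
    static void check_alignment(CPUState *cs, target_ulong addr,
                                TCGMemOpIdx oi, MMUAccessType access_type,
                                int mmu_idx, uintptr_t retaddr)
    {
        unsigned a_bits = get_alignment_bits(get_memop(oi));

        if (a_bits && (addr & ((1 << a_bits) - 1))) {
            /* For target/ppc this is now ppc_cpu_do_unaligned_access,
             * which raises POWERPC_EXCP_ALIGN as gen_check_align did. */
            cpu_unaligned_access(cs, addr, access_type, mmu_idx, retaddr);
        }
    }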
* Re: [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
From: David Gibson @ 2018-06-27 9:09 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Tue, Jun 26, 2018 at 09:19:09AM -0700, Richard Henderson wrote:
> This allows faults from MO_ALIGN to have the same effect
> as from gen_check_align.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
So, most powerpc cpus can handle most unaligned accesses without an
exception. I'm assuming this series won't preclude that?
> ---
> target/ppc/internal.h | 5 +++++
> target/ppc/excp_helper.c | 18 +++++++++++++++++-
> target/ppc/translate_init.inc.c | 1 +
> 3 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/target/ppc/internal.h b/target/ppc/internal.h
> index 1f441c6483..a9bcadff42 100644
> --- a/target/ppc/internal.h
> +++ b/target/ppc/internal.h
> @@ -252,4 +252,9 @@ static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
> void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
> void helper_compute_fprf_float32(CPUPPCState *env, float32 arg);
> void helper_compute_fprf_float128(CPUPPCState *env, float128 arg);
> +
> +/* Raise a data fault alignment exception for the specified virtual address */
> +void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
> + MMUAccessType access_type,
> + int mmu_idx, uintptr_t retaddr);
> #endif /* PPC_INTERNAL_H */
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index c092fbead0..d6e97a90e0 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -22,7 +22,7 @@
> #include "exec/helper-proto.h"
> #include "exec/exec-all.h"
> #include "exec/cpu_ldst.h"
> -
> +#include "internal.h"
> #include "helper_regs.h"
>
> //#define DEBUG_OP
> @@ -1198,3 +1198,19 @@ void helper_book3s_msgsnd(target_ulong rb)
> qemu_mutex_unlock_iothread();
> }
> #endif
> +
> +void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr vaddr,
> + MMUAccessType access_type,
> + int mmu_idx, uintptr_t retaddr)
> +{
> + CPUPPCState *env = cs->env_ptr;
> + uint32_t insn;
> +
> + /* Restore state and reload the insn we executed, for filling in DSISR. */
> + cpu_restore_state(cs, retaddr, true);
> + insn = cpu_ldl_code(env, env->nip);
> +
> + cs->exception_index = POWERPC_EXCP_ALIGN;
> + env->error_code = insn & 0x03FF0000;
> + cpu_loop_exit(cs);
> +}
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 76d6f3fd5e..7813b1b004 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -10457,6 +10457,7 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
> cc->set_pc = ppc_cpu_set_pc;
> cc->gdb_read_register = ppc_cpu_gdb_read_register;
> cc->gdb_write_register = ppc_cpu_gdb_write_register;
> + cc->do_unaligned_access = ppc_cpu_do_unaligned_access;
> #ifdef CONFIG_USER_ONLY
> cc->handle_mmu_fault = ppc_cpu_handle_mmu_fault;
> #else
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
From: Richard Henderson @ 2018-06-27 13:52 UTC
To: David Gibson; +Cc: qemu-devel, qemu-ppc
On 06/27/2018 02:09 AM, David Gibson wrote:
> On Tue, Jun 26, 2018 at 09:19:09AM -0700, Richard Henderson wrote:
>> This allows faults from MO_ALIGN to have the same effect
>> as from gen_check_align.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
> So, most powerpc cpus can handle most unaligned accesses without an
> exception. I'm assuming this series won't preclude that?
Correct. This hook will only fire when using MO_ALIGN to
request an alignment check.
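Concretely, the difference at a call site looks like this (a sketch
assembled from the operands used later in this series; the exact
memops vary per insn):

    /* Before: alignment enforced by an explicit translate-time check. */
    gen_check_align(ctx, EA, 15);
    tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ);

    /* After: the requirement rides along in the memop; a misaligned
     * access faults through the new do_unaligned_access hook. */
    tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ | MO_ALIGN_16);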
r~
* Re: [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
From: David Gibson @ 2018-06-28 3:46 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Wed, Jun 27, 2018 at 06:52:49AM -0700, Richard Henderson wrote:
> On 06/27/2018 02:09 AM, David Gibson wrote:
> > On Tue, Jun 26, 2018 at 09:19:09AM -0700, Richard Henderson wrote:
> >> This allows faults from MO_ALIGN to have the same effect
> >> as from gen_check_align.
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >
> > So, most powerpc cpus can handle most unaligned accesses without an
> > exception. I'm assuming this series won't preclude that?
>
> Correct. This hook will only fire when using MO_ALIGN to
> request an alignment check.
Thanks for the confirmation, first patch applied to ppc-for-3.0.
Continuing to look at the rest.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
Section 1.4 of the Power ISA v3.0B states that both of these
instructions are single-copy atomic. As we cannot (yet) issue
128-bit loads within TCG, use the generic helpers provided.
Since TCG cannot (yet) return a 128-bit value, add a slot within
CPUPPCState for returning the high half of a 128-bit return value.
This solution is preferred to the helper assigning to architectural
registers directly, as it avoids clobbering all TCG live values.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/cpu.h | 3 ++
target/ppc/helper.h | 5 +++
target/ppc/mem_helper.c | 20 ++++++++-
target/ppc/translate.c | 93 ++++++++++++++++++++++++++++++-----------
4 files changed, 95 insertions(+), 26 deletions(-)
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index c7f3fb6b73..973cf44cda 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1015,6 +1015,9 @@ struct CPUPPCState {
/* Next instruction pointer */
target_ulong nip;
+ /* High part of 128-bit helper return. */
+ uint64_t retxh;
+
int access_type; /* when a memory exception occurs, the access
type is stored here */
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index d751f0e219..3f451a5d7e 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -799,3 +799,8 @@ DEF_HELPER_4(dscliq, void, env, fprp, fprp, i32)
DEF_HELPER_1(tbegin, void, env)
DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
+
+#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
+DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
+DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
+#endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index a34e604db3..44a8f3445a 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -21,9 +21,9 @@
#include "exec/exec-all.h"
#include "qemu/host-utils.h"
#include "exec/helper-proto.h"
-
#include "helper_regs.h"
#include "exec/cpu_ldst.h"
+#include "tcg.h"
#include "internal.h"
//#define DEBUG_OP
@@ -215,6 +215,24 @@ target_ulong helper_lscbx(CPUPPCState *env, target_ulong addr, uint32_t reg,
return i;
}
+#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
+uint64_t helper_lq_le_parallel(CPUPPCState *env, target_ulong addr,
+ uint32_t opidx)
+{
+ Int128 ret = helper_atomic_ldo_le_mmu(env, addr, opidx, GETPC());
+ env->retxh = int128_gethi(ret);
+ return int128_getlo(ret);
+}
+
+uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
+ uint32_t opidx)
+{
+ Int128 ret = helper_atomic_ldo_be_mmu(env, addr, opidx, GETPC());
+ env->retxh = int128_gethi(ret);
+ return int128_getlo(ret);
+}
+#endif
+
/*****************************************************************************/
/* Altivec extension helpers */
#if defined(HOST_WORDS_BIGENDIAN)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 3a215a1dc6..0923cc24e3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2607,7 +2607,7 @@ static void gen_ld(DisasContext *ctx)
static void gen_lq(DisasContext *ctx)
{
int ra, rd;
- TCGv EA;
+ TCGv EA, hi, lo;
/* lq is a legal user mode instruction starting in ISA 2.07 */
bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
@@ -2633,16 +2633,35 @@ static void gen_lq(DisasContext *ctx)
EA = tcg_temp_new();
gen_addr_imm_index(ctx, EA, 0x0F);
- /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
- necessary 64-bit byteswap already. */
- if (unlikely(ctx->le_mode)) {
- gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
+ /* Note that the low part is always in RD+1, even in LE mode. */
+ lo = cpu_gpr[rd + 1];
+ hi = cpu_gpr[rd];
+
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC128
+ TCGv_i32 oi = tcg_temp_new_i32();
+ if (ctx->le_mode) {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
+ gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
+ } else {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
+ gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
+ }
+ tcg_temp_free_i32(oi);
+ tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
+#else
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
+#endif
+ } else if (ctx->le_mode) {
+ tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ);
gen_addr_add(ctx, EA, EA, 8);
- gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
+ tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
} else {
- gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
+ tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ);
gen_addr_add(ctx, EA, EA, 8);
- gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
+ tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
}
tcg_temp_free(EA);
}
@@ -3236,9 +3255,8 @@ STCX(stdcx_, DEF_MEMOP(MO_Q))
/* lqarx */
static void gen_lqarx(DisasContext *ctx)
{
- TCGv EA;
int rd = rD(ctx->opcode);
- TCGv gpr1, gpr2;
+ TCGv EA, hi, lo;
if (unlikely((rd & 1) || (rd == rA(ctx->opcode)) ||
(rd == rB(ctx->opcode)))) {
@@ -3247,24 +3265,49 @@ static void gen_lqarx(DisasContext *ctx)
}
gen_set_access_type(ctx, ACCESS_RES);
- EA = tcg_temp_local_new();
+ EA = tcg_temp_new();
gen_addr_reg_index(ctx, EA);
- gen_check_align(ctx, EA, 15);
- if (unlikely(ctx->le_mode)) {
- gpr1 = cpu_gpr[rd+1];
- gpr2 = cpu_gpr[rd];
- } else {
- gpr1 = cpu_gpr[rd];
- gpr2 = cpu_gpr[rd+1];
- }
- tcg_gen_qemu_ld_i64(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
- tcg_gen_mov_tl(cpu_reserve, EA);
- gen_addr_add(ctx, EA, EA, 8);
- tcg_gen_qemu_ld_i64(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
- tcg_gen_st_tl(gpr1, cpu_env, offsetof(CPUPPCState, reserve_val));
- tcg_gen_st_tl(gpr2, cpu_env, offsetof(CPUPPCState, reserve_val2));
+ /* Note that the low part is always in RD+1, even in LE mode. */
+ lo = cpu_gpr[rd + 1];
+ hi = cpu_gpr[rd];
+
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC128
+ TCGv_i32 oi = tcg_temp_new_i32();
+ if (ctx->le_mode) {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ | MO_ALIGN_16,
+ ctx->mem_idx));
+ gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
+ } else {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ | MO_ALIGN_16,
+ ctx->mem_idx));
+ gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
+ }
+ tcg_temp_free_i32(oi);
+ tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
+#else
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
+ tcg_temp_free(EA);
+ return;
+#endif
+ } else if (ctx->le_mode) {
+ tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ | MO_ALIGN_16);
+ tcg_gen_mov_tl(cpu_reserve, EA);
+ gen_addr_add(ctx, EA, EA, 8);
+ tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
+ } else {
+ tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ | MO_ALIGN_16);
+ tcg_gen_mov_tl(cpu_reserve, EA);
+ gen_addr_add(ctx, EA, EA, 8);
+ tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
+ }
tcg_temp_free(EA);
+
+ tcg_gen_st_tl(hi, cpu_env, offsetof(CPUPPCState, reserve_val));
+ tcg_gen_st_tl(lo, cpu_env, offsetof(CPUPPCState, reserve_val2));
}
/* stqcx. */
--
2.17.1
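For reference, the Int128 accessors the new helpers lean on come from
include/qemu/int128.h; a minimal sketch of the split across the helper
boundary, with lo holding bits 0..63 and hi holding bits 64..127:

    #include "qemu/int128.h"

    /* Sketch: splitting a 128-bit value across the helper boundary,
     * since a helper can return at most 64 bits directly. */
    static uint64_t demo_split(uint64_t lo, uint64_t hi, uint64_t *ret_hi)
    {
        Int128 v = int128_make128(lo, hi);

        *ret_hi = int128_gethi(v);   /* what lands in env->retxh */
        return int128_getlo(v);      /* what the helper returns */
    }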
* Re: [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
From: David Gibson @ 2018-06-28 3:49 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Tue, Jun 26, 2018 at 09:19:10AM -0700, Richard Henderson wrote:
> Section 1.4 of the Power ISA v3.0B states that both of these
> instructions are single-copy atomic. As we cannot (yet) issue
> 128-bit loads within TCG, use the generic helpers provided.
>
> Since TCG cannot (yet) return a 128-bit value, add a slot within
> CPUPPCState for returning the high half of a 128-bit return value.
> This solution is preferred to the helper assigning to architectural
> registers directly, as it avoids clobbering all TCG live values.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/ppc/cpu.h | 3 ++
> target/ppc/helper.h | 5 +++
> target/ppc/mem_helper.c | 20 ++++++++-
> target/ppc/translate.c | 93 ++++++++++++++++++++++++++++++-----------
> 4 files changed, 95 insertions(+), 26 deletions(-)
>
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index c7f3fb6b73..973cf44cda 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1015,6 +1015,9 @@ struct CPUPPCState {
> /* Next instruction pointer */
> target_ulong nip;
>
> + /* High part of 128-bit helper return. */
> + uint64_t retxh;
> +
Adding a temporary here is kind of gross. I guess the helper
interface doesn't allow for 128-bit returns, but couldn't you pass a
register number into the helper and have it update the right GPR
without going through a temp?
> int access_type; /* when a memory exception occurs, the access
> type is stored here */
>
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index d751f0e219..3f451a5d7e 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -799,3 +799,8 @@ DEF_HELPER_4(dscliq, void, env, fprp, fprp, i32)
>
> DEF_HELPER_1(tbegin, void, env)
> DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
> +
> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> +DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +#endif
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index a34e604db3..44a8f3445a 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -21,9 +21,9 @@
> #include "exec/exec-all.h"
> #include "qemu/host-utils.h"
> #include "exec/helper-proto.h"
> -
> #include "helper_regs.h"
> #include "exec/cpu_ldst.h"
> +#include "tcg.h"
> #include "internal.h"
>
> //#define DEBUG_OP
> @@ -215,6 +215,24 @@ target_ulong helper_lscbx(CPUPPCState *env, target_ulong addr, uint32_t reg,
> return i;
> }
>
> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> +uint64_t helper_lq_le_parallel(CPUPPCState *env, target_ulong addr,
> + uint32_t opidx)
> +{
> + Int128 ret = helper_atomic_ldo_le_mmu(env, addr, opidx, GETPC());
> + env->retxh = int128_gethi(ret);
> + return int128_getlo(ret);
> +}
> +
> +uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
> + uint32_t opidx)
> +{
> + Int128 ret = helper_atomic_ldo_be_mmu(env, addr, opidx, GETPC());
> + env->retxh = int128_gethi(ret);
> + return int128_getlo(ret);
> +}
> +#endif
> +
> /*****************************************************************************/
> /* Altivec extension helpers */
> #if defined(HOST_WORDS_BIGENDIAN)
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 3a215a1dc6..0923cc24e3 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -2607,7 +2607,7 @@ static void gen_ld(DisasContext *ctx)
> static void gen_lq(DisasContext *ctx)
> {
> int ra, rd;
> - TCGv EA;
> + TCGv EA, hi, lo;
>
> /* lq is a legal user mode instruction starting in ISA 2.07 */
> bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> @@ -2633,16 +2633,35 @@ static void gen_lq(DisasContext *ctx)
> EA = tcg_temp_new();
> gen_addr_imm_index(ctx, EA, 0x0F);
>
> - /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
> - necessary 64-bit byteswap already. */
> - if (unlikely(ctx->le_mode)) {
> - gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
> + /* Note that the low part is always in RD+1, even in LE mode. */
> + lo = cpu_gpr[rd + 1];
> + hi = cpu_gpr[rd];
> +
> + if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> + TCGv_i32 oi = tcg_temp_new_i32();
> + if (ctx->le_mode) {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
> + gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
> + } else {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
> + gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
> + }
> + tcg_temp_free_i32(oi);
> + tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
> +#else
> + /* Restart with exclusive lock. */
> + gen_helper_exit_atomic(cpu_env);
> + ctx->base.is_jmp = DISAS_NORETURN;
> +#endif
> + } else if (ctx->le_mode) {
> + tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
> + tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
> } else {
> - gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
> + tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
> + tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
> }
> tcg_temp_free(EA);
> }
> @@ -3236,9 +3255,8 @@ STCX(stdcx_, DEF_MEMOP(MO_Q))
> /* lqarx */
> static void gen_lqarx(DisasContext *ctx)
> {
> - TCGv EA;
> int rd = rD(ctx->opcode);
> - TCGv gpr1, gpr2;
> + TCGv EA, hi, lo;
>
> if (unlikely((rd & 1) || (rd == rA(ctx->opcode)) ||
> (rd == rB(ctx->opcode)))) {
> @@ -3247,24 +3265,49 @@ static void gen_lqarx(DisasContext *ctx)
> }
>
> gen_set_access_type(ctx, ACCESS_RES);
> - EA = tcg_temp_local_new();
> + EA = tcg_temp_new();
> gen_addr_reg_index(ctx, EA);
> - gen_check_align(ctx, EA, 15);
> - if (unlikely(ctx->le_mode)) {
> - gpr1 = cpu_gpr[rd+1];
> - gpr2 = cpu_gpr[rd];
> - } else {
> - gpr1 = cpu_gpr[rd];
> - gpr2 = cpu_gpr[rd+1];
> - }
> - tcg_gen_qemu_ld_i64(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
> - tcg_gen_mov_tl(cpu_reserve, EA);
> - gen_addr_add(ctx, EA, EA, 8);
> - tcg_gen_qemu_ld_i64(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
>
> - tcg_gen_st_tl(gpr1, cpu_env, offsetof(CPUPPCState, reserve_val));
> - tcg_gen_st_tl(gpr2, cpu_env, offsetof(CPUPPCState, reserve_val2));
> + /* Note that the low part is always in RD+1, even in LE mode. */
> + lo = cpu_gpr[rd + 1];
> + hi = cpu_gpr[rd];
> +
> + if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> + TCGv_i32 oi = tcg_temp_new_i32();
> + if (ctx->le_mode) {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ | MO_ALIGN_16,
> + ctx->mem_idx));
> + gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
> + } else {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ | MO_ALIGN_16,
> + ctx->mem_idx));
> + gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
> + }
> + tcg_temp_free_i32(oi);
> + tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
> +#else
> + /* Restart with exclusive lock. */
> + gen_helper_exit_atomic(cpu_env);
> + ctx->base.is_jmp = DISAS_NORETURN;
> + tcg_temp_free(EA);
> + return;
> +#endif
> + } else if (ctx->le_mode) {
> + tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ | MO_ALIGN_16);
> + tcg_gen_mov_tl(cpu_reserve, EA);
> + gen_addr_add(ctx, EA, EA, 8);
> + tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
> + } else {
> + tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ | MO_ALIGN_16);
> + tcg_gen_mov_tl(cpu_reserve, EA);
> + gen_addr_add(ctx, EA, EA, 8);
> + tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
> + }
> tcg_temp_free(EA);
> +
> + tcg_gen_st_tl(hi, cpu_env, offsetof(CPUPPCState, reserve_val));
> + tcg_gen_st_tl(lo, cpu_env, offsetof(CPUPPCState, reserve_val2));
> }
>
> /* stqcx. */
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
From: Richard Henderson @ 2018-06-28 15:22 UTC
To: David Gibson; +Cc: qemu-devel, qemu-ppc
On 06/27/2018 08:49 PM, David Gibson wrote:
>> + /* High part of 128-bit helper return. */
>> + uint64_t retxh;
>> +
>
> Adding a temporary here is kind of gross. I guess the helper
> interface doesn't allow for 128-bit returns, but couldn't you pass a
> register number into the helper and have it update the right GPR
> without going through a temp?
I could pass a pointer, but that would cause ...
>> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
>> +DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
>> +DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
... the helper definitions to lose TCG_CALL_NO_WG, because they *would* write
to a global register. Which would cause TCG to discard all of the global guest
registers cached within host registers.
I've used this secondary memory return before, in target/s390,
and to me it seems cleaner than pointers.
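For contrast, the hypothetical pointer-based declaration would look
like this (not in the patch; shown only to make the trade-off concrete):

    /* Hypothetical alternative -- NOT what the series does.  Writing
     * through the pointer modifies a TCG global, so the NO_WG flag
     * must be dropped (flags = 0), and TCG then assumes the call can
     * clobber every guest global cached in a host register. */
    DEF_HELPER_FLAGS_4(lq_le_parallel_alt, 0, void, env, tl, ptr, i32)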
r~
* Re: [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
From: David Gibson @ 2018-06-29 3:33 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Thu, Jun 28, 2018 at 08:22:38AM -0700, Richard Henderson wrote:
> On 06/27/2018 08:49 PM, David Gibson wrote:
> >> + /* High part of 128-bit helper return. */
> >> + uint64_t retxh;
> >> +
> >
> > Adding a temporary here is kind of gross. I guess the helper
> > interface doesn't allow for 128-bit returns, but couldn't you pass a
> > register number into the helper and have it update the right GPR
> > without going through a temp?
>
> I could pass a pointer, but that would cause ...
>
> >> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> >> +DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> >> +DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
>
> ... the helper definitions to lose TCG_CALL_NO_WG, because they *would* write
> to a global register. Which would cause TCG to discard all of the global guest
> registers cached within host registers.
>
> I've used this secondary memory return before, in target/s390,
> and to me it seems cleaner than pointers.
Ok, sounds reasonable, applied to ppc-for-3.0.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
Section 1.4 of the Power ISA v3.0B states that this insn is
single-copy atomic. As we cannot (yet) issue 128-bit loads
within TCG, use the generic helpers provided.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/helper.h | 4 ++++
target/ppc/mem_helper.c | 14 ++++++++++++++
target/ppc/translate.c | 35 +++++++++++++++++++++++++++--------
3 files changed, 45 insertions(+), 8 deletions(-)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 3f451a5d7e..cbc1228570 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -803,4 +803,8 @@ DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
+DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
+ void, env, tl, i64, i64, i32)
+DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
+ void, env, tl, i64, i64, i32)
#endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 44a8f3445a..57e301edc3 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -231,6 +231,20 @@ uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
env->retxh = int128_gethi(ret);
return int128_getlo(ret);
}
+
+void helper_stq_le_parallel(CPUPPCState *env, target_ulong addr,
+ uint64_t lo, uint64_t hi, uint32_t opidx)
+{
+ Int128 val = int128_make128(lo, hi);
+ helper_atomic_sto_le_mmu(env, addr, val, opidx, GETPC());
+}
+
+void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
+ uint64_t lo, uint64_t hi, uint32_t opidx)
+{
+ Int128 val = int128_make128(lo, hi);
+ helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
+}
#endif
/*****************************************************************************/
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 0923cc24e3..3d63a62269 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2760,6 +2760,7 @@ static void gen_std(DisasContext *ctx)
if ((ctx->opcode & 0x3) == 0x2) { /* stq */
bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
bool le_is_supported = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
+ TCGv hi, lo;
if (!(ctx->insns_flags & PPC_64BX)) {
gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
@@ -2783,20 +2784,38 @@ static void gen_std(DisasContext *ctx)
EA = tcg_temp_new();
gen_addr_imm_index(ctx, EA, 0x03);
- /* We only need to swap high and low halves. gen_qemu_st64_i64 does
- necessary 64-bit byteswap already. */
- if (unlikely(ctx->le_mode)) {
- gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
+ /* Note that the low part is always in RS+1, even in LE mode. */
+ lo = cpu_gpr[rs + 1];
+ hi = cpu_gpr[rs];
+
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC128
+ TCGv_i32 oi = tcg_temp_new_i32();
+ if (ctx->le_mode) {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
+ gen_helper_stq_le_parallel(cpu_env, EA, lo, hi, oi);
+ } else {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
+ gen_helper_stq_be_parallel(cpu_env, EA, lo, hi, oi);
+ }
+ tcg_temp_free_i32(oi);
+#else
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
+#endif
+ } else if (ctx->le_mode) {
+ tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_LEQ);
gen_addr_add(ctx, EA, EA, 8);
- gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
+ tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_LEQ);
} else {
- gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
+ tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_BEQ);
gen_addr_add(ctx, EA, EA, 8);
- gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
+ tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_BEQ);
}
tcg_temp_free(EA);
} else {
- /* std / stdu*/
+ /* std / stdu */
if (Rc(ctx->opcode)) {
if (unlikely(rA(ctx->opcode) == 0)) {
gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
--
2.17.1
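Distilled, the dispatch shape shared by the lq, stq and (next patch)
stqcx. translators is the following sketch, with the per-insn bodies
elided:

    if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
    #ifdef CONFIG_ATOMIC128
        /* MTTCG and the host has 16-byte atomics: call the helper,
         * which performs one single-copy-atomic 128-bit access. */
    #else
        /* No 16-byte host atomics: abort this TB and re-execute the
         * one insn under the exclusive lock, serializing all vCPUs. */
        gen_helper_exit_atomic(cpu_env);
        ctx->base.is_jmp = DISAS_NORETURN;
    #endif
    } else {
        /* Single-threaded TCG: no other vCPU can observe tearing,
         * so two ordinary 64-bit accesses suffice. */
    }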
* Re: [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ
From: David Gibson @ 2018-06-28 3:51 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Tue, Jun 26, 2018 at 09:19:11AM -0700, Richard Henderson wrote:
> Section 1.4 of the Power ISA v3.0B states that this insn is
> single-copy atomic. As we cannot (yet) issue 128-bit loads
nit: s/loads/stores/
> within TCG, use the generic helpers provided.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/ppc/helper.h | 4 ++++
> target/ppc/mem_helper.c | 14 ++++++++++++++
> target/ppc/translate.c | 35 +++++++++++++++++++++++++++--------
> 3 files changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 3f451a5d7e..cbc1228570 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -803,4 +803,8 @@ DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
> #if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
> + void, env, tl, i64, i64, i32)
> +DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
> + void, env, tl, i64, i64, i32)
> #endif
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index 44a8f3445a..57e301edc3 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -231,6 +231,20 @@ uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
> env->retxh = int128_gethi(ret);
> return int128_getlo(ret);
> }
> +
> +void helper_stq_le_parallel(CPUPPCState *env, target_ulong addr,
> + uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> + Int128 val = int128_make128(lo, hi);
> + helper_atomic_sto_le_mmu(env, addr, val, opidx, GETPC());
> +}
> +
> +void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
> + uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> + Int128 val = int128_make128(lo, hi);
> + helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
> +}
> #endif
>
> /*****************************************************************************/
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 0923cc24e3..3d63a62269 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -2760,6 +2760,7 @@ static void gen_std(DisasContext *ctx)
> if ((ctx->opcode & 0x3) == 0x2) { /* stq */
> bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> bool le_is_supported = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> + TCGv hi, lo;
>
> if (!(ctx->insns_flags & PPC_64BX)) {
> gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> @@ -2783,20 +2784,38 @@ static void gen_std(DisasContext *ctx)
> EA = tcg_temp_new();
> gen_addr_imm_index(ctx, EA, 0x03);
>
> - /* We only need to swap high and low halves. gen_qemu_st64_i64 does
> - necessary 64-bit byteswap already. */
> - if (unlikely(ctx->le_mode)) {
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> + /* Note that the low part is always in RS+1, even in LE mode. */
> + lo = cpu_gpr[rs + 1];
> + hi = cpu_gpr[rs];
> +
> + if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> + TCGv_i32 oi = tcg_temp_new_i32();
> + if (ctx->le_mode) {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
> + gen_helper_stq_le_parallel(cpu_env, EA, lo, hi, oi);
> + } else {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
> + gen_helper_stq_be_parallel(cpu_env, EA, lo, hi, oi);
> + }
> + tcg_temp_free_i32(oi);
> +#else
> + /* Restart with exclusive lock. */
> + gen_helper_exit_atomic(cpu_env);
> + ctx->base.is_jmp = DISAS_NORETURN;
> +#endif
> + } else if (ctx->le_mode) {
> + tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_LEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> + tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_LEQ);
> } else {
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> + tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_BEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> + tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_BEQ);
> }
> tcg_temp_free(EA);
> } else {
> - /* std / stdu*/
> + /* std / stdu */
> if (Rc(ctx->opcode)) {
> if (unlikely(rA(ctx->opcode) == 0)) {
> gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ
From: David Gibson @ 2018-06-29 3:33 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Tue, Jun 26, 2018 at 09:19:11AM -0700, Richard Henderson wrote:
> Section 1.4 of the Power ISA v3.0B states that this insn is
> single-copy atomic. As we cannot (yet) issue 128-bit loads
> within TCG, use the generic helpers provided.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Applied to ppc-for-3.0, thanks.
> ---
> target/ppc/helper.h | 4 ++++
> target/ppc/mem_helper.c | 14 ++++++++++++++
> target/ppc/translate.c | 35 +++++++++++++++++++++++++++--------
> 3 files changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 3f451a5d7e..cbc1228570 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -803,4 +803,8 @@ DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
> #if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
> + void, env, tl, i64, i64, i32)
> +DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
> + void, env, tl, i64, i64, i32)
> #endif
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index 44a8f3445a..57e301edc3 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -231,6 +231,20 @@ uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
> env->retxh = int128_gethi(ret);
> return int128_getlo(ret);
> }
> +
> +void helper_stq_le_parallel(CPUPPCState *env, target_ulong addr,
> + uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> + Int128 val = int128_make128(lo, hi);
> + helper_atomic_sto_le_mmu(env, addr, val, opidx, GETPC());
> +}
> +
> +void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
> + uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> + Int128 val = int128_make128(lo, hi);
> + helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
> +}
> #endif
>
> /*****************************************************************************/
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 0923cc24e3..3d63a62269 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -2760,6 +2760,7 @@ static void gen_std(DisasContext *ctx)
> if ((ctx->opcode & 0x3) == 0x2) { /* stq */
> bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> bool le_is_supported = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> + TCGv hi, lo;
>
> if (!(ctx->insns_flags & PPC_64BX)) {
> gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> @@ -2783,20 +2784,38 @@ static void gen_std(DisasContext *ctx)
> EA = tcg_temp_new();
> gen_addr_imm_index(ctx, EA, 0x03);
>
> - /* We only need to swap high and low halves. gen_qemu_st64_i64 does
> - necessary 64-bit byteswap already. */
> - if (unlikely(ctx->le_mode)) {
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> + /* Note that the low part is always in RS+1, even in LE mode. */
> + lo = cpu_gpr[rs + 1];
> + hi = cpu_gpr[rs];
> +
> + if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> + TCGv_i32 oi = tcg_temp_new_i32();
> + if (ctx->le_mode) {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
> + gen_helper_stq_le_parallel(cpu_env, EA, lo, hi, oi);
> + } else {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
> + gen_helper_stq_be_parallel(cpu_env, EA, lo, hi, oi);
> + }
> + tcg_temp_free_i32(oi);
> +#else
> + /* Restart with exclusive lock. */
> + gen_helper_exit_atomic(cpu_env);
> + ctx->base.is_jmp = DISAS_NORETURN;
> +#endif
> + } else if (ctx->le_mode) {
> + tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_LEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> + tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_LEQ);
> } else {
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> + tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_BEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> + tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_BEQ);
> }
> tcg_temp_free(EA);
> } else {
> - /* std / stdu*/
> + /* std / stdu */
> if (Rc(ctx->opcode)) {
> if (unlikely(rA(ctx->opcode) == 0)) {
> gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* [Qemu-devel] [PATCH 04/13] target/ppc: Use atomic cmpxchg for STQCX
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
When running in a parallel context, we must use a helper in order
to perform the 128-bit atomic operation. When running in a serial
context, do the compare before the store.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/helper.h | 2 +
target/ppc/mem_helper.c | 38 +++++++++++++++++
target/ppc/translate.c | 95 ++++++++++++++++++++++++++---------------
3 files changed, 101 insertions(+), 34 deletions(-)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index cbc1228570..5706c2497f 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -807,4 +807,6 @@ DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
void, env, tl, i64, i64, i32)
DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
void, env, tl, i64, i64, i32)
+DEF_HELPER_5(stqcx_le_parallel, i32, env, tl, i64, i64, i32)
+DEF_HELPER_5(stqcx_be_parallel, i32, env, tl, i64, i64, i32)
#endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 57e301edc3..8f0d86d104 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -245,6 +245,44 @@ void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
Int128 val = int128_make128(lo, hi);
helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
}
+
+uint32_t helper_stqcx_le_parallel(CPUPPCState *env, target_ulong addr,
+ uint64_t new_lo, uint64_t new_hi,
+ uint32_t opidx)
+{
+ bool success = false;
+
+ if (likely(addr == env->reserve_addr)) {
+ Int128 oldv, cmpv, newv;
+
+ cmpv = int128_make128(env->reserve_val2, env->reserve_val);
+ newv = int128_make128(new_lo, new_hi);
+ oldv = helper_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv,
+ opidx, GETPC());
+ success = int128_eq(oldv, cmpv);
+ }
+ env->reserve_addr = -1;
+ return env->so + success * CRF_EQ_BIT;
+}
+
+uint32_t helper_stqcx_be_parallel(CPUPPCState *env, target_ulong addr,
+ uint64_t new_lo, uint64_t new_hi,
+ uint32_t opidx)
+{
+ bool success = false;
+
+ if (likely(addr == env->reserve_addr)) {
+ Int128 oldv, cmpv, newv;
+
+ cmpv = int128_make128(env->reserve_val2, env->reserve_val);
+ newv = int128_make128(new_lo, new_hi);
+ oldv = helper_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv,
+ opidx, GETPC());
+ success = int128_eq(oldv, cmpv);
+ }
+ env->reserve_addr = -1;
+ return env->so + success * CRF_EQ_BIT;
+}
#endif
/*****************************************************************************/
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 3d63a62269..c7b9d226eb 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3332,50 +3332,77 @@ static void gen_lqarx(DisasContext *ctx)
/* stqcx. */
static void gen_stqcx_(DisasContext *ctx)
{
- TCGv EA;
- int reg = rS(ctx->opcode);
- int len = 16;
-#if !defined(CONFIG_USER_ONLY)
- TCGLabel *l1;
- TCGv gpr1, gpr2;
-#endif
+ int rs = rS(ctx->opcode);
+ TCGv EA, hi, lo;
- if (unlikely((rD(ctx->opcode) & 1))) {
+ if (unlikely(rs & 1)) {
gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
return;
}
+
gen_set_access_type(ctx, ACCESS_RES);
- EA = tcg_temp_local_new();
+ EA = tcg_temp_new();
gen_addr_reg_index(ctx, EA);
- if (len > 1) {
- gen_check_align(ctx, EA, (len) - 1);
- }
-#if defined(CONFIG_USER_ONLY)
- gen_conditional_store(ctx, EA, reg, 16);
+ /* Note that the low part is always in RS+1, even in LE mode. */
+ lo = cpu_gpr[rs + 1];
+ hi = cpu_gpr[rs];
+
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ TCGv_i32 oi = tcg_const_i32(DEF_MEMOP(MO_Q) | MO_ALIGN_16);
+#ifdef CONFIG_ATOMIC128
+ if (ctx->le_mode) {
+ gen_helper_stqcx_le_parallel(cpu_crf[0], cpu_env, EA, lo, hi, oi);
+ } else {
+ gen_helper_stqcx_be_parallel(cpu_crf[0], cpu_env, EA, lo, hi, oi);
+ }
#else
- tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
- l1 = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
- tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
-
- if (unlikely(ctx->le_mode)) {
- gpr1 = cpu_gpr[reg + 1];
- gpr2 = cpu_gpr[reg];
- } else {
- gpr1 = cpu_gpr[reg];
- gpr2 = cpu_gpr[reg + 1];
- }
- tcg_gen_qemu_st_tl(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
- gen_addr_add(ctx, EA, EA, 8);
- tcg_gen_qemu_st_tl(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
-
- gen_set_label(l1);
- tcg_gen_movi_tl(cpu_reserve, -1);
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
#endif
- tcg_temp_free(EA);
-}
+ tcg_temp_free(EA);
+ tcg_temp_free_i32(oi);
+ } else {
+ TCGLabel *lab_fail = gen_new_label();
+ TCGLabel *lab_over = gen_new_label();
+ TCGv_i64 t0 = tcg_temp_new_i64();
+ TCGv_i64 t1 = tcg_temp_new_i64();
+ tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, lab_fail);
+ tcg_temp_free(EA);
+
+ gen_qemu_ld64_i64(ctx, t0, cpu_reserve);
+ tcg_gen_ld_i64(t1, cpu_env, (ctx->le_mode
+ ? offsetof(CPUPPCState, reserve_val2)
+ : offsetof(CPUPPCState, reserve_val)));
+ tcg_gen_brcond_i64(TCG_COND_NE, t0, t1, lab_fail);
+
+ tcg_gen_addi_i64(t0, cpu_reserve, 8);
+ gen_qemu_ld64_i64(ctx, t0, t0);
+ tcg_gen_ld_i64(t1, cpu_env, (ctx->le_mode
+ ? offsetof(CPUPPCState, reserve_val)
+ : offsetof(CPUPPCState, reserve_val2)));
+ tcg_gen_brcond_i64(TCG_COND_NE, t0, t1, lab_fail);
+
+ /* Success */
+ gen_qemu_st64_i64(ctx, ctx->le_mode ? lo : hi, cpu_reserve);
+ tcg_gen_addi_i64(t0, cpu_reserve, 8);
+ gen_qemu_st64_i64(ctx, ctx->le_mode ? hi : lo, t0);
+
+ tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
+ tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
+ tcg_gen_br(lab_over);
+
+ gen_set_label(lab_fail);
+ tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
+
+ gen_set_label(lab_over);
+ tcg_gen_movi_tl(cpu_reserve, -1);
+ tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
+ }
+}
#endif /* defined(TARGET_PPC64) */
/* sync */
--
2.17.1
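On the parallel path the cmpxchgo helpers ultimately need a 16-byte
compare-and-swap from the host.  Roughly, and assuming a host compiler
with __uint128_t plus 16-byte __atomic support (e.g. x86-64 with
cmpxchg16b), CONFIG_ATOMIC128 implies a primitive like:

    #include <stdbool.h>

    /* Standalone sketch, not QEMU code; cmpv/newv mirror the
     * helper's locals above. */
    typedef unsigned __int128 u128;

    static bool cas16(u128 *ptr, u128 cmpv, u128 newv)
    {
        /* True iff *ptr held cmpv and has been replaced by newv. */
        return __atomic_compare_exchange_n(ptr, &cmpv, newv, false,
                                           __ATOMIC_SEQ_CST,
                                           __ATOMIC_SEQ_CST);
    }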
* [Qemu-devel] [PATCH 05/13] target/ppc: Remove POWERPC_EXCP_STCX
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
Always use the gen_conditional_store implementation that uses
atomic_cmpxchg. Make sure to clear reserve_addr across most
interrupts crossing the cpu_loop.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/cpu.h | 5 --
linux-user/ppc/cpu_loop.c | 123 +++++++-------------------------------
target/ppc/translate.c | 14 -----
3 files changed, 23 insertions(+), 119 deletions(-)
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 973cf44cda..4edcf62cf7 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -196,7 +196,6 @@ enum {
/* QEMU exceptions: special cases we want to stop translation */
POWERPC_EXCP_SYNC = 0x202, /* context synchronizing instruction */
POWERPC_EXCP_SYSCALL_USER = 0x203, /* System call in user mode only */
- POWERPC_EXCP_STCX = 0x204 /* Conditional stores in user mode */
};
/* Exceptions error codes */
@@ -994,10 +993,6 @@ struct CPUPPCState {
/* Reservation value */
target_ulong reserve_val;
target_ulong reserve_val2;
- /* Reservation store address */
- target_ulong reserve_ea;
- /* Reserved store source register and size */
- target_ulong reserve_info;
/* Those ones are used in supervisor mode only */
/* machine state register */
diff --git a/linux-user/ppc/cpu_loop.c b/linux-user/ppc/cpu_loop.c
index 2fb516cb00..133a87f349 100644
--- a/linux-user/ppc/cpu_loop.c
+++ b/linux-user/ppc/cpu_loop.c
@@ -65,99 +65,23 @@ int ppc_dcr_write (ppc_dcr_t *dcr_env, int dcrn, uint32_t val)
return -1;
}
-static int do_store_exclusive(CPUPPCState *env)
-{
- target_ulong addr;
- target_ulong page_addr;
- target_ulong val, val2 __attribute__((unused)) = 0;
- int flags;
- int segv = 0;
-
- addr = env->reserve_ea;
- page_addr = addr & TARGET_PAGE_MASK;
- start_exclusive();
- mmap_lock();
- flags = page_get_flags(page_addr);
- if ((flags & PAGE_READ) == 0) {
- segv = 1;
- } else {
- int reg = env->reserve_info & 0x1f;
- int size = env->reserve_info >> 5;
- int stored = 0;
-
- if (addr == env->reserve_addr) {
- switch (size) {
- case 1: segv = get_user_u8(val, addr); break;
- case 2: segv = get_user_u16(val, addr); break;
- case 4: segv = get_user_u32(val, addr); break;
-#if defined(TARGET_PPC64)
- case 8: segv = get_user_u64(val, addr); break;
- case 16: {
- segv = get_user_u64(val, addr);
- if (!segv) {
- segv = get_user_u64(val2, addr + 8);
- }
- break;
- }
-#endif
- default: abort();
- }
- if (!segv && val == env->reserve_val) {
- val = env->gpr[reg];
- switch (size) {
- case 1: segv = put_user_u8(val, addr); break;
- case 2: segv = put_user_u16(val, addr); break;
- case 4: segv = put_user_u32(val, addr); break;
-#if defined(TARGET_PPC64)
- case 8: segv = put_user_u64(val, addr); break;
- case 16: {
- if (val2 == env->reserve_val2) {
- if (msr_le) {
- val2 = val;
- val = env->gpr[reg+1];
- } else {
- val2 = env->gpr[reg+1];
- }
- segv = put_user_u64(val, addr);
- if (!segv) {
- segv = put_user_u64(val2, addr + 8);
- }
- }
- break;
- }
-#endif
- default: abort();
- }
- if (!segv) {
- stored = 1;
- }
- }
- }
- env->crf[0] = (stored << 1) | xer_so;
- env->reserve_addr = (target_ulong)-1;
- }
- if (!segv) {
- env->nip += 4;
- }
- mmap_unlock();
- end_exclusive();
- return segv;
-}
-
void cpu_loop(CPUPPCState *env)
{
CPUState *cs = CPU(ppc_env_get_cpu(env));
target_siginfo_t info;
- int trapnr;
+ int trapnr, sig;
target_ulong ret;
for(;;) {
+ bool arch_interrupt;
+
cpu_exec_start(cs);
trapnr = cpu_exec(cs);
cpu_exec_end(cs);
process_queued_cpu_work(cs);
- switch(trapnr) {
+ arch_interrupt = true;
+ switch (trapnr) {
case POWERPC_EXCP_NONE:
/* Just go on */
break;
@@ -524,26 +448,15 @@ void cpu_loop(CPUPPCState *env)
}
env->gpr[3] = ret;
break;
- case POWERPC_EXCP_STCX:
- if (do_store_exclusive(env)) {
- info.si_signo = TARGET_SIGSEGV;
- info.si_errno = 0;
- info.si_code = TARGET_SEGV_MAPERR;
- info._sifields._sigfault._addr = env->nip;
- queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
- }
- break;
case EXCP_DEBUG:
- {
- int sig;
-
- sig = gdb_handlesig(cs, TARGET_SIGTRAP);
- if (sig) {
- info.si_signo = sig;
- info.si_errno = 0;
- info.si_code = TARGET_TRAP_BRKPT;
- queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
- }
+ sig = gdb_handlesig(cs, TARGET_SIGTRAP);
+ if (sig) {
+ info.si_signo = sig;
+ info.si_errno = 0;
+ info.si_code = TARGET_TRAP_BRKPT;
+ queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
+ } else {
+ arch_interrupt = false;
}
break;
case EXCP_INTERRUPT:
@@ -551,12 +464,22 @@ void cpu_loop(CPUPPCState *env)
break;
case EXCP_ATOMIC:
cpu_exec_step_atomic(cs);
+ arch_interrupt = false;
break;
default:
cpu_abort(cs, "Unknown exception 0x%x. Aborting\n", trapnr);
break;
}
process_pending_signals(env);
+
+ /* Most of the traps imply a transition through kernel mode,
+ * which implies an rfi instruction has been executed. Which
+ * means that the reservation (reserve_addr) should be cleared.
+ * But there are a few exceptions for traps internal to QEMU.
+ */
+ if (arch_interrupt) {
+ env->reserve_addr = -1;
+ }
}
}
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c7b9d226eb..03e8c5df03 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3201,19 +3201,6 @@ ST_ATOMIC(stwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32)
ST_ATOMIC(stdat, DEF_MEMOP(MO_Q), i64, mov_i64)
#endif
-#if defined(CONFIG_USER_ONLY)
-static void gen_conditional_store(DisasContext *ctx, TCGv EA,
- int reg, int memop)
-{
- TCGv t0 = tcg_temp_new();
-
- tcg_gen_st_tl(EA, cpu_env, offsetof(CPUPPCState, reserve_ea));
- tcg_gen_movi_tl(t0, (MEMOP_GET_SIZE(memop) << 5) | reg);
- tcg_gen_st_tl(t0, cpu_env, offsetof(CPUPPCState, reserve_info));
- tcg_temp_free(t0);
- gen_exception_err(ctx, POWERPC_EXCP_STCX, 0);
-}
-#else
static void gen_conditional_store(DisasContext *ctx, TCGv EA,
int reg, int memop)
{
@@ -3244,7 +3231,6 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
gen_set_label(l2);
tcg_gen_movi_tl(cpu_reserve, -1);
}
-#endif
#define STCX(name, memop) \
static void gen_##name(DisasContext *ctx) \
--
2.17.1
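With the translator now emitting a real atomic cmpxchg, user-mode
emulation no longer needs the start_exclusive()-based replay removed
above.  The guest's larx/stcx. retry loop maps directly onto host
atomics; a C-equivalent sketch of the pattern guest code relies on:

    #include <stdint.h>

    /* Sketch: what a guest ldarx/stdcx. loop amounts to.  The load
     * models ldarx (establish the reservation); the compare-and-swap
     * models stdcx. (store iff the value is still the reserved one). */
    static void atomic_add_u64(uint64_t *p, uint64_t v)
    {
        uint64_t old = __atomic_load_n(p, __ATOMIC_RELAXED);
        uint64_t newv;

        do {
            newv = old + v;   /* 'old' is refreshed on CAS failure */
        } while (!__atomic_compare_exchange_n(p, &old, newv, false,
                                              __ATOMIC_SEQ_CST,
                                              __ATOMIC_SEQ_CST));
    }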
* [Qemu-devel] [PATCH 06/13] target/ppc: Tidy gen_conditional_store
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
Leave only the minimal amount of code within the STCX macro,
moving the rest of the code into gen_conditional_store.
Remove the explicit call to gen_check_align; the matching LARX will
have already checked alignment, and we verify the same address.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 28 +++++++++++-----------------
1 file changed, 11 insertions(+), 17 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 03e8c5df03..e751072404 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3201,14 +3201,17 @@ ST_ATOMIC(stwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32)
ST_ATOMIC(stdat, DEF_MEMOP(MO_Q), i64, mov_i64)
#endif
-static void gen_conditional_store(DisasContext *ctx, TCGv EA,
- int reg, int memop)
+static void gen_conditional_store(DisasContext *ctx, TCGMemOp memop)
{
TCGLabel *l1 = gen_new_label();
TCGLabel *l2 = gen_new_label();
- TCGv t0;
+ TCGv t0 = tcg_temp_new();
+ int reg = rS(ctx->opcode);
- tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
+ gen_set_access_type(ctx, ACCESS_RES);
+ gen_addr_reg_index(ctx, t0);
+ tcg_gen_brcond_tl(TCG_COND_NE, t0, cpu_reserve, l1);
+ tcg_temp_free(t0);
t0 = tcg_temp_new();
tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val,
@@ -3232,19 +3235,10 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
tcg_gen_movi_tl(cpu_reserve, -1);
}
-#define STCX(name, memop) \
-static void gen_##name(DisasContext *ctx) \
-{ \
- TCGv t0; \
- int len = MEMOP_GET_SIZE(memop); \
- gen_set_access_type(ctx, ACCESS_RES); \
- t0 = tcg_temp_local_new(); \
- gen_addr_reg_index(ctx, t0); \
- if (len > 1) { \
- gen_check_align(ctx, t0, (len) - 1); \
- } \
- gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \
- tcg_temp_free(t0); \
+#define STCX(name, memop) \
+static void gen_##name(DisasContext *ctx) \
+{ \
+ gen_conditional_store(ctx, memop); \
}
STCX(stbcx_, DEF_MEMOP(MO_UB))
--
2.17.1
* [Qemu-devel] [PATCH 07/13] target/ppc: Split out gen_load_locked
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (5 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 06/13] target/ppc: Tidy gen_conditional_store Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 08/13] target/ppc: Split out gen_ld_atomic Richard Henderson
` (6 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
Leave only the minimal amount of code within the LARX macro,
moving the rest of the code into gen_load_locked. Use MO_ALIGN
and remove the explicit call to gen_check_align.
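In plain C terms, the test that MO_ALIGN subsumes is just a
power-of-two mask check (an illustrative sketch, not QEMU code):

#include <stdbool.h>
#include <stdint.h>

/* The check gen_check_align used to emit inline. With MO_ALIGN the
 * equivalent test happens inside the memory access itself, faulting
 * through the do_unaligned_access hook on a misaligned address. */
static bool misaligned(uint64_t ea, unsigned size)
{
    return (ea & (size - 1)) != 0;   /* size must be a power of two */
}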
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index e751072404..f48fcbeefb 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3070,23 +3070,24 @@ static void gen_isync(DisasContext *ctx)
#define MEMOP_GET_SIZE(x) (1 << ((x) & MO_SIZE))
-#define LARX(name, memop) \
-static void gen_##name(DisasContext *ctx) \
-{ \
- TCGv t0; \
- TCGv gpr = cpu_gpr[rD(ctx->opcode)]; \
- int len = MEMOP_GET_SIZE(memop); \
- gen_set_access_type(ctx, ACCESS_RES); \
- t0 = tcg_temp_local_new(); \
- gen_addr_reg_index(ctx, t0); \
- if ((len) > 1) { \
- gen_check_align(ctx, t0, (len)-1); \
- } \
- tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \
- tcg_gen_mov_tl(cpu_reserve, t0); \
- tcg_gen_mov_tl(cpu_reserve_val, gpr); \
- tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \
- tcg_temp_free(t0); \
+static void gen_load_locked(DisasContext *ctx, TCGMemOp memop)
+{
+ TCGv gpr = cpu_gpr[rD(ctx->opcode)];
+ TCGv t0 = tcg_temp_new();
+
+ gen_set_access_type(ctx, ACCESS_RES);
+ gen_addr_reg_index(ctx, t0);
+ tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop | MO_ALIGN);
+ tcg_gen_mov_tl(cpu_reserve, t0);
+ tcg_gen_mov_tl(cpu_reserve_val, gpr);
+ tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
+ tcg_temp_free(t0);
+}
+
+#define LARX(name, memop) \
+static void gen_##name(DisasContext *ctx) \
+{ \
+ gen_load_locked(ctx, memop); \
}
/* lwarx */
--
2.17.1
* [Qemu-devel] [PATCH 08/13] target/ppc: Split out gen_ld_atomic
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (6 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 07/13] target/ppc: Split out gen_load_locked Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 09/13] target/ppc: Split out gen_st_atomic Richard Henderson
` (5 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
Move the guts of LD_ATOMIC to a function. Use foo_tl for the operations
instead of foo_i32 or foo_i64 specifically. Use MO_ALIGN instead of an
explicit call to gen_check_align.
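For reference, the foo_tl forms are the target_long-sized aliases of
the sized ops; schematically (a simplified sketch of the tcg-op.h
convention, not the literal macros):

#if TARGET_LONG_BITS == 64
#define tcg_gen_atomic_fetch_add_tl  tcg_gen_atomic_fetch_add_i64
#else
#define tcg_gen_atomic_fetch_add_tl  tcg_gen_atomic_fetch_add_i32
#endif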
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 105 ++++++++++++++++++++---------------------
1 file changed, 52 insertions(+), 53 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index f48fcbeefb..361b178db8 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3095,61 +3095,60 @@ LARX(lbarx, DEF_MEMOP(MO_UB))
LARX(lharx, DEF_MEMOP(MO_UW))
LARX(lwarx, DEF_MEMOP(MO_UL))
-#define LD_ATOMIC(name, memop, tp, op, eop) \
-static void gen_##name(DisasContext *ctx) \
-{ \
- int len = MEMOP_GET_SIZE(memop); \
- uint32_t gpr_FC = FC(ctx->opcode); \
- TCGv EA = tcg_temp_local_new(); \
- TCGv_##tp t0, t1; \
- \
- gen_addr_register(ctx, EA); \
- if (len > 1) { \
- gen_check_align(ctx, EA, len - 1); \
- } \
- t0 = tcg_temp_new_##tp(); \
- t1 = tcg_temp_new_##tp(); \
- tcg_gen_##op(t0, cpu_gpr[rD(ctx->opcode) + 1]); \
- \
- switch (gpr_FC) { \
- case 0: /* Fetch and add */ \
- tcg_gen_atomic_fetch_add_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 1: /* Fetch and xor */ \
- tcg_gen_atomic_fetch_xor_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 2: /* Fetch and or */ \
- tcg_gen_atomic_fetch_or_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 3: /* Fetch and 'and' */ \
- tcg_gen_atomic_fetch_and_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 8: /* Swap */ \
- tcg_gen_atomic_xchg_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 4: /* Fetch and max unsigned */ \
- case 5: /* Fetch and max signed */ \
- case 6: /* Fetch and min unsigned */ \
- case 7: /* Fetch and min signed */ \
- case 16: /* compare and swap not equal */ \
- case 24: /* Fetch and increment bounded */ \
- case 25: /* Fetch and increment equal */ \
- case 28: /* Fetch and decrement bounded */ \
- gen_invalid(ctx); \
- break; \
- default: \
- /* invoke data storage error handler */ \
- gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL); \
- } \
- tcg_gen_##eop(cpu_gpr[rD(ctx->opcode)], t1); \
- tcg_temp_free_##tp(t0); \
- tcg_temp_free_##tp(t1); \
- tcg_temp_free(EA); \
+static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
+{
+ uint32_t gpr_FC = FC(ctx->opcode);
+ TCGv EA = tcg_temp_new();
+ TCGv src, dst;
+
+ gen_addr_register(ctx, EA);
+ dst = cpu_gpr[rD(ctx->opcode)];
+ src = cpu_gpr[rD(ctx->opcode) + 1];
+
+ memop |= MO_ALIGN;
+ switch (gpr_FC) {
+ case 0: /* Fetch and add */
+ tcg_gen_atomic_fetch_add_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 1: /* Fetch and xor */
+ tcg_gen_atomic_fetch_xor_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 2: /* Fetch and or */
+ tcg_gen_atomic_fetch_or_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 3: /* Fetch and 'and' */
+ tcg_gen_atomic_fetch_and_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 8: /* Swap */
+ tcg_gen_atomic_xchg_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 4: /* Fetch and max unsigned */
+ case 5: /* Fetch and max signed */
+ case 6: /* Fetch and min unsigned */
+ case 7: /* Fetch and min signed */
+ case 16: /* compare and swap not equal */
+ case 24: /* Fetch and increment bounded */
+ case 25: /* Fetch and increment equal */
+ case 28: /* Fetch and decrement bounded */
+ gen_invalid(ctx);
+ break;
+ default:
+ /* invoke data storage error handler */
+ gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);
+ }
+ tcg_temp_free(EA);
}
-LD_ATOMIC(lwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32, extu_i32_tl)
-#if defined(TARGET_PPC64)
-LD_ATOMIC(ldat, DEF_MEMOP(MO_Q), i64, mov_i64, mov_i64)
+static void gen_lwat(DisasContext *ctx)
+{
+ gen_ld_atomic(ctx, DEF_MEMOP(MO_UL));
+}
+
+#ifdef TARGET_PPC64
+static void gen_ldat(DisasContext *ctx)
+{
+ gen_ld_atomic(ctx, DEF_MEMOP(MO_Q));
+}
#endif
#define ST_ATOMIC(name, memop, tp, op) \
--
2.17.1
* [Qemu-devel] [PATCH 09/13] target/ppc: Split out gen_st_atomic
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (7 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 08/13] target/ppc: Split out gen_ld_atomic Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 10/13] target/ppc: Use MO_ALIGN for ECIWX and ECOWX Richard Henderson
` (4 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
Move the guts of ST_ATOMIC to a function. Use foo_tl for the operations
instead of foo_i32 or foo_i64 specifically. Use MO_ALIGN instead of an
explicit call to gen_check_align.
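The store forms discard the loaded result; in plain C the pattern is
(illustrative only, using the GCC atomic builtins):

#include <stdint.h>

/* "add and Store": only the atomic read-modify-write of memory
 * matters; the computed value is thrown away, like the 'discard'
 * temp in the new gen_st_atomic. */
static void store_add(uint64_t *addr, uint64_t val)
{
    (void)__atomic_add_fetch(addr, val, __ATOMIC_SEQ_CST);
}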
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 93 +++++++++++++++++++++---------------------
1 file changed, 47 insertions(+), 46 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 361b178db8..53ca8f0114 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3151,54 +3151,55 @@ static void gen_ldat(DisasContext *ctx)
}
#endif
-#define ST_ATOMIC(name, memop, tp, op) \
-static void gen_##name(DisasContext *ctx) \
-{ \
- int len = MEMOP_GET_SIZE(memop); \
- uint32_t gpr_FC = FC(ctx->opcode); \
- TCGv EA = tcg_temp_local_new(); \
- TCGv_##tp t0, t1; \
- \
- gen_addr_register(ctx, EA); \
- if (len > 1) { \
- gen_check_align(ctx, EA, len - 1); \
- } \
- t0 = tcg_temp_new_##tp(); \
- t1 = tcg_temp_new_##tp(); \
- tcg_gen_##op(t0, cpu_gpr[rD(ctx->opcode) + 1]); \
- \
- switch (gpr_FC) { \
- case 0: /* add and Store */ \
- tcg_gen_atomic_add_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 1: /* xor and Store */ \
- tcg_gen_atomic_xor_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 2: /* Or and Store */ \
- tcg_gen_atomic_or_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 3: /* 'and' and Store */ \
- tcg_gen_atomic_and_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 4: /* Store max unsigned */ \
- case 5: /* Store max signed */ \
- case 6: /* Store min unsigned */ \
- case 7: /* Store min signed */ \
- case 24: /* Store twin */ \
- gen_invalid(ctx); \
- break; \
- default: \
- /* invoke data storage error handler */ \
- gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL); \
- } \
- tcg_temp_free_##tp(t0); \
- tcg_temp_free_##tp(t1); \
- tcg_temp_free(EA); \
+static void gen_st_atomic(DisasContext *ctx, TCGMemOp memop)
+{
+ uint32_t gpr_FC = FC(ctx->opcode);
+ TCGv EA = tcg_temp_new();
+ TCGv src, discard;
+
+ gen_addr_register(ctx, EA);
+ src = cpu_gpr[rD(ctx->opcode)];
+ discard = tcg_temp_new();
+
+ memop |= MO_ALIGN;
+ switch (gpr_FC) {
+ case 0: /* add and Store */
+ tcg_gen_atomic_add_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
+ case 1: /* xor and Store */
+ tcg_gen_atomic_xor_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
+ case 2: /* Or and Store */
+ tcg_gen_atomic_or_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
+ case 3: /* 'and' and Store */
+ tcg_gen_atomic_and_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
+ case 4: /* Store max unsigned */
+ case 5: /* Store max signed */
+ case 6: /* Store min unsigned */
+ case 7: /* Store min signed */
+ case 24: /* Store twin */
+ gen_invalid(ctx);
+ break;
+ default:
+ /* invoke data storage error handler */
+ gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);
+ }
+ tcg_temp_free(discard);
+ tcg_temp_free(EA);
}
-ST_ATOMIC(stwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32)
-#if defined(TARGET_PPC64)
-ST_ATOMIC(stdat, DEF_MEMOP(MO_Q), i64, mov_i64)
+static void gen_stwat(DisasContext *ctx)
+{
+ gen_st_atomic(ctx, DEF_MEMOP(MO_UL));
+}
+
+#ifdef TARGET_PPC64
+static void gen_stdat(DisasContext *ctx)
+{
+ gen_st_atomic(ctx, DEF_MEMOP(MO_Q));
+}
#endif
static void gen_conditional_store(DisasContext *ctx, TCGMemOp memop)
--
2.17.1
* [Qemu-devel] [PATCH 10/13] target/ppc: Use MO_ALIGN for ECIWX and ECOWX
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (8 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 09/13] target/ppc: Split out gen_st_atomic Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 11/13] target/ppc: Use atomic min/max helpers Richard Henderson
` (3 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
This avoids the need for gen_check_align entirely.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 25 ++++---------------------
1 file changed, 4 insertions(+), 21 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 53ca8f0114..c2a28be6d7 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2388,23 +2388,6 @@ static inline void gen_addr_add(DisasContext *ctx, TCGv ret, TCGv arg1,
}
}
-static inline void gen_check_align(DisasContext *ctx, TCGv EA, int mask)
-{
- TCGLabel *l1 = gen_new_label();
- TCGv t0 = tcg_temp_new();
- TCGv_i32 t1, t2;
- tcg_gen_andi_tl(t0, EA, mask);
- tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, l1);
- t1 = tcg_const_i32(POWERPC_EXCP_ALIGN);
- t2 = tcg_const_i32(ctx->opcode & 0x03FF0000);
- gen_update_nip(ctx, ctx->base.pc_next - 4);
- gen_helper_raise_exception_err(cpu_env, t1, t2);
- tcg_temp_free_i32(t1);
- tcg_temp_free_i32(t2);
- gen_set_label(l1);
- tcg_temp_free(t0);
-}
-
static inline void gen_align_no_le(DisasContext *ctx)
{
gen_exception_err(ctx, POWERPC_EXCP_ALIGN,
@@ -4706,8 +4689,8 @@ static void gen_eciwx(DisasContext *ctx)
gen_set_access_type(ctx, ACCESS_EXT);
t0 = tcg_temp_new();
gen_addr_reg_index(ctx, t0);
- gen_check_align(ctx, t0, 0x03);
- gen_qemu_ld32u(ctx, cpu_gpr[rD(ctx->opcode)], t0);
+ tcg_gen_qemu_ld_tl(cpu_gpr[rD(ctx->opcode)], t0, ctx->mem_idx,
+ DEF_MEMOP(MO_UL | MO_ALIGN));
tcg_temp_free(t0);
}
@@ -4719,8 +4702,8 @@ static void gen_ecowx(DisasContext *ctx)
gen_set_access_type(ctx, ACCESS_EXT);
t0 = tcg_temp_new();
gen_addr_reg_index(ctx, t0);
- gen_check_align(ctx, t0, 0x03);
- gen_qemu_st32(ctx, cpu_gpr[rD(ctx->opcode)], t0);
+ tcg_gen_qemu_st_tl(cpu_gpr[rD(ctx->opcode)], t0, ctx->mem_idx,
+ DEF_MEMOP(MO_UL | MO_ALIGN));
tcg_temp_free(t0);
}
--
2.17.1
* [Qemu-devel] [PATCH 11/13] target/ppc: Use atomic min/max helpers
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (9 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 10/13] target/ppc: Use MO_ALIGN for ECIWX and ECOWX Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 12/13] target/ppc: Implement the rest of gen_ld_atomic Richard Henderson
` (2 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
These operations were previously unimplemented for ppc.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c2a28be6d7..79285b6698 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3102,13 +3102,21 @@ static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
case 3: /* Fetch and 'and' */
tcg_gen_atomic_fetch_and_tl(dst, EA, src, ctx->mem_idx, memop);
break;
+ case 4: /* Fetch and max unsigned */
+ tcg_gen_atomic_fetch_umax_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 5: /* Fetch and max signed */
+ tcg_gen_atomic_fetch_smax_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 6: /* Fetch and min unsigned */
+ tcg_gen_atomic_fetch_umin_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 7: /* Fetch and min signed */
+ tcg_gen_atomic_fetch_smin_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
case 8: /* Swap */
tcg_gen_atomic_xchg_tl(dst, EA, src, ctx->mem_idx, memop);
break;
- case 4: /* Fetch and max unsigned */
- case 5: /* Fetch and max signed */
- case 6: /* Fetch and min unsigned */
- case 7: /* Fetch and min signed */
case 16: /* compare and swap not equal */
case 24: /* Fetch and increment bounded */
case 25: /* Fetch and increment equal */
@@ -3159,9 +3167,17 @@ static void gen_st_atomic(DisasContext *ctx, TCGMemOp memop)
tcg_gen_atomic_and_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
break;
case 4: /* Store max unsigned */
+ tcg_gen_atomic_umax_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
case 5: /* Store max signed */
+ tcg_gen_atomic_smax_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
case 6: /* Store min unsigned */
+ tcg_gen_atomic_umin_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
case 7: /* Store min signed */
+ tcg_gen_atomic_smin_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
case 24: /* Store twin */
gen_invalid(ctx);
break;
--
2.17.1
* [Qemu-devel] [PATCH 12/13] target/ppc: Implement the rest of gen_ld_atomic
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (10 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 11/13] target/ppc: Use atomic min/max helpers Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 13/13] target/ppc: Implement the rest of gen_st_atomic Richard Henderson
2018-06-29 4:15 ` [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations David Gibson
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
These cases were stubbed out. For now, implement them only within
a serial context, forcing parallel execution to synchronize. It
would be possible to implement these with cmpxchg loops, if we care.
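For the record, such a loop might look like this in plain C for the
"compare and swap not equal" case (an illustrative sketch with
hypothetical names, using the GCC atomic builtins):

#include <stdint.h>

/* Replace memory with new_val only when the current value differs
 * from cmp_val; always return the old value. The CAS retries when
 * another thread modified memory between the load and the exchange. */
static uint64_t cswx_loop(uint64_t *addr, uint64_t cmp_val, uint64_t new_val)
{
    uint64_t old = __atomic_load_n(addr, __ATOMIC_RELAXED);
    for (;;) {
        uint64_t repl = (old != cmp_val) ? new_val : old;
        if (__atomic_compare_exchange_n(addr, &old, repl, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
            return old;
        }
        /* On failure, 'old' has been updated to the current value. */
    }
}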
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 89 ++++++++++++++++++++++++++++++++++++++----
1 file changed, 82 insertions(+), 7 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 79285b6698..597a37d3ec 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3078,16 +3078,45 @@ LARX(lbarx, DEF_MEMOP(MO_UB))
LARX(lharx, DEF_MEMOP(MO_UW))
LARX(lwarx, DEF_MEMOP(MO_UL))
+static void gen_fetch_inc_conditional(DisasContext *ctx, TCGMemOp memop,
+ TCGv EA, TCGCond cond, int addend)
+{
+ TCGv t = tcg_temp_new();
+ TCGv t2 = tcg_temp_new();
+ TCGv u = tcg_temp_new();
+
+ tcg_gen_qemu_ld_tl(t, EA, ctx->mem_idx, memop);
+ tcg_gen_addi_tl(t2, EA, MEMOP_GET_SIZE(memop));
+ tcg_gen_qemu_ld_tl(t2, t2, ctx->mem_idx, memop);
+ tcg_gen_addi_tl(u, t, addend);
+
+ /* E.g. for fetch and increment bounded... */
+ /* mem(EA,s) = (t != t2 ? u = t + 1 : t) */
+ tcg_gen_movcond_tl(cond, u, t, t2, u, t);
+ tcg_gen_qemu_st_tl(u, EA, ctx->mem_idx, memop);
+
+ /* RT = (t != t2 ? t : u = 1<<(s*8-1)) */
+ tcg_gen_movi_tl(u, 1 << (MEMOP_GET_SIZE(memop) * 8 - 1));
+ tcg_gen_movcond_tl(cond, cpu_gpr[rD(ctx->opcode)], t, t2, t, u);
+
+ tcg_temp_free(t);
+ tcg_temp_free(t2);
+ tcg_temp_free(u);
+}
+
static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
{
uint32_t gpr_FC = FC(ctx->opcode);
TCGv EA = tcg_temp_new();
+ int rt = rD(ctx->opcode);
+ bool need_serial;
TCGv src, dst;
gen_addr_register(ctx, EA);
- dst = cpu_gpr[rD(ctx->opcode)];
- src = cpu_gpr[rD(ctx->opcode) + 1];
+ dst = cpu_gpr[rt];
+ src = cpu_gpr[(rt + 1) & 31];
+ need_serial = false;
memop |= MO_ALIGN;
switch (gpr_FC) {
case 0: /* Fetch and add */
@@ -3117,17 +3146,63 @@ static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
case 8: /* Swap */
tcg_gen_atomic_xchg_tl(dst, EA, src, ctx->mem_idx, memop);
break;
- case 16: /* compare and swap not equal */
- case 24: /* Fetch and increment bounded */
- case 25: /* Fetch and increment equal */
- case 28: /* Fetch and decrement bounded */
- gen_invalid(ctx);
+
+ case 16: /* Compare and swap not equal */
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ need_serial = true;
+ } else {
+ TCGv t0 = tcg_temp_new();
+ TCGv t1 = tcg_temp_new();
+
+ tcg_gen_qemu_ld_tl(t0, EA, ctx->mem_idx, memop);
+ if ((memop & MO_SIZE) == MO_64 || TARGET_LONG_BITS == 32) {
+ tcg_gen_mov_tl(t1, src);
+ } else {
+ tcg_gen_ext32u_tl(t1, src);
+ }
+ tcg_gen_movcond_tl(TCG_COND_NE, t1, t0, t1,
+ cpu_gpr[(rt + 2) & 31], t0);
+ tcg_gen_qemu_st_tl(t1, EA, ctx->mem_idx, memop);
+ tcg_gen_mov_tl(dst, t0);
+
+ tcg_temp_free(t0);
+ tcg_temp_free(t1);
+ }
break;
+
+ case 24: /* Fetch and increment bounded */
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ need_serial = true;
+ } else {
+ gen_fetch_inc_conditional(ctx, memop, EA, TCG_COND_NE, 1);
+ }
+ break;
+ case 25: /* Fetch and increment equal */
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ need_serial = true;
+ } else {
+ gen_fetch_inc_conditional(ctx, memop, EA, TCG_COND_EQ, 1);
+ }
+ break;
+ case 28: /* Fetch and decrement bounded */
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ need_serial = true;
+ } else {
+ gen_fetch_inc_conditional(ctx, memop, EA, TCG_COND_NE, -1);
+ }
+ break;
+
default:
/* invoke data storage error handler */
gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);
}
tcg_temp_free(EA);
+
+ if (need_serial) {
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
+ }
}
static void gen_lwat(DisasContext *ctx)
--
2.17.1
* [Qemu-devel] [PATCH 13/13] target/ppc: Implement the rest of gen_st_atomic
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (11 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 12/13] target/ppc: Implement the rest of gen_ld_atomic Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-29 4:15 ` [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations David Gibson
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
The store twin case was stubbed out. For now, implement it only within
a serial context, forcing parallel execution to synchronize. It would
be possible to implement with a cmpxchg loop, if we care, but the loose
alignment requirements (simply not crossing a 32-byte boundary) might send
us back to the serial context anyway.
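That alignment constraint reduces to a simple boundary test
(illustrative sketch with hypothetical names):

#include <stdbool.h>
#include <stdint.h>

/* A store twin touches 2 * size bytes; it need not be naturally
 * aligned, but the pair must not cross a 32-byte boundary. */
static bool twin_crosses_32byte_boundary(uint64_t ea, unsigned size)
{
    return (ea & 31) + 2 * size > 32;
}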
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 597a37d3ec..e120f2ed0b 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3254,7 +3254,31 @@ static void gen_st_atomic(DisasContext *ctx, TCGMemOp memop)
tcg_gen_atomic_smin_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
break;
case 24: /* Store twin */
- gen_invalid(ctx);
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
+ } else {
+ TCGv t = tcg_temp_new();
+ TCGv t2 = tcg_temp_new();
+ TCGv s = tcg_temp_new();
+ TCGv s2 = tcg_temp_new();
+ TCGv ea_plus_s = tcg_temp_new();
+
+ tcg_gen_qemu_ld_tl(t, EA, ctx->mem_idx, memop);
+ tcg_gen_addi_tl(ea_plus_s, EA, MEMOP_GET_SIZE(memop));
+ tcg_gen_qemu_ld_tl(t2, ea_plus_s, ctx->mem_idx, memop);
+ tcg_gen_movcond_tl(TCG_COND_EQ, s, t, t2, src, t);
+ tcg_gen_movcond_tl(TCG_COND_EQ, s2, t, t2, src, t2);
+ tcg_gen_qemu_st_tl(s, EA, ctx->mem_idx, memop);
+ tcg_gen_qemu_st_tl(s2, ea_plus_s, ctx->mem_idx, memop);
+
+ tcg_temp_free(ea_plus_s);
+ tcg_temp_free(s2);
+ tcg_temp_free(s);
+ tcg_temp_free(t2);
+ tcg_temp_free(t);
+ }
break;
default:
/* invoke data storage error handler */
--
2.17.1
* Re: [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (12 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 13/13] target/ppc: Implement the rest of gen_st_atomic Richard Henderson
@ 2018-06-29 4:15 ` David Gibson
13 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2018-06-29 4:15 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Tue, Jun 26, 2018 at 09:19:08AM -0700, Richard Henderson wrote:
> In another patch set this week, I had noticed the old linux-user
> do_store_exclusive code was still present. I had thought that was
> dead code that simply hadn't been removed, but it turned out that
> we had not completed the transition to tcg atomics for linux-user.
>
> In the process, I discovered that we weren't using atomic operations
> for the 128-bit lq, lqarx, and stqcx insns. These would have simply
> produced incorrect results for -smp in system mode.
>
> I tidy the code a bit by making use of MO_ALIGN, which means that
> we don't need a separate explicit alignment check.
>
> I use the new min/max atomic operations I added recently for
> ARMv8.2-Atomics and RISC-V.
>
> Finally, Power9 has some *really* odd atomic operations in its
> l[wd]at and st[wd]at instructions. We were generating illegal
> instruction for these. I implement them for serial context and
> force parallel context to grab the exclusive lock and try again.
>
> Except for the trivial linux-user ll/sc case, I do not have any
> code that exercises these instructions. Perhaps the IBM folk
> have something that can test the others?
I've now applied the whole series to ppc-for-3.0.
>
>
> r~
>
>
> Richard Henderson (13):
> target/ppc: Add do_unaligned_access hook
> target/ppc: Use atomic load for LQ and LQARX
> target/ppc: Use atomic store for STQ
> target/ppc: Use atomic cmpxchg for STQCX
> target/ppc: Remove POWERPC_EXCP_STCX
> target/ppc: Tidy gen_conditional_store
> target/ppc: Split out gen_load_locked
> target/ppc: Split out gen_ld_atomic
> target/ppc: Split out gen_st_atomic
> target/ppc: Use MO_ALIGN for ECIWX and ECOWX
> target/ppc: Use atomic min/max helpers
> target/ppc: Implement the rest of gen_ld_atomic
> target/ppc: Implement the rest of gen_st_atomic
>
> target/ppc/cpu.h | 8 +-
> target/ppc/helper.h | 11 +
> target/ppc/internal.h | 5 +
> linux-user/ppc/cpu_loop.c | 123 ++----
> target/ppc/excp_helper.c | 18 +-
> target/ppc/mem_helper.c | 72 +++-
> target/ppc/translate.c | 648 ++++++++++++++++++++------------
> target/ppc/translate_init.inc.c | 1 +
> 8 files changed, 539 insertions(+), 347 deletions(-)
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson