* [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
In-Reply-To: [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
This allows faults from MO_ALIGN to have the same effect
as from gen_check_align.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/internal.h | 5 +++++
target/ppc/excp_helper.c | 18 +++++++++++++++++-
target/ppc/translate_init.inc.c | 1 +
3 files changed, 23 insertions(+), 1 deletion(-)
diff --git a/target/ppc/internal.h b/target/ppc/internal.h
index 1f441c6483..a9bcadff42 100644
--- a/target/ppc/internal.h
+++ b/target/ppc/internal.h
@@ -252,4 +252,9 @@ static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
void helper_compute_fprf_float32(CPUPPCState *env, float32 arg);
void helper_compute_fprf_float128(CPUPPCState *env, float128 arg);
+
+/* Raise a data fault alignment exception for the specified virtual address */
+void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
+ MMUAccessType access_type,
+ int mmu_idx, uintptr_t retaddr);
#endif /* PPC_INTERNAL_H */
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index c092fbead0..d6e97a90e0 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -22,7 +22,7 @@
#include "exec/helper-proto.h"
#include "exec/exec-all.h"
#include "exec/cpu_ldst.h"
-
+#include "internal.h"
#include "helper_regs.h"
//#define DEBUG_OP
@@ -1198,3 +1198,19 @@ void helper_book3s_msgsnd(target_ulong rb)
qemu_mutex_unlock_iothread();
}
#endif
+
+void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr vaddr,
+ MMUAccessType access_type,
+ int mmu_idx, uintptr_t retaddr)
+{
+ CPUPPCState *env = cs->env_ptr;
+ uint32_t insn;
+
+ /* Restore state and reload the insn we executed, for filling in DSISR. */
+ cpu_restore_state(cs, retaddr, true);
+ insn = cpu_ldl_code(env, env->nip);
+
+ cs->exception_index = POWERPC_EXCP_ALIGN;
+ env->error_code = insn & 0x03FF0000;
+ cpu_loop_exit(cs);
+}
diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
index 76d6f3fd5e..7813b1b004 100644
--- a/target/ppc/translate_init.inc.c
+++ b/target/ppc/translate_init.inc.c
@@ -10457,6 +10457,7 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
cc->set_pc = ppc_cpu_set_pc;
cc->gdb_read_register = ppc_cpu_gdb_read_register;
cc->gdb_write_register = ppc_cpu_gdb_write_register;
+ cc->do_unaligned_access = ppc_cpu_do_unaligned_access;
#ifdef CONFIG_USER_ONLY
cc->handle_mmu_fault = ppc_cpu_handle_mmu_fault;
#else
--
2.17.1
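For context, the hook added above is invoked from the generic softmmu
slow path when a memop carries alignment bits. A simplified sketch of
that dispatch (not part of the patch; names follow accel/tcg of this
era and details may differ):

    /* Sketch: inside the softmmu load/store helper.  When the memop
     * carries alignment bits (MO_ALIGN*), a misaligned address is
     * routed to the per-CPU hook rather than handled inline. */
    static void check_alignment(CPUState *cs, target_ulong addr,
                                TCGMemOpIdx oi, MMUAccessType access_type,
                                int mmu_idx, uintptr_t retaddr)
    {
        unsigned a_bits = get_alignment_bits(get_memop(oi));

        if (a_bits && (addr & ((1 << a_bits) - 1))) {
            /* For target/ppc this is now ppc_cpu_do_unaligned_access,
             * which raises POWERPC_EXCP_ALIGN as gen_check_align did. */
            cpu_unaligned_access(cs, addr, access_type, mmu_idx, retaddr);
        }
    }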
* Re: [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
From: David Gibson @ 2018-06-27 9:09 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Tue, Jun 26, 2018 at 09:19:09AM -0700, Richard Henderson wrote:
> This allows faults from MO_ALIGN to have the same effect
> as from gen_check_align.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
So, most powerpc cpus can handle most unaligned accesses without an
exception. I'm assuming this series won't preclude that?
> ---
> target/ppc/internal.h | 5 +++++
> target/ppc/excp_helper.c | 18 +++++++++++++++++-
> target/ppc/translate_init.inc.c | 1 +
> 3 files changed, 23 insertions(+), 1 deletion(-)
>
> diff --git a/target/ppc/internal.h b/target/ppc/internal.h
> index 1f441c6483..a9bcadff42 100644
> --- a/target/ppc/internal.h
> +++ b/target/ppc/internal.h
> @@ -252,4 +252,9 @@ static inline void putVSR(int n, ppc_vsr_t *vsr, CPUPPCState *env)
> void helper_compute_fprf_float16(CPUPPCState *env, float16 arg);
> void helper_compute_fprf_float32(CPUPPCState *env, float32 arg);
> void helper_compute_fprf_float128(CPUPPCState *env, float128 arg);
> +
> +/* Raise a data fault alignment exception for the specified virtual address */
> +void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr addr,
> + MMUAccessType access_type,
> + int mmu_idx, uintptr_t retaddr);
> #endif /* PPC_INTERNAL_H */
> diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
> index c092fbead0..d6e97a90e0 100644
> --- a/target/ppc/excp_helper.c
> +++ b/target/ppc/excp_helper.c
> @@ -22,7 +22,7 @@
> #include "exec/helper-proto.h"
> #include "exec/exec-all.h"
> #include "exec/cpu_ldst.h"
> -
> +#include "internal.h"
> #include "helper_regs.h"
>
> //#define DEBUG_OP
> @@ -1198,3 +1198,19 @@ void helper_book3s_msgsnd(target_ulong rb)
> qemu_mutex_unlock_iothread();
> }
> #endif
> +
> +void ppc_cpu_do_unaligned_access(CPUState *cs, vaddr vaddr,
> + MMUAccessType access_type,
> + int mmu_idx, uintptr_t retaddr)
> +{
> + CPUPPCState *env = cs->env_ptr;
> + uint32_t insn;
> +
> + /* Restore state and reload the insn we executed, for filling in DSISR. */
> + cpu_restore_state(cs, retaddr, true);
> + insn = cpu_ldl_code(env, env->nip);
> +
> + cs->exception_index = POWERPC_EXCP_ALIGN;
> + env->error_code = insn & 0x03FF0000;
> + cpu_loop_exit(cs);
> +}
> diff --git a/target/ppc/translate_init.inc.c b/target/ppc/translate_init.inc.c
> index 76d6f3fd5e..7813b1b004 100644
> --- a/target/ppc/translate_init.inc.c
> +++ b/target/ppc/translate_init.inc.c
> @@ -10457,6 +10457,7 @@ static void ppc_cpu_class_init(ObjectClass *oc, void *data)
> cc->set_pc = ppc_cpu_set_pc;
> cc->gdb_read_register = ppc_cpu_gdb_read_register;
> cc->gdb_write_register = ppc_cpu_gdb_write_register;
> + cc->do_unaligned_access = ppc_cpu_do_unaligned_access;
> #ifdef CONFIG_USER_ONLY
> cc->handle_mmu_fault = ppc_cpu_handle_mmu_fault;
> #else
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
From: Richard Henderson @ 2018-06-27 13:52 UTC
To: David Gibson; +Cc: qemu-devel, qemu-ppc
On 06/27/2018 02:09 AM, David Gibson wrote:
> On Tue, Jun 26, 2018 at 09:19:09AM -0700, Richard Henderson wrote:
>> This allows faults from MO_ALIGN to have the same effect
>> as from gen_check_align.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>
> So, most powerpc cpus can handle most unaligned accesses without an
> exception. I'm assuming this series won't preclude that?
Correct. This hook will only fire when using MO_ALIGN to
request an alignment check.
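Concretely, the difference at a call site looks like this (a sketch
assembled from the operands used later in this series; the exact
memops vary per insn):

    /* Before: alignment enforced by an explicit translate-time check. */
    gen_check_align(ctx, EA, 15);
    tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ);

    /* After: the requirement rides along in the memop; a misaligned
     * access faults through the new do_unaligned_access hook. */
    tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ | MO_ALIGN_16);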
r~
* Re: [Qemu-devel] [PATCH 01/13] target/ppc: Add do_unaligned_access hook
From: David Gibson @ 2018-06-28 3:46 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Wed, Jun 27, 2018 at 06:52:49AM -0700, Richard Henderson wrote:
> On 06/27/2018 02:09 AM, David Gibson wrote:
> > On Tue, Jun 26, 2018 at 09:19:09AM -0700, Richard Henderson wrote:
> >> This allows faults from MO_ALIGN to have the same effect
> >> as from gen_check_align.
> >>
> >> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> >
> > So, most powerpc cpus can handle most unaligned accesses without an
> > exception. I'm assuming this series won't preclude that?
>
> Correct. This hook will only fire when using MO_ALIGN to
> request an alignment check.
Thanks for the confirmation, first patch applied to ppc-for-3.0.
Continuing to look at the rest.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
Section 1.4 of the Power ISA v3.0B states that both of these
instructions are single-copy atomic. As we cannot (yet) issue
128-bit loads within TCG, use the generic helpers provided.
Since TCG cannot (yet) return a 128-bit value, add a slot within
CPUPPCState for returning the high half of a 128-bit return value.
This solution is preferred to the helper assigning to architectural
registers directly, as it avoids clobbering all TCG live values.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/cpu.h | 3 ++
target/ppc/helper.h | 5 +++
target/ppc/mem_helper.c | 20 ++++++++-
target/ppc/translate.c | 93 ++++++++++++++++++++++++++++++-----------
4 files changed, 95 insertions(+), 26 deletions(-)
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index c7f3fb6b73..973cf44cda 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -1015,6 +1015,9 @@ struct CPUPPCState {
/* Next instruction pointer */
target_ulong nip;
+ /* High part of 128-bit helper return. */
+ uint64_t retxh;
+
int access_type; /* when a memory exception occurs, the access
type is stored here */
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index d751f0e219..3f451a5d7e 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -799,3 +799,8 @@ DEF_HELPER_4(dscliq, void, env, fprp, fprp, i32)
DEF_HELPER_1(tbegin, void, env)
DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
+
+#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
+DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
+DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
+#endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index a34e604db3..44a8f3445a 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -21,9 +21,9 @@
#include "exec/exec-all.h"
#include "qemu/host-utils.h"
#include "exec/helper-proto.h"
-
#include "helper_regs.h"
#include "exec/cpu_ldst.h"
+#include "tcg.h"
#include "internal.h"
//#define DEBUG_OP
@@ -215,6 +215,24 @@ target_ulong helper_lscbx(CPUPPCState *env, target_ulong addr, uint32_t reg,
return i;
}
+#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
+uint64_t helper_lq_le_parallel(CPUPPCState *env, target_ulong addr,
+ uint32_t opidx)
+{
+ Int128 ret = helper_atomic_ldo_le_mmu(env, addr, opidx, GETPC());
+ env->retxh = int128_gethi(ret);
+ return int128_getlo(ret);
+}
+
+uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
+ uint32_t opidx)
+{
+ Int128 ret = helper_atomic_ldo_be_mmu(env, addr, opidx, GETPC());
+ env->retxh = int128_gethi(ret);
+ return int128_getlo(ret);
+}
+#endif
+
/*****************************************************************************/
/* Altivec extension helpers */
#if defined(HOST_WORDS_BIGENDIAN)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 3a215a1dc6..0923cc24e3 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2607,7 +2607,7 @@ static void gen_ld(DisasContext *ctx)
static void gen_lq(DisasContext *ctx)
{
int ra, rd;
- TCGv EA;
+ TCGv EA, hi, lo;
/* lq is a legal user mode instruction starting in ISA 2.07 */
bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
@@ -2633,16 +2633,35 @@ static void gen_lq(DisasContext *ctx)
EA = tcg_temp_new();
gen_addr_imm_index(ctx, EA, 0x0F);
- /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
- necessary 64-bit byteswap already. */
- if (unlikely(ctx->le_mode)) {
- gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
+ /* Note that the low part is always in RD+1, even in LE mode. */
+ lo = cpu_gpr[rd + 1];
+ hi = cpu_gpr[rd];
+
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC128
+ TCGv_i32 oi = tcg_temp_new_i32();
+ if (ctx->le_mode) {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
+ gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
+ } else {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
+ gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
+ }
+ tcg_temp_free_i32(oi);
+ tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
+#else
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
+#endif
+ } else if (ctx->le_mode) {
+ tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ);
gen_addr_add(ctx, EA, EA, 8);
- gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
+ tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
} else {
- gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
+ tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ);
gen_addr_add(ctx, EA, EA, 8);
- gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
+ tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
}
tcg_temp_free(EA);
}
@@ -3236,9 +3255,8 @@ STCX(stdcx_, DEF_MEMOP(MO_Q))
/* lqarx */
static void gen_lqarx(DisasContext *ctx)
{
- TCGv EA;
int rd = rD(ctx->opcode);
- TCGv gpr1, gpr2;
+ TCGv EA, hi, lo;
if (unlikely((rd & 1) || (rd == rA(ctx->opcode)) ||
(rd == rB(ctx->opcode)))) {
@@ -3247,24 +3265,49 @@ static void gen_lqarx(DisasContext *ctx)
}
gen_set_access_type(ctx, ACCESS_RES);
- EA = tcg_temp_local_new();
+ EA = tcg_temp_new();
gen_addr_reg_index(ctx, EA);
- gen_check_align(ctx, EA, 15);
- if (unlikely(ctx->le_mode)) {
- gpr1 = cpu_gpr[rd+1];
- gpr2 = cpu_gpr[rd];
- } else {
- gpr1 = cpu_gpr[rd];
- gpr2 = cpu_gpr[rd+1];
- }
- tcg_gen_qemu_ld_i64(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
- tcg_gen_mov_tl(cpu_reserve, EA);
- gen_addr_add(ctx, EA, EA, 8);
- tcg_gen_qemu_ld_i64(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
- tcg_gen_st_tl(gpr1, cpu_env, offsetof(CPUPPCState, reserve_val));
- tcg_gen_st_tl(gpr2, cpu_env, offsetof(CPUPPCState, reserve_val2));
+ /* Note that the low part is always in RD+1, even in LE mode. */
+ lo = cpu_gpr[rd + 1];
+ hi = cpu_gpr[rd];
+
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC128
+ TCGv_i32 oi = tcg_temp_new_i32();
+ if (ctx->le_mode) {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ | MO_ALIGN_16,
+ ctx->mem_idx));
+ gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
+ } else {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ | MO_ALIGN_16,
+ ctx->mem_idx));
+ gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
+ }
+ tcg_temp_free_i32(oi);
+ tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
+#else
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
+ tcg_temp_free(EA);
+ return;
+#endif
+ } else if (ctx->le_mode) {
+ tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ | MO_ALIGN_16);
+ tcg_gen_mov_tl(cpu_reserve, EA);
+ gen_addr_add(ctx, EA, EA, 8);
+ tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
+ } else {
+ tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ | MO_ALIGN_16);
+ tcg_gen_mov_tl(cpu_reserve, EA);
+ gen_addr_add(ctx, EA, EA, 8);
+ tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
+ }
tcg_temp_free(EA);
+
+ tcg_gen_st_tl(hi, cpu_env, offsetof(CPUPPCState, reserve_val));
+ tcg_gen_st_tl(lo, cpu_env, offsetof(CPUPPCState, reserve_val2));
}
/* stqcx. */
--
2.17.1
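For reference, the Int128 accessors the new helpers lean on come from
include/qemu/int128.h; a minimal sketch of the split across the helper
boundary, with lo holding bits 0..63 and hi holding bits 64..127:

    #include "qemu/int128.h"

    /* Sketch: splitting a 128-bit value across the helper boundary,
     * since a helper can return at most 64 bits directly. */
    static uint64_t demo_split(uint64_t lo, uint64_t hi, uint64_t *ret_hi)
    {
        Int128 v = int128_make128(lo, hi);

        *ret_hi = int128_gethi(v);   /* what lands in env->retxh */
        return int128_getlo(v);      /* what the helper returns */
    }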
* Re: [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
From: David Gibson @ 2018-06-28 3:49 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Tue, Jun 26, 2018 at 09:19:10AM -0700, Richard Henderson wrote:
> Section 1.4 of the Power ISA v3.0B states that both of these
> instructions are single-copy atomic. As we cannot (yet) issue
> 128-bit loads within TCG, use the generic helpers provided.
>
> Since TCG cannot (yet) return a 128-bit value, add a slot within
> CPUPPCState for returning the high half of a 128-bit return value.
> This solution is preferred to the helper assigning to architectural
> registers directly, as it avoids clobbering all TCG live values.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/ppc/cpu.h | 3 ++
> target/ppc/helper.h | 5 +++
> target/ppc/mem_helper.c | 20 ++++++++-
> target/ppc/translate.c | 93 ++++++++++++++++++++++++++++++-----------
> 4 files changed, 95 insertions(+), 26 deletions(-)
>
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index c7f3fb6b73..973cf44cda 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -1015,6 +1015,9 @@ struct CPUPPCState {
> /* Next instruction pointer */
> target_ulong nip;
>
> + /* High part of 128-bit helper return. */
> + uint64_t retxh;
> +
Adding a temporary here is kind of gross. I guess the helper
interface doesn't allow for 128-bit returns, but couldn't you pass a
register number into the helper and have it update the right GPR
without going through a temp?
> int access_type; /* when a memory exception occurs, the access
> type is stored here */
>
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index d751f0e219..3f451a5d7e 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -799,3 +799,8 @@ DEF_HELPER_4(dscliq, void, env, fprp, fprp, i32)
>
> DEF_HELPER_1(tbegin, void, env)
> DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
> +
> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> +DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +#endif
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index a34e604db3..44a8f3445a 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -21,9 +21,9 @@
> #include "exec/exec-all.h"
> #include "qemu/host-utils.h"
> #include "exec/helper-proto.h"
> -
> #include "helper_regs.h"
> #include "exec/cpu_ldst.h"
> +#include "tcg.h"
> #include "internal.h"
>
> //#define DEBUG_OP
> @@ -215,6 +215,24 @@ target_ulong helper_lscbx(CPUPPCState *env, target_ulong addr, uint32_t reg,
> return i;
> }
>
> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> +uint64_t helper_lq_le_parallel(CPUPPCState *env, target_ulong addr,
> + uint32_t opidx)
> +{
> + Int128 ret = helper_atomic_ldo_le_mmu(env, addr, opidx, GETPC());
> + env->retxh = int128_gethi(ret);
> + return int128_getlo(ret);
> +}
> +
> +uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
> + uint32_t opidx)
> +{
> + Int128 ret = helper_atomic_ldo_be_mmu(env, addr, opidx, GETPC());
> + env->retxh = int128_gethi(ret);
> + return int128_getlo(ret);
> +}
> +#endif
> +
> /*****************************************************************************/
> /* Altivec extension helpers */
> #if defined(HOST_WORDS_BIGENDIAN)
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 3a215a1dc6..0923cc24e3 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -2607,7 +2607,7 @@ static void gen_ld(DisasContext *ctx)
> static void gen_lq(DisasContext *ctx)
> {
> int ra, rd;
> - TCGv EA;
> + TCGv EA, hi, lo;
>
> /* lq is a legal user mode instruction starting in ISA 2.07 */
> bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> @@ -2633,16 +2633,35 @@ static void gen_lq(DisasContext *ctx)
> EA = tcg_temp_new();
> gen_addr_imm_index(ctx, EA, 0x0F);
>
> - /* We only need to swap high and low halves. gen_qemu_ld64_i64 does
> - necessary 64-bit byteswap already. */
> - if (unlikely(ctx->le_mode)) {
> - gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
> + /* Note that the low part is always in RD+1, even in LE mode. */
> + lo = cpu_gpr[rd + 1];
> + hi = cpu_gpr[rd];
> +
> + if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> + TCGv_i32 oi = tcg_temp_new_i32();
> + if (ctx->le_mode) {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
> + gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
> + } else {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
> + gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
> + }
> + tcg_temp_free_i32(oi);
> + tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
> +#else
> + /* Restart with exclusive lock. */
> + gen_helper_exit_atomic(cpu_env);
> + ctx->base.is_jmp = DISAS_NORETURN;
> +#endif
> + } else if (ctx->le_mode) {
> + tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
> + tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
> } else {
> - gen_qemu_ld64_i64(ctx, cpu_gpr[rd], EA);
> + tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_ld64_i64(ctx, cpu_gpr[rd + 1], EA);
> + tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
> }
> tcg_temp_free(EA);
> }
> @@ -3236,9 +3255,8 @@ STCX(stdcx_, DEF_MEMOP(MO_Q))
> /* lqarx */
> static void gen_lqarx(DisasContext *ctx)
> {
> - TCGv EA;
> int rd = rD(ctx->opcode);
> - TCGv gpr1, gpr2;
> + TCGv EA, hi, lo;
>
> if (unlikely((rd & 1) || (rd == rA(ctx->opcode)) ||
> (rd == rB(ctx->opcode)))) {
> @@ -3247,24 +3265,49 @@ static void gen_lqarx(DisasContext *ctx)
> }
>
> gen_set_access_type(ctx, ACCESS_RES);
> - EA = tcg_temp_local_new();
> + EA = tcg_temp_new();
> gen_addr_reg_index(ctx, EA);
> - gen_check_align(ctx, EA, 15);
> - if (unlikely(ctx->le_mode)) {
> - gpr1 = cpu_gpr[rd+1];
> - gpr2 = cpu_gpr[rd];
> - } else {
> - gpr1 = cpu_gpr[rd];
> - gpr2 = cpu_gpr[rd+1];
> - }
> - tcg_gen_qemu_ld_i64(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
> - tcg_gen_mov_tl(cpu_reserve, EA);
> - gen_addr_add(ctx, EA, EA, 8);
> - tcg_gen_qemu_ld_i64(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
>
> - tcg_gen_st_tl(gpr1, cpu_env, offsetof(CPUPPCState, reserve_val));
> - tcg_gen_st_tl(gpr2, cpu_env, offsetof(CPUPPCState, reserve_val2));
> + /* Note that the low part is always in RD+1, even in LE mode. */
> + lo = cpu_gpr[rd + 1];
> + hi = cpu_gpr[rd];
> +
> + if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> + TCGv_i32 oi = tcg_temp_new_i32();
> + if (ctx->le_mode) {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ | MO_ALIGN_16,
> + ctx->mem_idx));
> + gen_helper_lq_le_parallel(lo, cpu_env, EA, oi);
> + } else {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ | MO_ALIGN_16,
> + ctx->mem_idx));
> + gen_helper_lq_be_parallel(lo, cpu_env, EA, oi);
> + }
> + tcg_temp_free_i32(oi);
> + tcg_gen_ld_i64(hi, cpu_env, offsetof(CPUPPCState, retxh));
> +#else
> + /* Restart with exclusive lock. */
> + gen_helper_exit_atomic(cpu_env);
> + ctx->base.is_jmp = DISAS_NORETURN;
> + tcg_temp_free(EA);
> + return;
> +#endif
> + } else if (ctx->le_mode) {
> + tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_LEQ | MO_ALIGN_16);
> + tcg_gen_mov_tl(cpu_reserve, EA);
> + gen_addr_add(ctx, EA, EA, 8);
> + tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_LEQ);
> + } else {
> + tcg_gen_qemu_ld_i64(hi, EA, ctx->mem_idx, MO_BEQ | MO_ALIGN_16);
> + tcg_gen_mov_tl(cpu_reserve, EA);
> + gen_addr_add(ctx, EA, EA, 8);
> + tcg_gen_qemu_ld_i64(lo, EA, ctx->mem_idx, MO_BEQ);
> + }
> tcg_temp_free(EA);
> +
> + tcg_gen_st_tl(hi, cpu_env, offsetof(CPUPPCState, reserve_val));
> + tcg_gen_st_tl(lo, cpu_env, offsetof(CPUPPCState, reserve_val2));
> }
>
> /* stqcx. */
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
From: Richard Henderson @ 2018-06-28 15:22 UTC
To: David Gibson; +Cc: qemu-devel, qemu-ppc
On 06/27/2018 08:49 PM, David Gibson wrote:
>> + /* High part of 128-bit helper return. */
>> + uint64_t retxh;
>> +
>
> Adding a temporary here is kind of gross. I guess the helper
> interface doesn't allow for 128-bit returns, but couldn't you pass a
> register number into the helper and have it update the right GPR
> without going through a temp?
I could pass a pointer, but that would cause ...
>> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
>> +DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
>> +DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
... the helper definitions to lose TCG_CALL_NO_WG, because they *would* write
to a global register. Which would cause TCG to discard all of the global guest
registers cached within host registers.
I've used this secondary memory return before, in target/s390,
and to me it seems cleaner than pointers.
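For contrast, the hypothetical pointer-based declaration would look
like this (not in the patch; shown only to make the trade-off concrete):

    /* Hypothetical alternative -- NOT what the series does.  Writing
     * through the pointer modifies a TCG global, so the NO_WG flag
     * must be dropped (flags = 0), and TCG then assumes the call can
     * clobber every guest global cached in a host register. */
    DEF_HELPER_FLAGS_4(lq_le_parallel_alt, 0, void, env, tl, ptr, i32)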
r~
* Re: [Qemu-devel] [PATCH 02/13] target/ppc: Use atomic load for LQ and LQARX
From: David Gibson @ 2018-06-29 3:33 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Thu, Jun 28, 2018 at 08:22:38AM -0700, Richard Henderson wrote:
> On 06/27/2018 08:49 PM, David Gibson wrote:
> >> + /* High part of 128-bit helper return. */
> >> + uint64_t retxh;
> >> +
> >
> > Adding a temporary here is kind of gross. I guess the helper
> > interface doesn't allow for 128-bit returns, but couldn't you pass a
> > register number into the helper and have it update the right GPR
> > without going through a temp?
>
> I could pass a pointer, but that would cause ...
>
> >> +#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> >> +DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> >> +DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
>
> ... the helper definitions to lose TCG_CALL_NO_WG, because they *would* write
> to a global register. Which would cause TCG to discard all of the global guest
> registers cached within host registers.
>
> I've used this secondary memory return before, in target/s390,
> and to me it seems cleaner than pointers.
Ok, sounds reasonable, applied to ppc-for-3.0.
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
Section 1.4 of the Power ISA v3.0B states that this insn is
single-copy atomic. As we cannot (yet) issue 128-bit loads
within TCG, use the generic helpers provided.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/helper.h | 4 ++++
target/ppc/mem_helper.c | 14 ++++++++++++++
target/ppc/translate.c | 35 +++++++++++++++++++++++++++--------
3 files changed, 45 insertions(+), 8 deletions(-)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 3f451a5d7e..cbc1228570 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -803,4 +803,8 @@ DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
#if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
+DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
+ void, env, tl, i64, i64, i32)
+DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
+ void, env, tl, i64, i64, i32)
#endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 44a8f3445a..57e301edc3 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -231,6 +231,20 @@ uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
env->retxh = int128_gethi(ret);
return int128_getlo(ret);
}
+
+void helper_stq_le_parallel(CPUPPCState *env, target_ulong addr,
+ uint64_t lo, uint64_t hi, uint32_t opidx)
+{
+ Int128 val = int128_make128(lo, hi);
+ helper_atomic_sto_le_mmu(env, addr, val, opidx, GETPC());
+}
+
+void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
+ uint64_t lo, uint64_t hi, uint32_t opidx)
+{
+ Int128 val = int128_make128(lo, hi);
+ helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
+}
#endif
/*****************************************************************************/
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 0923cc24e3..3d63a62269 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2760,6 +2760,7 @@ static void gen_std(DisasContext *ctx)
if ((ctx->opcode & 0x3) == 0x2) { /* stq */
bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
bool le_is_supported = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
+ TCGv hi, lo;
if (!(ctx->insns_flags & PPC_64BX)) {
gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
@@ -2783,20 +2784,38 @@ static void gen_std(DisasContext *ctx)
EA = tcg_temp_new();
gen_addr_imm_index(ctx, EA, 0x03);
- /* We only need to swap high and low halves. gen_qemu_st64_i64 does
- necessary 64-bit byteswap already. */
- if (unlikely(ctx->le_mode)) {
- gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
+ /* Note that the low part is always in RS+1, even in LE mode. */
+ lo = cpu_gpr[rs + 1];
+ hi = cpu_gpr[rs];
+
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+#ifdef CONFIG_ATOMIC128
+ TCGv_i32 oi = tcg_temp_new_i32();
+ if (ctx->le_mode) {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
+ gen_helper_stq_le_parallel(cpu_env, EA, lo, hi, oi);
+ } else {
+ tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
+ gen_helper_stq_be_parallel(cpu_env, EA, lo, hi, oi);
+ }
+ tcg_temp_free_i32(oi);
+#else
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
+#endif
+ } else if (ctx->le_mode) {
+ tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_LEQ);
gen_addr_add(ctx, EA, EA, 8);
- gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
+ tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_LEQ);
} else {
- gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
+ tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_BEQ);
gen_addr_add(ctx, EA, EA, 8);
- gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
+ tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_BEQ);
}
tcg_temp_free(EA);
} else {
- /* std / stdu*/
+ /* std / stdu */
if (Rc(ctx->opcode)) {
if (unlikely(rA(ctx->opcode) == 0)) {
gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
--
2.17.1
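Distilled, the dispatch shape shared by the lq, stq and (next patch)
stqcx. translators is the following sketch, with the per-insn bodies
elided:

    if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
    #ifdef CONFIG_ATOMIC128
        /* MTTCG and the host has 16-byte atomics: call the helper,
         * which performs one single-copy-atomic 128-bit access. */
    #else
        /* No 16-byte host atomics: abort this TB and re-execute the
         * one insn under the exclusive lock, serializing all vCPUs. */
        gen_helper_exit_atomic(cpu_env);
        ctx->base.is_jmp = DISAS_NORETURN;
    #endif
    } else {
        /* Single-threaded TCG: no other vCPU can observe tearing,
         * so two ordinary 64-bit accesses suffice. */
    }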
* Re: [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ
From: David Gibson @ 2018-06-28 3:51 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Tue, Jun 26, 2018 at 09:19:11AM -0700, Richard Henderson wrote:
> Section 1.4 of the Power ISA v3.0B states that this insn is
> single-copy atomic. As we cannot (yet) issue 128-bit loads
nit: s/loads/stores/
> within TCG, use the generic helpers provided.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/ppc/helper.h | 4 ++++
> target/ppc/mem_helper.c | 14 ++++++++++++++
> target/ppc/translate.c | 35 +++++++++++++++++++++++++++--------
> 3 files changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 3f451a5d7e..cbc1228570 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -803,4 +803,8 @@ DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
> #if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
> + void, env, tl, i64, i64, i32)
> +DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
> + void, env, tl, i64, i64, i32)
> #endif
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index 44a8f3445a..57e301edc3 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -231,6 +231,20 @@ uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
> env->retxh = int128_gethi(ret);
> return int128_getlo(ret);
> }
> +
> +void helper_stq_le_parallel(CPUPPCState *env, target_ulong addr,
> + uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> + Int128 val = int128_make128(lo, hi);
> + helper_atomic_sto_le_mmu(env, addr, val, opidx, GETPC());
> +}
> +
> +void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
> + uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> + Int128 val = int128_make128(lo, hi);
> + helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
> +}
> #endif
>
> /*****************************************************************************/
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 0923cc24e3..3d63a62269 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -2760,6 +2760,7 @@ static void gen_std(DisasContext *ctx)
> if ((ctx->opcode & 0x3) == 0x2) { /* stq */
> bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> bool le_is_supported = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> + TCGv hi, lo;
>
> if (!(ctx->insns_flags & PPC_64BX)) {
> gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> @@ -2783,20 +2784,38 @@ static void gen_std(DisasContext *ctx)
> EA = tcg_temp_new();
> gen_addr_imm_index(ctx, EA, 0x03);
>
> - /* We only need to swap high and low halves. gen_qemu_st64_i64 does
> - necessary 64-bit byteswap already. */
> - if (unlikely(ctx->le_mode)) {
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> + /* Note that the low part is always in RS+1, even in LE mode. */
> + lo = cpu_gpr[rs + 1];
> + hi = cpu_gpr[rs];
> +
> + if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> + TCGv_i32 oi = tcg_temp_new_i32();
> + if (ctx->le_mode) {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
> + gen_helper_stq_le_parallel(cpu_env, EA, lo, hi, oi);
> + } else {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
> + gen_helper_stq_be_parallel(cpu_env, EA, lo, hi, oi);
> + }
> + tcg_temp_free_i32(oi);
> +#else
> + /* Restart with exclusive lock. */
> + gen_helper_exit_atomic(cpu_env);
> + ctx->base.is_jmp = DISAS_NORETURN;
> +#endif
> + } else if (ctx->le_mode) {
> + tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_LEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> + tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_LEQ);
> } else {
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> + tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_BEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> + tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_BEQ);
> }
> tcg_temp_free(EA);
> } else {
> - /* std / stdu*/
> + /* std / stdu */
> if (Rc(ctx->opcode)) {
> if (unlikely(rA(ctx->opcode) == 0)) {
> gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* Re: [Qemu-devel] [PATCH 03/13] target/ppc: Use atomic store for STQ
From: David Gibson @ 2018-06-29 3:33 UTC
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Tue, Jun 26, 2018 at 09:19:11AM -0700, Richard Henderson wrote:
> Section 1.4 of the Power ISA v3.0B states that this insn is
> single-copy atomic. As we cannot (yet) issue 128-bit loads
> within TCG, use the generic helpers provided.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Applied to ppc-for-3.0, thanks.
> ---
> target/ppc/helper.h | 4 ++++
> target/ppc/mem_helper.c | 14 ++++++++++++++
> target/ppc/translate.c | 35 +++++++++++++++++++++++++++--------
> 3 files changed, 45 insertions(+), 8 deletions(-)
>
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 3f451a5d7e..cbc1228570 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -803,4 +803,8 @@ DEF_HELPER_FLAGS_1(fixup_thrm, TCG_CALL_NO_RWG, void, env)
> #if defined(TARGET_PPC64) && defined(CONFIG_ATOMIC128)
> DEF_HELPER_FLAGS_3(lq_le_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> DEF_HELPER_FLAGS_3(lq_be_parallel, TCG_CALL_NO_WG, i64, env, tl, i32)
> +DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
> + void, env, tl, i64, i64, i32)
> +DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
> + void, env, tl, i64, i64, i32)
> #endif
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index 44a8f3445a..57e301edc3 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -231,6 +231,20 @@ uint64_t helper_lq_be_parallel(CPUPPCState *env, target_ulong addr,
> env->retxh = int128_gethi(ret);
> return int128_getlo(ret);
> }
> +
> +void helper_stq_le_parallel(CPUPPCState *env, target_ulong addr,
> + uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> + Int128 val = int128_make128(lo, hi);
> + helper_atomic_sto_le_mmu(env, addr, val, opidx, GETPC());
> +}
> +
> +void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
> + uint64_t lo, uint64_t hi, uint32_t opidx)
> +{
> + Int128 val = int128_make128(lo, hi);
> + helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
> +}
> #endif
>
> /*****************************************************************************/
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 0923cc24e3..3d63a62269 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -2760,6 +2760,7 @@ static void gen_std(DisasContext *ctx)
> if ((ctx->opcode & 0x3) == 0x2) { /* stq */
> bool legal_in_user_mode = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> bool le_is_supported = (ctx->insns_flags2 & PPC2_LSQ_ISA207) != 0;
> + TCGv hi, lo;
>
> if (!(ctx->insns_flags & PPC_64BX)) {
> gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
> @@ -2783,20 +2784,38 @@ static void gen_std(DisasContext *ctx)
> EA = tcg_temp_new();
> gen_addr_imm_index(ctx, EA, 0x03);
>
> - /* We only need to swap high and low halves. gen_qemu_st64_i64 does
> - necessary 64-bit byteswap already. */
> - if (unlikely(ctx->le_mode)) {
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> + /* Note that the low part is always in RS+1, even in LE mode. */
> + lo = cpu_gpr[rs + 1];
> + hi = cpu_gpr[rs];
> +
> + if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
> +#ifdef CONFIG_ATOMIC128
> + TCGv_i32 oi = tcg_temp_new_i32();
> + if (ctx->le_mode) {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_LEQ, ctx->mem_idx));
> + gen_helper_stq_le_parallel(cpu_env, EA, lo, hi, oi);
> + } else {
> + tcg_gen_movi_i32(oi, make_memop_idx(MO_BEQ, ctx->mem_idx));
> + gen_helper_stq_be_parallel(cpu_env, EA, lo, hi, oi);
> + }
> + tcg_temp_free_i32(oi);
> +#else
> + /* Restart with exclusive lock. */
> + gen_helper_exit_atomic(cpu_env);
> + ctx->base.is_jmp = DISAS_NORETURN;
> +#endif
> + } else if (ctx->le_mode) {
> + tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_LEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> + tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_LEQ);
> } else {
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs], EA);
> + tcg_gen_qemu_st_i64(hi, EA, ctx->mem_idx, MO_BEQ);
> gen_addr_add(ctx, EA, EA, 8);
> - gen_qemu_st64_i64(ctx, cpu_gpr[rs + 1], EA);
> + tcg_gen_qemu_st_i64(lo, EA, ctx->mem_idx, MO_BEQ);
> }
> tcg_temp_free(EA);
> } else {
> - /* std / stdu*/
> + /* std / stdu */
> if (Rc(ctx->opcode)) {
> if (unlikely(rA(ctx->opcode) == 0)) {
> gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson
* [Qemu-devel] [PATCH 04/13] target/ppc: Use atomic cmpxchg for STQCX
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
When running in a parallel context, we must use a helper in order
to perform the 128-bit atomic operation. When running in a serial
context, do the compare before the store.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/helper.h | 2 +
target/ppc/mem_helper.c | 38 +++++++++++++++++
target/ppc/translate.c | 95 ++++++++++++++++++++++++++---------------
3 files changed, 101 insertions(+), 34 deletions(-)
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index cbc1228570..5706c2497f 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -807,4 +807,6 @@ DEF_HELPER_FLAGS_5(stq_le_parallel, TCG_CALL_NO_WG,
void, env, tl, i64, i64, i32)
DEF_HELPER_FLAGS_5(stq_be_parallel, TCG_CALL_NO_WG,
void, env, tl, i64, i64, i32)
+DEF_HELPER_5(stqcx_le_parallel, i32, env, tl, i64, i64, i32)
+DEF_HELPER_5(stqcx_be_parallel, i32, env, tl, i64, i64, i32)
#endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 57e301edc3..8f0d86d104 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -245,6 +245,44 @@ void helper_stq_be_parallel(CPUPPCState *env, target_ulong addr,
Int128 val = int128_make128(lo, hi);
helper_atomic_sto_be_mmu(env, addr, val, opidx, GETPC());
}
+
+uint32_t helper_stqcx_le_parallel(CPUPPCState *env, target_ulong addr,
+ uint64_t new_lo, uint64_t new_hi,
+ uint32_t opidx)
+{
+ bool success = false;
+
+ if (likely(addr == env->reserve_addr)) {
+ Int128 oldv, cmpv, newv;
+
+ cmpv = int128_make128(env->reserve_val2, env->reserve_val);
+ newv = int128_make128(new_lo, new_hi);
+ oldv = helper_atomic_cmpxchgo_le_mmu(env, addr, cmpv, newv,
+ opidx, GETPC());
+ success = int128_eq(oldv, cmpv);
+ }
+ env->reserve_addr = -1;
+ return env->so + success * CRF_EQ_BIT;
+}
+
+uint32_t helper_stqcx_be_parallel(CPUPPCState *env, target_ulong addr,
+ uint64_t new_lo, uint64_t new_hi,
+ uint32_t opidx)
+{
+ bool success = false;
+
+ if (likely(addr == env->reserve_addr)) {
+ Int128 oldv, cmpv, newv;
+
+ cmpv = int128_make128(env->reserve_val2, env->reserve_val);
+ newv = int128_make128(new_lo, new_hi);
+ oldv = helper_atomic_cmpxchgo_be_mmu(env, addr, cmpv, newv,
+ opidx, GETPC());
+ success = int128_eq(oldv, cmpv);
+ }
+ env->reserve_addr = -1;
+ return env->so + success * CRF_EQ_BIT;
+}
#endif
/*****************************************************************************/
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 3d63a62269..c7b9d226eb 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3332,50 +3332,77 @@ static void gen_lqarx(DisasContext *ctx)
/* stqcx. */
static void gen_stqcx_(DisasContext *ctx)
{
- TCGv EA;
- int reg = rS(ctx->opcode);
- int len = 16;
-#if !defined(CONFIG_USER_ONLY)
- TCGLabel *l1;
- TCGv gpr1, gpr2;
-#endif
+ int rs = rS(ctx->opcode);
+ TCGv EA, hi, lo;
- if (unlikely((rD(ctx->opcode) & 1))) {
+ if (unlikely(rs & 1)) {
gen_inval_exception(ctx, POWERPC_EXCP_INVAL_INVAL);
return;
}
+
gen_set_access_type(ctx, ACCESS_RES);
- EA = tcg_temp_local_new();
+ EA = tcg_temp_new();
gen_addr_reg_index(ctx, EA);
- if (len > 1) {
- gen_check_align(ctx, EA, (len) - 1);
- }
-#if defined(CONFIG_USER_ONLY)
- gen_conditional_store(ctx, EA, reg, 16);
+ /* Note that the low part is always in RS+1, even in LE mode. */
+ lo = cpu_gpr[rs + 1];
+ hi = cpu_gpr[rs];
+
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ TCGv_i32 oi = tcg_const_i32(DEF_MEMOP(MO_Q) | MO_ALIGN_16);
+#ifdef CONFIG_ATOMIC128
+ if (ctx->le_mode) {
+ gen_helper_stqcx_le_parallel(cpu_crf[0], cpu_env, EA, lo, hi, oi);
+ } else {
+ gen_helper_stqcx_be_parallel(cpu_crf[0], cpu_env, EA, lo, hi, oi);
+ }
#else
- tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
- l1 = gen_new_label();
- tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
- tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
-
- if (unlikely(ctx->le_mode)) {
- gpr1 = cpu_gpr[reg + 1];
- gpr2 = cpu_gpr[reg];
- } else {
- gpr1 = cpu_gpr[reg];
- gpr2 = cpu_gpr[reg + 1];
- }
- tcg_gen_qemu_st_tl(gpr1, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
- gen_addr_add(ctx, EA, EA, 8);
- tcg_gen_qemu_st_tl(gpr2, EA, ctx->mem_idx, DEF_MEMOP(MO_Q));
-
- gen_set_label(l1);
- tcg_gen_movi_tl(cpu_reserve, -1);
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
#endif
- tcg_temp_free(EA);
-}
+ tcg_temp_free(EA);
+ tcg_temp_free_i32(oi);
+ } else {
+ TCGLabel *lab_fail = gen_new_label();
+ TCGLabel *lab_over = gen_new_label();
+ TCGv_i64 t0 = tcg_temp_new_i64();
+ TCGv_i64 t1 = tcg_temp_new_i64();
+ tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, lab_fail);
+ tcg_temp_free(EA);
+
+ gen_qemu_ld64_i64(ctx, t0, cpu_reserve);
+ tcg_gen_ld_i64(t1, cpu_env, (ctx->le_mode
+ ? offsetof(CPUPPCState, reserve_val2)
+ : offsetof(CPUPPCState, reserve_val)));
+ tcg_gen_brcond_i64(TCG_COND_NE, t0, t1, lab_fail);
+
+ tcg_gen_addi_i64(t0, cpu_reserve, 8);
+ gen_qemu_ld64_i64(ctx, t0, t0);
+ tcg_gen_ld_i64(t1, cpu_env, (ctx->le_mode
+ ? offsetof(CPUPPCState, reserve_val)
+ : offsetof(CPUPPCState, reserve_val2)));
+ tcg_gen_brcond_i64(TCG_COND_NE, t0, t1, lab_fail);
+
+ /* Success */
+ gen_qemu_st64_i64(ctx, ctx->le_mode ? lo : hi, cpu_reserve);
+ tcg_gen_addi_i64(t0, cpu_reserve, 8);
+ gen_qemu_st64_i64(ctx, ctx->le_mode ? hi : lo, t0);
+
+ tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
+ tcg_gen_ori_i32(cpu_crf[0], cpu_crf[0], CRF_EQ);
+ tcg_gen_br(lab_over);
+
+ gen_set_label(lab_fail);
+ tcg_gen_trunc_tl_i32(cpu_crf[0], cpu_so);
+
+ gen_set_label(lab_over);
+ tcg_gen_movi_tl(cpu_reserve, -1);
+ tcg_temp_free_i64(t0);
+ tcg_temp_free_i64(t1);
+ }
+}
#endif /* defined(TARGET_PPC64) */
/* sync */
--
2.17.1
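On the parallel path the cmpxchgo helpers ultimately need a 16-byte
compare-and-swap from the host.  Roughly, and assuming a host compiler
with __uint128_t plus 16-byte __atomic support (e.g. x86-64 with
cmpxchg16b), CONFIG_ATOMIC128 implies a primitive like:

    #include <stdbool.h>

    /* Standalone sketch, not QEMU code; cmpv/newv mirror the
     * helper's locals above. */
    typedef unsigned __int128 u128;

    static bool cas16(u128 *ptr, u128 cmpv, u128 newv)
    {
        /* True iff *ptr held cmpv and has been replaced by newv. */
        return __atomic_compare_exchange_n(ptr, &cmpv, newv, false,
                                           __ATOMIC_SEQ_CST,
                                           __ATOMIC_SEQ_CST);
    }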
* [Qemu-devel] [PATCH 05/13] target/ppc: Remove POWERPC_EXCP_STCX
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
Always use the gen_conditional_store implementation that uses
atomic_cmpxchg. Make sure to clear reserve_addr across most
interrupts crossing the cpu_loop.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/cpu.h | 5 --
linux-user/ppc/cpu_loop.c | 123 +++++++-------------------------------
target/ppc/translate.c | 14 -----
3 files changed, 23 insertions(+), 119 deletions(-)
diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 973cf44cda..4edcf62cf7 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -196,7 +196,6 @@ enum {
/* QEMU exceptions: special cases we want to stop translation */
POWERPC_EXCP_SYNC = 0x202, /* context synchronizing instruction */
POWERPC_EXCP_SYSCALL_USER = 0x203, /* System call in user mode only */
- POWERPC_EXCP_STCX = 0x204 /* Conditional stores in user mode */
};
/* Exceptions error codes */
@@ -994,10 +993,6 @@ struct CPUPPCState {
/* Reservation value */
target_ulong reserve_val;
target_ulong reserve_val2;
- /* Reservation store address */
- target_ulong reserve_ea;
- /* Reserved store source register and size */
- target_ulong reserve_info;
/* Those ones are used in supervisor mode only */
/* machine state register */
diff --git a/linux-user/ppc/cpu_loop.c b/linux-user/ppc/cpu_loop.c
index 2fb516cb00..133a87f349 100644
--- a/linux-user/ppc/cpu_loop.c
+++ b/linux-user/ppc/cpu_loop.c
@@ -65,99 +65,23 @@ int ppc_dcr_write (ppc_dcr_t *dcr_env, int dcrn, uint32_t val)
return -1;
}
-static int do_store_exclusive(CPUPPCState *env)
-{
- target_ulong addr;
- target_ulong page_addr;
- target_ulong val, val2 __attribute__((unused)) = 0;
- int flags;
- int segv = 0;
-
- addr = env->reserve_ea;
- page_addr = addr & TARGET_PAGE_MASK;
- start_exclusive();
- mmap_lock();
- flags = page_get_flags(page_addr);
- if ((flags & PAGE_READ) == 0) {
- segv = 1;
- } else {
- int reg = env->reserve_info & 0x1f;
- int size = env->reserve_info >> 5;
- int stored = 0;
-
- if (addr == env->reserve_addr) {
- switch (size) {
- case 1: segv = get_user_u8(val, addr); break;
- case 2: segv = get_user_u16(val, addr); break;
- case 4: segv = get_user_u32(val, addr); break;
-#if defined(TARGET_PPC64)
- case 8: segv = get_user_u64(val, addr); break;
- case 16: {
- segv = get_user_u64(val, addr);
- if (!segv) {
- segv = get_user_u64(val2, addr + 8);
- }
- break;
- }
-#endif
- default: abort();
- }
- if (!segv && val == env->reserve_val) {
- val = env->gpr[reg];
- switch (size) {
- case 1: segv = put_user_u8(val, addr); break;
- case 2: segv = put_user_u16(val, addr); break;
- case 4: segv = put_user_u32(val, addr); break;
-#if defined(TARGET_PPC64)
- case 8: segv = put_user_u64(val, addr); break;
- case 16: {
- if (val2 == env->reserve_val2) {
- if (msr_le) {
- val2 = val;
- val = env->gpr[reg+1];
- } else {
- val2 = env->gpr[reg+1];
- }
- segv = put_user_u64(val, addr);
- if (!segv) {
- segv = put_user_u64(val2, addr + 8);
- }
- }
- break;
- }
-#endif
- default: abort();
- }
- if (!segv) {
- stored = 1;
- }
- }
- }
- env->crf[0] = (stored << 1) | xer_so;
- env->reserve_addr = (target_ulong)-1;
- }
- if (!segv) {
- env->nip += 4;
- }
- mmap_unlock();
- end_exclusive();
- return segv;
-}
-
void cpu_loop(CPUPPCState *env)
{
CPUState *cs = CPU(ppc_env_get_cpu(env));
target_siginfo_t info;
- int trapnr;
+ int trapnr, sig;
target_ulong ret;
for(;;) {
+ bool arch_interrupt;
+
cpu_exec_start(cs);
trapnr = cpu_exec(cs);
cpu_exec_end(cs);
process_queued_cpu_work(cs);
- switch(trapnr) {
+ arch_interrupt = true;
+ switch (trapnr) {
case POWERPC_EXCP_NONE:
/* Just go on */
break;
@@ -524,26 +448,15 @@ void cpu_loop(CPUPPCState *env)
}
env->gpr[3] = ret;
break;
- case POWERPC_EXCP_STCX:
- if (do_store_exclusive(env)) {
- info.si_signo = TARGET_SIGSEGV;
- info.si_errno = 0;
- info.si_code = TARGET_SEGV_MAPERR;
- info._sifields._sigfault._addr = env->nip;
- queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
- }
- break;
case EXCP_DEBUG:
- {
- int sig;
-
- sig = gdb_handlesig(cs, TARGET_SIGTRAP);
- if (sig) {
- info.si_signo = sig;
- info.si_errno = 0;
- info.si_code = TARGET_TRAP_BRKPT;
- queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
- }
+ sig = gdb_handlesig(cs, TARGET_SIGTRAP);
+ if (sig) {
+ info.si_signo = sig;
+ info.si_errno = 0;
+ info.si_code = TARGET_TRAP_BRKPT;
+ queue_signal(env, info.si_signo, QEMU_SI_FAULT, &info);
+ } else {
+ arch_interrupt = false;
}
break;
case EXCP_INTERRUPT:
@@ -551,12 +464,22 @@ void cpu_loop(CPUPPCState *env)
break;
case EXCP_ATOMIC:
cpu_exec_step_atomic(cs);
+ arch_interrupt = false;
break;
default:
cpu_abort(cs, "Unknown exception 0x%x. Aborting\n", trapnr);
break;
}
process_pending_signals(env);
+
+ /* Most of the traps imply a transition through kernel mode,
+ * which implies an rfi instruction has been executed. Which
+ * means that the reservation (reserve_addr) should be cleared.
+ * But there are a few exceptions for traps internal to QEMU.
+ */
+ if (arch_interrupt) {
+ env->reserve_addr = -1;
+ }
}
}
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c7b9d226eb..03e8c5df03 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3201,19 +3201,6 @@ ST_ATOMIC(stwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32)
ST_ATOMIC(stdat, DEF_MEMOP(MO_Q), i64, mov_i64)
#endif
-#if defined(CONFIG_USER_ONLY)
-static void gen_conditional_store(DisasContext *ctx, TCGv EA,
- int reg, int memop)
-{
- TCGv t0 = tcg_temp_new();
-
- tcg_gen_st_tl(EA, cpu_env, offsetof(CPUPPCState, reserve_ea));
- tcg_gen_movi_tl(t0, (MEMOP_GET_SIZE(memop) << 5) | reg);
- tcg_gen_st_tl(t0, cpu_env, offsetof(CPUPPCState, reserve_info));
- tcg_temp_free(t0);
- gen_exception_err(ctx, POWERPC_EXCP_STCX, 0);
-}
-#else
static void gen_conditional_store(DisasContext *ctx, TCGv EA,
int reg, int memop)
{
@@ -3244,7 +3231,6 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
gen_set_label(l2);
tcg_gen_movi_tl(cpu_reserve, -1);
}
-#endif
#define STCX(name, memop) \
static void gen_##name(DisasContext *ctx) \
--
2.17.1
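With the translator now emitting a real atomic cmpxchg, user-mode
emulation no longer needs the start_exclusive()-based replay removed
above.  The guest's larx/stcx. retry loop maps directly onto host
atomics; a C-equivalent sketch of the pattern guest code relies on:

    #include <stdint.h>

    /* Sketch: what a guest ldarx/stdcx. loop amounts to.  The load
     * models ldarx (establish the reservation); the compare-and-swap
     * models stdcx. (store iff the value is still the reserved one). */
    static void atomic_add_u64(uint64_t *p, uint64_t v)
    {
        uint64_t old = __atomic_load_n(p, __ATOMIC_RELAXED);
        uint64_t newv;

        do {
            newv = old + v;   /* 'old' is refreshed on CAS failure */
        } while (!__atomic_compare_exchange_n(p, &old, newv, false,
                                              __ATOMIC_SEQ_CST,
                                              __ATOMIC_SEQ_CST));
    }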
* [Qemu-devel] [PATCH 06/13] target/ppc: Tidy gen_conditional_store
From: Richard Henderson @ 2018-06-26 16:19 UTC
To: qemu-devel; +Cc: qemu-ppc, david
Leave only the minimal amount of code within the STCX macro,
moving the rest of the code into gen_conditional_store.
Remove the explicit call to gen_check_align; the matching LARX will
have already checked alignment, and we verify the same address.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 28 +++++++++++-----------------
1 file changed, 11 insertions(+), 17 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 03e8c5df03..e751072404 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3201,14 +3201,17 @@ ST_ATOMIC(stwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32)
ST_ATOMIC(stdat, DEF_MEMOP(MO_Q), i64, mov_i64)
#endif
-static void gen_conditional_store(DisasContext *ctx, TCGv EA,
- int reg, int memop)
+static void gen_conditional_store(DisasContext *ctx, TCGMemOp memop)
{
TCGLabel *l1 = gen_new_label();
TCGLabel *l2 = gen_new_label();
- TCGv t0;
+ TCGv t0 = tcg_temp_new();
+ int reg = rS(ctx->opcode);
- tcg_gen_brcond_tl(TCG_COND_NE, EA, cpu_reserve, l1);
+ gen_set_access_type(ctx, ACCESS_RES);
+ gen_addr_reg_index(ctx, t0);
+ tcg_gen_brcond_tl(TCG_COND_NE, t0, cpu_reserve, l1);
+ tcg_temp_free(t0);
t0 = tcg_temp_new();
tcg_gen_atomic_cmpxchg_tl(t0, cpu_reserve, cpu_reserve_val,
@@ -3232,19 +3235,10 @@ static void gen_conditional_store(DisasContext *ctx, TCGv EA,
tcg_gen_movi_tl(cpu_reserve, -1);
}
-#define STCX(name, memop) \
-static void gen_##name(DisasContext *ctx) \
-{ \
- TCGv t0; \
- int len = MEMOP_GET_SIZE(memop); \
- gen_set_access_type(ctx, ACCESS_RES); \
- t0 = tcg_temp_local_new(); \
- gen_addr_reg_index(ctx, t0); \
- if (len > 1) { \
- gen_check_align(ctx, t0, (len) - 1); \
- } \
- gen_conditional_store(ctx, t0, rS(ctx->opcode), memop); \
- tcg_temp_free(t0); \
+#define STCX(name, memop) \
+static void gen_##name(DisasContext *ctx) \
+{ \
+ gen_conditional_store(ctx, memop); \
}
STCX(stbcx_, DEF_MEMOP(MO_UB))
--
2.17.1
* [Qemu-devel] [PATCH 07/13] target/ppc: Split out gen_load_locked
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (5 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 06/13] target/ppc: Tidy gen_conditional_store Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 08/13] target/ppc: Split out gen_ld_atomic Richard Henderson
` (6 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
Leave only the minimal amount of code within the LARX macro,
moving the rest of the code into gen_load_locked. Use MO_ALIGN
and remove the explicit call to gen_check_align.
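In plain C terms, the test that MO_ALIGN subsumes is just a
power-of-two mask check (an illustrative sketch, not QEMU code):

#include <stdbool.h>
#include <stdint.h>

/* The check gen_check_align used to emit inline. With MO_ALIGN the
 * equivalent test happens inside the memory access itself, faulting
 * through the do_unaligned_access hook on a misaligned address. */
static bool misaligned(uint64_t ea, unsigned size)
{
    return (ea & (size - 1)) != 0;   /* size must be a power of two */
}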
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 35 ++++++++++++++++++-----------------
1 file changed, 18 insertions(+), 17 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index e751072404..f48fcbeefb 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3070,23 +3070,24 @@ static void gen_isync(DisasContext *ctx)
#define MEMOP_GET_SIZE(x) (1 << ((x) & MO_SIZE))
-#define LARX(name, memop) \
-static void gen_##name(DisasContext *ctx) \
-{ \
- TCGv t0; \
- TCGv gpr = cpu_gpr[rD(ctx->opcode)]; \
- int len = MEMOP_GET_SIZE(memop); \
- gen_set_access_type(ctx, ACCESS_RES); \
- t0 = tcg_temp_local_new(); \
- gen_addr_reg_index(ctx, t0); \
- if ((len) > 1) { \
- gen_check_align(ctx, t0, (len)-1); \
- } \
- tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop); \
- tcg_gen_mov_tl(cpu_reserve, t0); \
- tcg_gen_mov_tl(cpu_reserve_val, gpr); \
- tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ); \
- tcg_temp_free(t0); \
+static void gen_load_locked(DisasContext *ctx, TCGMemOp memop)
+{
+ TCGv gpr = cpu_gpr[rD(ctx->opcode)];
+ TCGv t0 = tcg_temp_new();
+
+ gen_set_access_type(ctx, ACCESS_RES);
+ gen_addr_reg_index(ctx, t0);
+ tcg_gen_qemu_ld_tl(gpr, t0, ctx->mem_idx, memop | MO_ALIGN);
+ tcg_gen_mov_tl(cpu_reserve, t0);
+ tcg_gen_mov_tl(cpu_reserve_val, gpr);
+ tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
+ tcg_temp_free(t0);
+}
+
+#define LARX(name, memop) \
+static void gen_##name(DisasContext *ctx) \
+{ \
+ gen_load_locked(ctx, memop); \
}
/* lwarx */
--
2.17.1
* [Qemu-devel] [PATCH 08/13] target/ppc: Split out gen_ld_atomic
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (6 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 07/13] target/ppc: Split out gen_load_locked Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 09/13] target/ppc: Split out gen_st_atomic Richard Henderson
` (5 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
Move the guts of LD_ATOMIC to a function. Use foo_tl for the operations
instead of foo_i32 or foo_i64 specifically. Use MO_ALIGN instead of an
explicit call to gen_check_align.
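For reference, the foo_tl forms are the target_long-sized aliases of
the sized ops; schematically (a simplified sketch of the tcg-op.h
convention, not the literal macros):

#if TARGET_LONG_BITS == 64
#define tcg_gen_atomic_fetch_add_tl  tcg_gen_atomic_fetch_add_i64
#else
#define tcg_gen_atomic_fetch_add_tl  tcg_gen_atomic_fetch_add_i32
#endif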
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 105 ++++++++++++++++++++---------------------
1 file changed, 52 insertions(+), 53 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index f48fcbeefb..361b178db8 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3095,61 +3095,60 @@ LARX(lbarx, DEF_MEMOP(MO_UB))
LARX(lharx, DEF_MEMOP(MO_UW))
LARX(lwarx, DEF_MEMOP(MO_UL))
-#define LD_ATOMIC(name, memop, tp, op, eop) \
-static void gen_##name(DisasContext *ctx) \
-{ \
- int len = MEMOP_GET_SIZE(memop); \
- uint32_t gpr_FC = FC(ctx->opcode); \
- TCGv EA = tcg_temp_local_new(); \
- TCGv_##tp t0, t1; \
- \
- gen_addr_register(ctx, EA); \
- if (len > 1) { \
- gen_check_align(ctx, EA, len - 1); \
- } \
- t0 = tcg_temp_new_##tp(); \
- t1 = tcg_temp_new_##tp(); \
- tcg_gen_##op(t0, cpu_gpr[rD(ctx->opcode) + 1]); \
- \
- switch (gpr_FC) { \
- case 0: /* Fetch and add */ \
- tcg_gen_atomic_fetch_add_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 1: /* Fetch and xor */ \
- tcg_gen_atomic_fetch_xor_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 2: /* Fetch and or */ \
- tcg_gen_atomic_fetch_or_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 3: /* Fetch and 'and' */ \
- tcg_gen_atomic_fetch_and_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 8: /* Swap */ \
- tcg_gen_atomic_xchg_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 4: /* Fetch and max unsigned */ \
- case 5: /* Fetch and max signed */ \
- case 6: /* Fetch and min unsigned */ \
- case 7: /* Fetch and min signed */ \
- case 16: /* compare and swap not equal */ \
- case 24: /* Fetch and increment bounded */ \
- case 25: /* Fetch and increment equal */ \
- case 28: /* Fetch and decrement bounded */ \
- gen_invalid(ctx); \
- break; \
- default: \
- /* invoke data storage error handler */ \
- gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL); \
- } \
- tcg_gen_##eop(cpu_gpr[rD(ctx->opcode)], t1); \
- tcg_temp_free_##tp(t0); \
- tcg_temp_free_##tp(t1); \
- tcg_temp_free(EA); \
+static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
+{
+ uint32_t gpr_FC = FC(ctx->opcode);
+ TCGv EA = tcg_temp_new();
+ TCGv src, dst;
+
+ gen_addr_register(ctx, EA);
+ dst = cpu_gpr[rD(ctx->opcode)];
+ src = cpu_gpr[rD(ctx->opcode) + 1];
+
+ memop |= MO_ALIGN;
+ switch (gpr_FC) {
+ case 0: /* Fetch and add */
+ tcg_gen_atomic_fetch_add_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 1: /* Fetch and xor */
+ tcg_gen_atomic_fetch_xor_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 2: /* Fetch and or */
+ tcg_gen_atomic_fetch_or_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 3: /* Fetch and 'and' */
+ tcg_gen_atomic_fetch_and_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 8: /* Swap */
+ tcg_gen_atomic_xchg_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 4: /* Fetch and max unsigned */
+ case 5: /* Fetch and max signed */
+ case 6: /* Fetch and min unsigned */
+ case 7: /* Fetch and min signed */
+ case 16: /* compare and swap not equal */
+ case 24: /* Fetch and increment bounded */
+ case 25: /* Fetch and increment equal */
+ case 28: /* Fetch and decrement bounded */
+ gen_invalid(ctx);
+ break;
+ default:
+ /* invoke data storage error handler */
+ gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);
+ }
+ tcg_temp_free(EA);
}
-LD_ATOMIC(lwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32, extu_i32_tl)
-#if defined(TARGET_PPC64)
-LD_ATOMIC(ldat, DEF_MEMOP(MO_Q), i64, mov_i64, mov_i64)
+static void gen_lwat(DisasContext *ctx)
+{
+ gen_ld_atomic(ctx, DEF_MEMOP(MO_UL));
+}
+
+#ifdef TARGET_PPC64
+static void gen_ldat(DisasContext *ctx)
+{
+ gen_ld_atomic(ctx, DEF_MEMOP(MO_Q));
+}
#endif
#define ST_ATOMIC(name, memop, tp, op) \
--
2.17.1
* [Qemu-devel] [PATCH 09/13] target/ppc: Split out gen_st_atomic
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (7 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 08/13] target/ppc: Split out gen_ld_atomic Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 10/13] target/ppc: Use MO_ALIGN for ECIWX and ECOWX Richard Henderson
` (4 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
Move the guts of ST_ATOMIC to a function. Use foo_tl for the operations
instead of foo_i32 or foo_i64 specifically. Use MO_ALIGN instead of an
explicit call to gen_check_align.
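The store forms discard the loaded result; in plain C the pattern is
(illustrative only, using the GCC atomic builtins):

#include <stdint.h>

/* "add and Store": only the atomic read-modify-write of memory
 * matters; the computed value is thrown away, like the 'discard'
 * temp in the new gen_st_atomic. */
static void store_add(uint64_t *addr, uint64_t val)
{
    (void)__atomic_add_fetch(addr, val, __ATOMIC_SEQ_CST);
}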
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 93 +++++++++++++++++++++---------------------
1 file changed, 47 insertions(+), 46 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 361b178db8..53ca8f0114 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3151,54 +3151,55 @@ static void gen_ldat(DisasContext *ctx)
}
#endif
-#define ST_ATOMIC(name, memop, tp, op) \
-static void gen_##name(DisasContext *ctx) \
-{ \
- int len = MEMOP_GET_SIZE(memop); \
- uint32_t gpr_FC = FC(ctx->opcode); \
- TCGv EA = tcg_temp_local_new(); \
- TCGv_##tp t0, t1; \
- \
- gen_addr_register(ctx, EA); \
- if (len > 1) { \
- gen_check_align(ctx, EA, len - 1); \
- } \
- t0 = tcg_temp_new_##tp(); \
- t1 = tcg_temp_new_##tp(); \
- tcg_gen_##op(t0, cpu_gpr[rD(ctx->opcode) + 1]); \
- \
- switch (gpr_FC) { \
- case 0: /* add and Store */ \
- tcg_gen_atomic_add_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 1: /* xor and Store */ \
- tcg_gen_atomic_xor_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 2: /* Or and Store */ \
- tcg_gen_atomic_or_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 3: /* 'and' and Store */ \
- tcg_gen_atomic_and_fetch_##tp(t1, EA, t0, ctx->mem_idx, memop); \
- break; \
- case 4: /* Store max unsigned */ \
- case 5: /* Store max signed */ \
- case 6: /* Store min unsigned */ \
- case 7: /* Store min signed */ \
- case 24: /* Store twin */ \
- gen_invalid(ctx); \
- break; \
- default: \
- /* invoke data storage error handler */ \
- gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL); \
- } \
- tcg_temp_free_##tp(t0); \
- tcg_temp_free_##tp(t1); \
- tcg_temp_free(EA); \
+static void gen_st_atomic(DisasContext *ctx, TCGMemOp memop)
+{
+ uint32_t gpr_FC = FC(ctx->opcode);
+ TCGv EA = tcg_temp_new();
+ TCGv src, discard;
+
+ gen_addr_register(ctx, EA);
+ src = cpu_gpr[rD(ctx->opcode)];
+ discard = tcg_temp_new();
+
+ memop |= MO_ALIGN;
+ switch (gpr_FC) {
+ case 0: /* add and Store */
+ tcg_gen_atomic_add_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
+ case 1: /* xor and Store */
+ tcg_gen_atomic_xor_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
+ case 2: /* Or and Store */
+ tcg_gen_atomic_or_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
+ case 3: /* 'and' and Store */
+ tcg_gen_atomic_and_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
+ case 4: /* Store max unsigned */
+ case 5: /* Store max signed */
+ case 6: /* Store min unsigned */
+ case 7: /* Store min signed */
+ case 24: /* Store twin */
+ gen_invalid(ctx);
+ break;
+ default:
+ /* invoke data storage error handler */
+ gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);
+ }
+ tcg_temp_free(discard);
+ tcg_temp_free(EA);
}
-ST_ATOMIC(stwat, DEF_MEMOP(MO_UL), i32, trunc_tl_i32)
-#if defined(TARGET_PPC64)
-ST_ATOMIC(stdat, DEF_MEMOP(MO_Q), i64, mov_i64)
+static void gen_stwat(DisasContext *ctx)
+{
+ gen_st_atomic(ctx, DEF_MEMOP(MO_UL));
+}
+
+#ifdef TARGET_PPC64
+static void gen_stdat(DisasContext *ctx)
+{
+ gen_st_atomic(ctx, DEF_MEMOP(MO_Q));
+}
#endif
static void gen_conditional_store(DisasContext *ctx, TCGMemOp memop)
--
2.17.1
* [Qemu-devel] [PATCH 10/13] target/ppc: Use MO_ALIGN for ECIWX and ECOWX
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (8 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 09/13] target/ppc: Split out gen_st_atomic Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 11/13] target/ppc: Use atomic min/max helpers Richard Henderson
` (3 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
This avoids the need for gen_check_align entirely.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 25 ++++---------------------
1 file changed, 4 insertions(+), 21 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 53ca8f0114..c2a28be6d7 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -2388,23 +2388,6 @@ static inline void gen_addr_add(DisasContext *ctx, TCGv ret, TCGv arg1,
}
}
-static inline void gen_check_align(DisasContext *ctx, TCGv EA, int mask)
-{
- TCGLabel *l1 = gen_new_label();
- TCGv t0 = tcg_temp_new();
- TCGv_i32 t1, t2;
- tcg_gen_andi_tl(t0, EA, mask);
- tcg_gen_brcondi_tl(TCG_COND_EQ, t0, 0, l1);
- t1 = tcg_const_i32(POWERPC_EXCP_ALIGN);
- t2 = tcg_const_i32(ctx->opcode & 0x03FF0000);
- gen_update_nip(ctx, ctx->base.pc_next - 4);
- gen_helper_raise_exception_err(cpu_env, t1, t2);
- tcg_temp_free_i32(t1);
- tcg_temp_free_i32(t2);
- gen_set_label(l1);
- tcg_temp_free(t0);
-}
-
static inline void gen_align_no_le(DisasContext *ctx)
{
gen_exception_err(ctx, POWERPC_EXCP_ALIGN,
@@ -4706,8 +4689,8 @@ static void gen_eciwx(DisasContext *ctx)
gen_set_access_type(ctx, ACCESS_EXT);
t0 = tcg_temp_new();
gen_addr_reg_index(ctx, t0);
- gen_check_align(ctx, t0, 0x03);
- gen_qemu_ld32u(ctx, cpu_gpr[rD(ctx->opcode)], t0);
+ tcg_gen_qemu_ld_tl(cpu_gpr[rD(ctx->opcode)], t0, ctx->mem_idx,
+ DEF_MEMOP(MO_UL | MO_ALIGN));
tcg_temp_free(t0);
}
@@ -4719,8 +4702,8 @@ static void gen_ecowx(DisasContext *ctx)
gen_set_access_type(ctx, ACCESS_EXT);
t0 = tcg_temp_new();
gen_addr_reg_index(ctx, t0);
- gen_check_align(ctx, t0, 0x03);
- gen_qemu_st32(ctx, cpu_gpr[rD(ctx->opcode)], t0);
+ tcg_gen_qemu_st_tl(cpu_gpr[rD(ctx->opcode)], t0, ctx->mem_idx,
+ DEF_MEMOP(MO_UL | MO_ALIGN));
tcg_temp_free(t0);
}
--
2.17.1
* [Qemu-devel] [PATCH 11/13] target/ppc: Use atomic min/max helpers
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (9 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 10/13] target/ppc: Use MO_ALIGN for ECIWX and ECOWX Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 12/13] target/ppc: Implement the rest of gen_ld_atomic Richard Henderson
` (2 subsequent siblings)
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
These operations were previously unimplemented for ppc.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 24 ++++++++++++++++++++----
1 file changed, 20 insertions(+), 4 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index c2a28be6d7..79285b6698 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3102,13 +3102,21 @@ static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
case 3: /* Fetch and 'and' */
tcg_gen_atomic_fetch_and_tl(dst, EA, src, ctx->mem_idx, memop);
break;
+ case 4: /* Fetch and max unsigned */
+ tcg_gen_atomic_fetch_umax_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 5: /* Fetch and max signed */
+ tcg_gen_atomic_fetch_smax_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 6: /* Fetch and min unsigned */
+ tcg_gen_atomic_fetch_umin_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
+ case 7: /* Fetch and min signed */
+ tcg_gen_atomic_fetch_smin_tl(dst, EA, src, ctx->mem_idx, memop);
+ break;
case 8: /* Swap */
tcg_gen_atomic_xchg_tl(dst, EA, src, ctx->mem_idx, memop);
break;
- case 4: /* Fetch and max unsigned */
- case 5: /* Fetch and max signed */
- case 6: /* Fetch and min unsigned */
- case 7: /* Fetch and min signed */
case 16: /* compare and swap not equal */
case 24: /* Fetch and increment bounded */
case 25: /* Fetch and increment equal */
@@ -3159,9 +3167,17 @@ static void gen_st_atomic(DisasContext *ctx, TCGMemOp memop)
tcg_gen_atomic_and_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
break;
case 4: /* Store max unsigned */
+ tcg_gen_atomic_umax_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
case 5: /* Store max signed */
+ tcg_gen_atomic_smax_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
case 6: /* Store min unsigned */
+ tcg_gen_atomic_umin_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
case 7: /* Store min signed */
+ tcg_gen_atomic_smin_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
+ break;
case 24: /* Store twin */
gen_invalid(ctx);
break;
--
2.17.1
* [Qemu-devel] [PATCH 12/13] target/ppc: Implement the rest of gen_ld_atomic
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (10 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 11/13] target/ppc: Use atomic min/max helpers Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-26 16:19 ` [Qemu-devel] [PATCH 13/13] target/ppc: Implement the rest of gen_st_atomic Richard Henderson
2018-06-29 4:15 ` [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations David Gibson
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
These cases were stubbed out. For now, implement them only within
a serial context, forcing parallel execution to synchronize. It
would be possible to implement these with cmpxchg loops, if we care.
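For the record, such a loop might look like this in plain C for the
"compare and swap not equal" case (an illustrative sketch with
hypothetical names, using the GCC atomic builtins):

#include <stdint.h>

/* Replace memory with new_val only when the current value differs
 * from cmp_val; always return the old value. The CAS retries when
 * another thread modified memory between the load and the exchange. */
static uint64_t cswx_loop(uint64_t *addr, uint64_t cmp_val, uint64_t new_val)
{
    uint64_t old = __atomic_load_n(addr, __ATOMIC_RELAXED);
    for (;;) {
        uint64_t repl = (old != cmp_val) ? new_val : old;
        if (__atomic_compare_exchange_n(addr, &old, repl, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST)) {
            return old;
        }
        /* On failure, 'old' has been updated to the current value. */
    }
}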
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 89 ++++++++++++++++++++++++++++++++++++++----
1 file changed, 82 insertions(+), 7 deletions(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 79285b6698..597a37d3ec 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3078,16 +3078,45 @@ LARX(lbarx, DEF_MEMOP(MO_UB))
LARX(lharx, DEF_MEMOP(MO_UW))
LARX(lwarx, DEF_MEMOP(MO_UL))
+static void gen_fetch_inc_conditional(DisasContext *ctx, TCGMemOp memop,
+ TCGv EA, TCGCond cond, int addend)
+{
+ TCGv t = tcg_temp_new();
+ TCGv t2 = tcg_temp_new();
+ TCGv u = tcg_temp_new();
+
+ tcg_gen_qemu_ld_tl(t, EA, ctx->mem_idx, memop);
+ tcg_gen_addi_tl(t2, EA, MEMOP_GET_SIZE(memop));
+ tcg_gen_qemu_ld_tl(t2, t2, ctx->mem_idx, memop);
+ tcg_gen_addi_tl(u, t, addend);
+
+ /* E.g. for fetch and increment bounded... */
+ /* mem(EA,s) = (t != t2 ? u = t + 1 : t) */
+ tcg_gen_movcond_tl(cond, u, t, t2, u, t);
+ tcg_gen_qemu_st_tl(u, EA, ctx->mem_idx, memop);
+
+ /* RT = (t != t2 ? t : u = 1<<(s*8-1)) */
+ tcg_gen_movi_tl(u, 1 << (MEMOP_GET_SIZE(memop) * 8 - 1));
+ tcg_gen_movcond_tl(cond, cpu_gpr[rD(ctx->opcode)], t, t2, t, u);
+
+ tcg_temp_free(t);
+ tcg_temp_free(t2);
+ tcg_temp_free(u);
+}
+
static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
{
uint32_t gpr_FC = FC(ctx->opcode);
TCGv EA = tcg_temp_new();
+ int rt = rD(ctx->opcode);
+ bool need_serial;
TCGv src, dst;
gen_addr_register(ctx, EA);
- dst = cpu_gpr[rD(ctx->opcode)];
- src = cpu_gpr[rD(ctx->opcode) + 1];
+ dst = cpu_gpr[rt];
+ src = cpu_gpr[(rt + 1) & 31];
+ need_serial = false;
memop |= MO_ALIGN;
switch (gpr_FC) {
case 0: /* Fetch and add */
@@ -3117,17 +3146,63 @@ static void gen_ld_atomic(DisasContext *ctx, TCGMemOp memop)
case 8: /* Swap */
tcg_gen_atomic_xchg_tl(dst, EA, src, ctx->mem_idx, memop);
break;
- case 16: /* compare and swap not equal */
- case 24: /* Fetch and increment bounded */
- case 25: /* Fetch and increment equal */
- case 28: /* Fetch and decrement bounded */
- gen_invalid(ctx);
+
+ case 16: /* Compare and swap not equal */
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ need_serial = true;
+ } else {
+ TCGv t0 = tcg_temp_new();
+ TCGv t1 = tcg_temp_new();
+
+ tcg_gen_qemu_ld_tl(t0, EA, ctx->mem_idx, memop);
+ if ((memop & MO_SIZE) == MO_64 || TARGET_LONG_BITS == 32) {
+ tcg_gen_mov_tl(t1, src);
+ } else {
+ tcg_gen_ext32u_tl(t1, src);
+ }
+ tcg_gen_movcond_tl(TCG_COND_NE, t1, t0, t1,
+ cpu_gpr[(rt + 2) & 31], t0);
+ tcg_gen_qemu_st_tl(t1, EA, ctx->mem_idx, memop);
+ tcg_gen_mov_tl(dst, t0);
+
+ tcg_temp_free(t0);
+ tcg_temp_free(t1);
+ }
break;
+
+ case 24: /* Fetch and increment bounded */
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ need_serial = true;
+ } else {
+ gen_fetch_inc_conditional(ctx, memop, EA, TCG_COND_NE, 1);
+ }
+ break;
+ case 25: /* Fetch and increment equal */
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ need_serial = true;
+ } else {
+ gen_fetch_inc_conditional(ctx, memop, EA, TCG_COND_EQ, 1);
+ }
+ break;
+ case 28: /* Fetch and decrement bounded */
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ need_serial = true;
+ } else {
+ gen_fetch_inc_conditional(ctx, memop, EA, TCG_COND_NE, -1);
+ }
+ break;
+
default:
/* invoke data storage error handler */
gen_exception_err(ctx, POWERPC_EXCP_DSI, POWERPC_EXCP_INVAL);
}
tcg_temp_free(EA);
+
+ if (need_serial) {
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
+ }
}
static void gen_lwat(DisasContext *ctx)
--
2.17.1
* [Qemu-devel] [PATCH 13/13] target/ppc: Implement the rest of gen_st_atomic
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (11 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 12/13] target/ppc: Implement the rest of gen_ld_atomic Richard Henderson
@ 2018-06-26 16:19 ` Richard Henderson
2018-06-29 4:15 ` [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations David Gibson
13 siblings, 0 replies; 23+ messages in thread
From: Richard Henderson @ 2018-06-26 16:19 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-ppc, david
The store twin case was stubbed out. For now, implement it only within
a serial context, forcing parallel execution to synchronize. It would
be possible to implement with a cmpxchg loop, if we care, but the loose
alignment requirements (simply not crossing a 32-byte boundary) might send
us back to the serial context anyway.
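That alignment constraint reduces to a simple boundary test
(illustrative sketch with hypothetical names):

#include <stdbool.h>
#include <stdint.h>

/* A store twin touches 2 * size bytes; it need not be naturally
 * aligned, but the pair must not cross a 32-byte boundary. */
static bool twin_crosses_32byte_boundary(uint64_t ea, unsigned size)
{
    return (ea & 31) + 2 * size > 32;
}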
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/ppc/translate.c | 26 +++++++++++++++++++++++++-
1 file changed, 25 insertions(+), 1 deletion(-)
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 597a37d3ec..e120f2ed0b 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3254,7 +3254,31 @@ static void gen_st_atomic(DisasContext *ctx, TCGMemOp memop)
tcg_gen_atomic_smin_fetch_tl(discard, EA, src, ctx->mem_idx, memop);
break;
case 24: /* Store twin */
- gen_invalid(ctx);
+ if (tb_cflags(ctx->base.tb) & CF_PARALLEL) {
+ /* Restart with exclusive lock. */
+ gen_helper_exit_atomic(cpu_env);
+ ctx->base.is_jmp = DISAS_NORETURN;
+ } else {
+ TCGv t = tcg_temp_new();
+ TCGv t2 = tcg_temp_new();
+ TCGv s = tcg_temp_new();
+ TCGv s2 = tcg_temp_new();
+ TCGv ea_plus_s = tcg_temp_new();
+
+ tcg_gen_qemu_ld_tl(t, EA, ctx->mem_idx, memop);
+ tcg_gen_addi_tl(ea_plus_s, EA, MEMOP_GET_SIZE(memop));
+ tcg_gen_qemu_ld_tl(t2, ea_plus_s, ctx->mem_idx, memop);
+ tcg_gen_movcond_tl(TCG_COND_EQ, s, t, t2, src, t);
+ tcg_gen_movcond_tl(TCG_COND_EQ, s2, t, t2, src, t2);
+ tcg_gen_qemu_st_tl(s, EA, ctx->mem_idx, memop);
+ tcg_gen_qemu_st_tl(s2, ea_plus_s, ctx->mem_idx, memop);
+
+ tcg_temp_free(ea_plus_s);
+ tcg_temp_free(s2);
+ tcg_temp_free(s);
+ tcg_temp_free(t2);
+ tcg_temp_free(t);
+ }
break;
default:
/* invoke data storage error handler */
--
2.17.1
* Re: [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations
2018-06-26 16:19 [Qemu-devel] [PATCH 00/13] target/ppc improve atomic operations Richard Henderson
` (12 preceding siblings ...)
2018-06-26 16:19 ` [Qemu-devel] [PATCH 13/13] target/ppc: Implement the rest of gen_st_atomic Richard Henderson
@ 2018-06-29 4:15 ` David Gibson
13 siblings, 0 replies; 23+ messages in thread
From: David Gibson @ 2018-06-29 4:15 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-ppc
On Tue, Jun 26, 2018 at 09:19:08AM -0700, Richard Henderson wrote:
> In another patch set this week, I had noticed the old linux-user
> do_store_exclusive code was still present. I had thought that was
> dead code that simply hadn't been removed, but it turned out that
> we had not completed the transition to tcg atomics for linux-user.
>
> In the process, I discovered that we weren't using atomic operations
> for the 128-bit lq, lqarx, and stqcx insns. These would have simply
> produced incorrect results for -smp in system mode.
>
> I tidy the code a bit by making use of MO_ALIGN, which means that
> we don't need a separate explicit alignment check.
>
> I use the new min/max atomic operations I added recently for
> ARMv8.2-Atomics and RISC-V.
>
> Finally, Power9 has some *really* odd atomic operations in its
> l[wd]at and st[wd]at instructions. We were generating illegal
> instruction for these. I implement them for serial context and
> force parallel context to grab the exclusive lock and try again.
>
> Except for the trivial linux-user ll/sc case, I do not have any
> code that exercises these instructions. Perhaps the IBM folk
> have something that can test the others?
I've now applied the whole series to ppc-for-3.0.
>
>
> r~
>
>
> Richard Henderson (13):
> target/ppc: Add do_unaligned_access hook
> target/ppc: Use atomic load for LQ and LQARX
> target/ppc: Use atomic store for STQ
> target/ppc: Use atomic cmpxchg for STQCX
> target/ppc: Remove POWERPC_EXCP_STCX
> target/ppc: Tidy gen_conditional_store
> target/ppc: Split out gen_load_locked
> target/ppc: Split out gen_ld_atomic
> target/ppc: Split out gen_st_atomic
> target/ppc: Use MO_ALIGN for ECIWX and ECOWX
> target/ppc: Use atomic min/max helpers
> target/ppc: Implement the rest of gen_ld_atomic
> target/ppc: Implement the rest of gen_st_atomic
>
> target/ppc/cpu.h | 8 +-
> target/ppc/helper.h | 11 +
> target/ppc/internal.h | 5 +
> linux-user/ppc/cpu_loop.c | 123 ++----
> target/ppc/excp_helper.c | 18 +-
> target/ppc/mem_helper.c | 72 +++-
> target/ppc/translate.c | 648 ++++++++++++++++++++------------
> target/ppc/translate_init.inc.c | 1 +
> 8 files changed, 539 insertions(+), 347 deletions(-)
>
--
David Gibson | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson