* [RFC PATCH 1/4] target/ppc: Fix eieio memory ordering semantics
From: Nicholas Piggin @ 2022-05-03 10:33 UTC
To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel

The generated eieio memory ordering semantics do not match the
instruction definition in the architecture. Add a big comment to
explain this strange instruction and correct the memory ordering
behaviour.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/translate.c | 26 +++++++++++++++++++++++++-
 1 file changed, 25 insertions(+), 1 deletion(-)

diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index fa34f81c30..abb8807180 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -3513,7 +3513,31 @@ static void gen_stswx(DisasContext *ctx)
 /* eieio */
 static void gen_eieio(DisasContext *ctx)
 {
-    TCGBar bar = TCG_MO_LD_ST;
+    TCGBar bar = TCG_MO_ALL;
+
+    /*
+     * eieio has complex semantics. It provides memory ordering between
+     * operations in the set:
+     * - loads from CI memory.
+     * - stores to CI memory.
+     * - stores to WT memory.
+     *
+     * It separately also orders memory for operations in the set:
+     * - stores to cacheable memory.
+     *
+     * It also serializes instructions:
+     * - dcbt and dcbst.
+     *
+     * It separately serializes:
+     * - tlbie and tlbsync.
+     *
+     * And separately serializes:
+     * - slbieg, slbiag, and slbsync.
+     *
+     * The end result is that CI memory ordering requires TCG_MO_ALL
+     * and it is not possible to special-case more relaxed ordering for
+     * cacheable accesses. TCG_BAR_SC is required to provide the serialization.
+     */
 
     /*
      * POWER9 has a eieio instruction variant using bit 6 as a hint to
-- 
2.35.1

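As an illustration of the comment in this patch, the sketch below models why the barrier ends up as TCG_MO_ALL: a TCG barrier carries no memory-type information, so it has to cover the union of what eieio orders for caching-inhibited (CI) accesses and for cacheable accesses. The enum values and helper names are local stand-ins assumed for this example (modeled on QEMU's TCGBar bits); they are not code from the patch.

```c
#include <assert.h>

/* Local stand-ins for QEMU's TCGBar ordering bits (values are assumptions). */
enum {
    MO_LD_LD = 0x01,   /* earlier load  ordered before later load  */
    MO_ST_LD = 0x02,   /* earlier store ordered before later load  */
    MO_LD_ST = 0x04,   /* earlier load  ordered before later store */
    MO_ST_ST = 0x08,   /* earlier store ordered before later store */
    MO_ALL   = 0x0f
};

/* Hypothetical helper: eieio orders every load/store pair among CI accesses. */
static unsigned eieio_ci_order(void)
{
    return MO_LD_LD | MO_LD_ST | MO_ST_LD | MO_ST_ST;
}

/* Hypothetical helper: for cacheable memory, eieio only orders stores. */
static unsigned eieio_cacheable_order(void)
{
    return MO_ST_ST;
}

/*
 * The translator cannot tell which memory type an access will hit,
 * so the emitted barrier must satisfy both sets at once.
 */
static unsigned eieio_tcg_barrier(void)
{
    return eieio_ci_order() | eieio_cacheable_order();
}
```

Under these assumptions eieio_tcg_barrier() evaluates to MO_ALL, while the previous value (the load-store bit alone) did not even include store-store ordering.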
* [RFC PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio
From: Nicholas Piggin @ 2022-05-03 10:33 UTC
To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel

eieio does not provide ordering between stores to CI memory and stores
to cacheable memory so it can't be used as a general ST_ST barrier.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 tcg/ppc/tcg-target.c.inc | 2 --
 1 file changed, 2 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index cfcd121f9c..3ff845d063 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1836,8 +1836,6 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
     a0 &= TCG_MO_ALL;
     if (a0 == TCG_MO_LD_LD) {
         insn = LWSYNC;
-    } else if (a0 == TCG_MO_ST_ST) {
-        insn = EIEIO;
     }
     tcg_out32(s, insn);
 }
-- 
2.35.1

* Re: [RFC PATCH 2/4] tcg/ppc: ST_ST memory ordering is not provided with eieio
From: Richard Henderson @ 2022-05-03 15:01 UTC
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel

On 5/3/22 03:33, Nicholas Piggin wrote:
> eieio does not provide ordering between stores to CI memory and stores
> to cacheable memory so it can't be used as a general ST_ST barrier.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  tcg/ppc/tcg-target.c.inc | 2 --
>  1 file changed, 2 deletions(-)
>
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index cfcd121f9c..3ff845d063 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -1836,8 +1836,6 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
>      a0 &= TCG_MO_ALL;
>      if (a0 == TCG_MO_LD_LD) {
>          insn = LWSYNC;
> -    } else if (a0 == TCG_MO_ST_ST) {
> -        insn = EIEIO;
>      }
>      tcg_out32(s, insn);
>  }

Certainly matches the comment from patch 1.

Reviewed-by: Richard Henderson <richard.henderson@linaro.org>

r~

* [RFC PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync
From: Nicholas Piggin @ 2022-05-03 10:33 UTC
To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel

lwsync orders more than just LD_LD, importantly it matches x86 and
s390 default memory ordering.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/cpu.h         | 2 ++
 tcg/ppc/tcg-target.c.inc | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index c2b6c987c0..0b0e9761cd 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -28,6 +28,8 @@

 #define TCG_GUEST_DEFAULT_MO 0

+#define PPC_LWSYNC_MO (TCG_MO_LD_LD | TCG_MO_LD_ST | TCG_MO_ST_ST)
+
 #define TARGET_PAGE_BITS_64K 16
 #define TARGET_PAGE_BITS_16M 24

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 3ff845d063..b87fc2383e 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -1834,7 +1834,7 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
 {
     uint32_t insn = HWSYNC;
     a0 &= TCG_MO_ALL;
-    if (a0 == TCG_MO_LD_LD) {
+    if ((a0 & PPC_LWSYNC_MO) == a0) {
         insn = LWSYNC;
     }
     tcg_out32(s, insn);
-- 
2.35.1

* Re: [RFC PATCH 3/4] tcg/ppc: Optimize memory ordering generation with lwsync
From: Richard Henderson @ 2022-05-03 14:53 UTC
To: Nicholas Piggin, qemu-ppc; +Cc: qemu-devel

On 5/3/22 03:33, Nicholas Piggin wrote:
> lwsync orders more than just LD_LD, importantly it matches x86 and
> s390 default memory ordering.
>
> Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
> ---
>  target/ppc/cpu.h         | 2 ++
>  tcg/ppc/tcg-target.c.inc | 2 +-
>  2 files changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
> index c2b6c987c0..0b0e9761cd 100644
> --- a/target/ppc/cpu.h
> +++ b/target/ppc/cpu.h
> @@ -28,6 +28,8 @@
>
>  #define TCG_GUEST_DEFAULT_MO 0
>
> +#define PPC_LWSYNC_MO (TCG_MO_LD_LD | TCG_MO_LD_ST | TCG_MO_ST_ST)

You can't put this here...

> +
>  #define TARGET_PAGE_BITS_64K 16
>  #define TARGET_PAGE_BITS_16M 24
>
> diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
> index 3ff845d063..b87fc2383e 100644
> --- a/tcg/ppc/tcg-target.c.inc
> +++ b/tcg/ppc/tcg-target.c.inc
> @@ -1834,7 +1834,7 @@ static void tcg_out_mb(TCGContext *s, TCGArg a0)
>  {
>      uint32_t insn = HWSYNC;
>      a0 &= TCG_MO_ALL;
> -    if (a0 == TCG_MO_LD_LD) {
> +    if ((a0 & PPC_LWSYNC_MO) == a0) {

... and have it used here. You should have seen compilation failures for
the missing symbol. I can only assume you used a restricted --target-list
in testing.

Anyway, it looks like a simpler test would be

    insn = (a0 & TCG_MO_ST_LD ? HWSYNC : LWSYNC);

r~

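To check that the simpler test suggested in the review selects lwsync in exactly the same cases as the subset test in the posted patch, the two predicates can be compared over all sixteen ordering masks. This is a standalone sketch: the bit values are assumptions modeled on QEMU's TCGBar enum, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for QEMU's TCGBar ordering bits (values are assumptions). */
enum {
    MO_LD_LD = 0x01,
    MO_ST_LD = 0x02,
    MO_LD_ST = 0x04,
    MO_ST_ST = 0x08,
    MO_ALL   = 0x0f
};

/* Orderings lwsync provides, as defined by PPC_LWSYNC_MO in the patch. */
#define LWSYNC_MO (MO_LD_LD | MO_LD_ST | MO_ST_ST)

/* Test as posted: the request contains nothing outside the lwsync set. */
static bool lwsync_ok_subset(unsigned a0)
{
    a0 &= MO_ALL;
    return (a0 & LWSYNC_MO) == a0;
}

/* Test as suggested in review: lwsync suffices unless store-load is requested. */
static bool lwsync_ok_no_st_ld(unsigned a0)
{
    return !(a0 & MO_ST_LD);
}

/* Returns true if both predicates agree for every 4-bit ordering mask. */
static bool tests_agree(void)
{
    for (unsigned a0 = 0; a0 <= MO_ALL; a0++) {
        if (lwsync_ok_subset(a0) != lwsync_ok_no_st_ld(a0)) {
            return false;
        }
    }
    return true;
}
```

The two agree because store-load is the only ordering bit missing from the lwsync set, so "subset of the lwsync set" and "store-load bit clear" describe the same masks. The suggested form also avoids referencing a guest-side macro from the host backend.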
* [RFC PATCH 4/4] target/ppc: Implement lwsync with weaker memory ordering
From: Nicholas Piggin @ 2022-05-03 10:33 UTC
To: qemu-ppc; +Cc: Nicholas Piggin, qemu-devel

This allows an x86 host to no-op lwsyncs, and a ppc host can use lwsync
rather than sync.

Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
---
 target/ppc/cpu.h       |  4 +++-
 target/ppc/cpu_init.c  | 13 +++++++------
 target/ppc/machine.c   |  3 ++-
 target/ppc/translate.c |  8 +++++++-
 4 files changed, 19 insertions(+), 9 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index 0b0e9761cd..bf5f226567 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -2287,6 +2287,8 @@ enum {
     PPC2_ISA300 = 0x0000000000080000ULL,
     /* POWER ISA 3.1 */
     PPC2_ISA310 = 0x0000000000100000ULL,
+    /* lwsync instruction */
+    PPC2_MEM_LWSYNC = 0x0000000000200000ULL,

 #define PPC_TCG_INSNS2 (PPC2_BOOKE206 | PPC2_VSX | PPC2_PRCNTL | PPC2_DBRX | \
                         PPC2_ISA205 | PPC2_VSX207 | PPC2_PERM_ISA206 | \
@@ -2295,7 +2297,7 @@ enum {
                         PPC2_BCTAR_ISA207 | PPC2_LSQ_ISA207 | \
                         PPC2_ALTIVEC_207 | PPC2_ISA207S | PPC2_DFP | \
                         PPC2_FP_CVT_S64 | PPC2_TM | PPC2_PM_ISA206 | \
-                        PPC2_ISA300 | PPC2_ISA310)
+                        PPC2_ISA300 | PPC2_ISA310 | PPC2_MEM_LWSYNC)
 };

 /*****************************************************************************/
diff --git a/target/ppc/cpu_init.c b/target/ppc/cpu_init.c
index d42e2ba8e0..26d9277ffb 100644
--- a/target/ppc/cpu_init.c
+++ b/target/ppc/cpu_init.c
@@ -5769,7 +5769,7 @@ POWERPC_FAMILY(970)(ObjectClass *oc, void *data)
                        PPC_MEM_TLBIE | PPC_MEM_TLBSYNC |
                        PPC_64B | PPC_ALTIVEC |
                        PPC_SEGMENT_64B | PPC_SLBI;
-    pcc->insns_flags2 = PPC2_FP_CVT_S64;
+    pcc->insns_flags2 = PPC2_FP_CVT_S64 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_VR) |
                     (1ull << MSR_POW) |
@@ -5846,7 +5846,7 @@ POWERPC_FAMILY(POWER5P)(ObjectClass *oc, void *data)
                        PPC_64B |
                        PPC_POPCNTB |
                        PPC_SEGMENT_64B | PPC_SLBI;
-    pcc->insns_flags2 = PPC2_FP_CVT_S64;
+    pcc->insns_flags2 = PPC2_FP_CVT_S64 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_VR) |
                     (1ull << MSR_POW) |
@@ -5984,7 +5984,7 @@ POWERPC_FAMILY(POWER7)(ObjectClass *oc, void *data)
                         PPC2_PERM_ISA206 | PPC2_DIVE_ISA206 |
                         PPC2_ATOMIC_ISA206 | PPC2_FP_CVT_ISA206 |
                         PPC2_FP_TST_ISA206 | PPC2_FP_CVT_S64 |
-                        PPC2_PM_ISA206;
+                        PPC2_PM_ISA206 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_VR) |
                     (1ull << MSR_VSX) |
@@ -6157,7 +6157,7 @@ POWERPC_FAMILY(POWER8)(ObjectClass *oc, void *data)
                         PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
                         PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
                         PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
-                        PPC2_TM | PPC2_PM_ISA206;
+                        PPC2_TM | PPC2_PM_ISA206 | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_HV) |
                     (1ull << MSR_TM) |
@@ -6375,7 +6375,7 @@ POWERPC_FAMILY(POWER9)(ObjectClass *oc, void *data)
                         PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
                         PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
                         PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
-                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL;
+                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL | PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_HV) |
                     (1ull << MSR_TM) |
@@ -6590,7 +6590,8 @@ POWERPC_FAMILY(POWER10)(ObjectClass *oc, void *data)
                         PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207 |
                         PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207 |
                         PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 |
-                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL | PPC2_ISA310;
+                        PPC2_TM | PPC2_ISA300 | PPC2_PRCNTL | PPC2_ISA310 |
+                        PPC2_MEM_LWSYNC;
     pcc->msr_mask = (1ull << MSR_SF) |
                     (1ull << MSR_HV) |
                     (1ull << MSR_TM) |
diff --git a/target/ppc/machine.c b/target/ppc/machine.c
index e673944597..33b3d6cf30 100644
--- a/target/ppc/machine.c
+++ b/target/ppc/machine.c
@@ -157,7 +157,8 @@ static int cpu_pre_save(void *opaque)
         | PPC2_ATOMIC_ISA206 | PPC2_FP_CVT_ISA206
         | PPC2_FP_TST_ISA206 | PPC2_BCTAR_ISA207
         | PPC2_LSQ_ISA207 | PPC2_ALTIVEC_207
-        | PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 | PPC2_TM;
+        | PPC2_ISA205 | PPC2_ISA207S | PPC2_FP_CVT_S64 | PPC2_TM
+        | PPC2_MEM_LWSYNC;

     env->spr[SPR_LR] = env->lr;
     env->spr[SPR_CTR] = env->ctr;
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index abb8807180..76691cf082 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -4040,8 +4040,13 @@ static void gen_stqcx_(DisasContext *ctx)
 /* sync */
 static void gen_sync(DisasContext *ctx)
 {
+    TCGBar bar = TCG_MO_ALL;
     uint32_t l = (ctx->opcode >> 21) & 3;

+    if ((l == 1) && (ctx->insns_flags2 & PPC2_MEM_LWSYNC)) {
+        bar = PPC_LWSYNC_MO;
+    }
+
     /*
      * We may need to check for a pending TLB flush.
      *
@@ -4053,7 +4058,8 @@ static void gen_sync(DisasContext *ctx)
     if (((l == 2) || !(ctx->insns_flags & PPC_64B)) && !ctx->pr) {
         gen_check_tlb_flush(ctx, true);
     }
-    tcg_gen_mb(TCG_MO_ALL | TCG_BAR_SC);
+
+    tcg_gen_mb(bar | TCG_BAR_SC);
 }

 /* wait */
-- 
2.35.1

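The selection this patch adds to gen_sync(), and its effect on a strongly ordered host, can be sketched in isolation as follows. The bit values are assumptions modeled on QEMU's TCGBar enum, the helper names are hypothetical, and the host model (store-load as the only ordering the host may relax, as on x86) is a simplification made for the example.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for QEMU's TCGBar ordering bits (values are assumptions). */
enum {
    MO_LD_LD = 0x01,
    MO_ST_LD = 0x02,
    MO_LD_ST = 0x04,
    MO_ST_ST = 0x08,
    MO_ALL   = 0x0f
};

/* Orderings lwsync provides, as defined by PPC_LWSYNC_MO in patch 3. */
#define LWSYNC_MO (MO_LD_LD | MO_LD_ST | MO_ST_ST)

/*
 * Mirrors the barrier choice in the patched gen_sync(): the L field
 * value 1 encodes lwsync, and the weaker mask is used only when the
 * CPU model advertises the instruction (PPC2_MEM_LWSYNC in the patch).
 */
static unsigned sync_barrier(unsigned l, bool cpu_has_lwsync)
{
    if (l == 1 && cpu_has_lwsync) {
        return LWSYNC_MO;
    }
    return MO_ALL;
}

/*
 * Hypothetical host model: host_relaxed is the set of orderings the
 * host does not enforce by itself. A fence instruction is needed only
 * if the guest barrier asks for one of them.
 */
static bool host_needs_fence(unsigned guest_bar, unsigned host_relaxed)
{
    return (guest_bar & host_relaxed) != 0;
}
```

With host_relaxed set to the store-load bit, a guest lwsync needs no host fence while a full sync still does, which is the no-op case the commit message describes for x86 hosts.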