* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl [not found] <579ad8bd.8481620a.89e78.f1ce@mx.google.com> @ 2016-07-29 4:44 ` Benjamin Herrenschmidt 2016-07-29 6:42 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-29 4:44 UTC (permalink / raw) To: Richard Henderson; +Cc: David Gibson, qemu-devel, qemu-ppc On Fri, 2016-07-29 at 09:46 +0530, Richard Henderson wrote: > > But that doesn't yet make the leap to 128-bit types in tcg. > I was going to raise that topic during the 2.8 cycle, since as a > consequence I want to drop support for 32-bit hosts, at least for 64- > bit guests, and maybe entirely. I wonder (or is it too chancy ?) if it's worthwhile having a TCG ops that in effect does a translate and returns a host pointer... That or a slightly more convoluted way of asking the backend to produce a series of load/stores. We can provide the element size (so it can do byteswap) and the base reg. Maybe a stride too. We can have a default impl that turns it into a loop of existing load/stores and smart backends can check for page boundaries and limit the translations. That would allow a much more efficient implementation for example of ppc store/load multiplem which are heavily used by 32-bit guest code for example. Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl [not found] <579ad8bd.8481620a.89e78.f1ce@mx.google.com> 2016-07-29 4:44 ` [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl Benjamin Herrenschmidt @ 2016-07-29 6:42 ` Benjamin Herrenschmidt 2016-07-29 6:56 ` Benjamin Herrenschmidt 2016-07-29 6:58 ` Benjamin Herrenschmidt 1 sibling, 2 replies; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-29 6:42 UTC (permalink / raw) To: Richard Henderson; +Cc: David Gibson, qemu-devel, qemu-ppc On Fri, 2016-07-29 at 09:46 +0530, Richard Henderson wrote: > The form of declaration you're using takes care of that. In order to > not clobber, you have to use DEF_HELPER_FLAGS with *NO_RWG*. BTW. Is that stuff (and the various flags here) somewhat documented anywhere ? :-) Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 6:42 ` Benjamin Herrenschmidt @ 2016-07-29 6:56 ` Benjamin Herrenschmidt 2016-07-29 12:30 ` Richard Henderson 2016-07-29 12:37 ` Richard Henderson 2016-07-29 6:58 ` Benjamin Herrenschmidt 1 sibling, 2 replies; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-29 6:56 UTC (permalink / raw) To: Richard Henderson; +Cc: David Gibson, qemu-devel, qemu-ppc On Fri, 2016-07-29 at 16:42 +1000, Benjamin Herrenschmidt wrote: > On Fri, 2016-07-29 at 09:46 +0530, Richard Henderson wrote: > > > > The form of declaration you're using takes care of that. In order > > to > > not clobber, you have to use DEF_HELPER_FLAGS with *NO_RWG*. > > BTW. Is that stuff (and the various flags here) somewhat documented > anywhere ? :-) More specifically, looking at the flags defined: #define TCG_CALL_NO_READ_GLOBALS 0x0010 /* Helper does not write globals */ #define TCG_CALL_NO_WRITE_GLOBALS 0x0020 /* Helper can be safely suppressed if the return value is not used. */ #define TCG_CALL_NO_SIDE_EFFECTS 0x0040 I assume by "globals" that means things defined with tcg_global_mem_new_i32() correct ? So I can't do something like set TCG_CALL_NO_RWG (nor TCG_CALL_NO_WG) on something like dcbz because it can take an exception, correct ? Now if a helper doens't access env->gpr/spr/... but instead take everything as in/out arguments and cannot take an exception, we can use these, right ? I notice that sadly, all of the vector ops are helper with full clobbers, because I assume, the "avr" is passed as pointer due to the lack of an int128 type in TCG correct ? Is the only option here to go down the int128 path or would there be a reasonable way to add a mechanism for more fine grained clobbers ? Another reason why I think that might be handy is that a whole lot of helpers basically have full clobbers simply because they update a CR bit (or fscr for FP ops). It would be nice to be able to mark just that clobbered. Maybe by declaring up to a handful of "globals" special, assigning them an index and having a special clobber bits corresponding to them in the flags ? Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 6:56 ` Benjamin Herrenschmidt @ 2016-07-29 12:30 ` Richard Henderson 2016-07-29 12:37 ` Richard Henderson 1 sibling, 0 replies; 15+ messages in thread From: Richard Henderson @ 2016-07-29 12:30 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Gibson, qemu-devel, qemu-ppc On 07/29/2016 12:26 PM, Benjamin Herrenschmidt wrote: > So I can't do something like set TCG_CALL_NO_RWG (nor TCG_CALL_NO_WG) > on something like dcbz because it can take an exception, correct ? You can't set NO_RWG, but you can say NO_WG. It "reads" the registers via the exception (and it certainly has side effects). But it doesn't write to the registers (along the normal return path; a loophole I've used elsewhere). r~ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 6:56 ` Benjamin Herrenschmidt 2016-07-29 12:30 ` Richard Henderson @ 2016-07-29 12:37 ` Richard Henderson 1 sibling, 0 replies; 15+ messages in thread From: Richard Henderson @ 2016-07-29 12:37 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: David Gibson, qemu-devel, qemu-ppc On 07/29/2016 12:26 PM, Benjamin Herrenschmidt wrote: > I notice that sadly, all of the vector ops are helper with full > clobbers, because I assume, the "avr" is passed as pointer due to the > lack of an int128 type in TCG correct ? Yes. Although x86 doesn't declare the vector registers as tcg registers at all. Which is both a blessing and a curse. My opinion is that aarch64 is the model to aim for -- most operations can operate on 64-bit slices at a time, no registers clobbered. There will always be exceptions, of course, but hopefully with rarer insns. r~ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 6:42 ` Benjamin Herrenschmidt 2016-07-29 6:56 ` Benjamin Herrenschmidt @ 2016-07-29 6:58 ` Benjamin Herrenschmidt 1 sibling, 0 replies; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-29 6:58 UTC (permalink / raw) To: Richard Henderson; +Cc: David Gibson, qemu-devel, qemu-ppc On Fri, 2016-07-29 at 16:42 +1000, Benjamin Herrenschmidt wrote: > On Fri, 2016-07-29 at 09:46 +0530, Richard Henderson wrote: > > > > The form of declaration you're using takes care of that. In order > > to > > not clobber, you have to use DEF_HELPER_FLAGS with *NO_RWG*. > > BTW. Is that stuff (and the various flags here) somewhat documented > anywhere ? :-) I found the README ! :-) Sorry for the noise. Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 01/32] ppc: Fix fault PC reporting for lve*/stve* VMX instructions @ 2016-07-26 22:20 Benjamin Herrenschmidt 2016-07-26 22:21 ` [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl Benjamin Herrenschmidt 0 siblings, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-26 22:20 UTC (permalink / raw) To: qemu-ppc; +Cc: qemu-devel, david, Benjamin Herrenschmidt We forgot to do gen_update_nip() for these like we do with other helpers. Fix this, but in a more efficient way by passing the RA to the accessors instead so the overhead is only taken on faults. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- target-ppc/mem_helper.c | 21 +++++++++++---------- 1 file changed, 11 insertions(+), 10 deletions(-) diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c index e4de86b..e4ed377 100644 --- a/target-ppc/mem_helper.c +++ b/target-ppc/mem_helper.c @@ -232,16 +232,16 @@ target_ulong helper_lscbx(CPUPPCState *env, target_ulong addr, uint32_t reg, \ if (needs_byteswap(env)) { \ r->element[LO_IDX ? index : (adjust - index)] = \ - swap(access(env, addr)); \ + swap(access(env, addr, GETPC())); \ } else { \ r->element[LO_IDX ? index : (adjust - index)] = \ - access(env, addr); \ + access(env, addr, GETPC()); \ } \ } #define I(x) (x) -LVE(lvebx, cpu_ldub_data, I, u8) -LVE(lvehx, cpu_lduw_data, bswap16, u16) -LVE(lvewx, cpu_ldl_data, bswap32, u32) +LVE(lvebx, cpu_ldub_data_ra, I, u8) +LVE(lvehx, cpu_lduw_data_ra, bswap16, u16) +LVE(lvewx, cpu_ldl_data_ra, bswap32, u32) #undef I #undef LVE @@ -259,16 +259,17 @@ LVE(lvewx, cpu_ldl_data, bswap32, u32) \ if (needs_byteswap(env)) { \ access(env, addr, swap(r->element[LO_IDX ? index : \ - (adjust - index)])); \ + (adjust - index)]), \ + GETPC()); \ } else { \ access(env, addr, r->element[LO_IDX ? index : \ - (adjust - index)]); \ + (adjust - index)], GETPC()); \ } \ } #define I(x) (x) -STVE(stvebx, cpu_stb_data, I, u8) -STVE(stvehx, cpu_stw_data, bswap16, u16) -STVE(stvewx, cpu_stl_data, bswap32, u32) +STVE(stvebx, cpu_stb_data_ra, I, u8) +STVE(stvehx, cpu_stw_data_ra, bswap16, u16) +STVE(stvewx, cpu_stl_data_ra, bswap32, u32) #undef I #undef LVE -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-26 22:20 [Qemu-devel] [PATCH 01/32] ppc: Fix fault PC reporting for lve*/stve* VMX instructions Benjamin Herrenschmidt @ 2016-07-26 22:21 ` Benjamin Herrenschmidt 2016-07-29 0:49 ` Richard Henderson 0 siblings, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-26 22:21 UTC (permalink / raw) To: qemu-ppc; +Cc: qemu-devel, david, Benjamin Herrenschmidt Those are always naturally aligned, so cannot cross a page boundary, thus instead of generating two 8-byte loads with translation on each (and double swap for LE on LE), we use a helper that will do a single translation and memcpy the result over (or do appropriate swapping if needed). Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> --- target-ppc/helper.h | 2 ++ target-ppc/mem_helper.c | 60 +++++++++++++++++++++++++++++++++++++++++ target-ppc/translate/vmx-impl.c | 38 ++++++++------------------ target-ppc/translate/vmx-ops.c | 4 +-- 4 files changed, 75 insertions(+), 29 deletions(-) diff --git a/target-ppc/helper.h b/target-ppc/helper.h index 1f5cfd0..64f7d2c 100644 --- a/target-ppc/helper.h +++ b/target-ppc/helper.h @@ -269,9 +269,11 @@ DEF_HELPER_5(vmsumshm, void, env, avr, avr, avr, avr) DEF_HELPER_5(vmsumshs, void, env, avr, avr, avr, avr) DEF_HELPER_4(vmladduhm, void, avr, avr, avr, avr) DEF_HELPER_2(mtvscr, void, env, avr) +DEF_HELPER_3(lvx, void, env, i32, tl) DEF_HELPER_3(lvebx, void, env, avr, tl) DEF_HELPER_3(lvehx, void, env, avr, tl) DEF_HELPER_3(lvewx, void, env, avr, tl) +DEF_HELPER_3(stvx, void, env, i32, tl) DEF_HELPER_3(stvebx, void, env, avr, tl) DEF_HELPER_3(stvehx, void, env, avr, tl) DEF_HELPER_3(stvewx, void, env, avr, tl) diff --git a/target-ppc/mem_helper.c b/target-ppc/mem_helper.c index 6548715..da3f973 100644 --- a/target-ppc/mem_helper.c +++ b/target-ppc/mem_helper.c @@ -225,6 +225,66 @@ target_ulong helper_lscbx(CPUPPCState *env, target_ulong addr, uint32_t reg, #define LO_IDX 0 #endif +void helper_lvx(CPUPPCState *env, uint32_t vr, target_ulong addr) +{ + uintptr_t raddr = GETPC(); + ppc_avr_t *haddr; + + /* Align address */ + addr &= ~(target_ulong)0xf; + + /* Try fast path translate */ + haddr = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, env->dmmu_idx); + if (haddr) { + if (msr_le && HI_IDX) { + memcpy(&env->avr[vr], haddr, 16); + } else { + env->avr[vr].u64[LO_IDX] = bswap64(haddr->u64[HI_IDX]); + env->avr[vr].u64[HI_IDX] = bswap64(haddr->u64[LO_IDX]); + } + } else { + if (needs_byteswap(env)) { + env->avr[vr].u64[LO_IDX] = + bswap64(cpu_ldq_data_ra(env, addr, raddr)); + env->avr[vr].u64[HI_IDX] = + bswap64(cpu_ldq_data_ra(env, addr + 8, raddr)); + } else { + env->avr[vr].u64[HI_IDX] = cpu_ldq_data_ra(env, addr, raddr); + env->avr[vr].u64[LO_IDX] = cpu_ldq_data_ra(env, addr + 8, raddr); + } + } +} + +void helper_stvx(CPUPPCState *env, uint32_t vr, target_ulong addr) +{ + uintptr_t raddr = GETPC(); + ppc_avr_t *haddr; + + /* Align address */ + addr &= ~(target_ulong)0xf; + + /* Try fast path translate */ + haddr = tlb_vaddr_to_host(env, addr, MMU_DATA_STORE, env->dmmu_idx); + if (haddr) { + if (msr_le && HI_IDX) { + memcpy(haddr, &env->avr[vr], 16); + } else { + haddr->u64[LO_IDX] = bswap64(env->avr[vr].u64[HI_IDX]); + haddr->u64[HI_IDX] = bswap64(env->avr[vr].u64[LO_IDX]); + } + } else { + if (needs_byteswap(env)) { + cpu_stq_data_ra(env, addr, + bswap64(env->avr[vr].u64[LO_IDX]), raddr); + cpu_stq_data_ra(env, addr + 8, + bswap64(env->avr[vr].u64[HI_IDX]), raddr); + } else { + cpu_stq_data_ra(env, addr, env->avr[vr].u64[HI_IDX], raddr); + cpu_stq_data_ra(env, addr + 8, env->avr[vr].u64[LO_IDX], raddr); + } + } +} + /* We use msr_le to determine index ordering in a vector. However, byteswapping is not simply controlled by msr_le. We also need to take into account endianness of the target. This is done for the little-endian diff --git a/target-ppc/translate/vmx-impl.c b/target-ppc/translate/vmx-impl.c index 110e19c..a58aa0c 100644 --- a/target-ppc/translate/vmx-impl.c +++ b/target-ppc/translate/vmx-impl.c @@ -15,55 +15,39 @@ static inline TCGv_ptr gen_avr_ptr(int reg) } #define GEN_VR_LDX(name, opc2, opc3) \ -static void glue(gen_, name)(DisasContext *ctx) \ +static void glue(gen_, name)(DisasContext *ctx) \ { \ TCGv EA; \ + TCGv_i32 t0; \ if (unlikely(!ctx->altivec_enabled)) { \ gen_exception(ctx, POWERPC_EXCP_VPU); \ return; \ } \ gen_set_access_type(ctx, ACCESS_INT); \ EA = tcg_temp_new(); \ + t0 = tcg_const_i32(rD(ctx->opcode)); \ gen_addr_reg_index(ctx, EA); \ - tcg_gen_andi_tl(EA, EA, ~0xf); \ - /* We only need to swap high and low halves. gen_qemu_ld64 does necessary \ - 64-bit byteswap already. */ \ - if (ctx->le_mode) { \ - gen_qemu_ld64(ctx, cpu_avrl[rD(ctx->opcode)], EA); \ - tcg_gen_addi_tl(EA, EA, 8); \ - gen_qemu_ld64(ctx, cpu_avrh[rD(ctx->opcode)], EA); \ - } else { \ - gen_qemu_ld64(ctx, cpu_avrh[rD(ctx->opcode)], EA); \ - tcg_gen_addi_tl(EA, EA, 8); \ - gen_qemu_ld64(ctx, cpu_avrl[rD(ctx->opcode)], EA); \ - } \ + gen_helper_lvx(cpu_env, t0, EA); \ tcg_temp_free(EA); \ + tcg_temp_free_i32(t0); \ } #define GEN_VR_STX(name, opc2, opc3) \ static void gen_st##name(DisasContext *ctx) \ { \ TCGv EA; \ + TCGv_i32 t0; \ if (unlikely(!ctx->altivec_enabled)) { \ gen_exception(ctx, POWERPC_EXCP_VPU); \ return; \ } \ gen_set_access_type(ctx, ACCESS_INT); \ EA = tcg_temp_new(); \ + t0 = tcg_const_i32(rD(ctx->opcode)); \ gen_addr_reg_index(ctx, EA); \ - tcg_gen_andi_tl(EA, EA, ~0xf); \ - /* We only need to swap high and low halves. gen_qemu_st64 does necessary \ - 64-bit byteswap already. */ \ - if (ctx->le_mode) { \ - gen_qemu_st64(ctx, cpu_avrl[rD(ctx->opcode)], EA); \ - tcg_gen_addi_tl(EA, EA, 8); \ - gen_qemu_st64(ctx, cpu_avrh[rD(ctx->opcode)], EA); \ - } else { \ - gen_qemu_st64(ctx, cpu_avrh[rD(ctx->opcode)], EA); \ - tcg_gen_addi_tl(EA, EA, 8); \ - gen_qemu_st64(ctx, cpu_avrl[rD(ctx->opcode)], EA); \ - } \ + gen_helper_stvx(cpu_env, t0, EA); \ tcg_temp_free(EA); \ + tcg_temp_free_i32(t0); \ } #define GEN_VR_LVE(name, opc2, opc3, size) \ @@ -116,9 +100,9 @@ GEN_VR_LVE(bx, 0x07, 0x00, 1); GEN_VR_LVE(hx, 0x07, 0x01, 2); GEN_VR_LVE(wx, 0x07, 0x02, 4); -GEN_VR_STX(svx, 0x07, 0x07); +GEN_VR_STX(vx, 0x07, 0x07); /* As we don't emulate the cache, stvxl is stricly equivalent to stvx */ -GEN_VR_STX(svxl, 0x07, 0x0F); +GEN_VR_STX(vxl, 0x07, 0x0F); GEN_VR_STVE(bx, 0x07, 0x04, 1); GEN_VR_STVE(hx, 0x07, 0x05, 2); diff --git a/target-ppc/translate/vmx-ops.c b/target-ppc/translate/vmx-ops.c index b9c982a..6c7d150 100644 --- a/target-ppc/translate/vmx-ops.c +++ b/target-ppc/translate/vmx-ops.c @@ -11,8 +11,8 @@ GEN_VR_LDX(lvxl, 0x07, 0x0B), GEN_VR_LVE(bx, 0x07, 0x00), GEN_VR_LVE(hx, 0x07, 0x01), GEN_VR_LVE(wx, 0x07, 0x02), -GEN_VR_STX(svx, 0x07, 0x07), -GEN_VR_STX(svxl, 0x07, 0x0F), +GEN_VR_STX(vx, 0x07, 0x07), +GEN_VR_STX(vxl, 0x07, 0x0F), GEN_VR_STVE(bx, 0x07, 0x04), GEN_VR_STVE(hx, 0x07, 0x05), GEN_VR_STVE(wx, 0x07, 0x06), -- 2.7.4 ^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-26 22:21 ` [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl Benjamin Herrenschmidt @ 2016-07-29 0:49 ` Richard Henderson 2016-07-29 2:13 ` Benjamin Herrenschmidt 2016-07-29 9:00 ` Benjamin Herrenschmidt 0 siblings, 2 replies; 15+ messages in thread From: Richard Henderson @ 2016-07-29 0:49 UTC (permalink / raw) To: Benjamin Herrenschmidt, qemu-ppc; +Cc: qemu-devel, david On 07/27/2016 03:51 AM, Benjamin Herrenschmidt wrote: > - tcg_gen_andi_tl(EA, EA, ~0xf); \ > - /* We only need to swap high and low halves. gen_qemu_ld64 does necessary \ > - 64-bit byteswap already. */ \ > - if (ctx->le_mode) { \ > - gen_qemu_ld64(ctx, cpu_avrl[rD(ctx->opcode)], EA); \ > - tcg_gen_addi_tl(EA, EA, 8); \ > - gen_qemu_ld64(ctx, cpu_avrh[rD(ctx->opcode)], EA); \ > - } else { \ > - gen_qemu_ld64(ctx, cpu_avrh[rD(ctx->opcode)], EA); \ > - tcg_gen_addi_tl(EA, EA, 8); \ > - gen_qemu_ld64(ctx, cpu_avrl[rD(ctx->opcode)], EA); \ > - } \ > + gen_helper_lvx(cpu_env, t0, EA); \ This, I'm not so keen on. (1) The helper, since it writes to registers controlled by tcg, must be described to clobber all registers. Which will noticeably increase memory traffic to ENV. For instance, you won't be able to hold the guest register holding the address in a host register across the call. (2) We're going to have to teach tcg about 16-byte data types soon anyway, for the proper emulation of 16-byte atomic operations. r~ ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 0:49 ` Richard Henderson @ 2016-07-29 2:13 ` Benjamin Herrenschmidt 2016-07-29 3:34 ` David Gibson 2016-07-29 9:00 ` Benjamin Herrenschmidt 1 sibling, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-29 2:13 UTC (permalink / raw) To: Richard Henderson, qemu-ppc; +Cc: qemu-devel, david On Fri, 2016-07-29 at 06:19 +0530, Richard Henderson wrote: > This, I'm not so keen on. > > (1) The helper, since it writes to registers controlled by tcg, must be > described to clobber all registers. Which will noticeably increase memory > traffic to ENV. For instance, you won't be able to hold the guest register > holding the address in a host register across the call. Ah I wasn't aware of this. How do you describe such a clobber ? Can I describe specifically which one is clobbered ? I didn't think TCG kept track of the vector halves but I must admit I'm still a bit new with TCG in general. I noticed other constructs doing that (passing a register number to an opcode), what do I do to ensure the right clobbers are there ? > > (2) We're going to have to teach tcg about 16-byte data types soon anyway, for > the proper emulation of 16-byte atomic operations. Is anybody working on this already ? I thought about that approach as it would definitely make things easier for that and a couple of other cases such as lq/stq. Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 2:13 ` Benjamin Herrenschmidt @ 2016-07-29 3:34 ` David Gibson 2016-07-29 4:40 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 15+ messages in thread From: David Gibson @ 2016-07-29 3:34 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Richard Henderson, qemu-ppc, qemu-devel [-- Attachment #1: Type: text/plain, Size: 1499 bytes --] On Fri, Jul 29, 2016 at 12:13:01PM +1000, Benjamin Herrenschmidt wrote: > On Fri, 2016-07-29 at 06:19 +0530, Richard Henderson wrote: > > This, I'm not so keen on. > > > > (1) The helper, since it writes to registers controlled by tcg, must be > > described to clobber all registers. Which will noticeably increase memory > > traffic to ENV. For instance, you won't be able to hold the guest register > > holding the address in a host register across the call. > > Ah I wasn't aware of this. How do you describe such a clobber ? Can I describe > specifically which one is clobbered ? I didn't think TCG kept track of the vector > halves but I must admit I'm still a bit new with TCG in general. > > I noticed other constructs doing that (passing a register number to an opcode), > what do I do to ensure the right clobbers are there ? > > > > (2) We're going to have to teach tcg about 16-byte data types soon anyway, for > > the proper emulation of 16-byte atomic operations. > > Is anybody working on this already ? I thought about that approach as > it would definitely make things easier for that and a couple of other > cases such as lq/stq. What should I do with this in the short term? Leave it in ppc-for-2.8, or remove it for now pending possible changes? -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 3:34 ` David Gibson @ 2016-07-29 4:40 ` Benjamin Herrenschmidt 2016-07-29 4:58 ` Benjamin Herrenschmidt 0 siblings, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-29 4:40 UTC (permalink / raw) To: David Gibson; +Cc: Richard Henderson, qemu-ppc, qemu-devel On Fri, 2016-07-29 at 13:34 +1000, David Gibson wrote: > > What should I do with this in the short term? Leave it in > ppc-for-2.8, or remove it for now pending possible changes? I think I'm still measuring a performance improvement with this, I'll test a bit more and will get back to you. It will definitely better to do it without a helper once Richard's 128- bit stuff is in. Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 4:40 ` Benjamin Herrenschmidt @ 2016-07-29 4:58 ` Benjamin Herrenschmidt 2016-07-29 5:42 ` David Gibson 0 siblings, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-29 4:58 UTC (permalink / raw) To: David Gibson; +Cc: Richard Henderson, qemu-ppc, qemu-devel On Fri, 2016-07-29 at 14:40 +1000, Benjamin Herrenschmidt wrote: > On Fri, 2016-07-29 at 13:34 +1000, David Gibson wrote: > > > > > > What should I do with this in the short term? Leave it in > > ppc-for-2.8, or remove it for now pending possible changes? > > I think I'm still measuring a performance improvement with this, I'll > test a bit more and will get back to you. > > It will definitely better to do it without a helper once Richard's 128- > bit stuff is in. Ok, after I nice'd myself on top of all the gcc's on that test machine I confirm that this patch is a loss on qemu-user. It might still be a win for qemu-system-ppc64 due to the translation but probably not worth bothering. So drop this one. I'll use the 128-bit support when it becomes available. Cheers, Ben. ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 4:58 ` Benjamin Herrenschmidt @ 2016-07-29 5:42 ` David Gibson 0 siblings, 0 replies; 15+ messages in thread From: David Gibson @ 2016-07-29 5:42 UTC (permalink / raw) To: Benjamin Herrenschmidt; +Cc: Richard Henderson, qemu-ppc, qemu-devel [-- Attachment #1: Type: text/plain, Size: 1113 bytes --] On Fri, Jul 29, 2016 at 02:58:07PM +1000, Benjamin Herrenschmidt wrote: > On Fri, 2016-07-29 at 14:40 +1000, Benjamin Herrenschmidt wrote: > > On Fri, 2016-07-29 at 13:34 +1000, David Gibson wrote: > > > > > > > > > What should I do with this in the short term? Leave it in > > > ppc-for-2.8, or remove it for now pending possible changes? > > > > I think I'm still measuring a performance improvement with this, I'll > > test a bit more and will get back to you. > > > > It will definitely better to do it without a helper once Richard's 128- > > bit stuff is in. > > Ok, after I nice'd myself on top of all the gcc's on that test machine > I confirm that this patch is a loss on qemu-user. It might still be a > win for qemu-system-ppc64 due to the translation but probably not > worth bothering. > > So drop this one. I'll use the 128-bit support when it becomes > available. Done. -- David Gibson | I'll have my music baroque, and my code david AT gibson.dropbear.id.au | minimalist, thank you. NOT _the_ _other_ | _way_ _around_! http://www.ozlabs.org/~dgibson [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 819 bytes --] ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 0:49 ` Richard Henderson 2016-07-29 2:13 ` Benjamin Herrenschmidt @ 2016-07-29 9:00 ` Benjamin Herrenschmidt 2016-07-29 12:43 ` Richard Henderson 1 sibling, 1 reply; 15+ messages in thread From: Benjamin Herrenschmidt @ 2016-07-29 9:00 UTC (permalink / raw) To: Richard Henderson, qemu-ppc; +Cc: qemu-devel, david On Fri, 2016-07-29 at 06:19 +0530, Richard Henderson wrote: > (1) The helper, since it writes to registers controlled by tcg, must be > described to clobber all registers. Which will noticeably increase memory > traffic to ENV. For instance, you won't be able to hold the guest register > holding the address in a host register across the call. So after fixing my test setup, I did observe indeed a small performance loss using the helper in qemu-user. It might still win us something in softmmu due to avoiding extra translations but I will leave that aside as I mentioned separately. Now out of curosity, I tried this: --- a/target-ppc/helper.h +++ b/target-ppc/helper.h @@ -22,12 +22,12 @@ DEF_HELPER_1(check_tlb_flush, void, env) #endif DEF_HELPER_3(lmw, void, env, tl, i32) -DEF_HELPER_3(stmw, void, env, tl, i32) +DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32) DEF_HELPER_4(lsw, void, env, tl, i32, i32) DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32) -DEF_HELPER_4(stsw, void, env, tl, i32, i32) -DEF_HELPER_3(dcbz, void, env, tl, i32) -DEF_HELPER_2(icbi, void, env, tl) +DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32) +DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, i32) +DEF_HELPER_FLAGS_2(icbi, TCG_CALL_NO_WG, void, env, tl) DEF_HELPER_5(lscbx, tl, env, tl, i32, i32, i32) #if defined(TARGET_PPC64) If my understanding is right, the above is correct, as none of these instructions will write to the env, though they can read from it and/ or generate faults. Sadly I haven't observed any performance improvement as a result in a few micro-benchmarks I cooked up. Cheers, Ben ^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl 2016-07-29 9:00 ` Benjamin Herrenschmidt @ 2016-07-29 12:43 ` Richard Henderson 0 siblings, 0 replies; 15+ messages in thread From: Richard Henderson @ 2016-07-29 12:43 UTC (permalink / raw) To: Benjamin Herrenschmidt, qemu-ppc; +Cc: qemu-devel, david On 07/29/2016 02:30 PM, Benjamin Herrenschmidt wrote: > DEF_HELPER_3(lmw, void, env, tl, i32) > -DEF_HELPER_3(stmw, void, env, tl, i32) > +DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32) > DEF_HELPER_4(lsw, void, env, tl, i32, i32) > DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32) > -DEF_HELPER_4(stsw, void, env, tl, i32, i32) > -DEF_HELPER_3(dcbz, void, env, tl, i32) > -DEF_HELPER_2(icbi, void, env, tl) > +DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32) > +DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, i32) > +DEF_HELPER_FLAGS_2(icbi, TCG_CALL_NO_WG, void, env, tl) > DEF_HELPER_5(lscbx, tl, env, tl, i32, i32, i32) > > #if defined(TARGET_PPC64) > > If my understanding is right, the above is correct, as none of these > instructions will write to the env, though they can read from it and/ > or generate faults. > > Sadly I haven't observed any performance improvement as a result in > a few micro-benchmarks I cooked up. Too bad. But the changes look correct, so you might as well queue them. ;-) r~ ^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2016-07-29 12:43 UTC | newest] Thread overview: 15+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <579ad8bd.8481620a.89e78.f1ce@mx.google.com> 2016-07-29 4:44 ` [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl Benjamin Herrenschmidt 2016-07-29 6:42 ` Benjamin Herrenschmidt 2016-07-29 6:56 ` Benjamin Herrenschmidt 2016-07-29 12:30 ` Richard Henderson 2016-07-29 12:37 ` Richard Henderson 2016-07-29 6:58 ` Benjamin Herrenschmidt 2016-07-26 22:20 [Qemu-devel] [PATCH 01/32] ppc: Fix fault PC reporting for lve*/stve* VMX instructions Benjamin Herrenschmidt 2016-07-26 22:21 ` [Qemu-devel] [PATCH 28/32] ppc: Avoid double translation for lvx/lvxl/stvx/stvxl Benjamin Herrenschmidt 2016-07-29 0:49 ` Richard Henderson 2016-07-29 2:13 ` Benjamin Herrenschmidt 2016-07-29 3:34 ` David Gibson 2016-07-29 4:40 ` Benjamin Herrenschmidt 2016-07-29 4:58 ` Benjamin Herrenschmidt 2016-07-29 5:42 ` David Gibson 2016-07-29 9:00 ` Benjamin Herrenschmidt 2016-07-29 12:43 ` Richard Henderson
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).