* [Qemu-devel] [PATCH 0/4] tcg-ppc ldst improvements @ 2013-09-01 16:07 Richard Henderson

From: Richard Henderson @ 2013-09-01 16:07 UTC
To: qemu-devel; +Cc: aurelien

The first patch allows me to build the ppc32 target on a ppc64 host.
The rest of them update the ppc32 backend on top of the v2 "Further
tcg ldst improvements" patch set.

r~

Richard Henderson (4):
  configure: Allow command-line configure for ppc32
  tcg-ppc: Avoid code for nop move
  tcg-ppc: Convert to helper_ret_ld/st_mmu
  tcg-ppc: Fix and cleanup tcg_out_tlb_check

 configure               |   8 +
 include/exec/exec-all.h |   4 +-
 tcg/ppc/tcg-target.c    | 530 ++++++++++++++++++++----------------------------
 3 files changed, 226 insertions(+), 316 deletions(-)

-- 
1.8.3.1
* [Qemu-devel] [PATCH 1/4] configure: Allow command-line configure for ppc32 @ 2013-09-01 16:07 Richard Henderson

From: Richard Henderson @ 2013-09-01 16:07 UTC
To: qemu-devel; +Cc: aurelien

Similar to manually selecting i386 for an x86_64 host.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 configure | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/configure b/configure
index 0a55c20..07eaa3c 100755
--- a/configure
+++ b/configure
@@ -951,6 +951,14 @@ for opt do
 done
 
 case "$cpu" in
+    ppc)
+           CPU_CFLAGS="-m32"
+           LDFLAGS="-m32 $LDFLAGS"
+           ;;
+    ppc64)
+           CPU_CFLAGS="-m64"
+           LDFLAGS="-m64 $LDFLAGS"
+           ;;
     sparc)
            LDFLAGS="-m32 $LDFLAGS"
            CPU_CFLAGS="-m32 -mcpu=ultrasparc"
-- 
1.8.3.1
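With patch 1, selecting `ppc` as the host CPU on the configure command line (presumably through configure's `--cpu` option) adds `-m32` to both CPU_CFLAGS and LDFLAGS, and `ppc64` adds `-m64`. A quick way to confirm which model a given compiler invocation actually produced is to look at the `_ARCH_PPC` / `_ARCH_PPC64` macros that `exec-all.h` already tests, and at the pointer width. The helpers below are hypothetical and not part of QEMU. The sketch assumes GCC's usual behaviour of defining `_ARCH_PPC` for all PowerPC targets and `_ARCH_PPC64` only for 64-bit code.

```c
#include <limits.h>

/* Hypothetical helper: which PowerPC flavour this translation unit was
   compiled for.  Assumes GCC's usual macros: _ARCH_PPC on any PowerPC
   target, _ARCH_PPC64 only when generating 64-bit code (-m64). */
static const char *ppc_build_flavour(void)
{
#if defined(_ARCH_PPC64)
    return "ppc64";
#elif defined(_ARCH_PPC)
    return "ppc32";
#else
    return "not-ppc";
#endif
}

/* Hypothetical helper: width of a data pointer in bits.  Expected to be
   32 for an -m32 build and 64 for an -m64 build. */
static unsigned pointer_bits(void)
{
    return (unsigned)(sizeof(void *) * CHAR_BIT);
}
```

On a ppc64 host configured for `ppc`, one would expect `ppc_build_flavour()` to return `"ppc32"` and `pointer_bits()` to return 32.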
* [Qemu-devel] [PATCH 2/4] tcg-ppc: Avoid code for nop move @ 2013-09-01 16:07 Richard Henderson

From: Richard Henderson @ 2013-09-01 16:07 UTC
To: qemu-devel; +Cc: Vassili Karpov (malc), aurelien, Richard Henderson

While these are rare from code that's been through the optimizer,
it's not uncommon within the tcg backend.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index 9a73d06..b0fbc54 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -450,7 +450,9 @@ static const uint32_t tcg_to_bc[] = {
 
 static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
 {
-    tcg_out32 (s, OR | SAB (arg, ret, arg));
+    if (ret != arg) {
+        tcg_out32(s, OR | SAB(arg, ret, arg));
+    }
 }
 
 static void tcg_out_movi(TCGContext *s, TCGType type,
-- 
1.8.3.1
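Patch 2 relies on the fact that PowerPC has no dedicated register-move instruction: `mr ret,arg` is `or ret,arg,arg`, so a move to the same register is a real but useless instruction. The sketch below models the patched `tcg_out_mov` against a tiny instruction buffer. `CodeBuf`, `out32` and `out_mov` are hypothetical stand-ins for `TCGContext`, `tcg_out32` and `tcg_out_mov`. The encoding macros are modelled on the `OR`/`SAB`/`RS`/`RA`/`RB` names used in `tcg/ppc/tcg-target.c`, with `or` as primary opcode 31, extended opcode 444.

```c
#include <stddef.h>
#include <stdint.h>

/* Simplified PowerPC X-form encoding helpers. */
#define OPCD(op)      ((uint32_t)(op) << 26)
#define XO31(xo)      (OPCD(31) | ((uint32_t)(xo) << 1))
#define OR            XO31(444)
#define RS(r)         ((uint32_t)(r) << 21)
#define RA(r)         ((uint32_t)(r) << 16)
#define RB(r)         ((uint32_t)(r) << 11)
#define SAB(s, a, b)  (RS(s) | RA(a) | RB(b))

/* Hypothetical stand-in for TCGContext: a small instruction buffer. */
typedef struct {
    uint32_t insn[16];
    size_t count;
} CodeBuf;

static void out32(CodeBuf *b, uint32_t insn)
{
    b->insn[b->count++] = insn;
}

/* Same shape as the patched tcg_out_mov: "mr ret,arg" is encoded as
   "or ret,arg,arg", and nothing is emitted when ret == arg. */
static void out_mov(CodeBuf *b, int ret, int arg)
{
    if (ret != arg) {
        out32(b, OR | SAB(arg, ret, arg));
    }
}
```

With this check in place, `out_mov(&b, 3, 3)` emits nothing, while `out_mov(&b, 3, 4)` emits the single word `0x7C832378` (`mr r3,r4`). Because of the same check, the rewritten slow paths in patch 3 can call `tcg_out_mov` unconditionally instead of guarding each move with `if (data_reg != 3)`.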
* [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu @ 2013-09-01 16:07 Richard Henderson

From: Richard Henderson @ 2013-09-01 16:07 UTC
To: qemu-devel; +Cc: Vassili Karpov (malc), aurelien, Richard Henderson

Drop the ld/st_trampolines, loading the return address into a
parameter register directly.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/exec-all.h |   4 +-
 tcg/ppc/tcg-target.c    | 220 +++++++++++++++++++-----------------------------
 2 files changed, 86 insertions(+), 138 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index beb4149..a81e805 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -324,9 +324,7 @@ extern uintptr_t tci_tb_ptr;
    In some implementations, we pass the "logical" return address manually;
    in others, we must infer the logical return from the true return.  */
 #if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
-# if defined (_ARCH_PPC) && !defined (_ARCH_PPC64)
-#  define GETRA_LDST(RA)   (*(int32_t *)((RA) - 4))
-# elif defined(__arm__)
+# if defined(__arm__)
 /* We define two insns between the return address and the branch back to
    straight-line.  Find and decode that branch insn.  */
 #  define GETRA_LDST(RA)   tcg_getra_ldst(RA)
diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index b0fbc54..a890319 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -551,27 +551,26 @@ static void add_qemu_ldst_label (TCGContext *s,
     label->label_ptr[0] = label_ptr;
 }
 
-/* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
-   int mmu_idx) */
+/* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
+ *                                     int mmu_idx, uintptr_t ra)
+ */
 static const void * const qemu_ld_helpers[4] = {
-    helper_ldb_mmu,
-    helper_ldw_mmu,
-    helper_ldl_mmu,
-    helper_ldq_mmu,
+    helper_ret_ldub_mmu,
+    helper_ret_lduw_mmu,
+    helper_ret_ldul_mmu,
+    helper_ret_ldq_mmu,
 };
 
-/* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
-   uintxx_t val, int mmu_idx) */
+/* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
+ *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
+ */
 static const void * const qemu_st_helpers[4] = {
-    helper_stb_mmu,
-    helper_stw_mmu,
-    helper_stl_mmu,
-    helper_stq_mmu,
+    helper_ret_stb_mmu,
+    helper_ret_stw_mmu,
+    helper_ret_stl_mmu,
+    helper_ret_stq_mmu,
 };
 
-static void *ld_trampolines[4];
-static void *st_trampolines[4];
-
 static void tcg_out_tlb_check (TCGContext *s, int r0, int r1, int r2,
                                int addr_reg, int addr_reg2, int s_bits,
                                int offset1, int offset2, uint8_t **label_ptr)
@@ -608,9 +607,14 @@ static void tcg_out_tlb_check (TCGContext *s, int r0, int r1, int r2,
     tcg_out32 (s, CMP | BF (6) | RA (addr_reg2) | RB (r1));
     tcg_out32 (s, CRAND | BT (7, CR_EQ) | BA (6, CR_EQ) | BB (7, CR_EQ));
 #endif
+
+    /* Use a conditional branch-and-link so that we load a pointer to
+       somewhere within the current opcode, for passing on to the helper.
+       This address cannot be used for a tail call, but it's shorter
+       than forming an address from scratch.  */
     *label_ptr = s->code_ptr;
     retranst = ((uint16_t *) s->code_ptr)[1] & ~3;
-    tcg_out32 (s, BC | BI (7, CR_EQ) | retranst | BO_COND_FALSE);
+    tcg_out32(s, BC | BI(7, CR_EQ) | retranst | BO_COND_FALSE | LK);
 
     /* r0 now contains &env->tlb_table[mem_index][index].addr_x */
     tcg_out32 (s, (LWZ
@@ -833,132 +837,99 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
 }
 
 #if defined(CONFIG_SOFTMMU)
-static void tcg_out_qemu_ld_slow_path (TCGContext *s, TCGLabelQemuLdst *label)
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-    int s_bits;
-    int ir;
-    int opc = label->opc;
-    int mem_index = label->mem_index;
-    int data_reg = label->datalo_reg;
-    int data_reg2 = label->datahi_reg;
-    int addr_reg = label->addrlo_reg;
-    uint8_t *raddr = label->raddr;
-    uint8_t **label_ptr = &label->label_ptr[0];
+    TCGReg ir, datalo, datahi;
+    int opc = lb->opc;
 
-    s_bits = opc & 3;
+    reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
 
-    /* resolve label address */
-    reloc_pc14 (label_ptr[0], (tcg_target_long) s->code_ptr);
+    ir = TCG_REG_R3;
+    tcg_out_mov(s, TCG_TYPE_PTR, ir++, TCG_AREG0);
 
-    /* slow path */
-    ir = 4;
-#if TARGET_LONG_BITS == 32
-    tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
-#else
+    if (TARGET_LONG_BITS == 32) {
+        tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
+    } else {
 #ifdef TCG_TARGET_CALL_ALIGN_ARGS
-    ir |= 1;
-#endif
-    tcg_out_mov (s, TCG_TYPE_I32, ir++, label->addrhi_reg);
-    tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
+        ir |= 1;
 #endif
-    tcg_out_movi (s, TCG_TYPE_I32, ir, mem_index);
-    tcg_out_call (s, (tcg_target_long) ld_trampolines[s_bits], 1);
-    tcg_out32 (s, (tcg_target_long) raddr);
+        tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrhi_reg);
+        tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
+    }
+
+    tcg_out_movi(s, TCG_TYPE_I32, ir++, lb->mem_index);
+    tcg_out32(s, MFSPR | RT(ir++) | LR);
+
+    tcg_out_call(s, (uintptr_t)qemu_ld_helpers[opc & 3], 1);
+
+    datalo = lb->datalo_reg;
     switch (opc) {
     case 0|4:
-        tcg_out32 (s, EXTSB | RA (data_reg) | RS (3));
+        tcg_out32(s, EXTSB | RA(datalo) | RS(TCG_REG_R3));
         break;
     case 1|4:
-        tcg_out32 (s, EXTSH | RA (data_reg) | RS (3));
+        tcg_out32(s, EXTSH | RA(datalo) | RS(TCG_REG_R3));
         break;
-    case 0:
-    case 1:
-    case 2:
-        if (data_reg != 3)
-            tcg_out_mov (s, TCG_TYPE_I32, data_reg, 3);
+
+    default:
+        tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R3);
         break;
+
     case 3:
-        if (data_reg == 3) {
-            if (data_reg2 == 4) {
-                tcg_out_mov (s, TCG_TYPE_I32, 0, 4);
-                tcg_out_mov (s, TCG_TYPE_I32, 4, 3);
-                tcg_out_mov (s, TCG_TYPE_I32, 3, 0);
-            }
-            else {
-                tcg_out_mov (s, TCG_TYPE_I32, data_reg2, 3);
-                tcg_out_mov (s, TCG_TYPE_I32, 3, 4);
-            }
-        }
-        else {
-            if (data_reg != 4) tcg_out_mov (s, TCG_TYPE_I32, data_reg, 4);
-            if (data_reg2 != 3) tcg_out_mov (s, TCG_TYPE_I32, data_reg2, 3);
+        datahi = lb->datahi_reg;
+        if (datalo != TCG_REG_R3) {
+            tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R4);
+            tcg_out_mov(s, TCG_TYPE_I32, datahi, TCG_REG_R3);
+        } else if (datahi != TCG_REG_R4) {
+            tcg_out_mov(s, TCG_TYPE_I32, datahi, TCG_REG_R3);
+            tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R4);
+        } else {
+            tcg_out_mov(s, TCG_TYPE_I32, TCG_REG_R0, TCG_REG_R4);
+            tcg_out_mov(s, TCG_TYPE_I32, datahi, TCG_REG_R3);
+            tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R0);
         }
         break;
     }
+
     /* Jump to the code corresponding to next IR of qemu_st */
-    tcg_out_b (s, 0, (tcg_target_long) raddr);
+    tcg_out_b(s, 0, (uintptr_t)lb->raddr);
 }
 
-static void tcg_out_qemu_st_slow_path (TCGContext *s, TCGLabelQemuLdst *label)
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-    int ir;
-    int opc = label->opc;
-    int mem_index = label->mem_index;
-    int data_reg = label->datalo_reg;
-    int data_reg2 = label->datahi_reg;
-    int addr_reg = label->addrlo_reg;
-    uint8_t *raddr = label->raddr;
-    uint8_t **label_ptr = &label->label_ptr[0];
-
-    /* resolve label address */
-    reloc_pc14 (label_ptr[0], (tcg_target_long) s->code_ptr);
-
-    /* slow path */
-    ir = 4;
-#if TARGET_LONG_BITS == 32
-    tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
-#else
+    TCGReg ir;
+
+    reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
+
+    ir = TCG_REG_R3;
+    tcg_out_mov(s, TCG_TYPE_PTR, ir++, TCG_AREG0);
+
+    if (TARGET_LONG_BITS == 32) {
+        tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
+    } else {
 #ifdef TCG_TARGET_CALL_ALIGN_ARGS
-    ir |= 1;
-#endif
-    tcg_out_mov (s, TCG_TYPE_I32, ir++, label->addrhi_reg);
-    tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
+        ir |= 1;
 #endif
+        tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrhi_reg);
+        tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
+    }
 
-    switch (opc) {
-    case 0:
-        tcg_out32 (s, (RLWINM
-                       | RA (ir)
-                       | RS (data_reg)
-                       | SH (0)
-                       | MB (24)
-                       | ME (31)));
-        break;
-    case 1:
-        tcg_out32 (s, (RLWINM
-                       | RA (ir)
-                       | RS (data_reg)
-                       | SH (0)
-                       | MB (16)
-                       | ME (31)));
-        break;
-    case 2:
-        tcg_out_mov (s, TCG_TYPE_I32, ir, data_reg);
-        break;
-    case 3:
+    if (lb->opc != 3) {
+        tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->datalo_reg);
+    } else {
 #ifdef TCG_TARGET_CALL_ALIGN_ARGS
         ir |= 1;
 #endif
-        tcg_out_mov (s, TCG_TYPE_I32, ir++, data_reg2);
-        tcg_out_mov (s, TCG_TYPE_I32, ir, data_reg);
-        break;
+        tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->datahi_reg);
+        tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->datalo_reg);
     }
-    ir++;
 
-    tcg_out_movi (s, TCG_TYPE_I32, ir, mem_index);
-    tcg_out_call (s, (tcg_target_long) st_trampolines[opc], 1);
-    tcg_out32 (s, (tcg_target_long) raddr);
-    tcg_out_b (s, 0, (tcg_target_long) raddr);
+    tcg_out_movi(s, TCG_TYPE_I32, ir++, lb->mem_index);
+    tcg_out32(s, MFSPR | RT(ir++) | LR);
+
+    tcg_out_call(s, (uintptr_t)qemu_st_helpers[lb->opc], 1);
+
+    tcg_out_b(s, 0, (uintptr_t)lb->raddr);
 }
 
 void tcg_out_tb_finalize(TCGContext *s)
@@ -979,17 +950,6 @@ void tcg_out_tb_finalize(TCGContext *s)
 }
 #endif
 
-#ifdef CONFIG_SOFTMMU
-static void emit_ldst_trampoline (TCGContext *s, const void *ptr)
-{
-    tcg_out32 (s, MFSPR | RT (3) | LR);
-    tcg_out32 (s, ADDI | RT (3) | RA (3) | 4);
-    tcg_out32 (s, MTSPR | RS (3) | LR);
-    tcg_out_mov (s, TCG_TYPE_I32, 3, TCG_AREG0);
-    tcg_out_b (s, 0, (tcg_target_long) ptr);
-}
-#endif
-
 static void tcg_target_qemu_prologue (TCGContext *s)
 {
     int i, frame_size;
@@ -1050,16 +1010,6 @@ static void tcg_target_qemu_prologue (TCGContext *s)
     tcg_out32 (s, MTSPR | RS (0) | LR);
     tcg_out32 (s, ADDI | RT (1) | RA (1) | frame_size);
     tcg_out32 (s, BCLR | BO_ALWAYS);
-
-#ifdef CONFIG_SOFTMMU
-    for (i = 0; i < 4; ++i) {
-        ld_trampolines[i] = s->code_ptr;
-        emit_ldst_trampoline (s, qemu_ld_helpers[i]);
-
-        st_trampolines[i] = s->code_ptr;
-        emit_ldst_trampoline (s, qemu_st_helpers[i]);
-    }
-#endif
 }
 
 static void tcg_out_ld (TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
-- 
1.8.3.1
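Patch 3 passes every helper argument directly in registers. `env` goes in r3, followed by the guest address, the value to store (stores only) and the mmu index. The last argument is the return address, which the slow path copies out of LR with `mfspr`, because the fast path's conditional branch to the slow path now sets LK. For 64-bit loads, the result comes back in r3 (high word) and r4 (low word). It has to be moved into `datahi`/`datalo` without overwriting either half before it has been read.

The sketch below models both steps. It rests on two assumptions: the 32-bit PowerPC argument registers are r3..r10, and when `TCG_TARGET_CALL_ALIGN_ARGS` is defined a 64-bit argument starts at an odd register, as the `ir |= 1` lines imply. `SlowPathRegs`, `assign_slow_path_regs` and `move_result_pair` are hypothetical names. This is not QEMU code.

```c
#include <stdbool.h>
#include <stdint.h>

/* Register numbers chosen for each helper argument; -1 marks an
   argument that is not passed.  Hypothetical model of the patch. */
typedef struct {
    int env, addrhi, addrlo, datahi, datalo, mem_index, ra;
} SlowPathRegs;

static SlowPathRegs assign_slow_path_regs(bool guest64, bool is_store,
                                          bool data64, bool align_pairs)
{
    SlowPathRegs r = { -1, -1, -1, -1, -1, -1, -1 };
    int ir = 3;                     /* r3 is the first argument register */

    r.env = ir++;
    if (!guest64) {
        r.addrlo = ir++;
    } else {
        if (align_pairs) {
            ir |= 1;                /* a 64-bit pair starts at an odd GPR */
        }
        r.addrhi = ir++;
        r.addrlo = ir++;
    }
    if (is_store) {
        if (!data64) {
            r.datalo = ir++;
        } else {
            if (align_pairs) {
                ir |= 1;
            }
            r.datahi = ir++;
            r.datalo = ir++;
        }
    }
    r.mem_index = ir++;
    r.ra = ir++;                    /* filled from LR with mfspr */
    return r;
}

/* Move a 64-bit result from r3 (high) and r4 (low) into datahi/datalo,
   in the same order as the patch's "case 3", so neither source is
   clobbered before it is read.  REGS models the register file and r0
   is the scratch register used only for the full swap. */
static void move_result_pair(uint32_t regs[32], int datalo, int datahi)
{
    if (datalo != 3) {
        regs[datalo] = regs[4];     /* r3 still intact */
        regs[datahi] = regs[3];
    } else if (datahi != 4) {
        regs[datahi] = regs[3];     /* r4 still intact */
        regs[datalo] = regs[4];
    } else {                        /* datalo == r3, datahi == r4 */
        regs[0] = regs[4];
        regs[datahi] = regs[3];
        regs[datalo] = regs[0];
    }
}
```

Under these assumptions, a 64-bit guest storing a 64-bit value uses every argument register from r3 (env) through r10 (return address). A 32-bit guest load needs only r3 to r6.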
* Re: [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu @ 2013-09-07 9:46 Paolo Bonzini

From: Paolo Bonzini @ 2013-09-07 9:46 UTC
To: Richard Henderson; +Cc: Vassili Karpov (malc), qemu-devel, aurelien

On 09/01/2013 06:07 PM, Richard Henderson wrote:
> Drop the ld/st_trampolines, loading the return address into a
> parameter register directly.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/exec/exec-all.h |   4 +-
>  tcg/ppc/tcg-target.c    | 220 +++++++++++++++++++-----------------------------
>  2 files changed, 86 insertions(+), 138 deletions(-)
> [...]

Bad news... with this patch, either with or without patch 2, trying to execute
sieve.flat from kvm-unit-tests (it doesn't matter if it is compiled as 32-bit
or 64-bit, and with both i386-softmmu and x86_64-softmmu targets) fails as
follows on my PowerBook:

qemu: fatal: Trying to execute code outside RAM or ROM at 0x70360000

EAX=00006d20 EBX=00007025 ECX=00000000 EDX=00000000
ESI=07fd7bd0 EDI=000f1930 EBP=07fd7b00 ESP=00006e0c
EIP=70270000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 ffffffff 008f9300
CS =f000 000f0000 ffffffff 008f9b00
SS =0000 00000000 ffffffff 008f9300
DS =0000 00000000 ffffffff 008f9300
FS =0000 00000000 ffffffff 008f9300
GS =0000 00000000 ffffffff 008f9300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT=     000f69e0 00000037
IDT=     00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000044 CCD=00006df8 CCO=ADDL
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Aborted

The failure happens as soon as the first hardware interrupt is serviced:

Servicing hardware INT=0x08
Servicing hardware INT=0x09
----------------
IN:
0x000fe98f:  push   %es
0x000fe990:  push   %ebp
0x000fe992:  push   %edi
0x000fe994:  push   %esi
0x000fe996:  push   %ebx
0x000fe998:  sub    $0x44,%esp
0x000fe99c:  mov    $0x40,%eax
0x000fe9a2:  mov    %ax,%es
0x000fe9a4:  mov    %es:0x6c,%edx
0x000fe9aa:  inc    %edx
0x000fe9ac:  cmp    $0x1800af,%edx
0x000fe9b3:  jbe    0xfe9c6

----------------
IN:
0x000f77f4:  xor    %edx,%edx
0x000f77f7:  calll  *%ecx

qemu: fatal: Trying to execute code outside RAM or ROM at 0x70360000

Command line:
i386-softmmu/qemu-system-i386 -kernel sieve32.flat -serial stdio -device isa-debug-exit,iobase=0xf4 -nographic

My two patches + 2/4 from this series work.  I didn't try 4/4 because it
doesn't apply cleanly on top of my patches.

Paolo
* Re: [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu @ 2013-09-09 17:42 Richard Henderson

From: Richard Henderson @ 2013-09-09 17:42 UTC
To: Paolo Bonzini; +Cc: Vassili Karpov (malc), qemu-devel, aurelien

On 08/19/2013 12:42 PM, Paolo Bonzini wrote:
> Bad news... with this patch, either with or without patch 2, trying to execute
> sieve.flat from kvm-unit-tests (it doesn't matter if it is compiled as 32-bit
> or 64-bit, and with both i386-softmmu and x86_64-softmmu targets) fails as
> follows on my PowerBook:
>
> qemu: fatal: Trying to execute code outside RAM or ROM at 0x70360000

Hum.  Are you sure it's anything related to the ppc backend at all?  This
test doesn't work with an x86_64 host either.

qemu: fatal: Trying to execute code outside RAM or ROM at 0x004001ba

EAX=80000011 EBX=00009500 ECX=c0000080 EDX=00000000
ESI=00000000 EDI=00542000 EBP=00000000 ESP=0044abbc
EIP=004001ba EFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS   [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT=     0040800a 00000447
IDT=     00000000 000003ff
CR0=80000011 CR2=00000000 CR3=00407000 CR4=00000020
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000000 CCD=00000000 CCO=SARL
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Aborted (core dumped)

This happens after one of the writes to %cr0.  Of course, the test works with
kvm enabled, so I don't blame the test so much as the target-i386 front end...

This is not new breakage, either.  I've checked back through 1.4.0 and I can't
make it work with any version of TCG.


r~
* Re: [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu @ 2013-09-09 17:49 Paolo Bonzini

From: Paolo Bonzini @ 2013-09-09 17:49 UTC
To: Richard Henderson; +Cc: Vassili Karpov (malc), qemu-devel, aurelien

On 09/09/2013 19:42, Richard Henderson wrote:
> On 08/19/2013 12:42 PM, Paolo Bonzini wrote:
>> Bad news... with this patch, either with or without patch 2, trying to execute
>> sieve.flat from kvm-unit-tests (it doesn't matter if it is compiled as 32-bit
>> or 64-bit, and with both i386-softmmu and x86_64-softmmu targets) fails as
>> follows on my PowerBook:
>>
>> qemu: fatal: Trying to execute code outside RAM or ROM at 0x70360000
>
> Hum.  Are you sure it's anything related to the ppc backend at all?  This
> test doesn't work with an x86_64 host either.
>
> qemu: fatal: Trying to execute code outside RAM or ROM at 0x004001ba
>
> [...]
> Aborted (core dumped)
>
> This happens after one of the writes to %cr0.  Of course, the test works with
> kvm enabled, so I don't blame the test so much as the target-i386 front end...
>
> This is not new breakage, either.  I've checked back through 1.4.0 and I can't
> make it work with any version of TCG.

Strange... works here with 1.6.0 from Fedora

$ time qemu-system-x86_64 -device isa-debug-exit,iobase=0xf4 -serial stdio -kernel sieve64.flat
enabling apic
starting sieve
static:78498 out of 1000000
paging enabled
cr0 = 80010011
cr3 = 7fff000
cr4 = 20
mapped:78498 out of 1000000
virtual:5761455 out of 100000000
virtual:5761455 out of 100000000
virtual:5761455 out of 100000000

real    0m50.056s
user    0m49.467s
sys     0m0.415s

I sent you my binaries offlist.

Paolo
* Re: [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu
From: Richard Henderson @ 2013-09-09 18:20 UTC
To: Paolo Bonzini; +Cc: Vassili Karpov (malc), qemu-devel, aurelien

On 09/09/2013 10:49 AM, Paolo Bonzini wrote:
> I sent you my binaries offlist.

And apparently there was something wrong with the binaries I built myself,
as yours work.  I'll now look at my ppc32 changes and see what's what.

r~
* [Qemu-devel] [PATCH 4/4] tcg-ppc: Fix and cleanup tcg_out_tlb_check
From: Richard Henderson @ 2013-09-01 16:07 UTC
To: qemu-devel; +Cc: Vassili Karpov (malc), aurelien, Richard Henderson

The fix is that sparc has so many mmu modes that the last one overflowed
the 16-bit signed offset we assumed would fit.  Handle this, and check
the new assumption at compile time.

Load the tlb addend earlier for the fast path.  Remove the explicit
address + addend and make use of index addressing.

Adjust constraints for qemu_ld64 such that we don't clobber the address
register or tlb addend before loading both values.

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/ppc/tcg-target.c | 302 ++++++++++++++++++++++-----------------------------
 1 file changed, 127 insertions(+), 175 deletions(-)

diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index a890319..e190422 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -571,42 +571,72 @@ static const void * const qemu_st_helpers[4] = {
     helper_ret_stq_mmu,
 };
 
-static void tcg_out_tlb_check (TCGContext *s, int r0, int r1, int r2,
-                               int addr_reg, int addr_reg2, int s_bits,
-                               int offset1, int offset2, uint8_t **label_ptr)
+/* Perform the TLB load and compare.  Branches to the slow path, placing the
+   address of the branch in *LABEL_PTR.  Loads the addend of the TLB into R0.
+   Clobbers R1 and R2.  */
+
+static void tcg_out_tlb_check(TCGContext *s, TCGReg r0, TCGReg r1, TCGReg r2,
+                              TCGReg addrlo, TCGReg addrhi, int s_bits,
+                              int mem_index, int is_load, uint8_t **label_ptr)
 {
+    int cmp_off =
+        (is_load
+         ? offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
+         : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
+    int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
     uint16_t retranst;
+    TCGReg base = TCG_AREG0;
+
+    /* Extract the page index, shifted into place for tlb index.  */
+    tcg_out32(s, (RLWINM
+                  | RA(r0)
+                  | RS(addrlo)
+                  | SH(32 - (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS))
+                  | MB(32 - (CPU_TLB_BITS + CPU_TLB_ENTRY_BITS))
+                  | ME(31 - CPU_TLB_ENTRY_BITS)));
+
+    /* Compensate for very large offsets.  */
+    if (add_off >= 0x8000) {
+        /* Most target env are smaller than 32k; none are larger than 64k.
+           Simplify the logic here merely to offset by 0x8000, giving us a
+           range just shy of 64k.  Check this assumption.  */
+        QEMU_BUILD_BUG_ON(offsetof(CPUArchState,
+                                   tlb_table[NB_MMU_MODES - 1][1])
+                          > 0x8000 + 0x7fff);
+        tcg_out32(s, ADDI | RT(r1) | RA(base) | 0x8000);
+        base = r1;
+        cmp_off -= 0x8000;
+        add_off -= 0x8000;
+    }
 
-    tcg_out32 (s, (RLWINM
-                   | RA (r0)
-                   | RS (addr_reg)
-                   | SH (32 - (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS))
-                   | MB (32 - (CPU_TLB_BITS + CPU_TLB_ENTRY_BITS))
-                   | ME (31 - CPU_TLB_ENTRY_BITS)
-                   )
-        );
-    tcg_out32 (s, ADD | RT (r0) | RA (r0) | RB (TCG_AREG0));
-    tcg_out32 (s, (LWZU
-                   | RT (r1)
-                   | RA (r0)
-                   | offset1
-                   )
-        );
-    tcg_out32 (s, (RLWINM
-                   | RA (r2)
-                   | RS (addr_reg)
-                   | SH (0)
-                   | MB ((32 - s_bits) & 31)
-                   | ME (31 - TARGET_PAGE_BITS)
-                   )
-        );
+    /* Clear the non-page, non-alignment bits from the address.  */
+    tcg_out32(s, (RLWINM
+                  | RA(r2)
+                  | RS(addrlo)
+                  | SH(0)
+                  | MB((32 - s_bits) & 31)
+                  | ME(31 - TARGET_PAGE_BITS)));
 
-    tcg_out32 (s, CMP | BF (7) | RA (r2) | RB (r1));
-#if TARGET_LONG_BITS == 64
-    tcg_out32 (s, LWZ | RT (r1) | RA (r0) | 4);
-    tcg_out32 (s, CMP | BF (6) | RA (addr_reg2) | RB (r1));
-    tcg_out32 (s, CRAND | BT (7, CR_EQ) | BA (6, CR_EQ) | BB (7, CR_EQ));
-#endif
+    tcg_out32(s, ADD | RT(r0) | RA(r0) | RB(base));
+    base = r0;
+
+    /* Load the tlb comparator.  */
+    tcg_out32(s, LWZ | RT(r1) | RA(base) | (cmp_off & 0xffff));
+
+    tcg_out32(s, CMP | BF(7) | RA(r2) | RB(r1));
+
+    if (TARGET_LONG_BITS == 64) {
+        tcg_out32(s, LWZ | RT(r1) | RA(base) | ((cmp_off + 4) & 0xffff));
+    }
+
+    /* Load the tlb addend for use on the fast path.
+       Do this asap to minimize load delay.  */
+    tcg_out32(s, LWZ | RT(r0) | RA(base) | (add_off & 0xffff));
+
+    if (TARGET_LONG_BITS == 64) {
+        tcg_out32(s, CMP | BF(6) | RA(addrhi) | RB(r1));
+        tcg_out32(s, CRAND | BT(7, CR_EQ) | BA(6, CR_EQ) | BB(7, CR_EQ));
+    }
 
     /* Use a conditional branch-and-link so that we load a pointer to
        somewhere within the current opcode, for passing on to the helper.
@@ -615,58 +645,31 @@ static void tcg_out_tlb_check (TCGContext *s, int r0, int r1, int r2,
     *label_ptr = s->code_ptr;
     retranst = ((uint16_t *) s->code_ptr)[1] & ~3;
     tcg_out32(s, BC | BI(7, CR_EQ) | retranst | BO_COND_FALSE | LK);
-
-    /* r0 now contains &env->tlb_table[mem_index][index].addr_x */
-    tcg_out32 (s, (LWZ
-                   | RT (r0)
-                   | RA (r0)
-                   | offset2
-                   )
-        );
-    /* r0 = env->tlb_table[mem_index][index].addend */
-    tcg_out32 (s, ADD | RT (r0) | RA (r0) | RB (addr_reg));
-    /* r0 = env->tlb_table[mem_index][index].addend + addr */
-
 }
 #endif
 
 static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
 {
-    int addr_reg, data_reg, data_reg2, r0, r1, rbase, bswap;
+    TCGReg addrlo, datalo, datahi, rbase;
+    int bswap;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, s_bits, r2, addr_reg2;
+    int mem_index;
+    TCGReg addrhi;
     uint8_t *label_ptr;
 #endif
 
-    data_reg = *args++;
-    if (opc == 3)
-        data_reg2 = *args++;
-    else
-        data_reg2 = 0;
-    addr_reg = *args++;
+    datalo = *args++;
+    datahi = (opc == 3 ? *args++ : 0);
+    addrlo = *args++;
 
 #ifdef CONFIG_SOFTMMU
-#if TARGET_LONG_BITS == 64
-    addr_reg2 = *args++;
-#else
-    addr_reg2 = 0;
-#endif
+    addrhi = (TARGET_LONG_BITS == 64 ? *args++ : 0);
     mem_index = *args;
-    s_bits = opc & 3;
-    r0 = 3;
-    r1 = 4;
-    r2 = 0;
-    rbase = 0;
-
-    tcg_out_tlb_check (
-        s, r0, r1, r2, addr_reg, addr_reg2, s_bits,
-        offsetof (CPUArchState, tlb_table[mem_index][0].addr_read),
-        offsetof (CPUTLBEntry, addend) - offsetof (CPUTLBEntry, addr_read),
-        &label_ptr
-        );
+
+    tcg_out_tlb_check(s, TCG_REG_R3, TCG_REG_R4, TCG_REG_R0, addrlo,
+                      addrhi, opc & 3, mem_index, 0, &label_ptr);
+    rbase = TCG_REG_R3;
 #else /* !CONFIG_SOFTMMU */
-    r0 = addr_reg;
-    r1 = 3;
     rbase = GUEST_BASE ? TCG_GUEST_BASE_REG : 0;
 #endif
 
@@ -679,106 +682,72 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
     switch (opc) {
     default:
     case 0:
-        tcg_out32 (s, LBZX | TAB (data_reg, rbase, r0));
+        tcg_out32(s, LBZX | TAB(datalo, rbase, addrlo));
         break;
     case 0|4:
-        tcg_out32 (s, LBZX | TAB (data_reg, rbase, r0));
-        tcg_out32 (s, EXTSB | RA (data_reg) | RS (data_reg));
+        tcg_out32(s, LBZX | TAB(datalo, rbase, addrlo));
+        tcg_out32(s, EXTSB | RA(datalo) | RS(datalo));
         break;
     case 1:
-        if (bswap)
-            tcg_out32 (s, LHBRX | TAB (data_reg, rbase, r0));
-        else
-            tcg_out32 (s, LHZX | TAB (data_reg, rbase, r0));
+        tcg_out32(s, (bswap ? LHBRX : LHZX) | TAB(datalo, rbase, addrlo));
         break;
     case 1|4:
         if (bswap) {
-            tcg_out32 (s, LHBRX | TAB (data_reg, rbase, r0));
-            tcg_out32 (s, EXTSH | RA (data_reg) | RS (data_reg));
+            tcg_out32(s, LHBRX | TAB(datalo, rbase, addrlo));
+            tcg_out32(s, EXTSH | RA(datalo) | RS(datalo));
+        } else {
+            tcg_out32(s, LHAX | TAB(datalo, rbase, addrlo));
         }
-        else tcg_out32 (s, LHAX | TAB (data_reg, rbase, r0));
         break;
     case 2:
-        if (bswap)
-            tcg_out32 (s, LWBRX | TAB (data_reg, rbase, r0));
-        else
-            tcg_out32 (s, LWZX | TAB (data_reg, rbase, r0));
+        tcg_out32(s, (bswap ? LWBRX : LWZX) | TAB(datalo, rbase, addrlo));
         break;
     case 3:
         if (bswap) {
-            tcg_out32 (s, ADDI | RT (r1) | RA (r0) | 4);
-            tcg_out32 (s, LWBRX | TAB (data_reg, rbase, r0));
-            tcg_out32 (s, LWBRX | TAB (data_reg2, rbase, r1));
-        }
-        else {
-#ifdef CONFIG_USE_GUEST_BASE
-            tcg_out32 (s, ADDI | RT (r1) | RA (r0) | 4);
-            tcg_out32 (s, LWZX | TAB (data_reg2, rbase, r0));
-            tcg_out32 (s, LWZX | TAB (data_reg, rbase, r1));
-#else
-            if (r0 == data_reg2) {
-                tcg_out32 (s, LWZ | RT (0) | RA (r0));
-                tcg_out32 (s, LWZ | RT (data_reg) | RA (r0) | 4);
-                tcg_out_mov (s, TCG_TYPE_I32, data_reg2, 0);
-            }
-            else {
-                tcg_out32 (s, LWZ | RT (data_reg2) | RA (r0));
-                tcg_out32 (s, LWZ | RT (data_reg) | RA (r0) | 4);
-            }
-#endif
+            tcg_out32(s, ADDI | RT(TCG_REG_R0) | RA(addrlo) | 4);
+            tcg_out32(s, LWBRX | TAB(datalo, rbase, addrlo));
+            tcg_out32(s, LWBRX | TAB(datahi, rbase, TCG_REG_R0));
+        } else if (rbase != 0) {
+            tcg_out32(s, ADDI | RT(TCG_REG_R0) | RA(addrlo) | 4);
+            tcg_out32(s, LWZX | TAB(datahi, rbase, addrlo));
+            tcg_out32(s, LWZX | TAB(datalo, rbase, TCG_REG_R0));
+        } else if (addrlo == datahi) {
+            tcg_out32(s, LWZ | RT(datalo) | RA(addrlo) | 4);
+            tcg_out32(s, LWZ | RT(datahi) | RA(addrlo));
+        } else {
+            tcg_out32(s, LWZ | RT(datahi) | RA(addrlo));
+            tcg_out32(s, LWZ | RT(datalo) | RA(addrlo) | 4);
         }
         break;
     }
 
 #ifdef CONFIG_SOFTMMU
-    add_qemu_ldst_label (s,
-                         1,
-                         opc,
-                         data_reg,
-                         data_reg2,
-                         addr_reg,
-                         addr_reg2,
-                         mem_index,
-                         s->code_ptr,
-                         label_ptr);
+    add_qemu_ldst_label(s, 1, opc, datalo, datahi, addrlo,
+                        addrhi, mem_index, s->code_ptr, label_ptr);
 #endif
 }
 
 static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
 {
-    int addr_reg, r0, r1, data_reg, data_reg2, bswap, rbase;
+    TCGReg addrlo, datalo, datahi, rbase;
+    int bswap;
 #ifdef CONFIG_SOFTMMU
-    int mem_index, r2, addr_reg2;
+    int mem_index;
+    TCGReg addrhi;
     uint8_t *label_ptr;
 #endif
 
-    data_reg = *args++;
-    if (opc == 3)
-        data_reg2 = *args++;
-    else
-        data_reg2 = 0;
-    addr_reg = *args++;
+    datalo = *args++;
+    datahi = (opc == 3 ? *args++ : 0);
+    addrlo = *args++;
 
 #ifdef CONFIG_SOFTMMU
-#if TARGET_LONG_BITS == 64
-    addr_reg2 = *args++;
-#else
-    addr_reg2 = 0;
-#endif
+    addrhi = (TARGET_LONG_BITS == 64 ? *args++ : 0);
     mem_index = *args;
-    r0 = 3;
-    r1 = 4;
-    r2 = 0;
-    rbase = 0;
-
-    tcg_out_tlb_check (
-        s, r0, r1, r2, addr_reg, addr_reg2, opc & 3,
-        offsetof (CPUArchState, tlb_table[mem_index][0].addr_write),
-        offsetof (CPUTLBEntry, addend) - offsetof (CPUTLBEntry, addr_write),
-        &label_ptr
-        );
+
+    tcg_out_tlb_check(s, TCG_REG_R3, TCG_REG_R4, TCG_REG_R0, addrlo,
+                      addrhi, opc & 3, mem_index, 0, &label_ptr);
+    rbase = TCG_REG_R3;
 #else /* !CONFIG_SOFTMMU */
-    r0 = addr_reg;
-    r1 = 3;
     rbase = GUEST_BASE ? TCG_GUEST_BASE_REG : 0;
 #endif
 
@@ -789,50 +758,33 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
 #endif
     switch (opc) {
     case 0:
-        tcg_out32 (s, STBX | SAB (data_reg, rbase, r0));
+        tcg_out32(s, STBX | SAB(datalo, rbase, addrlo));
         break;
     case 1:
-        if (bswap)
-            tcg_out32 (s, STHBRX | SAB (data_reg, rbase, r0));
-        else
-            tcg_out32 (s, STHX | SAB (data_reg, rbase, r0));
+        tcg_out32(s, (bswap ? STHBRX : STHX) | SAB(datalo, rbase, addrlo));
         break;
     case 2:
-        if (bswap)
-            tcg_out32 (s, STWBRX | SAB (data_reg, rbase, r0));
-        else
-            tcg_out32 (s, STWX | SAB (data_reg, rbase, r0));
+        tcg_out32(s, (bswap ? STWBRX : STWX) | SAB(datalo, rbase, addrlo));
         break;
     case 3:
         if (bswap) {
-            tcg_out32 (s, ADDI | RT (r1) | RA (r0) | 4);
-            tcg_out32 (s, STWBRX | SAB (data_reg, rbase, r0));
-            tcg_out32 (s, STWBRX | SAB (data_reg2, rbase, r1));
-        }
-        else {
-#ifdef CONFIG_USE_GUEST_BASE
-            tcg_out32 (s, STWX | SAB (data_reg2, rbase, r0));
-            tcg_out32 (s, ADDI | RT (r1) | RA (r0) | 4);
-            tcg_out32 (s, STWX | SAB (data_reg, rbase, r1));
-#else
-            tcg_out32 (s, STW | RS (data_reg2) | RA (r0));
-            tcg_out32 (s, STW | RS (data_reg) | RA (r0) | 4);
-#endif
+            tcg_out32(s, ADDI | RT(TCG_REG_R0) | RA(addrlo) | 4);
+            tcg_out32(s, STWBRX | SAB(datalo, rbase, addrlo));
+            tcg_out32(s, STWBRX | SAB(datahi, rbase, TCG_REG_R0));
+        } else if (rbase != 0) {
+            tcg_out32(s, ADDI | RT(TCG_REG_R0) | RA(addrlo) | 4);
+            tcg_out32(s, STWX | SAB(datahi, rbase, addrlo));
+            tcg_out32(s, STWX | SAB(datalo, rbase, TCG_REG_R0));
+        } else {
+            tcg_out32(s, STW | RS(datahi) | RA(addrlo));
+            tcg_out32(s, STW | RS(datalo) | RA(addrlo) | 4);
         }
         break;
     }
 
 #ifdef CONFIG_SOFTMMU
-    add_qemu_ldst_label (s,
-                         0,
-                         opc,
-                         data_reg,
-                         data_reg2,
-                         addr_reg,
-                         addr_reg2,
-                         mem_index,
-                         s->code_ptr,
-                         label_ptr);
+    add_qemu_ldst_label(s, 0, opc, datalo, datahi, addrlo, addrhi,
+                        mem_index, s->code_ptr, label_ptr);
 #endif
 }
 
@@ -1970,7 +1922,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_qemu_ld16u, { "r", "L" } },
     { INDEX_op_qemu_ld16s, { "r", "L" } },
     { INDEX_op_qemu_ld32, { "r", "L" } },
-    { INDEX_op_qemu_ld64, { "r", "r", "L" } },
+    { INDEX_op_qemu_ld64, { "L", "L", "L" } },
 
     { INDEX_op_qemu_st8, { "K", "K" } },
     { INDEX_op_qemu_st16, { "K", "K" } },
@@ -1982,7 +1934,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
     { INDEX_op_qemu_ld16u, { "r", "L", "L" } },
     { INDEX_op_qemu_ld16s, { "r", "L", "L" } },
     { INDEX_op_qemu_ld32, { "r", "L", "L" } },
-    { INDEX_op_qemu_ld64, { "r", "L", "L", "L" } },
+    { INDEX_op_qemu_ld64, { "L", "L", "L", "L" } },
 
     { INDEX_op_qemu_st8, { "K", "K", "K" } },
     { INDEX_op_qemu_st16, { "K", "K", "K" } },
-- 
1.8.3.1
Thread overview: 9+ messages
2013-09-01 16:07 [Qemu-devel] [PATCH 0/4] tcg-ppc ldst improvements Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 1/4] configure: Allow command-line configure for ppc32 Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 2/4] tcg-ppc: Avoid code for nop move Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu Richard Henderson
2013-09-07 9:46   ` Paolo Bonzini
2013-09-09 17:42     ` Richard Henderson
2013-09-09 17:49       ` Paolo Bonzini
2013-09-09 18:20         ` Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 4/4] tcg-ppc: Fix and cleanup tcg_out_tlb_check Richard Henderson