* [Qemu-devel] [PATCH 0/4] tcg-ppc ldst improvements
@ 2013-09-01 16:07 Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 1/4] configure: Allow command-line configure for ppc32 Richard Henderson
` (3 more replies)
0 siblings, 4 replies; 9+ messages in thread
From: Richard Henderson @ 2013-09-01 16:07 UTC
To: qemu-devel; +Cc: aurelien
The first patch allows me to build the ppc32 target on a ppc64 host.
The rest of them update the ppc32 backend on top of the v2 "Further
tcg ldst improvements" patch set.
r~
Richard Henderson (4):
configure: Allow command-line configure for ppc32
tcg-ppc: Avoid code for nop move
tcg-ppc: Convert to helper_ret_ld/st_mmu
tcg-ppc: Fix and cleanup tcg_out_tlb_check
configure | 8 +
include/exec/exec-all.h | 4 +-
tcg/ppc/tcg-target.c | 530 ++++++++++++++++++++----------------------------
3 files changed, 226 insertions(+), 316 deletions(-)
--
1.8.3.1
* [Qemu-devel] [PATCH 1/4] configure: Allow command-line configure for ppc32
2013-09-01 16:07 [Qemu-devel] [PATCH 0/4] tcg-ppc ldst improvements Richard Henderson
@ 2013-09-01 16:07 ` Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 2/4] tcg-ppc: Avoid code for nop move Richard Henderson
` (2 subsequent siblings)
3 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2013-09-01 16:07 UTC
To: qemu-devel; +Cc: aurelien
This mirrors manually selecting i386 on an x86_64 host: choosing the ppc
cpu on a ppc64 host adds -m32 to the compiler and linker flags.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
configure | 8 ++++++++
1 file changed, 8 insertions(+)
diff --git a/configure b/configure
index 0a55c20..07eaa3c 100755
--- a/configure
+++ b/configure
@@ -951,6 +951,14 @@ for opt do
done
case "$cpu" in
+ ppc)
+ CPU_CFLAGS="-m32"
+ LDFLAGS="-m32 $LDFLAGS"
+ ;;
+ ppc64)
+ CPU_CFLAGS="-m64"
+ LDFLAGS="-m64 $LDFLAGS"
+ ;;
sparc)
LDFLAGS="-m32 $LDFLAGS"
CPU_CFLAGS="-m32 -mcpu=ultrasparc"
--
1.8.3.1
* [Qemu-devel] [PATCH 2/4] tcg-ppc: Avoid code for nop move
2013-09-01 16:07 [Qemu-devel] [PATCH 0/4] tcg-ppc ldst improvements Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 1/4] configure: Allow command-line configure for ppc32 Richard Henderson
@ 2013-09-01 16:07 ` Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 4/4] tcg-ppc: Fix and cleanup tcg_out_tlb_check Richard Henderson
3 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2013-09-01 16:07 UTC
To: qemu-devel; +Cc: Vassili Karpov (malc), aurelien, Richard Henderson
While nop moves are rare in code that's been through the optimizer,
they're not uncommon within the tcg backend itself.
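The guard can be tried in isolation. What follows is a minimal standalone sketch, not the backend code: `Emitter`, `ppc_or` and `emit_mov` are hypothetical names used only for illustration. The encoding assumes the standard PowerPC `or rA,rS,rB` form (primary opcode 31, extended opcode 444), which is what `mr` assembles to.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for TCGContext: a tiny instruction buffer. */
typedef struct {
    uint32_t insn[8];
    size_t count;
} Emitter;

/* Encode "or ra, rs, rb" (primary opcode 31, extended opcode 444). */
static uint32_t ppc_or(unsigned ra, unsigned rs, unsigned rb)
{
    return (31u << 26) | (rs << 21) | (ra << 16) | (rb << 11) | (444u << 1);
}

/* Emit "mr ret, arg", but emit nothing when the move would be a no-op. */
static void emit_mov(Emitter *e, unsigned ret, unsigned arg)
{
    if (ret != arg && e->count < 8) {
        e->insn[e->count++] = ppc_or(ret, arg, arg);
    }
}
```

Calling `emit_mov(&e, 3, 3)` leaves the buffer untouched, while `emit_mov(&e, 3, 4)` appends the single word for `mr r3,r4`.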
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/ppc/tcg-target.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index 9a73d06..b0fbc54 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -450,7 +450,9 @@ static const uint32_t tcg_to_bc[] = {
static void tcg_out_mov(TCGContext *s, TCGType type, TCGReg ret, TCGReg arg)
{
- tcg_out32 (s, OR | SAB (arg, ret, arg));
+ if (ret != arg) {
+ tcg_out32(s, OR | SAB(arg, ret, arg));
+ }
}
static void tcg_out_movi(TCGContext *s, TCGType type,
--
1.8.3.1
* [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu
2013-09-01 16:07 [Qemu-devel] [PATCH 0/4] tcg-ppc ldst improvements Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 1/4] configure: Allow command-line configure for ppc32 Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 2/4] tcg-ppc: Avoid code for nop move Richard Henderson
@ 2013-09-01 16:07 ` Richard Henderson
2013-09-07 9:46 ` Paolo Bonzini
2013-09-01 16:07 ` [Qemu-devel] [PATCH 4/4] tcg-ppc: Fix and cleanup tcg_out_tlb_check Richard Henderson
3 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2013-09-01 16:07 UTC
To: qemu-devel; +Cc: Vassili Karpov (malc), aurelien, Richard Henderson
Drop the ld/st_trampolines, loading the return address into a
parameter register directly.
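To make the new calling sequence concrete, here is a small sketch of how the slow path assigns argument registers for a load helper, with the return address as the final argument. It models only the register numbering. `LdArgRegs` and `assign_ld_args` are hypothetical names, and the `align_args` flag stands in for `TCG_TARGET_CALL_ALIGN_ARGS`.

```c
#include <stdbool.h>

/* Hypothetical result type: which GPR carries each helper argument.
   addrhi is -1 when the guest address fits in one register. */
typedef struct {
    int env, addrhi, addrlo, mem_index, ra;
} LdArgRegs;

static LdArgRegs assign_ld_args(int target_long_bits, bool align_args)
{
    LdArgRegs a = { .addrhi = -1 };
    int ir = 3;                 /* first argument register, r3 */

    a.env = ir++;
    if (target_long_bits == 32) {
        a.addrlo = ir++;
    } else {
        if (align_args) {
            ir |= 1;            /* skip to an odd register for the pair */
        }
        a.addrhi = ir++;
        a.addrlo = ir++;
    }
    a.mem_index = ir++;
    a.ra = ir++;                /* filled from LR with mfspr */
    return a;
}
```

With a 32-bit guest address the arguments land in r3 through r6. With a 64-bit address and aligned pairs, r4 is skipped and the return address ends up in r8.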
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
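The 64-bit load slow path has to move the helper result (high word in r3, low word in r4) into datahi:datalo even when the destinations overlap r3/r4. The sketch below simulates the three-way case split on a plain register array and checks every distinct destination pair. `move_ld64_result` and `check_all_pairs` are hypothetical names; the move order is intended to mirror the patch, not replace it.

```c
#include <stdint.h>

enum { R0 = 0, R3 = 3, R4 = 4 };

static void mov(uint32_t *regs, int dst, int src)
{
    regs[dst] = regs[src];
}

/* Move a 64-bit helper result (high in r3, low in r4) to datahi:datalo. */
static void move_ld64_result(uint32_t *regs, int datalo, int datahi)
{
    if (datalo != R3) {
        /* Writing datalo first cannot destroy r3. */
        mov(regs, datalo, R4);
        mov(regs, datahi, R3);
    } else if (datahi != R4) {
        /* datalo is r3, so save r3 into datahi before overwriting it. */
        mov(regs, datahi, R3);
        mov(regs, datalo, R4);
    } else {
        /* Full swap of r3 and r4, using r0 as scratch. */
        mov(regs, R0, R4);
        mov(regs, datahi, R3);
        mov(regs, datalo, R0);
    }
}

/* Returns 1 if every distinct (datalo, datahi) pair gets the right halves. */
static int check_all_pairs(void)
{
    for (int lo = 0; lo < 32; lo++) {
        for (int hi = 0; hi < 32; hi++) {
            uint32_t regs[32] = {0};
            if (lo == hi) {
                continue;
            }
            regs[R3] = 0xAAAAAAAAu;   /* high word from the helper */
            regs[R4] = 0x55555555u;   /* low word from the helper */
            move_ld64_result(regs, lo, hi);
            if (regs[lo] != 0x55555555u || regs[hi] != 0xAAAAAAAAu) {
                return 0;
            }
        }
    }
    return 1;
}
```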
include/exec/exec-all.h | 4 +-
tcg/ppc/tcg-target.c | 220 +++++++++++++++++++-----------------------------
2 files changed, 86 insertions(+), 138 deletions(-)
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index beb4149..a81e805 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -324,9 +324,7 @@ extern uintptr_t tci_tb_ptr;
In some implementations, we pass the "logical" return address manually;
in others, we must infer the logical return from the true return. */
#if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
-# if defined (_ARCH_PPC) && !defined (_ARCH_PPC64)
-# define GETRA_LDST(RA) (*(int32_t *)((RA) - 4))
-# elif defined(__arm__)
+# if defined(__arm__)
/* We define two insns between the return address and the branch back to
straight-line. Find and decode that branch insn. */
# define GETRA_LDST(RA) tcg_getra_ldst(RA)
diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index b0fbc54..a890319 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -551,27 +551,26 @@ static void add_qemu_ldst_label (TCGContext *s,
label->label_ptr[0] = label_ptr;
}
-/* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
- int mmu_idx) */
+/* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
+ * int mmu_idx, uintptr_t ra)
+ */
static const void * const qemu_ld_helpers[4] = {
- helper_ldb_mmu,
- helper_ldw_mmu,
- helper_ldl_mmu,
- helper_ldq_mmu,
+ helper_ret_ldub_mmu,
+ helper_ret_lduw_mmu,
+ helper_ret_ldul_mmu,
+ helper_ret_ldq_mmu,
};
-/* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
- uintxx_t val, int mmu_idx) */
+/* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
+ * uintxx_t val, int mmu_idx, uintptr_t ra)
+ */
static const void * const qemu_st_helpers[4] = {
- helper_stb_mmu,
- helper_stw_mmu,
- helper_stl_mmu,
- helper_stq_mmu,
+ helper_ret_stb_mmu,
+ helper_ret_stw_mmu,
+ helper_ret_stl_mmu,
+ helper_ret_stq_mmu,
};
-static void *ld_trampolines[4];
-static void *st_trampolines[4];
-
static void tcg_out_tlb_check (TCGContext *s, int r0, int r1, int r2,
int addr_reg, int addr_reg2, int s_bits,
int offset1, int offset2, uint8_t **label_ptr)
@@ -608,9 +607,14 @@ static void tcg_out_tlb_check (TCGContext *s, int r0, int r1, int r2,
tcg_out32 (s, CMP | BF (6) | RA (addr_reg2) | RB (r1));
tcg_out32 (s, CRAND | BT (7, CR_EQ) | BA (6, CR_EQ) | BB (7, CR_EQ));
#endif
+
+ /* Use a conditional branch-and-link so that we load a pointer to
+ somewhere within the current opcode, for passing on to the helper.
+ This address cannot be used for a tail call, but it's shorter
+ than forming an address from scratch. */
*label_ptr = s->code_ptr;
retranst = ((uint16_t *) s->code_ptr)[1] & ~3;
- tcg_out32 (s, BC | BI (7, CR_EQ) | retranst | BO_COND_FALSE);
+ tcg_out32(s, BC | BI(7, CR_EQ) | retranst | BO_COND_FALSE | LK);
/* r0 now contains &env->tlb_table[mem_index][index].addr_x */
tcg_out32 (s, (LWZ
@@ -833,132 +837,99 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
}
#if defined(CONFIG_SOFTMMU)
-static void tcg_out_qemu_ld_slow_path (TCGContext *s, TCGLabelQemuLdst *label)
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
{
- int s_bits;
- int ir;
- int opc = label->opc;
- int mem_index = label->mem_index;
- int data_reg = label->datalo_reg;
- int data_reg2 = label->datahi_reg;
- int addr_reg = label->addrlo_reg;
- uint8_t *raddr = label->raddr;
- uint8_t **label_ptr = &label->label_ptr[0];
+ TCGReg ir, datalo, datahi;
+ int opc = lb->opc;
- s_bits = opc & 3;
+ reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
- /* resolve label address */
- reloc_pc14 (label_ptr[0], (tcg_target_long) s->code_ptr);
+ ir = TCG_REG_R3;
+ tcg_out_mov(s, TCG_TYPE_PTR, ir++, TCG_AREG0);
- /* slow path */
- ir = 4;
-#if TARGET_LONG_BITS == 32
- tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
-#else
+ if (TARGET_LONG_BITS == 32) {
+ tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
+ } else {
#ifdef TCG_TARGET_CALL_ALIGN_ARGS
- ir |= 1;
-#endif
- tcg_out_mov (s, TCG_TYPE_I32, ir++, label->addrhi_reg);
- tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
+ ir |= 1;
#endif
- tcg_out_movi (s, TCG_TYPE_I32, ir, mem_index);
- tcg_out_call (s, (tcg_target_long) ld_trampolines[s_bits], 1);
- tcg_out32 (s, (tcg_target_long) raddr);
+ tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrhi_reg);
+ tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
+ }
+
+ tcg_out_movi(s, TCG_TYPE_I32, ir++, lb->mem_index);
+ tcg_out32(s, MFSPR | RT(ir++) | LR);
+
+ tcg_out_call(s, (uintptr_t)qemu_ld_helpers[opc & 3], 1);
+
+ datalo = lb->datalo_reg;
switch (opc) {
case 0|4:
- tcg_out32 (s, EXTSB | RA (data_reg) | RS (3));
+ tcg_out32(s, EXTSB | RA(datalo) | RS(TCG_REG_R3));
break;
case 1|4:
- tcg_out32 (s, EXTSH | RA (data_reg) | RS (3));
+ tcg_out32(s, EXTSH | RA(datalo) | RS(TCG_REG_R3));
break;
- case 0:
- case 1:
- case 2:
- if (data_reg != 3)
- tcg_out_mov (s, TCG_TYPE_I32, data_reg, 3);
+
+ default:
+ tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R3);
break;
+
case 3:
- if (data_reg == 3) {
- if (data_reg2 == 4) {
- tcg_out_mov (s, TCG_TYPE_I32, 0, 4);
- tcg_out_mov (s, TCG_TYPE_I32, 4, 3);
- tcg_out_mov (s, TCG_TYPE_I32, 3, 0);
- }
- else {
- tcg_out_mov (s, TCG_TYPE_I32, data_reg2, 3);
- tcg_out_mov (s, TCG_TYPE_I32, 3, 4);
- }
- }
- else {
- if (data_reg != 4) tcg_out_mov (s, TCG_TYPE_I32, data_reg, 4);
- if (data_reg2 != 3) tcg_out_mov (s, TCG_TYPE_I32, data_reg2, 3);
+ datahi = lb->datahi_reg;
+ if (datalo != TCG_REG_R3) {
+ tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R4);
+ tcg_out_mov(s, TCG_TYPE_I32, datahi, TCG_REG_R3);
+ } else if (datahi != TCG_REG_R4) {
+ tcg_out_mov(s, TCG_TYPE_I32, datahi, TCG_REG_R3);
+ tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R4);
+ } else {
+ tcg_out_mov(s, TCG_TYPE_I32, TCG_REG_R0, TCG_REG_R4);
+ tcg_out_mov(s, TCG_TYPE_I32, datahi, TCG_REG_R3);
+ tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R0);
}
break;
}
+
/* Jump to the code corresponding to next IR of qemu_st */
- tcg_out_b (s, 0, (tcg_target_long) raddr);
+ tcg_out_b(s, 0, (uintptr_t)lb->raddr);
}
-static void tcg_out_qemu_st_slow_path (TCGContext *s, TCGLabelQemuLdst *label)
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
{
- int ir;
- int opc = label->opc;
- int mem_index = label->mem_index;
- int data_reg = label->datalo_reg;
- int data_reg2 = label->datahi_reg;
- int addr_reg = label->addrlo_reg;
- uint8_t *raddr = label->raddr;
- uint8_t **label_ptr = &label->label_ptr[0];
-
- /* resolve label address */
- reloc_pc14 (label_ptr[0], (tcg_target_long) s->code_ptr);
-
- /* slow path */
- ir = 4;
-#if TARGET_LONG_BITS == 32
- tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
-#else
+ TCGReg ir;
+
+ reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
+
+ ir = TCG_REG_R3;
+ tcg_out_mov(s, TCG_TYPE_PTR, ir++, TCG_AREG0);
+
+ if (TARGET_LONG_BITS == 32) {
+ tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
+ } else {
#ifdef TCG_TARGET_CALL_ALIGN_ARGS
- ir |= 1;
-#endif
- tcg_out_mov (s, TCG_TYPE_I32, ir++, label->addrhi_reg);
- tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
+ ir |= 1;
#endif
+ tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrhi_reg);
+ tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
+ }
- switch (opc) {
- case 0:
- tcg_out32 (s, (RLWINM
- | RA (ir)
- | RS (data_reg)
- | SH (0)
- | MB (24)
- | ME (31)));
- break;
- case 1:
- tcg_out32 (s, (RLWINM
- | RA (ir)
- | RS (data_reg)
- | SH (0)
- | MB (16)
- | ME (31)));
- break;
- case 2:
- tcg_out_mov (s, TCG_TYPE_I32, ir, data_reg);
- break;
- case 3:
+ if (lb->opc != 3) {
+ tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->datalo_reg);
+ } else {
#ifdef TCG_TARGET_CALL_ALIGN_ARGS
ir |= 1;
#endif
- tcg_out_mov (s, TCG_TYPE_I32, ir++, data_reg2);
- tcg_out_mov (s, TCG_TYPE_I32, ir, data_reg);
- break;
+ tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->datahi_reg);
+ tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->datalo_reg);
}
- ir++;
- tcg_out_movi (s, TCG_TYPE_I32, ir, mem_index);
- tcg_out_call (s, (tcg_target_long) st_trampolines[opc], 1);
- tcg_out32 (s, (tcg_target_long) raddr);
- tcg_out_b (s, 0, (tcg_target_long) raddr);
+ tcg_out_movi(s, TCG_TYPE_I32, ir++, lb->mem_index);
+ tcg_out32(s, MFSPR | RT(ir++) | LR);
+
+ tcg_out_call(s, (uintptr_t)qemu_st_helpers[lb->opc], 1);
+
+ tcg_out_b(s, 0, (uintptr_t)lb->raddr);
}
void tcg_out_tb_finalize(TCGContext *s)
@@ -979,17 +950,6 @@ void tcg_out_tb_finalize(TCGContext *s)
}
#endif
-#ifdef CONFIG_SOFTMMU
-static void emit_ldst_trampoline (TCGContext *s, const void *ptr)
-{
- tcg_out32 (s, MFSPR | RT (3) | LR);
- tcg_out32 (s, ADDI | RT (3) | RA (3) | 4);
- tcg_out32 (s, MTSPR | RS (3) | LR);
- tcg_out_mov (s, TCG_TYPE_I32, 3, TCG_AREG0);
- tcg_out_b (s, 0, (tcg_target_long) ptr);
-}
-#endif
-
static void tcg_target_qemu_prologue (TCGContext *s)
{
int i, frame_size;
@@ -1050,16 +1010,6 @@ static void tcg_target_qemu_prologue (TCGContext *s)
tcg_out32 (s, MTSPR | RS (0) | LR);
tcg_out32 (s, ADDI | RT (1) | RA (1) | frame_size);
tcg_out32 (s, BCLR | BO_ALWAYS);
-
-#ifdef CONFIG_SOFTMMU
- for (i = 0; i < 4; ++i) {
- ld_trampolines[i] = s->code_ptr;
- emit_ldst_trampoline (s, qemu_ld_helpers[i]);
-
- st_trampolines[i] = s->code_ptr;
- emit_ldst_trampoline (s, qemu_st_helpers[i]);
- }
-#endif
}
static void tcg_out_ld (TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
--
1.8.3.1
* [Qemu-devel] [PATCH 4/4] tcg-ppc: Fix and cleanup tcg_out_tlb_check
2013-09-01 16:07 [Qemu-devel] [PATCH 0/4] tcg-ppc ldst improvements Richard Henderson
` (2 preceding siblings ...)
2013-09-01 16:07 ` [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu Richard Henderson
@ 2013-09-01 16:07 ` Richard Henderson
3 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2013-09-01 16:07 UTC
To: qemu-devel; +Cc: Vassili Karpov (malc), aurelien, Richard Henderson
The fix: sparc has so many mmu modes that the tlb offset for the last one
overflowed the 16-bit signed displacement we assumed it would fit. Handle
this, and check the new assumption at compile time.
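The arithmetic behind the large-offset handling can be checked with a short sketch. It assumes the base register is adjusted with `addi`, whose immediate is a sign-extended 16-bit field, so the bias must itself be below 0x8000. The value `BIAS = 0x7ff0` and the helper names are assumptions of this sketch, chosen so the bias survives sign extension.

```c
#include <stdint.h>

#define BIAS 0x7ff0   /* assumed bias; must fit a signed 16-bit immediate */

/* Sign-extend the low 16 bits, as the CPU does for a D-form field.
   Relies on the usual two's-complement narrowing conversion. */
static int32_t sext16(uint32_t field)
{
    return (int32_t)(int16_t)(uint16_t)(field & 0xffff);
}

/* Split OFF into a base adjustment and a 16-bit displacement field. */
static uint32_t split_offset(int32_t off, int32_t *base_adjust)
{
    *base_adjust = 0;
    if (off >= 0x8000) {
        *base_adjust = BIAS;
        off -= BIAS;
    }
    return (uint32_t)off & 0xffff;
}

/* Recompute the offset the hardware would actually apply. */
static int32_t effective_offset(int32_t off)
{
    int32_t adj;
    uint32_t field = split_offset(off, &adj);
    return sext16((uint32_t)adj) + sext16(field);
}
```

Offsets up to `BIAS + 0x7fff` round-trip unchanged. By contrast `sext16(0x8000)` is -32768, which is why an immediate of 0x8000 would move the base the wrong way.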
Load the tlb addend earlier for the fast path.
Remove the explicit address + addend and make use of index addressing.
Adjust constraints for qemu_ld64 such that we don't clobber the address
register or tlb addend before loading both values.
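The clobbering hazard for 64-bit loads is easy to reproduce in a simulation. The sketch below covers only the direct-addressing case (no base register, no byte swap) and uses hypothetical helpers `lwz` and `load64` over a word array. When the high destination aliases the address register, the low word is loaded first.

```c
#include <stdint.h>

/* Simulated "lwz rt, disp(ra)" over word-addressed memory; the register
   holds a byte address and disp is a non-negative byte displacement. */
static void lwz(uint32_t *regs, const uint32_t *mem, int rt, int ra, int disp)
{
    regs[rt] = mem[(regs[ra] + (uint32_t)disp) / 4];
}

/* Big-endian 64-bit load: high word at offset 0, low word at offset 4. */
static void load64(uint32_t *regs, const uint32_t *mem,
                   int datalo, int datahi, int addrlo)
{
    if (addrlo == datahi) {
        /* Loading datahi first would destroy the address. */
        lwz(regs, mem, datalo, addrlo, 4);
        lwz(regs, mem, datahi, addrlo, 0);
    } else {
        /* Safe even if datalo aliases the address: it is written last. */
        lwz(regs, mem, datahi, addrlo, 0);
        lwz(regs, mem, datalo, addrlo, 4);
    }
}
```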
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
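For readers unfamiliar with `rlwinm`, the two uses in the tlb check can be modelled in plain C. The constants below (`PAGE_BITS = 12`, `TLB_BITS = 8`, `TLB_ENTRY_BITS = 4`) are assumed values for illustration only; the real ones depend on the target configuration. The helper names are likewise hypothetical.

```c
#include <stdint.h>

/* Assumed values for illustration only. */
enum { PAGE_BITS = 12, TLB_BITS = 8, TLB_ENTRY_BITS = 4 };

static uint32_t rotl32(uint32_t x, unsigned n)
{
    n &= 31;
    return n ? (x << n) | (x >> (32 - n)) : x;
}

/* Bits mb..me set, IBM numbering (bit 0 is the MSB); wraps if mb > me. */
static uint32_t ppc_mask(unsigned mb, unsigned me)
{
    uint32_t from_mb = 0xffffffffu >> mb;
    uint32_t to_me = 0xffffffffu << (31 - me);
    return mb <= me ? (from_mb & to_me) : (from_mb | to_me);
}

/* rlwinm: rotate left by sh, then AND with the mb..me mask. */
static uint32_t rlwinm(uint32_t rs, unsigned sh, unsigned mb, unsigned me)
{
    return rotl32(rs, sh) & ppc_mask(mb, me);
}

/* Page index, already scaled by the tlb entry size. */
static uint32_t tlb_index_scaled(uint32_t addr)
{
    return rlwinm(addr, 32 - (PAGE_BITS - TLB_ENTRY_BITS),
                  32 - (TLB_BITS + TLB_ENTRY_BITS), 31 - TLB_ENTRY_BITS);
}

/* Page number plus the low alignment bits for an access of 1 << s_bits. */
static uint32_t tlb_compare_value(uint32_t addr, unsigned s_bits)
{
    return rlwinm(addr, 0, (32 - s_bits) & 31, 31 - PAGE_BITS);
}
```

Because the alignment bits are kept in the compare value, a misaligned access never matches the page-aligned tlb comparator and so falls through to the slow path.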
tcg/ppc/tcg-target.c | 302 ++++++++++++++++++++++-----------------------------
1 file changed, 127 insertions(+), 175 deletions(-)
diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
index a890319..e190422 100644
--- a/tcg/ppc/tcg-target.c
+++ b/tcg/ppc/tcg-target.c
@@ -571,42 +571,72 @@ static const void * const qemu_st_helpers[4] = {
helper_ret_stq_mmu,
};
-static void tcg_out_tlb_check (TCGContext *s, int r0, int r1, int r2,
- int addr_reg, int addr_reg2, int s_bits,
- int offset1, int offset2, uint8_t **label_ptr)
+/* Perform the TLB load and compare. Branches to the slow path, placing the
+ address of the branch in *LABEL_PTR. Loads the addend of the TLB into R0.
+ Clobbers R1 and R2. */
+
+static void tcg_out_tlb_check(TCGContext *s, TCGReg r0, TCGReg r1, TCGReg r2,
+ TCGReg addrlo, TCGReg addrhi, int s_bits,
+ int mem_index, int is_load, uint8_t **label_ptr)
{
+ int cmp_off =
+ (is_load
+ ? offsetof(CPUArchState, tlb_table[mem_index][0].addr_read)
+ : offsetof(CPUArchState, tlb_table[mem_index][0].addr_write));
+ int add_off = offsetof(CPUArchState, tlb_table[mem_index][0].addend);
uint16_t retranst;
+ TCGReg base = TCG_AREG0;
+
+ /* Extract the page index, shifted into place for tlb index. */
+ tcg_out32(s, (RLWINM
+ | RA(r0)
+ | RS(addrlo)
+ | SH(32 - (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS))
+ | MB(32 - (CPU_TLB_BITS + CPU_TLB_ENTRY_BITS))
+ | ME(31 - CPU_TLB_ENTRY_BITS)));
+
+ /* Compensate for very large offsets. */
+ if (add_off >= 0x8000) {
+ /* Most target env are smaller than 32k; none are larger than 64k.
+ Offset by 0x7ff0, which still fits ADDI's sign-extended immediate,
+ giving us a range just shy of 64k. Check this assumption. */
+ QEMU_BUILD_BUG_ON(offsetof(CPUArchState,
+ tlb_table[NB_MMU_MODES - 1][1])
+ > 0x7ff0 + 0x7fff);
+ tcg_out32(s, ADDI | RT(r1) | RA(base) | 0x7ff0);
+ base = r1;
+ cmp_off -= 0x7ff0;
+ add_off -= 0x7ff0;
+ }
- tcg_out32 (s, (RLWINM
- | RA (r0)
- | RS (addr_reg)
- | SH (32 - (TARGET_PAGE_BITS - CPU_TLB_ENTRY_BITS))
- | MB (32 - (CPU_TLB_BITS + CPU_TLB_ENTRY_BITS))
- | ME (31 - CPU_TLB_ENTRY_BITS)
- )
- );
- tcg_out32 (s, ADD | RT (r0) | RA (r0) | RB (TCG_AREG0));
- tcg_out32 (s, (LWZU
- | RT (r1)
- | RA (r0)
- | offset1
- )
- );
- tcg_out32 (s, (RLWINM
- | RA (r2)
- | RS (addr_reg)
- | SH (0)
- | MB ((32 - s_bits) & 31)
- | ME (31 - TARGET_PAGE_BITS)
- )
- );
+ /* Clear the non-page, non-alignment bits from the address. */
+ tcg_out32(s, (RLWINM
+ | RA(r2)
+ | RS(addrlo)
+ | SH(0)
+ | MB((32 - s_bits) & 31)
+ | ME(31 - TARGET_PAGE_BITS)));
- tcg_out32 (s, CMP | BF (7) | RA (r2) | RB (r1));
-#if TARGET_LONG_BITS == 64
- tcg_out32 (s, LWZ | RT (r1) | RA (r0) | 4);
- tcg_out32 (s, CMP | BF (6) | RA (addr_reg2) | RB (r1));
- tcg_out32 (s, CRAND | BT (7, CR_EQ) | BA (6, CR_EQ) | BB (7, CR_EQ));
-#endif
+ tcg_out32(s, ADD | RT(r0) | RA(r0) | RB(base));
+ base = r0;
+
+ /* Load the tlb comparator. */
+ tcg_out32(s, LWZ | RT(r1) | RA(base) | (cmp_off & 0xffff));
+
+ tcg_out32(s, CMP | BF(7) | RA(r2) | RB(r1));
+
+ if (TARGET_LONG_BITS == 64) {
+ tcg_out32(s, LWZ | RT(r1) | RA(base) | ((cmp_off + 4) & 0xffff));
+ }
+
+ /* Load the tlb addend for use on the fast path.
+ Do this asap to minimize load delay. */
+ tcg_out32(s, LWZ | RT(r0) | RA(base) | (add_off & 0xffff));
+
+ if (TARGET_LONG_BITS == 64) {
+ tcg_out32(s, CMP | BF(6) | RA(addrhi) | RB(r1));
+ tcg_out32(s, CRAND | BT(7, CR_EQ) | BA(6, CR_EQ) | BB(7, CR_EQ));
+ }
/* Use a conditional branch-and-link so that we load a pointer to
somewhere within the current opcode, for passing on to the helper.
@@ -615,58 +645,31 @@ static void tcg_out_tlb_check (TCGContext *s, int r0, int r1, int r2,
*label_ptr = s->code_ptr;
retranst = ((uint16_t *) s->code_ptr)[1] & ~3;
tcg_out32(s, BC | BI(7, CR_EQ) | retranst | BO_COND_FALSE | LK);
-
- /* r0 now contains &env->tlb_table[mem_index][index].addr_x */
- tcg_out32 (s, (LWZ
- | RT (r0)
- | RA (r0)
- | offset2
- )
- );
- /* r0 = env->tlb_table[mem_index][index].addend */
- tcg_out32 (s, ADD | RT (r0) | RA (r0) | RB (addr_reg));
- /* r0 = env->tlb_table[mem_index][index].addend + addr */
-
}
#endif
static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
{
- int addr_reg, data_reg, data_reg2, r0, r1, rbase, bswap;
+ TCGReg addrlo, datalo, datahi, rbase;
+ int bswap;
#ifdef CONFIG_SOFTMMU
- int mem_index, s_bits, r2, addr_reg2;
+ int mem_index;
+ TCGReg addrhi;
uint8_t *label_ptr;
#endif
- data_reg = *args++;
- if (opc == 3)
- data_reg2 = *args++;
- else
- data_reg2 = 0;
- addr_reg = *args++;
+ datalo = *args++;
+ datahi = (opc == 3 ? *args++ : 0);
+ addrlo = *args++;
#ifdef CONFIG_SOFTMMU
-#if TARGET_LONG_BITS == 64
- addr_reg2 = *args++;
-#else
- addr_reg2 = 0;
-#endif
+ addrhi = (TARGET_LONG_BITS == 64 ? *args++ : 0);
mem_index = *args;
- s_bits = opc & 3;
- r0 = 3;
- r1 = 4;
- r2 = 0;
- rbase = 0;
-
- tcg_out_tlb_check (
- s, r0, r1, r2, addr_reg, addr_reg2, s_bits,
- offsetof (CPUArchState, tlb_table[mem_index][0].addr_read),
- offsetof (CPUTLBEntry, addend) - offsetof (CPUTLBEntry, addr_read),
- &label_ptr
- );
+
+ tcg_out_tlb_check(s, TCG_REG_R3, TCG_REG_R4, TCG_REG_R0, addrlo,
+ addrhi, opc & 3, mem_index, 1, &label_ptr);
+ rbase = TCG_REG_R3;
#else /* !CONFIG_SOFTMMU */
- r0 = addr_reg;
- r1 = 3;
rbase = GUEST_BASE ? TCG_GUEST_BASE_REG : 0;
#endif
@@ -679,106 +682,72 @@ static void tcg_out_qemu_ld (TCGContext *s, const TCGArg *args, int opc)
switch (opc) {
default:
case 0:
- tcg_out32 (s, LBZX | TAB (data_reg, rbase, r0));
+ tcg_out32(s, LBZX | TAB(datalo, rbase, addrlo));
break;
case 0|4:
- tcg_out32 (s, LBZX | TAB (data_reg, rbase, r0));
- tcg_out32 (s, EXTSB | RA (data_reg) | RS (data_reg));
+ tcg_out32(s, LBZX | TAB(datalo, rbase, addrlo));
+ tcg_out32(s, EXTSB | RA(datalo) | RS(datalo));
break;
case 1:
- if (bswap)
- tcg_out32 (s, LHBRX | TAB (data_reg, rbase, r0));
- else
- tcg_out32 (s, LHZX | TAB (data_reg, rbase, r0));
+ tcg_out32(s, (bswap ? LHBRX : LHZX) | TAB(datalo, rbase, addrlo));
break;
case 1|4:
if (bswap) {
- tcg_out32 (s, LHBRX | TAB (data_reg, rbase, r0));
- tcg_out32 (s, EXTSH | RA (data_reg) | RS (data_reg));
+ tcg_out32(s, LHBRX | TAB(datalo, rbase, addrlo));
+ tcg_out32(s, EXTSH | RA(datalo) | RS(datalo));
+ } else {
+ tcg_out32(s, LHAX | TAB(datalo, rbase, addrlo));
}
- else tcg_out32 (s, LHAX | TAB (data_reg, rbase, r0));
break;
case 2:
- if (bswap)
- tcg_out32 (s, LWBRX | TAB (data_reg, rbase, r0));
- else
- tcg_out32 (s, LWZX | TAB (data_reg, rbase, r0));
+ tcg_out32(s, (bswap ? LWBRX : LWZX) | TAB(datalo, rbase, addrlo));
break;
case 3:
if (bswap) {
- tcg_out32 (s, ADDI | RT (r1) | RA (r0) | 4);
- tcg_out32 (s, LWBRX | TAB (data_reg, rbase, r0));
- tcg_out32 (s, LWBRX | TAB (data_reg2, rbase, r1));
- }
- else {
-#ifdef CONFIG_USE_GUEST_BASE
- tcg_out32 (s, ADDI | RT (r1) | RA (r0) | 4);
- tcg_out32 (s, LWZX | TAB (data_reg2, rbase, r0));
- tcg_out32 (s, LWZX | TAB (data_reg, rbase, r1));
-#else
- if (r0 == data_reg2) {
- tcg_out32 (s, LWZ | RT (0) | RA (r0));
- tcg_out32 (s, LWZ | RT (data_reg) | RA (r0) | 4);
- tcg_out_mov (s, TCG_TYPE_I32, data_reg2, 0);
- }
- else {
- tcg_out32 (s, LWZ | RT (data_reg2) | RA (r0));
- tcg_out32 (s, LWZ | RT (data_reg) | RA (r0) | 4);
- }
-#endif
+ tcg_out32(s, ADDI | RT(TCG_REG_R0) | RA(addrlo) | 4);
+ tcg_out32(s, LWBRX | TAB(datalo, rbase, addrlo));
+ tcg_out32(s, LWBRX | TAB(datahi, rbase, TCG_REG_R0));
+ } else if (rbase != 0) {
+ tcg_out32(s, ADDI | RT(TCG_REG_R0) | RA(addrlo) | 4);
+ tcg_out32(s, LWZX | TAB(datahi, rbase, addrlo));
+ tcg_out32(s, LWZX | TAB(datalo, rbase, TCG_REG_R0));
+ } else if (addrlo == datahi) {
+ tcg_out32(s, LWZ | RT(datalo) | RA(addrlo) | 4);
+ tcg_out32(s, LWZ | RT(datahi) | RA(addrlo));
+ } else {
+ tcg_out32(s, LWZ | RT(datahi) | RA(addrlo));
+ tcg_out32(s, LWZ | RT(datalo) | RA(addrlo) | 4);
}
break;
}
#ifdef CONFIG_SOFTMMU
- add_qemu_ldst_label (s,
- 1,
- opc,
- data_reg,
- data_reg2,
- addr_reg,
- addr_reg2,
- mem_index,
- s->code_ptr,
- label_ptr);
+ add_qemu_ldst_label(s, 1, opc, datalo, datahi, addrlo,
+ addrhi, mem_index, s->code_ptr, label_ptr);
#endif
}
static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
{
- int addr_reg, r0, r1, data_reg, data_reg2, bswap, rbase;
+ TCGReg addrlo, datalo, datahi, rbase;
+ int bswap;
#ifdef CONFIG_SOFTMMU
- int mem_index, r2, addr_reg2;
+ int mem_index;
+ TCGReg addrhi;
uint8_t *label_ptr;
#endif
- data_reg = *args++;
- if (opc == 3)
- data_reg2 = *args++;
- else
- data_reg2 = 0;
- addr_reg = *args++;
+ datalo = *args++;
+ datahi = (opc == 3 ? *args++ : 0);
+ addrlo = *args++;
#ifdef CONFIG_SOFTMMU
-#if TARGET_LONG_BITS == 64
- addr_reg2 = *args++;
-#else
- addr_reg2 = 0;
-#endif
+ addrhi = (TARGET_LONG_BITS == 64 ? *args++ : 0);
mem_index = *args;
- r0 = 3;
- r1 = 4;
- r2 = 0;
- rbase = 0;
-
- tcg_out_tlb_check (
- s, r0, r1, r2, addr_reg, addr_reg2, opc & 3,
- offsetof (CPUArchState, tlb_table[mem_index][0].addr_write),
- offsetof (CPUTLBEntry, addend) - offsetof (CPUTLBEntry, addr_write),
- &label_ptr
- );
+
+ tcg_out_tlb_check(s, TCG_REG_R3, TCG_REG_R4, TCG_REG_R0, addrlo,
+ addrhi, opc & 3, mem_index, 0, &label_ptr);
+ rbase = TCG_REG_R3;
#else /* !CONFIG_SOFTMMU */
- r0 = addr_reg;
- r1 = 3;
rbase = GUEST_BASE ? TCG_GUEST_BASE_REG : 0;
#endif
@@ -789,50 +758,33 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
#endif
switch (opc) {
case 0:
- tcg_out32 (s, STBX | SAB (data_reg, rbase, r0));
+ tcg_out32(s, STBX | SAB(datalo, rbase, addrlo));
break;
case 1:
- if (bswap)
- tcg_out32 (s, STHBRX | SAB (data_reg, rbase, r0));
- else
- tcg_out32 (s, STHX | SAB (data_reg, rbase, r0));
+ tcg_out32(s, (bswap ? STHBRX : STHX) | SAB(datalo, rbase, addrlo));
break;
case 2:
- if (bswap)
- tcg_out32 (s, STWBRX | SAB (data_reg, rbase, r0));
- else
- tcg_out32 (s, STWX | SAB (data_reg, rbase, r0));
+ tcg_out32(s, (bswap ? STWBRX : STWX) | SAB(datalo, rbase, addrlo));
break;
case 3:
if (bswap) {
- tcg_out32 (s, ADDI | RT (r1) | RA (r0) | 4);
- tcg_out32 (s, STWBRX | SAB (data_reg, rbase, r0));
- tcg_out32 (s, STWBRX | SAB (data_reg2, rbase, r1));
- }
- else {
-#ifdef CONFIG_USE_GUEST_BASE
- tcg_out32 (s, STWX | SAB (data_reg2, rbase, r0));
- tcg_out32 (s, ADDI | RT (r1) | RA (r0) | 4);
- tcg_out32 (s, STWX | SAB (data_reg, rbase, r1));
-#else
- tcg_out32 (s, STW | RS (data_reg2) | RA (r0));
- tcg_out32 (s, STW | RS (data_reg) | RA (r0) | 4);
-#endif
+ tcg_out32(s, ADDI | RT(TCG_REG_R0) | RA(addrlo) | 4);
+ tcg_out32(s, STWBRX | SAB(datalo, rbase, addrlo));
+ tcg_out32(s, STWBRX | SAB(datahi, rbase, TCG_REG_R0));
+ } else if (rbase != 0) {
+ tcg_out32(s, ADDI | RT(TCG_REG_R0) | RA(addrlo) | 4);
+ tcg_out32(s, STWX | SAB(datahi, rbase, addrlo));
+ tcg_out32(s, STWX | SAB(datalo, rbase, TCG_REG_R0));
+ } else {
+ tcg_out32(s, STW | RS(datahi) | RA(addrlo));
+ tcg_out32(s, STW | RS(datalo) | RA(addrlo) | 4);
}
break;
}
#ifdef CONFIG_SOFTMMU
- add_qemu_ldst_label (s,
- 0,
- opc,
- data_reg,
- data_reg2,
- addr_reg,
- addr_reg2,
- mem_index,
- s->code_ptr,
- label_ptr);
+ add_qemu_ldst_label(s, 0, opc, datalo, datahi, addrlo, addrhi,
+ mem_index, s->code_ptr, label_ptr);
#endif
}
@@ -1970,7 +1922,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
{ INDEX_op_qemu_ld16u, { "r", "L" } },
{ INDEX_op_qemu_ld16s, { "r", "L" } },
{ INDEX_op_qemu_ld32, { "r", "L" } },
- { INDEX_op_qemu_ld64, { "r", "r", "L" } },
+ { INDEX_op_qemu_ld64, { "L", "L", "L" } },
{ INDEX_op_qemu_st8, { "K", "K" } },
{ INDEX_op_qemu_st16, { "K", "K" } },
@@ -1982,7 +1934,7 @@ static const TCGTargetOpDef ppc_op_defs[] = {
{ INDEX_op_qemu_ld16u, { "r", "L", "L" } },
{ INDEX_op_qemu_ld16s, { "r", "L", "L" } },
{ INDEX_op_qemu_ld32, { "r", "L", "L" } },
- { INDEX_op_qemu_ld64, { "r", "L", "L", "L" } },
+ { INDEX_op_qemu_ld64, { "L", "L", "L", "L" } },
{ INDEX_op_qemu_st8, { "K", "K", "K" } },
{ INDEX_op_qemu_st16, { "K", "K", "K" } },
--
1.8.3.1
* Re: [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu
2013-09-01 16:07 ` [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu Richard Henderson
@ 2013-09-07 9:46 ` Paolo Bonzini
2013-09-09 17:42 ` Richard Henderson
0 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2013-09-07 9:46 UTC
To: Richard Henderson; +Cc: Vassili Karpov (malc), qemu-devel, aurelien
On 09/01/2013 06:07 PM, Richard Henderson wrote:
> Drop the ld/st_trampolines, loading the return address into a
> parameter register directly.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> include/exec/exec-all.h | 4 +-
> tcg/ppc/tcg-target.c | 220 +++++++++++++++++++-----------------------------
> 2 files changed, 86 insertions(+), 138 deletions(-)
>
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index beb4149..a81e805 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -324,9 +324,7 @@ extern uintptr_t tci_tb_ptr;
> In some implementations, we pass the "logical" return address manually;
> in others, we must infer the logical return from the true return. */
> #if defined(CONFIG_QEMU_LDST_OPTIMIZATION) && defined(CONFIG_SOFTMMU)
> -# if defined (_ARCH_PPC) && !defined (_ARCH_PPC64)
> -# define GETRA_LDST(RA) (*(int32_t *)((RA) - 4))
> -# elif defined(__arm__)
> +# if defined(__arm__)
> /* We define two insns between the return address and the branch back to
> straight-line. Find and decode that branch insn. */
> # define GETRA_LDST(RA) tcg_getra_ldst(RA)
> diff --git a/tcg/ppc/tcg-target.c b/tcg/ppc/tcg-target.c
> index b0fbc54..a890319 100644
> --- a/tcg/ppc/tcg-target.c
> +++ b/tcg/ppc/tcg-target.c
> @@ -551,27 +551,26 @@ static void add_qemu_ldst_label (TCGContext *s,
> label->label_ptr[0] = label_ptr;
> }
>
> -/* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
> - int mmu_idx) */
> +/* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
> + * int mmu_idx, uintptr_t ra)
> + */
> static const void * const qemu_ld_helpers[4] = {
> - helper_ldb_mmu,
> - helper_ldw_mmu,
> - helper_ldl_mmu,
> - helper_ldq_mmu,
> + helper_ret_ldub_mmu,
> + helper_ret_lduw_mmu,
> + helper_ret_ldul_mmu,
> + helper_ret_ldq_mmu,
> };
>
> -/* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
> - uintxx_t val, int mmu_idx) */
> +/* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
> + * uintxx_t val, int mmu_idx, uintptr_t ra)
> + */
> static const void * const qemu_st_helpers[4] = {
> - helper_stb_mmu,
> - helper_stw_mmu,
> - helper_stl_mmu,
> - helper_stq_mmu,
> + helper_ret_stb_mmu,
> + helper_ret_stw_mmu,
> + helper_ret_stl_mmu,
> + helper_ret_stq_mmu,
> };
>
> -static void *ld_trampolines[4];
> -static void *st_trampolines[4];
> -
> static void tcg_out_tlb_check (TCGContext *s, int r0, int r1, int r2,
> int addr_reg, int addr_reg2, int s_bits,
> int offset1, int offset2, uint8_t **label_ptr)
> @@ -608,9 +607,14 @@ static void tcg_out_tlb_check (TCGContext *s, int r0, int r1, int r2,
> tcg_out32 (s, CMP | BF (6) | RA (addr_reg2) | RB (r1));
> tcg_out32 (s, CRAND | BT (7, CR_EQ) | BA (6, CR_EQ) | BB (7, CR_EQ));
> #endif
> +
> + /* Use a conditional branch-and-link so that we load a pointer to
> + somewhere within the current opcode, for passing on to the helper.
> + This address cannot be used for a tail call, but it's shorter
> + than forming an address from scratch. */
> *label_ptr = s->code_ptr;
> retranst = ((uint16_t *) s->code_ptr)[1] & ~3;
> - tcg_out32 (s, BC | BI (7, CR_EQ) | retranst | BO_COND_FALSE);
> + tcg_out32(s, BC | BI(7, CR_EQ) | retranst | BO_COND_FALSE | LK);
>
> /* r0 now contains &env->tlb_table[mem_index][index].addr_x */
> tcg_out32 (s, (LWZ
> @@ -833,132 +837,99 @@ static void tcg_out_qemu_st (TCGContext *s, const TCGArg *args, int opc)
> }
>
> #if defined(CONFIG_SOFTMMU)
> -static void tcg_out_qemu_ld_slow_path (TCGContext *s, TCGLabelQemuLdst *label)
> +static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
> {
> - int s_bits;
> - int ir;
> - int opc = label->opc;
> - int mem_index = label->mem_index;
> - int data_reg = label->datalo_reg;
> - int data_reg2 = label->datahi_reg;
> - int addr_reg = label->addrlo_reg;
> - uint8_t *raddr = label->raddr;
> - uint8_t **label_ptr = &label->label_ptr[0];
> + TCGReg ir, datalo, datahi;
> + int opc = lb->opc;
>
> - s_bits = opc & 3;
> + reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
>
> - /* resolve label address */
> - reloc_pc14 (label_ptr[0], (tcg_target_long) s->code_ptr);
> + ir = TCG_REG_R3;
> + tcg_out_mov(s, TCG_TYPE_PTR, ir++, TCG_AREG0);
>
> - /* slow path */
> - ir = 4;
> -#if TARGET_LONG_BITS == 32
> - tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
> -#else
> + if (TARGET_LONG_BITS == 32) {
> + tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
> + } else {
> #ifdef TCG_TARGET_CALL_ALIGN_ARGS
> - ir |= 1;
> -#endif
> - tcg_out_mov (s, TCG_TYPE_I32, ir++, label->addrhi_reg);
> - tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
> + ir |= 1;
> #endif
> - tcg_out_movi (s, TCG_TYPE_I32, ir, mem_index);
> - tcg_out_call (s, (tcg_target_long) ld_trampolines[s_bits], 1);
> - tcg_out32 (s, (tcg_target_long) raddr);
> + tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrhi_reg);
> + tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
> + }
> +
> + tcg_out_movi(s, TCG_TYPE_I32, ir++, lb->mem_index);
> + tcg_out32(s, MFSPR | RT(ir++) | LR);
> +
> + tcg_out_call(s, (uintptr_t)qemu_ld_helpers[opc & 3], 1);
> +
> + datalo = lb->datalo_reg;
> switch (opc) {
> case 0|4:
> - tcg_out32 (s, EXTSB | RA (data_reg) | RS (3));
> + tcg_out32(s, EXTSB | RA(datalo) | RS(TCG_REG_R3));
> break;
> case 1|4:
> - tcg_out32 (s, EXTSH | RA (data_reg) | RS (3));
> + tcg_out32(s, EXTSH | RA(datalo) | RS(TCG_REG_R3));
> break;
> - case 0:
> - case 1:
> - case 2:
> - if (data_reg != 3)
> - tcg_out_mov (s, TCG_TYPE_I32, data_reg, 3);
> +
> + default:
> + tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R3);
> break;
> +
> case 3:
> - if (data_reg == 3) {
> - if (data_reg2 == 4) {
> - tcg_out_mov (s, TCG_TYPE_I32, 0, 4);
> - tcg_out_mov (s, TCG_TYPE_I32, 4, 3);
> - tcg_out_mov (s, TCG_TYPE_I32, 3, 0);
> - }
> - else {
> - tcg_out_mov (s, TCG_TYPE_I32, data_reg2, 3);
> - tcg_out_mov (s, TCG_TYPE_I32, 3, 4);
> - }
> - }
> - else {
> - if (data_reg != 4) tcg_out_mov (s, TCG_TYPE_I32, data_reg, 4);
> - if (data_reg2 != 3) tcg_out_mov (s, TCG_TYPE_I32, data_reg2, 3);
> + datahi = lb->datahi_reg;
> + if (datalo != TCG_REG_R3) {
> + tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R4);
> + tcg_out_mov(s, TCG_TYPE_I32, datahi, TCG_REG_R3);
> + } else if (datahi != TCG_REG_R4) {
> + tcg_out_mov(s, TCG_TYPE_I32, datahi, TCG_REG_R3);
> + tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R4);
> + } else {
> + tcg_out_mov(s, TCG_TYPE_I32, TCG_REG_R0, TCG_REG_R4);
> + tcg_out_mov(s, TCG_TYPE_I32, datahi, TCG_REG_R3);
> + tcg_out_mov(s, TCG_TYPE_I32, datalo, TCG_REG_R0);
> }
> break;
> }
> +
> /* Jump to the code corresponding to next IR of qemu_st */
> - tcg_out_b (s, 0, (tcg_target_long) raddr);
> + tcg_out_b(s, 0, (uintptr_t)lb->raddr);
> }
>
> -static void tcg_out_qemu_st_slow_path (TCGContext *s, TCGLabelQemuLdst *label)
> +static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
> {
> - int ir;
> - int opc = label->opc;
> - int mem_index = label->mem_index;
> - int data_reg = label->datalo_reg;
> - int data_reg2 = label->datahi_reg;
> - int addr_reg = label->addrlo_reg;
> - uint8_t *raddr = label->raddr;
> - uint8_t **label_ptr = &label->label_ptr[0];
> -
> - /* resolve label address */
> - reloc_pc14 (label_ptr[0], (tcg_target_long) s->code_ptr);
> -
> - /* slow path */
> - ir = 4;
> -#if TARGET_LONG_BITS == 32
> - tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
> -#else
> + TCGReg ir;
> +
> + reloc_pc14(lb->label_ptr[0], (uintptr_t)s->code_ptr);
> +
> + ir = TCG_REG_R3;
> + tcg_out_mov(s, TCG_TYPE_PTR, ir++, TCG_AREG0);
> +
> + if (TARGET_LONG_BITS == 32) {
> + tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
> + } else {
> #ifdef TCG_TARGET_CALL_ALIGN_ARGS
> - ir |= 1;
> -#endif
> - tcg_out_mov (s, TCG_TYPE_I32, ir++, label->addrhi_reg);
> - tcg_out_mov (s, TCG_TYPE_I32, ir++, addr_reg);
> + ir |= 1;
> #endif
> + tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrhi_reg);
> + tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->addrlo_reg);
> + }
>
> - switch (opc) {
> - case 0:
> - tcg_out32 (s, (RLWINM
> - | RA (ir)
> - | RS (data_reg)
> - | SH (0)
> - | MB (24)
> - | ME (31)));
> - break;
> - case 1:
> - tcg_out32 (s, (RLWINM
> - | RA (ir)
> - | RS (data_reg)
> - | SH (0)
> - | MB (16)
> - | ME (31)));
> - break;
> - case 2:
> - tcg_out_mov (s, TCG_TYPE_I32, ir, data_reg);
> - break;
> - case 3:
> + if (lb->opc != 3) {
> + tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->datalo_reg);
> + } else {
> #ifdef TCG_TARGET_CALL_ALIGN_ARGS
> ir |= 1;
> #endif
> - tcg_out_mov (s, TCG_TYPE_I32, ir++, data_reg2);
> - tcg_out_mov (s, TCG_TYPE_I32, ir, data_reg);
> - break;
> + tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->datahi_reg);
> + tcg_out_mov(s, TCG_TYPE_I32, ir++, lb->datalo_reg);
> }
> - ir++;
>
> - tcg_out_movi (s, TCG_TYPE_I32, ir, mem_index);
> - tcg_out_call (s, (tcg_target_long) st_trampolines[opc], 1);
> - tcg_out32 (s, (tcg_target_long) raddr);
> - tcg_out_b (s, 0, (tcg_target_long) raddr);
> + tcg_out_movi(s, TCG_TYPE_I32, ir++, lb->mem_index);
> + tcg_out32(s, MFSPR | RT(ir++) | LR);
> +
> + tcg_out_call(s, (uintptr_t)qemu_st_helpers[lb->opc], 1);
> +
> + tcg_out_b(s, 0, (uintptr_t)lb->raddr);
> }
>
> void tcg_out_tb_finalize(TCGContext *s)
> @@ -979,17 +950,6 @@ void tcg_out_tb_finalize(TCGContext *s)
> }
> #endif
>
> -#ifdef CONFIG_SOFTMMU
> -static void emit_ldst_trampoline (TCGContext *s, const void *ptr)
> -{
> - tcg_out32 (s, MFSPR | RT (3) | LR);
> - tcg_out32 (s, ADDI | RT (3) | RA (3) | 4);
> - tcg_out32 (s, MTSPR | RS (3) | LR);
> - tcg_out_mov (s, TCG_TYPE_I32, 3, TCG_AREG0);
> - tcg_out_b (s, 0, (tcg_target_long) ptr);
> -}
> -#endif
> -
> static void tcg_target_qemu_prologue (TCGContext *s)
> {
> int i, frame_size;
> @@ -1050,16 +1010,6 @@ static void tcg_target_qemu_prologue (TCGContext *s)
> tcg_out32 (s, MTSPR | RS (0) | LR);
> tcg_out32 (s, ADDI | RT (1) | RA (1) | frame_size);
> tcg_out32 (s, BCLR | BO_ALWAYS);
> -
> -#ifdef CONFIG_SOFTMMU
> - for (i = 0; i < 4; ++i) {
> - ld_trampolines[i] = s->code_ptr;
> - emit_ldst_trampoline (s, qemu_ld_helpers[i]);
> -
> - st_trampolines[i] = s->code_ptr;
> - emit_ldst_trampoline (s, qemu_st_helpers[i]);
> - }
> -#endif
> }
>
> static void tcg_out_ld (TCGContext *s, TCGType type, TCGReg ret, TCGReg arg1,
>
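The 64-bit case at the end of the new load slow path is a small parallel-move problem: the helper's result arrives in r3 (high word) and r4 (low word) and has to end up in datahi:datalo, which may overlap r3/r4. A minimal sketch of the three cases, assuming a simulated register file; `regs`, `mov` and `move_ld64_result` are hypothetical names used only for illustration and are not part of the patch:

```c
#include <stdint.h>

/* Hypothetical model of the host register file: regs[n] stands for rN. */
enum { R0 = 0, R3 = 3, R4 = 4, NREGS = 32 };

/* Stand-in for tcg_out_mov(): copy src into dst. */
static void mov(uint32_t *regs, int dst, int src)
{
    regs[dst] = regs[src];
}

/*
 * Move a 64-bit helper result (high word in r3, low word in r4) into
 * datahi:datalo, mirroring the three branches of the patch.
 * Assumes datalo != datahi.
 */
void move_ld64_result(uint32_t *regs, int datalo, int datahi)
{
    if (datalo != R3) {
        /* Writing datalo cannot clobber r3, so consume r4 first. */
        mov(regs, datalo, R4);
        mov(regs, datahi, R3);
    } else if (datahi != R4) {
        /* datalo is r3: copy r3 out to datahi before overwriting it. */
        mov(regs, datahi, R3);
        mov(regs, datalo, R4);
    } else {
        /* Full swap (datalo == r3, datahi == r4): go through r0. */
        mov(regs, R0, R4);
        mov(regs, datahi, R3);
        mov(regs, datalo, R0);
    }
}
```

Only the full-swap case needs the scratch register; the other two just pick the order of the moves so that neither source is overwritten before it is read.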
Bad news... with this patch, either with or without patch 2, trying to
execute sieve.flat from kvm-unit-tests (it doesn't matter if it is
compiled as 32-bit or 64-bit, and with both i386-softmmu and
x86_64-softmmu targets) fails as follows on my PowerBook:
qemu: fatal: Trying to execute code outside RAM or ROM at 0x70360000
EAX=00006d20 EBX=00007025 ECX=00000000 EDX=00000000
ESI=07fd7bd0 EDI=000f1930 EBP=07fd7b00 ESP=00006e0c
EIP=70270000 EFL=00000002 [-------] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0000 00000000 ffffffff 008f9300
CS =f000 000f0000 ffffffff 008f9b00
SS =0000 00000000 ffffffff 008f9300
DS =0000 00000000 ffffffff 008f9300
FS =0000 00000000 ffffffff 008f9300
GS =0000 00000000 ffffffff 008f9300
LDT=0000 00000000 0000ffff 00008200
TR =0000 00000000 0000ffff 00008b00
GDT= 000f69e0 00000037
IDT= 00000000 000003ff
CR0=00000010 CR2=00000000 CR3=00000000 CR4=00000000
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000044 CCD=00006df8 CCO=ADDL
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000
XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000
XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000
XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000
XMM07=00000000000000000000000000000000
Aborted
The failure happens as soon as the first hardware interrupt is serviced:
Servicing hardware INT=0x08
Servicing hardware INT=0x09
----------------
IN:
0x000fe98f: push %es
0x000fe990: push %ebp
0x000fe992: push %edi
0x000fe994: push %esi
0x000fe996: push %ebx
0x000fe998: sub $0x44,%esp
0x000fe99c: mov $0x40,%eax
0x000fe9a2: mov %ax,%es
0x000fe9a4: mov %es:0x6c,%edx
0x000fe9aa: inc %edx
0x000fe9ac: cmp $0x1800af,%edx
0x000fe9b3: jbe 0xfe9c6
----------------
IN:
0x000f77f4: xor %edx,%edx
0x000f77f7: calll *%ecx
qemu: fatal: Trying to execute code outside RAM or ROM at 0x70360000
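For readers matching the fatal address against the register dump: the address in the message appears to be the linear program counter, i.e. the CS base plus EIP. A minimal sketch of that arithmetic, assuming the second column of the `CS =` line is the segment base; `linear_pc` is a hypothetical helper, not a QEMU function:

```c
#include <stdint.h>

/* Linear address of the next instruction: segment base plus offset. */
uint32_t linear_pc(uint32_t cs_base, uint32_t eip)
{
    return cs_base + eip;
}
```

With the values from the dump, `linear_pc(0x000f0000, 0x70270000)` gives 0x70360000, the address in the error message, so it is EIP itself that holds the out-of-range value.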
Command line:
i386-softmmu/qemu-system-i386 -kernel sieve32.flat -serial stdio -device isa-debug-exit,iobase=0xf4 -nographic
My two patches + 2/4 from this series work. I didn't try 4/4 because it
doesn't apply cleanly on top of my patches.
Paolo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu
2013-09-07 9:46 ` Paolo Bonzini
@ 2013-09-09 17:42 ` Richard Henderson
2013-09-09 17:49 ` Paolo Bonzini
0 siblings, 1 reply; 9+ messages in thread
From: Richard Henderson @ 2013-09-09 17:42 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Vassili Karpov (malc), qemu-devel, aurelien
On 08/19/2013 12:42 PM, Paolo Bonzini wrote:
> Bad news... with this patch, either with or without patch 2, trying to execute
> sieve.flat from kvm-unit-tests (it doesn't matter if it is compiled as 32-bit
> or 64-bit, and with both i386-softmmu and x86_64-softmmu targets) fails as
> follows on my PowerBook:
>
> qemu: fatal: Trying to execute code outside RAM or ROM at 0x70360000
Hum. Are you sure it's anything related to the ppc backend at all? This
test doesn't work with an x86_64 host either.
qemu: fatal: Trying to execute code outside RAM or ROM at 0x004001ba
EAX=80000011 EBX=00009500 ECX=c0000080 EDX=00000000
ESI=00000000 EDI=00542000 EBP=00000000 ESP=0044abbc
EIP=004001ba EFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
GDT= 0040800a 00000447
IDT= 00000000 000003ff
CR0=80000011 CR2=00000000 CR3=00407000 CR4=00000020
DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
DR6=ffff0ff0 DR7=00000400
CCS=00000000 CCD=00000000 CCO=SARL
EFER=0000000000000000
FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
Aborted (core dumped)
This happens after one of the writes to %cr0. Of course, the test works with
kvm enabled, so I don't blame the test so much as the target-i386 front end...
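As a reading aid for the CR0 values in the dumps, here is a minimal sketch that decodes the relevant bits. The bit positions are the architectural ones (PE bit 0, WP bit 16, PG bit 31); the function names are hypothetical and chosen only for this illustration:

```c
#include <stdbool.h>
#include <stdint.h>

#define CR0_PE (1u << 0)   /* protected mode enable */
#define CR0_WP (1u << 16)  /* write protect */
#define CR0_PG (1u << 31)  /* paging enable */

bool cr0_protected_mode(uint32_t cr0)
{
    return (cr0 & CR0_PE) != 0;
}

bool cr0_write_protect(uint32_t cr0)
{
    return (cr0 & CR0_WP) != 0;
}

bool cr0_paging(uint32_t cr0)
{
    return (cr0 & CR0_PG) != 0;
}
```

For CR0=80000011 from the dump above, `cr0_protected_mode` and `cr0_paging` both return true, so the guest had already turned paging on when the abort happened.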
This is not new breakage, either. I've checked back through 1.4.0 and I can't
make it work with any version of TCG.
r~
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu
2013-09-09 17:42 ` Richard Henderson
@ 2013-09-09 17:49 ` Paolo Bonzini
2013-09-09 18:20 ` Richard Henderson
0 siblings, 1 reply; 9+ messages in thread
From: Paolo Bonzini @ 2013-09-09 17:49 UTC (permalink / raw)
To: Richard Henderson; +Cc: Vassili Karpov (malc), qemu-devel, aurelien
On 09/09/2013 19:42, Richard Henderson wrote:
> On 08/19/2013 12:42 PM, Paolo Bonzini wrote:
>> Bad news... with this patch, either with or without patch 2, trying to execute
>> sieve.flat from kvm-unit-tests (it doesn't matter if it is compiled as 32-bit
>> or 64-bit, and with both i386-softmmu and x86_64-softmmu targets) fails as
>> follows on my PowerBook:
>>
>> qemu: fatal: Trying to execute code outside RAM or ROM at 0x70360000
>
> Hum. Are you sure it's anything related to the ppc backend at all? This
> test doesn't work with an x86_64 host either.
>
> qemu: fatal: Trying to execute code outside RAM or ROM at 0x004001ba
>
> EAX=80000011 EBX=00009500 ECX=c0000080 EDX=00000000
> ESI=00000000 EDI=00542000 EBP=00000000 ESP=0044abbc
> EIP=004001ba EFL=00000046 [---Z-P-] CPL=0 II=0 A20=1 SMM=0 HLT=0
> ES =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
> CS =0008 00000000 ffffffff 00cf9a00 DPL=0 CS32 [-R-]
> SS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
> DS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
> FS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
> GS =0010 00000000 ffffffff 00cf9300 DPL=0 DS [-WA]
> LDT=0000 00000000 0000ffff 00008200 DPL=0 LDT
> TR =0000 00000000 0000ffff 00008b00 DPL=0 TSS32-busy
> GDT= 0040800a 00000447
> IDT= 00000000 000003ff
> CR0=80000011 CR2=00000000 CR3=00407000 CR4=00000020
> DR0=00000000 DR1=00000000 DR2=00000000 DR3=00000000
> DR6=ffff0ff0 DR7=00000400
> CCS=00000000 CCD=00000000 CCO=SARL
> EFER=0000000000000000
> FCW=037f FSW=0000 [ST=0] FTW=00 MXCSR=00001f80
> FPR0=0000000000000000 0000 FPR1=0000000000000000 0000
> FPR2=0000000000000000 0000 FPR3=0000000000000000 0000
> FPR4=0000000000000000 0000 FPR5=0000000000000000 0000
> FPR6=0000000000000000 0000 FPR7=0000000000000000 0000
> XMM00=00000000000000000000000000000000 XMM01=00000000000000000000000000000000
> XMM02=00000000000000000000000000000000 XMM03=00000000000000000000000000000000
> XMM04=00000000000000000000000000000000 XMM05=00000000000000000000000000000000
> XMM06=00000000000000000000000000000000 XMM07=00000000000000000000000000000000
> Aborted (core dumped)
>
> This happens after one of the writes to %cr0. Of course, the test works with
> kvm enabled, so I don't blame the test so much as the target-i386 front end...
>
> This is not new breakage, either. I've checked back through 1.4.0 and I can't
> make it work with any version of TCG.
Strange... it works here with 1.6.0 from Fedora:
$ time qemu-system-x86_64 -device isa-debug-exit,iobase=0xf4 -serial stdio -kernel sieve64.flat
enabling apic
starting sieve
static:78498 out of 1000000
paging enabled
cr0 = 80010011
cr3 = 7fff000
cr4 = 20
mapped:78498 out of 1000000
virtual:5761455 out of 100000000
virtual:5761455 out of 100000000
virtual:5761455 out of 100000000
real 0m50.056s
user 0m49.467s
sys 0m0.415s
I sent you my binaries offlist.
Paolo
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu
2013-09-09 17:49 ` Paolo Bonzini
@ 2013-09-09 18:20 ` Richard Henderson
0 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2013-09-09 18:20 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: Vassili Karpov (malc), qemu-devel, aurelien
On 09/09/2013 10:49 AM, Paolo Bonzini wrote:
> I sent you my binaries offlist.
And apparently there was something wrong with the binaries I built myself, as
yours work. I'll now look at my ppc32 changes and see what's what.
r~
^ permalink raw reply [flat|nested] 9+ messages in thread
Thread overview: 9+ messages
2013-09-01 16:07 [Qemu-devel] [PATCH 0/4] tcg-ppc ldst improvements Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 1/4] configure: Allow command-line configure for ppc32 Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 2/4] tcg-ppc: Avoid code for nop move Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 3/4] tcg-ppc: Convert to helper_ret_ld/st_mmu Richard Henderson
2013-09-07 9:46 ` Paolo Bonzini
2013-09-09 17:42 ` Richard Henderson
2013-09-09 17:49 ` Paolo Bonzini
2013-09-09 18:20 ` Richard Henderson
2013-09-01 16:07 ` [Qemu-devel] [PATCH 4/4] tcg-ppc: Fix and cleanup tcg_out_tlb_check Richard Henderson