* [Qemu-devel] [PATCH for-next 0/8] Improve tcg ldst optimization
From: Richard Henderson @ 2013-08-05 18:07 UTC
  To: qemu-devel

The existing code for the i386 ldst optimization does

	jmps	.+5
	jmpl	restart
	jmpl	restart

for the store path.  This is idiotic, to say the least, especially
for x86_64, where we have parameter registers available.  We replace
that with a simple

	leaq	restart(%rip), %rdx

and we're also able to discard all of the code in the _mmu path that
decodes that "jmpl restart" to find the return address.

For arm, we have no free parameter registers, but we can generate a
conditional call instruction *into* the slow path, and then tail-call
from the slow path into the generic code.  This gets us the return
address set up exactly as we'd like, with the restriction that we
must instruct TCG to use the return value register for all loads.
This turns out not to be much of a restriction in practice.
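
As a sketch (label name illustrative; the exact registers depend on
the tlb sequence, see patch 8):

	cmp	r0, r2			@ tlb compare sets the flags
	blne	ld_slow_path		@ conditional call *into* the slow path
	...				@ fast path continues; lr points here
    ld_slow_path:
	...				@ marshal helper arguments
	b	helper_ret_ldl_mmu	@ tail call: the helper returns
					@ directly to the fast path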


r~


Richard Henderson (8):
  tcg-i386: Add and use tcg_out64
  tcg-i386: Try pc-relative lea for constant formation
  tcg-i386: Tidy qemu_ld/st slow path
  tcg: Add mmu helpers that take a return address argument
  tcg: Tidy softmmu_template.h
  tcg-i386: Use new return-argument ld/st helpers
  tcg-arm: Use ldrd/strd for appropriate qemu_ld/st64
  tcg-arm: Rearrange slow-path qemu_ld/st

 include/exec/exec-all.h         |  36 +----
 include/exec/softmmu_defs.h     |  46 +++---
 include/exec/softmmu_template.h | 309 +++++++++++++++------------------------
 tcg/arm/tcg-target.c            | 313 ++++++++++++++++++++++------------------
 tcg/i386/tcg-target.c           | 259 +++++++++++++++------------------
 tcg/tcg.c                       |   6 +
 6 files changed, 449 insertions(+), 520 deletions(-)

-- 
1.8.3.1

* [Qemu-devel] [PATCH for-next 1/8] tcg-i386: Add and use tcg_out64
From: Richard Henderson @ 2013-08-05 18:07 UTC
  To: qemu-devel; +Cc: Richard Henderson

No point in splitting the write into 32-bit pieces.
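
For reference, the pair being removed emitted the same eight bytes on
the little-endian x86 host; the odd double shift is presumably just a
way of writing ">> 32" without shifting a 32-bit tcg_target_long by
its full width, which is undefined in C:

    tcg_out32(s, arg);             /* low 32 bits */
    tcg_out32(s, arg >> 31 >> 1);  /* high 32 bits; ">> 31 >> 1" == ">> 32" */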

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.c | 3 +--
 tcg/tcg.c             | 6 ++++++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 87eeab3..841bd75 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -552,8 +552,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
         tcg_out32(s, arg);
     } else {
         tcg_out_opc(s, OPC_MOVL_Iv + P_REXW + LOWREGMASK(ret), 0, ret, 0);
-        tcg_out32(s, arg);
-        tcg_out32(s, arg >> 31 >> 1);
+        tcg_out64(s, arg);
     }
 }
 
diff --git a/tcg/tcg.c b/tcg/tcg.c
index dac8224..9355b57 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -131,6 +131,12 @@ static inline void tcg_out32(TCGContext *s, uint32_t v)
     s->code_ptr += 4;
 }
 
+static inline void tcg_out64(TCGContext *s, uint64_t v)
+{
+    *(uint64_t *)s->code_ptr = v;
+    s->code_ptr += 8;
+}
+
 /* label relocation processing */
 
 static void tcg_out_reloc(TCGContext *s, uint8_t *code_ptr, int type,
-- 
1.8.3.1

* [Qemu-devel] [PATCH for-next 2/8] tcg-i386: Try pc-relative lea for constant formation
From: Richard Henderson @ 2013-08-05 18:07 UTC
  To: qemu-devel; +Cc: Richard Henderson

Try a 7-byte pc-relative lea before falling back to the 10-byte movq.
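
For the byte counts (x86_64 encodings):

	leaq	disp32(%rip), %reg	# REX.W + 8D /r + disp32 = 7 bytes
	movabsq	$imm64, %reg		# REX.W + B8+r + imm64   = 10 bytes

so the lea saves three bytes whenever the constant lands within
+/- 2GB of the code being generated.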

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.c | 19 ++++++++++++++++---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 841bd75..456bd9e 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -541,19 +541,32 @@ static inline void tcg_out_mov(TCGContext *s, TCGType type,
 static void tcg_out_movi(TCGContext *s, TCGType type,
                          TCGReg ret, tcg_target_long arg)
 {
+    tcg_target_long diff;
+
     if (arg == 0) {
         tgen_arithr(s, ARITH_XOR, ret, ret);
         return;
     } else if (arg == (uint32_t)arg || type == TCG_TYPE_I32) {
         tcg_out_opc(s, OPC_MOVL_Iv + LOWREGMASK(ret), 0, ret, 0);
         tcg_out32(s, arg);
+        return;
     } else if (arg == (int32_t)arg) {
         tcg_out_modrm(s, OPC_MOVL_EvIz + P_REXW, 0, ret);
         tcg_out32(s, arg);
-    } else {
-        tcg_out_opc(s, OPC_MOVL_Iv + P_REXW + LOWREGMASK(ret), 0, ret, 0);
-        tcg_out64(s, arg);
+        return;
     }
+
+    /* Try a 7 byte pc-relative lea before the 10 byte movq.  */
+    diff = arg - ((tcg_target_long)s->code_ptr + 7);
+    if (diff == (int32_t)diff) {
+        tcg_out_opc(s, OPC_LEA | P_REXW, ret, 0, 0);
+        tcg_out8(s, (LOWREGMASK(ret) << 3) | 5);
+        tcg_out32(s, diff);
+        return;
+    }
+
+    tcg_out_opc(s, OPC_MOVL_Iv + P_REXW + LOWREGMASK(ret), 0, ret, 0);
+    tcg_out64(s, arg);
 }
 
 static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)
-- 
1.8.3.1

* [Qemu-devel] [PATCH for-next 3/8] tcg-i386: Tidy qemu_ld/st slow path
From: Richard Henderson @ 2013-08-05 18:07 UTC
  To: qemu-devel; +Cc: Richard Henderson

Use the existing stack space for arguments rather than pushing and
popping.  Use fewer ifdefs and more C ifs.
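
The 32-bit call frame is now built in place (a sketch matching the
diff below; the prologue has already reserved this stack space):

	(%esp)		env
	4(%esp)		addrlo
	8(%esp)		addrhi		(only if TARGET_LONG_BITS == 64)
	...		datalo/datahi	(stores only), then mem_index

and since nothing was pushed, no stack adjustment is needed after the
call.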

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/i386/tcg-target.c | 159 +++++++++++++++++++++-----------------------------
 1 file changed, 68 insertions(+), 91 deletions(-)

diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 456bd9e..8addfa1 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -1461,22 +1461,12 @@ static void add_qemu_ldst_label(TCGContext *s,
 /*
  * Generate code for the slow path for a load at the end of block
  */
-static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
+static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
-    int s_bits;
-    int opc = label->opc;
-    int mem_index = label->mem_index;
-#if TCG_TARGET_REG_BITS == 32
-    int stack_adjust;
-    int addrlo_reg = label->addrlo_reg;
-    int addrhi_reg = label->addrhi_reg;
-#endif
-    int data_reg = label->datalo_reg;
-    int data_reg2 = label->datahi_reg;
-    uint8_t *raddr = label->raddr;
-    uint8_t **label_ptr = &label->label_ptr[0];
-
-    s_bits = opc & 3;
+    int opc = l->opc;
+    int s_bits = opc & 3;
+    TCGReg data_reg;
+    uint8_t **label_ptr = &l->label_ptr[0];
 
     /* resolve label address */
     *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
@@ -1484,22 +1474,28 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
         *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
     }
 
-#if TCG_TARGET_REG_BITS == 32
-    tcg_out_pushi(s, mem_index);
-    stack_adjust = 4;
-    if (TARGET_LONG_BITS == 64) {
-        tcg_out_push(s, addrhi_reg);
-        stack_adjust += 4;
+    if (TCG_TARGET_REG_BITS == 32) {
+        int ofs = 0;
+
+        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        if (TARGET_LONG_BITS == 64) {
+            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
+            ofs += 4;
+        }
+
+        tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_EAX, l->mem_index);
+        tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);
+    } else {
+        tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
+        /* The second argument is already loaded with addrlo.  */
+        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2],
+                     l->mem_index);
     }
-    tcg_out_push(s, addrlo_reg);
-    stack_adjust += 4;
-    tcg_out_push(s, TCG_AREG0);
-    stack_adjust += 4;
-#else
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
-    /* The second argument is already loaded with addrlo.  */
-    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], mem_index);
-#endif
 
     /* Code generation of qemu_ld/st's slow path calling MMU helper
 
@@ -1518,18 +1514,10 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
     tcg_out8(s, 5);
     /* Dummy backward jump having information of fast path'pc for MMU helpers */
     tcg_out8(s, OPC_JMP_long);
-    *(int32_t *)s->code_ptr = (int32_t)(raddr - s->code_ptr - 4);
+    *(int32_t *)s->code_ptr = (int32_t)(l->raddr - s->code_ptr - 4);
     s->code_ptr += 4;
 
-#if TCG_TARGET_REG_BITS == 32
-    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
-        /* Pop and discard.  This is 2 bytes smaller than the add.  */
-        tcg_out_pop(s, TCG_REG_ECX);
-    } else if (stack_adjust != 0) {
-        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
-    }
-#endif
-
+    data_reg = l->datalo_reg;
     switch(opc) {
     case 0 | 4:
         tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
@@ -1557,10 +1545,10 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
         } else if (data_reg == TCG_REG_EDX) {
             /* xchg %edx, %eax */
             tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EAX);
+            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EAX);
         } else {
             tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
-            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EDX);
+            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
         }
         break;
     default:
@@ -1568,28 +1556,17 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
     }
 
     /* Jump to the code corresponding to next IR of qemu_st */
-    tcg_out_jmp(s, (tcg_target_long)raddr);
+    tcg_out_jmp(s, (tcg_target_long)l->raddr);
 }
 
 /*
  * Generate code for the slow path for a store at the end of block
  */
-static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
+static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 {
-    int s_bits;
-    int stack_adjust;
-    int opc = label->opc;
-    int mem_index = label->mem_index;
-    int data_reg = label->datalo_reg;
-#if TCG_TARGET_REG_BITS == 32
-    int data_reg2 = label->datahi_reg;
-    int addrlo_reg = label->addrlo_reg;
-    int addrhi_reg = label->addrhi_reg;
-#endif
-    uint8_t *raddr = label->raddr;
-    uint8_t **label_ptr = &label->label_ptr[0];
-
-    s_bits = opc & 3;
+    int opc = l->opc;
+    int s_bits = opc & 3;
+    uint8_t **label_ptr = &l->label_ptr[0];
 
     /* resolve label address */
     *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
@@ -1597,31 +1574,38 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
         *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
     }
 
-#if TCG_TARGET_REG_BITS == 32
-    tcg_out_pushi(s, mem_index);
-    stack_adjust = 4;
-    if (opc == 3) {
-        tcg_out_push(s, data_reg2);
-        stack_adjust += 4;
-    }
-    tcg_out_push(s, data_reg);
-    stack_adjust += 4;
-    if (TARGET_LONG_BITS == 64) {
-        tcg_out_push(s, addrhi_reg);
-        stack_adjust += 4;
+    if (TCG_TARGET_REG_BITS == 32) {
+        int ofs = 0;
+
+        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        if (TARGET_LONG_BITS == 64) {
+            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
+            ofs += 4;
+        }
+
+        tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        if (opc == 3) {
+            tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
+            ofs += 4;
+        }
+
+        tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_EAX, l->mem_index);
+        tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);
+    } else {
+        tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
+        /* The second argument is already loaded with addrlo.  */
+        tcg_out_mov(s, (opc == 3 ? TCG_TYPE_I64 : TCG_TYPE_I32),
+                    tcg_target_call_iarg_regs[2], l->datalo_reg);
+        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
+                     l->mem_index);
     }
-    tcg_out_push(s, addrlo_reg);
-    stack_adjust += 4;
-    tcg_out_push(s, TCG_AREG0);
-    stack_adjust += 4;
-#else
-    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
-    /* The second argument is already loaded with addrlo.  */
-    tcg_out_mov(s, (opc == 3 ? TCG_TYPE_I64 : TCG_TYPE_I32),
-                tcg_target_call_iarg_regs[2], data_reg);
-    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3], mem_index);
-    stack_adjust = 0;
-#endif
 
     /* Code generation of qemu_ld/st's slow path calling MMU helper
 
@@ -1640,18 +1624,11 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
     tcg_out8(s, 5);
     /* Dummy backward jump having information of fast path'pc for MMU helpers */
     tcg_out8(s, OPC_JMP_long);
-    *(int32_t *)s->code_ptr = (int32_t)(raddr - s->code_ptr - 4);
+    *(int32_t *)s->code_ptr = (int32_t)(l->raddr - s->code_ptr - 4);
     s->code_ptr += 4;
 
-    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
-        /* Pop and discard.  This is 2 bytes smaller than the add.  */
-        tcg_out_pop(s, TCG_REG_ECX);
-    } else if (stack_adjust != 0) {
-        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
-    }
-
     /* Jump to the code corresponding to next IR of qemu_st */
-    tcg_out_jmp(s, (tcg_target_long)raddr);
+    tcg_out_jmp(s, (tcg_target_long)l->raddr);
 }
 
 /*
-- 
1.8.3.1

* [Qemu-devel] [PATCH for-next 4/8] tcg: Add mmu helpers that take a return address argument
From: Richard Henderson @ 2013-08-05 18:07 UTC
  To: qemu-devel

Allow the code that tcg generates to be less obtuse, passing in
the return address directly instead of computing it in the helper.

Maintain the old entry point unchanged as an alternate.
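
Expanded for the 32-bit load, the compatibility wrapper the glue()
macros produce is just:

    uint32_t helper_ldl_mmu(CPUArchState *env, target_ulong addr, int mmu_idx)
    {
        return helper_ret_ldl_mmu(env, addr, mmu_idx, GETPC_EXT());
    }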

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/softmmu_defs.h     | 46 ++++++++++++++++++++++++++---------------
 include/exec/softmmu_template.h | 42 +++++++++++++++++++++++--------------
 2 files changed, 55 insertions(+), 33 deletions(-)

diff --git a/include/exec/softmmu_defs.h b/include/exec/softmmu_defs.h
index 1f25e33..e55e717 100644
--- a/include/exec/softmmu_defs.h
+++ b/include/exec/softmmu_defs.h
@@ -9,29 +9,41 @@
 #ifndef SOFTMMU_DEFS_H
 #define SOFTMMU_DEFS_H
 
+uint8_t helper_ret_ldb_mmu(CPUArchState *env, target_ulong addr,
+                           int mmu_idx, uintptr_t retaddr);
+uint16_t helper_ret_ldw_mmu(CPUArchState *env, target_ulong addr,
+                            int mmu_idx, uintptr_t retaddr);
+uint32_t helper_ret_ldl_mmu(CPUArchState *env, target_ulong addr,
+                            int mmu_idx, uintptr_t retaddr);
+uint64_t helper_ret_ldq_mmu(CPUArchState *env, target_ulong addr,
+                            int mmu_idx, uintptr_t retaddr);
+
+void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
+                        int mmu_idx, uintptr_t retaddr);
+void helper_ret_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
+                        int mmu_idx, uintptr_t retaddr);
+void helper_ret_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
+                        int mmu_idx, uintptr_t retaddr);
+void helper_ret_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
+                        int mmu_idx, uintptr_t retaddr);
+
 uint8_t helper_ldb_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
-                    int mmu_idx);
 uint16_t helper_ldw_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
-                    int mmu_idx);
 uint32_t helper_ldl_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
-                    int mmu_idx);
 uint64_t helper_ldq_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                    int mmu_idx);
+
+void helper_stb_mmu(CPUArchState *env, target_ulong addr,
+                    uint8_t val, int mmu_idx);
+void helper_stw_mmu(CPUArchState *env, target_ulong addr,
+                    uint16_t val, int mmu_idx);
+void helper_stl_mmu(CPUArchState *env, target_ulong addr,
+                    uint32_t val, int mmu_idx);
+void helper_stq_mmu(CPUArchState *env, target_ulong addr,
+                    uint64_t val, int mmu_idx);
 
 uint8_t helper_ldb_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stb_cmmu(CPUArchState *env, target_ulong addr, uint8_t val,
-int mmu_idx);
 uint16_t helper_ldw_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stw_cmmu(CPUArchState *env, target_ulong addr, uint16_t val,
-                     int mmu_idx);
 uint32_t helper_ldl_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stl_cmmu(CPUArchState *env, target_ulong addr, uint32_t val,
-                     int mmu_idx);
 uint64_t helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
-void helper_stq_cmmu(CPUArchState *env, target_ulong addr, uint64_t val,
-                     int mmu_idx);
-#endif
+
+#endif /* SOFTMMU_DEFS_H */
diff --git a/include/exec/softmmu_template.h b/include/exec/softmmu_template.h
index 8584902..7d8bcb5 100644
--- a/include/exec/softmmu_template.h
+++ b/include/exec/softmmu_template.h
@@ -78,15 +78,18 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
 }
 
 /* handle all cases except unaligned access which span two pages */
+#ifdef SOFTMMU_CODE_ACCESS
+static
+#endif
 DATA_TYPE
-glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
-                                         int mmu_idx)
+glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
+                                             target_ulong addr, int mmu_idx,
+                                             uintptr_t retaddr)
 {
     DATA_TYPE res;
     int index;
     target_ulong tlb_addr;
     hwaddr ioaddr;
-    uintptr_t retaddr;
 
     /* test if there is match for unaligned or IO access */
     /* XXX: could done more in memory macro in a non portable way */
@@ -98,13 +101,11 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
             /* IO access */
             if ((addr & (DATA_SIZE - 1)) != 0)
                 goto do_unaligned_access;
-            retaddr = GETPC_EXT();
             ioaddr = env->iotlb[mmu_idx][index];
             res = glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
             /* slow unaligned access (it spans two pages or IO) */
         do_unaligned_access:
-            retaddr = GETPC_EXT();
 #ifdef ALIGNED_ONLY
             do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
 #endif
@@ -115,7 +116,6 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
             uintptr_t addend;
 #ifdef ALIGNED_ONLY
             if ((addr & (DATA_SIZE - 1)) != 0) {
-                retaddr = GETPC_EXT();
                 do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
             }
 #endif
@@ -124,8 +124,6 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
                                                 (addr + addend));
         }
     } else {
-        /* the page is not in the TLB : fill it */
-        retaddr = GETPC_EXT();
 #ifdef ALIGNED_ONLY
         if ((addr & (DATA_SIZE - 1)) != 0)
             do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
@@ -136,6 +134,14 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
     return res;
 }
 
+DATA_TYPE
+glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
+                                         int mmu_idx)
+{
+    return glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(env, addr, mmu_idx,
+                                                        GETPC_EXT());
+}
+
 /* handle all unaligned cases */
 static DATA_TYPE
 glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
@@ -214,13 +220,13 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
     io_mem_write(mr, physaddr, val, 1 << SHIFT);
 }
 
-void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
-                                              target_ulong addr, DATA_TYPE val,
-                                              int mmu_idx)
+void
+glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
+                                             target_ulong addr, DATA_TYPE val,
+                                             int mmu_idx, uintptr_t retaddr)
 {
     hwaddr ioaddr;
     target_ulong tlb_addr;
-    uintptr_t retaddr;
     int index;
 
     index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
@@ -231,12 +237,10 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
             /* IO access */
             if ((addr & (DATA_SIZE - 1)) != 0)
                 goto do_unaligned_access;
-            retaddr = GETPC_EXT();
             ioaddr = env->iotlb[mmu_idx][index];
             glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
         } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
         do_unaligned_access:
-            retaddr = GETPC_EXT();
 #ifdef ALIGNED_ONLY
             do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
 #endif
@@ -247,7 +251,6 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
             uintptr_t addend;
 #ifdef ALIGNED_ONLY
             if ((addr & (DATA_SIZE - 1)) != 0) {
-                retaddr = GETPC_EXT();
                 do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
             }
 #endif
@@ -257,7 +260,6 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
         }
     } else {
         /* the page is not in the TLB : fill it */
-        retaddr = GETPC_EXT();
 #ifdef ALIGNED_ONLY
         if ((addr & (DATA_SIZE - 1)) != 0)
             do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
@@ -267,6 +269,14 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
     }
 }
 
+void
+glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
+                                         DATA_TYPE val, int mmu_idx)
+{
+    glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(env, addr, val, mmu_idx,
+                                                 GETPC_EXT());
+}
+
 /* handles all unaligned cases */
 static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
                                                    target_ulong addr,
-- 
1.8.3.1

* [Qemu-devel] [PATCH for-next 5/8] tcg: Tidy softmmu_template.h
From: Richard Henderson @ 2013-08-05 18:07 UTC
  To: qemu-devel

Avoid a loop in the tlb_fill path; the fill will either succeed or
generate an exception.

Inline the slow_ld/st function; it was a complete copy of the main
helper except for the actual cross-page unaligned code, and the
compiler was inlining it anyway.

Add unlikely markers, optimizing for the most common case of a
simple TLB miss.

Make sure the compiler can optimize away the unaligned paths for a
1-byte access.
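
That last point works because DATA_SIZE is a compile-time constant in
each instantiation of the template, so the cross-page test folds away
for the byte access:

    /* Constant false when DATA_SIZE == 1; the compiler can then
       delete the entire unaligned path.  */
    if (DATA_SIZE > 1
        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
                    >= TARGET_PAGE_SIZE)) {
        ...
    }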

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/softmmu_template.h | 287 +++++++++++++++-------------------------
 1 file changed, 104 insertions(+), 183 deletions(-)

diff --git a/include/exec/softmmu_template.h b/include/exec/softmmu_template.h
index 7d8bcb5..03e5155 100644
--- a/include/exec/softmmu_template.h
+++ b/include/exec/softmmu_template.h
@@ -54,10 +54,6 @@
 #define ADDR_READ addr_read
 #endif
 
-static DATA_TYPE glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
-                                                        target_ulong addr,
-                                                        int mmu_idx,
-                                                        uintptr_t retaddr);
 static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
                                               hwaddr physaddr,
                                               target_ulong addr,
@@ -86,52 +82,67 @@ glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
                                              target_ulong addr, int mmu_idx,
                                              uintptr_t retaddr)
 {
-    DATA_TYPE res;
-    int index;
-    target_ulong tlb_addr;
-    hwaddr ioaddr;
+    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
+    uintptr_t haddr;
 
-    /* test if there is match for unaligned or IO access */
-    /* XXX: could done more in memory macro in a non portable way */
-    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
- redo:
-    tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
-    if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
-        if (tlb_addr & ~TARGET_PAGE_MASK) {
-            /* IO access */
-            if ((addr & (DATA_SIZE - 1)) != 0)
-                goto do_unaligned_access;
-            ioaddr = env->iotlb[mmu_idx][index];
-            res = glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
-        } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
-            /* slow unaligned access (it spans two pages or IO) */
-        do_unaligned_access:
+    /* If the TLB entry is for a different page, reload and try again.  */
+    if ((addr & TARGET_PAGE_MASK)
+         != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
 #ifdef ALIGNED_ONLY
+        if ((addr & (DATA_SIZE - 1)) != 0) {
             do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
+        }
 #endif
-            res = glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(env, addr,
-                                                         mmu_idx, retaddr);
-        } else {
-            /* unaligned/aligned access in the same page */
-            uintptr_t addend;
-#ifdef ALIGNED_ONLY
-            if ((addr & (DATA_SIZE - 1)) != 0) {
-                do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
-            }
-#endif
-            addend = env->tlb_table[mmu_idx][index].addend;
-            res = glue(glue(ld, USUFFIX), _raw)((uint8_t *)(intptr_t)
-                                                (addr + addend));
+        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
+        tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
+    }
+
+    /* Handle an IO access.  */
+    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
+        hwaddr ioaddr;
+        if ((addr & (DATA_SIZE - 1)) != 0) {
+            goto do_unaligned_access;
         }
-    } else {
+        ioaddr = env->iotlb[mmu_idx][index];
+        return glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
+    }
+
+    /* Handle slow unaligned access (it spans two pages or IO).  */
+    if (DATA_SIZE > 1
+        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
+                    >= TARGET_PAGE_SIZE)) {
+        target_ulong addr1, addr2;
+        DATA_TYPE res1, res2, res;
+        unsigned shift;
+    do_unaligned_access:
 #ifdef ALIGNED_ONLY
-        if ((addr & (DATA_SIZE - 1)) != 0)
-            do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
+        do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
 #endif
-        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
-        goto redo;
+        addr1 = addr & ~(DATA_SIZE - 1);
+        addr2 = addr1 + DATA_SIZE;
+        res1 = glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(env, addr1,
+                                                            mmu_idx, retaddr);
+        res2 = glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(env, addr2,
+                                                            mmu_idx, retaddr);
+        shift = (addr & (DATA_SIZE - 1)) * 8;
+#ifdef TARGET_WORDS_BIGENDIAN
+        res = (res1 << shift) | (res2 >> ((DATA_SIZE * 8) - shift));
+#else
+        res = (res1 >> shift) | (res2 << ((DATA_SIZE * 8) - shift));
+#endif
+        return res;
+    }
+
+    /* Handle aligned access or unaligned access in the same page.  */
+#ifdef ALIGNED_ONLY
+    if ((addr & (DATA_SIZE - 1)) != 0) {
+        do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
     }
-    return res;
+#endif
+
+    haddr = addr + env->tlb_table[mmu_idx][index].addend;
+    return glue(glue(ld, USUFFIX), _raw)((uint8_t *)haddr);
 }
 
 DATA_TYPE
@@ -142,66 +153,8 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
                                                         GETPC_EXT());
 }
 
-/* handle all unaligned cases */
-static DATA_TYPE
-glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
-                                       target_ulong addr,
-                                       int mmu_idx,
-                                       uintptr_t retaddr)
-{
-    DATA_TYPE res, res1, res2;
-    int index, shift;
-    hwaddr ioaddr;
-    target_ulong tlb_addr, addr1, addr2;
-
-    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
- redo:
-    tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
-    if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
-        if (tlb_addr & ~TARGET_PAGE_MASK) {
-            /* IO access */
-            if ((addr & (DATA_SIZE - 1)) != 0)
-                goto do_unaligned_access;
-            ioaddr = env->iotlb[mmu_idx][index];
-            res = glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
-        } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
-        do_unaligned_access:
-            /* slow unaligned access (it spans two pages) */
-            addr1 = addr & ~(DATA_SIZE - 1);
-            addr2 = addr1 + DATA_SIZE;
-            res1 = glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(env, addr1,
-                                                          mmu_idx, retaddr);
-            res2 = glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(env, addr2,
-                                                          mmu_idx, retaddr);
-            shift = (addr & (DATA_SIZE - 1)) * 8;
-#ifdef TARGET_WORDS_BIGENDIAN
-            res = (res1 << shift) | (res2 >> ((DATA_SIZE * 8) - shift));
-#else
-            res = (res1 >> shift) | (res2 << ((DATA_SIZE * 8) - shift));
-#endif
-            res = (DATA_TYPE)res;
-        } else {
-            /* unaligned/aligned access in the same page */
-            uintptr_t addend = env->tlb_table[mmu_idx][index].addend;
-            res = glue(glue(ld, USUFFIX), _raw)((uint8_t *)(intptr_t)
-                                                (addr + addend));
-        }
-    } else {
-        /* the page is not in the TLB : fill it */
-        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
-        goto redo;
-    }
-    return res;
-}
-
 #ifndef SOFTMMU_CODE_ACCESS
 
-static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
-                                                   target_ulong addr,
-                                                   DATA_TYPE val,
-                                                   int mmu_idx,
-                                                   uintptr_t retaddr);
-
 static inline void glue(io_write, SUFFIX)(CPUArchState *env,
                                           hwaddr physaddr,
                                           DATA_TYPE val,
@@ -225,48 +178,66 @@ glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
                                              target_ulong addr, DATA_TYPE val,
                                              int mmu_idx, uintptr_t retaddr)
 {
-    hwaddr ioaddr;
-    target_ulong tlb_addr;
-    int index;
+    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
+    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    uintptr_t haddr;
 
-    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
- redo:
-    tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
-    if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
-        if (tlb_addr & ~TARGET_PAGE_MASK) {
-            /* IO access */
-            if ((addr & (DATA_SIZE - 1)) != 0)
-                goto do_unaligned_access;
-            ioaddr = env->iotlb[mmu_idx][index];
-            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
-        } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
-        do_unaligned_access:
+    /* If the TLB entry is for a different page, reload and try again.  */
+    if ((addr & TARGET_PAGE_MASK) 
+        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
 #ifdef ALIGNED_ONLY
+        if ((addr & (DATA_SIZE - 1)) != 0) {
             do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
+        }
 #endif
-            glue(glue(slow_st, SUFFIX), MMUSUFFIX)(env, addr, val,
-                                                   mmu_idx, retaddr);
-        } else {
-            /* aligned/unaligned access in the same page */
-            uintptr_t addend;
+        tlb_fill(env, addr, 1, mmu_idx, retaddr);
+        tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
+    }
+
+    /* Handle an IO access.  */
+    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
+        hwaddr ioaddr;
+        if ((addr & (DATA_SIZE - 1)) != 0) {
+            goto do_unaligned_access;
+        }
+        ioaddr = env->iotlb[mmu_idx][index];
+        glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
+        return;
+    }
+
+    /* Handle slow unaligned access (it spans two pages or IO).  */
+    if (DATA_SIZE > 1
+        && unlikely ((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
+                     >= TARGET_PAGE_SIZE)) {
+        int i;
+    do_unaligned_access:
 #ifdef ALIGNED_ONLY
-            if ((addr & (DATA_SIZE - 1)) != 0) {
-                do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
-            }
+        do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
+#endif
+        /* XXX: not efficient, but simple */
+        /* Note: relies on the fact that tlb_fill() does not remove the
+         * previous page from the TLB cache.  */
+        for (i = DATA_SIZE - 1; i >= 0; i--) {
+#ifdef TARGET_WORDS_BIGENDIAN
+            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
+#else
+            uint8_t val8 = val >> (i * 8);
 #endif
-            addend = env->tlb_table[mmu_idx][index].addend;
-            glue(glue(st, SUFFIX), _raw)((uint8_t *)(intptr_t)
-                                         (addr + addend), val);
+            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
+                                            mmu_idx, retaddr);
         }
-    } else {
-        /* the page is not in the TLB : fill it */
+        return;
+    }
+
+    /* Handle aligned access or unaligned access in the same page.  */
 #ifdef ALIGNED_ONLY
-        if ((addr & (DATA_SIZE - 1)) != 0)
-            do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
-#endif
-        tlb_fill(env, addr, 1, mmu_idx, retaddr);
-        goto redo;
+    if ((addr & (DATA_SIZE - 1)) != 0) {
+        do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
     }
+#endif
+
+    haddr = addr + env->tlb_table[mmu_idx][index].addend;
+    glue(glue(st, SUFFIX), _raw)((uint8_t *)haddr, val);
 }
 
 void
@@ -277,56 +248,6 @@ glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
                                                  GETPC_EXT());
 }
 
-/* handles all unaligned cases */
-static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
-                                                   target_ulong addr,
-                                                   DATA_TYPE val,
-                                                   int mmu_idx,
-                                                   uintptr_t retaddr)
-{
-    hwaddr ioaddr;
-    target_ulong tlb_addr;
-    int index, i;
-
-    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
- redo:
-    tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
-    if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
-        if (tlb_addr & ~TARGET_PAGE_MASK) {
-            /* IO access */
-            if ((addr & (DATA_SIZE - 1)) != 0)
-                goto do_unaligned_access;
-            ioaddr = env->iotlb[mmu_idx][index];
-            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
-        } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
-        do_unaligned_access:
-            /* XXX: not efficient, but simple */
-            /* Note: relies on the fact that tlb_fill() does not remove the
-             * previous page from the TLB cache.  */
-            for(i = DATA_SIZE - 1; i >= 0; i--) {
-#ifdef TARGET_WORDS_BIGENDIAN
-                glue(slow_stb, MMUSUFFIX)(env, addr + i,
-                                          val >> (((DATA_SIZE - 1) * 8) - (i * 8)),
-                                          mmu_idx, retaddr);
-#else
-                glue(slow_stb, MMUSUFFIX)(env, addr + i,
-                                          val >> (i * 8),
-                                          mmu_idx, retaddr);
-#endif
-            }
-        } else {
-            /* aligned/unaligned access in the same page */
-            uintptr_t addend = env->tlb_table[mmu_idx][index].addend;
-            glue(glue(st, SUFFIX), _raw)((uint8_t *)(intptr_t)
-                                         (addr + addend), val);
-        }
-    } else {
-        /* the page is not in the TLB : fill it */
-        tlb_fill(env, addr, 1, mmu_idx, retaddr);
-        goto redo;
-    }
-}
-
 #endif /* !defined(SOFTMMU_CODE_ACCESS) */
 
 #undef READ_ACCESS_TYPE
-- 
1.8.3.1

* [Qemu-devel] [PATCH for-next 6/8] tcg-i386: Use new return-argument ld/st helpers
From: Richard Henderson @ 2013-08-05 18:07 UTC
  To: qemu-devel; +Cc: Richard Henderson

Discontinue the jump-around-jump-to-jump scheme, trading it for a single
immediate move instruction.  The two extra jumps always consume 7 bytes,
whereas the immediate move is either 5 or 7 bytes depending on where the
code_gen_buffer gets located.
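
For the byte counts, the old tail was

	jmp	.+5		# 2 bytes; skip the next insn
	jmp	raddr		# 5 bytes; never executed, only encodes raddr

a fixed 7 bytes, while the replacement is a single tcg_out_movi of
the return address: 5 bytes (movl $imm32) when code_gen_buffer sits
below 4GB, otherwise 7 bytes via the pc-relative lea from patch 2.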

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/exec-all.h |  13 +------
 tcg/i386/tcg-target.c   | 100 +++++++++++++++++++++---------------------------
 2 files changed, 46 insertions(+), 67 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 5920f73..b70028a 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -326,18 +326,9 @@ extern uintptr_t tci_tb_ptr;
    (6) jump to corresponding code of the next of fast path
  */
 # if defined(__i386__) || defined(__x86_64__)
-/* To avoid broken disassembling, long jmp is used for embedding fast path pc,
-   so that the destination is the next code of fast path, though this jmp is
-   never executed.
-
-   call MMU helper
-   jmp POST_PROC (2byte)    <- GETRA()
-   jmp NEXT_CODE (5byte)
-   POST_PROCESS ...         <- GETRA() + 7
- */
 #  define GETRA() ((uintptr_t)__builtin_return_address(0))
-#  define GETPC_LDST() ((uintptr_t)(GETRA() + 7 + \
-                                    *(int32_t *)((void *)GETRA() + 3) - 1))
+/* The return address argument for ldst is passed directly.  */
+#  define GETPC_LDST()  (abort(), 0)
 # elif defined (_ARCH_PPC) && !defined (_ARCH_PPC64)
 #  define GETRA() ((uintptr_t)__builtin_return_address(0))
 #  define GETPC_LDST() ((uintptr_t) ((*(int32_t *)(GETRA() - 4)) - 1))
diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
index 8addfa1..c7a02a3 100644
--- a/tcg/i386/tcg-target.c
+++ b/tcg/i386/tcg-target.c
@@ -190,11 +190,11 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         /* qemu_ld/st address constraint */
     case 'L':
         ct->ct |= TCG_CT_REG;
-#if TCG_TARGET_REG_BITS == 64
+        if (TCG_TARGET_REG_BITS == 64) {
             tcg_regset_set32(ct->u.regs, 0, 0xffff);
-#else
+        } else {
             tcg_regset_set32(ct->u.regs, 0, 0xff);
-#endif
+        }
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_L0);
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_L1);
         break;
@@ -1015,22 +1015,24 @@ static void tcg_out_jmp(TCGContext *s, tcg_target_long dest)
 
 #include "exec/softmmu_defs.h"
 
-/* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
-   int mmu_idx) */
-static const void *qemu_ld_helpers[4] = {
-    helper_ldb_mmu,
-    helper_ldw_mmu,
-    helper_ldl_mmu,
-    helper_ldq_mmu,
+/* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
+ *                                     int mmu_idx, uintptr_t ra)
+ */
+static const void * const qemu_ld_helpers[4] = {
+    helper_ret_ldb_mmu,
+    helper_ret_ldw_mmu,
+    helper_ret_ldl_mmu,
+    helper_ret_ldq_mmu,
 };
 
-/* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
-   uintxx_t val, int mmu_idx) */
-static const void *qemu_st_helpers[4] = {
-    helper_stb_mmu,
-    helper_stw_mmu,
-    helper_stl_mmu,
-    helper_stq_mmu,
+/* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
+ *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
+ */
+static const void * const qemu_st_helpers[4] = {
+    helper_ret_stb_mmu,
+    helper_ret_stw_mmu,
+    helper_ret_stl_mmu,
+    helper_ret_stq_mmu,
 };
 
 static void add_qemu_ldst_label(TCGContext *s,
@@ -1458,6 +1460,12 @@ static void add_qemu_ldst_label(TCGContext *s,
     }
 }
 
+/* See the GETPC definition in include/exec/exec-all.h.  */
+static inline uintptr_t do_getpc(uint8_t *raddr)
+{
+    return (uintptr_t)raddr - 1;
+}
+
 /*
  * Generate code for the slow path for a load at the end of block
  */
@@ -1490,33 +1498,21 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
         tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_EAX, l->mem_index);
         tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, do_getpc(l->raddr));
+        tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);
     } else {
-        tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
+        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
         /* The second argument is already loaded with addrlo.  */
         tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2],
                      l->mem_index);
+        tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
+                     do_getpc(l->raddr));
     }
 
-    /* Code generation of qemu_ld/st's slow path calling MMU helper
-
-       PRE_PROC ...
-       call MMU helper
-       jmp POST_PROC (2b) : short forward jump <- GETRA()
-       jmp next_code (5b) : dummy long backward jump which is never executed
-       POST_PROC ... : do post-processing <- GETRA() + 7
-       jmp next_code : jump to the code corresponding to next IR of qemu_ld/st
-    */
-
     tcg_out_calli(s, (tcg_target_long)qemu_ld_helpers[s_bits]);
 
-    /* Jump to post-processing code */
-    tcg_out8(s, OPC_JMP_short);
-    tcg_out8(s, 5);
-    /* Dummy backward jump having information of fast path'pc for MMU helpers */
-    tcg_out8(s, OPC_JMP_long);
-    *(int32_t *)s->code_ptr = (int32_t)(l->raddr - s->code_ptr - 4);
-    s->code_ptr += 4;
-
     data_reg = l->datalo_reg;
     switch(opc) {
     case 0 | 4:
@@ -1598,36 +1594,28 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
 
         tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_EAX, l->mem_index);
         tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);
+        ofs += 4;
+
+        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, do_getpc(l->raddr));
+        tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);
     } else {
-        tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
+        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
         /* The second argument is already loaded with addrlo.  */
         tcg_out_mov(s, (opc == 3 ? TCG_TYPE_I64 : TCG_TYPE_I32),
                     tcg_target_call_iarg_regs[2], l->datalo_reg);
         tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
                      l->mem_index);
+        if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
+            tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[4],
+                         do_getpc(l->raddr));
+        } else {
+            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_RAX, do_getpc(l->raddr));
+            tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_RAX, TCG_REG_ESP, 0);
+        }
     }
 
-    /* Code generation of qemu_ld/st's slow path calling MMU helper
-
-       PRE_PROC ...
-       call MMU helper
-       jmp POST_PROC (2b) : short forward jump <- GETRA()
-       jmp next_code (5b) : dummy long backward jump which is never executed
-       POST_PROC ... : do post-processing <- GETRA() + 7
-       jmp next_code : jump to the code corresponding to next IR of qemu_ld/st
-    */
-
     tcg_out_calli(s, (tcg_target_long)qemu_st_helpers[s_bits]);
 
-    /* Jump to post-processing code */
-    tcg_out8(s, OPC_JMP_short);
-    tcg_out8(s, 5);
-    /* Dummy backward jump having information of fast path'pc for MMU helpers */
-    tcg_out8(s, OPC_JMP_long);
-    *(int32_t *)s->code_ptr = (int32_t)(l->raddr - s->code_ptr - 4);
-    s->code_ptr += 4;
-
-    /* Jump to the code corresponding to next IR of qemu_st */
     tcg_out_jmp(s, (tcg_target_long)l->raddr);
 }
 
-- 
1.8.3.1

* [Qemu-devel] [PATCH for-next 7/8] tcg-arm: Use ldrd/strd for appropriate qemu_ld/st64
From: Richard Henderson @ 2013-08-05 18:07 UTC
  To: qemu-devel; +Cc: Richard Henderson

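Background for the new insns: ldrd/strd move 64 bits through a
consecutive register pair, and in ARM mode the first register must be
even (an odd Rt is unpredictable), hence the

    (data_reg & 1) == 0 && data_reg2 == data_reg + 1

guards below.  For example:

    ldrd	r4, [r2, r1]	@ one insn: loads r4 (low) and r5 (high)
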
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 tcg/arm/tcg-target.c | 48 +++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 6c4854d..9a14a20 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -320,6 +320,9 @@ typedef enum {
     INSN_STRB_REG  = 0x06400000,
 
     INSN_LDRD_IMM  = 0x004000d0,
+    INSN_LDRD_REG  = 0x000000d0,
+    INSN_STRD_IMM  = 0x004000f0,
+    INSN_STRD_REG  = 0x000000f0,
 } ARMInsn;
 
 #define SHIFT_IMM_LSL(im)	(((im) << 7) | 0x00)
@@ -810,6 +813,30 @@ static inline void tcg_out_st32_r(TCGContext *s, int cond, TCGReg rt,
     tcg_out_memop_r(s, cond, INSN_STR_REG, rt, rn, rm, 1, 1, 0);
 }
 
+static inline void tcg_out_ldrd_8(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm8)
+{
+    tcg_out_memop_8(s, cond, INSN_LDRD_IMM, rt, rn, imm8, 1, 0);
+}
+
+static inline void tcg_out_ldrd_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
+{
+    tcg_out_memop_r(s, cond, INSN_LDRD_REG, rt, rn, rm, 1, 1, 0);
+}
+
+static inline void tcg_out_strd_8(TCGContext *s, int cond, TCGReg rt,
+                                   TCGReg rn, int imm8)
+{
+    tcg_out_memop_8(s, cond, INSN_STRD_IMM, rt, rn, imm8, 1, 0);
+}
+
+static inline void tcg_out_strd_r(TCGContext *s, int cond, TCGReg rt,
+                                  TCGReg rn, TCGReg rm)
+{
+    tcg_out_memop_r(s, cond, INSN_STRD_REG, rt, rn, rm, 1, 1, 0);
+}
+
 /* Register pre-increment with base writeback.  */
 static inline void tcg_out_ld32_rwb(TCGContext *s, int cond, TCGReg rt,
                                     TCGReg rn, TCGReg rm)
@@ -1399,6 +1426,9 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
             tcg_out_ld32_12(s, COND_AL, data_reg, TCG_REG_R1, 4);
             tcg_out_bswap32(s, COND_AL, data_reg2, data_reg2);
             tcg_out_bswap32(s, COND_AL, data_reg, data_reg);
+        } else if (use_armv6_instructions
+                   && (data_reg & 1) == 0 && data_reg2 == data_reg + 1) {
+            tcg_out_ldrd_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
         } else {
             tcg_out_ld32_rwb(s, COND_AL, data_reg, TCG_REG_R1, addr_reg);
             tcg_out_ld32_12(s, COND_AL, data_reg2, TCG_REG_R1, 4);
@@ -1452,9 +1482,13 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
         }
         break;
     case 3:
-        /* TODO: use block load -
-         * check that data_reg2 > data_reg or the other way */
-        if (data_reg == addr_reg) {
+        if (use_armv6_instructions && !bswap
+            && (data_reg & 1) == 0 && data_reg2 == data_reg + 1) {
+            tcg_out_ldrd_8(s, COND_AL, data_reg, addr_reg, 0);
+        } else if (use_armv6_instructions && bswap
+                   && (data_reg2 & 1) == 0 && data_reg == data_reg2 + 1) {
+            tcg_out_ldrd_8(s, COND_AL, data_reg2, addr_reg, 0);
+        } else if (data_reg == addr_reg) {
             tcg_out_ld32_12(s, COND_AL, data_reg2, addr_reg, bswap ? 0 : 4);
             tcg_out_ld32_12(s, COND_AL, data_reg, addr_reg, bswap ? 4 : 0);
         } else {
@@ -1531,6 +1565,9 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
             tcg_out_st32_rwb(s, COND_AL, TCG_REG_R0, TCG_REG_R1, addr_reg);
             tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg);
             tcg_out_st32_12(s, COND_AL, TCG_REG_R0, TCG_REG_R1, 4);
+        } else if (use_armv6_instructions
+                   && (data_reg & 1) == 0 && data_reg2 == data_reg + 1) {
+            tcg_out_strd_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
         } else {
             tcg_out_st32_rwb(s, COND_AL, data_reg, TCG_REG_R1, addr_reg);
             tcg_out_st32_12(s, COND_AL, data_reg2, TCG_REG_R1, 4);
@@ -1578,13 +1615,14 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
         }
         break;
     case 3:
-        /* TODO: use block store -
-         * check that data_reg2 > data_reg or the other way */
         if (bswap) {
             tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg2);
             tcg_out_st32_12(s, COND_AL, TCG_REG_R0, addr_reg, 0);
             tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg);
             tcg_out_st32_12(s, COND_AL, TCG_REG_R0, addr_reg, 4);
+        } else if (use_armv6_instructions
+                   && (data_reg & 1) == 0 && data_reg2 == data_reg + 1) {
+            tcg_out_strd_8(s, COND_AL, data_reg, addr_reg, 0);
         } else {
             tcg_out_st32_12(s, COND_AL, data_reg, addr_reg, 0);
             tcg_out_st32_12(s, COND_AL, data_reg2, addr_reg, 4);
-- 
1.8.3.1

* [Qemu-devel] [PATCH for-next 8/8] tcg-arm: Rearrange slow-path qemu_ld/st
From: Richard Henderson @ 2013-08-05 18:07 UTC
  To: qemu-devel; +Cc: Richard Henderson

Instead of using a branch-call-branch sequence, arrange for a
call-branch sequence, using ARM's conditional call insn.
This reduces the size of the slow path within the TB, and makes
the GETPC_EXT implementation identical for TCG and non-TCG code.
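
Roughly, where the old slow path was (label names illustrative)

	bne	slow		@ branch out of line
    back:	...
    slow:	bl	helper	@ call
	b	back		@ branch back

the new one is

	blne	slow		@ conditional call; lr = back
    back:	...
    slow:	...		@ marshal arguments
	b	helper		@ tail call: helper returns straight to back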

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 include/exec/exec-all.h |  23 +----
 tcg/arm/tcg-target.c    | 269 +++++++++++++++++++++++-------------------------
 2 files changed, 133 insertions(+), 159 deletions(-)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index b70028a..b3402a1 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -333,23 +333,8 @@ extern uintptr_t tci_tb_ptr;
 #  define GETRA() ((uintptr_t)__builtin_return_address(0))
 #  define GETPC_LDST() ((uintptr_t) ((*(int32_t *)(GETRA() - 4)) - 1))
 # elif defined(__arm__)
-/* We define two insns between the return address and the branch back to
-   straight-line.  Find and decode that branch insn.  */
-#  define GETRA()       ((uintptr_t)__builtin_return_address(0))
-#  define GETPC_LDST()  tcg_getpc_ldst(GETRA())
-static inline uintptr_t tcg_getpc_ldst(uintptr_t ra)
-{
-    int32_t b;
-    ra += 8;                    /* skip the two insns */
-    b = *(int32_t *)ra;         /* load the branch insn */
-    b = (b << 8) >> (8 - 2);    /* extract the displacement */
-    ra += 8;                    /* branches are relative to pc+8 */
-    ra += b;                    /* apply the displacement */
-    ra -= 4;                    /* return a pointer into the current opcode,
-                                   not the start of the next opcode  */
-    return ra;
-}
-#elif defined(__aarch64__)
+#  define GETPC_EXT()  GETPC()
+# elif defined(__aarch64__)
 #  define GETRA()       ((uintptr_t)__builtin_return_address(0))
 #  define GETPC_LDST()  tcg_getpc_ldst(GETRA())
 static inline uintptr_t tcg_getpc_ldst(uintptr_t ra)
@@ -367,7 +352,9 @@ static inline uintptr_t tcg_getpc_ldst(uintptr_t ra)
 #  error "CONFIG_QEMU_LDST_OPTIMIZATION needs GETPC_LDST() implementation!"
 # endif
 bool is_tcg_gen_code(uintptr_t pc_ptr);
-# define GETPC_EXT() (is_tcg_gen_code(GETRA()) ? GETPC_LDST() : GETPC())
+# ifndef GETPC_EXT
+#  define GETPC_EXT() (is_tcg_gen_code(GETRA()) ? GETPC_LDST() : GETPC())
+# endif
 #else
 # define GETPC_EXT() GETPC()
 #endif
diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
index 9a14a20..89917d6 100644
--- a/tcg/arm/tcg-target.c
+++ b/tcg/arm/tcg-target.c
@@ -166,10 +166,26 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         break;
 
     case 'r':
+#ifndef CONFIG_SOFTMMU
+    case 'a':
+    case 'b':
+#endif
         ct->ct |= TCG_CT_REG;
         tcg_regset_set32(ct->u.regs, 0, (1 << TCG_TARGET_NB_REGS) - 1);
         break;
 
+#ifdef CONFIG_SOFTMMU
+    /* qemu_ld data registers -- softmmu uses the return registers only.  */
+    case 'a':
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, 1);
+        break;
+    case 'b':
+        ct->ct |= TCG_CT_REG;
+        tcg_regset_set32(ct->u.regs, 0, 2);
+        break;
+#endif
+
     /* qemu_ld address */
     case 'l':
         ct->ct |= TCG_CT_REG;
@@ -182,15 +198,6 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
         tcg_regset_reset_reg(ct->u.regs, TCG_REG_R2);
 #endif
         break;
-    case 'L':
-        ct->ct |= TCG_CT_REG;
-        tcg_regset_set32(ct->u.regs, 0, (1 << TCG_TARGET_NB_REGS) - 1);
-#ifdef CONFIG_SOFTMMU
-        /* r1 is still needed to load data_reg or data_reg2,
-           so don't use it. */
-        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R1);
-#endif
-        break;
 
     /* qemu_st address & data_reg */
     case 's':
@@ -382,13 +389,17 @@ static inline void tcg_out_b_noaddr(TCGContext *s, int cond)
     /* We pay attention here to not modify the branch target by skipping
        the corresponding bytes. This ensure that caches and memory are
        kept coherent during retranslation. */
-#ifdef HOST_WORDS_BIGENDIAN
-    tcg_out8(s, (cond << 4) | 0x0a);
-    s->code_ptr += 3;
-#else
     s->code_ptr += 3;
     tcg_out8(s, (cond << 4) | 0x0a);
-#endif
+}
+
+static inline void tcg_out_bl_noaddr(TCGContext *s, int cond)
+{
+    /* We pay attention here to not modify the branch target by skipping
+       the corresponding bytes. This ensures that caches and memory are
+       kept coherent during retranslation. */
+    s->code_ptr += 3;
+    tcg_out8(s, (cond << 4) | 0x0b);
 }
 
 static inline void tcg_out_bl(TCGContext *s, int cond, int32_t offset)
@@ -1002,34 +1013,27 @@ static inline void tcg_out_st8(TCGContext *s, int cond,
         tcg_out_st8_12(s, cond, rd, rn, offset);
 }
 
-/* The _goto case is normally between TBs within the same code buffer,
- * and with the code buffer limited to 16MB we shouldn't need the long
- * case.
- *
- * .... except to the prologue that is in its own buffer.
+/* The _goto case is normally between TBs within the same code buffer, and
+ * with the code buffer limited to 16MB we wouldn't need the long case.
+ * But we also use it for the tail-call to the qemu_ld/st helpers, which does need it.
  */
 static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr)
 {
-    int32_t val;
+    int32_t disp = addr - (tcg_target_long) s->code_ptr;
 
-    if (addr & 1) {
-        /* goto to a Thumb destination isn't supported */
-        tcg_abort();
+    if ((addr & 1) == 0 && disp - 8 < 0x01fffffd && disp - 8 > -0x01fffffd) {
+        tcg_out_b(s, cond, disp);
+        return;
     }
 
-    val = addr - (tcg_target_long) s->code_ptr;
-    if (val - 8 < 0x01fffffd && val - 8 > -0x01fffffd)
-        tcg_out_b(s, cond, val);
-    else {
-        if (cond == COND_AL) {
-            tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
-            tcg_out32(s, addr);
-        } else {
-            tcg_out_movi32(s, cond, TCG_REG_TMP, val - 8);
-            tcg_out_dat_reg(s, cond, ARITH_ADD,
-                            TCG_REG_PC, TCG_REG_PC,
-                            TCG_REG_TMP, SHIFT_IMM_LSL(0));
+    tcg_out_movi32(s, cond, TCG_REG_TMP, addr);
+    if (use_armv5t_instructions) {
+        tcg_out_bx(s, cond, TCG_REG_TMP);
+    } else {
+        if (addr & 1) {
+            tcg_abort();
         }
+        tcg_out_mov_reg(s, cond, TCG_REG_PC, TCG_REG_TMP);
     }
 }
 
@@ -1087,17 +1091,44 @@ static inline void tcg_out_goto_label(TCGContext *s, int cond, int label_index)
 
 #include "exec/softmmu_defs.h"
 
+static uint32_t arm_ldbu_mmu(CPUArchState *env, target_ulong addr, int idx)
+{
+    return (uint8_t)helper_ret_ldb_mmu(env, addr, idx, GETPC());
+}
+
+static uint32_t arm_ldbs_mmu(CPUArchState *env, target_ulong addr, int idx)
+{
+    return (int8_t)helper_ret_ldb_mmu(env, addr, idx, GETPC());
+}
+
+static uint32_t arm_ldwu_mmu(CPUArchState *env, target_ulong addr, int idx)
+{
+    return (uint16_t)helper_ret_ldw_mmu(env, addr, idx, GETPC());
+}
+
+static uint32_t arm_ldws_mmu(CPUArchState *env, target_ulong addr, int idx)
+{
+    return (int16_t)helper_ret_ldw_mmu(env, addr, idx, GETPC());
+}
+
 /* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
-   int mmu_idx) */
-static const void * const qemu_ld_helpers[4] = {
-    helper_ldb_mmu,
-    helper_ldw_mmu,
+ *                                 int mmu_idx)
+ */
+static const void * const qemu_ld_helpers[8] = {
+    arm_ldbu_mmu,
+    arm_ldwu_mmu,
+    helper_ldl_mmu,
+    helper_ldq_mmu,
+
+    arm_ldbs_mmu,
+    arm_ldws_mmu,
     helper_ldl_mmu,
     helper_ldq_mmu,
 };
 
 /* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
-   uintxx_t val, int mmu_idx) */
+ *                                 uintxx_t val, int mmu_idx)
+ */
 static const void * const qemu_st_helpers[4] = {
     helper_stb_mmu,
     helper_stw_mmu,
@@ -1260,8 +1291,7 @@ static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
 
 static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-    TCGReg argreg, data_reg, data_reg2;
-    uint8_t *start;
+    TCGReg argreg;
 
     reloc_pc24(lb->label_ptr[0], (tcg_target_long)s->code_ptr);
 
@@ -1272,59 +1302,18 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
         argreg = tcg_out_arg_reg32(s, argreg, lb->addrlo_reg);
     }
     argreg = tcg_out_arg_imm32(s, argreg, lb->mem_index);
-    tcg_out_call(s, (tcg_target_long) qemu_ld_helpers[lb->opc & 3]);
-
-    data_reg = lb->datalo_reg;
-    data_reg2 = lb->datahi_reg;
-
-    start = s->code_ptr;
-    switch (lb->opc) {
-    case 0 | 4:
-        tcg_out_ext8s(s, COND_AL, data_reg, TCG_REG_R0);
-        break;
-    case 1 | 4:
-        tcg_out_ext16s(s, COND_AL, data_reg, TCG_REG_R0);
-        break;
-    case 0:
-    case 1:
-    case 2:
-    default:
-        tcg_out_mov_reg(s, COND_AL, data_reg, TCG_REG_R0);
-        break;
-    case 3:
-        tcg_out_mov_reg(s, COND_AL, data_reg, TCG_REG_R0);
-        tcg_out_mov_reg(s, COND_AL, data_reg2, TCG_REG_R1);
-        break;
-    }
 
-    /* For GETPC_LDST in exec-all.h, we architect exactly 2 insns between
-       the call and the branch back to straight-line code.  Note that the
-       moves above could be elided by register allocation, nor do we know
-       which code alternative we chose for extension.  */
-    switch (s->code_ptr - start) {
-    case 0:
-        tcg_out_nop(s);
-        /* FALLTHRU */
-    case 4:
-        tcg_out_nop(s);
-        /* FALLTHRU */
-    case 8:
-        break;
-    default:
-        abort();
-    }
-
-    tcg_out_goto(s, COND_AL, (tcg_target_long)lb->raddr);
+    /* Tail-call to the helper, which will return to the fast path.  */
+    tcg_out_goto(s, COND_AL, (tcg_target_long) qemu_ld_helpers[lb->opc]);
 }
 
 static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
 {
-    TCGReg argreg, data_reg, data_reg2;
+    TCGReg argreg;
 
     reloc_pc24(lb->label_ptr[0], (tcg_target_long)s->code_ptr);
 
-    argreg = TCG_REG_R0;
-    argreg = tcg_out_arg_reg32(s, argreg, TCG_AREG0);
+    argreg = tcg_out_arg_reg32(s, TCG_REG_R0, TCG_AREG0);
     if (TARGET_LONG_BITS == 64) {
         argreg = tcg_out_arg_reg64(s, argreg, lb->addrlo_reg, lb->addrhi_reg);
     } else {
@@ -1349,13 +1338,9 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
     }
 
     argreg = tcg_out_arg_imm32(s, argreg, lb->mem_index);
-    tcg_out_call(s, (tcg_target_long) qemu_st_helpers[lb->opc & 3]);
 
-    /* For GETPC_LDST in exec-all.h, we architect exactly 2 insns between
-       the call and the branch back to straight-line code.  */
-    tcg_out_nop(s);
-    tcg_out_nop(s);
-    tcg_out_goto(s, COND_AL, (tcg_target_long)lb->raddr);
+    /* Tail-call to the helper, which will return to the fast path.  */
+    tcg_out_goto(s, COND_AL, (tcg_target_long) qemu_st_helpers[lb->opc & 3]);
 }
 #endif /* SOFTMMU */
 
@@ -1385,57 +1370,58 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
     tcg_out_tlb_read(s, addr_reg, addr_reg2, s_bits,
                      offsetof(CPUArchState, tlb_table[mem_index][0].addr_read));
 
-    label_ptr = s->code_ptr;
-    tcg_out_b_noaddr(s, COND_NE);
-
     tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R2,
                     offsetof(CPUTLBEntry, addend)
                     - offsetof(CPUTLBEntry, addr_read));
 
     switch (opc) {
     case 0:
-        tcg_out_ld8_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+        tcg_out_ld8_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
         break;
     case 0 | 4:
-        tcg_out_ld8s_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+        tcg_out_ld8s_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
         break;
     case 1:
-        tcg_out_ld16u_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+        tcg_out_ld16u_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
         if (bswap) {
-            tcg_out_bswap16(s, COND_AL, data_reg, data_reg);
+            tcg_out_bswap16(s, COND_EQ, data_reg, data_reg);
         }
         break;
     case 1 | 4:
         if (bswap) {
-            tcg_out_ld16u_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
-            tcg_out_bswap16s(s, COND_AL, data_reg, data_reg);
+            tcg_out_ld16u_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
+            tcg_out_bswap16s(s, COND_EQ, data_reg, data_reg);
         } else {
-            tcg_out_ld16s_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+            tcg_out_ld16s_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
         }
         break;
     case 2:
     default:
-        tcg_out_ld32_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+        tcg_out_ld32_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
         if (bswap) {
-            tcg_out_bswap32(s, COND_AL, data_reg, data_reg);
+            tcg_out_bswap32(s, COND_EQ, data_reg, data_reg);
         }
         break;
     case 3:
         if (bswap) {
-            tcg_out_ld32_rwb(s, COND_AL, data_reg2, TCG_REG_R1, addr_reg);
-            tcg_out_ld32_12(s, COND_AL, data_reg, TCG_REG_R1, 4);
-            tcg_out_bswap32(s, COND_AL, data_reg2, data_reg2);
-            tcg_out_bswap32(s, COND_AL, data_reg, data_reg);
+            tcg_out_ld32_rwb(s, COND_EQ, data_reg2, TCG_REG_R1, addr_reg);
+            tcg_out_ld32_12(s, COND_EQ, data_reg, TCG_REG_R1, 4);
+            tcg_out_bswap32(s, COND_EQ, data_reg2, data_reg2);
+            tcg_out_bswap32(s, COND_EQ, data_reg, data_reg);
         } else if (use_armv6_instructions
                    && (data_reg & 1) == 0 && data_reg2 == data_reg + 1) {
-            tcg_out_ldrd_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+            tcg_out_ldrd_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
         } else {
-            tcg_out_ld32_rwb(s, COND_AL, data_reg, TCG_REG_R1, addr_reg);
-            tcg_out_ld32_12(s, COND_AL, data_reg2, TCG_REG_R1, 4);
+            tcg_out_ld32_rwb(s, COND_EQ, data_reg, TCG_REG_R1, addr_reg);
+            tcg_out_ld32_12(s, COND_EQ, data_reg2, TCG_REG_R1, 4);
         }
         break;
     }
 
+    /* The conditional call must come last, as we're going to return here.  */
+    label_ptr = s->code_ptr;
+    tcg_out_bl_noaddr(s, COND_NE);
+
     add_qemu_ldst_label(s, 1, opc, data_reg, data_reg2, addr_reg, addr_reg2,
                         mem_index, s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
@@ -1531,50 +1517,51 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
                      offsetof(CPUArchState,
                               tlb_table[mem_index][0].addr_write));
 
-    label_ptr = s->code_ptr;
-    tcg_out_b_noaddr(s, COND_NE);
-
     tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R2,
                     offsetof(CPUTLBEntry, addend)
                     - offsetof(CPUTLBEntry, addr_write));
 
     switch (opc) {
     case 0:
-        tcg_out_st8_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+        tcg_out_st8_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
         break;
     case 1:
         if (bswap) {
-            tcg_out_bswap16st(s, COND_AL, TCG_REG_R0, data_reg);
-            tcg_out_st16_r(s, COND_AL, TCG_REG_R0, addr_reg, TCG_REG_R1);
+            tcg_out_bswap16st(s, COND_EQ, TCG_REG_R0, data_reg);
+            tcg_out_st16_r(s, COND_EQ, TCG_REG_R0, addr_reg, TCG_REG_R1);
         } else {
-            tcg_out_st16_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+            tcg_out_st16_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
         }
         break;
     case 2:
     default:
         if (bswap) {
-            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg);
-            tcg_out_st32_r(s, COND_AL, TCG_REG_R0, addr_reg, TCG_REG_R1);
+            tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg);
+            tcg_out_st32_r(s, COND_EQ, TCG_REG_R0, addr_reg, TCG_REG_R1);
         } else {
-            tcg_out_st32_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+            tcg_out_st32_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
         }
         break;
     case 3:
         if (bswap) {
-            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg2);
-            tcg_out_st32_rwb(s, COND_AL, TCG_REG_R0, TCG_REG_R1, addr_reg);
-            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg);
-            tcg_out_st32_12(s, COND_AL, TCG_REG_R0, TCG_REG_R1, 4);
+            tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg2);
+            tcg_out_st32_rwb(s, COND_EQ, TCG_REG_R0, TCG_REG_R1, addr_reg);
+            tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg);
+            tcg_out_st32_12(s, COND_EQ, TCG_REG_R0, TCG_REG_R1, 4);
         } else if (use_armv6_instructions
                    && (data_reg & 1) == 0 && data_reg2 == data_reg + 1) {
-            tcg_out_strd_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
+            tcg_out_strd_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
         } else {
-            tcg_out_st32_rwb(s, COND_AL, data_reg, TCG_REG_R1, addr_reg);
-            tcg_out_st32_12(s, COND_AL, data_reg2, TCG_REG_R1, 4);
+            tcg_out_st32_rwb(s, COND_EQ, data_reg, TCG_REG_R1, addr_reg);
+            tcg_out_st32_12(s, COND_EQ, data_reg2, TCG_REG_R1, 4);
         }
         break;
     }
 
+    /* The conditional call must come last, as we're going to return here.  */
+    label_ptr = s->code_ptr;
+    tcg_out_bl_noaddr(s, COND_NE);
+
     add_qemu_ldst_label(s, 0, opc, data_reg, data_reg2, addr_reg, addr_reg2,
                         mem_index, s->code_ptr, label_ptr);
 #else /* !CONFIG_SOFTMMU */
@@ -2026,24 +2013,24 @@ static const TCGTargetOpDef arm_op_defs[] = {
     { INDEX_op_setcond2_i32, { "r", "r", "r", "rIN", "rIN" } },
 
 #if TARGET_LONG_BITS == 32
-    { INDEX_op_qemu_ld8u, { "r", "l" } },
-    { INDEX_op_qemu_ld8s, { "r", "l" } },
-    { INDEX_op_qemu_ld16u, { "r", "l" } },
-    { INDEX_op_qemu_ld16s, { "r", "l" } },
-    { INDEX_op_qemu_ld32, { "r", "l" } },
-    { INDEX_op_qemu_ld64, { "L", "L", "l" } },
+    { INDEX_op_qemu_ld8u, { "a", "l" } },
+    { INDEX_op_qemu_ld8s, { "a", "l" } },
+    { INDEX_op_qemu_ld16u, { "a", "l" } },
+    { INDEX_op_qemu_ld16s, { "a", "l" } },
+    { INDEX_op_qemu_ld32, { "a", "l" } },
+    { INDEX_op_qemu_ld64, { "a", "b", "l" } },
 
     { INDEX_op_qemu_st8, { "s", "s" } },
     { INDEX_op_qemu_st16, { "s", "s" } },
     { INDEX_op_qemu_st32, { "s", "s" } },
     { INDEX_op_qemu_st64, { "s", "s", "s" } },
 #else
-    { INDEX_op_qemu_ld8u, { "r", "l", "l" } },
-    { INDEX_op_qemu_ld8s, { "r", "l", "l" } },
-    { INDEX_op_qemu_ld16u, { "r", "l", "l" } },
-    { INDEX_op_qemu_ld16s, { "r", "l", "l" } },
-    { INDEX_op_qemu_ld32, { "r", "l", "l" } },
-    { INDEX_op_qemu_ld64, { "L", "L", "l", "l" } },
+    { INDEX_op_qemu_ld8u, { "a", "l", "l" } },
+    { INDEX_op_qemu_ld8s, { "a", "l", "l" } },
+    { INDEX_op_qemu_ld16u, { "a", "l", "l" } },
+    { INDEX_op_qemu_ld16s, { "a", "l", "l" } },
+    { INDEX_op_qemu_ld32, { "a", "l", "l" } },
+    { INDEX_op_qemu_ld64, { "a", "b", "l", "l" } },
 
     { INDEX_op_qemu_st8, { "s", "s", "s" } },
     { INDEX_op_qemu_st16, { "s", "s", "s" } },
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 21+ messages in thread
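
A note on the 8-entry qemu_ld_helpers table in the patch above: lb->opc
packs log2 of the access size in bits 1:0 and the sign-extension flag
in bit 2, so the slow path can index the table with the raw opc value;
entries 2 and 3 repeat because 32- and 64-bit loads have no separate
signed variant.  A minimal restatement of that layout (toy names, not
QEMU's):

    /* opc layout: bits 1:0 = log2(size), bit 2 = sign-extend. */
    static const char *const ld_helper_names[8] = {
        "ldbu", "ldwu", "ldl", "ldq",   /* opc 0..3, zero-extending */
        "ldbs", "ldws", "ldl", "ldq",   /* opc 4..7, sign-extending */
    };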

* Re: [Qemu-devel] [PATCH for-next 1/8] tcg-i386: Add and use tcg_out64
  2013-08-05 18:07 ` [Qemu-devel] [PATCH for-next 1/8] tcg-i386: Add and use tcg_out64 Richard Henderson
@ 2013-08-15 15:54   ` Aurelien Jarno
  0 siblings, 0 replies; 21+ messages in thread
From: Aurelien Jarno @ 2013-08-15 15:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Aug 05, 2013 at 08:07:18AM -1000, Richard Henderson wrote:
> No point in splitting the write into 32-bit pieces.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/i386/tcg-target.c | 3 +--
>  tcg/tcg.c             | 6 ++++++
>  2 files changed, 7 insertions(+), 2 deletions(-)
> 
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index 87eeab3..841bd75 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -552,8 +552,7 @@ static void tcg_out_movi(TCGContext *s, TCGType type,
>          tcg_out32(s, arg);
>      } else {
>          tcg_out_opc(s, OPC_MOVL_Iv + P_REXW + LOWREGMASK(ret), 0, ret, 0);
> -        tcg_out32(s, arg);
> -        tcg_out32(s, arg >> 31 >> 1);
> +        tcg_out64(s, arg);
>      }
>  }
>  
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index dac8224..9355b57 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -131,6 +131,12 @@ static inline void tcg_out32(TCGContext *s, uint32_t v)
>      s->code_ptr += 4;
>  }
>  
> +static inline void tcg_out64(TCGContext *s, uint64_t v)
> +{
> +    *(uint64_t *)s->code_ptr = v;
> +    s->code_ptr += 8;
> +}
> +
>  /* label relocation processing */
>  
>  static void tcg_out_reloc(TCGContext *s, uint8_t *code_ptr, int type,

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-next 2/8] tcg-i386: Try pc-relative lea for constant formation
  2013-08-05 18:07 ` [Qemu-devel] [PATCH for-next 2/8] tcg-i386: Try pc-relative lea for constant formation Richard Henderson
@ 2013-08-15 15:54   ` Aurelien Jarno
  0 siblings, 0 replies; 21+ messages in thread
From: Aurelien Jarno @ 2013-08-15 15:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Aug 05, 2013 at 08:07:19AM -1000, Richard Henderson wrote:
> Use a 7 byte lea before the ultimate 10 byte movq.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/i386/tcg-target.c | 19 ++++++++++++++++---
>  1 file changed, 16 insertions(+), 3 deletions(-)
> 
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index 841bd75..456bd9e 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -541,19 +541,32 @@ static inline void tcg_out_mov(TCGContext *s, TCGType type,
>  static void tcg_out_movi(TCGContext *s, TCGType type,
>                           TCGReg ret, tcg_target_long arg)
>  {
> +    tcg_target_long diff;
> +
>      if (arg == 0) {
>          tgen_arithr(s, ARITH_XOR, ret, ret);
>          return;
>      } else if (arg == (uint32_t)arg || type == TCG_TYPE_I32) {
>          tcg_out_opc(s, OPC_MOVL_Iv + LOWREGMASK(ret), 0, ret, 0);
>          tcg_out32(s, arg);
> +        return;
>      } else if (arg == (int32_t)arg) {
>          tcg_out_modrm(s, OPC_MOVL_EvIz + P_REXW, 0, ret);
>          tcg_out32(s, arg);
> -    } else {
> -        tcg_out_opc(s, OPC_MOVL_Iv + P_REXW + LOWREGMASK(ret), 0, ret, 0);
> -        tcg_out64(s, arg);
> +        return;
>      }

Now that all the else parts end up with a return, it would improve
readability to remove them and keep only the ifs. 
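
Roughly the shape being suggested, with every arm ending in a return so
the else chains disappear, sketched on a toy strategy selector rather
than the real tcg_out_movi (the function name and return codes are
ours):

    #include <assert.h>
    #include <stdint.h>

    /* One early return per encoding, in the order the patch tries them:
       xor, 5-byte movl, 7-byte sign-extended movl, 7-byte pc-relative
       lea, then the 10-byte movq. */
    static int pick_movi_encoding(int64_t arg, int64_t code_ptr)
    {
        if (arg == 0) {
            return 0;                         /* xor reg,reg */
        }
        if (arg == (uint32_t)arg) {
            return 1;                         /* movl $imm32 */
        }
        if (arg == (int32_t)arg) {
            return 2;                         /* movl, sign-extended */
        }
        int64_t diff = arg - (code_ptr + 7);  /* 7 = length of the lea */
        if (diff == (int32_t)diff) {
            return 3;                         /* lea disp32(%rip) */
        }
        return 4;                             /* movq $imm64 */
    }

    int main(void)
    {
        assert(pick_movi_encoding(0, 0) == 0);
        assert(pick_movi_encoding(0x12345678, 0) == 1);
        assert(pick_movi_encoding(-1, 0) == 2);
        return 0;
    }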

> +
> +    /* Try a 7 byte pc-relative lea before the 10 byte movq.  */
> +    diff = arg - ((tcg_target_long)s->code_ptr + 7);
> +    if (diff == (int32_t)diff) {
> +        tcg_out_opc(s, OPC_LEA | P_REXW, ret, 0, 0);
> +        tcg_out8(s, (LOWREGMASK(ret) << 3) | 5);
> +        tcg_out32(s, diff);
> +        return;
> +    }
> +
> +    tcg_out_opc(s, OPC_MOVL_Iv + P_REXW + LOWREGMASK(ret), 0, ret, 0);
> +    tcg_out64(s, arg);
>  }
>  
>  static inline void tcg_out_pushi(TCGContext *s, tcg_target_long val)

Otherwise it looks good.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-next 3/8] tcg-i386: Tidy qemu_ld/st slow path
  2013-08-05 18:07 ` [Qemu-devel] [PATCH for-next 3/8] tcg-i386: Tidy qemu_ld/st slow path Richard Henderson
@ 2013-08-15 15:54   ` Aurelien Jarno
  0 siblings, 0 replies; 21+ messages in thread
From: Aurelien Jarno @ 2013-08-15 15:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Aug 05, 2013 at 08:07:20AM -1000, Richard Henderson wrote:
> Use existing stack space for arguments; don't push/pop.
> Use less ifdefs and more C ifs.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  tcg/i386/tcg-target.c | 159 +++++++++++++++++++++-----------------------------
>  1 file changed, 68 insertions(+), 91 deletions(-)
> 
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index 456bd9e..8addfa1 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -1461,22 +1461,12 @@ static void add_qemu_ldst_label(TCGContext *s,
>  /*
>   * Generate code for the slow path for a load at the end of block
>   */
> -static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
> +static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
>  {
> -    int s_bits;
> -    int opc = label->opc;
> -    int mem_index = label->mem_index;
> -#if TCG_TARGET_REG_BITS == 32
> -    int stack_adjust;
> -    int addrlo_reg = label->addrlo_reg;
> -    int addrhi_reg = label->addrhi_reg;
> -#endif
> -    int data_reg = label->datalo_reg;
> -    int data_reg2 = label->datahi_reg;
> -    uint8_t *raddr = label->raddr;
> -    uint8_t **label_ptr = &label->label_ptr[0];
> -
> -    s_bits = opc & 3;
> +    int opc = l->opc;
> +    int s_bits = opc & 3;
> +    TCGReg data_reg;
> +    uint8_t **label_ptr = &l->label_ptr[0];
>  
>      /* resolve label address */
>      *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
> @@ -1484,22 +1474,28 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
>          *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
>      }
>  
> -#if TCG_TARGET_REG_BITS == 32
> -    tcg_out_pushi(s, mem_index);
> -    stack_adjust = 4;
> -    if (TARGET_LONG_BITS == 64) {
> -        tcg_out_push(s, addrhi_reg);
> -        stack_adjust += 4;
> +    if (TCG_TARGET_REG_BITS == 32) {
> +        int ofs = 0;
> +
> +        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
> +        ofs += 4;
> +
> +        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
> +        ofs += 4;
> +
> +        if (TARGET_LONG_BITS == 64) {
> +            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
> +            ofs += 4;
> +        }
> +
> +        tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_EAX, l->mem_index);
> +        tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);

Just as the previous code was able to push an immediate, it should
be possible to store an immediate value directly here:

            tcg_out_modrm_offset(s, OPC_MOVL_EvIz, 0, TCG_REG_ESP, ofs);
            tcg_out32(s, l->mem_index);
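
For reference, the encoding that suggestion produces is compact: opcode
0xC7 /0, an ESP-based modrm/SIB pair, a disp8, then the 4-byte
immediate.  A throwaway encoder for just this form (ours, not QEMU's;
it always uses the mod=01/disp8 form, though a zero offset could use
the shorter mod=00 form):

    #include <stdint.h>
    #include <string.h>

    /* Emit "movl $imm32, ofs(%esp)" into buf; returns bytes written (8). */
    static int emit_movl_imm_esp(uint8_t *buf, int8_t ofs, uint32_t imm)
    {
        uint8_t *p = buf;
        *p++ = 0xc7;            /* MOV r/m32, imm32 */
        *p++ = 0x44;            /* modrm: mod=01 (disp8), /0, rm=SIB */
        *p++ = 0x24;            /* SIB: base = %esp, no index */
        *p++ = (uint8_t)ofs;    /* 8-bit displacement */
        memcpy(p, &imm, 4);     /* little-endian immediate */
        return (int)(p - buf) + 4;
    }

    int main(void)
    {
        uint8_t buf[8];
        return emit_movl_imm_esp(buf, 4, 0xdeadbeef) == 8 ? 0 : 1;
    }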

> +    } else {
> +        tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
> +        /* The second argument is already loaded with addrlo.  */
> +        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2],
> +                     l->mem_index);

>      }
> -    tcg_out_push(s, addrlo_reg);
> -    stack_adjust += 4;
> -    tcg_out_push(s, TCG_AREG0);
> -    stack_adjust += 4;
> -#else
> -    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
> -    /* The second argument is already loaded with addrlo.  */
> -    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2], mem_index);
> -#endif
>  
>      /* Code generation of qemu_ld/st's slow path calling MMU helper
>  
> @@ -1518,18 +1514,10 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
>      tcg_out8(s, 5);
>      /* Dummy backward jump having information of fast path'pc for MMU helpers */
>      tcg_out8(s, OPC_JMP_long);
> -    *(int32_t *)s->code_ptr = (int32_t)(raddr - s->code_ptr - 4);
> +    *(int32_t *)s->code_ptr = (int32_t)(l->raddr - s->code_ptr - 4);
>      s->code_ptr += 4;
>  
> -#if TCG_TARGET_REG_BITS == 32
> -    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
> -        /* Pop and discard.  This is 2 bytes smaller than the add.  */
> -        tcg_out_pop(s, TCG_REG_ECX);
> -    } else if (stack_adjust != 0) {
> -        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
> -    }
> -#endif
> -
> +    data_reg = l->datalo_reg;
>      switch(opc) {
>      case 0 | 4:
>          tcg_out_ext8s(s, data_reg, TCG_REG_EAX, P_REXW);
> @@ -1557,10 +1545,10 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
>          } else if (data_reg == TCG_REG_EDX) {
>              /* xchg %edx, %eax */
>              tcg_out_opc(s, OPC_XCHG_ax_r32 + TCG_REG_EDX, 0, 0, 0);
> -            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EAX);
> +            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EAX);
>          } else {
>              tcg_out_mov(s, TCG_TYPE_I32, data_reg, TCG_REG_EAX);
> -            tcg_out_mov(s, TCG_TYPE_I32, data_reg2, TCG_REG_EDX);
> +            tcg_out_mov(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_EDX);
>          }
>          break;
>      default:
> @@ -1568,28 +1556,17 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
>      }
>  
>      /* Jump to the code corresponding to next IR of qemu_st */
> -    tcg_out_jmp(s, (tcg_target_long)raddr);
> +    tcg_out_jmp(s, (tcg_target_long)l->raddr);
>  }
>  
>  /*
>   * Generate code for the slow path for a store at the end of block
>   */
> -static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
> +static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
>  {
> -    int s_bits;
> -    int stack_adjust;
> -    int opc = label->opc;
> -    int mem_index = label->mem_index;
> -    int data_reg = label->datalo_reg;
> -#if TCG_TARGET_REG_BITS == 32
> -    int data_reg2 = label->datahi_reg;
> -    int addrlo_reg = label->addrlo_reg;
> -    int addrhi_reg = label->addrhi_reg;
> -#endif
> -    uint8_t *raddr = label->raddr;
> -    uint8_t **label_ptr = &label->label_ptr[0];
> -
> -    s_bits = opc & 3;
> +    int opc = l->opc;
> +    int s_bits = opc & 3;
> +    uint8_t **label_ptr = &l->label_ptr[0];
>  
>      /* resolve label address */
>      *(uint32_t *)label_ptr[0] = (uint32_t)(s->code_ptr - label_ptr[0] - 4);
> @@ -1597,31 +1574,38 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
>          *(uint32_t *)label_ptr[1] = (uint32_t)(s->code_ptr - label_ptr[1] - 4);
>      }
>  
> -#if TCG_TARGET_REG_BITS == 32
> -    tcg_out_pushi(s, mem_index);
> -    stack_adjust = 4;
> -    if (opc == 3) {
> -        tcg_out_push(s, data_reg2);
> -        stack_adjust += 4;
> -    }
> -    tcg_out_push(s, data_reg);
> -    stack_adjust += 4;
> -    if (TARGET_LONG_BITS == 64) {
> -        tcg_out_push(s, addrhi_reg);
> -        stack_adjust += 4;
> +    if (TCG_TARGET_REG_BITS == 32) {
> +        int ofs = 0;
> +
> +        tcg_out_st(s, TCG_TYPE_PTR, TCG_AREG0, TCG_REG_ESP, ofs);
> +        ofs += 4;
> +
> +        tcg_out_st(s, TCG_TYPE_I32, l->addrlo_reg, TCG_REG_ESP, ofs);
> +        ofs += 4;
> +
> +        if (TARGET_LONG_BITS == 64) {
> +            tcg_out_st(s, TCG_TYPE_I32, l->addrhi_reg, TCG_REG_ESP, ofs);
> +            ofs += 4;
> +        }
> +
> +        tcg_out_st(s, TCG_TYPE_I32, l->datalo_reg, TCG_REG_ESP, ofs);
> +        ofs += 4;
> +
> +        if (opc == 3) {
> +            tcg_out_st(s, TCG_TYPE_I32, l->datahi_reg, TCG_REG_ESP, ofs);
> +            ofs += 4;
> +        }
> +
> +        tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_EAX, l->mem_index);
> +        tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);

Same here.

> +    } else {
> +        tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
> +        /* The second argument is already loaded with addrlo.  */
> +        tcg_out_mov(s, (opc == 3 ? TCG_TYPE_I64 : TCG_TYPE_I32),
> +                    tcg_target_call_iarg_regs[2], l->datalo_reg);
> +        tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
> +                     l->mem_index);
>      }
> -    tcg_out_push(s, addrlo_reg);
> -    stack_adjust += 4;
> -    tcg_out_push(s, TCG_AREG0);
> -    stack_adjust += 4;
> -#else
> -    tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
> -    /* The second argument is already loaded with addrlo.  */
> -    tcg_out_mov(s, (opc == 3 ? TCG_TYPE_I64 : TCG_TYPE_I32),
> -                tcg_target_call_iarg_regs[2], data_reg);
> -    tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3], mem_index);
> -    stack_adjust = 0;
> -#endif
>  
>      /* Code generation of qemu_ld/st's slow path calling MMU helper
>  
> @@ -1640,18 +1624,11 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *label)
>      tcg_out8(s, 5);
>      /* Dummy backward jump having information of fast path'pc for MMU helpers */
>      tcg_out8(s, OPC_JMP_long);
> -    *(int32_t *)s->code_ptr = (int32_t)(raddr - s->code_ptr - 4);
> +    *(int32_t *)s->code_ptr = (int32_t)(l->raddr - s->code_ptr - 4);
>      s->code_ptr += 4;
>  
> -    if (stack_adjust == (TCG_TARGET_REG_BITS / 8)) {
> -        /* Pop and discard.  This is 2 bytes smaller than the add.  */
> -        tcg_out_pop(s, TCG_REG_ECX);
> -    } else if (stack_adjust != 0) {
> -        tcg_out_addi(s, TCG_REG_CALL_STACK, stack_adjust);
> -    }
> -
>      /* Jump to the code corresponding to next IR of qemu_st */
> -    tcg_out_jmp(s, (tcg_target_long)raddr);
> +    tcg_out_jmp(s, (tcg_target_long)l->raddr);
>  }
>  

The rest looks fine to me.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-next 4/8] tcg: Add mmu helpers that take a return address argument
  2013-08-05 18:07 ` [Qemu-devel] [PATCH for-next 4/8] tcg: Add mmu helpers that take a return address argument Richard Henderson
@ 2013-08-15 15:54   ` Aurelien Jarno
  2013-08-15 20:45     ` Richard Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-08-15 15:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Aug 05, 2013 at 08:07:21AM -1000, Richard Henderson wrote:
> Allow the code that tcg generates to be less obtuse, passing in
> the return address directly instead of computing it in the helper.
> 
> Maintain the old entrance point unchanged as an alternate entry point.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/exec/softmmu_defs.h     | 46 ++++++++++++++++++++++++++---------------
>  include/exec/softmmu_template.h | 42 +++++++++++++++++++++++--------------
>  2 files changed, 55 insertions(+), 33 deletions(-)
> 
> diff --git a/include/exec/softmmu_defs.h b/include/exec/softmmu_defs.h
> index 1f25e33..e55e717 100644
> --- a/include/exec/softmmu_defs.h
> +++ b/include/exec/softmmu_defs.h
> @@ -9,29 +9,41 @@
>  #ifndef SOFTMMU_DEFS_H
>  #define SOFTMMU_DEFS_H
>  
> +uint8_t helper_ret_ldb_mmu(CPUArchState *env, target_ulong addr,
> +                           int mmu_idx, uintptr_t retaddr);
> +uint16_t helper_ret_ldw_mmu(CPUArchState *env, target_ulong addr,
> +                            int mmu_idx, uintptr_t retaddr);
> +uint32_t helper_ret_ldl_mmu(CPUArchState *env, target_ulong addr,
> +                            int mmu_idx, uintptr_t retaddr);
> +uint64_t helper_ret_ldq_mmu(CPUArchState *env, target_ulong addr,
> +                            int mmu_idx, uintptr_t retaddr);
> +
> +void helper_ret_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
> +                        int mmu_idx, uintptr_t retaddr);
> +void helper_ret_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
> +                        int mmu_idx, uintptr_t retaddr);
> +void helper_ret_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
> +                        int mmu_idx, uintptr_t retaddr);
> +void helper_ret_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
> +                        int mmu_idx, uintptr_t retaddr);
> +
>  uint8_t helper_ldb_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
> -void helper_stb_mmu(CPUArchState *env, target_ulong addr, uint8_t val,
> -                    int mmu_idx);
>  uint16_t helper_ldw_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
> -void helper_stw_mmu(CPUArchState *env, target_ulong addr, uint16_t val,
> -                    int mmu_idx);
>  uint32_t helper_ldl_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
> -void helper_stl_mmu(CPUArchState *env, target_ulong addr, uint32_t val,
> -                    int mmu_idx);
>  uint64_t helper_ldq_mmu(CPUArchState *env, target_ulong addr, int mmu_idx);
> -void helper_stq_mmu(CPUArchState *env, target_ulong addr, uint64_t val,
> -                    int mmu_idx);
> +
> +void helper_stb_mmu(CPUArchState *env, target_ulong addr,
> +                    uint8_t val, int mmu_idx);
> +void helper_stw_mmu(CPUArchState *env, target_ulong addr,
> +                    uint16_t val, int mmu_idx);
> +void helper_stl_mmu(CPUArchState *env, target_ulong addr,
> +                    uint32_t val, int mmu_idx);
> +void helper_stq_mmu(CPUArchState *env, target_ulong addr,
> +                    uint64_t val, int mmu_idx);
>  
>  uint8_t helper_ldb_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
> -void helper_stb_cmmu(CPUArchState *env, target_ulong addr, uint8_t val,
> -int mmu_idx);
>  uint16_t helper_ldw_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
> -void helper_stw_cmmu(CPUArchState *env, target_ulong addr, uint16_t val,
> -                     int mmu_idx);
>  uint32_t helper_ldl_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
> -void helper_stl_cmmu(CPUArchState *env, target_ulong addr, uint32_t val,
> -                     int mmu_idx);
>  uint64_t helper_ldq_cmmu(CPUArchState *env, target_ulong addr, int mmu_idx);
> -void helper_stq_cmmu(CPUArchState *env, target_ulong addr, uint64_t val,
> -                     int mmu_idx);
> -#endif
> +
> +#endif /* SOFTMMU_DEFS_H */

Why remove these st*_cmmu versions? There might be a good reason, but
it should be stated in the patch description.

> diff --git a/include/exec/softmmu_template.h b/include/exec/softmmu_template.h
> index 8584902..7d8bcb5 100644
> --- a/include/exec/softmmu_template.h
> +++ b/include/exec/softmmu_template.h
> @@ -78,15 +78,18 @@ static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>  }
>  
>  /* handle all cases except unaligned access which span two pages */
> +#ifdef SOFTMMU_CODE_ACCESS
> +static
> +#endif
>  DATA_TYPE
> -glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
> -                                         int mmu_idx)
> +glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
> +                                             target_ulong addr, int mmu_idx,
> +                                             uintptr_t retaddr)
>  {
>      DATA_TYPE res;
>      int index;
>      target_ulong tlb_addr;
>      hwaddr ioaddr;
> -    uintptr_t retaddr;
>  
>      /* test if there is match for unaligned or IO access */
>      /* XXX: could done more in memory macro in a non portable way */
> @@ -98,13 +101,11 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
>              /* IO access */
>              if ((addr & (DATA_SIZE - 1)) != 0)
>                  goto do_unaligned_access;
> -            retaddr = GETPC_EXT();
>              ioaddr = env->iotlb[mmu_idx][index];
>              res = glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
>          } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
>              /* slow unaligned access (it spans two pages or IO) */
>          do_unaligned_access:
> -            retaddr = GETPC_EXT();
>  #ifdef ALIGNED_ONLY
>              do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>  #endif
> @@ -115,7 +116,6 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
>              uintptr_t addend;
>  #ifdef ALIGNED_ONLY
>              if ((addr & (DATA_SIZE - 1)) != 0) {
> -                retaddr = GETPC_EXT();
>                  do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>              }
>  #endif
> @@ -124,8 +124,6 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
>                                                  (addr + addend));
>          }
>      } else {
> -        /* the page is not in the TLB : fill it */
> -        retaddr = GETPC_EXT();
>  #ifdef ALIGNED_ONLY
>          if ((addr & (DATA_SIZE - 1)) != 0)
>              do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> @@ -136,6 +134,14 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
>      return res;
>  }
>  
> +DATA_TYPE
> +glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
> +                                         int mmu_idx)
> +{
> +    return glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(env, addr, mmu_idx,
> +                                                        GETPC_EXT());
> +}
> +
>  /* handle all unaligned cases */
>  static DATA_TYPE
>  glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
> @@ -214,13 +220,13 @@ static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>      io_mem_write(mr, physaddr, val, 1 << SHIFT);
>  }
>  
> -void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
> -                                              target_ulong addr, DATA_TYPE val,
> -                                              int mmu_idx)
> +void
> +glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
> +                                             target_ulong addr, DATA_TYPE val,
> +                                             int mmu_idx, uintptr_t retaddr)
>  {
>      hwaddr ioaddr;
>      target_ulong tlb_addr;
> -    uintptr_t retaddr;
>      int index;
>  
>      index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> @@ -231,12 +237,10 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>              /* IO access */
>              if ((addr & (DATA_SIZE - 1)) != 0)
>                  goto do_unaligned_access;
> -            retaddr = GETPC_EXT();
>              ioaddr = env->iotlb[mmu_idx][index];
>              glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
>          } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
>          do_unaligned_access:
> -            retaddr = GETPC_EXT();
>  #ifdef ALIGNED_ONLY
>              do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
>  #endif
> @@ -247,7 +251,6 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>              uintptr_t addend;
>  #ifdef ALIGNED_ONLY
>              if ((addr & (DATA_SIZE - 1)) != 0) {
> -                retaddr = GETPC_EXT();
>                  do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
>              }
>  #endif
> @@ -257,7 +260,6 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>          }
>      } else {
>          /* the page is not in the TLB : fill it */
> -        retaddr = GETPC_EXT();
>  #ifdef ALIGNED_ONLY
>          if ((addr & (DATA_SIZE - 1)) != 0)
>              do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
> @@ -267,6 +269,14 @@ void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>      }
>  }
>  
> +void
> +glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
> +                                         DATA_TYPE val, int mmu_idx)
> +{
> +    glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(env, addr, val, mmu_idx,
> +                                                 GETPC_EXT());
> +}
> +
>  /* handles all unaligned cases */
>  static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>                                                     target_ulong addr,

The rest looks fine.


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [Qemu-devel] [PATCH for-next 5/8] tcg: Tidy softmmu_template.h
  2013-08-05 18:07 ` [Qemu-devel] [PATCH for-next 5/8] tcg: Tidy softmmu_template.h Richard Henderson
@ 2013-08-15 15:54   ` Aurelien Jarno
  0 siblings, 0 replies; 21+ messages in thread
From: Aurelien Jarno @ 2013-08-15 15:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Aug 05, 2013 at 08:07:22AM -1000, Richard Henderson wrote:
> Avoid a loop in the tlb_fill path; the fill will either succeed or
> generate an exception.
> 
> Inline the slow_ld/st function; it was a complete copy of the main
> helper except for the actual cross-page unaligned code, and the
> compiler was inlining it anyway.
> 
> Add unlikely markers optimizing for the most common case of simple
> tlb miss.
> 
> Make sure the compiler can optimize away the unaligned paths for a
> 1 byte access.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/exec/softmmu_template.h | 287 +++++++++++++++-------------------------
>  1 file changed, 104 insertions(+), 183 deletions(-)
> 
> diff --git a/include/exec/softmmu_template.h b/include/exec/softmmu_template.h
> index 7d8bcb5..03e5155 100644
> --- a/include/exec/softmmu_template.h
> +++ b/include/exec/softmmu_template.h
> @@ -54,10 +54,6 @@
>  #define ADDR_READ addr_read
>  #endif
>  
> -static DATA_TYPE glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
> -                                                        target_ulong addr,
> -                                                        int mmu_idx,
> -                                                        uintptr_t retaddr);
>  static inline DATA_TYPE glue(io_read, SUFFIX)(CPUArchState *env,
>                                                hwaddr physaddr,
>                                                target_ulong addr,
> @@ -86,52 +82,67 @@ glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>                                               target_ulong addr, int mmu_idx,
>                                               uintptr_t retaddr)
>  {
> -    DATA_TYPE res;
> -    int index;
> -    target_ulong tlb_addr;
> -    hwaddr ioaddr;
> +    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
> +    uintptr_t haddr;
>  
> -    /* test if there is match for unaligned or IO access */
> -    /* XXX: could done more in memory macro in a non portable way */
> -    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> - redo:
> -    tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
> -    if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
> -        if (tlb_addr & ~TARGET_PAGE_MASK) {
> -            /* IO access */
> -            if ((addr & (DATA_SIZE - 1)) != 0)
> -                goto do_unaligned_access;
> -            ioaddr = env->iotlb[mmu_idx][index];
> -            res = glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
> -        } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
> -            /* slow unaligned access (it spans two pages or IO) */
> -        do_unaligned_access:
> +    /* If the TLB entry is for a different page, reload and try again.  */
> +    if ((addr & TARGET_PAGE_MASK)

Checkpatch complains about trailing whitespace here.

> +         != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
>  #ifdef ALIGNED_ONLY
> +        if ((addr & (DATA_SIZE - 1)) != 0) {
>              do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> +        }
>  #endif
> -            res = glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(env, addr,
> -                                                         mmu_idx, retaddr);
> -        } else {
> -            /* unaligned/aligned access in the same page */
> -            uintptr_t addend;
> -#ifdef ALIGNED_ONLY
> -            if ((addr & (DATA_SIZE - 1)) != 0) {
> -                do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> -            }
> -#endif
> -            addend = env->tlb_table[mmu_idx][index].addend;
> -            res = glue(glue(ld, USUFFIX), _raw)((uint8_t *)(intptr_t)
> -                                                (addr + addend));
> +        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> +        tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
> +    }
> +
> +    /* Handle an IO access.  */
> +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> +        hwaddr ioaddr;
> +        if ((addr & (DATA_SIZE - 1)) != 0) {
> +            goto do_unaligned_access;
>          }
> -    } else {
> +        ioaddr = env->iotlb[mmu_idx][index];
> +        return glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
> +    }
> +
> +    /* Handle slow unaligned access (it spans two pages or IO).  */
> +    if (DATA_SIZE > 1
> +        && unlikely((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> +                    >= TARGET_PAGE_SIZE)) {
> +        target_ulong addr1, addr2;
> +        DATA_TYPE res1, res2, res;
> +        unsigned shift;
> +    do_unaligned_access:
>  #ifdef ALIGNED_ONLY
> -        if ((addr & (DATA_SIZE - 1)) != 0)
> -            do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> +        do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>  #endif
> -        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> -        goto redo;
> +        addr1 = addr & ~(DATA_SIZE - 1);
> +        addr2 = addr1 + DATA_SIZE;
> +        res1 = glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(env, addr1,
> +                                                            mmu_idx, retaddr);
> +        res2 = glue(glue(helper_ret_ld, SUFFIX), MMUSUFFIX)(env, addr2,
> +                                                            mmu_idx, retaddr);
> +        shift = (addr & (DATA_SIZE - 1)) * 8;
> +#ifdef TARGET_WORDS_BIGENDIAN
> +        res = (res1 << shift) | (res2 >> ((DATA_SIZE * 8) - shift));
> +#else
> +        res = (res1 >> shift) | (res2 << ((DATA_SIZE * 8) - shift));
> +#endif
> +        return res;
> +    }
> +
> +    /* Handle aligned access or unaligned access in the same page.  */
> +#ifdef ALIGNED_ONLY
> +    if ((addr & (DATA_SIZE - 1)) != 0) {
> +        do_unaligned_access(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
>      }
> -    return res;
> +#endif
> +
> +    haddr = addr + env->tlb_table[mmu_idx][index].addend;
> +    return glue(glue(ld, USUFFIX), _raw)((uint8_t *)haddr);
>  }
>  
>  DATA_TYPE
> @@ -142,66 +153,8 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
>                                                          GETPC_EXT());
>  }
>  
> -/* handle all unaligned cases */
> -static DATA_TYPE
> -glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
> -                                       target_ulong addr,
> -                                       int mmu_idx,
> -                                       uintptr_t retaddr)
> -{
> -    DATA_TYPE res, res1, res2;
> -    int index, shift;
> -    hwaddr ioaddr;
> -    target_ulong tlb_addr, addr1, addr2;
> -
> -    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> - redo:
> -    tlb_addr = env->tlb_table[mmu_idx][index].ADDR_READ;
> -    if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
> -        if (tlb_addr & ~TARGET_PAGE_MASK) {
> -            /* IO access */
> -            if ((addr & (DATA_SIZE - 1)) != 0)
> -                goto do_unaligned_access;
> -            ioaddr = env->iotlb[mmu_idx][index];
> -            res = glue(io_read, SUFFIX)(env, ioaddr, addr, retaddr);
> -        } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
> -        do_unaligned_access:
> -            /* slow unaligned access (it spans two pages) */
> -            addr1 = addr & ~(DATA_SIZE - 1);
> -            addr2 = addr1 + DATA_SIZE;
> -            res1 = glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(env, addr1,
> -                                                          mmu_idx, retaddr);
> -            res2 = glue(glue(slow_ld, SUFFIX), MMUSUFFIX)(env, addr2,
> -                                                          mmu_idx, retaddr);
> -            shift = (addr & (DATA_SIZE - 1)) * 8;
> -#ifdef TARGET_WORDS_BIGENDIAN
> -            res = (res1 << shift) | (res2 >> ((DATA_SIZE * 8) - shift));
> -#else
> -            res = (res1 >> shift) | (res2 << ((DATA_SIZE * 8) - shift));
> -#endif
> -            res = (DATA_TYPE)res;
> -        } else {
> -            /* unaligned/aligned access in the same page */
> -            uintptr_t addend = env->tlb_table[mmu_idx][index].addend;
> -            res = glue(glue(ld, USUFFIX), _raw)((uint8_t *)(intptr_t)
> -                                                (addr + addend));
> -        }
> -    } else {
> -        /* the page is not in the TLB : fill it */
> -        tlb_fill(env, addr, READ_ACCESS_TYPE, mmu_idx, retaddr);
> -        goto redo;
> -    }
> -    return res;
> -}
> -
>  #ifndef SOFTMMU_CODE_ACCESS
>  
> -static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
> -                                                   target_ulong addr,
> -                                                   DATA_TYPE val,
> -                                                   int mmu_idx,
> -                                                   uintptr_t retaddr);
> -
>  static inline void glue(io_write, SUFFIX)(CPUArchState *env,
>                                            hwaddr physaddr,
>                                            DATA_TYPE val,
> @@ -225,48 +178,66 @@ glue(glue(helper_ret_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
>                                               target_ulong addr, DATA_TYPE val,
>                                               int mmu_idx, uintptr_t retaddr)
>  {
> -    hwaddr ioaddr;
> -    target_ulong tlb_addr;
> -    int index;
> +    int index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> +    target_ulong tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> +    uintptr_t haddr;
>  
> -    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> - redo:
> -    tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> -    if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
> -        if (tlb_addr & ~TARGET_PAGE_MASK) {
> -            /* IO access */
> -            if ((addr & (DATA_SIZE - 1)) != 0)
> -                goto do_unaligned_access;
> -            ioaddr = env->iotlb[mmu_idx][index];
> -            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
> -        } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
> -        do_unaligned_access:
> +    /* If the TLB entry is for a different page, reload and try again.  */
> +    if ((addr & TARGET_PAGE_MASK) 
> +        != (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
>  #ifdef ALIGNED_ONLY
> +        if ((addr & (DATA_SIZE - 1)) != 0) {
>              do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
> +        }
>  #endif
> -            glue(glue(slow_st, SUFFIX), MMUSUFFIX)(env, addr, val,
> -                                                   mmu_idx, retaddr);
> -        } else {
> -            /* aligned/unaligned access in the same page */
> -            uintptr_t addend;
> +        tlb_fill(env, addr, 1, mmu_idx, retaddr);
> +        tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> +    }
> +
> +    /* Handle an IO access.  */
> +    if (unlikely(tlb_addr & ~TARGET_PAGE_MASK)) {
> +        hwaddr ioaddr;
> +        if ((addr & (DATA_SIZE - 1)) != 0) {
> +            goto do_unaligned_access;
> +        }
> +        ioaddr = env->iotlb[mmu_idx][index];
> +        glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
> +        return;
> +    }
> +
> +    /* Handle slow unaligned access (it spans two pages or IO).  */
> +    if (DATA_SIZE > 1
> +        && unlikely ((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1
> +                     >= TARGET_PAGE_SIZE)) {
> +        int i;
> +    do_unaligned_access:
>  #ifdef ALIGNED_ONLY
> -            if ((addr & (DATA_SIZE - 1)) != 0) {
> -                do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
> -            }
> +        do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
> +#endif
> +        /* XXX: not efficient, but simple */
> +        /* Note: relies on the fact that tlb_fill() does not remove the
> +         * previous page from the TLB cache.  */
> +        for (i = DATA_SIZE - 1; i >= 0; i--) {
> +#ifdef TARGET_WORDS_BIGENDIAN
> +            uint8_t val8 = val >> (((DATA_SIZE - 1) * 8) - (i * 8));
> +#else
> +            uint8_t val8 = val >> (i * 8);
>  #endif
> -            addend = env->tlb_table[mmu_idx][index].addend;
> -            glue(glue(st, SUFFIX), _raw)((uint8_t *)(intptr_t)
> -                                         (addr + addend), val);
> +            glue(helper_ret_stb, MMUSUFFIX)(env, addr + i, val8,
> +                                            mmu_idx, retaddr);
>          }
> -    } else {
> -        /* the page is not in the TLB : fill it */
> +        return;
> +    }
> +
> +    /* Handle aligned access or unaligned access in the same page.  */
>  #ifdef ALIGNED_ONLY
> -        if ((addr & (DATA_SIZE - 1)) != 0)
> -            do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
> -#endif
> -        tlb_fill(env, addr, 1, mmu_idx, retaddr);
> -        goto redo;
> +    if ((addr & (DATA_SIZE - 1)) != 0) {
> +        do_unaligned_access(env, addr, 1, mmu_idx, retaddr);
>      }
> +#endif
> +
> +    haddr = addr + env->tlb_table[mmu_idx][index].addend;
> +    glue(glue(st, SUFFIX), _raw)((uint8_t *)haddr, val);
>  }
>  
>  void
> @@ -277,56 +248,6 @@ glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
>                                                   GETPC_EXT());
>  }
>  
> -/* handles all unaligned cases */
> -static void glue(glue(slow_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
> -                                                   target_ulong addr,
> -                                                   DATA_TYPE val,
> -                                                   int mmu_idx,
> -                                                   uintptr_t retaddr)
> -{
> -    hwaddr ioaddr;
> -    target_ulong tlb_addr;
> -    int index, i;
> -
> -    index = (addr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
> - redo:
> -    tlb_addr = env->tlb_table[mmu_idx][index].addr_write;
> -    if ((addr & TARGET_PAGE_MASK) == (tlb_addr & (TARGET_PAGE_MASK | TLB_INVALID_MASK))) {
> -        if (tlb_addr & ~TARGET_PAGE_MASK) {
> -            /* IO access */
> -            if ((addr & (DATA_SIZE - 1)) != 0)
> -                goto do_unaligned_access;
> -            ioaddr = env->iotlb[mmu_idx][index];
> -            glue(io_write, SUFFIX)(env, ioaddr, val, addr, retaddr);
> -        } else if (((addr & ~TARGET_PAGE_MASK) + DATA_SIZE - 1) >= TARGET_PAGE_SIZE) {
> -        do_unaligned_access:
> -            /* XXX: not efficient, but simple */
> -            /* Note: relies on the fact that tlb_fill() does not remove the
> -             * previous page from the TLB cache.  */
> -            for(i = DATA_SIZE - 1; i >= 0; i--) {
> -#ifdef TARGET_WORDS_BIGENDIAN
> -                glue(slow_stb, MMUSUFFIX)(env, addr + i,
> -                                          val >> (((DATA_SIZE - 1) * 8) - (i * 8)),
> -                                          mmu_idx, retaddr);
> -#else
> -                glue(slow_stb, MMUSUFFIX)(env, addr + i,
> -                                          val >> (i * 8),
> -                                          mmu_idx, retaddr);
> -#endif
> -            }
> -        } else {
> -            /* aligned/unaligned access in the same page */
> -            uintptr_t addend = env->tlb_table[mmu_idx][index].addend;
> -            glue(glue(st, SUFFIX), _raw)((uint8_t *)(intptr_t)
> -                                         (addr + addend), val);
> -        }
> -    } else {
> -        /* the page is not in the TLB : fill it */
> -        tlb_fill(env, addr, 1, mmu_idx, retaddr);
> -        goto redo;
> -    }
> -}
> -
>  #endif /* !defined(SOFTMMU_CODE_ACCESS) */
>  
>  #undef READ_ACCESS_TYPE

Besides the minor nitpick above:

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH for-next 6/8] tcg-i386: Use new return-argument ld/st helpers
  2013-08-05 18:07 ` [Qemu-devel] [PATCH for-next 6/8] tcg-i386: Use new return-argument ld/st helpers Richard Henderson
@ 2013-08-15 15:54   ` Aurelien Jarno
  2013-08-15 20:44     ` Richard Henderson
  0 siblings, 1 reply; 21+ messages in thread
From: Aurelien Jarno @ 2013-08-15 15:54 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Aug 05, 2013 at 08:07:23AM -1000, Richard Henderson wrote:
> Discontinue the jump-around-jump-to-jump scheme, trading it for a single
> immediate move instruction.  The two extra jumps always consume 7 bytes,
> whereas the immediate move is either 5 or 7 bytes depending on where the
> code_gen_buffer gets located.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/exec/exec-all.h |  13 +------
>  tcg/i386/tcg-target.c   | 100 +++++++++++++++++++++---------------------------
>  2 files changed, 46 insertions(+), 67 deletions(-)
> 
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 5920f73..b70028a 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -326,18 +326,9 @@ extern uintptr_t tci_tb_ptr;
>     (6) jump to corresponding code of the next of fast path
>   */
>  # if defined(__i386__) || defined(__x86_64__)
> -/* To avoid broken disassembling, long jmp is used for embedding fast path pc,
> -   so that the destination is the next code of fast path, though this jmp is
> -   never executed.
> -
> -   call MMU helper
> -   jmp POST_PROC (2byte)    <- GETRA()
> -   jmp NEXT_CODE (5byte)
> -   POST_PROCESS ...         <- GETRA() + 7
> - */
>  #  define GETRA() ((uintptr_t)__builtin_return_address(0))
> -#  define GETPC_LDST() ((uintptr_t)(GETRA() + 7 + \
> -                                    *(int32_t *)((void *)GETRA() + 3) - 1))
> +/* The return address argument for ldst is passed directly.  */
> +#  define GETPC_LDST()  (abort(), 0)

Why an abort here, while in the arm version you add support for
not defining GETPC_LDST?

>  # elif defined (_ARCH_PPC) && !defined (_ARCH_PPC64)
>  #  define GETRA() ((uintptr_t)__builtin_return_address(0))
>  #  define GETPC_LDST() ((uintptr_t) ((*(int32_t *)(GETRA() - 4)) - 1))
> diff --git a/tcg/i386/tcg-target.c b/tcg/i386/tcg-target.c
> index 8addfa1..c7a02a3 100644
> --- a/tcg/i386/tcg-target.c
> +++ b/tcg/i386/tcg-target.c
> @@ -190,11 +190,11 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          /* qemu_ld/st address constraint */
>      case 'L':
>          ct->ct |= TCG_CT_REG;
> -#if TCG_TARGET_REG_BITS == 64
> +        if (TCG_TARGET_REG_BITS == 64) {
>              tcg_regset_set32(ct->u.regs, 0, 0xffff);
> -#else
> +        } else {
>              tcg_regset_set32(ct->u.regs, 0, 0xff);
> -#endif
> +        }
>          tcg_regset_reset_reg(ct->u.regs, TCG_REG_L0);
>          tcg_regset_reset_reg(ct->u.regs, TCG_REG_L1);
>          break;
> @@ -1015,22 +1015,24 @@ static void tcg_out_jmp(TCGContext *s, tcg_target_long dest)
>  
>  #include "exec/softmmu_defs.h"
>  
> -/* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
> -   int mmu_idx) */
> -static const void *qemu_ld_helpers[4] = {
> -    helper_ldb_mmu,
> -    helper_ldw_mmu,
> -    helper_ldl_mmu,
> -    helper_ldq_mmu,
> +/* helper signature: helper_ret_ld_mmu(CPUState *env, target_ulong addr,
> + *                                     int mmu_idx, uintptr_t ra)
> + */
> +static const void * const qemu_ld_helpers[4] = {
> +    helper_ret_ldb_mmu,
> +    helper_ret_ldw_mmu,
> +    helper_ret_ldl_mmu,
> +    helper_ret_ldq_mmu,
>  };
>  
> -/* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
> -   uintxx_t val, int mmu_idx) */
> -static const void *qemu_st_helpers[4] = {
> -    helper_stb_mmu,
> -    helper_stw_mmu,
> -    helper_stl_mmu,
> -    helper_stq_mmu,
> +/* helper signature: helper_ret_st_mmu(CPUState *env, target_ulong addr,
> + *                                     uintxx_t val, int mmu_idx, uintptr_t ra)
> + */
> +static const void * const qemu_st_helpers[4] = {
> +    helper_ret_stb_mmu,
> +    helper_ret_stw_mmu,
> +    helper_ret_stl_mmu,
> +    helper_ret_stq_mmu,
>  };
>  
>  static void add_qemu_ldst_label(TCGContext *s,
> @@ -1458,6 +1460,12 @@ static void add_qemu_ldst_label(TCGContext *s,
>      }
>  }
>  
> +/* See the GETPC definition in include/exec/exec-all.h.  */
> +static inline uintptr_t do_getpc(uint8_t *raddr)
> +{
> +    return (uintptr_t)raddr - 1;
> +}
> +
>  /*
>   * Generate code for the slow path for a load at the end of block
>   */
> @@ -1490,33 +1498,21 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
>  
>          tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_EAX, l->mem_index);
>          tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);
> +        ofs += 4;
> +
> +        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, do_getpc(l->raddr));
> +        tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);

Same as in the other patch, this can be done in one instruction.
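
I.e., something like this, assuming the existing OPC_MOVL_EvIz (C7 /0)
store-immediate encoding is what's meant here (untested):

    /* One store-immediate, movl $imm32, ofs(%esp), instead of the
       movi into EAX plus the separate store.  */
    tcg_out_modrm_offset(s, OPC_MOVL_EvIz, 0, TCG_REG_ESP, ofs);
    tcg_out32(s, do_getpc(l->raddr));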

>      } else {
> -        tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
> +        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
>          /* The second argument is already loaded with addrlo.  */
>          tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[2],
>                       l->mem_index);
> +        tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[3],
> +                     do_getpc(l->raddr));
>      }
>  
> -    /* Code generation of qemu_ld/st's slow path calling MMU helper
> -
> -       PRE_PROC ...
> -       call MMU helper
> -       jmp POST_PROC (2b) : short forward jump <- GETRA()
> -       jmp next_code (5b) : dummy long backward jump which is never executed
> -       POST_PROC ... : do post-processing <- GETRA() + 7
> -       jmp next_code : jump to the code corresponding to next IR of qemu_ld/st
> -    */
> -
>      tcg_out_calli(s, (tcg_target_long)qemu_ld_helpers[s_bits]);
>  
> -    /* Jump to post-processing code */
> -    tcg_out8(s, OPC_JMP_short);
> -    tcg_out8(s, 5);
> -    /* Dummy backward jump having information of fast path'pc for MMU helpers */
> -    tcg_out8(s, OPC_JMP_long);
> -    *(int32_t *)s->code_ptr = (int32_t)(l->raddr - s->code_ptr - 4);
> -    s->code_ptr += 4;
> -
>      data_reg = l->datalo_reg;
>      switch(opc) {
>      case 0 | 4:
> @@ -1598,36 +1594,28 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
>  
>          tcg_out_movi(s, TCG_TYPE_I32, TCG_REG_EAX, l->mem_index);
>          tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);
> +        ofs += 4;
> +
> +        tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_EAX, do_getpc(l->raddr));
> +        tcg_out_st(s, TCG_TYPE_I32, TCG_REG_EAX, TCG_REG_ESP, ofs);
>      } else {
> -        tcg_out_mov(s, TCG_TYPE_I64, tcg_target_call_iarg_regs[0], TCG_AREG0);
> +        tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
>          /* The second argument is already loaded with addrlo.  */
>          tcg_out_mov(s, (opc == 3 ? TCG_TYPE_I64 : TCG_TYPE_I32),
>                      tcg_target_call_iarg_regs[2], l->datalo_reg);
>          tcg_out_movi(s, TCG_TYPE_I32, tcg_target_call_iarg_regs[3],
>                       l->mem_index);
> +        if (ARRAY_SIZE(tcg_target_call_iarg_regs) > 4) {
> +            tcg_out_movi(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[4],
> +                         do_getpc(l->raddr));
> +        } else {
> +            tcg_out_movi(s, TCG_TYPE_PTR, TCG_REG_RAX, do_getpc(l->raddr));
> +            tcg_out_st(s, TCG_TYPE_PTR, TCG_REG_RAX, TCG_REG_ESP, 0);
> +        }
>      }
>  
> -    /* Code generation of qemu_ld/st's slow path calling MMU helper
> -
> -       PRE_PROC ...
> -       call MMU helper
> -       jmp POST_PROC (2b) : short forward jump <- GETRA()
> -       jmp next_code (5b) : dummy long backward jump which is never executed
> -       POST_PROC ... : do post-processing <- GETRA() + 7
> -       jmp next_code : jump to the code corresponding to next IR of qemu_ld/st
> -    */
> -
>      tcg_out_calli(s, (tcg_target_long)qemu_st_helpers[s_bits]);
>  
> -    /* Jump to post-processing code */
> -    tcg_out8(s, OPC_JMP_short);
> -    tcg_out8(s, 5);
> -    /* Dummy backward jump having information of fast path'pc for MMU helpers */
> -    tcg_out8(s, OPC_JMP_long);
> -    *(int32_t *)s->code_ptr = (int32_t)(l->raddr - s->code_ptr - 4);
> -    s->code_ptr += 4;
> -
> -    /* Jump to the code corresponding to next IR of qemu_st */
>      tcg_out_jmp(s, (tcg_target_long)l->raddr);
>  }
>  

Besides the small nitpicks above, it looks fine to me.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH for-next 6/8] tcg-i386: Use new return-argument ld/st helpers
  2013-08-15 15:54   ` Aurelien Jarno
@ 2013-08-15 20:44     ` Richard Henderson
  2013-08-16  8:35       ` Aurelien Jarno
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2013-08-15 20:44 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 08/15/2013 08:54 AM, Aurelien Jarno wrote:
>> >  #  define GETRA() ((uintptr_t)__builtin_return_address(0))
>> > -#  define GETPC_LDST() ((uintptr_t)(GETRA() + 7 + \
>> > -                                    *(int32_t *)((void *)GETRA() + 3) - 1))
>> > +/* The return address argument for ldst is passed directly.  */
>> > +#  define GETPC_LDST()  (abort(), 0)
> Why an abort here, while in the arm version you add support for
> not defining GETPC_LDST?
> 

GETPC_LDST is for the original helpers, when called from TCG.

In the arm case, TCG still uses the original helpers, so GETPC_LDST is used.

In the i386 case, TCG never uses the original helpers, so GETPC_LDST should
never be used.  We could do as on arm and drop the check completely, I suppose.
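
To make that concrete, a sketch pieced together from the hunks quoted in
this thread (simplified, not the literal source):

    /* Old-style entry point (still what the arm backend calls): the
       fast-path pc is recovered at runtime.  GETPC_EXT() resolves to
       GETPC_LDST() when the caller is TCG-generated code and to plain
       GETPC() otherwise.  */
    uint64_t helper_ldq_mmu(CPUArchState *env, target_ulong addr, int mmu_idx)
    {
        return helper_ret_ldq_mmu(env, addr, mmu_idx, GETPC_EXT());
    }

    /* New-style entry point (what the i386 backend now calls directly):
       the return address is an explicit argument, so GETPC_LDST() can
       never be reached from here.  */
    uint64_t helper_ret_ldq_mmu(CPUArchState *env, target_ulong addr,
                                int mmu_idx, uintptr_t retaddr);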


r~

* Re: [Qemu-devel] [PATCH for-next 4/8] tcg: Add mmu helpers that take a return address argument
  2013-08-15 15:54   ` Aurelien Jarno
@ 2013-08-15 20:45     ` Richard Henderson
  2013-08-16  8:35       ` Aurelien Jarno
  0 siblings, 1 reply; 21+ messages in thread
From: Richard Henderson @ 2013-08-15 20:45 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: qemu-devel

On 08/15/2013 08:54 AM, Aurelien Jarno wrote:
> Why remove these st*_cmmu versions? There might be a good reason, but
> it should be indicated in the patch description.

Though the prototypes existed, the bodies were never generated.  We already
ifdef out the store functions in the template when it is invoked for code reads.
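
Concretely, a simplified sketch of the template layout (the #endif is
visible at the end of the hunk quoted earlier in this thread):

    /* Everything from here down is skipped when the template is
       instantiated for code reads, so the st*_cmmu bodies declared in
       softmmu_defs.h never actually existed.  */
    #if !defined(SOFTMMU_CODE_ACCESS)

    void glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
                                                  target_ulong addr,
                                                  DATA_TYPE val, int mmu_idx)
    {
        /* ... store path ... */
    }

    #endif /* !defined(SOFTMMU_CODE_ACCESS) */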


r~

* Re: [Qemu-devel] [PATCH for-next 6/8] tcg-i386: Use new return-argument ld/st helpers
  2013-08-15 20:44     ` Richard Henderson
@ 2013-08-16  8:35       ` Aurelien Jarno
  0 siblings, 0 replies; 21+ messages in thread
From: Aurelien Jarno @ 2013-08-16  8:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Aug 15, 2013 at 01:44:08PM -0700, Richard Henderson wrote:
> On 08/15/2013 08:54 AM, Aurelien Jarno wrote:
> >> >  #  define GETRA() ((uintptr_t)__builtin_return_address(0))
> >> > -#  define GETPC_LDST() ((uintptr_t)(GETRA() + 7 + \
> >> > -                                    *(int32_t *)((void *)GETRA() + 3) - 1))
> >> > +/* The return address argument for ldst is passed directly.  */
> >> > +#  define GETPC_LDST()  (abort(), 0)
> > Why an abort here, while in the arm version you add support for
> > not defining GETPC_LDST?
> > 
> 
> GETPC_LDST is for the original helpers, when called from TCG.
> 
> In the arm case, TCG still uses the original helpers, so GETPC_LDST is used.
> 
> In the i386 case, TCG never uses the original helpers, so GETPC_LDST should
> never be used.  We could do as on arm and drop the check completely, I suppose.

OK, I haven't looked in detail at the arm patches (I am currently trying
them), so I hadn't really seen the difference.  Just ignore this comment.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH for-next 4/8] tcg: Add mmu helpers that take a return address argument
  2013-08-15 20:45     ` Richard Henderson
@ 2013-08-16  8:35       ` Aurelien Jarno
  0 siblings, 0 replies; 21+ messages in thread
From: Aurelien Jarno @ 2013-08-16  8:35 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Thu, Aug 15, 2013 at 01:45:31PM -0700, Richard Henderson wrote:
> On 08/15/2013 08:54 AM, Aurelien Jarno wrote:
> > Why remove these st*_cmmu versions? There might be a good reason, but
> > it should be indicated in the patch description.
> 
> Though the prototypes existed, the bodies were never generated.  We already
> ifdef out the store functions in the template when it is invoked for code reads.
> 

Fine then. Could you mention that in the commit message?

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH for-next 8/8] tcg-arm: Rearrange slow-path qemu_ld/st
  2013-08-05 18:07 ` [Qemu-devel] [PATCH for-next 8/8] tcg-arm: Rearrange slow-path qemu_ld/st Richard Henderson
@ 2013-08-16  8:36   ` Aurelien Jarno
  2013-08-16  8:55   ` Andreas Färber
  1 sibling, 0 replies; 21+ messages in thread
From: Aurelien Jarno @ 2013-08-16  8:36 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, Aug 05, 2013 at 08:07:25AM -1000, Richard Henderson wrote:
> Instead of using a branch-call-branch sequence, arrange for a
> call-branch sequence, using the ARM's conditional call insn.
> This reduces the size of the slow-path within the TB, and makes
> the GETPC_EXT implementation identical for TCG and not-TCG.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/exec/exec-all.h |  23 +----
>  tcg/arm/tcg-target.c    | 269 +++++++++++++++++++++++-------------------------
>  2 files changed, 133 insertions(+), 159 deletions(-)
> 
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index b70028a..b3402a1 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -333,23 +333,8 @@ extern uintptr_t tci_tb_ptr;
>  #  define GETRA() ((uintptr_t)__builtin_return_address(0))
>  #  define GETPC_LDST() ((uintptr_t) ((*(int32_t *)(GETRA() - 4)) - 1))
>  # elif defined(__arm__)
> -/* We define two insns between the return address and the branch back to
> -   straight-line.  Find and decode that branch insn.  */
> -#  define GETRA()       ((uintptr_t)__builtin_return_address(0))
> -#  define GETPC_LDST()  tcg_getpc_ldst(GETRA())
> -static inline uintptr_t tcg_getpc_ldst(uintptr_t ra)
> -{
> -    int32_t b;
> -    ra += 8;                    /* skip the two insns */
> -    b = *(int32_t *)ra;         /* load the branch insn */
> -    b = (b << 8) >> (8 - 2);    /* extract the displacement */
> -    ra += 8;                    /* branches are relative to pc+8 */
> -    ra += b;                    /* apply the displacement */
> -    ra -= 4;                    /* return a pointer into the current opcode,
> -                                   not the start of the next opcode  */
> -    return ra;
> -}
> -#elif defined(__aarch64__)
> +#  define GETPC_EXT()  GETPC()
> +# elif defined(__aarch64__)
>  #  define GETRA()       ((uintptr_t)__builtin_return_address(0))
>  #  define GETPC_LDST()  tcg_getpc_ldst(GETRA())
>  static inline uintptr_t tcg_getpc_ldst(uintptr_t ra)
> @@ -367,7 +352,9 @@ static inline uintptr_t tcg_getpc_ldst(uintptr_t ra)
>  #  error "CONFIG_QEMU_LDST_OPTIMIZATION needs GETPC_LDST() implementation!"
>  # endif
>  bool is_tcg_gen_code(uintptr_t pc_ptr);
> -# define GETPC_EXT() (is_tcg_gen_code(GETRA()) ? GETPC_LDST() : GETPC())
> +# ifndef GETPC_EXT
> +#  define GETPC_EXT() (is_tcg_gen_code(GETRA()) ? GETPC_LDST() : GETPC())
> +# endif
>  #else
>  # define GETPC_EXT() GETPC()
>  #endif
> diff --git a/tcg/arm/tcg-target.c b/tcg/arm/tcg-target.c
> index 9a14a20..89917d6 100644
> --- a/tcg/arm/tcg-target.c
> +++ b/tcg/arm/tcg-target.c
> @@ -166,10 +166,26 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          break;
>  
>      case 'r':
> +#ifndef CONFIG_SOFTMMU
> +    case 'a':
> +    case 'b':
> +#endif
>          ct->ct |= TCG_CT_REG;
>          tcg_regset_set32(ct->u.regs, 0, (1 << TCG_TARGET_NB_REGS) - 1);
>          break;
>  
> +#ifdef CONFIG_SOFTMMU
> +    /* qemu_ld data registers -- softmmu uses the return registers only.  */
> +    case 'a':
> +        ct->ct |= TCG_CT_REG;
> +        tcg_regset_set32(ct->u.regs, 0, 1);
> +        break;
> +    case 'b':
> +        ct->ct |= TCG_CT_REG;
> +        tcg_regset_set32(ct->u.regs, 0, 2);
> +        break;
> +#endif
> +
>      /* qemu_ld address */
>      case 'l':
>          ct->ct |= TCG_CT_REG;
> @@ -182,15 +198,6 @@ static int target_parse_constraint(TCGArgConstraint *ct, const char **pct_str)
>          tcg_regset_reset_reg(ct->u.regs, TCG_REG_R2);
>  #endif
>          break;
> -    case 'L':
> -        ct->ct |= TCG_CT_REG;
> -        tcg_regset_set32(ct->u.regs, 0, (1 << TCG_TARGET_NB_REGS) - 1);
> -#ifdef CONFIG_SOFTMMU
> -        /* r1 is still needed to load data_reg or data_reg2,
> -           so don't use it. */
> -        tcg_regset_reset_reg(ct->u.regs, TCG_REG_R1);
> -#endif
> -        break;
>  
>      /* qemu_st address & data_reg */
>      case 's':
> @@ -382,13 +389,17 @@ static inline void tcg_out_b_noaddr(TCGContext *s, int cond)
>      /* We pay attention here to not modify the branch target by skipping
>         the corresponding bytes. This ensure that caches and memory are
>         kept coherent during retranslation. */
> -#ifdef HOST_WORDS_BIGENDIAN
> -    tcg_out8(s, (cond << 4) | 0x0a);
> -    s->code_ptr += 3;
> -#else
>      s->code_ptr += 3;
>      tcg_out8(s, (cond << 4) | 0x0a);
> -#endif
> +}
> +
> +static inline void tcg_out_bl_noaddr(TCGContext *s, int cond)
> +{
> +    /* We pay attention here to not modify the branch target by skipping
> +       the corresponding bytes. This ensure that caches and memory are
> +       kept coherent during retranslation. */
> +    s->code_ptr += 3;
> +    tcg_out8(s, (cond << 4) | 0x0b);
>  }
>  
>  static inline void tcg_out_bl(TCGContext *s, int cond, int32_t offset)
> @@ -1002,34 +1013,27 @@ static inline void tcg_out_st8(TCGContext *s, int cond,
>          tcg_out_st8_12(s, cond, rd, rn, offset);
>  }
>  
> -/* The _goto case is normally between TBs within the same code buffer,
> - * and with the code buffer limited to 16MB we shouldn't need the long
> - * case.
> - *
> - * .... except to the prologue that is in its own buffer.
> +/* The _goto case is normally between TBs within the same code buffer, and
> + * with the code buffer limited to 16MB we wouldn't need the long case.
> + * But we also use it for the tail-call to the qemu_ld/st helpers, which does.
>   */
>  static inline void tcg_out_goto(TCGContext *s, int cond, uint32_t addr)
>  {
> -    int32_t val;
> +    int32_t disp = addr - (tcg_target_long) s->code_ptr;
>  
> -    if (addr & 1) {
> -        /* goto to a Thumb destination isn't supported */
> -        tcg_abort();
> +    if ((addr & 1) == 0 && disp - 8 < 0x01fffffd && disp - 8 > -0x01fffffd) {
> +        tcg_out_b(s, cond, disp);
> +        return;
>      }
>  
> -    val = addr - (tcg_target_long) s->code_ptr;
> -    if (val - 8 < 0x01fffffd && val - 8 > -0x01fffffd)
> -        tcg_out_b(s, cond, val);
> -    else {
> -        if (cond == COND_AL) {
> -            tcg_out_ld32_12(s, COND_AL, TCG_REG_PC, TCG_REG_PC, -4);
> -            tcg_out32(s, addr);
> -        } else {
> -            tcg_out_movi32(s, cond, TCG_REG_TMP, val - 8);
> -            tcg_out_dat_reg(s, cond, ARITH_ADD,
> -                            TCG_REG_PC, TCG_REG_PC,
> -                            TCG_REG_TMP, SHIFT_IMM_LSL(0));
> +    tcg_out_movi32(s, cond, TCG_REG_TMP, addr);
> +    if (use_armv5t_instructions) {
> +        tcg_out_bx(s, cond, TCG_REG_TMP);
> +    } else {
> +        if (addr & 1) {
> +            tcg_abort();
>          }
> +        tcg_out_mov_reg(s, cond, TCG_REG_PC, TCG_REG_TMP);
>      }
>  }
>  
> @@ -1087,17 +1091,44 @@ static inline void tcg_out_goto_label(TCGContext *s, int cond, int label_index)
>  
>  #include "exec/softmmu_defs.h"
>  
> +static uint32_t arm_ldbu_mmu(CPUState *env, target_ulong addr, int idx)
> +{
> +    return (uint8_t)helper_ret_ldb_mmu(env, addr, idx, GETPC());
> +}
> +
> +static uint32_t arm_ldbs_mmu(CPUState *env, target_ulong addr, int idx)
> +{
> +    return (int8_t)helper_ret_ldb_mmu(env, addr, idx, GETPC());
> +}
> +
> +static uint32_t arm_ldwu_mmu(CPUState *env, target_ulong addr, int idx)
> +{
> +    return (uint16_t)helper_ret_ldw_mmu(env, addr, idx, GETPC());
> +}
> +
> +static uint32_t arm_ldws_mmu(CPUState *env, target_ulong addr, int idx)
> +{
> +    return (int16_t)helper_ret_ldw_mmu(env, addr, idx, GETPC());
> +}
> +
>  /* helper signature: helper_ld_mmu(CPUState *env, target_ulong addr,
> -   int mmu_idx) */
> -static const void * const qemu_ld_helpers[4] = {
> -    helper_ldb_mmu,
> -    helper_ldw_mmu,
> + *                                 int mmu_idx)
> + */
> +static const void * const qemu_ld_helpers[8] = {
> +    arm_ldbu_mmu,
> +    arm_ldwu_mmu,
> +    helper_ldl_mmu,
> +    helper_ldq_mmu,
> +
> +    arm_ldbs_mmu,
> +    arm_ldws_mmu,
>      helper_ldl_mmu,
>      helper_ldq_mmu,
>  };
>  
>  /* helper signature: helper_st_mmu(CPUState *env, target_ulong addr,
> -   uintxx_t val, int mmu_idx) */
> + *                                 uintxx_t val, int mmu_idx)
> + */
>  static const void * const qemu_st_helpers[4] = {
>      helper_stb_mmu,
>      helper_stw_mmu,
> @@ -1260,8 +1291,7 @@ static void add_qemu_ldst_label(TCGContext *s, int is_ld, int opc,
>  
>  static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
>  {
> -    TCGReg argreg, data_reg, data_reg2;
> -    uint8_t *start;
> +    TCGReg argreg;
>  
>      reloc_pc24(lb->label_ptr[0], (tcg_target_long)s->code_ptr);
>  
> @@ -1272,59 +1302,18 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
>          argreg = tcg_out_arg_reg32(s, argreg, lb->addrlo_reg);
>      }
>      argreg = tcg_out_arg_imm32(s, argreg, lb->mem_index);
> -    tcg_out_call(s, (tcg_target_long) qemu_ld_helpers[lb->opc & 3]);
> -
> -    data_reg = lb->datalo_reg;
> -    data_reg2 = lb->datahi_reg;
> -
> -    start = s->code_ptr;
> -    switch (lb->opc) {
> -    case 0 | 4:
> -        tcg_out_ext8s(s, COND_AL, data_reg, TCG_REG_R0);
> -        break;
> -    case 1 | 4:
> -        tcg_out_ext16s(s, COND_AL, data_reg, TCG_REG_R0);
> -        break;
> -    case 0:
> -    case 1:
> -    case 2:
> -    default:
> -        tcg_out_mov_reg(s, COND_AL, data_reg, TCG_REG_R0);
> -        break;
> -    case 3:
> -        tcg_out_mov_reg(s, COND_AL, data_reg, TCG_REG_R0);
> -        tcg_out_mov_reg(s, COND_AL, data_reg2, TCG_REG_R1);
> -        break;
> -    }
>  
> -    /* For GETPC_LDST in exec-all.h, we architect exactly 2 insns between
> -       the call and the branch back to straight-line code.  Note that the
> -       moves above could be elided by register allocation, nor do we know
> -       which code alternative we chose for extension.  */
> -    switch (s->code_ptr - start) {
> -    case 0:
> -        tcg_out_nop(s);
> -        /* FALLTHRU */
> -    case 4:
> -        tcg_out_nop(s);
> -        /* FALLTHRU */
> -    case 8:
> -        break;
> -    default:
> -        abort();
> -    }
> -
> -    tcg_out_goto(s, COND_AL, (tcg_target_long)lb->raddr);
> +    /* Tail-call to the helper, which will return to the fast path.  */
> +    tcg_out_goto(s, COND_AL, (tcg_target_long) qemu_ld_helpers[lb->opc]);
>  }
>  
>  static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
>  {
> -    TCGReg argreg, data_reg, data_reg2;
> +    TCGReg argreg;
>  
>      reloc_pc24(lb->label_ptr[0], (tcg_target_long)s->code_ptr);
>  
> -    argreg = TCG_REG_R0;
> -    argreg = tcg_out_arg_reg32(s, argreg, TCG_AREG0);
> +    argreg = tcg_out_arg_reg32(s, TCG_REG_R0, TCG_AREG0);
>      if (TARGET_LONG_BITS == 64) {
>          argreg = tcg_out_arg_reg64(s, argreg, lb->addrlo_reg, lb->addrhi_reg);
>      } else {
> @@ -1349,13 +1338,9 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *lb)
>      }
>  
>      argreg = tcg_out_arg_imm32(s, argreg, lb->mem_index);
> -    tcg_out_call(s, (tcg_target_long) qemu_st_helpers[lb->opc & 3]);
>  
> -    /* For GETPC_LDST in exec-all.h, we architect exactly 2 insns between
> -       the call and the branch back to straight-line code.  */
> -    tcg_out_nop(s);
> -    tcg_out_nop(s);
> -    tcg_out_goto(s, COND_AL, (tcg_target_long)lb->raddr);
> +    /* Tail-call to the helper, which will return to the fast path.  */
> +    tcg_out_goto(s, COND_AL, (tcg_target_long) qemu_st_helpers[lb->opc & 3]);
>  }
>  #endif /* SOFTMMU */
>  
> @@ -1385,57 +1370,58 @@ static void tcg_out_qemu_ld(TCGContext *s, const TCGArg *args, int opc)
>      tcg_out_tlb_read(s, addr_reg, addr_reg2, s_bits,
>                       offsetof(CPUArchState, tlb_table[mem_index][0].addr_read));
>  
> -    label_ptr = s->code_ptr;
> -    tcg_out_b_noaddr(s, COND_NE);
> -
>      tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R2,
>                      offsetof(CPUTLBEntry, addend)
>                      - offsetof(CPUTLBEntry, addr_read));
>  
>      switch (opc) {
>      case 0:
> -        tcg_out_ld8_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> +        tcg_out_ld8_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
>          break;
>      case 0 | 4:
> -        tcg_out_ld8s_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> +        tcg_out_ld8s_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
>          break;
>      case 1:
> -        tcg_out_ld16u_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> +        tcg_out_ld16u_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
>          if (bswap) {
> -            tcg_out_bswap16(s, COND_AL, data_reg, data_reg);
> +            tcg_out_bswap16(s, COND_EQ, data_reg, data_reg);
>          }
>          break;
>      case 1 | 4:
>          if (bswap) {
> -            tcg_out_ld16u_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> -            tcg_out_bswap16s(s, COND_AL, data_reg, data_reg);
> +            tcg_out_ld16u_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
> +            tcg_out_bswap16s(s, COND_EQ, data_reg, data_reg);
>          } else {
> -            tcg_out_ld16s_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> +            tcg_out_ld16s_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
>          }
>          break;
>      case 2:
>      default:
> -        tcg_out_ld32_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> +        tcg_out_ld32_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
>          if (bswap) {
> -            tcg_out_bswap32(s, COND_AL, data_reg, data_reg);
> +            tcg_out_bswap32(s, COND_EQ, data_reg, data_reg);
>          }
>          break;
>      case 3:
>          if (bswap) {
> -            tcg_out_ld32_rwb(s, COND_AL, data_reg2, TCG_REG_R1, addr_reg);
> -            tcg_out_ld32_12(s, COND_AL, data_reg, TCG_REG_R1, 4);
> -            tcg_out_bswap32(s, COND_AL, data_reg2, data_reg2);
> -            tcg_out_bswap32(s, COND_AL, data_reg, data_reg);
> +            tcg_out_ld32_rwb(s, COND_EQ, data_reg2, TCG_REG_R1, addr_reg);
> +            tcg_out_ld32_12(s, COND_EQ, data_reg, TCG_REG_R1, 4);
> +            tcg_out_bswap32(s, COND_EQ, data_reg2, data_reg2);
> +            tcg_out_bswap32(s, COND_EQ, data_reg, data_reg);
>          } else if (use_armv6_instructions
>                     && (data_reg & 1) == 0 && data_reg2 == data_reg + 1) {
> -            tcg_out_ldrd_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> +            tcg_out_ldrd_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
>          } else {
> -            tcg_out_ld32_rwb(s, COND_AL, data_reg, TCG_REG_R1, addr_reg);
> -            tcg_out_ld32_12(s, COND_AL, data_reg2, TCG_REG_R1, 4);
> +            tcg_out_ld32_rwb(s, COND_EQ, data_reg, TCG_REG_R1, addr_reg);
> +            tcg_out_ld32_12(s, COND_EQ, data_reg2, TCG_REG_R1, 4);
>          }
>          break;
>      }
>  
> +    /* The conditional call must come last, as we're going to return here.  */
> +    label_ptr = s->code_ptr;
> +    tcg_out_bl_noaddr(s, COND_NE);
> +
>      add_qemu_ldst_label(s, 1, opc, data_reg, data_reg2, addr_reg, addr_reg2,
>                          mem_index, s->code_ptr, label_ptr);
>  #else /* !CONFIG_SOFTMMU */
> @@ -1531,50 +1517,51 @@ static void tcg_out_qemu_st(TCGContext *s, const TCGArg *args, int opc)
>                       offsetof(CPUArchState,
>                                tlb_table[mem_index][0].addr_write));
>  
> -    label_ptr = s->code_ptr;
> -    tcg_out_b_noaddr(s, COND_NE);
> -
>      tcg_out_ld32_12(s, COND_AL, TCG_REG_R1, TCG_REG_R2,
>                      offsetof(CPUTLBEntry, addend)
>                      - offsetof(CPUTLBEntry, addr_write));
>  
>      switch (opc) {
>      case 0:
> -        tcg_out_st8_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> +        tcg_out_st8_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
>          break;
>      case 1:
>          if (bswap) {
> -            tcg_out_bswap16st(s, COND_AL, TCG_REG_R0, data_reg);
> -            tcg_out_st16_r(s, COND_AL, TCG_REG_R0, addr_reg, TCG_REG_R1);
> +            tcg_out_bswap16st(s, COND_EQ, TCG_REG_R0, data_reg);
> +            tcg_out_st16_r(s, COND_EQ, TCG_REG_R0, addr_reg, TCG_REG_R1);
>          } else {
> -            tcg_out_st16_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> +            tcg_out_st16_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
>          }
>          break;
>      case 2:
>      default:
>          if (bswap) {
> -            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg);
> -            tcg_out_st32_r(s, COND_AL, TCG_REG_R0, addr_reg, TCG_REG_R1);
> +            tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg);
> +            tcg_out_st32_r(s, COND_EQ, TCG_REG_R0, addr_reg, TCG_REG_R1);
>          } else {
> -            tcg_out_st32_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> +            tcg_out_st32_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
>          }
>          break;
>      case 3:
>          if (bswap) {
> -            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg2);
> -            tcg_out_st32_rwb(s, COND_AL, TCG_REG_R0, TCG_REG_R1, addr_reg);
> -            tcg_out_bswap32(s, COND_AL, TCG_REG_R0, data_reg);
> -            tcg_out_st32_12(s, COND_AL, TCG_REG_R0, TCG_REG_R1, 4);
> +            tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg2);
> +            tcg_out_st32_rwb(s, COND_EQ, TCG_REG_R0, TCG_REG_R1, addr_reg);
> +            tcg_out_bswap32(s, COND_EQ, TCG_REG_R0, data_reg);
> +            tcg_out_st32_12(s, COND_EQ, TCG_REG_R0, TCG_REG_R1, 4);
>          } else if (use_armv6_instructions
>                     && (data_reg & 1) == 0 && data_reg2 == data_reg + 1) {
> -            tcg_out_strd_r(s, COND_AL, data_reg, addr_reg, TCG_REG_R1);
> +            tcg_out_strd_r(s, COND_EQ, data_reg, addr_reg, TCG_REG_R1);
>          } else {
> -            tcg_out_st32_rwb(s, COND_AL, data_reg, TCG_REG_R1, addr_reg);
> -            tcg_out_st32_12(s, COND_AL, data_reg2, TCG_REG_R1, 4);
> +            tcg_out_st32_rwb(s, COND_EQ, data_reg, TCG_REG_R1, addr_reg);
> +            tcg_out_st32_12(s, COND_EQ, data_reg2, TCG_REG_R1, 4);
>          }
>          break;
>      }
>  
> +    /* The conditional call must come last, as we're going to return here.  */
> +    label_ptr = s->code_ptr;
> +    tcg_out_bl_noaddr(s, COND_NE);
> +
>      add_qemu_ldst_label(s, 0, opc, data_reg, data_reg2, addr_reg, addr_reg2,
>                          mem_index, s->code_ptr, label_ptr);
>  #else /* !CONFIG_SOFTMMU */
> @@ -2026,24 +2013,24 @@ static const TCGTargetOpDef arm_op_defs[] = {
>      { INDEX_op_setcond2_i32, { "r", "r", "r", "rIN", "rIN" } },
>  
>  #if TARGET_LONG_BITS == 32
> -    { INDEX_op_qemu_ld8u, { "r", "l" } },
> -    { INDEX_op_qemu_ld8s, { "r", "l" } },
> -    { INDEX_op_qemu_ld16u, { "r", "l" } },
> -    { INDEX_op_qemu_ld16s, { "r", "l" } },
> -    { INDEX_op_qemu_ld32, { "r", "l" } },
> -    { INDEX_op_qemu_ld64, { "L", "L", "l" } },
> +    { INDEX_op_qemu_ld8u, { "a", "l" } },
> +    { INDEX_op_qemu_ld8s, { "a", "l" } },
> +    { INDEX_op_qemu_ld16u, { "a", "l" } },
> +    { INDEX_op_qemu_ld16s, { "a", "l" } },
> +    { INDEX_op_qemu_ld32, { "a", "l" } },
> +    { INDEX_op_qemu_ld64, { "a", "b", "l" } },
>  
>      { INDEX_op_qemu_st8, { "s", "s" } },
>      { INDEX_op_qemu_st16, { "s", "s" } },
>      { INDEX_op_qemu_st32, { "s", "s" } },
>      { INDEX_op_qemu_st64, { "s", "s", "s" } },
>  #else
> -    { INDEX_op_qemu_ld8u, { "r", "l", "l" } },
> -    { INDEX_op_qemu_ld8s, { "r", "l", "l" } },
> -    { INDEX_op_qemu_ld16u, { "r", "l", "l" } },
> -    { INDEX_op_qemu_ld16s, { "r", "l", "l" } },
> -    { INDEX_op_qemu_ld32, { "r", "l", "l" } },
> -    { INDEX_op_qemu_ld64, { "L", "L", "l", "l" } },
> +    { INDEX_op_qemu_ld8u, { "a", "l", "l" } },
> +    { INDEX_op_qemu_ld8s, { "a", "l", "l" } },
> +    { INDEX_op_qemu_ld16u, { "a", "l", "l" } },
> +    { INDEX_op_qemu_ld16s, { "a", "l", "l" } },
> +    { INDEX_op_qemu_ld32, { "a", "l", "l" } },
> +    { INDEX_op_qemu_ld64, { "a", "b", "l", "l" } },
>  
>      { INDEX_op_qemu_st8, { "s", "s", "s" } },
>      { INDEX_op_qemu_st16, { "s", "s", "s" } },

I am not able to build at least the mips-softmmu target with this patch
applied:

| /home/aurel32/qemu/tcg/arm/tcg-target.c: In function ‘arm_ldbu_mmu’:
| /home/aurel32/qemu/tcg/arm/tcg-target.c:1096:5: error: passing argument 1 of ‘helper_ret_ldb_mmu’ from incompatible pointer type [-Werror]
| /home/aurel32/qemu/include/exec/softmmu_defs.h:12:9: note: expected ‘struct CPUMIPSState *’ but argument is of type ‘struct CPUState *’
| /home/aurel32/qemu/tcg/arm/tcg-target.c: In function ‘arm_ldbs_mmu’:
| /home/aurel32/qemu/tcg/arm/tcg-target.c:1101:5: error: passing argument 1 of ‘helper_ret_ldb_mmu’ from incompatible pointer type [-Werror]
| /home/aurel32/qemu/include/exec/softmmu_defs.h:12:9: note: expected ‘struct CPUMIPSState *’ but argument is of type ‘struct CPUState *’
| /home/aurel32/qemu/tcg/arm/tcg-target.c: In function ‘arm_ldwu_mmu’:
| /home/aurel32/qemu/tcg/arm/tcg-target.c:1106:5: error: passing argument 1 of ‘helper_ret_ldw_mmu’ from incompatible pointer type [-Werror]
| /home/aurel32/qemu/include/exec/softmmu_defs.h:14:10: note: expected ‘struct CPUMIPSState *’ but argument is of type ‘struct CPUState *’
| /home/aurel32/qemu/tcg/arm/tcg-target.c: In function ‘arm_ldws_mmu’:
| /home/aurel32/qemu/tcg/arm/tcg-target.c:1111:5: error: passing argument 1 of ‘helper_ret_ldw_mmu’ from incompatible pointer type [-Werror]
| /home/aurel32/qemu/include/exec/softmmu_defs.h:14:10: note: expected ‘struct CPUMIPSState *’ but argument is of type ‘struct CPUState *’
| In file included from /home/aurel32/qemu/tcg/tcg.c:198:0:
| /home/aurel32/qemu/tcg/arm/tcg-target.c: In function ‘tcg_out_qemu_st_slow_path’:
| /home/aurel32/qemu/tcg/arm/tcg-target.c:1323:5: error: ‘data_reg’ undeclared (first use in this function)
| /home/aurel32/qemu/tcg/arm/tcg-target.c:1323:5: note: each undeclared identifier is reported only once for each function it appears in
| /home/aurel32/qemu/tcg/arm/tcg-target.c:1324:5: error: ‘data_reg2’ undeclared (first use in this function)
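
The second error looks like a leftover from removing the data_reg/data_reg2
locals; presumably the remaining uses in tcg_out_qemu_st_slow_path should
now read the fields off the label directly, something like (a guess,
untested):

    argreg = tcg_out_arg_reg32(s, argreg, lb->datalo_reg);  /* was data_reg */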


-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH for-next 8/8] tcg-arm: Rearrange slow-path qemu_ld/st
  2013-08-05 18:07 ` [Qemu-devel] [PATCH for-next 8/8] tcg-arm: Rearrange slow-path qemu_ld/st Richard Henderson
  2013-08-16  8:36   ` Aurelien Jarno
@ 2013-08-16  8:55   ` Andreas Färber
  1 sibling, 0 replies; 21+ messages in thread
From: Andreas Färber @ 2013-08-16  8:55 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, Aurelien Jarno

Am 05.08.2013 20:07, schrieb Richard Henderson:
> Instead of using a branch-call-branch sequence, arrange for a
> call-branch sequence, using the ARM's conditional call insn.
> This reduces the size of the slow-path within the TB, and makes
> the GETPC_EXT implementation identical for TCG and not-TCG.
> 
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
>  include/exec/exec-all.h |  23 +----
>  tcg/arm/tcg-target.c    | 269 +++++++++++++++++++++++-------------------------
>  2 files changed, 133 insertions(+), 159 deletions(-)
> 
> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index b70028a..b3402a1 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
[...]
> @@ -1087,17 +1091,44 @@ static inline void tcg_out_goto_label(TCGContext *s, int cond, int label_index)
>  
>  #include "exec/softmmu_defs.h"
>  
> +static uint32_t arm_ldbu_mmu(CPUState *env, target_ulong addr, int idx)
> +{
> +    return (uint8_t)helper_ret_ldb_mmu(env, addr, idx, GETPC());
> +}
> +
> +static uint32_t arm_ldbs_mmu(CPUState *env, target_ulong addr, int idx)
> +{
> +    return (int8_t)helper_ret_ldb_mmu(env, addr, idx, GETPC());
> +}
> +
> +static uint32_t arm_ldwu_mmu(CPUState *env, target_ulong addr, int idx)
> +{
> +    return (uint16_t)helper_ret_ldw_mmu(env, addr, idx, GETPC());
> +}
> +
> +static uint32_t arm_ldws_mmu(CPUState *env, target_ulong addr, int idx)
> +{
> +    return (int16_t)helper_ret_ldw_mmu(env, addr, idx, GETPC());
> +}

Either CPUState *cpu or CPUArchState *env. I assume you meant the latter
when passing it to a helper?
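
I.e., presumably the wrappers should take the same type as the helper
prototypes, per the build failure Aurelien quoted:

    static uint32_t arm_ldbu_mmu(CPUArchState *env, target_ulong addr, int idx)
    {
        return (uint8_t)helper_ret_ldb_mmu(env, addr, idx, GETPC());
    }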

Andreas

-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
