* [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU
@ 2009-03-28 21:30 Nathan Froyd
2009-03-28 21:30 ` [Qemu-devel] [PATCH 1/4] move PPC insn flags to cpu.h Nathan Froyd
` (4 more replies)
0 siblings, 5 replies; 13+ messages in thread
From: Nathan Froyd @ 2009-03-28 21:30 UTC (permalink / raw)
To: qemu-devel
For PPC guests, I noticed that we create TCG slots for all the potential
kinds of registers (float, Altivec, SPE), even if the chip doesn't have
instructions to access those registers.
This patch series tweaks the initialization routine to create the TCG
values for registers necessary for particular classes of instructions
only if the emulated chip supports those instructions. The first couple
of patches are simply busywork of moving things around; the last patch
is where all the action is at.
I am not a TCG expert, but there are several loops in TCG over all
globals and it seems like those loops would go faster if they didn't
have to consider registers that would never be touched. If this patch
series makes no difference in TCG's performance, then I'd be glad to
have an explanation of why that's the case.
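To give a rough sense of the numbers involved, here is a minimal sketch (not code from this series) that counts the per-register TCG globals the loop in ppc_translate_init would create under the conditions introduced in patch 4/4. The names cpu_def_sketch, count_register_globals and NUM_REGS_SKETCH are hypothetical, the is_ppc64 parameter stands in for the TARGET_PPC64 build-time switch, and the register count of 32 is an assumption; the flag values are copied from patch 1/4.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Flag values copied from patch 1/4 of this series. */
#define PPC_FLOAT   0x0000000000010000ULL
#define PPC_ALTIVEC 0x0000000001000000ULL
#define PPC_SPE     0x0000000002000000ULL

/* Assumption: the init loop iterates over 32 register indices. */
#define NUM_REGS_SKETCH 32u

/* Hypothetical stand-in for ppc_def_t; only the field used here. */
struct cpu_def_sketch {
    uint64_t insns_flags;
};

/*
 * Count the per-register globals created for one CPU definition.
 * is_ppc64 models the TARGET_PPC64 switch: the gprh slots only
 * exist in 32-bit builds, and with patch 4/4 only when PPC_SPE is set.
 */
unsigned count_register_globals(const struct cpu_def_sketch *def,
                                int is_ppc64)
{
    unsigned per_index = 1; /* the GPR itself is always created */

    assert(def != NULL);

    if (!is_ppc64 && (def->insns_flags & PPC_SPE)) {
        per_index += 1;     /* gprh */
    }
    if (def->insns_flags & PPC_FLOAT) {
        per_index += 1;     /* fpr */
    }
    if (def->insns_flags & PPC_ALTIVEC) {
        per_index += 2;     /* avrh and avrl */
    }
    return NUM_REGS_SKETCH * per_index;
}
```

Under these assumptions, a 32-bit CPU definition with none of the three flags ends up with 32 per-register globals, where the unconditional code would have created 160.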
-Nathan
* [Qemu-devel] [PATCH 1/4] move PPC insn flags to cpu.h
2009-03-28 21:30 [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU Nathan Froyd
@ 2009-03-28 21:30 ` Nathan Froyd
2009-03-28 22:58 ` Aurelien Jarno
2009-03-28 21:30 ` [Qemu-devel] [PATCH 2/4] move ppc_def_t definition " Nathan Froyd
` (3 subsequent siblings)
4 siblings, 1 reply; 13+ messages in thread
From: Nathan Froyd @ 2009-03-28 21:30 UTC (permalink / raw)
To: qemu-devel
It makes more sense to put the flags in the header file than in the
middle of translate.c.
Signed-off-by: Nathan Froyd <froydnj@codesourcery.com>
---
target-ppc/cpu.h | 138 ++++++++++++++++++++++++++++++++++++++++++++++++
target-ppc/translate.c | 138 ------------------------------------------------
2 files changed, 138 insertions(+), 138 deletions(-)
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 87b3460..e11af60 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -283,6 +283,144 @@ enum powerpc_input_t {
#define PPC_INPUT(env) (env->bus_model)
/*****************************************************************************/
+/* PowerPC Instructions types definitions */
+enum {
+ PPC_NONE = 0x0000000000000000ULL,
+ /* PowerPC base instructions set */
+ PPC_INSNS_BASE = 0x0000000000000001ULL,
+ /* integer operations instructions */
+#define PPC_INTEGER PPC_INSNS_BASE
+ /* flow control instructions */
+#define PPC_FLOW PPC_INSNS_BASE
+ /* virtual memory instructions */
+#define PPC_MEM PPC_INSNS_BASE
+ /* ld/st with reservation instructions */
+#define PPC_RES PPC_INSNS_BASE
+ /* spr/msr access instructions */
+#define PPC_MISC PPC_INSNS_BASE
+ /* Deprecated instruction sets */
+ /* Original POWER instruction set */
+ PPC_POWER = 0x0000000000000002ULL,
+ /* POWER2 instruction set extension */
+ PPC_POWER2 = 0x0000000000000004ULL,
+ /* Power RTC support */
+ PPC_POWER_RTC = 0x0000000000000008ULL,
+ /* Power-to-PowerPC bridge (601) */
+ PPC_POWER_BR = 0x0000000000000010ULL,
+ /* 64 bits PowerPC instruction set */
+ PPC_64B = 0x0000000000000020ULL,
+ /* New 64 bits extensions (PowerPC 2.0x) */
+ PPC_64BX = 0x0000000000000040ULL,
+ /* 64 bits hypervisor extensions */
+ PPC_64H = 0x0000000000000080ULL,
+ /* New wait instruction (PowerPC 2.0x) */
+ PPC_WAIT = 0x0000000000000100ULL,
+ /* Time base mftb instruction */
+ PPC_MFTB = 0x0000000000000200ULL,
+
+ /* Fixed-point unit extensions */
+ /* PowerPC 602 specific */
+ PPC_602_SPEC = 0x0000000000000400ULL,
+ /* isel instruction */
+ PPC_ISEL = 0x0000000000000800ULL,
+ /* popcntb instruction */
+ PPC_POPCNTB = 0x0000000000001000ULL,
+ /* string load / store */
+ PPC_STRING = 0x0000000000002000ULL,
+
+ /* Floating-point unit extensions */
+ /* Optional floating point instructions */
+ PPC_FLOAT = 0x0000000000010000ULL,
+ /* New floating-point extensions (PowerPC 2.0x) */
+ PPC_FLOAT_EXT = 0x0000000000020000ULL,
+ PPC_FLOAT_FSQRT = 0x0000000000040000ULL,
+ PPC_FLOAT_FRES = 0x0000000000080000ULL,
+ PPC_FLOAT_FRSQRTE = 0x0000000000100000ULL,
+ PPC_FLOAT_FRSQRTES = 0x0000000000200000ULL,
+ PPC_FLOAT_FSEL = 0x0000000000400000ULL,
+ PPC_FLOAT_STFIWX = 0x0000000000800000ULL,
+
+ /* Vector/SIMD extensions */
+ /* Altivec support */
+ PPC_ALTIVEC = 0x0000000001000000ULL,
+ /* PowerPC 2.03 SPE extension */
+ PPC_SPE = 0x0000000002000000ULL,
+ /* PowerPC 2.03 SPE single-precision floating-point extension */
+ PPC_SPE_SINGLE = 0x0000000004000000ULL,
+ /* PowerPC 2.03 SPE double-precision floating-point extension */
+ PPC_SPE_DOUBLE = 0x0000000008000000ULL,
+
+ /* Optional memory control instructions */
+ PPC_MEM_TLBIA = 0x0000000010000000ULL,
+ PPC_MEM_TLBIE = 0x0000000020000000ULL,
+ PPC_MEM_TLBSYNC = 0x0000000040000000ULL,
+ /* sync instruction */
+ PPC_MEM_SYNC = 0x0000000080000000ULL,
+ /* eieio instruction */
+ PPC_MEM_EIEIO = 0x0000000100000000ULL,
+
+ /* Cache control instructions */
+ PPC_CACHE = 0x0000000200000000ULL,
+ /* icbi instruction */
+ PPC_CACHE_ICBI = 0x0000000400000000ULL,
+ /* dcbz instruction with fixed cache line size */
+ PPC_CACHE_DCBZ = 0x0000000800000000ULL,
+ /* dcbz instruction with tunable cache line size */
+ PPC_CACHE_DCBZT = 0x0000001000000000ULL,
+ /* dcba instruction */
+ PPC_CACHE_DCBA = 0x0000002000000000ULL,
+ /* Freescale cache locking instructions */
+ PPC_CACHE_LOCK = 0x0000004000000000ULL,
+
+ /* MMU related extensions */
+ /* external control instructions */
+ PPC_EXTERN = 0x0000010000000000ULL,
+ /* segment register access instructions */
+ PPC_SEGMENT = 0x0000020000000000ULL,
+ /* PowerPC 6xx TLB management instructions */
+ PPC_6xx_TLB = 0x0000040000000000ULL,
+ /* PowerPC 74xx TLB management instructions */
+ PPC_74xx_TLB = 0x0000080000000000ULL,
+ /* PowerPC 40x TLB management instructions */
+ PPC_40x_TLB = 0x0000100000000000ULL,
+ /* segment register access instructions for PowerPC 64 "bridge" */
+ PPC_SEGMENT_64B = 0x0000200000000000ULL,
+ /* SLB management */
+ PPC_SLBI = 0x0000400000000000ULL,
+
+ /* Embedded PowerPC dedicated instructions */
+ PPC_WRTEE = 0x0001000000000000ULL,
+ /* PowerPC 40x exception model */
+ PPC_40x_EXCP = 0x0002000000000000ULL,
+ /* PowerPC 405 Mac instructions */
+ PPC_405_MAC = 0x0004000000000000ULL,
+ /* PowerPC 440 specific instructions */
+ PPC_440_SPEC = 0x0008000000000000ULL,
+ /* BookE (embedded) PowerPC specification */
+ PPC_BOOKE = 0x0010000000000000ULL,
+ /* mfapidi instruction */
+ PPC_MFAPIDI = 0x0020000000000000ULL,
+ /* tlbiva instruction */
+ PPC_TLBIVA = 0x0040000000000000ULL,
+ /* tlbivax instruction */
+ PPC_TLBIVAX = 0x0080000000000000ULL,
+ /* PowerPC 4xx dedicated instructions */
+ PPC_4xx_COMMON = 0x0100000000000000ULL,
+ /* PowerPC 40x ibct instructions */
+ PPC_40x_ICBT = 0x0200000000000000ULL,
+ /* rfmci is not implemented in all BookE PowerPC */
+ PPC_RFMCI = 0x0400000000000000ULL,
+ /* rfdi instruction */
+ PPC_RFDI = 0x0800000000000000ULL,
+ /* DCR accesses */
+ PPC_DCR = 0x1000000000000000ULL,
+ /* DCR extended accesse */
+ PPC_DCRX = 0x2000000000000000ULL,
+ /* user-mode DCR access, implemented in PowerPC 460 */
+ PPC_DCRUX = 0x4000000000000000ULL,
+};
+
+/*****************************************************************************/
typedef struct ppc_def_t ppc_def_t;
typedef struct opc_handler_t opc_handler_t;
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 952ee99..b89d42f 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -450,144 +450,6 @@ static always_inline target_ulong MASK (uint32_t start, uint32_t end)
}
/*****************************************************************************/
-/* PowerPC Instructions types definitions */
-enum {
- PPC_NONE = 0x0000000000000000ULL,
- /* PowerPC base instructions set */
- PPC_INSNS_BASE = 0x0000000000000001ULL,
- /* integer operations instructions */
-#define PPC_INTEGER PPC_INSNS_BASE
- /* flow control instructions */
-#define PPC_FLOW PPC_INSNS_BASE
- /* virtual memory instructions */
-#define PPC_MEM PPC_INSNS_BASE
- /* ld/st with reservation instructions */
-#define PPC_RES PPC_INSNS_BASE
- /* spr/msr access instructions */
-#define PPC_MISC PPC_INSNS_BASE
- /* Deprecated instruction sets */
- /* Original POWER instruction set */
- PPC_POWER = 0x0000000000000002ULL,
- /* POWER2 instruction set extension */
- PPC_POWER2 = 0x0000000000000004ULL,
- /* Power RTC support */
- PPC_POWER_RTC = 0x0000000000000008ULL,
- /* Power-to-PowerPC bridge (601) */
- PPC_POWER_BR = 0x0000000000000010ULL,
- /* 64 bits PowerPC instruction set */
- PPC_64B = 0x0000000000000020ULL,
- /* New 64 bits extensions (PowerPC 2.0x) */
- PPC_64BX = 0x0000000000000040ULL,
- /* 64 bits hypervisor extensions */
- PPC_64H = 0x0000000000000080ULL,
- /* New wait instruction (PowerPC 2.0x) */
- PPC_WAIT = 0x0000000000000100ULL,
- /* Time base mftb instruction */
- PPC_MFTB = 0x0000000000000200ULL,
-
- /* Fixed-point unit extensions */
- /* PowerPC 602 specific */
- PPC_602_SPEC = 0x0000000000000400ULL,
- /* isel instruction */
- PPC_ISEL = 0x0000000000000800ULL,
- /* popcntb instruction */
- PPC_POPCNTB = 0x0000000000001000ULL,
- /* string load / store */
- PPC_STRING = 0x0000000000002000ULL,
-
- /* Floating-point unit extensions */
- /* Optional floating point instructions */
- PPC_FLOAT = 0x0000000000010000ULL,
- /* New floating-point extensions (PowerPC 2.0x) */
- PPC_FLOAT_EXT = 0x0000000000020000ULL,
- PPC_FLOAT_FSQRT = 0x0000000000040000ULL,
- PPC_FLOAT_FRES = 0x0000000000080000ULL,
- PPC_FLOAT_FRSQRTE = 0x0000000000100000ULL,
- PPC_FLOAT_FRSQRTES = 0x0000000000200000ULL,
- PPC_FLOAT_FSEL = 0x0000000000400000ULL,
- PPC_FLOAT_STFIWX = 0x0000000000800000ULL,
-
- /* Vector/SIMD extensions */
- /* Altivec support */
- PPC_ALTIVEC = 0x0000000001000000ULL,
- /* PowerPC 2.03 SPE extension */
- PPC_SPE = 0x0000000002000000ULL,
- /* PowerPC 2.03 SPE single-precision floating-point extension */
- PPC_SPE_SINGLE = 0x0000000004000000ULL,
- /* PowerPC 2.03 SPE double-precision floating-point extension */
- PPC_SPE_DOUBLE = 0x0000000008000000ULL,
-
- /* Optional memory control instructions */
- PPC_MEM_TLBIA = 0x0000000010000000ULL,
- PPC_MEM_TLBIE = 0x0000000020000000ULL,
- PPC_MEM_TLBSYNC = 0x0000000040000000ULL,
- /* sync instruction */
- PPC_MEM_SYNC = 0x0000000080000000ULL,
- /* eieio instruction */
- PPC_MEM_EIEIO = 0x0000000100000000ULL,
-
- /* Cache control instructions */
- PPC_CACHE = 0x0000000200000000ULL,
- /* icbi instruction */
- PPC_CACHE_ICBI = 0x0000000400000000ULL,
- /* dcbz instruction with fixed cache line size */
- PPC_CACHE_DCBZ = 0x0000000800000000ULL,
- /* dcbz instruction with tunable cache line size */
- PPC_CACHE_DCBZT = 0x0000001000000000ULL,
- /* dcba instruction */
- PPC_CACHE_DCBA = 0x0000002000000000ULL,
- /* Freescale cache locking instructions */
- PPC_CACHE_LOCK = 0x0000004000000000ULL,
-
- /* MMU related extensions */
- /* external control instructions */
- PPC_EXTERN = 0x0000010000000000ULL,
- /* segment register access instructions */
- PPC_SEGMENT = 0x0000020000000000ULL,
- /* PowerPC 6xx TLB management instructions */
- PPC_6xx_TLB = 0x0000040000000000ULL,
- /* PowerPC 74xx TLB management instructions */
- PPC_74xx_TLB = 0x0000080000000000ULL,
- /* PowerPC 40x TLB management instructions */
- PPC_40x_TLB = 0x0000100000000000ULL,
- /* segment register access instructions for PowerPC 64 "bridge" */
- PPC_SEGMENT_64B = 0x0000200000000000ULL,
- /* SLB management */
- PPC_SLBI = 0x0000400000000000ULL,
-
- /* Embedded PowerPC dedicated instructions */
- PPC_WRTEE = 0x0001000000000000ULL,
- /* PowerPC 40x exception model */
- PPC_40x_EXCP = 0x0002000000000000ULL,
- /* PowerPC 405 Mac instructions */
- PPC_405_MAC = 0x0004000000000000ULL,
- /* PowerPC 440 specific instructions */
- PPC_440_SPEC = 0x0008000000000000ULL,
- /* BookE (embedded) PowerPC specification */
- PPC_BOOKE = 0x0010000000000000ULL,
- /* mfapidi instruction */
- PPC_MFAPIDI = 0x0020000000000000ULL,
- /* tlbiva instruction */
- PPC_TLBIVA = 0x0040000000000000ULL,
- /* tlbivax instruction */
- PPC_TLBIVAX = 0x0080000000000000ULL,
- /* PowerPC 4xx dedicated instructions */
- PPC_4xx_COMMON = 0x0100000000000000ULL,
- /* PowerPC 40x ibct instructions */
- PPC_40x_ICBT = 0x0200000000000000ULL,
- /* rfmci is not implemented in all BookE PowerPC */
- PPC_RFMCI = 0x0400000000000000ULL,
- /* rfdi instruction */
- PPC_RFDI = 0x0800000000000000ULL,
- /* DCR accesses */
- PPC_DCR = 0x1000000000000000ULL,
- /* DCR extended accesse */
- PPC_DCRX = 0x2000000000000000ULL,
- /* user-mode DCR access, implemented in PowerPC 460 */
- PPC_DCRUX = 0x4000000000000000ULL,
-};
-
-/*****************************************************************************/
/* PowerPC instructions table */
#if HOST_LONG_BITS == 64
#define OPC_ALIGN 8
--
1.6.0.5
* [Qemu-devel] [PATCH 2/4] move ppc_def_t definition to cpu.h
2009-03-28 21:30 [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU Nathan Froyd
2009-03-28 21:30 ` [Qemu-devel] [PATCH 1/4] move PPC insn flags to cpu.h Nathan Froyd
@ 2009-03-28 21:30 ` Nathan Froyd
2009-03-28 21:30 ` [Qemu-devel] [PATCH 3/4] pass the cpu definition to ppc_translate_init Nathan Froyd
` (2 subsequent siblings)
4 siblings, 0 replies; 13+ messages in thread
From: Nathan Froyd @ 2009-03-28 21:30 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Nathan Froyd <froydnj@codesourcery.com>
---
target-ppc/cpu.h | 15 +++++++++++++++
target-ppc/translate_init.c | 15 ---------------
2 files changed, 15 insertions(+), 15 deletions(-)
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index e11af60..52180b6 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -433,6 +433,21 @@ typedef struct ppc_dcr_t ppc_dcr_t;
typedef union ppc_avr_t ppc_avr_t;
typedef union ppc_tlb_t ppc_tlb_t;
+struct ppc_def_t {
+ const char *name;
+ uint32_t pvr;
+ uint32_t svr;
+ uint64_t insns_flags;
+ uint64_t msr_mask;
+ powerpc_mmu_t mmu_model;
+ powerpc_excp_t excp_model;
+ powerpc_input_t bus_model;
+ uint32_t flags;
+ int bfd_mach;
+ void (*init_proc)(CPUPPCState *env);
+ int (*check_pow)(CPUPPCState *env);
+};
+
/* SPR access micro-ops generations callbacks */
struct ppc_spr_t {
void (*uea_read)(void *opaque, int gpr_num, int spr_num);
diff --git a/target-ppc/translate_init.c b/target-ppc/translate_init.c
index 56d8d93..3593192 100644
--- a/target-ppc/translate_init.c
+++ b/target-ppc/translate_init.c
@@ -33,21 +33,6 @@
#define TODO_USER_ONLY 1
#endif
-struct ppc_def_t {
- const char *name;
- uint32_t pvr;
- uint32_t svr;
- uint64_t insns_flags;
- uint64_t msr_mask;
- powerpc_mmu_t mmu_model;
- powerpc_excp_t excp_model;
- powerpc_input_t bus_model;
- uint32_t flags;
- int bfd_mach;
- void (*init_proc)(CPUPPCState *env);
- int (*check_pow)(CPUPPCState *env);
-};
-
/* For user-mode emulation, we don't emulate any IRQ controller */
#if defined(CONFIG_USER_ONLY)
#define PPC_IRQ_INIT_FN(name) \
--
1.6.0.5
* [Qemu-devel] [PATCH 3/4] pass the cpu definition to ppc_translate_init
2009-03-28 21:30 [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU Nathan Froyd
2009-03-28 21:30 ` [Qemu-devel] [PATCH 1/4] move PPC insn flags to cpu.h Nathan Froyd
2009-03-28 21:30 ` [Qemu-devel] [PATCH 2/4] move ppc_def_t definition " Nathan Froyd
@ 2009-03-28 21:30 ` Nathan Froyd
2009-03-28 21:30 ` [Qemu-devel] [PATCH 4/4] create TCG slots for registers based on CPU Nathan Froyd
2009-03-28 22:54 ` [Qemu-devel] [PATCH 0/4] target-ppc: " Aurelien Jarno
4 siblings, 0 replies; 13+ messages in thread
From: Nathan Froyd @ 2009-03-28 21:30 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Nathan Froyd <froydnj@codesourcery.com>
---
target-ppc/cpu.h | 2 +-
target-ppc/helper.c | 2 +-
target-ppc/translate.c | 2 +-
3 files changed, 3 insertions(+), 3 deletions(-)
diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
index 52180b6..dcd087b 100644
--- a/target-ppc/cpu.h
+++ b/target-ppc/cpu.h
@@ -846,7 +846,7 @@ struct mmu_ctx_t {
/*****************************************************************************/
CPUPPCState *cpu_ppc_init (const char *cpu_model);
-void ppc_translate_init(void);
+void ppc_translate_init(const ppc_def_t *def);
int cpu_ppc_exec (CPUPPCState *s);
void cpu_ppc_close (CPUPPCState *s);
/* you can call this signal handler from your SIGBUS and SIGSEGV
diff --git a/target-ppc/helper.c b/target-ppc/helper.c
index 80b53eb..d9c4742 100644
--- a/target-ppc/helper.c
+++ b/target-ppc/helper.c
@@ -2824,7 +2824,7 @@ CPUPPCState *cpu_ppc_init (const char *cpu_model)
env = qemu_mallocz(sizeof(CPUPPCState));
cpu_exec_init(env);
- ppc_translate_init();
+ ppc_translate_init(def);
env->cpu_model_str = cpu_model;
cpu_ppc_register_internal(env, def);
cpu_ppc_reset(env);
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index b89d42f..412c8d0 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -78,7 +78,7 @@ static TCGv_i32 cpu_access_type;
#include "gen-icount.h"
-void ppc_translate_init(void)
+void ppc_translate_init(const ppc_def_t *def)
{
int i;
char* p;
--
1.6.0.5
* [Qemu-devel] [PATCH 4/4] create TCG slots for registers based on CPU
2009-03-28 21:30 [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU Nathan Froyd
` (2 preceding siblings ...)
2009-03-28 21:30 ` [Qemu-devel] [PATCH 3/4] pass the cpu definition to ppc_translate_init Nathan Froyd
@ 2009-03-28 21:30 ` Nathan Froyd
2009-03-28 22:54 ` [Qemu-devel] [PATCH 0/4] target-ppc: " Aurelien Jarno
4 siblings, 0 replies; 13+ messages in thread
From: Nathan Froyd @ 2009-03-28 21:30 UTC (permalink / raw)
To: qemu-devel
There's no point in creating floating-point registers if the chip we're
emulating doesn't have them. Likewise for Altivec and SPE registers.
Signed-off-by: Nathan Froyd <froydnj@codesourcery.com>
---
target-ppc/translate.c | 52 +++++++++++++++++++++++++++--------------------
1 files changed, 30 insertions(+), 22 deletions(-)
diff --git a/target-ppc/translate.c b/target-ppc/translate.c
index 412c8d0..8170718 100644
--- a/target-ppc/translate.c
+++ b/target-ppc/translate.c
@@ -104,36 +104,42 @@ void ppc_translate_init(const ppc_def_t *def)
offsetof(CPUState, gpr[i]), p);
p += (i < 10) ? 3 : 4;
#if !defined(TARGET_PPC64)
- sprintf(p, "r%dH", i);
- cpu_gprh[i] = tcg_global_mem_new_i32(TCG_AREG0,
- offsetof(CPUState, gprh[i]), p);
- p += (i < 10) ? 4 : 5;
+ if (def->insns_flags & PPC_SPE) {
+ sprintf(p, "r%dH", i);
+ cpu_gprh[i] = tcg_global_mem_new_i32(TCG_AREG0,
+ offsetof(CPUState, gprh[i]), p);
+ p += (i < 10) ? 4 : 5;
+ }
#endif
- sprintf(p, "fp%d", i);
- cpu_fpr[i] = tcg_global_mem_new_i64(TCG_AREG0,
- offsetof(CPUState, fpr[i]), p);
- p += (i < 10) ? 4 : 5;
+ if (def->insns_flags & PPC_FLOAT) {
+ sprintf(p, "fp%d", i);
+ cpu_fpr[i] = tcg_global_mem_new_i64(TCG_AREG0,
+ offsetof(CPUState, fpr[i]), p);
+ p += (i < 10) ? 4 : 5;
+ }
- sprintf(p, "avr%dH", i);
+ if (def->insns_flags & PPC_ALTIVEC) {
+ sprintf(p, "avr%dH", i);
#ifdef WORDS_BIGENDIAN
- cpu_avrh[i] = tcg_global_mem_new_i64(TCG_AREG0,
- offsetof(CPUState, avr[i].u64[0]), p);
+ cpu_avrh[i] = tcg_global_mem_new_i64(TCG_AREG0,
+ offsetof(CPUState, avr[i].u64[0]), p);
#else
- cpu_avrh[i] = tcg_global_mem_new_i64(TCG_AREG0,
- offsetof(CPUState, avr[i].u64[1]), p);
+ cpu_avrh[i] = tcg_global_mem_new_i64(TCG_AREG0,
+ offsetof(CPUState, avr[i].u64[1]), p);
#endif
- p += (i < 10) ? 6 : 7;
+ p += (i < 10) ? 6 : 7;
- sprintf(p, "avr%dL", i);
+ sprintf(p, "avr%dL", i);
#ifdef WORDS_BIGENDIAN
- cpu_avrl[i] = tcg_global_mem_new_i64(TCG_AREG0,
- offsetof(CPUState, avr[i].u64[1]), p);
+ cpu_avrl[i] = tcg_global_mem_new_i64(TCG_AREG0,
+ offsetof(CPUState, avr[i].u64[1]), p);
#else
- cpu_avrl[i] = tcg_global_mem_new_i64(TCG_AREG0,
- offsetof(CPUState, avr[i].u64[0]), p);
+ cpu_avrl[i] = tcg_global_mem_new_i64(TCG_AREG0,
+ offsetof(CPUState, avr[i].u64[0]), p);
#endif
- p += (i < 10) ? 6 : 7;
+ p += (i < 10) ? 6 : 7;
+ }
}
cpu_nip = tcg_global_mem_new(TCG_AREG0,
@@ -154,8 +160,10 @@ void ppc_translate_init(const ppc_def_t *def)
cpu_reserve = tcg_global_mem_new(TCG_AREG0,
offsetof(CPUState, reserve), "reserve");
- cpu_fpscr = tcg_global_mem_new_i32(TCG_AREG0,
- offsetof(CPUState, fpscr), "fpscr");
+ if (def->insns_flags & PPC_FLOAT) {
+ cpu_fpscr = tcg_global_mem_new_i32(TCG_AREG0,
+ offsetof(CPUState, fpscr), "fpscr");
+ }
cpu_access_type = tcg_global_mem_new_i32(TCG_AREG0,
offsetof(CPUState, access_type), "access_type");
--
1.6.0.5
* Re: [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU
2009-03-28 21:30 [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU Nathan Froyd
` (3 preceding siblings ...)
2009-03-28 21:30 ` [Qemu-devel] [PATCH 4/4] create TCG slots for registers based on CPU Nathan Froyd
@ 2009-03-28 22:54 ` Aurelien Jarno
2009-03-29 0:18 ` Nathan Froyd
4 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2009-03-28 22:54 UTC (permalink / raw)
To: qemu-devel
On Sat, Mar 28, 2009 at 02:30:13PM -0700, Nathan Froyd wrote:
> For PPC guests, I noticed that we create TCG slots for all the potential
> kinds of registers (float, Altivec, SPE), even if the chip doesn't have
> instructions to access those registers.
>
> This patch series tweaks the initialization routine to create the TCG
> values for registers necessary for particular classes of instructions
> only if the emulated chip supports those instructions. The first couple
> of patches are simply busywork of moving things around; the last patch
> is where all the action is at.
>
> I am not a TCG expert, but there are several loops in TCG over all
> globals and it seems like those loops would go faster if they didn't
> have to consider registers that would never be touched. If this patch
> series makes no difference in TCG's performance, then I'd be glad to
> have an explanation of why that's the case.
Have you actually run a benchmark with those changes? TCG is
sometimes a bit strange: some optimizations do not change the
execution speed, while others improve it a lot. It is very difficult to
predict what will give a gain.
Suggested benchmarks: gzip/bzip2 on a big file using user emulation,
or a compilation in system emulation.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH 1/4] move PPC insn flags to cpu.h
2009-03-28 21:30 ` [Qemu-devel] [PATCH 1/4] move PPC insn flags to cpu.h Nathan Froyd
@ 2009-03-28 22:58 ` Aurelien Jarno
2009-03-28 23:07 ` Nathan Froyd
0 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2009-03-28 22:58 UTC (permalink / raw)
To: qemu-devel
On Sat, Mar 28, 2009 at 02:30:14PM -0700, Nathan Froyd wrote:
> It makes more sense to put the flags in the header file than in the
> middle of translate.c.
If those definitions are not used outside of translate.c, it makes no
sense to move them to a header file included in multiple places.
OTOH, I agree that having them in the middle of translate.c is a bit
strange. Maybe we should move them to the top of the file along with
the other definitions? In that case, GEN_HANDLER* and EXTRACT_HELPER are
also good candidates for the move.
> Signed-off-by: Nathan Froyd <froydnj@codesourcery.com>
> ---
> target-ppc/cpu.h | 138 ++++++++++++++++++++++++++++++++++++++++++++++++
> target-ppc/translate.c | 138 ------------------------------------------------
> 2 files changed, 138 insertions(+), 138 deletions(-)
>
> diff --git a/target-ppc/cpu.h b/target-ppc/cpu.h
> index 87b3460..e11af60 100644
> --- a/target-ppc/cpu.h
> +++ b/target-ppc/cpu.h
> @@ -283,6 +283,144 @@ enum powerpc_input_t {
> #define PPC_INPUT(env) (env->bus_model)
>
> /*****************************************************************************/
> +/* PowerPC Instructions types definitions */
> +enum {
> + PPC_NONE = 0x0000000000000000ULL,
> + /* PowerPC base instructions set */
> + PPC_INSNS_BASE = 0x0000000000000001ULL,
> + /* integer operations instructions */
> +#define PPC_INTEGER PPC_INSNS_BASE
> + /* flow control instructions */
> +#define PPC_FLOW PPC_INSNS_BASE
> + /* virtual memory instructions */
> +#define PPC_MEM PPC_INSNS_BASE
> + /* ld/st with reservation instructions */
> +#define PPC_RES PPC_INSNS_BASE
> + /* spr/msr access instructions */
> +#define PPC_MISC PPC_INSNS_BASE
> + /* Deprecated instruction sets */
> + /* Original POWER instruction set */
> + PPC_POWER = 0x0000000000000002ULL,
> + /* POWER2 instruction set extension */
> + PPC_POWER2 = 0x0000000000000004ULL,
> + /* Power RTC support */
> + PPC_POWER_RTC = 0x0000000000000008ULL,
> + /* Power-to-PowerPC bridge (601) */
> + PPC_POWER_BR = 0x0000000000000010ULL,
> + /* 64 bits PowerPC instruction set */
> + PPC_64B = 0x0000000000000020ULL,
> + /* New 64 bits extensions (PowerPC 2.0x) */
> + PPC_64BX = 0x0000000000000040ULL,
> + /* 64 bits hypervisor extensions */
> + PPC_64H = 0x0000000000000080ULL,
> + /* New wait instruction (PowerPC 2.0x) */
> + PPC_WAIT = 0x0000000000000100ULL,
> + /* Time base mftb instruction */
> + PPC_MFTB = 0x0000000000000200ULL,
> +
> + /* Fixed-point unit extensions */
> + /* PowerPC 602 specific */
> + PPC_602_SPEC = 0x0000000000000400ULL,
> + /* isel instruction */
> + PPC_ISEL = 0x0000000000000800ULL,
> + /* popcntb instruction */
> + PPC_POPCNTB = 0x0000000000001000ULL,
> + /* string load / store */
> + PPC_STRING = 0x0000000000002000ULL,
> +
> + /* Floating-point unit extensions */
> + /* Optional floating point instructions */
> + PPC_FLOAT = 0x0000000000010000ULL,
> + /* New floating-point extensions (PowerPC 2.0x) */
> + PPC_FLOAT_EXT = 0x0000000000020000ULL,
> + PPC_FLOAT_FSQRT = 0x0000000000040000ULL,
> + PPC_FLOAT_FRES = 0x0000000000080000ULL,
> + PPC_FLOAT_FRSQRTE = 0x0000000000100000ULL,
> + PPC_FLOAT_FRSQRTES = 0x0000000000200000ULL,
> + PPC_FLOAT_FSEL = 0x0000000000400000ULL,
> + PPC_FLOAT_STFIWX = 0x0000000000800000ULL,
> +
> + /* Vector/SIMD extensions */
> + /* Altivec support */
> + PPC_ALTIVEC = 0x0000000001000000ULL,
> + /* PowerPC 2.03 SPE extension */
> + PPC_SPE = 0x0000000002000000ULL,
> + /* PowerPC 2.03 SPE single-precision floating-point extension */
> + PPC_SPE_SINGLE = 0x0000000004000000ULL,
> + /* PowerPC 2.03 SPE double-precision floating-point extension */
> + PPC_SPE_DOUBLE = 0x0000000008000000ULL,
> +
> + /* Optional memory control instructions */
> + PPC_MEM_TLBIA = 0x0000000010000000ULL,
> + PPC_MEM_TLBIE = 0x0000000020000000ULL,
> + PPC_MEM_TLBSYNC = 0x0000000040000000ULL,
> + /* sync instruction */
> + PPC_MEM_SYNC = 0x0000000080000000ULL,
> + /* eieio instruction */
> + PPC_MEM_EIEIO = 0x0000000100000000ULL,
> +
> + /* Cache control instructions */
> + PPC_CACHE = 0x0000000200000000ULL,
> + /* icbi instruction */
> + PPC_CACHE_ICBI = 0x0000000400000000ULL,
> + /* dcbz instruction with fixed cache line size */
> + PPC_CACHE_DCBZ = 0x0000000800000000ULL,
> + /* dcbz instruction with tunable cache line size */
> + PPC_CACHE_DCBZT = 0x0000001000000000ULL,
> + /* dcba instruction */
> + PPC_CACHE_DCBA = 0x0000002000000000ULL,
> + /* Freescale cache locking instructions */
> + PPC_CACHE_LOCK = 0x0000004000000000ULL,
> +
> + /* MMU related extensions */
> + /* external control instructions */
> + PPC_EXTERN = 0x0000010000000000ULL,
> + /* segment register access instructions */
> + PPC_SEGMENT = 0x0000020000000000ULL,
> + /* PowerPC 6xx TLB management instructions */
> + PPC_6xx_TLB = 0x0000040000000000ULL,
> + /* PowerPC 74xx TLB management instructions */
> + PPC_74xx_TLB = 0x0000080000000000ULL,
> + /* PowerPC 40x TLB management instructions */
> + PPC_40x_TLB = 0x0000100000000000ULL,
> + /* segment register access instructions for PowerPC 64 "bridge" */
> + PPC_SEGMENT_64B = 0x0000200000000000ULL,
> + /* SLB management */
> + PPC_SLBI = 0x0000400000000000ULL,
> +
> + /* Embedded PowerPC dedicated instructions */
> + PPC_WRTEE = 0x0001000000000000ULL,
> + /* PowerPC 40x exception model */
> + PPC_40x_EXCP = 0x0002000000000000ULL,
> + /* PowerPC 405 Mac instructions */
> + PPC_405_MAC = 0x0004000000000000ULL,
> + /* PowerPC 440 specific instructions */
> + PPC_440_SPEC = 0x0008000000000000ULL,
> + /* BookE (embedded) PowerPC specification */
> + PPC_BOOKE = 0x0010000000000000ULL,
> + /* mfapidi instruction */
> + PPC_MFAPIDI = 0x0020000000000000ULL,
> + /* tlbiva instruction */
> + PPC_TLBIVA = 0x0040000000000000ULL,
> + /* tlbivax instruction */
> + PPC_TLBIVAX = 0x0080000000000000ULL,
> + /* PowerPC 4xx dedicated instructions */
> + PPC_4xx_COMMON = 0x0100000000000000ULL,
> + /* PowerPC 40x ibct instructions */
> + PPC_40x_ICBT = 0x0200000000000000ULL,
> + /* rfmci is not implemented in all BookE PowerPC */
> + PPC_RFMCI = 0x0400000000000000ULL,
> + /* rfdi instruction */
> + PPC_RFDI = 0x0800000000000000ULL,
> + /* DCR accesses */
> + PPC_DCR = 0x1000000000000000ULL,
> + /* DCR extended accesse */
> + PPC_DCRX = 0x2000000000000000ULL,
> + /* user-mode DCR access, implemented in PowerPC 460 */
> + PPC_DCRUX = 0x4000000000000000ULL,
> +};
> +
> +/*****************************************************************************/
> typedef struct ppc_def_t ppc_def_t;
> typedef struct opc_handler_t opc_handler_t;
>
> diff --git a/target-ppc/translate.c b/target-ppc/translate.c
> index 952ee99..b89d42f 100644
> --- a/target-ppc/translate.c
> +++ b/target-ppc/translate.c
> @@ -450,144 +450,6 @@ static always_inline target_ulong MASK (uint32_t start, uint32_t end)
> }
>
> /*****************************************************************************/
> -/* PowerPC Instructions types definitions */
> -enum {
> - PPC_NONE = 0x0000000000000000ULL,
> - /* PowerPC base instructions set */
> - PPC_INSNS_BASE = 0x0000000000000001ULL,
> - /* integer operations instructions */
> -#define PPC_INTEGER PPC_INSNS_BASE
> - /* flow control instructions */
> -#define PPC_FLOW PPC_INSNS_BASE
> - /* virtual memory instructions */
> -#define PPC_MEM PPC_INSNS_BASE
> - /* ld/st with reservation instructions */
> -#define PPC_RES PPC_INSNS_BASE
> - /* spr/msr access instructions */
> -#define PPC_MISC PPC_INSNS_BASE
> - /* Deprecated instruction sets */
> - /* Original POWER instruction set */
> - PPC_POWER = 0x0000000000000002ULL,
> - /* POWER2 instruction set extension */
> - PPC_POWER2 = 0x0000000000000004ULL,
> - /* Power RTC support */
> - PPC_POWER_RTC = 0x0000000000000008ULL,
> - /* Power-to-PowerPC bridge (601) */
> - PPC_POWER_BR = 0x0000000000000010ULL,
> - /* 64 bits PowerPC instruction set */
> - PPC_64B = 0x0000000000000020ULL,
> - /* New 64 bits extensions (PowerPC 2.0x) */
> - PPC_64BX = 0x0000000000000040ULL,
> - /* 64 bits hypervisor extensions */
> - PPC_64H = 0x0000000000000080ULL,
> - /* New wait instruction (PowerPC 2.0x) */
> - PPC_WAIT = 0x0000000000000100ULL,
> - /* Time base mftb instruction */
> - PPC_MFTB = 0x0000000000000200ULL,
> -
> - /* Fixed-point unit extensions */
> - /* PowerPC 602 specific */
> - PPC_602_SPEC = 0x0000000000000400ULL,
> - /* isel instruction */
> - PPC_ISEL = 0x0000000000000800ULL,
> - /* popcntb instruction */
> - PPC_POPCNTB = 0x0000000000001000ULL,
> - /* string load / store */
> - PPC_STRING = 0x0000000000002000ULL,
> -
> - /* Floating-point unit extensions */
> - /* Optional floating point instructions */
> - PPC_FLOAT = 0x0000000000010000ULL,
> - /* New floating-point extensions (PowerPC 2.0x) */
> - PPC_FLOAT_EXT = 0x0000000000020000ULL,
> - PPC_FLOAT_FSQRT = 0x0000000000040000ULL,
> - PPC_FLOAT_FRES = 0x0000000000080000ULL,
> - PPC_FLOAT_FRSQRTE = 0x0000000000100000ULL,
> - PPC_FLOAT_FRSQRTES = 0x0000000000200000ULL,
> - PPC_FLOAT_FSEL = 0x0000000000400000ULL,
> - PPC_FLOAT_STFIWX = 0x0000000000800000ULL,
> -
> - /* Vector/SIMD extensions */
> - /* Altivec support */
> - PPC_ALTIVEC = 0x0000000001000000ULL,
> - /* PowerPC 2.03 SPE extension */
> - PPC_SPE = 0x0000000002000000ULL,
> - /* PowerPC 2.03 SPE single-precision floating-point extension */
> - PPC_SPE_SINGLE = 0x0000000004000000ULL,
> - /* PowerPC 2.03 SPE double-precision floating-point extension */
> - PPC_SPE_DOUBLE = 0x0000000008000000ULL,
> -
> - /* Optional memory control instructions */
> - PPC_MEM_TLBIA = 0x0000000010000000ULL,
> - PPC_MEM_TLBIE = 0x0000000020000000ULL,
> - PPC_MEM_TLBSYNC = 0x0000000040000000ULL,
> - /* sync instruction */
> - PPC_MEM_SYNC = 0x0000000080000000ULL,
> - /* eieio instruction */
> - PPC_MEM_EIEIO = 0x0000000100000000ULL,
> -
> - /* Cache control instructions */
> - PPC_CACHE = 0x0000000200000000ULL,
> - /* icbi instruction */
> - PPC_CACHE_ICBI = 0x0000000400000000ULL,
> - /* dcbz instruction with fixed cache line size */
> - PPC_CACHE_DCBZ = 0x0000000800000000ULL,
> - /* dcbz instruction with tunable cache line size */
> - PPC_CACHE_DCBZT = 0x0000001000000000ULL,
> - /* dcba instruction */
> - PPC_CACHE_DCBA = 0x0000002000000000ULL,
> - /* Freescale cache locking instructions */
> - PPC_CACHE_LOCK = 0x0000004000000000ULL,
> -
> - /* MMU related extensions */
> - /* external control instructions */
> - PPC_EXTERN = 0x0000010000000000ULL,
> - /* segment register access instructions */
> - PPC_SEGMENT = 0x0000020000000000ULL,
> - /* PowerPC 6xx TLB management instructions */
> - PPC_6xx_TLB = 0x0000040000000000ULL,
> - /* PowerPC 74xx TLB management instructions */
> - PPC_74xx_TLB = 0x0000080000000000ULL,
> - /* PowerPC 40x TLB management instructions */
> - PPC_40x_TLB = 0x0000100000000000ULL,
> - /* segment register access instructions for PowerPC 64 "bridge" */
> - PPC_SEGMENT_64B = 0x0000200000000000ULL,
> - /* SLB management */
> - PPC_SLBI = 0x0000400000000000ULL,
> -
> - /* Embedded PowerPC dedicated instructions */
> - PPC_WRTEE = 0x0001000000000000ULL,
> - /* PowerPC 40x exception model */
> - PPC_40x_EXCP = 0x0002000000000000ULL,
> - /* PowerPC 405 Mac instructions */
> - PPC_405_MAC = 0x0004000000000000ULL,
> - /* PowerPC 440 specific instructions */
> - PPC_440_SPEC = 0x0008000000000000ULL,
> - /* BookE (embedded) PowerPC specification */
> - PPC_BOOKE = 0x0010000000000000ULL,
> - /* mfapidi instruction */
> - PPC_MFAPIDI = 0x0020000000000000ULL,
> - /* tlbiva instruction */
> - PPC_TLBIVA = 0x0040000000000000ULL,
> - /* tlbivax instruction */
> - PPC_TLBIVAX = 0x0080000000000000ULL,
> - /* PowerPC 4xx dedicated instructions */
> - PPC_4xx_COMMON = 0x0100000000000000ULL,
> - /* PowerPC 40x ibct instructions */
> - PPC_40x_ICBT = 0x0200000000000000ULL,
> - /* rfmci is not implemented in all BookE PowerPC */
> - PPC_RFMCI = 0x0400000000000000ULL,
> - /* rfdi instruction */
> - PPC_RFDI = 0x0800000000000000ULL,
> - /* DCR accesses */
> - PPC_DCR = 0x1000000000000000ULL,
> - /* DCR extended accesse */
> - PPC_DCRX = 0x2000000000000000ULL,
> - /* user-mode DCR access, implemented in PowerPC 460 */
> - PPC_DCRUX = 0x4000000000000000ULL,
> -};
> -
> -/*****************************************************************************/
> /* PowerPC instructions table */
> #if HOST_LONG_BITS == 64
> #define OPC_ALIGN 8
> --
> 1.6.0.5
>
>
>
>
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH 1/4] move PPC insn flags to cpu.h
2009-03-28 22:58 ` Aurelien Jarno
@ 2009-03-28 23:07 ` Nathan Froyd
2009-03-28 23:14 ` Aurelien Jarno
0 siblings, 1 reply; 13+ messages in thread
From: Nathan Froyd @ 2009-03-28 23:07 UTC (permalink / raw)
To: qemu-devel
On Sat, Mar 28, 2009 at 11:58:05PM +0100, Aurelien Jarno wrote:
> On Sat, Mar 28, 2009 at 02:30:14PM -0700, Nathan Froyd wrote:
> > It makes more sense to put the flags in the header file than in the
> > middle of translate.c.
>
> If those definition are not used outside of translate.c, it makes no
> sense to move them out to a header file included at multiple places.
>
> OTOH, I agree that having them in the middle of translate.c is a bit
> strange. Maybe we should move them at the top of the file along with
> other definitions? In that case, GEN_HANDLER* and EXTRACT_HELPER are
> also good candidates for the move.
Whether they are placed at the top of translate.c or in cpu.h makes no
difference for this patch. I have an implementation of linux-user
signal handling for PPC32 that I'd like to submit soon, though. It
requires knowing what capabilities the CPU supports so it can figure out
what registers to save while setting up the signal frame. (Saving the
floating point registers makes no difference modulo speed, but the save
areas for the Altivec and SPE registers overlap.) So in the interest of
not having to move them twice, I'm lobbying for placing them in cpu.h.
I think many of the GEN_HANDLER macros are localized enough that they
should remain where they are...my kingdom for a MACROLET.
-Nathan
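
The capability check described in the message above could look roughly like the following. This is only a sketch under stated assumptions: the three flag values are copied from the enum in the quoted patch, while regs_to_save() and the SAVE_* masks are hypothetical names invented for illustration and are not taken from the linux-user code mentioned above.

```c
#include <stdint.h>

/* Flag values copied from the instruction-flag enum in the quoted patch. */
#define PPC_FLOAT   0x0000000000010000ULL
#define PPC_ALTIVEC 0x0000000001000000ULL
#define PPC_SPE     0x0000000002000000ULL

/* Hypothetical masks naming the register sets a signal frame may hold. */
#define SAVE_FPR 0x1u
#define SAVE_VR  0x2u
#define SAVE_SPE 0x4u

/*
 * Hypothetical helper: decide which register sets to save, given the
 * instruction flags of the emulated CPU.
 */
static unsigned regs_to_save(uint64_t insns_flags)
{
    unsigned mask = 0;

    /* Saving FPRs unconditionally would also work; skipping them only saves time. */
    if (insns_flags & PPC_FLOAT) {
        mask |= SAVE_FPR;
    }
    /*
     * The Altivec and SPE save areas overlap, so at most one of them is
     * chosen. Preferring Altivec when both bits are set is an assumption.
     */
    if (insns_flags & PPC_ALTIVEC) {
        mask |= SAVE_VR;
    } else if (insns_flags & PPC_SPE) {
        mask |= SAVE_SPE;
    }
    return mask;
}
```

The point of the sketch is only that code outside translate.c needs to see the flag definitions, which is the argument for placing them in cpu.h.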
* Re: [Qemu-devel] [PATCH 1/4] move PPC insn flags to cpu.h
2009-03-28 23:07 ` Nathan Froyd
@ 2009-03-28 23:14 ` Aurelien Jarno
0 siblings, 0 replies; 13+ messages in thread
From: Aurelien Jarno @ 2009-03-28 23:14 UTC (permalink / raw)
To: qemu-devel
On Sat, Mar 28, 2009 at 04:07:52PM -0700, Nathan Froyd wrote:
> On Sat, Mar 28, 2009 at 11:58:05PM +0100, Aurelien Jarno wrote:
> > On Sat, Mar 28, 2009 at 02:30:14PM -0700, Nathan Froyd wrote:
> > > It makes more sense to put the flags in the header file than in the
> > > middle of translate.c.
> >
> > If those definition are not used outside of translate.c, it makes no
> > sense to move them out to a header file included at multiple places.
> >
> > OTOH, I agree that having them in the middle of translate.c is a bit
> > strange. Maybe we should move them at the top of the file along with
> > other definitions? In that case, GEN_HANDLER* and EXTRACT_HELPER are
> > also good candidates for the move.
>
> Whether they are placed at the top of translate.c or in cpu.h makes no
> difference for this patch. I have an implementation of linux-user
> signal handling for PPC32 that I'd like to submit soon, though. It
> requires knowing what capabilities the CPU supports so it can figure out
> what registers to save while setting up the signal frame. (Saving the
> floating point registers makes no difference modulo speed, but the save
> areas for the Altivec and SPE registers overlap.) So in the interest of
> not having to move them twice, I'm lobbying for placing them in cpu.h.
>
I see. Then it's probably better to drop this patch from this series
and add it to the series with the linux-user changes.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU
2009-03-28 22:54 ` [Qemu-devel] [PATCH 0/4] target-ppc: " Aurelien Jarno
@ 2009-03-29 0:18 ` Nathan Froyd
2009-03-29 13:34 ` Aurelien Jarno
0 siblings, 1 reply; 13+ messages in thread
From: Nathan Froyd @ 2009-03-29 0:18 UTC (permalink / raw)
To: qemu-devel
On Sat, Mar 28, 2009 at 11:54:43PM +0100, Aurelien Jarno wrote:
> On Sat, Mar 28, 2009 at 02:30:13PM -0700, Nathan Froyd wrote:
> > I am not a TCG expert, but there are several loops in TCG over all
> > globals and it seems like those loops would go faster if they didn't
> > have to consider registers that would never be touched. If this patch
> > series makes no difference in TCG's performance, then I'd be glad to
> > have an explanation of why that's the case.
>
> Do you actually have run a benchmark with those changes? TCG is
> sometimes a bit strange, and some optimizations does not change the
> execution speed, while others improve it a lot. It is very difficult to
> predict what will give a gain or not.
>
> Suggestions of benchmarks: gzip/bzip2 on a big file using user emulation
> or a compilation in system emulation.
Benchmarking? Pffft. ;)
A benchmarking session with qemu-ppc and bzip2/bunzip2 on ~400MB files
and a 603e emulated CPU suggests that these changes are not terribly
beneficial (maybe 1% improvement, if that). I don't imagine that a
similarly stressful benchmark in system emulation would be much
different. Consider the patch series withdrawn.
-Nathan
* Re: [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU
2009-03-29 0:18 ` Nathan Froyd
@ 2009-03-29 13:34 ` Aurelien Jarno
2009-03-29 14:42 ` Aurelien Jarno
0 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2009-03-29 13:34 UTC (permalink / raw)
To: qemu-devel
On Sat, Mar 28, 2009 at 05:18:34PM -0700, Nathan Froyd wrote:
> On Sat, Mar 28, 2009 at 11:54:43PM +0100, Aurelien Jarno wrote:
> > On Sat, Mar 28, 2009 at 02:30:13PM -0700, Nathan Froyd wrote:
> > > I am not a TCG expert, but there are several loops in TCG over all
> > > globals and it seems like those loops would go faster if they didn't
> > > have to consider registers that would never be touched. If this patch
> > > series makes no difference in TCG's performance, then I'd be glad to
> > > have an explanation of why that's the case.
> >
> > Do you actually have run a benchmark with those changes? TCG is
> > sometimes a bit strange, and some optimizations does not change the
> > execution speed, while others improve it a lot. It is very difficult to
> > predict what will give a gain or not.
> >
> > Suggestions of benchmarks: gzip/bzip2 on a big file using user emulation
> > or a compilation in system emulation.
>
> Benchmarking? Pffft. ;)
>
> A benchmarking session with qemu-ppc and bzip2/bunzip2 on ~400MB files
> and a 603e emulated CPU suggests that these changes are not terribly
> beneficial (maybe 1% improvement, if that). I don't imagine that a
> similarly stressful benchmark in system emulation would be much
> different. Consider the patch series withdrawn.
>
I have done some profiling on qemu-system-ppc and qemu-system-mips. You
are actually right that the loop over the list of TCG variables takes
time. This is mainly due to the calls to save_globals() for TCG functions
marked as TCG_OPF_CALL_CLOBBER.
However, it would probably be better to address this comment first,
before trying to reduce the number of TCG variables:
/* XXX: for load/store we could do that only for the slow path
(i.e. when a memory callback is called) */
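
Why the number of globals matters here can be pictured with a toy model. This is not QEMU's save_globals(); struct toy_temp and toy_save_globals() are invented for illustration. The sketch only shows that such a pass visits every global on each call, even when few of them hold a value that needs writing back.

```c
#include <stddef.h>

/* Toy model only: these are not QEMU's data structures. */
enum toy_val_type { TOY_VAL_MEM, TOY_VAL_REG };

struct toy_temp {
    enum toy_val_type val_type; /* where the current value lives */
    int mem_coherent;           /* nonzero if the memory copy is up to date */
};

/*
 * Visit every global and move register-resident values back to memory.
 * Returns the number of stores that would be emitted. The loop always
 * runs nb_globals times, however few globals are actually dirty.
 */
static size_t toy_save_globals(struct toy_temp *globals, size_t nb_globals)
{
    size_t stores = 0;

    for (size_t i = 0; i < nb_globals; i++) {
        struct toy_temp *t = &globals[i];

        if (t->val_type == TOY_VAL_REG) {
            if (!t->mem_coherent) {
                stores++;            /* a store would be emitted here */
                t->mem_coherent = 1;
            }
            t->val_type = TOY_VAL_MEM;
        }
    }
    return stores;
}
```

In this model, dropping unused globals shortens the loop but does not change the number of stores, which fits the small gain measured earlier in the thread.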
However, for the PowerPC target, what really kills performance is the
call to ppc_store_sr(), which the Linux kernel triggers on basically
every context switch. In the chip, SR selection happens before the TLB,
whereas we emulate both the SR and the TLB with the QEMU TLB. This means
we have to do a tlb_flush(env, 1) each time. The flush itself is
expensive, and it also hurts performance afterwards because the TLB has
to be refilled.
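
The refill cost can be illustrated with a toy software TLB. Everything below (struct toy_cpu, toy_store_sr(), toy_access(), the table size) is hypothetical and far simpler than QEMU's TLB; it only assumes, as described above, that a segment register store forces a full flush.

```c
#include <stdint.h>
#include <string.h>

#define TOY_TLB_SIZE  8
#define TOY_PAGE_BITS 12

/* Toy CPU state: 16 segment registers and a direct-mapped TLB. */
struct toy_cpu {
    uint32_t sr[16];
    uint32_t tlb_tag[TOY_TLB_SIZE];
    int tlb_valid[TOY_TLB_SIZE];
    unsigned long misses;
};

/* Invalidate every TLB entry. */
static void toy_tlb_flush(struct toy_cpu *cpu)
{
    memset(cpu->tlb_valid, 0, sizeof(cpu->tlb_valid));
}

/* A segment register store invalidates all cached translations. */
static void toy_store_sr(struct toy_cpu *cpu, int n, uint32_t val)
{
    cpu->sr[n] = val;
    toy_tlb_flush(cpu);
}

/* Look up a page; on a miss, count it and refill the entry. */
static void toy_access(struct toy_cpu *cpu, uint32_t addr)
{
    uint32_t page = addr >> TOY_PAGE_BITS;
    unsigned idx = page % TOY_TLB_SIZE;

    if (!cpu->tlb_valid[idx] || cpu->tlb_tag[idx] != page) {
        cpu->misses++;
        cpu->tlb_tag[idx] = page;
        cpu->tlb_valid[idx] = 1;
    }
}
```

In this model every page touched after toy_store_sr() misses once more, so frequent segment register stores turn a warm TLB into a cold one again and again.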
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU
2009-03-29 13:34 ` Aurelien Jarno
@ 2009-03-29 14:42 ` Aurelien Jarno
2009-03-29 14:57 ` Aurelien Jarno
0 siblings, 1 reply; 13+ messages in thread
From: Aurelien Jarno @ 2009-03-29 14:42 UTC (permalink / raw)
To: qemu-devel
On Sun, Mar 29, 2009 at 03:34:53PM +0200, Aurelien Jarno wrote:
> On Sat, Mar 28, 2009 at 05:18:34PM -0700, Nathan Froyd wrote:
> > On Sat, Mar 28, 2009 at 11:54:43PM +0100, Aurelien Jarno wrote:
> > > On Sat, Mar 28, 2009 at 02:30:13PM -0700, Nathan Froyd wrote:
> > > > I am not a TCG expert, but there are several loops in TCG over all
> > > > globals and it seems like those loops would go faster if they didn't
> > > > have to consider registers that would never be touched. If this patch
> > > > series makes no difference in TCG's performance, then I'd be glad to
> > > > have an explanation of why that's the case.
> > >
> > > Do you actually have run a benchmark with those changes? TCG is
> > > sometimes a bit strange, and some optimizations does not change the
> > > execution speed, while others improve it a lot. It is very difficult to
> > > predict what will give a gain or not.
> > >
> > > Suggestions of benchmarks: gzip/bzip2 on a big file using user emulation
> > > or a compilation in system emulation.
> >
> > Benchmarking? Pffft. ;)
> >
> > A benchmarking session with qemu-ppc and bzip2/bunzip2 on ~400MB files
> > and a 603e emulated CPU suggests that these changes are not terribly
> > beneficial (maybe 1% improvement, if that). I don't imagine that a
> > similarly stressful benchmark in system emulation would be much
> > different. Consider the patch series withdrawn.
> >
>
> I have done a few profiling on qemu-system-ppc and qemu-system-mips. You
> are actually right that the loop on the TCG variables lists takes time.
> This is mainly due to the call of save_globals() for TCG functions marked
> as TCG_OPF_CALL_CLOBBER.
>
> However it looks like it should be better to address this comment first
> before trying to reduce the number of TCG variables:
>
> /* XXX: for load/store we could do that only for the slow path
> (i.e. when a memory callback is called) */
>
Thinking a bit more, I think we should avoid mapping FPU registers as
global TCG variables. Those variables are mostly modified by helpers
(except for move and load/store), and they will be written back to
memory before the call to the helper. This means TCG can't delay the
memory accesses, so there is very little (or no) difference in the
generated code whether the FPU register is accessed through a global
TCG variable or through tcg_gen_ld_tl().
I have done the test with qemu-system-mips, and I have found a gain
around 1% in speed.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
* Re: [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU
2009-03-29 14:42 ` Aurelien Jarno
@ 2009-03-29 14:57 ` Aurelien Jarno
0 siblings, 0 replies; 13+ messages in thread
From: Aurelien Jarno @ 2009-03-29 14:57 UTC (permalink / raw)
To: qemu-devel
On Sun, Mar 29, 2009 at 04:42:50PM +0200, Aurelien Jarno wrote:
> On Sun, Mar 29, 2009 at 03:34:53PM +0200, Aurelien Jarno wrote:
> > On Sat, Mar 28, 2009 at 05:18:34PM -0700, Nathan Froyd wrote:
> > > On Sat, Mar 28, 2009 at 11:54:43PM +0100, Aurelien Jarno wrote:
> > > > On Sat, Mar 28, 2009 at 02:30:13PM -0700, Nathan Froyd wrote:
> > > > > I am not a TCG expert, but there are several loops in TCG over all
> > > > > globals and it seems like those loops would go faster if they didn't
> > > > > have to consider registers that would never be touched. If this patch
> > > > > series makes no difference in TCG's performance, then I'd be glad to
> > > > > have an explanation of why that's the case.
> > > >
> > > > Do you actually have run a benchmark with those changes? TCG is
> > > > sometimes a bit strange, and some optimizations does not change the
> > > > execution speed, while others improve it a lot. It is very difficult to
> > > > predict what will give a gain or not.
> > > >
> > > > Suggestions of benchmarks: gzip/bzip2 on a big file using user emulation
> > > > or a compilation in system emulation.
> > >
> > > Benchmarking? Pffft. ;)
> > >
> > > A benchmarking session with qemu-ppc and bzip2/bunzip2 on ~400MB files
> > > and a 603e emulated CPU suggests that these changes are not terribly
> > > beneficial (maybe 1% improvement, if that). I don't imagine that a
> > > similarly stressful benchmark in system emulation would be much
> > > different. Consider the patch series withdrawn.
> > >
> >
> > I have done a few profiling on qemu-system-ppc and qemu-system-mips. You
> > are actually right that the loop on the TCG variables lists takes time.
> > This is mainly due to the call of save_globals() for TCG functions marked
> > as TCG_OPF_CALL_CLOBBER.
> >
> > However it looks like it should be better to address this comment first
> > before trying to reduce the number of TCG variables:
> >
> > /* XXX: for load/store we could do that only for the slow path
> > (i.e. when a memory callback is called) */
> >
>
> Thinking a bit more I think we should avoid mapping FPU registers as
> global TCG variables. Those variables are mostly modified by helpers
> (except for move and load/store), and they will be written back to
> memory before the call to the helper. This means TCG can't delay the
> memory accesses, so there is very few (or no) difference in the
> generated code if the FPU register is accessed through a global TCG
> variable or through tcg_gen_ld_tl().
>
> I have done the test with qemu-system-mips, and I have found a gain
> around 1% in speed.
>
My measurements were wrong; the gain is around 9%.
--
Aurelien Jarno GPG: 1024D/F1BCDB73
aurelien@aurel32.net http://www.aurel32.net
Thread overview: 13+ messages
2009-03-28 21:30 [Qemu-devel] [PATCH 0/4] target-ppc: create TCG slots for registers based on CPU Nathan Froyd
2009-03-28 21:30 ` [Qemu-devel] [PATCH 1/4] move PPC insn flags to cpu.h Nathan Froyd
2009-03-28 22:58 ` Aurelien Jarno
2009-03-28 23:07 ` Nathan Froyd
2009-03-28 23:14 ` Aurelien Jarno
2009-03-28 21:30 ` [Qemu-devel] [PATCH 2/4] move ppc_def_t definition " Nathan Froyd
2009-03-28 21:30 ` [Qemu-devel] [PATCH 3/4] pass the cpu definition to ppc_translate_init Nathan Froyd
2009-03-28 21:30 ` [Qemu-devel] [PATCH 4/4] create TCG slots for registers based on CPU Nathan Froyd
2009-03-28 22:54 ` [Qemu-devel] [PATCH 0/4] target-ppc: " Aurelien Jarno
2009-03-29 0:18 ` Nathan Froyd
2009-03-29 13:34 ` Aurelien Jarno
2009-03-29 14:42 ` Aurelien Jarno
2009-03-29 14:57 ` Aurelien Jarno