* [Qemu-devel] [PATCH] target-sparc register window handling
@ 2012-10-06  0:00 Richard Henderson
  2012-10-06  0:00 ` [Qemu-devel] [PATCH] target-sparc: Use TCG registers for windowed registers Richard Henderson
  2012-10-06 10:15 ` [Qemu-devel] [PATCH] target-sparc register window handling Aurelien Jarno
  0 siblings, 2 replies; 6+ messages in thread
From: Richard Henderson @ 2012-10-06  0:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl

This applies with or without the sparc-compare patch set I
recently sent, and it works with the same set of tests.

I've not had time to do true benchmarking on this, but it
does reduce the size of the generated code by roughly 18%
(509344 to 418064 bytes) for the same 2196 TBs:

BEFORE
Translation buffer state:
gen code size       509344/33431552
TB count            2196/262144
TB avg target size  22 max=252 bytes
TB avg host size    231 bytes (expansion ratio: 10.3)
cross page TB count 0 (0%)
direct jump count   1153 (52%) (2 jumps=628 28%)

AFTER
Translation buffer state:
gen code size       418064/33431552
TB count            2196/262144
TB avg target size  22 max=252 bytes
TB avg host size    190 bytes (expansion ratio: 8.4)
cross page TB count 0 (0%)
direct jump count   1153 (52%) (2 jumps=628 28%)


r~


Richard Henderson (1):
  target-sparc: Use TCG registers for windowed registers

 gdbstub.c                          |   4 +-
 linux-user/main.c                  |  17 ++--
 linux-user/signal.c                |  78 +++++++--------
 linux-user/sparc/target_signal.h   |   2 +-
 linux-user/sparc64/target_signal.h |   2 +-
 monitor.c                          |   2 +-
 target-sparc/cpu.c                 |   3 +-
 target-sparc/cpu.h                 |  20 ++--
 target-sparc/int32_helper.c        |   4 +-
 target-sparc/ldst_helper.c         |  12 +--
 target-sparc/translate.c           | 188 +++++++++++++++++--------------------
 target-sparc/win_helper.c          |  56 ++++++-----
 12 files changed, 190 insertions(+), 198 deletions(-)

-- 
1.7.11.4


* [Qemu-devel] [PATCH] target-sparc: Use TCG registers for windowed registers
  2012-10-06  0:00 [Qemu-devel] [PATCH] target-sparc register window handling Richard Henderson
@ 2012-10-06  0:00 ` Richard Henderson
  2012-10-06 10:15 ` [Qemu-devel] [PATCH] target-sparc register window handling Aurelien Jarno
  1 sibling, 0 replies; 6+ messages in thread
From: Richard Henderson @ 2012-10-06  0:00 UTC (permalink / raw)
  To: qemu-devel; +Cc: Blue Swirl

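The use of direct loads/stores through regwptr meant that the TCG
optimizer could not be effective with normal computation on the
windowed registers.  Instead, make the current window a set of TCG
globals backed by env->wregs, and use memcpy on register window
changes to move data into and out of "tcg space".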

Signed-off-by: Richard Henderson <rth@twiddle.net>
---
 gdbstub.c                          |   4 +-
 linux-user/main.c                  |  17 ++--
 linux-user/signal.c                |  78 +++++++--------
 linux-user/sparc/target_signal.h   |   2 +-
 linux-user/sparc64/target_signal.h |   2 +-
 monitor.c                          |   2 +-
 target-sparc/cpu.c                 |   3 +-
 target-sparc/cpu.h                 |  20 ++--
 target-sparc/int32_helper.c        |   4 +-
 target-sparc/ldst_helper.c         |  12 +--
 target-sparc/translate.c           | 188 +++++++++++++++++--------------------
 target-sparc/win_helper.c          |  56 ++++++-----
 12 files changed, 190 insertions(+), 198 deletions(-)

diff --git a/gdbstub.c b/gdbstub.c
index d02ec75..349faeb 100644
--- a/gdbstub.c
+++ b/gdbstub.c
@@ -808,7 +808,7 @@ static int cpu_gdb_read_register(CPUSPARCState *env, uint8_t *mem_buf, int n)
     }
     if (n < 32) {
         /* register window */
-        GET_REGA(env->regwptr[n - 8]);
+        GET_REGA(env->wregs[n - 8]);
     }
 #if defined(TARGET_ABI32) || !defined(TARGET_SPARC64)
     if (n < 64) {
@@ -876,7 +876,7 @@ static int cpu_gdb_write_register(CPUSPARCState *env, uint8_t *mem_buf, int n)
         env->gregs[n] = tmp;
     } else if (n < 32) {
         /* register window */
-        env->regwptr[n - 8] = tmp;
+        env->wregs[n - 8] = tmp;
     }
 #if defined(TARGET_ABI32) || !defined(TARGET_SPARC64)
     else if (n < 64) {
diff --git a/linux-user/main.c b/linux-user/main.c
index 9f3476b..57c9805 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -991,12 +991,7 @@ error:
    can be found at http://www.sics.se/~psm/sparcstack.html */
 static inline int get_reg_index(CPUSPARCState *env, int cwp, int index)
 {
-    index = (index + cwp * 16) % (16 * env->nwindows);
-    /* wrap handling : if cwp is on the last window, then we use the
-       registers 'after' the end */
-    if (index < 8 && env->cwp == env->nwindows - 1)
-        index += 16 * env->nwindows;
-    return index;
+    return (index + cwp * 16) % (16 * env->nwindows);
 }
 
 /* save the register window 'cwp1' */
@@ -1123,9 +1118,9 @@ void cpu_loop (CPUSPARCState *env)
         case 0x16d:
 #endif
             ret = do_syscall (env, env->gregs[1],
-                              env->regwptr[0], env->regwptr[1],
-                              env->regwptr[2], env->regwptr[3],
-                              env->regwptr[4], env->regwptr[5],
+                              env->wregs[0], env->wregs[1],
+                              env->wregs[2], env->wregs[3],
+                              env->wregs[4], env->wregs[5],
                               0, 0);
             if ((abi_ulong)ret >= (abi_ulong)(-515)) {
 #if defined(TARGET_SPARC64) && !defined(TARGET_ABI32)
@@ -1141,7 +1136,7 @@ void cpu_loop (CPUSPARCState *env)
                 env->psr &= ~PSR_CARRY;
 #endif
             }
-            env->regwptr[0] = ret;
+            env->wregs[0] = ret;
             /* next instruction */
             env->pc = env->npc;
             env->npc = env->npc + 4;
@@ -3755,7 +3750,7 @@ int main(int argc, char **argv, char **envp)
         for(i = 0; i < 8; i++)
             env->gregs[i] = regs->u_regs[i];
         for(i = 0; i < 8; i++)
-            env->regwptr[i] = regs->u_regs[i + 8];
+            env->wregs[i] = regs->u_regs[i + 8];
     }
 #elif defined(TARGET_PPC)
     {
diff --git a/linux-user/signal.c b/linux-user/signal.c
index 7869147..712281c 100644
--- a/linux-user/signal.c
+++ b/linux-user/signal.c
@@ -1895,7 +1895,7 @@ static inline abi_ulong get_sigframe(struct target_sigaction *sa,
 {
 	abi_ulong sp;
 
-	sp = env->regwptr[UREG_FP];
+	sp = env->wregs[UREG_FP];
 
 	/* This is the X/Open sanctioned signal stack switching.  */
 	if (sa->sa_flags & TARGET_SA_ONSTACK) {
@@ -1919,7 +1919,7 @@ setup___siginfo(__siginfo_t *si, CPUSPARCState *env, abi_ulong mask)
 		err |= __put_user(env->gregs[i], &si->si_regs.u_regs[i]);
 	}
 	for (i=0; i < 8; i++) {
-		err |= __put_user(env->regwptr[UREG_I0 + i], &si->si_regs.u_regs[i+8]);
+		err |= __put_user(env->wregs[UREG_I0 + i], &si->si_regs.u_regs[i+8]);
 	}
 	err |= __put_user(mask, &si->si_mask);
 	return err;
@@ -1933,12 +1933,12 @@ setup_sigcontext(struct target_sigcontext *sc, /*struct _fpstate *fpstate,*/
 	int err = 0;
 
 	err |= __put_user(mask, &sc->sigc_mask);
-	err |= __put_user(env->regwptr[UREG_SP], &sc->sigc_sp);
+	err |= __put_user(env->wregs[UREG_SP], &sc->sigc_sp);
 	err |= __put_user(env->pc, &sc->sigc_pc);
 	err |= __put_user(env->npc, &sc->sigc_npc);
 	err |= __put_user(env->psr, &sc->sigc_psr);
 	err |= __put_user(env->gregs[1], &sc->sigc_g1);
-	err |= __put_user(env->regwptr[UREG_O0], &sc->sigc_o0);
+	err |= __put_user(env->wregs[UREG_O0], &sc->sigc_o0);
 
 	return err;
 }
@@ -1963,7 +1963,7 @@ static void setup_frame(int sig, struct target_sigaction *ka,
         if (!sf)
 		goto sigsegv;
                 
-	//fprintf(stderr, "sf: %x pc %x fp %x sp %x\n", sf, env->pc, env->regwptr[UREG_FP], env->regwptr[UREG_SP]);
+	//fprintf(stderr, "sf: %x pc %x fp %x sp %x\n", sf, env->pc, env->wregs[UREG_FP], env->wregs[UREG_SP]);
 #if 0
 	if (invalid_frame_pointer(sf, sigframe_size))
 		goto sigill_and_return;
@@ -1981,20 +1981,20 @@ static void setup_frame(int sig, struct target_sigaction *ka,
 	}
 
 	for (i = 0; i < 8; i++) {
-	  	err |= __put_user(env->regwptr[i + UREG_L0], &sf->ss.locals[i]);
+	  	err |= __put_user(env->wregs[i + UREG_L0], &sf->ss.locals[i]);
 	}
 	for (i = 0; i < 8; i++) {
-	  	err |= __put_user(env->regwptr[i + UREG_I0], &sf->ss.ins[i]);
+	  	err |= __put_user(env->wregs[i + UREG_I0], &sf->ss.ins[i]);
 	}
 	if (err)
 		goto sigsegv;
 
 	/* 3. signal handler back-trampoline and parameters */
-	env->regwptr[UREG_FP] = sf_addr;
-	env->regwptr[UREG_I0] = sig;
-	env->regwptr[UREG_I1] = sf_addr + 
+	env->wregs[UREG_FP] = sf_addr;
+	env->wregs[UREG_I0] = sig;
+	env->wregs[UREG_I1] = sf_addr + 
                 offsetof(struct target_signal_frame, info);
-	env->regwptr[UREG_I2] = sf_addr + 
+	env->wregs[UREG_I2] = sf_addr + 
                 offsetof(struct target_signal_frame, info);
 
 	/* 4. signal handler */
@@ -2002,11 +2002,11 @@ static void setup_frame(int sig, struct target_sigaction *ka,
 	env->npc = (env->pc + 4);
 	/* 5. return to kernel instructions */
 	if (ka->sa_restorer)
-		env->regwptr[UREG_I7] = ka->sa_restorer;
+		env->wregs[UREG_I7] = ka->sa_restorer;
 	else {
                 uint32_t val32;
 
-		env->regwptr[UREG_I7] = sf_addr + 
+		env->wregs[UREG_I7] = sf_addr + 
                         offsetof(struct target_signal_frame, insns) - 2 * 4;
 
 		/* mov __NR_sigreturn, %g1 */
@@ -2088,12 +2088,12 @@ long do_sigreturn(CPUSPARCState *env)
         sigset_t host_set;
         int err, i;
 
-        sf_addr = env->regwptr[UREG_FP];
+        sf_addr = env->wregs[UREG_FP];
         if (!lock_user_struct(VERIFY_READ, sf, sf_addr, 1))
                 goto segv_and_exit;
 #if 0
 	fprintf(stderr, "sigreturn\n");
-	fprintf(stderr, "sf: %x pc %x fp %x sp %x\n", sf, env->pc, env->regwptr[UREG_FP], env->regwptr[UREG_SP]);
+	fprintf(stderr, "sf: %x pc %x fp %x sp %x\n", sf, env->pc, env->wregs[UREG_FP], env->wregs[UREG_SP]);
 #endif
 	//cpu_dump_state(env, stderr, fprintf, 0);
 
@@ -2122,7 +2122,7 @@ long do_sigreturn(CPUSPARCState *env)
 		err |= __get_user(env->gregs[i], &sf->info.si_regs.u_regs[i]);
 	}
 	for (i=0; i < 8; i++) {
-		err |= __get_user(env->regwptr[i + UREG_I0], &sf->info.si_regs.u_regs[i+8]);
+		err |= __get_user(env->wregs[i + UREG_I0], &sf->info.si_regs.u_regs[i+8]);
 	}
 
         /* FIXME: implement FPU save/restore:
@@ -2145,7 +2145,7 @@ long do_sigreturn(CPUSPARCState *env)
         if (err)
                 goto segv_and_exit;
         unlock_user_struct(sf, sf_addr, 0);
-        return env->regwptr[0];
+        return env->wregs[0];
 
 segv_and_exit:
         unlock_user_struct(sf, sf_addr, 0);
@@ -2237,7 +2237,7 @@ void sparc64_set_context(CPUSPARCState *env)
     int err;
     unsigned int i;
 
-    ucp_addr = env->regwptr[UREG_I0];
+    ucp_addr = env->wregs[UREG_I0];
     if (!lock_user_struct(VERIFY_READ, ucp, ucp_addr, 1))
         goto do_sigsegv;
     grp  = &ucp->tuc_mcontext.mc_gregs;
@@ -2245,7 +2245,7 @@ void sparc64_set_context(CPUSPARCState *env)
     err |= __get_user(npc, &((*grp)[MC_NPC]));
     if (err || ((pc | npc) & 3))
         goto do_sigsegv;
-    if (env->regwptr[UREG_I1]) {
+    if (env->wregs[UREG_I1]) {
         target_sigset_t target_set;
         sigset_t set;
 
@@ -2279,19 +2279,19 @@ void sparc64_set_context(CPUSPARCState *env)
     err |= __get_user(env->gregs[5], (&(*grp)[MC_G5]));
     err |= __get_user(env->gregs[6], (&(*grp)[MC_G6]));
     err |= __get_user(env->gregs[7], (&(*grp)[MC_G7]));
-    err |= __get_user(env->regwptr[UREG_I0], (&(*grp)[MC_O0]));
-    err |= __get_user(env->regwptr[UREG_I1], (&(*grp)[MC_O1]));
-    err |= __get_user(env->regwptr[UREG_I2], (&(*grp)[MC_O2]));
-    err |= __get_user(env->regwptr[UREG_I3], (&(*grp)[MC_O3]));
-    err |= __get_user(env->regwptr[UREG_I4], (&(*grp)[MC_O4]));
-    err |= __get_user(env->regwptr[UREG_I5], (&(*grp)[MC_O5]));
-    err |= __get_user(env->regwptr[UREG_I6], (&(*grp)[MC_O6]));
-    err |= __get_user(env->regwptr[UREG_I7], (&(*grp)[MC_O7]));
+    err |= __get_user(env->wregs[UREG_I0], (&(*grp)[MC_O0]));
+    err |= __get_user(env->wregs[UREG_I1], (&(*grp)[MC_O1]));
+    err |= __get_user(env->wregs[UREG_I2], (&(*grp)[MC_O2]));
+    err |= __get_user(env->wregs[UREG_I3], (&(*grp)[MC_O3]));
+    err |= __get_user(env->wregs[UREG_I4], (&(*grp)[MC_O4]));
+    err |= __get_user(env->wregs[UREG_I5], (&(*grp)[MC_O5]));
+    err |= __get_user(env->wregs[UREG_I6], (&(*grp)[MC_O6]));
+    err |= __get_user(env->wregs[UREG_I7], (&(*grp)[MC_O7]));
 
     err |= __get_user(fp, &(ucp->tuc_mcontext.mc_fp));
     err |= __get_user(i7, &(ucp->tuc_mcontext.mc_i7));
 
-    w_addr = TARGET_STACK_BIAS+env->regwptr[UREG_I6];
+    w_addr = TARGET_STACK_BIAS+env->wregs[UREG_I6];
     if (put_user(fp, w_addr + offsetof(struct target_reg_window, ins[6]), 
                  abi_ulong) != 0)
         goto do_sigsegv;
@@ -2339,7 +2339,7 @@ void sparc64_get_context(CPUSPARCState *env)
     target_sigset_t target_set;
     sigset_t set;
 
-    ucp_addr = env->regwptr[UREG_I0];
+    ucp_addr = env->wregs[UREG_I0];
     if (!lock_user_struct(VERIFY_WRITE, ucp, ucp_addr, 0))
         goto do_sigsegv;
     
@@ -2380,16 +2380,16 @@ void sparc64_get_context(CPUSPARCState *env)
     err |= __put_user(env->gregs[5], &((*grp)[MC_G5]));
     err |= __put_user(env->gregs[6], &((*grp)[MC_G6]));
     err |= __put_user(env->gregs[7], &((*grp)[MC_G7]));
-    err |= __put_user(env->regwptr[UREG_I0], &((*grp)[MC_O0]));
-    err |= __put_user(env->regwptr[UREG_I1], &((*grp)[MC_O1]));
-    err |= __put_user(env->regwptr[UREG_I2], &((*grp)[MC_O2]));
-    err |= __put_user(env->regwptr[UREG_I3], &((*grp)[MC_O3]));
-    err |= __put_user(env->regwptr[UREG_I4], &((*grp)[MC_O4]));
-    err |= __put_user(env->regwptr[UREG_I5], &((*grp)[MC_O5]));
-    err |= __put_user(env->regwptr[UREG_I6], &((*grp)[MC_O6]));
-    err |= __put_user(env->regwptr[UREG_I7], &((*grp)[MC_O7]));
-
-    w_addr = TARGET_STACK_BIAS+env->regwptr[UREG_I6];
+    err |= __put_user(env->wregs[UREG_I0], &((*grp)[MC_O0]));
+    err |= __put_user(env->wregs[UREG_I1], &((*grp)[MC_O1]));
+    err |= __put_user(env->wregs[UREG_I2], &((*grp)[MC_O2]));
+    err |= __put_user(env->wregs[UREG_I3], &((*grp)[MC_O3]));
+    err |= __put_user(env->wregs[UREG_I4], &((*grp)[MC_O4]));
+    err |= __put_user(env->wregs[UREG_I5], &((*grp)[MC_O5]));
+    err |= __put_user(env->wregs[UREG_I6], &((*grp)[MC_O6]));
+    err |= __put_user(env->wregs[UREG_I7], &((*grp)[MC_O7]));
+
+    w_addr = TARGET_STACK_BIAS+env->wregs[UREG_I6];
     fp = i7 = 0;
     if (get_user(fp, w_addr + offsetof(struct target_reg_window, ins[6]), 
                  abi_ulong) != 0)
diff --git a/linux-user/sparc/target_signal.h b/linux-user/sparc/target_signal.h
index c7de300..10bf508 100644
--- a/linux-user/sparc/target_signal.h
+++ b/linux-user/sparc/target_signal.h
@@ -30,7 +30,7 @@ typedef struct target_sigaltstack {
 
 static inline abi_ulong get_sp_from_cpustate(CPUSPARCState *state)
 {
-    return state->regwptr[UREG_FP];
+    return state->wregs[UREG_FP];
 }
 
 #endif /* TARGET_SIGNAL_H */
diff --git a/linux-user/sparc64/target_signal.h b/linux-user/sparc64/target_signal.h
index c7de300..10bf508 100644
--- a/linux-user/sparc64/target_signal.h
+++ b/linux-user/sparc64/target_signal.h
@@ -30,7 +30,7 @@ typedef struct target_sigaltstack {
 
 static inline abi_ulong get_sp_from_cpustate(CPUSPARCState *state)
 {
-    return state->regwptr[UREG_FP];
+    return state->wregs[UREG_FP];
 }
 
 #endif /* TARGET_SIGNAL_H */
diff --git a/monitor.c b/monitor.c
index a0e3ffb..2fbc3ce 100644
--- a/monitor.c
+++ b/monitor.c
@@ -2795,7 +2795,7 @@ static target_long monitor_get_psr (const struct MonitorDef *md, int val)
 static target_long monitor_get_reg(const struct MonitorDef *md, int val)
 {
     CPUArchState *env = mon_get_cpu();
-    return env->regwptr[val];
+    return env->wregs[val];
 }
 #endif
 
diff --git a/target-sparc/cpu.c b/target-sparc/cpu.c
index 71cc9e8..2263904 100644
--- a/target-sparc/cpu.c
+++ b/target-sparc/cpu.c
@@ -43,7 +43,6 @@ static void sparc_cpu_reset(CPUState *s)
 #ifndef TARGET_SPARC64
     env->wim = 1;
 #endif
-    env->regwptr = env->regbase + (env->cwp * 16);
     CC_OP = CC_OP_FLAGS;
 #if defined(CONFIG_USER_ONLY)
 #ifdef TARGET_SPARC64
@@ -809,7 +808,7 @@ void cpu_dump_state(CPUSPARCState *env, FILE *f, fprintf_function cpu_fprintf,
                             x == 0 ? 'o' : (x == 1 ? 'l' : 'i'),
                             i, i + REGS_PER_LINE - 1);
             }
-            cpu_fprintf(f, TARGET_FMT_lx " ", env->regwptr[i + x * 8]);
+            cpu_fprintf(f, TARGET_FMT_lx " ", env->wregs[i + x * 8]);
             if (i % REGS_PER_LINE == REGS_PER_LINE - 1) {
                 cpu_fprintf(f, "\n");
             }
diff --git a/target-sparc/cpu.h b/target-sparc/cpu.h
index 214d01d..4572c4d 100644
--- a/target-sparc/cpu.h
+++ b/target-sparc/cpu.h
@@ -381,11 +381,11 @@ void cpu_get_timer(struct QEMUFile *f, CPUTimer *s);
 typedef struct CPUSPARCState CPUSPARCState;
 
 struct CPUSPARCState {
-    target_ulong gregs[8]; /* general registers */
-    target_ulong *regwptr; /* pointer to current register window */
-    target_ulong pc;       /* program counter */
-    target_ulong npc;      /* next program counter */
-    target_ulong y;        /* multiply/divide register */
+    target_ulong gregs[8];  /* general registers */
+    target_ulong wregs[24]; /* current register window */
+    target_ulong pc;        /* program counter */
+    target_ulong npc;       /* next program counter */
+    target_ulong y;         /* multiply/divide register */
 
     /* emulator internal flags handling */
     target_ulong cc_src, cc_src2;
@@ -416,8 +416,7 @@ struct CPUSPARCState {
     int      psref;    /* enable fpu */
 #endif
     int interrupt_index;
-    /* NOTE: we allow 8 more registers to handle wrapping */
-    target_ulong regbase[MAX_NWINDOWS * 16 + 8];
+    target_ulong regbase[MAX_NWINDOWS * 16];
 
     CPU_COMMON
 
@@ -693,9 +692,10 @@ static inline int cpu_pil_allowed(CPUSPARCState *env1, int pil)
 #if defined(CONFIG_USER_ONLY)
 static inline void cpu_clone_regs(CPUSPARCState *env, target_ulong newsp)
 {
-    if (newsp)
-        env->regwptr[22] = newsp;
-    env->regwptr[0] = 0;
+    if (newsp) {
+        env->wregs[22] = newsp;
+    }
+    env->wregs[0] = 0;
     /* FIXME: Do we also need to clear CF?  */
     /* XXXXX */
     printf ("HELPME: %s:%d\n", __FILE__, __LINE__);
diff --git a/target-sparc/int32_helper.c b/target-sparc/int32_helper.c
index 9ac5aac..6fd43d8 100644
--- a/target-sparc/int32_helper.c
+++ b/target-sparc/int32_helper.c
@@ -111,8 +111,8 @@ void do_interrupt(CPUSPARCState *env)
     env->psret = 0;
     cwp = cpu_cwp_dec(env, env->cwp - 1);
     cpu_set_cwp(env, cwp);
-    env->regwptr[9] = env->pc;
-    env->regwptr[10] = env->npc;
+    env->wregs[9] = env->pc;
+    env->wregs[10] = env->npc;
     env->psrps = env->psrs;
     env->psrs = 1;
     env->tbr = (env->tbr & TBR_BASE_MASK) | (intno << 4);
diff --git a/target-sparc/ldst_helper.c b/target-sparc/ldst_helper.c
index 2ca9a5c..e80bd3a 100644
--- a/target-sparc/ldst_helper.c
+++ b/target-sparc/ldst_helper.c
@@ -2053,11 +2053,11 @@ void helper_ldda_asi(CPUSPARCState *env, target_ulong addr, int asi, int rd)
                 bswap64s(&env->gregs[rd + 1]);
             }
         } else {
-            env->regwptr[rd] = cpu_ldq_nucleus(env, addr);
-            env->regwptr[rd + 1] = cpu_ldq_nucleus(env, addr + 8);
+            env->wregs[rd - 8] = cpu_ldq_nucleus(env, addr);
+            env->wregs[rd - 8 + 1] = cpu_ldq_nucleus(env, addr + 8);
             if (asi == 0x2c) {
-                bswap64s(&env->regwptr[rd]);
-                bswap64s(&env->regwptr[rd + 1]);
+                bswap64s(&env->wregs[rd - 8]);
+                bswap64s(&env->wregs[rd - 8 + 1]);
             }
         }
         break;
@@ -2070,8 +2070,8 @@ void helper_ldda_asi(CPUSPARCState *env, target_ulong addr, int asi, int rd)
             env->gregs[rd] = helper_ld_asi(env, addr, asi, 4, 0);
             env->gregs[rd + 1] = helper_ld_asi(env, addr + 4, asi, 4, 0);
         } else {
-            env->regwptr[rd] = helper_ld_asi(env, addr, asi, 4, 0);
-            env->regwptr[rd + 1] = helper_ld_asi(env, addr + 4, asi, 4, 0);
+            env->wregs[rd - 8] = helper_ld_asi(env, addr, asi, 4, 0);
+            env->wregs[rd - 8 + 1] = helper_ld_asi(env, addr + 4, asi, 4, 0);
         }
         break;
     }
diff --git a/target-sparc/translate.c b/target-sparc/translate.c
index 472eb51..9f3d039 100644
--- a/target-sparc/translate.c
+++ b/target-sparc/translate.c
@@ -39,11 +39,11 @@
                          according to jump_pc[T2] */
 
 /* global register indexes */
-static TCGv_ptr cpu_env, cpu_regwptr;
+static TCGv_ptr cpu_env;
 static TCGv cpu_cc_src, cpu_cc_src2, cpu_cc_dst;
 static TCGv_i32 cpu_cc_op;
 static TCGv_i32 cpu_psr;
-static TCGv cpu_fsr, cpu_pc, cpu_npc, cpu_gregs[8];
+static TCGv cpu_fsr, cpu_pc, cpu_npc, cpu_regs[32];
 static TCGv cpu_y;
 #ifndef CONFIG_USER_ONLY
 static TCGv cpu_tbr;
@@ -265,23 +265,17 @@ static inline void gen_address_mask(DisasContext *dc, TCGv addr)
 
 static inline void gen_movl_reg_TN(int reg, TCGv tn)
 {
-    if (reg == 0)
+    if (reg == 0) {
         tcg_gen_movi_tl(tn, 0);
-    else if (reg < 8)
-        tcg_gen_mov_tl(tn, cpu_gregs[reg]);
-    else {
-        tcg_gen_ld_tl(tn, cpu_regwptr, (reg - 8) * sizeof(target_ulong));
+    } else {
+        tcg_gen_mov_tl(tn, cpu_regs[reg]);
     }
 }
 
 static inline void gen_movl_TN_reg(int reg, TCGv tn)
 {
-    if (reg == 0)
-        return;
-    else if (reg < 8)
-        tcg_gen_mov_tl(cpu_gregs[reg], tn);
-    else {
-        tcg_gen_st_tl(tn, cpu_regwptr, (reg - 8) * sizeof(target_ulong));
+    if (reg != 0) {
+        tcg_gen_mov_tl(cpu_regs[reg], tn);
     }
 }
 
@@ -2185,10 +2179,8 @@ static inline TCGv get_src1(unsigned int insn, TCGv def)
     rs1 = GET_FIELD(insn, 13, 17);
     if (rs1 == 0) {
         tcg_gen_movi_tl(def, 0);
-    } else if (rs1 < 8) {
-        r_rs1 = cpu_gregs[rs1];
     } else {
-        tcg_gen_ld_tl(def, cpu_regwptr, (rs1 - 8) * sizeof(target_ulong));
+        r_rs1 = cpu_regs[rs1];
     }
     return r_rs1;
 }
@@ -2204,10 +2196,8 @@ static inline TCGv get_src2(unsigned int insn, TCGv def)
         unsigned int rs2 = GET_FIELD(insn, 27, 31);
         if (rs2 == 0) {
             tcg_gen_movi_tl(def, 0);
-        } else if (rs2 < 8) {
-            r_rs2 = cpu_gregs[rs2];
         } else {
-            tcg_gen_ld_tl(def, cpu_regwptr, (rs2 - 8) * sizeof(target_ulong));
+            r_rs2 = cpu_regs[rs2];
         }
     }
     return r_rs2;
@@ -5377,17 +5367,13 @@ void gen_intermediate_code_init(CPUSPARCState *env)
 {
     unsigned int i;
     static int inited;
-    static const char * const gregnames[8] = {
-        NULL, // g0 not used
-        "g1",
-        "g2",
-        "g3",
-        "g4",
-        "g5",
-        "g6",
-        "g7",
+    static const char regnames[32][4] = {
+        "g0", "g1", "g2", "g3", "g4", "g5", "g6", "g7",
+        "o0", "o1", "o2", "o3", "o4", "o5", "o6", "o7",
+        "l0", "l1", "l2", "l3", "l4", "l5", "l6", "l7",
+        "i0", "i1", "i2", "i3", "i4", "i5", "i6", "i7",
     };
-    static const char * const fregnames[32] = {
+    static const char fregnames[32][4] = {
         "f0", "f2", "f4", "f6", "f8", "f10", "f12", "f14",
         "f16", "f18", "f20", "f22", "f24", "f26", "f28", "f30",
         "f32", "f34", "f36", "f38", "f40", "f42", "f44", "f46",
@@ -5395,88 +5381,90 @@ void gen_intermediate_code_init(CPUSPARCState *env)
     };
 
     /* init various static tables */
-    if (!inited) {
-        inited = 1;
+    if (inited) {
+        return;
+    }
+    inited = 1;
 
-        cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
-        cpu_regwptr = tcg_global_mem_new_ptr(TCG_AREG0,
-                                             offsetof(CPUSPARCState, regwptr),
-                                             "regwptr");
+    cpu_env = tcg_global_reg_new_ptr(TCG_AREG0, "env");
 #ifdef TARGET_SPARC64
-        cpu_xcc = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUSPARCState, xcc),
-                                         "xcc");
-        cpu_asi = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUSPARCState, asi),
-                                         "asi");
-        cpu_fprs = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUSPARCState, fprs),
-                                          "fprs");
-        cpu_gsr = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, gsr),
+    cpu_xcc = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUSPARCState, xcc),
+                                     "xcc");
+    cpu_asi = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUSPARCState, asi),
+                                     "asi");
+    cpu_fprs = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUSPARCState, fprs),
+                                      "fprs");
+    cpu_gsr = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, gsr),
                                      "gsr");
-        cpu_tick_cmpr = tcg_global_mem_new(TCG_AREG0,
-                                           offsetof(CPUSPARCState, tick_cmpr),
-                                           "tick_cmpr");
-        cpu_stick_cmpr = tcg_global_mem_new(TCG_AREG0,
-                                            offsetof(CPUSPARCState, stick_cmpr),
-                                            "stick_cmpr");
-        cpu_hstick_cmpr = tcg_global_mem_new(TCG_AREG0,
-                                             offsetof(CPUSPARCState, hstick_cmpr),
-                                             "hstick_cmpr");
-        cpu_hintp = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, hintp),
-                                       "hintp");
-        cpu_htba = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, htba),
-                                      "htba");
-        cpu_hver = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, hver),
-                                      "hver");
-        cpu_ssr = tcg_global_mem_new(TCG_AREG0,
-                                     offsetof(CPUSPARCState, ssr), "ssr");
-        cpu_ver = tcg_global_mem_new(TCG_AREG0,
-                                     offsetof(CPUSPARCState, version), "ver");
-        cpu_softint = tcg_global_mem_new_i32(TCG_AREG0,
-                                             offsetof(CPUSPARCState, softint),
-                                             "softint");
+    cpu_tick_cmpr = tcg_global_mem_new(TCG_AREG0,
+                                       offsetof(CPUSPARCState, tick_cmpr),
+                                       "tick_cmpr");
+    cpu_stick_cmpr = tcg_global_mem_new(TCG_AREG0,
+                                        offsetof(CPUSPARCState, stick_cmpr),
+                                        "stick_cmpr");
+    cpu_hstick_cmpr = tcg_global_mem_new(TCG_AREG0,
+                                         offsetof(CPUSPARCState, hstick_cmpr),
+                                         "hstick_cmpr");
+    cpu_hintp = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, hintp),
+                                   "hintp");
+    cpu_htba = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, htba),
+                                  "htba");
+    cpu_hver = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, hver),
+                                  "hver");
+    cpu_ssr = tcg_global_mem_new(TCG_AREG0,
+                                 offsetof(CPUSPARCState, ssr), "ssr");
+    cpu_ver = tcg_global_mem_new(TCG_AREG0,
+                                 offsetof(CPUSPARCState, version), "ver");
+    cpu_softint = tcg_global_mem_new_i32(TCG_AREG0,
+                                         offsetof(CPUSPARCState, softint),
+                                         "softint");
 #else
-        cpu_wim = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, wim),
-                                     "wim");
+    cpu_wim = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, wim),
+                                 "wim");
 #endif
-        cpu_cond = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, cond),
-                                      "cond");
-        cpu_cc_src = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, cc_src),
-                                        "cc_src");
-        cpu_cc_src2 = tcg_global_mem_new(TCG_AREG0,
-                                         offsetof(CPUSPARCState, cc_src2),
-                                         "cc_src2");
-        cpu_cc_dst = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, cc_dst),
-                                        "cc_dst");
-        cpu_cc_op = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUSPARCState, cc_op),
-                                           "cc_op");
-        cpu_psr = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUSPARCState, psr),
-                                         "psr");
-        cpu_fsr = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, fsr),
-                                     "fsr");
-        cpu_pc = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, pc),
-                                    "pc");
-        cpu_npc = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, npc),
-                                     "npc");
-        cpu_y = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, y), "y");
+    cpu_cond = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, cond),
+                                  "cond");
+    cpu_cc_src = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, cc_src),
+                                    "cc_src");
+    cpu_cc_src2 = tcg_global_mem_new(TCG_AREG0,
+                                     offsetof(CPUSPARCState, cc_src2),
+                                     "cc_src2");
+    cpu_cc_dst = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, cc_dst),
+                                    "cc_dst");
+    cpu_cc_op = tcg_global_mem_new_i32(TCG_AREG0,
+                                       offsetof(CPUSPARCState, cc_op),
+                                       "cc_op");
+    cpu_psr = tcg_global_mem_new_i32(TCG_AREG0, offsetof(CPUSPARCState, psr),
+                                     "psr");
+    cpu_fsr = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, fsr),
+                                 "fsr");
+    cpu_pc = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, pc),
+                                "pc");
+    cpu_npc = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, npc),
+                                 "npc");
+    cpu_y = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, y), "y");
 #ifndef CONFIG_USER_ONLY
-        cpu_tbr = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, tbr),
-                                     "tbr");
+    cpu_tbr = tcg_global_mem_new(TCG_AREG0, offsetof(CPUSPARCState, tbr),
+                                 "tbr");
 #endif
-        for (i = 1; i < 8; i++) {
-            cpu_gregs[i] = tcg_global_mem_new(TCG_AREG0,
-                                              offsetof(CPUSPARCState, gregs[i]),
-                                              gregnames[i]);
-        }
-        for (i = 0; i < TARGET_DPREGS; i++) {
-            cpu_fpr[i] = tcg_global_mem_new_i64(TCG_AREG0,
-                                                offsetof(CPUSPARCState, fpr[i]),
-                                                fregnames[i]);
+    for (i = 1; i < 32; i++) {
+        int off;
+        if (i < 8) {
+            off = offsetof(CPUSPARCState, gregs[i]);
+        } else {
+            off = offsetof(CPUSPARCState, wregs[i - 8]);
         }
+        cpu_regs[i] = tcg_global_mem_new(TCG_AREG0, off, regnames[i]);
+    }
+    for (i = 0; i < TARGET_DPREGS; i++) {
+        cpu_fpr[i] = tcg_global_mem_new_i64(TCG_AREG0,
+                                            offsetof(CPUSPARCState, fpr[i]),
+                                            fregnames[i]);
+    }
 
-        /* register helpers */
-
+    /* register helpers */
 #define GEN_HELPER 2
 #include "helper.h"
-    }
 }
 
 void restore_state_to_opc(CPUSPARCState *env, TranslationBlock *tb, int pc_pos)
diff --git a/target-sparc/win_helper.c b/target-sparc/win_helper.c
index 3e82eb7..a0db286 100644
--- a/target-sparc/win_helper.c
+++ b/target-sparc/win_helper.c
@@ -21,31 +21,45 @@
 #include "helper.h"
 #include "trace.h"
 
-static inline void memcpy32(target_ulong *dst, const target_ulong *src)
+static inline void memcpy8(target_ulong *dst, const target_ulong *src)
 {
-    dst[0] = src[0];
-    dst[1] = src[1];
-    dst[2] = src[2];
-    dst[3] = src[3];
-    dst[4] = src[4];
-    dst[5] = src[5];
-    dst[6] = src[6];
-    dst[7] = src[7];
+    memcpy(dst, src, 8 * sizeof(target_ulong));
+}
+
+static inline void memcpy16(target_ulong *dst, const target_ulong *src)
+{
+    memcpy(dst, src, 16 * sizeof(target_ulong));
+}
+
+static inline void memcpy24(target_ulong *dst, const target_ulong *src)
+{
+    memcpy(dst, src, 24 * sizeof(target_ulong));
 }
 
 void cpu_set_cwp(CPUSPARCState *env, int new_cwp)
 {
-    /* put the modified wrap registers at their proper location */
-    if (env->cwp == env->nwindows - 1) {
-        memcpy32(env->regbase, env->regbase + env->nwindows * 16);
+    int old_cwp = env->cwp;
+    int nwindows = env->nwindows;
+
+    /* ??? Re-add special casing for simple inc/dec of the window.  */
+
+    /* Put the current window back into its proper location.  */
+    if (old_cwp == nwindows - 1) {
+        memcpy16(env->regbase + old_cwp*16, env->wregs);
+        memcpy8(env->regbase, env->wregs + 16);
+    } else {
+        memcpy24(env->regbase + old_cwp*16, env->wregs);
     }
-    env->cwp = new_cwp;
 
-    /* put the wrap registers at their temporary location */
-    if (new_cwp == env->nwindows - 1) {
-        memcpy32(env->regbase + env->nwindows * 16, env->regbase);
+    /* Copy the new window to where TCG can find it.  */
+    if (new_cwp == nwindows - 1) {
+        memcpy16(env->wregs, env->regbase + new_cwp*16);
+        memcpy8(env->wregs + 16, env->regbase);
+    } else {
+        memcpy24(env->wregs, env->regbase + new_cwp*16);
     }
-    env->regwptr = env->regbase + (new_cwp * 16);
+
+    env->cwp = new_cwp;
 }
 
 target_ulong cpu_get_psr(CPUSPARCState *env)
@@ -117,8 +131,6 @@ void helper_rett(CPUSPARCState *env)
     env->psrs = env->psrps;
 }
 
-/* XXX: use another pointer for %iN registers to avoid slow wrapping
-   handling ? */
 void helper_save(CPUSPARCState *env)
 {
     uint32_t cwp;
@@ -156,8 +168,6 @@ target_ulong helper_rdpsr(CPUSPARCState *env)
 }
 
 #else
-/* XXX: use another pointer for %iN registers to avoid slow wrapping
-   handling ? */
 void helper_save(CPUSPARCState *env)
 {
     uint32_t cwp;
@@ -317,8 +327,8 @@ void cpu_change_pstate(CPUSPARCState *env, uint32_t new_pstate)
         /* Switch global register bank */
         src = get_gregset(env, new_pstate_regs);
         dst = get_gregset(env, pstate_regs);
-        memcpy32(dst, env->gregs);
-        memcpy32(env->gregs, src);
+        memcpy8(dst, env->gregs);
+        memcpy8(env->gregs, src);
     } else {
         trace_win_helper_no_switch_pstate(new_pstate_regs);
     }
-- 
1.7.11.4

* Re: [Qemu-devel] [PATCH] target-sparc register window handling
  2012-10-06  0:00 [Qemu-devel] [PATCH] target-sparc register window handling Richard Henderson
  2012-10-06  0:00 ` [Qemu-devel] [PATCH] target-sparc: Use TCG registers for windowed registers Richard Henderson
@ 2012-10-06 10:15 ` Aurelien Jarno
  2012-10-08  6:17   ` Aurelien Jarno
  1 sibling, 1 reply; 6+ messages in thread
From: Aurelien Jarno @ 2012-10-06 10:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Blue Swirl, qemu-devel

On Fri, Oct 05, 2012 at 05:00:04PM -0700, Richard Henderson wrote:
> This applies with or without the sparc-compare patch set I
> recently sent, and it works with the same set of tests.
> 
> I've not had time to do true benchmarking on this, but it
> does reduce the size of the generated code:

Experience shows that there is no direct relation between the size of
the generated code and the resulting emulation speed. Sometimes smaller
code means slower emulation (when the code generation/optimization takes
too much time), and sometimes bigger code means faster emulation
(think about replacing some helpers with TCG code).

From the user's point of view, what matters is the emulation speed, so
I think benchmarks (even simple ones like measuring the boot time of a
guest) are essential for this kind of patch.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH] target-sparc register window handling
  2012-10-06 10:15 ` [Qemu-devel] [PATCH] target-sparc register window handling Aurelien Jarno
@ 2012-10-08  6:17   ` Aurelien Jarno
  2012-10-09 16:53     ` Richard Henderson
  0 siblings, 1 reply; 6+ messages in thread
From: Aurelien Jarno @ 2012-10-08  6:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Blue Swirl, qemu-devel

On Sat, Oct 06, 2012 at 12:15:16PM +0200, Aurelien Jarno wrote:
> On Fri, Oct 05, 2012 at 05:00:04PM -0700, Richard Henderson wrote:
> > This applies with or without the sparc-compare patch set I
> > recently sent, and it works with the same set of tests.
> > 
> > I've not had time to do true benchmarking on this, but it
> > does reduce the size of the generated code:
> 
> Experience shows that there is no direct relation between the size of
> the generated code and the resulting emulation speed. Sometimes smaller
> code means slower emulation (when the code generation/optimization takes
> too much time), and sometimes bigger code means faster emulation
> (think about replacing some helpers with TCG code).
> 
> From the user's point of view, what matters is the emulation speed, so
> I think benchmarks (even simple ones like measuring the boot time of a
> guest) are essential for this kind of patch.
> 

For what it's worth, I measured a 4% slowdown booting a sparc64 guest on a
Core-i5 2500 machine. I guess the memcpy() of the register windows is
more expensive than the gain on the TCG side, though this should
probably be confirmed using some profiling tools.
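
To make the suspected cost concrete, here is a minimal standalone sketch
of the copy scheme used by the patched cpu_set_cwp(). It is not QEMU
code: ModelState, copy_words(), model_set_cwp() and the words_copied
counter are hypothetical names introduced for illustration, and the
window count of 8 is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the relevant parts of CPUSPARCState. */
enum { NWINDOWS = 8, STRIDE = 16, WIN_REGS = 24 };

typedef struct {
    uint64_t regbase[NWINDOWS * STRIDE]; /* backing store for every window */
    uint64_t wregs[WIN_REGS];            /* current window: outs, locals, ins */
    int cwp;
    size_t words_copied;                 /* instrumentation only */
} ModelState;

/* memcpy wrapper that counts how many registers were moved. */
static void copy_words(ModelState *s, uint64_t *dst, const uint64_t *src,
                       size_t n)
{
    memcpy(dst, src, n * sizeof(uint64_t));
    s->words_copied += n;
}

void model_set_cwp(ModelState *s, int new_cwp)
{
    int old_cwp = s->cwp;

    assert(new_cwp >= 0 && new_cwp < NWINDOWS);

    /* Write the current window back; the last window's ins wrap to 0. */
    if (old_cwp == NWINDOWS - 1) {
        copy_words(s, s->regbase + old_cwp * STRIDE, s->wregs, 16);
        copy_words(s, s->regbase, s->wregs + 16, 8);
    } else {
        copy_words(s, s->regbase + old_cwp * STRIDE, s->wregs, 24);
    }

    /* Load the new window into the fixed location, same wrap rule. */
    if (new_cwp == NWINDOWS - 1) {
        copy_words(s, s->wregs, s->regbase + new_cwp * STRIDE, 16);
        copy_words(s, s->wregs + 16, s->regbase, 8);
    } else {
        copy_words(s, s->wregs, s->regbase + new_cwp * STRIDE, 24);
    }

    s->cwp = new_cwp;
}
```

In this model every window change moves 48 registers (24 out, 24 in),
i.e. 384 bytes with 8-byte registers, no matter how far the window
pointer moves. The old code only adjusted a pointer in the common case.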

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net

* Re: [Qemu-devel] [PATCH] target-sparc register window handling
  2012-10-08  6:17   ` Aurelien Jarno
@ 2012-10-09 16:53     ` Richard Henderson
  2012-10-09 17:06       ` Aurelien Jarno
  0 siblings, 1 reply; 6+ messages in thread
From: Richard Henderson @ 2012-10-09 16:53 UTC (permalink / raw)
  To: Aurelien Jarno; +Cc: Blue Swirl, qemu-devel

On 10/07/2012 11:17 PM, Aurelien Jarno wrote:
> For what it's worth, I measured a 4% slowdown booting a sparc64 guest on a
> Core-i5 2500 machine. I guess the memcpy() of the register windows is
> more expensive than the gain on the TCG side, though this should
> probably be confirmed using some profiling tools.

Confirmed with userland testing (running cc1plus on large input),
and I see much more than a 4% slowdown -- more like 15%.  I can reduce
this by adding a save/restore window inc/dec optimization within the
cpu_set_cwp routine, but that only gets me back to a 5% slowdown.
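
The code for that optimization is not part of this thread. What follows
is only a sketch of one way such an inc/dec special case could look, in
a standalone model rather than in QEMU itself. ModelState, ins_slot(),
set_cwp_general() and set_cwp_fast() are hypothetical names, and the
window count of 8 is an assumption. The idea is that adjacent windows
share eight registers, so those can be moved within wregs instead of
going through regbase.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

enum { NWINDOWS = 8, STRIDE = 16 };

typedef struct {
    uint64_t regbase[NWINDOWS * STRIDE];
    uint64_t wregs[24];   /* [0..7] outs, [8..15] locals, [16..23] ins */
    int cwp;
} ModelState;

/* Where the ins of window cwp live in regbase; the last window wraps. */
static uint64_t *ins_slot(ModelState *s, int cwp)
{
    return s->regbase + ((cwp + 1) % NWINDOWS) * STRIDE;
}

/* General path: flush all 24 registers, then load all 24. */
void set_cwp_general(ModelState *s, int new_cwp)
{
    size_t w8 = 8 * sizeof(uint64_t);

    memcpy(s->regbase + s->cwp * STRIDE, s->wregs, 2 * w8);
    memcpy(ins_slot(s, s->cwp), s->wregs + 16, w8);
    memcpy(s->wregs, s->regbase + new_cwp * STRIDE, 2 * w8);
    memcpy(s->wregs + 16, ins_slot(s, new_cwp), w8);
    s->cwp = new_cwp;
}

/* Special-case a step of one window in either direction. */
void set_cwp_fast(ModelState *s, int new_cwp)
{
    int old = s->cwp;
    size_t w8 = 8 * sizeof(uint64_t);

    assert(new_cwp >= 0 && new_cwp < NWINDOWS);
    if (new_cwp == (old + NWINDOWS - 1) % NWINDOWS) {
        /* Decrement (save): the old outs become the new ins. */
        memcpy(s->regbase + old * STRIDE + 8, s->wregs + 8, w8);
        memcpy(ins_slot(s, old), s->wregs + 16, w8);
        memcpy(s->wregs + 16, s->wregs, w8);
        memcpy(s->wregs, s->regbase + new_cwp * STRIDE, 2 * w8);
    } else if (new_cwp == (old + 1) % NWINDOWS) {
        /* Increment (restore): the old ins become the new outs. */
        memcpy(s->regbase + old * STRIDE, s->wregs, 2 * w8);
        memcpy(s->wregs, s->wregs + 16, w8);
        memcpy(s->wregs + 8, s->regbase + new_cwp * STRIDE + 8, w8);
        memcpy(s->wregs + 16, ins_slot(s, new_cwp), w8);
    } else {
        set_cwp_general(s, new_cwp);
        return;
    }
    s->cwp = new_cwp;
}
```

In this model the fast path still moves 40 registers per step instead
of 48, so the saving is modest. That may be part of why the special
case does not recover the whole loss, but this is a guess and has not
been profiled.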

A bit disappointing; I guess other avenues will want exploring.

Patch withdrawn.


r~

* Re: [Qemu-devel] [PATCH] target-sparc register window handling
  2012-10-09 16:53     ` Richard Henderson
@ 2012-10-09 17:06       ` Aurelien Jarno
  0 siblings, 0 replies; 6+ messages in thread
From: Aurelien Jarno @ 2012-10-09 17:06 UTC (permalink / raw)
  To: Richard Henderson; +Cc: Blue Swirl, qemu-devel

On Tue, Oct 09, 2012 at 09:53:32AM -0700, Richard Henderson wrote:
> On 10/07/2012 11:17 PM, Aurelien Jarno wrote:
> > For what it worth, I measure a 4% slow down booting a sparc64 guest on a
> > Core-i5 2500 machine. I guess the memcpy() of the register windows is
> > more expensive that the gain on the TCG side, though it should be
> > probably be confirmed using some profiling tools.
> 
> Confirmed with userland testing (running cc1plus on large input),
> and I see much more than 4% slow down -- more like 15%.  I can reduce

Not surprising. System mode spends a lot of time outside of the TCG
code, and when running TCG code, it spends most of its time in the
qemu_ld/st code. User mode has very simple qemu_ld/st code.

-- 
Aurelien Jarno                          GPG: 1024D/F1BCDB73
aurelien@aurel32.net                 http://www.aurel32.net
