* [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation
@ 2016-06-24 3:48 Richard Henderson
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 1/9] tcg: Fix name for high-half register Richard Henderson
` (10 more replies)
0 siblings, 11 replies; 25+ messages in thread
From: Richard Henderson @ 2016-06-24 3:48 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
I was unhappy about the complexity of the second try.
Better to convert to normal temps, allowing in rare
occasions, spilling the "globals" to the stack in order
to satisfy register allocation.
I can no longer provoke an allocation failure on i686.
Hopefully this fixes the OpenBSD case that Mark mentioned
re the second attempt.
r~
Richard Henderson (9):
tcg: Fix name for high-half register
tcg: Optimize spills of constants
tcg: Require liveness analysis
tcg: Compress liveness data to 16 bits
tcg: Reorg TCGOp chaining
tcg: Fold life data into TCGOp
tcg: Compress dead_temps and mem_temps into a single array
tcg: Include liveness info in the dumps
tcg: Lower indirect registers in a separate pass
include/exec/gen-icount.h | 2 +-
include/qemu/log.h | 1 +
tcg/aarch64/tcg-target.inc.c | 10 +
tcg/arm/tcg-target.inc.c | 6 +
tcg/i386/tcg-target.inc.c | 21 +-
tcg/ia64/tcg-target.inc.c | 10 +
tcg/mips/tcg-target.inc.c | 10 +
tcg/optimize.c | 37 +--
tcg/ppc/tcg-target.inc.c | 6 +
tcg/s390/tcg-target.inc.c | 6 +
tcg/sparc/tcg-target.inc.c | 10 +
tcg/tcg-op.c | 2 +-
tcg/tcg.c | 690 ++++++++++++++++++++++++++++---------------
tcg/tcg.h | 50 ++--
tcg/tci/tcg-target.inc.c | 6 +
util/log.c | 5 +-
16 files changed, 563 insertions(+), 309 deletions(-)
--
2.5.5
^ permalink raw reply [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH v3 1/9] tcg: Fix name for high-half register
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
@ 2016-06-24 3:48 ` Richard Henderson
2016-07-25 11:23 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 2/9] tcg: Optimize spills of constants Richard Henderson
` (9 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-06-24 3:48 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/tcg.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 254427b..154ffe8 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -557,7 +557,7 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
ts2->mem_offset = offset + (1 - bigendian) * 4;
pstrcpy(buf, sizeof(buf), name);
pstrcat(buf, sizeof(buf), "_1");
- ts->name = strdup(buf);
+ ts2->name = strdup(buf);
} else {
ts->base_type = type;
ts->type = type;
--
2.5.5
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH v3 2/9] tcg: Optimize spills of constants
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 1/9] tcg: Fix name for high-half register Richard Henderson
@ 2016-06-24 3:48 ` Richard Henderson
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 3/9] tcg: Require liveness analysis Richard Henderson
` (8 subsequent siblings)
10 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-06-24 3:48 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
While we can store constants via constrants on INDEX_op_st_i32 et al,
we weren't able to spill constants to backing store.
Add a new backend interface, tcg_out_sti, which may store the constant
(and is allowed to fail). Rearrange the temp_* helpers so that we only
attempt to directly store a constant when the temp is becoming dead/free.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/aarch64/tcg-target.inc.c | 10 +++
tcg/arm/tcg-target.inc.c | 6 ++
tcg/i386/tcg-target.inc.c | 21 ++++--
tcg/ia64/tcg-target.inc.c | 10 +++
tcg/mips/tcg-target.inc.c | 10 +++
tcg/ppc/tcg-target.inc.c | 6 ++
tcg/s390/tcg-target.inc.c | 6 ++
tcg/sparc/tcg-target.inc.c | 10 +++
tcg/tcg.c | 159 ++++++++++++++++++++++++-------------------
tcg/tci/tcg-target.inc.c | 6 ++
10 files changed, 166 insertions(+), 78 deletions(-)
diff --git a/tcg/aarch64/tcg-target.inc.c b/tcg/aarch64/tcg-target.inc.c
index 1447f7c..5ac0091 100644
--- a/tcg/aarch64/tcg-target.inc.c
+++ b/tcg/aarch64/tcg-target.inc.c
@@ -716,6 +716,16 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
arg, arg1, arg2);
}
+static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs)
+{
+ if (val == 0) {
+ tcg_out_st(s, type, TCG_REG_XZR, base, ofs);
+ return true;
+ }
+ return false;
+}
+
static inline void tcg_out_bfm(TCGContext *s, TCGType ext, TCGReg rd,
TCGReg rn, unsigned int a, unsigned int b)
{
diff --git a/tcg/arm/tcg-target.inc.c b/tcg/arm/tcg-target.inc.c
index f9f54c6..172feba 100644
--- a/tcg/arm/tcg-target.inc.c
+++ b/tcg/arm/tcg-target.inc.c
@@ -2046,6 +2046,12 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
tcg_out_st32(s, COND_AL, arg, arg1, arg2);
}
+static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs)
+{
+ return false;
+}
+
static inline void tcg_out_mov(TCGContext *s, TCGType type,
TCGReg ret, TCGReg arg)
{
diff --git a/tcg/i386/tcg-target.inc.c b/tcg/i386/tcg-target.inc.c
index 317484c..bc34535 100644
--- a/tcg/i386/tcg-target.inc.c
+++ b/tcg/i386/tcg-target.inc.c
@@ -710,12 +710,19 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
tcg_out_modrm_offset(s, opc, arg, arg1, arg2);
}
-static inline void tcg_out_sti(TCGContext *s, TCGType type, TCGReg base,
- tcg_target_long ofs, tcg_target_long val)
+static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs)
{
- int opc = OPC_MOVL_EvIz + (type == TCG_TYPE_I64 ? P_REXW : 0);
- tcg_out_modrm_offset(s, opc, 0, base, ofs);
+ int rexw = 0;
+ if (TCG_TARGET_REG_BITS == 64 && type == TCG_TYPE_I64) {
+ if (val != (int32_t)val) {
+ return false;
+ }
+ rexw = P_REXW;
+ }
+ tcg_out_modrm_offset(s, OPC_MOVL_EvIz | rexw, 0, base, ofs);
tcg_out32(s, val);
+ return true;
}
static void tcg_out_shifti(TCGContext *s, int subopc, int reg, int count)
@@ -1321,10 +1328,10 @@ static void tcg_out_qemu_ld_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
ofs += 4;
}
- tcg_out_sti(s, TCG_TYPE_I32, TCG_REG_ESP, ofs, oi);
+ tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
ofs += 4;
- tcg_out_sti(s, TCG_TYPE_PTR, TCG_REG_ESP, ofs, (uintptr_t)l->raddr);
+ tcg_out_sti(s, TCG_TYPE_PTR, (uintptr_t)l->raddr, TCG_REG_ESP, ofs);
} else {
tcg_out_mov(s, TCG_TYPE_PTR, tcg_target_call_iarg_regs[0], TCG_AREG0);
/* The second argument is already loaded with addrlo. */
@@ -1413,7 +1420,7 @@ static void tcg_out_qemu_st_slow_path(TCGContext *s, TCGLabelQemuLdst *l)
ofs += 4;
}
- tcg_out_sti(s, TCG_TYPE_I32, TCG_REG_ESP, ofs, oi);
+ tcg_out_sti(s, TCG_TYPE_I32, oi, TCG_REG_ESP, ofs);
ofs += 4;
retaddr = TCG_REG_EAX;
diff --git a/tcg/ia64/tcg-target.inc.c b/tcg/ia64/tcg-target.inc.c
index 395223e..c91f392 100644
--- a/tcg/ia64/tcg-target.inc.c
+++ b/tcg/ia64/tcg-target.inc.c
@@ -973,6 +973,16 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
}
}
+static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs)
+{
+ if (val == 0) {
+ tcg_out_st(s, type, TCG_REG_R0, base, ofs);
+ return true;
+ }
+ return false;
+}
+
static inline void tcg_out_alu(TCGContext *s, uint64_t opc_a1, uint64_t opc_a3,
TCGReg ret, TCGArg arg1, int const_arg1,
TCGArg arg2, int const_arg2)
diff --git a/tcg/mips/tcg-target.inc.c b/tcg/mips/tcg-target.inc.c
index 50e98ea..2f9be48 100644
--- a/tcg/mips/tcg-target.inc.c
+++ b/tcg/mips/tcg-target.inc.c
@@ -576,6 +576,16 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
tcg_out_ldst(s, OPC_SW, arg, arg1, arg2);
}
+static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs)
+{
+ if (val == 0) {
+ tcg_out_st(s, type, TCG_REG_ZERO, base, ofs);
+ return true;
+ }
+ return false;
+}
+
static inline void tcg_out_addi(TCGContext *s, TCGReg reg, TCGArg val)
{
if (val == (int16_t)val) {
diff --git a/tcg/ppc/tcg-target.inc.c b/tcg/ppc/tcg-target.inc.c
index da10052..dba954c 100644
--- a/tcg/ppc/tcg-target.inc.c
+++ b/tcg/ppc/tcg-target.inc.c
@@ -857,6 +857,12 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
tcg_out_mem_long(s, opi, opx, arg, arg1, arg2);
}
+static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs)
+{
+ return false;
+}
+
static void tcg_out_cmp(TCGContext *s, int cond, TCGArg arg1, TCGArg arg2,
int const_arg2, int cr, TCGType type)
{
diff --git a/tcg/s390/tcg-target.inc.c b/tcg/s390/tcg-target.inc.c
index e0a60e6..314c183 100644
--- a/tcg/s390/tcg-target.inc.c
+++ b/tcg/s390/tcg-target.inc.c
@@ -798,6 +798,12 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg data,
}
}
+static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs)
+{
+ return false;
+}
+
/* load data from an absolute host address */
static void tcg_out_ld_abs(TCGContext *s, TCGType type, TCGReg dest, void *abs)
{
diff --git a/tcg/sparc/tcg-target.inc.c b/tcg/sparc/tcg-target.inc.c
index 9938a50..8e98172 100644
--- a/tcg/sparc/tcg-target.inc.c
+++ b/tcg/sparc/tcg-target.inc.c
@@ -504,6 +504,16 @@ static inline void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg,
tcg_out_ldst(s, arg, arg1, arg2, (type == TCG_TYPE_I32 ? STW : STX));
}
+static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs)
+{
+ if (val == 0) {
+ tcg_out_st(s, type, TCG_REG_G0, base, ofs);
+ return true;
+ }
+ return false;
+}
+
static void tcg_out_ld_ptr(TCGContext *s, TCGReg ret, uintptr_t arg)
{
tcg_out_movi(s, TCG_TYPE_PTR, ret, arg & ~0x3ff);
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 154ffe8..44de991 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -108,6 +108,8 @@ static void tcg_out_op(TCGContext *s, TCGOpcode opc, const TCGArg *args,
const int *const_args);
static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
intptr_t arg2);
+static bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs);
static void tcg_out_call(TCGContext *s, tcg_insn_unit *target);
static int tcg_target_const_match(tcg_target_long val, TCGType type,
const TCGArgConstraint *arg_ct);
@@ -1680,35 +1682,89 @@ static void temp_allocate_frame(TCGContext *s, int temp)
static void temp_load(TCGContext *, TCGTemp *, TCGRegSet, TCGRegSet);
-/* sync register 'reg' by saving it to the corresponding temporary */
-static void tcg_reg_sync(TCGContext *s, TCGReg reg, TCGRegSet allocated_regs)
+/* Mark a temporary as free or dead. If 'free_or_dead' is negative,
+ mark it free; otherwise mark it dead. */
+static void temp_free_or_dead(TCGContext *s, TCGTemp *ts, int free_or_dead)
{
- TCGTemp *ts = s->reg_to_temp[reg];
+ if (ts->fixed_reg) {
+ return;
+ }
+ if (ts->val_type == TEMP_VAL_REG) {
+ s->reg_to_temp[ts->reg] = NULL;
+ }
+ ts->val_type = (free_or_dead < 0
+ || ts->temp_local
+ || temp_idx(s, ts) < s->nb_globals
+ ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
+}
- tcg_debug_assert(ts->val_type == TEMP_VAL_REG);
- if (!ts->mem_coherent && !ts->fixed_reg) {
+/* Mark a temporary as dead. */
+static inline void temp_dead(TCGContext *s, TCGTemp *ts)
+{
+ temp_free_or_dead(s, ts, 1);
+}
+
+/* Sync a temporary to memory. 'allocated_regs' is used in case a temporary
+ registers needs to be allocated to store a constant. If 'free_or_dead'
+ is non-zero, subsequently release the temporary; if it is positive, the
+ temp is dead; if it is negative, the temp is free. */
+static void temp_sync(TCGContext *s, TCGTemp *ts,
+ TCGRegSet allocated_regs, int free_or_dead)
+{
+ if (ts->fixed_reg) {
+ return;
+ }
+ if (!ts->mem_coherent) {
if (!ts->mem_allocated) {
temp_allocate_frame(s, temp_idx(s, ts));
- } else if (ts->indirect_reg) {
- tcg_regset_set_reg(allocated_regs, ts->reg);
+ }
+ if (ts->indirect_reg) {
+ if (ts->val_type == TEMP_VAL_REG) {
+ tcg_regset_set_reg(allocated_regs, ts->reg);
+ }
temp_load(s, ts->mem_base,
tcg_target_available_regs[TCG_TYPE_PTR],
allocated_regs);
}
- tcg_out_st(s, ts->type, reg, ts->mem_base->reg, ts->mem_offset);
+ switch (ts->val_type) {
+ case TEMP_VAL_CONST:
+ /* If we're going to free the temp immediately, then we won't
+ require it later in a register, so attempt to store the
+ constant to memory directly. */
+ if (free_or_dead
+ && tcg_out_sti(s, ts->type, ts->val,
+ ts->mem_base->reg, ts->mem_offset)) {
+ break;
+ }
+ temp_load(s, ts, tcg_target_available_regs[ts->type],
+ allocated_regs);
+ /* fallthrough */
+
+ case TEMP_VAL_REG:
+ tcg_out_st(s, ts->type, ts->reg,
+ ts->mem_base->reg, ts->mem_offset);
+ break;
+
+ case TEMP_VAL_MEM:
+ break;
+
+ case TEMP_VAL_DEAD:
+ default:
+ tcg_abort();
+ }
+ ts->mem_coherent = 1;
+ }
+ if (free_or_dead) {
+ temp_free_or_dead(s, ts, free_or_dead);
}
- ts->mem_coherent = 1;
}
/* free register 'reg' by spilling the corresponding temporary if necessary */
static void tcg_reg_free(TCGContext *s, TCGReg reg, TCGRegSet allocated_regs)
{
TCGTemp *ts = s->reg_to_temp[reg];
-
if (ts != NULL) {
- tcg_reg_sync(s, reg, allocated_regs);
- ts->val_type = TEMP_VAL_MEM;
- s->reg_to_temp[reg] = NULL;
+ temp_sync(s, ts, allocated_regs, -1);
}
}
@@ -1756,7 +1812,6 @@ static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
case TEMP_VAL_CONST:
reg = tcg_reg_alloc(s, desired_regs, allocated_regs, ts->indirect_base);
tcg_out_movi(s, ts->type, reg, ts->val);
- ts->mem_coherent = 0;
break;
case TEMP_VAL_MEM:
reg = tcg_reg_alloc(s, desired_regs, allocated_regs, ts->indirect_base);
@@ -1778,41 +1833,6 @@ static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
s->reg_to_temp[reg] = ts;
}
-/* mark a temporary as dead. */
-static inline void temp_dead(TCGContext *s, TCGTemp *ts)
-{
- if (ts->fixed_reg) {
- return;
- }
- if (ts->val_type == TEMP_VAL_REG) {
- s->reg_to_temp[ts->reg] = NULL;
- }
- ts->val_type = (temp_idx(s, ts) < s->nb_globals || ts->temp_local
- ? TEMP_VAL_MEM : TEMP_VAL_DEAD);
-}
-
-/* sync a temporary to memory. 'allocated_regs' is used in case a
- temporary registers needs to be allocated to store a constant. */
-static void temp_sync(TCGContext *s, TCGTemp *ts, TCGRegSet allocated_regs)
-{
- if (ts->fixed_reg) {
- return;
- }
- switch (ts->val_type) {
- case TEMP_VAL_CONST:
- temp_load(s, ts, tcg_target_available_regs[ts->type], allocated_regs);
- /* fallthrough */
- case TEMP_VAL_REG:
- tcg_reg_sync(s, ts->reg, allocated_regs);
- break;
- case TEMP_VAL_DEAD:
- case TEMP_VAL_MEM:
- break;
- default:
- tcg_abort();
- }
-}
-
/* save a temporary to memory. 'allocated_regs' is used in case a
temporary registers needs to be allocated to store a constant. */
static inline void temp_save(TCGContext *s, TCGTemp *ts,
@@ -1827,8 +1847,7 @@ static inline void temp_save(TCGContext *s, TCGTemp *ts,
return;
}
#endif
- temp_sync(s, ts, allocated_regs);
- temp_dead(s, ts);
+ temp_sync(s, ts, allocated_regs, 1);
}
/* save globals to their canonical location and assume they can be
@@ -1861,7 +1880,7 @@ static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
continue;
}
#endif
- temp_sync(s, ts, allocated_regs);
+ temp_sync(s, ts, allocated_regs, 0);
}
}
@@ -1905,21 +1924,21 @@ static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
val = args[1];
if (ots->fixed_reg) {
- /* for fixed registers, we do not do any constant
- propagation */
+ /* For fixed registers, we do not do any constant propagation. */
tcg_out_movi(s, ots->type, ots->reg, val);
- } else {
- /* The movi is not explicitly generated here */
- if (ots->val_type == TEMP_VAL_REG) {
- s->reg_to_temp[ots->reg] = NULL;
- }
- ots->val_type = TEMP_VAL_CONST;
- ots->val = val;
+ return;
}
- if (NEED_SYNC_ARG(0)) {
- temp_sync(s, ots, s->reserved_regs);
+
+ /* The movi is not explicitly generated here. */
+ if (ots->val_type == TEMP_VAL_REG) {
+ s->reg_to_temp[ots->reg] = NULL;
}
- if (IS_DEAD_ARG(0)) {
+ ots->val_type = TEMP_VAL_CONST;
+ ots->val = val;
+ ots->mem_coherent = 0;
+ if (NEED_SYNC_ARG(0)) {
+ temp_sync(s, ots, s->reserved_regs, IS_DEAD_ARG(0));
+ } else if (IS_DEAD_ARG(0)) {
temp_dead(s, ots);
}
}
@@ -2004,7 +2023,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
ots->mem_coherent = 0;
s->reg_to_temp[ots->reg] = ots;
if (NEED_SYNC_ARG(0)) {
- tcg_reg_sync(s, ots->reg, allocated_regs);
+ temp_sync(s, ots, allocated_regs, 0);
}
}
}
@@ -2163,9 +2182,8 @@ static void tcg_reg_alloc_op(TCGContext *s,
tcg_out_mov(s, ts->type, ts->reg, reg);
}
if (NEED_SYNC_ARG(i)) {
- tcg_reg_sync(s, reg, allocated_regs);
- }
- if (IS_DEAD_ARG(i)) {
+ temp_sync(s, ts, allocated_regs, IS_DEAD_ARG(i));
+ } else if (IS_DEAD_ARG(i)) {
temp_dead(s, ts);
}
}
@@ -2298,9 +2316,8 @@ static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
ts->mem_coherent = 0;
s->reg_to_temp[reg] = ts;
if (NEED_SYNC_ARG(i)) {
- tcg_reg_sync(s, reg, allocated_regs);
- }
- if (IS_DEAD_ARG(i)) {
+ temp_sync(s, ts, allocated_regs, IS_DEAD_ARG(i));
+ } else if (IS_DEAD_ARG(i)) {
temp_dead(s, ts);
}
}
diff --git a/tcg/tci/tcg-target.inc.c b/tcg/tci/tcg-target.inc.c
index fa74d52..3c47ea7 100644
--- a/tcg/tci/tcg-target.inc.c
+++ b/tcg/tci/tcg-target.inc.c
@@ -834,6 +834,12 @@ static void tcg_out_st(TCGContext *s, TCGType type, TCGReg arg, TCGReg arg1,
old_code_ptr[1] = s->code_ptr - old_code_ptr;
}
+static inline bool tcg_out_sti(TCGContext *s, TCGType type, TCGArg val,
+ TCGReg base, intptr_t ofs)
+{
+ return false;
+}
+
/* Test if a constant matches the constraint. */
static int tcg_target_const_match(tcg_target_long val, TCGType type,
const TCGArgConstraint *arg_ct)
--
2.5.5
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH v3 3/9] tcg: Require liveness analysis
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 1/9] tcg: Fix name for high-half register Richard Henderson
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 2/9] tcg: Optimize spills of constants Richard Henderson
@ 2016-06-24 3:48 ` Richard Henderson
2016-07-25 11:23 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 4/9] tcg: Compress liveness data to 16 bits Richard Henderson
` (7 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-06-24 3:48 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/tcg.c | 20 --------------------
1 file changed, 20 deletions(-)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 44de991..64060c6 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -23,7 +23,6 @@
*/
/* define it to use liveness analysis (better code) */
-#define USE_LIVENESS_ANALYSIS
#define USE_TCG_OPTIMIZATIONS
#include "qemu/osdep.h"
@@ -1306,7 +1305,6 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
#endif
}
-#ifdef USE_LIVENESS_ANALYSIS
/* liveness analysis: end of function: all temps are dead, and globals
should be in memory. */
static inline void tcg_la_func_end(TCGContext *s, uint8_t *dead_temps,
@@ -1575,18 +1573,6 @@ static void tcg_liveness_analysis(TCGContext *s)
}
}
}
-#else
-/* dummy liveness analysis */
-static void tcg_liveness_analysis(TCGContext *s)
-{
- int nb_ops = s->gen_next_op_idx;
-
- s->op_dead_args = tcg_malloc(nb_ops * sizeof(uint16_t));
- memset(s->op_dead_args, 0, nb_ops * sizeof(uint16_t));
- s->op_sync_args = tcg_malloc(nb_ops * sizeof(uint8_t));
- memset(s->op_sync_args, 0, nb_ops * sizeof(uint8_t));
-}
-#endif
#ifdef CONFIG_DEBUG_TCG
static void dump_regs(TCGContext *s)
@@ -1838,7 +1824,6 @@ static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
static inline void temp_save(TCGContext *s, TCGTemp *ts,
TCGRegSet allocated_regs)
{
-#ifdef USE_LIVENESS_ANALYSIS
/* ??? Liveness does not yet incorporate indirect bases. */
if (!ts->indirect_base) {
/* The liveness analysis already ensures that globals are back
@@ -1846,7 +1831,6 @@ static inline void temp_save(TCGContext *s, TCGTemp *ts,
tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || ts->fixed_reg);
return;
}
-#endif
temp_sync(s, ts, allocated_regs, 1);
}
@@ -1871,7 +1855,6 @@ static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
for (i = 0; i < s->nb_globals; i++) {
TCGTemp *ts = &s->temps[i];
-#ifdef USE_LIVENESS_ANALYSIS
/* ??? Liveness does not yet incorporate indirect bases. */
if (!ts->indirect_base) {
tcg_debug_assert(ts->val_type != TEMP_VAL_REG
@@ -1879,7 +1862,6 @@ static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
|| ts->mem_coherent);
continue;
}
-#endif
temp_sync(s, ts, allocated_regs, 0);
}
}
@@ -1895,7 +1877,6 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
if (ts->temp_local) {
temp_save(s, ts, allocated_regs);
} else {
-#ifdef USE_LIVENESS_ANALYSIS
/* ??? Liveness does not yet incorporate indirect bases. */
if (!ts->indirect_base) {
/* The liveness analysis already ensures that temps are dead.
@@ -1903,7 +1884,6 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
tcg_debug_assert(ts->val_type == TEMP_VAL_DEAD);
continue;
}
-#endif
temp_dead(s, ts);
}
}
--
2.5.5
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH v3 4/9] tcg: Compress liveness data to 16 bits
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
` (2 preceding siblings ...)
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 3/9] tcg: Require liveness analysis Richard Henderson
@ 2016-06-24 3:48 ` Richard Henderson
2016-07-25 11:23 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 5/9] tcg: Reorg TCGOp chaining Richard Henderson
` (6 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-06-24 3:48 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
This reduces both memory usage and per-insn cacheline usage
during code generation.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/tcg.c | 58 ++++++++++++++++++++++------------------------------------
tcg/tcg.h | 16 ++++++++++------
2 files changed, 32 insertions(+), 42 deletions(-)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 64060c6..400e69c 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1329,7 +1329,7 @@ static inline void tcg_la_bb_end(TCGContext *s, uint8_t *dead_temps,
}
}
-/* Liveness analysis : update the opc_dead_args array to tell if a
+/* Liveness analysis : update the opc_arg_life array to tell if a
given input arguments is dead. Instructions updating dead
temporaries are removed. */
static void tcg_liveness_analysis(TCGContext *s)
@@ -1338,9 +1338,8 @@ static void tcg_liveness_analysis(TCGContext *s)
int oi, oi_prev, nb_ops;
nb_ops = s->gen_next_op_idx;
- s->op_dead_args = tcg_malloc(nb_ops * sizeof(uint16_t));
- s->op_sync_args = tcg_malloc(nb_ops * sizeof(uint8_t));
-
+ s->op_arg_life = tcg_malloc(nb_ops * sizeof(TCGLifeData));
+
dead_temps = tcg_malloc(s->nb_temps);
mem_temps = tcg_malloc(s->nb_temps);
tcg_la_func_end(s, dead_temps, mem_temps);
@@ -1349,8 +1348,7 @@ static void tcg_liveness_analysis(TCGContext *s)
int i, nb_iargs, nb_oargs;
TCGOpcode opc_new, opc_new2;
bool have_opc_new2;
- uint16_t dead_args;
- uint8_t sync_args;
+ TCGLifeData arg_life = 0;
TCGArg arg;
TCGOp * const op = &s->gen_op_buf[oi];
@@ -1382,15 +1380,13 @@ static void tcg_liveness_analysis(TCGContext *s)
do_not_remove_call:
/* output args are dead */
- dead_args = 0;
- sync_args = 0;
for (i = 0; i < nb_oargs; i++) {
arg = args[i];
if (dead_temps[arg]) {
- dead_args |= (1 << i);
+ arg_life |= DEAD_ARG << i;
}
if (mem_temps[arg]) {
- sync_args |= (1 << i);
+ arg_life |= SYNC_ARG << i;
}
dead_temps[arg] = 1;
mem_temps[arg] = 0;
@@ -1411,7 +1407,7 @@ static void tcg_liveness_analysis(TCGContext *s)
arg = args[i];
if (arg != TCG_CALL_DUMMY_ARG) {
if (dead_temps[arg]) {
- dead_args |= (1 << i);
+ arg_life |= DEAD_ARG << i;
}
}
}
@@ -1420,8 +1416,6 @@ static void tcg_liveness_analysis(TCGContext *s)
arg = args[i];
dead_temps[arg] = 0;
}
- s->op_dead_args[oi] = dead_args;
- s->op_sync_args[oi] = sync_args;
}
}
break;
@@ -1532,15 +1526,13 @@ static void tcg_liveness_analysis(TCGContext *s)
} else {
do_not_remove:
/* output args are dead */
- dead_args = 0;
- sync_args = 0;
for (i = 0; i < nb_oargs; i++) {
arg = args[i];
if (dead_temps[arg]) {
- dead_args |= (1 << i);
+ arg_life |= DEAD_ARG << i;
}
if (mem_temps[arg]) {
- sync_args |= (1 << i);
+ arg_life |= SYNC_ARG << i;
}
dead_temps[arg] = 1;
mem_temps[arg] = 0;
@@ -1558,7 +1550,7 @@ static void tcg_liveness_analysis(TCGContext *s)
for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
arg = args[i];
if (dead_temps[arg]) {
- dead_args |= (1 << i);
+ arg_life |= DEAD_ARG << i;
}
}
/* input arguments are live for preceding opcodes */
@@ -1566,11 +1558,10 @@ static void tcg_liveness_analysis(TCGContext *s)
arg = args[i];
dead_temps[arg] = 0;
}
- s->op_dead_args[oi] = dead_args;
- s->op_sync_args[oi] = sync_args;
}
break;
}
+ s->op_arg_life[oi] = arg_life;
}
}
@@ -1891,11 +1882,11 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
save_globals(s, allocated_regs);
}
-#define IS_DEAD_ARG(n) ((dead_args >> (n)) & 1)
-#define NEED_SYNC_ARG(n) ((sync_args >> (n)) & 1)
+#define IS_DEAD_ARG(n) (arg_life & (DEAD_ARG << (n)))
+#define NEED_SYNC_ARG(n) (arg_life & (SYNC_ARG << (n)))
static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
- uint16_t dead_args, uint8_t sync_args)
+ TCGLifeData arg_life)
{
TCGTemp *ots;
tcg_target_ulong val;
@@ -1924,8 +1915,7 @@ static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
}
static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
- const TCGArg *args, uint16_t dead_args,
- uint8_t sync_args)
+ const TCGArg *args, TCGLifeData arg_life)
{
TCGRegSet allocated_regs;
TCGTemp *ts, *ots;
@@ -2010,8 +2000,7 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
static void tcg_reg_alloc_op(TCGContext *s,
const TCGOpDef *def, TCGOpcode opc,
- const TCGArg *args, uint16_t dead_args,
- uint8_t sync_args)
+ const TCGArg *args, TCGLifeData arg_life)
{
TCGRegSet allocated_regs;
int i, k, nb_iargs, nb_oargs;
@@ -2176,8 +2165,7 @@ static void tcg_reg_alloc_op(TCGContext *s,
#endif
static void tcg_reg_alloc_call(TCGContext *s, int nb_oargs, int nb_iargs,
- const TCGArg * const args, uint16_t dead_args,
- uint8_t sync_args)
+ const TCGArg * const args, TCGLifeData arg_life)
{
int flags, nb_regs, i;
TCGReg reg;
@@ -2397,8 +2385,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
TCGArg * const args = &s->gen_opparam_buf[op->args];
TCGOpcode opc = op->opc;
const TCGOpDef *def = &tcg_op_defs[opc];
- uint16_t dead_args = s->op_dead_args[oi];
- uint8_t sync_args = s->op_sync_args[oi];
+ TCGLifeData arg_life = s->op_arg_life[oi];
oi_next = op->next;
#ifdef CONFIG_PROFILER
@@ -2408,11 +2395,11 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
switch (opc) {
case INDEX_op_mov_i32:
case INDEX_op_mov_i64:
- tcg_reg_alloc_mov(s, def, args, dead_args, sync_args);
+ tcg_reg_alloc_mov(s, def, args, arg_life);
break;
case INDEX_op_movi_i32:
case INDEX_op_movi_i64:
- tcg_reg_alloc_movi(s, args, dead_args, sync_args);
+ tcg_reg_alloc_movi(s, args, arg_life);
break;
case INDEX_op_insn_start:
if (num_insns >= 0) {
@@ -2437,8 +2424,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
tcg_out_label(s, arg_label(args[0]), s->code_ptr);
break;
case INDEX_op_call:
- tcg_reg_alloc_call(s, op->callo, op->calli, args,
- dead_args, sync_args);
+ tcg_reg_alloc_call(s, op->callo, op->calli, args, arg_life);
break;
default:
/* Sanity check that we've not introduced any unhandled opcodes. */
@@ -2448,7 +2434,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
/* Note: in order to speed up the code, it would be much
faster to have specialized register allocator functions for
some common argument patterns */
- tcg_reg_alloc_op(s, def, opc, args, dead_args, sync_args);
+ tcg_reg_alloc_op(s, def, opc, args, arg_life);
break;
}
#ifdef CONFIG_DEBUG_TCG
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 66d7fc0..cc14560 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -505,6 +505,14 @@ typedef struct TCGTempSet {
unsigned long l[BITS_TO_LONGS(TCG_MAX_TEMPS)];
} TCGTempSet;
+/* While we limit helpers to 6 arguments, for 32-bit hosts, with padding,
+ this imples a max of 6*2 (64-bit in) + 2 (64-bit out) = 14 operands.
+ There are never more than 2 outputs, which means that we can store all
+ dead + sync data within 16 bits. */
+#define DEAD_ARG 4
+#define SYNC_ARG 1
+typedef uint16_t TCGLifeData;
+
typedef struct TCGOp {
TCGOpcode opc : 8;
@@ -538,12 +546,8 @@ struct TCGContext {
uintptr_t *tb_jmp_target_addr; /* tb->jmp_target_addr if !USE_DIRECT_JUMP */
/* liveness analysis */
- uint16_t *op_dead_args; /* for each operation, each bit tells if the
- corresponding argument is dead */
- uint8_t *op_sync_args; /* for each operation, each bit tells if the
- corresponding output argument needs to be
- sync to memory. */
-
+ TCGLifeData *op_arg_life;
+
TCGRegSet reserved_regs;
intptr_t current_frame_offset;
intptr_t frame_start;
--
2.5.5
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH v3 5/9] tcg: Reorg TCGOp chaining
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
` (3 preceding siblings ...)
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 4/9] tcg: Compress liveness data to 16 bits Richard Henderson
@ 2016-06-24 3:48 ` Richard Henderson
2016-07-25 11:23 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 6/9] tcg: Fold life data into TCGOp Richard Henderson
` (5 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-06-24 3:48 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
Instead of using -1 as end of chain, use 0, and link through the 0
entry as a fully circular double-linked list.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
include/exec/gen-icount.h | 2 +-
tcg/optimize.c | 8 ++------
tcg/tcg-op.c | 2 +-
tcg/tcg.c | 32 ++++++++++++--------------------
tcg/tcg.h | 20 ++++++++++++--------
5 files changed, 28 insertions(+), 36 deletions(-)
diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
index a011324..5f16077 100644
--- a/include/exec/gen-icount.h
+++ b/include/exec/gen-icount.h
@@ -59,7 +59,7 @@ static void gen_tb_end(TranslationBlock *tb, int num_insns)
}
/* Terminate the linked list. */
- tcg_ctx.gen_op_buf[tcg_ctx.gen_last_op_idx].next = -1;
+ tcg_ctx.gen_op_buf[tcg_ctx.gen_op_buf[0].prev].next = 0;
}
static inline void gen_io_start(void)
diff --git a/tcg/optimize.c b/tcg/optimize.c
index c0d975b..8df7fc7 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -103,11 +103,7 @@ static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
.prev = prev,
.next = next
};
- if (prev >= 0) {
- s->gen_op_buf[prev].next = oi;
- } else {
- s->gen_first_op_idx = oi;
- }
+ s->gen_op_buf[prev].next = oi;
old_op->prev = oi;
return new_op;
@@ -583,7 +579,7 @@ void tcg_optimize(TCGContext *s)
nb_globals = s->nb_globals;
reset_all_temps(nb_temps);
- for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
+ for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
tcg_target_ulong mask, partmask, affected;
int nb_oargs, nb_iargs, i;
TCGArg tmp;
diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
index 569cdc6..62d91b4 100644
--- a/tcg/tcg-op.c
+++ b/tcg/tcg-op.c
@@ -52,7 +52,7 @@ static void tcg_emit_op(TCGContext *ctx, TCGOpcode opc, int args)
int pi = oi - 1;
tcg_debug_assert(oi < OPC_BUF_SIZE);
- ctx->gen_last_op_idx = oi;
+ ctx->gen_op_buf[0].prev = oi;
ctx->gen_next_op_idx = ni;
ctx->gen_op_buf[oi] = (TCGOp){
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 400e69c..bb1efe2 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -437,9 +437,9 @@ void tcg_func_start(TCGContext *s)
s->goto_tb_issue_mask = 0;
#endif
- s->gen_first_op_idx = 0;
- s->gen_last_op_idx = -1;
- s->gen_next_op_idx = 0;
+ s->gen_op_buf[0].next = 1;
+ s->gen_op_buf[0].prev = 0;
+ s->gen_next_op_idx = 1;
s->gen_next_parm_idx = 0;
s->be = tcg_malloc(sizeof(TCGBackendData));
@@ -868,7 +868,7 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
/* Make sure the calli field didn't overflow. */
tcg_debug_assert(s->gen_op_buf[i].calli == real_args);
- s->gen_last_op_idx = i;
+ s->gen_op_buf[0].prev = i;
s->gen_next_op_idx = i + 1;
s->gen_next_parm_idx = pi;
@@ -1004,7 +1004,7 @@ void tcg_dump_ops(TCGContext *s)
TCGOp *op;
int oi;
- for (oi = s->gen_first_op_idx; oi >= 0; oi = op->next) {
+ for (oi = s->gen_op_buf[0].next; oi != 0; oi = op->next) {
int i, k, nb_oargs, nb_iargs, nb_cargs;
const TCGOpDef *def;
const TCGArg *args;
@@ -1016,7 +1016,7 @@ void tcg_dump_ops(TCGContext *s)
args = &s->gen_opparam_buf[op->args];
if (c == INDEX_op_insn_start) {
- qemu_log("%s ----", oi != s->gen_first_op_idx ? "\n" : "");
+ qemu_log("%s ----", oi != s->gen_op_buf[0].next ? "\n" : "");
for (i = 0; i < TARGET_INSN_START_WORDS; ++i) {
target_ulong a;
@@ -1287,18 +1287,10 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
int next = op->next;
int prev = op->prev;
- if (next >= 0) {
- s->gen_op_buf[next].prev = prev;
- } else {
- s->gen_last_op_idx = prev;
- }
- if (prev >= 0) {
- s->gen_op_buf[prev].next = next;
- } else {
- s->gen_first_op_idx = next;
- }
+ s->gen_op_buf[next].prev = prev;
+ s->gen_op_buf[prev].next = next;
- memset(op, -1, sizeof(*op));
+ memset(op, 0, sizeof(*op));
#ifdef CONFIG_PROFILER
s->del_op_count++;
@@ -1344,7 +1336,7 @@ static void tcg_liveness_analysis(TCGContext *s)
mem_temps = tcg_malloc(s->nb_temps);
tcg_la_func_end(s, dead_temps, mem_temps);
- for (oi = s->gen_last_op_idx; oi >= 0; oi = oi_prev) {
+ for (oi = s->gen_op_buf[0].prev; oi != 0; oi = oi_prev) {
int i, nb_iargs, nb_oargs;
TCGOpcode opc_new, opc_new2;
bool have_opc_new2;
@@ -2321,7 +2313,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
{
int n;
- n = s->gen_last_op_idx + 1;
+ n = s->gen_op_buf[0].prev + 1;
s->op_count += n;
if (n > s->op_count_max) {
s->op_count_max = n;
@@ -2380,7 +2372,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
tcg_out_tb_init(s);
num_insns = -1;
- for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
+ for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
TCGOp * const op = &s->gen_op_buf[oi];
TCGArg * const args = &s->gen_opparam_buf[op->args];
TCGOpcode opc = op->opc;
diff --git a/tcg/tcg.h b/tcg/tcg.h
index cc14560..49b396d 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -520,17 +520,21 @@ typedef struct TCGOp {
unsigned callo : 2;
unsigned calli : 6;
- /* Index of the arguments for this op, or -1 for zero-operand ops. */
- signed args : 16;
+ /* Index of the arguments for this op, or 0 for zero-operand ops. */
+ unsigned args : 16;
- /* Index of the prex/next op, or -1 for the end of the list. */
- signed prev : 16;
- signed next : 16;
+ /* Index of the prex/next op, or 0 for the end of the list. */
+ unsigned prev : 16;
+ unsigned next : 16;
} TCGOp;
-QEMU_BUILD_BUG_ON(NB_OPS > 0xff);
-QEMU_BUILD_BUG_ON(OPC_BUF_SIZE >= 0x7fff);
-QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE >= 0x7fff);
+/* Make sure operands fit in the bitfields above. */
+QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
+QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 16));
+QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 16));
+
+/* Make sure that we don't overflow 64 bits without noticing. */
+QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
struct TCGContext {
uint8_t *pool_cur, *pool_end;
--
2.5.5
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH v3 6/9] tcg: Fold life data into TCGOp
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
` (4 preceding siblings ...)
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 5/9] tcg: Reorg TCGOp chaining Richard Henderson
@ 2016-06-24 3:48 ` Richard Henderson
2016-07-25 14:07 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 7/9] tcg: Compress dead_temps and mem_temps into a single array Richard Henderson
` (4 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-06-24 3:48 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
Reduce the size of other bitfields to make room.
This reduces the cache footprint of compilation.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/tcg.c | 9 +++------
tcg/tcg.h | 26 ++++++++++++++------------
2 files changed, 17 insertions(+), 18 deletions(-)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index bb1efe2..c41640f 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1327,10 +1327,7 @@ static inline void tcg_la_bb_end(TCGContext *s, uint8_t *dead_temps,
static void tcg_liveness_analysis(TCGContext *s)
{
uint8_t *dead_temps, *mem_temps;
- int oi, oi_prev, nb_ops;
-
- nb_ops = s->gen_next_op_idx;
- s->op_arg_life = tcg_malloc(nb_ops * sizeof(TCGLifeData));
+ int oi, oi_prev;
dead_temps = tcg_malloc(s->nb_temps);
mem_temps = tcg_malloc(s->nb_temps);
@@ -1553,7 +1550,7 @@ static void tcg_liveness_analysis(TCGContext *s)
}
break;
}
- s->op_arg_life[oi] = arg_life;
+ op->life = arg_life;
}
}
@@ -2377,7 +2374,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
TCGArg * const args = &s->gen_opparam_buf[op->args];
TCGOpcode opc = op->opc;
const TCGOpDef *def = &tcg_op_defs[opc];
- TCGLifeData arg_life = s->op_arg_life[oi];
+ TCGLifeData arg_life = op->life;
oi_next = op->next;
#ifdef CONFIG_PROFILER
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 49b396d..2ff3ad2 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -513,25 +513,30 @@ typedef struct TCGTempSet {
#define SYNC_ARG 1
typedef uint16_t TCGLifeData;
+/* The layout here is designed to avoid crossing of a 32-bit boundary.
+ If we do so, gcc adds padding, expanding the size to 12. */
typedef struct TCGOp {
- TCGOpcode opc : 8;
+ TCGOpcode opc : 8; /* 8 */
+
+ /* Index of the prex/next op, or 0 for the end of the list. */
+ unsigned prev : 10; /* 18 */
+ unsigned next : 10; /* 28 */
/* The number of out and in parameter for a call. */
- unsigned callo : 2;
- unsigned calli : 6;
+ unsigned calli : 4; /* 32 */
+ unsigned callo : 2; /* 34 */
/* Index of the arguments for this op, or 0 for zero-operand ops. */
- unsigned args : 16;
+ unsigned args : 14; /* 48 */
- /* Index of the prex/next op, or 0 for the end of the list. */
- unsigned prev : 16;
- unsigned next : 16;
+ /* Lifetime data of the operands. */
+ unsigned life : 16; /* 64 */
} TCGOp;
/* Make sure operands fit in the bitfields above. */
QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
-QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 16));
-QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 16));
+QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 10));
+QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 14));
/* Make sure that we don't overflow 64 bits without noticing. */
QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
@@ -549,9 +554,6 @@ struct TCGContext {
uint16_t *tb_jmp_insn_offset; /* tb->jmp_insn_offset if USE_DIRECT_JUMP */
uintptr_t *tb_jmp_target_addr; /* tb->jmp_target_addr if !USE_DIRECT_JUMP */
- /* liveness analysis */
- TCGLifeData *op_arg_life;
-
TCGRegSet reserved_regs;
intptr_t current_frame_offset;
intptr_t frame_start;
--
2.5.5
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH v3 7/9] tcg: Compress dead_temps and mem_temps into a single array
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
` (5 preceding siblings ...)
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 6/9] tcg: Fold life data into TCGOp Richard Henderson
@ 2016-06-24 3:48 ` Richard Henderson
2016-07-25 15:15 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 8/9] tcg: Include liveness info in the dumps Richard Henderson
` (3 subsequent siblings)
10 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-06-24 3:48 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
We only need two bits per temporary. Fold the two bytes into one,
and reduce the memory and cachelines required during compilation.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/tcg.c | 118 +++++++++++++++++++++++++++++++-------------------------------
1 file changed, 59 insertions(+), 59 deletions(-)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index c41640f..fd92b06 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -332,7 +332,7 @@ void tcg_context_init(TCGContext *s)
memset(s, 0, sizeof(*s));
s->nb_globals = 0;
-
+
/* Count total number of arguments and allocate the corresponding
space */
total_args = 0;
@@ -824,16 +824,16 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
real_args++;
}
#endif
- /* If stack grows up, then we will be placing successive
- arguments at lower addresses, which means we need to
- reverse the order compared to how we would normally
- treat either big or little-endian. For those arguments
- that will wind up in registers, this still works for
- HPPA (the only current STACK_GROWSUP target) since the
- argument registers are *also* allocated in decreasing
- order. If another such target is added, this logic may
- have to get more complicated to differentiate between
- stack arguments and register arguments. */
+ /* If stack grows up, then we will be placing successive
+ arguments at lower addresses, which means we need to
+ reverse the order compared to how we would normally
+ treat either big or little-endian. For those arguments
+ that will wind up in registers, this still works for
+ HPPA (the only current STACK_GROWSUP target) since the
+ argument registers are *also* allocated in decreasing
+ order. If another such target is added, this logic may
+ have to get more complicated to differentiate between
+ stack arguments and register arguments. */
#if defined(HOST_WORDS_BIGENDIAN) != defined(TCG_TARGET_STACK_GROWSUP)
s->gen_opparam_buf[pi++] = args[i] + 1;
s->gen_opparam_buf[pi++] = args[i];
@@ -1297,27 +1297,28 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
#endif
}
+#define TS_DEAD 1
+#define TS_SYNC 2
+
/* liveness analysis: end of function: all temps are dead, and globals
should be in memory. */
-static inline void tcg_la_func_end(TCGContext *s, uint8_t *dead_temps,
- uint8_t *mem_temps)
+static inline void tcg_la_func_end(TCGContext *s, uint8_t *temp_state)
{
- memset(dead_temps, 1, s->nb_temps);
- memset(mem_temps, 1, s->nb_globals);
- memset(mem_temps + s->nb_globals, 0, s->nb_temps - s->nb_globals);
+ memset(temp_state, TS_DEAD | TS_SYNC, s->nb_globals);
+ memset(temp_state + s->nb_globals, TS_DEAD, s->nb_temps - s->nb_globals);
}
/* liveness analysis: end of basic block: all temps are dead, globals
and local temps should be in memory. */
-static inline void tcg_la_bb_end(TCGContext *s, uint8_t *dead_temps,
- uint8_t *mem_temps)
+static inline void tcg_la_bb_end(TCGContext *s, uint8_t *temp_state)
{
- int i;
+ int i, n;
- memset(dead_temps, 1, s->nb_temps);
- memset(mem_temps, 1, s->nb_globals);
- for(i = s->nb_globals; i < s->nb_temps; i++) {
- mem_temps[i] = s->temps[i].temp_local;
+ tcg_la_func_end(s, temp_state);
+ for (i = s->nb_globals, n = s->nb_temps; i < n; i++) {
+ if (s->temps[i].temp_local) {
+ temp_state[i] |= TS_SYNC;
+ }
}
}
@@ -1326,12 +1327,12 @@ static inline void tcg_la_bb_end(TCGContext *s, uint8_t *dead_temps,
temporaries are removed. */
static void tcg_liveness_analysis(TCGContext *s)
{
- uint8_t *dead_temps, *mem_temps;
+ uint8_t *temp_state;
int oi, oi_prev;
+ int nb_globals = s->nb_globals;
- dead_temps = tcg_malloc(s->nb_temps);
- mem_temps = tcg_malloc(s->nb_temps);
- tcg_la_func_end(s, dead_temps, mem_temps);
+ temp_state = tcg_malloc(s->nb_temps);
+ tcg_la_func_end(s, temp_state);
for (oi = s->gen_op_buf[0].prev; oi != 0; oi = oi_prev) {
int i, nb_iargs, nb_oargs;
@@ -1360,7 +1361,7 @@ static void tcg_liveness_analysis(TCGContext *s)
if (call_flags & TCG_CALL_NO_SIDE_EFFECTS) {
for (i = 0; i < nb_oargs; i++) {
arg = args[i];
- if (!dead_temps[arg] || mem_temps[arg]) {
+ if (temp_state[arg] != TS_DEAD) {
goto do_not_remove_call;
}
}
@@ -1371,39 +1372,41 @@ static void tcg_liveness_analysis(TCGContext *s)
/* output args are dead */
for (i = 0; i < nb_oargs; i++) {
arg = args[i];
- if (dead_temps[arg]) {
+ if (temp_state[arg] & TS_DEAD) {
arg_life |= DEAD_ARG << i;
}
- if (mem_temps[arg]) {
+ if (temp_state[arg] & TS_SYNC) {
arg_life |= SYNC_ARG << i;
}
- dead_temps[arg] = 1;
- mem_temps[arg] = 0;
+ temp_state[arg] = TS_DEAD;
}
- if (!(call_flags & TCG_CALL_NO_READ_GLOBALS)) {
- /* globals should be synced to memory */
- memset(mem_temps, 1, s->nb_globals);
- }
if (!(call_flags & (TCG_CALL_NO_WRITE_GLOBALS |
TCG_CALL_NO_READ_GLOBALS))) {
/* globals should go back to memory */
- memset(dead_temps, 1, s->nb_globals);
+ memset(temp_state, TS_DEAD | TS_SYNC, nb_globals);
+ } else if (!(call_flags & TCG_CALL_NO_READ_GLOBALS)) {
+ /* globals should be synced to memory */
+ for (i = 0; i < nb_globals; i++) {
+ temp_state[i] |= TS_SYNC;
+ }
}
/* record arguments that die in this helper */
for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
arg = args[i];
if (arg != TCG_CALL_DUMMY_ARG) {
- if (dead_temps[arg]) {
+ if (temp_state[arg] & TS_DEAD) {
arg_life |= DEAD_ARG << i;
}
}
}
/* input arguments are live for preceding opcodes */
- for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
+ for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
arg = args[i];
- dead_temps[arg] = 0;
+ if (arg != TCG_CALL_DUMMY_ARG) {
+ temp_state[arg] &= ~TS_DEAD;
+ }
}
}
}
@@ -1412,8 +1415,7 @@ static void tcg_liveness_analysis(TCGContext *s)
break;
case INDEX_op_discard:
/* mark the temporary as dead */
- dead_temps[args[0]] = 1;
- mem_temps[args[0]] = 0;
+ temp_state[args[0]] = TS_DEAD;
break;
case INDEX_op_add2_i32:
@@ -1434,8 +1436,8 @@ static void tcg_liveness_analysis(TCGContext *s)
the low part. The result can be optimized to a simple
add or sub. This happens often for x86_64 guest when the
cpu mode is set to 32 bit. */
- if (dead_temps[args[1]] && !mem_temps[args[1]]) {
- if (dead_temps[args[0]] && !mem_temps[args[0]]) {
+ if (temp_state[args[1]] == TS_DEAD) {
+ if (temp_state[args[0]] == TS_DEAD) {
goto do_remove;
}
/* Replace the opcode and adjust the args in place,
@@ -1472,8 +1474,8 @@ static void tcg_liveness_analysis(TCGContext *s)
do_mul2:
nb_iargs = 2;
nb_oargs = 2;
- if (dead_temps[args[1]] && !mem_temps[args[1]]) {
- if (dead_temps[args[0]] && !mem_temps[args[0]]) {
+ if (temp_state[args[1]] == TS_DEAD) {
+ if (temp_state[args[0]] == TS_DEAD) {
/* Both parts of the operation are dead. */
goto do_remove;
}
@@ -1481,8 +1483,7 @@ static void tcg_liveness_analysis(TCGContext *s)
op->opc = opc = opc_new;
args[1] = args[2];
args[2] = args[3];
- } else if (have_opc_new2 && dead_temps[args[0]]
- && !mem_temps[args[0]]) {
+ } else if (temp_state[args[0]] == TS_DEAD && have_opc_new2) {
/* The low part of the operation is dead; generate the high. */
op->opc = opc = opc_new2;
args[0] = args[1];
@@ -1505,8 +1506,7 @@ static void tcg_liveness_analysis(TCGContext *s)
implies side effects */
if (!(def->flags & TCG_OPF_SIDE_EFFECTS) && nb_oargs != 0) {
for (i = 0; i < nb_oargs; i++) {
- arg = args[i];
- if (!dead_temps[arg] || mem_temps[arg]) {
+ if (temp_state[args[i]] != TS_DEAD) {
goto do_not_remove;
}
}
@@ -1517,35 +1517,35 @@ static void tcg_liveness_analysis(TCGContext *s)
/* output args are dead */
for (i = 0; i < nb_oargs; i++) {
arg = args[i];
- if (dead_temps[arg]) {
+ if (temp_state[arg] & TS_DEAD) {
arg_life |= DEAD_ARG << i;
}
- if (mem_temps[arg]) {
+ if (temp_state[arg] & TS_SYNC) {
arg_life |= SYNC_ARG << i;
}
- dead_temps[arg] = 1;
- mem_temps[arg] = 0;
+ temp_state[arg] = TS_DEAD;
}
/* if end of basic block, update */
if (def->flags & TCG_OPF_BB_END) {
- tcg_la_bb_end(s, dead_temps, mem_temps);
+ tcg_la_bb_end(s, temp_state);
} else if (def->flags & TCG_OPF_SIDE_EFFECTS) {
/* globals should be synced to memory */
- memset(mem_temps, 1, s->nb_globals);
+ for (i = 0; i < nb_globals; i++) {
+ temp_state[i] |= TS_SYNC;
+ }
}
/* record arguments that die in this opcode */
for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
arg = args[i];
- if (dead_temps[arg]) {
+ if (temp_state[arg] & TS_DEAD) {
arg_life |= DEAD_ARG << i;
}
}
/* input arguments are live for preceding opcodes */
for (i = nb_oargs; i < nb_oargs + nb_iargs; i++) {
- arg = args[i];
- dead_temps[arg] = 0;
+ temp_state[args[i]] &= ~TS_DEAD;
}
}
break;
--
2.5.5
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH v3 8/9] tcg: Include liveness info in the dumps
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
` (6 preceding siblings ...)
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 7/9] tcg: Compress dead_temps and mem_temps into a single array Richard Henderson
@ 2016-06-24 3:48 ` Richard Henderson
2016-07-25 15:15 ` Aurelien Jarno
2016-07-25 16:16 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 9/9] tcg: Lower indirect registers in a separate pass Richard Henderson
` (2 subsequent siblings)
10 siblings, 2 replies; 25+ messages in thread
From: Richard Henderson @ 2016-06-24 3:48 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
tcg/tcg.c | 26 ++++++++++++++++++++++++++
1 file changed, 26 insertions(+)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index fd92b06..3e4bc99 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1009,6 +1009,7 @@ void tcg_dump_ops(TCGContext *s)
const TCGOpDef *def;
const TCGArg *args;
TCGOpcode c;
+ long pos = ftell(qemu_logfile);
op = &s->gen_op_buf[oi];
c = op->opc;
@@ -1133,6 +1134,31 @@ void tcg_dump_ops(TCGContext *s)
qemu_log("%s$0x%" TCG_PRIlx, k ? "," : "", args[k]);
}
}
+ if (op->life) {
+ unsigned life = op->life;
+
+ for (i = ftell(qemu_logfile) - pos; i < 48; ++i) {
+ putc(' ', qemu_logfile);
+ }
+
+ if (life & (SYNC_ARG * 3)) {
+ qemu_log(" sync:");
+ for (i = 0; i < 2; ++i) {
+ if (life & (SYNC_ARG << i)) {
+ qemu_log(" %d", i);
+ }
+ }
+ }
+ life /= DEAD_ARG;
+ if (life) {
+ qemu_log(" dead:");
+ for (i = 0; life; ++i, life >>= 1) {
+ if (life & 1) {
+ qemu_log(" %d", i);
+ }
+ }
+ }
+ }
qemu_log("\n");
}
}
--
2.5.5
^ permalink raw reply related [flat|nested] 25+ messages in thread
* [Qemu-devel] [PATCH v3 9/9] tcg: Lower indirect registers in a separate pass
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
` (7 preceding siblings ...)
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 8/9] tcg: Include liveness info in the dumps Richard Henderson
@ 2016-06-24 3:48 ` Richard Henderson
2016-07-25 19:23 ` Aurelien Jarno
2016-06-24 10:31 ` [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Mark Cave-Ayland
2016-07-19 3:39 ` Richard Henderson
10 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-06-24 3:48 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland, aurelien, atar4qemu
Rather than rely on recursion during the middle of register allocation,
lower indirect registers to loads and stores off the indirect base into
plain temps.
For an x86_64 host, with sufficient registers, this results in identical
code, modulo the actual register assignments.
For an i686 host, with insufficient registers, this means that temps can
be (temporarily) spilled to the stack in order to satisfy an allocation.
This as opposed to the possibility of not being able to spill, to allocate
a register for the indirect base, in order to perform a spill.
Signed-off-by: Richard Henderson <rth@twiddle.net>
---
include/qemu/log.h | 1 +
tcg/optimize.c | 31 +-----
tcg/tcg.c | 306 +++++++++++++++++++++++++++++++++++++++++++----------
tcg/tcg.h | 4 +
util/log.c | 5 +-
5 files changed, 263 insertions(+), 84 deletions(-)
diff --git a/include/qemu/log.h b/include/qemu/log.h
index 8bec6b4..9c1cc38 100644
--- a/include/qemu/log.h
+++ b/include/qemu/log.h
@@ -42,6 +42,7 @@ static inline bool qemu_log_separate(void)
#define CPU_LOG_TB_NOCHAIN (1 << 13)
#define CPU_LOG_PAGE (1 << 14)
#define LOG_TRACE (1 << 15)
+#define CPU_LOG_TB_OP_IND (1 << 16)
/* Returns true if a bit is set in the current loglevel mask
*/
diff --git a/tcg/optimize.c b/tcg/optimize.c
index 8df7fc7..cffe89b 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -82,33 +82,6 @@ static void init_temp_info(TCGArg temp)
}
}
-static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
- TCGOpcode opc, int nargs)
-{
- int oi = s->gen_next_op_idx;
- int pi = s->gen_next_parm_idx;
- int prev = old_op->prev;
- int next = old_op - s->gen_op_buf;
- TCGOp *new_op;
-
- tcg_debug_assert(oi < OPC_BUF_SIZE);
- tcg_debug_assert(pi + nargs <= OPPARAM_BUF_SIZE);
- s->gen_next_op_idx = oi + 1;
- s->gen_next_parm_idx = pi + nargs;
-
- new_op = &s->gen_op_buf[oi];
- *new_op = (TCGOp){
- .opc = opc,
- .args = pi,
- .prev = prev,
- .next = next
- };
- s->gen_op_buf[prev].next = oi;
- old_op->prev = oi;
-
- return new_op;
-}
-
static int op_bits(TCGOpcode op)
{
const TCGOpDef *def = &tcg_op_defs[op];
@@ -1116,7 +1089,7 @@ void tcg_optimize(TCGContext *s)
uint64_t a = ((uint64_t)ah << 32) | al;
uint64_t b = ((uint64_t)bh << 32) | bl;
TCGArg rl, rh;
- TCGOp *op2 = insert_op_before(s, op, INDEX_op_movi_i32, 2);
+ TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
TCGArg *args2 = &s->gen_opparam_buf[op2->args];
if (opc == INDEX_op_add2_i32) {
@@ -1142,7 +1115,7 @@ void tcg_optimize(TCGContext *s)
uint32_t b = temps[args[3]].val;
uint64_t r = (uint64_t)a * b;
TCGArg rl, rh;
- TCGOp *op2 = insert_op_before(s, op, INDEX_op_movi_i32, 2);
+ TCGOp *op2 = tcg_op_insert_before(s, op, INDEX_op_movi_i32, 2);
TCGArg *args2 = &s->gen_opparam_buf[op2->args];
rl = args[0];
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 3e4bc99..f00500e 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -531,8 +531,12 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
#endif
if (!base_ts->fixed_reg) {
- indirect_reg = 1;
+ /* We do not support double-indirect registers. */
+ tcg_debug_assert(!base_ts->indirect_reg);
base_ts->indirect_base = 1;
+ s->nb_indirects += (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64
+ ? 2 : 1);
+ indirect_reg = 1;
}
if (TCG_TARGET_REG_BITS == 32 && type == TCG_TYPE_I64) {
@@ -1323,9 +1327,66 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
#endif
}
+TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *old_op,
+ TCGOpcode opc, int nargs)
+{
+ int oi = s->gen_next_op_idx;
+ int pi = s->gen_next_parm_idx;
+ int prev = old_op->prev;
+ int next = old_op - s->gen_op_buf;
+ TCGOp *new_op;
+
+ tcg_debug_assert(oi < OPC_BUF_SIZE);
+ tcg_debug_assert(pi + nargs <= OPPARAM_BUF_SIZE);
+ s->gen_next_op_idx = oi + 1;
+ s->gen_next_parm_idx = pi + nargs;
+
+ new_op = &s->gen_op_buf[oi];
+ *new_op = (TCGOp){
+ .opc = opc,
+ .args = pi,
+ .prev = prev,
+ .next = next
+ };
+ s->gen_op_buf[prev].next = oi;
+ old_op->prev = oi;
+
+ return new_op;
+}
+
+TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *old_op,
+ TCGOpcode opc, int nargs)
+{
+ int oi = s->gen_next_op_idx;
+ int pi = s->gen_next_parm_idx;
+ int prev = old_op - s->gen_op_buf;
+ int next = old_op->next;
+ TCGOp *new_op;
+
+ tcg_debug_assert(oi < OPC_BUF_SIZE);
+ tcg_debug_assert(pi + nargs <= OPPARAM_BUF_SIZE);
+ s->gen_next_op_idx = oi + 1;
+ s->gen_next_parm_idx = pi + nargs;
+
+ new_op = &s->gen_op_buf[oi];
+ *new_op = (TCGOp){
+ .opc = opc,
+ .args = pi,
+ .prev = prev,
+ .next = next
+ };
+ s->gen_op_buf[next].prev = oi;
+ old_op->next = oi;
+
+ return new_op;
+}
+
#define TS_DEAD 1
#define TS_SYNC 2
+#define IS_DEAD_ARG(n) (arg_life & (DEAD_ARG << (n)))
+#define NEED_SYNC_ARG(n) (arg_life & (SYNC_ARG << (n)))
+
/* liveness analysis: end of function: all temps are dead, and globals
should be in memory. */
static inline void tcg_la_func_end(TCGContext *s, uint8_t *temp_state)
@@ -1351,13 +1412,11 @@ static inline void tcg_la_bb_end(TCGContext *s, uint8_t *temp_state)
/* Liveness analysis : update the opc_arg_life array to tell if a
given input arguments is dead. Instructions updating dead
temporaries are removed. */
-static void tcg_liveness_analysis(TCGContext *s)
+static void liveness_pass_1(TCGContext *s, uint8_t *temp_state)
{
- uint8_t *temp_state;
- int oi, oi_prev;
int nb_globals = s->nb_globals;
+ int oi, oi_prev;
- temp_state = tcg_malloc(s->nb_temps);
tcg_la_func_end(s, temp_state);
for (oi = s->gen_op_buf[0].prev; oi != 0; oi = oi_prev) {
@@ -1580,6 +1639,165 @@ static void tcg_liveness_analysis(TCGContext *s)
}
}
+/* Liveness analysis: Convert indirect regs to direct temporaries. */
+static bool liveness_pass_2(TCGContext *s, uint8_t *temp_state)
+{
+ int nb_globals = s->nb_globals;
+ int16_t *dir_temps;
+ int i, oi, oi_next;
+ bool changes = false;
+
+ dir_temps = tcg_malloc(nb_globals * sizeof(int16_t));
+ memset(dir_temps, 0, nb_globals * sizeof(int16_t));
+
+ /* Create a temporary for each indirect global. */
+ for (i = 0; i < nb_globals; ++i) {
+ TCGTemp *its = &s->temps[i];
+ if (its->indirect_reg) {
+ TCGTemp *dts = tcg_temp_alloc(s);
+ dts->type = its->type;
+ dts->base_type = its->base_type;
+ dir_temps[i] = temp_idx(s, dts);
+ }
+ }
+
+ memset(temp_state, TS_DEAD, nb_globals);
+
+ for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
+ TCGOp *op = &s->gen_op_buf[oi];
+ TCGArg *args = &s->gen_opparam_buf[op->args];
+ TCGOpcode opc = op->opc;
+ const TCGOpDef *def = &tcg_op_defs[opc];
+ TCGLifeData arg_life = op->life;
+ int nb_iargs, nb_oargs, call_flags;
+ TCGArg arg, dir;
+
+ oi_next = op->next;
+
+ if (opc == INDEX_op_call) {
+ nb_oargs = op->callo;
+ nb_iargs = op->calli;
+ call_flags = args[nb_oargs + nb_iargs + 1];
+ } else {
+ nb_iargs = def->nb_iargs;
+ nb_oargs = def->nb_oargs;
+
+ /* Set flags similar to how calls require. */
+ if (def->flags & TCG_OPF_BB_END) {
+ /* Like writing globals: save_globals */
+ call_flags = 0;
+ } else if (def->flags & TCG_OPF_SIDE_EFFECTS) {
+ /* Like reading globals: sync_globals */
+ call_flags = TCG_CALL_NO_WRITE_GLOBALS;
+ } else {
+ /* No effect on globals. */
+ call_flags = (TCG_CALL_NO_READ_GLOBALS |
+ TCG_CALL_NO_WRITE_GLOBALS);
+ }
+ }
+
+ /* Make sure that input arguments are available. */
+ for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
+ arg = args[i];
+ /* Note this unsigned test catches TCG_CALL_ARG_DUMMY too. */
+ if (arg < nb_globals) {
+ dir = dir_temps[arg];
+ if (dir != 0 && temp_state[arg] == TS_DEAD) {
+ TCGTemp *its = &s->temps[arg];
+ TCGOpcode lopc = (its->type == TCG_TYPE_I32
+ ? INDEX_op_ld_i32
+ : INDEX_op_ld_i64);
+ TCGOp *lop = tcg_op_insert_before(s, op, lopc, 3);
+ TCGArg *largs = &s->gen_opparam_buf[lop->args];
+
+ largs[0] = dir;
+ largs[1] = temp_idx(s, its->mem_base);
+ largs[2] = its->mem_offset;
+
+ /* Loaded, but synced with memory. */
+ temp_state[arg] = TS_SYNC;
+ }
+ }
+ }
+
+ /* Perform input replacement, and mark inputs that became dead.
+ No action is required except keeping temp_state up to date
+ so that we reload when needed. */
+ for (i = nb_oargs; i < nb_iargs + nb_oargs; i++) {
+ arg = args[i];
+ if (arg < nb_globals) {
+ dir = dir_temps[arg];
+ if (dir != 0) {
+ args[i] = dir;
+ changes = true;
+ if (IS_DEAD_ARG(i)) {
+ temp_state[arg] = TS_DEAD;
+ }
+ }
+ }
+ }
+
+ /* Liveness analysis should ensure that the following are
+ all correct, for call sites and basic block end points. */
+ if (call_flags & TCG_CALL_NO_READ_GLOBALS) {
+ /* Nothing to do */
+ } else if (call_flags & TCG_CALL_NO_WRITE_GLOBALS) {
+ for (i = 0; i < nb_globals; ++i) {
+ /* Liveness should see that globals are synced back,
+ that is, either TS_DEAD or TS_SYNC. */
+ tcg_debug_assert(dir_temps[i] == 0
+ || temp_state[i] != 0);
+ }
+ } else {
+ for (i = 0; i < nb_globals; ++i) {
+ /* Liveness should see that globals are saved back,
+ that is, TS_DEAD, waiting to be reloaded. */
+ tcg_debug_assert(dir_temps[i] == 0
+ || temp_state[i] == TS_DEAD);
+ }
+ }
+
+ /* Outputs become available. */
+ for (i = 0; i < nb_oargs; i++) {
+ arg = args[i];
+ if (arg >= nb_globals) {
+ continue;
+ }
+ dir = dir_temps[arg];
+ if (dir == 0) {
+ continue;
+ }
+ args[i] = dir;
+ changes = true;
+
+ /* The output is now live and modified. */
+ temp_state[arg] = 0;
+
+ /* Sync outputs upon their last write. */
+ if (NEED_SYNC_ARG(i)) {
+ TCGTemp *its = &s->temps[arg];
+ TCGOpcode sopc = (its->type == TCG_TYPE_I32
+ ? INDEX_op_st_i32
+ : INDEX_op_st_i64);
+ TCGOp *sop = tcg_op_insert_after(s, op, sopc, 3);
+ TCGArg *sargs = &s->gen_opparam_buf[sop->args];
+
+ sargs[0] = dir;
+ sargs[1] = temp_idx(s, its->mem_base);
+ sargs[2] = its->mem_offset;
+
+ temp_state[arg] = TS_SYNC;
+ }
+ /* Drop outputs that are dead. */
+ if (IS_DEAD_ARG(i)) {
+ temp_state[arg] = TS_DEAD;
+ }
+ }
+ }
+
+ return changes;
+}
+
#ifdef CONFIG_DEBUG_TCG
static void dump_regs(TCGContext *s)
{
@@ -1710,14 +1928,6 @@ static void temp_sync(TCGContext *s, TCGTemp *ts,
if (!ts->mem_allocated) {
temp_allocate_frame(s, temp_idx(s, ts));
}
- if (ts->indirect_reg) {
- if (ts->val_type == TEMP_VAL_REG) {
- tcg_regset_set_reg(allocated_regs, ts->reg);
- }
- temp_load(s, ts->mem_base,
- tcg_target_available_regs[TCG_TYPE_PTR],
- allocated_regs);
- }
switch (ts->val_type) {
case TEMP_VAL_CONST:
/* If we're going to free the temp immediately, then we won't
@@ -1807,12 +2017,6 @@ static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
break;
case TEMP_VAL_MEM:
reg = tcg_reg_alloc(s, desired_regs, allocated_regs, ts->indirect_base);
- if (ts->indirect_reg) {
- tcg_regset_set_reg(allocated_regs, reg);
- temp_load(s, ts->mem_base,
- tcg_target_available_regs[TCG_TYPE_PTR],
- allocated_regs);
- }
tcg_out_ld(s, ts->type, reg, ts->mem_base->reg, ts->mem_offset);
ts->mem_coherent = 1;
break;
@@ -1830,14 +2034,9 @@ static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
static inline void temp_save(TCGContext *s, TCGTemp *ts,
TCGRegSet allocated_regs)
{
- /* ??? Liveness does not yet incorporate indirect bases. */
- if (!ts->indirect_base) {
- /* The liveness analysis already ensures that globals are back
- in memory. Keep an tcg_debug_assert for safety. */
- tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || ts->fixed_reg);
- return;
- }
- temp_sync(s, ts, allocated_regs, 1);
+ /* The liveness analysis already ensures that globals are back
+ in memory. Keep an tcg_debug_assert for safety. */
+ tcg_debug_assert(ts->val_type == TEMP_VAL_MEM || ts->fixed_reg);
}
/* save globals to their canonical location and assume they can be
@@ -1861,14 +2060,9 @@ static void sync_globals(TCGContext *s, TCGRegSet allocated_regs)
for (i = 0; i < s->nb_globals; i++) {
TCGTemp *ts = &s->temps[i];
- /* ??? Liveness does not yet incorporate indirect bases. */
- if (!ts->indirect_base) {
- tcg_debug_assert(ts->val_type != TEMP_VAL_REG
- || ts->fixed_reg
- || ts->mem_coherent);
- continue;
- }
- temp_sync(s, ts, allocated_regs, 0);
+ tcg_debug_assert(ts->val_type != TEMP_VAL_REG
+ || ts->fixed_reg
+ || ts->mem_coherent);
}
}
@@ -1883,23 +2077,15 @@ static void tcg_reg_alloc_bb_end(TCGContext *s, TCGRegSet allocated_regs)
if (ts->temp_local) {
temp_save(s, ts, allocated_regs);
} else {
- /* ??? Liveness does not yet incorporate indirect bases. */
- if (!ts->indirect_base) {
- /* The liveness analysis already ensures that temps are dead.
- Keep an tcg_debug_assert for safety. */
- tcg_debug_assert(ts->val_type == TEMP_VAL_DEAD);
- continue;
- }
- temp_dead(s, ts);
+ /* The liveness analysis already ensures that temps are dead.
+ Keep an tcg_debug_assert for safety. */
+ tcg_debug_assert(ts->val_type == TEMP_VAL_DEAD);
}
}
save_globals(s, allocated_regs);
}
-#define IS_DEAD_ARG(n) (arg_life & (DEAD_ARG << (n)))
-#define NEED_SYNC_ARG(n) (arg_life & (SYNC_ARG << (n)))
-
static void tcg_reg_alloc_movi(TCGContext *s, const TCGArg *args,
TCGLifeData arg_life)
{
@@ -1962,12 +2148,6 @@ static void tcg_reg_alloc_mov(TCGContext *s, const TCGOpDef *def,
if (!ots->mem_allocated) {
temp_allocate_frame(s, args[0]);
}
- if (ots->indirect_reg) {
- tcg_regset_set_reg(allocated_regs, ts->reg);
- temp_load(s, ots->mem_base,
- tcg_target_available_regs[TCG_TYPE_PTR],
- allocated_regs);
- }
tcg_out_st(s, otype, ts->reg, ots->mem_base->reg, ots->mem_offset);
if (IS_DEAD_ARG(1)) {
temp_dead(s, ts);
@@ -2372,7 +2552,27 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
s->la_time -= profile_getclock();
#endif
- tcg_liveness_analysis(s);
+ {
+ uint8_t *temp_state = tcg_malloc(s->nb_temps + s->nb_indirects);
+
+ liveness_pass_1(s, temp_state);
+
+ if (s->nb_indirects > 0) {
+#ifdef DEBUG_DISAS
+ if (unlikely(qemu_loglevel_mask(CPU_LOG_TB_OP_IND)
+ && qemu_log_in_addr_range(tb->pc))) {
+ qemu_log("OP before indirect lowering:\n");
+ tcg_dump_ops(s);
+ qemu_log("\n");
+ }
+#endif
+ /* Replace indirect temps with direct temps. */
+ if (liveness_pass_2(s, temp_state)) {
+ /* If changes were made, re-run liveness. */
+ liveness_pass_1(s, temp_state);
+ }
+ }
+ }
#ifdef CONFIG_PROFILER
s->la_time += profile_getclock();
diff --git a/tcg/tcg.h b/tcg/tcg.h
index 2ff3ad2..c726355 100644
--- a/tcg/tcg.h
+++ b/tcg/tcg.h
@@ -547,6 +547,7 @@ struct TCGContext {
int nb_labels;
int nb_globals;
int nb_temps;
+ int nb_indirects;
/* goto_tb support */
tcg_insn_unit *code_buf;
@@ -839,6 +840,9 @@ void tcg_gen_callN(TCGContext *s, void *func,
TCGArg ret, int nargs, TCGArg *args);
void tcg_op_remove(TCGContext *s, TCGOp *op);
+TCGOp *tcg_op_insert_before(TCGContext *s, TCGOp *op, TCGOpcode opc, int narg);
+TCGOp *tcg_op_insert_after(TCGContext *s, TCGOp *op, TCGOpcode opc, int narg);
+
void tcg_optimize(TCGContext *s);
/* only used for debugging purposes */
diff --git a/util/log.c b/util/log.c
index 32e4160..7760d2a 100644
--- a/util/log.c
+++ b/util/log.c
@@ -239,8 +239,9 @@ const QEMULogItem qemu_log_items[] = {
{ CPU_LOG_TB_OP, "op",
"show micro ops for each compiled TB" },
{ CPU_LOG_TB_OP_OPT, "op_opt",
- "show micro ops (x86 only: before eflags optimization) and\n"
- "after liveness analysis" },
+ "show micro ops after optimization" },
+ { CPU_LOG_TB_OP_IND, "op_ind",
+ "show micro ops before indirect lowering" },
{ CPU_LOG_INT, "int",
"show interrupts/exceptions in short format" },
{ CPU_LOG_EXEC, "exec",
--
2.5.5
^ permalink raw reply related [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
` (8 preceding siblings ...)
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 9/9] tcg: Lower indirect registers in a separate pass Richard Henderson
@ 2016-06-24 10:31 ` Mark Cave-Ayland
2016-07-19 3:39 ` Richard Henderson
10 siblings, 0 replies; 25+ messages in thread
From: Mark Cave-Ayland @ 2016-06-24 10:31 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: aurelien, atar4qemu
On 24/06/16 04:48, Richard Henderson wrote:
> I was unhappy about the complexity of the second try.
>
> Better to convert to normal temps, allowing in rare
> occasions, spilling the "globals" to the stack in order
> to satisfy register allocation.
>
> I can no longer provoke an allocation failure on i686.
> Hopefully this fixes the OpenBSD case that Mark mentioned
> re the second attempt.
>
>
> r~
>
>
> Richard Henderson (9):
> tcg: Fix name for high-half register
> tcg: Optimize spills of constants
> tcg: Require liveness analysis
> tcg: Compress liveness data to 16 bits
> tcg: Reorg TCGOp chaining
> tcg: Fold life data into TCGOp
> tcg: Compress dead_temps and mem_temps into a single array
> tcg: Include liveness info in the dumps
> tcg: Lower indirect registers in a separate pass
>
> include/exec/gen-icount.h | 2 +-
> include/qemu/log.h | 1 +
> tcg/aarch64/tcg-target.inc.c | 10 +
> tcg/arm/tcg-target.inc.c | 6 +
> tcg/i386/tcg-target.inc.c | 21 +-
> tcg/ia64/tcg-target.inc.c | 10 +
> tcg/mips/tcg-target.inc.c | 10 +
> tcg/optimize.c | 37 +--
> tcg/ppc/tcg-target.inc.c | 6 +
> tcg/s390/tcg-target.inc.c | 6 +
> tcg/sparc/tcg-target.inc.c | 10 +
> tcg/tcg-op.c | 2 +-
> tcg/tcg.c | 690 ++++++++++++++++++++++++++++---------------
> tcg/tcg.h | 50 ++--
> tcg/tci/tcg-target.inc.c | 6 +
> util/log.c | 5 +-
> 16 files changed, 563 insertions(+), 309 deletions(-)
Hi Richard,
I gave this patchset a quick test, and I see no regressions with most of
the images, however while the OpenBSD 5.5 boot gets further I still see
(a different type) of panic:
$ ./qemu-system-sparc64 -cdrom /home/build/install55.iso -boot d -nographic
OpenBIOS for Sparc64
Configuration device id QEMU version 1 machine id 0
kernel cmdline
CPUs: 1 x SUNW,UltraSPARC-IIi
UUID: 00000000-0000-0000-0000-000000000000
Welcome to OpenBIOS v1.1 built on Apr 18 2016 08:20
Type 'help' for detailed information
Trying cdrom:f...
Not a bootable ELF image
Not a bootable a.out image
Loading FCode image...
Loaded 4829 bytes
entry point is 0x4000
OpenBSD IEEE 1275 Bootblock 1.3
..
Jumping to entry point 0000000000100000 for type 0000000000000001...
switching to new context: entry point 0x100000 stack 0x00000000ffe84a09
>> OpenBSD BOOT 1.6
Trying bsd...
open /pci@1fe,0/pci-ata@5/ide1@8200/cdrom@0:f/etc/random.seed: No such
file or directory
Booting /pci@1fe,0/pci-ata@5/ide1@8200/cdrom@0:f/bsd
3901336@0x1000000+6248@0x13b8798+3261984@0x1800000+932320@0x1b1c620
symbols @ 0xffc5a300 119 start=0x1000000
Unexpected client interface exception: -1
console is /pci@1fe,0/ebus@3/su
Copyright (c) 1982, 1986, 1989, 1991, 1993
The Regents of the University of California. All rights reserved.
Copyright (c) 1995-2014 OpenBSD. All rights reserved.
http://www.OpenBSD.org
OpenBSD 5.5 (RAMDISK) #153: Tue Mar 4 15:12:10 MST 2014
deraadt@sparc64.openbsd.org:/usr/src/sys/arch/sparc64/compile/RAMDISK
real mem = 134217728 (128MB)
avail mem = 122011648 (116MB)
mainbus0 at root: OpenBiosTeam,OpenBIOS
cpu0 at mainbus0: SUNW,UltraSPARC-IIi (rev 9.1) @ 100 MHz
cpu0: physical 256K instruction (64 b/l), 16K data (32 b/l), 256K
external (64 b/l)
psycho0 at mainbus0: SUNW,sabre, impl 0, version 0, ign 7c0
psycho0: bus range 0-2, PCI bus 0
psycho0: dvma map c0000000-dfffffff
pci0 at psycho0
ppb0 at pci0 dev 1 function 0 "Sun Simba" rev 0x11
pci1 at ppb0 bus 1
ppb1 at pci0 dev 1 function 1 "Sun Simba" rev 0x11
pci2 at ppb1 bus 2
unknown vendor 0x1234 product 0x1111 (class display subclass VGA, rev
0x02) at pci0 dev 2 function 0 not configured
ebus0 at pci0 dev 3 function 0 "Sun PCIO EBus2" rev 0x01
clock1 at ebus0 addr 2000-3fff: mk48t59
"fdthree" at ebus0 addr 0-ffffffff not configured
com0 at ebus0 addr 3f8-3ff ivec 0x2b: ns16550a, 16 byte fifo
com0: console
"kb_ps2" at ebus0 addr 60-67 not configured
"Realtek 8029" rev 0x00 at pci0 dev 4 function 0 not configured
pciide0 at pci0 dev 5 function 0 "CMD Technology PCI0646" rev 0x07: DMA,
channel 0 configured to native-PCI, channel 1 configured to native-PCI
pciide0: using ivec 0x7d4 for native-PCI interrupt
pciide0: channel 0 disabled (no drives)
atapiscsi0 at pciide0 channel 1 drive 0
scsibus0 at atapiscsi0: 2 targets
cd0 at scsibus0 targ 0 lun 0: <QEMU, QEMU DVD-ROM, 2.5+> ATAPI 5/cdrom
removable
cd0(pciide0:1:0): using PIO mode 4, Ultra-DMA mode 2
softraid0 at root
scsibus1 at softraid0: 256 targets
bootpath: /pci@1fe,0/pci-ata@5,0/ide1@8200,0/cdrom@0,0:f
root on rd0a swap on rd0b dump on rd0b
erase ^?, weraseqemu: fatal: Trap 0x0030 while trap level (5) >= MAXTL
(5), Error state
pc: 0000000001011d9c npc: 0000000001011da0
%g0-3: 0000000000000000 fffffc00d9d49470 0000000000000000 0000000000000000
%g4-7: fffffc00d9d48000 00000400e0017619 00000400e0017619 0000000000000017
%o0-3: 00000400e0016000 0000000000000030 0000000000000000 0000000000000000
%o4-7: 00000400e0016000 000000000000082d 00000400062ce391 0000000001011d9c
%l0-3: 00000400026a9970 00000000008e7000 00000000008e7000 00000000008e7000
%l4-7: 00000400062cfbf0 0000000000000000 000004000269a770 0000000000767b80
%i0-3: 00000400e0016000 0000000000000030 0000000000000000 0000000000000000
%i4-7: 00000400e0016000 000000000000082d 00000400062ce571 0000000001011d9c
%f00: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%f08: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%f16: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%f24: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%f32: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%f40: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%f48: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%f56: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
pstate: 00000015 ccr: 00 (icc: ---- xcc: ----) asi: 82 tl: 5 pil: 0
cansave: 6 canrestore: 0 otherwin: 0 wstate: 23 cleanwin: 6 cwp: 5
fsr: 0000000000000000 y: 00000000000000b3 fprs: 0000000000000004
Aborted
I'm starting to wonder if there is something wrong with my i386 VM I
spun up to test this...
ATB,
Mark.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
` (9 preceding siblings ...)
2016-06-24 10:31 ` [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Mark Cave-Ayland
@ 2016-07-19 3:39 ` Richard Henderson
2016-07-22 13:40 ` Aurelien Jarno
10 siblings, 1 reply; 25+ messages in thread
From: Richard Henderson @ 2016-07-19 3:39 UTC (permalink / raw)
To: qemu-devel; +Cc: aurelien
On 06/24/2016 09:18 AM, Richard Henderson wrote:
> I was unhappy about the complexity of the second try.
>
> Better to convert to normal temps, allowing in rare
> occasions, spilling the "globals" to the stack in order
> to satisfy register allocation.
>
> I can no longer provoke an allocation failure on i686.
> Hopefully this fixes the OpenBSD case that Mark mentioned
> re the second attempt.
Ping for review. It would be nice to have this fixed for 2.7, but this is
complex enough I'd prefer another set of eyes.
r~
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation
2016-07-19 3:39 ` Richard Henderson
@ 2016-07-22 13:40 ` Aurelien Jarno
0 siblings, 0 replies; 25+ messages in thread
From: Aurelien Jarno @ 2016-07-22 13:40 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On 2016-07-19 09:09, Richard Henderson wrote:
> On 06/24/2016 09:18 AM, Richard Henderson wrote:
> > I was unhappy about the complexity of the second try.
> >
> > Better to convert to normal temps, allowing in rare
> > occasions, spilling the "globals" to the stack in order
> > to satisfy register allocation.
> >
> > I can no longer provoke an allocation failure on i686.
> > Hopefully this fixes the OpenBSD case that Mark mentioned
> > re the second attempt.
>
> Ping for review. It would be nice to have this fixed for 2.7, but this is
> complex enough I'd prefer another set of eyes.
I'll try to have a look during the week-end. Sorry about the delay.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 1/9] tcg: Fix name for high-half register
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 1/9] tcg: Fix name for high-half register Richard Henderson
@ 2016-07-25 11:23 ` Aurelien Jarno
0 siblings, 0 replies; 25+ messages in thread
From: Aurelien Jarno @ 2016-07-25 11:23 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 2016-06-23 20:48, Richard Henderson wrote:
> Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/tcg.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 254427b..154ffe8 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -557,7 +557,7 @@ int tcg_global_mem_new_internal(TCGType type, TCGv_ptr base,
> ts2->mem_offset = offset + (1 - bigendian) * 4;
> pstrcpy(buf, sizeof(buf), name);
> pstrcat(buf, sizeof(buf), "_1");
> - ts->name = strdup(buf);
> + ts2->name = strdup(buf);
> } else {
> ts->base_type = type;
> ts->type = type;
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 3/9] tcg: Require liveness analysis
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 3/9] tcg: Require liveness analysis Richard Henderson
@ 2016-07-25 11:23 ` Aurelien Jarno
0 siblings, 0 replies; 25+ messages in thread
From: Aurelien Jarno @ 2016-07-25 11:23 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 2016-06-23 20:48, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/tcg.c | 20 --------------------
> 1 file changed, 20 deletions(-)
>
Thanks for doing that. The code without liveness analysis is not really
tested anymore and given the kind of optimisations we are doing there
it became quite difficult to compare the generated code with and without
liveness.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 4/9] tcg: Compress liveness data to 16 bits
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 4/9] tcg: Compress liveness data to 16 bits Richard Henderson
@ 2016-07-25 11:23 ` Aurelien Jarno
0 siblings, 0 replies; 25+ messages in thread
From: Aurelien Jarno @ 2016-07-25 11:23 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 2016-06-23 20:48, Richard Henderson wrote:
> This reduces both memory usage and per-insn cacheline usage
> during code generation.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/tcg.c | 58 ++++++++++++++++++++++------------------------------------
> tcg/tcg.h | 16 ++++++++++------
> 2 files changed, 32 insertions(+), 42 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 64060c6..400e69c 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1329,7 +1329,7 @@ static inline void tcg_la_bb_end(TCGContext *s, uint8_t *dead_temps,
> }
> }
>
> -/* Liveness analysis : update the opc_dead_args array to tell if a
> +/* Liveness analysis : update the opc_arg_life array to tell if a
> given input arguments is dead. Instructions updating dead
> temporaries are removed. */
> static void tcg_liveness_analysis(TCGContext *s)
> @@ -1338,9 +1338,8 @@ static void tcg_liveness_analysis(TCGContext *s)
> int oi, oi_prev, nb_ops;
>
> nb_ops = s->gen_next_op_idx;
> - s->op_dead_args = tcg_malloc(nb_ops * sizeof(uint16_t));
> - s->op_sync_args = tcg_malloc(nb_ops * sizeof(uint8_t));
> -
> + s->op_arg_life = tcg_malloc(nb_ops * sizeof(TCGLifeData));
> +
> dead_temps = tcg_malloc(s->nb_temps);
> mem_temps = tcg_malloc(s->nb_temps);
> tcg_la_func_end(s, dead_temps, mem_temps);
> @@ -1349,8 +1348,7 @@ static void tcg_liveness_analysis(TCGContext *s)
> int i, nb_iargs, nb_oargs;
> TCGOpcode opc_new, opc_new2;
> bool have_opc_new2;
> - uint16_t dead_args;
> - uint8_t sync_args;
> + TCGLifeData arg_life = 0;
A small improvement, probably for later: we can zero the s->op_arg_life
structure, and then access it directly instead of using the arg_life
temporary variable.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 5/9] tcg: Reorg TCGOp chaining
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 5/9] tcg: Reorg TCGOp chaining Richard Henderson
@ 2016-07-25 11:23 ` Aurelien Jarno
2016-08-03 18:08 ` Richard Henderson
0 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2016-07-25 11:23 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 2016-06-23 20:48, Richard Henderson wrote:
> Instead of using -1 as end of chain, use 0, and link through the 0
> entry as a fully circular double-linked list.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> include/exec/gen-icount.h | 2 +-
> tcg/optimize.c | 8 ++------
> tcg/tcg-op.c | 2 +-
> tcg/tcg.c | 32 ++++++++++++--------------------
> tcg/tcg.h | 20 ++++++++++++--------
> 5 files changed, 28 insertions(+), 36 deletions(-)
>
> diff --git a/include/exec/gen-icount.h b/include/exec/gen-icount.h
> index a011324..5f16077 100644
> --- a/include/exec/gen-icount.h
> +++ b/include/exec/gen-icount.h
> @@ -59,7 +59,7 @@ static void gen_tb_end(TranslationBlock *tb, int num_insns)
> }
>
> /* Terminate the linked list. */
> - tcg_ctx.gen_op_buf[tcg_ctx.gen_last_op_idx].next = -1;
> + tcg_ctx.gen_op_buf[tcg_ctx.gen_op_buf[0].prev].next = 0;
> }
>
> static inline void gen_io_start(void)
> diff --git a/tcg/optimize.c b/tcg/optimize.c
> index c0d975b..8df7fc7 100644
> --- a/tcg/optimize.c
> +++ b/tcg/optimize.c
> @@ -103,11 +103,7 @@ static TCGOp *insert_op_before(TCGContext *s, TCGOp *old_op,
> .prev = prev,
> .next = next
> };
> - if (prev >= 0) {
> - s->gen_op_buf[prev].next = oi;
> - } else {
> - s->gen_first_op_idx = oi;
> - }
> + s->gen_op_buf[prev].next = oi;
> old_op->prev = oi;
>
> return new_op;
> @@ -583,7 +579,7 @@ void tcg_optimize(TCGContext *s)
> nb_globals = s->nb_globals;
> reset_all_temps(nb_temps);
>
> - for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
> + for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
> tcg_target_ulong mask, partmask, affected;
> int nb_oargs, nb_iargs, i;
> TCGArg tmp;
> diff --git a/tcg/tcg-op.c b/tcg/tcg-op.c
> index 569cdc6..62d91b4 100644
> --- a/tcg/tcg-op.c
> +++ b/tcg/tcg-op.c
> @@ -52,7 +52,7 @@ static void tcg_emit_op(TCGContext *ctx, TCGOpcode opc, int args)
> int pi = oi - 1;
>
> tcg_debug_assert(oi < OPC_BUF_SIZE);
> - ctx->gen_last_op_idx = oi;
> + ctx->gen_op_buf[0].prev = oi;
> ctx->gen_next_op_idx = ni;
>
> ctx->gen_op_buf[oi] = (TCGOp){
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 400e69c..bb1efe2 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -437,9 +437,9 @@ void tcg_func_start(TCGContext *s)
> s->goto_tb_issue_mask = 0;
> #endif
>
> - s->gen_first_op_idx = 0;
> - s->gen_last_op_idx = -1;
> - s->gen_next_op_idx = 0;
> + s->gen_op_buf[0].next = 1;
> + s->gen_op_buf[0].prev = 0;
> + s->gen_next_op_idx = 1;
> s->gen_next_parm_idx = 0;
>
> s->be = tcg_malloc(sizeof(TCGBackendData));
> @@ -868,7 +868,7 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
> /* Make sure the calli field didn't overflow. */
> tcg_debug_assert(s->gen_op_buf[i].calli == real_args);
>
> - s->gen_last_op_idx = i;
> + s->gen_op_buf[0].prev = i;
> s->gen_next_op_idx = i + 1;
> s->gen_next_parm_idx = pi;
>
> @@ -1004,7 +1004,7 @@ void tcg_dump_ops(TCGContext *s)
> TCGOp *op;
> int oi;
>
> - for (oi = s->gen_first_op_idx; oi >= 0; oi = op->next) {
> + for (oi = s->gen_op_buf[0].next; oi != 0; oi = op->next) {
> int i, k, nb_oargs, nb_iargs, nb_cargs;
> const TCGOpDef *def;
> const TCGArg *args;
> @@ -1016,7 +1016,7 @@ void tcg_dump_ops(TCGContext *s)
> args = &s->gen_opparam_buf[op->args];
>
> if (c == INDEX_op_insn_start) {
> - qemu_log("%s ----", oi != s->gen_first_op_idx ? "\n" : "");
> + qemu_log("%s ----", oi != s->gen_op_buf[0].next ? "\n" : "");
>
> for (i = 0; i < TARGET_INSN_START_WORDS; ++i) {
> target_ulong a;
> @@ -1287,18 +1287,10 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
> int next = op->next;
> int prev = op->prev;
>
> - if (next >= 0) {
> - s->gen_op_buf[next].prev = prev;
> - } else {
> - s->gen_last_op_idx = prev;
> - }
> - if (prev >= 0) {
> - s->gen_op_buf[prev].next = next;
> - } else {
> - s->gen_first_op_idx = next;
> - }
> + s->gen_op_buf[next].prev = prev;
> + s->gen_op_buf[prev].next = next;
>
> - memset(op, -1, sizeof(*op));
> + memset(op, 0, sizeof(*op));
Doing so means you can remove the op at gen_op_buf[0]. The doubled
linked list is still valid, but then I guess you can't assume anymore
that the first op is gen_op_buf[0], as it is done elsewhere your patch.
I guess it's unlikely to happen that we have to remove the first op.
Maybe adding an assert there is enough?
> #ifdef CONFIG_PROFILER
> s->del_op_count++;
> @@ -1344,7 +1336,7 @@ static void tcg_liveness_analysis(TCGContext *s)
> mem_temps = tcg_malloc(s->nb_temps);
> tcg_la_func_end(s, dead_temps, mem_temps);
>
> - for (oi = s->gen_last_op_idx; oi >= 0; oi = oi_prev) {
> + for (oi = s->gen_op_buf[0].prev; oi != 0; oi = oi_prev) {
> int i, nb_iargs, nb_oargs;
> TCGOpcode opc_new, opc_new2;
> bool have_opc_new2;
> @@ -2321,7 +2313,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
> {
> int n;
>
> - n = s->gen_last_op_idx + 1;
> + n = s->gen_op_buf[0].prev + 1;
> s->op_count += n;
> if (n > s->op_count_max) {
> s->op_count_max = n;
> @@ -2380,7 +2372,7 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb)
> tcg_out_tb_init(s);
>
> num_insns = -1;
> - for (oi = s->gen_first_op_idx; oi >= 0; oi = oi_next) {
> + for (oi = s->gen_op_buf[0].next; oi != 0; oi = oi_next) {
> TCGOp * const op = &s->gen_op_buf[oi];
> TCGArg * const args = &s->gen_opparam_buf[op->args];
> TCGOpcode opc = op->opc;
> diff --git a/tcg/tcg.h b/tcg/tcg.h
> index cc14560..49b396d 100644
> --- a/tcg/tcg.h
> +++ b/tcg/tcg.h
> @@ -520,17 +520,21 @@ typedef struct TCGOp {
> unsigned callo : 2;
> unsigned calli : 6;
>
> - /* Index of the arguments for this op, or -1 for zero-operand ops. */
> - signed args : 16;
> + /* Index of the arguments for this op, or 0 for zero-operand ops. */
> + unsigned args : 16;
>
> - /* Index of the prex/next op, or -1 for the end of the list. */
> - signed prev : 16;
> - signed next : 16;
> + /* Index of the prex/next op, or 0 for the end of the list. */
It's not introduced by your patch, but you might want to fix the typo
prex -> prev.
> + unsigned prev : 16;
> + unsigned next : 16;
> } TCGOp;
>
> -QEMU_BUILD_BUG_ON(NB_OPS > 0xff);
> -QEMU_BUILD_BUG_ON(OPC_BUF_SIZE >= 0x7fff);
> -QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE >= 0x7fff);
> +/* Make sure operands fit in the bitfields above. */
> +QEMU_BUILD_BUG_ON(NB_OPS > (1 << 8));
> +QEMU_BUILD_BUG_ON(OPC_BUF_SIZE > (1 << 16));
> +QEMU_BUILD_BUG_ON(OPPARAM_BUF_SIZE > (1 << 16));
> +
> +/* Make sure that we don't overflow 64 bits without noticing. */
> +QEMU_BUILD_BUG_ON(sizeof(TCGOp) > 8);
>
> struct TCGContext {
> uint8_t *pool_cur, *pool_end;
It seems that gen_first_op_idx and gen_last_op_idx are now unused.
Shouldn't they be removed?
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 6/9] tcg: Fold life data into TCGOp
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 6/9] tcg: Fold life data into TCGOp Richard Henderson
@ 2016-07-25 14:07 ` Aurelien Jarno
0 siblings, 0 replies; 25+ messages in thread
From: Aurelien Jarno @ 2016-07-25 14:07 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 2016-06-23 20:48, Richard Henderson wrote:
> Reduce the size of other bitfields to make room.
> This reduces the cache footprint of compilation.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/tcg.c | 9 +++------
> tcg/tcg.h | 26 ++++++++++++++------------
> 2 files changed, 17 insertions(+), 18 deletions(-)
This looks fine and goes in the right direction of having all the data
at the same location. In the long term it might be useful to have the
TCG stream and the associated data together to implement more agressive
optimisations.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 7/9] tcg: Compress dead_temps and mem_temps into a single array
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 7/9] tcg: Compress dead_temps and mem_temps into a single array Richard Henderson
@ 2016-07-25 15:15 ` Aurelien Jarno
2016-08-03 18:22 ` Richard Henderson
0 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2016-07-25 15:15 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 2016-06-23 20:48, Richard Henderson wrote:
> We only need two bits per temporary. Fold the two bytes into one,
> and reduce the memory and cachelines required during compilation.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/tcg.c | 118 +++++++++++++++++++++++++++++++-------------------------------
> 1 file changed, 59 insertions(+), 59 deletions(-)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index c41640f..fd92b06 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -332,7 +332,7 @@ void tcg_context_init(TCGContext *s)
>
> memset(s, 0, sizeof(*s));
> s->nb_globals = 0;
> -
> +
> /* Count total number of arguments and allocate the corresponding
> space */
> total_args = 0;
> @@ -824,16 +824,16 @@ void tcg_gen_callN(TCGContext *s, void *func, TCGArg ret,
> real_args++;
> }
> #endif
> - /* If stack grows up, then we will be placing successive
> - arguments at lower addresses, which means we need to
> - reverse the order compared to how we would normally
> - treat either big or little-endian. For those arguments
> - that will wind up in registers, this still works for
> - HPPA (the only current STACK_GROWSUP target) since the
> - argument registers are *also* allocated in decreasing
> - order. If another such target is added, this logic may
> - have to get more complicated to differentiate between
> - stack arguments and register arguments. */
> + /* If stack grows up, then we will be placing successive
> + arguments at lower addresses, which means we need to
> + reverse the order compared to how we would normally
> + treat either big or little-endian. For those arguments
> + that will wind up in registers, this still works for
> + HPPA (the only current STACK_GROWSUP target) since the
> + argument registers are *also* allocated in decreasing
> + order. If another such target is added, this logic may
> + have to get more complicated to differentiate between
> + stack arguments and register arguments. */
> #if defined(HOST_WORDS_BIGENDIAN) != defined(TCG_TARGET_STACK_GROWSUP)
> s->gen_opparam_buf[pi++] = args[i] + 1;
> s->gen_opparam_buf[pi++] = args[i];
> @@ -1297,27 +1297,28 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
> #endif
> }
>
> +#define TS_DEAD 1
> +#define TS_SYNC 2
I am not sure that TS_SYNC is the best name for that. There we really
want to tell that at this moment in the TCG instruction stream the
operand is in memory. It doesn't implied it is synced. Maybe TS_MEM?
The rest looks fine to me. The other alternative would have been to use
the TCGTempSet with the bitmap functions like in optimize.c, and use
only 2 bits per temp. That something that can be done later though.
All that said and provided you change the name:
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 8/9] tcg: Include liveness info in the dumps
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 8/9] tcg: Include liveness info in the dumps Richard Henderson
@ 2016-07-25 15:15 ` Aurelien Jarno
2016-07-25 16:16 ` Aurelien Jarno
1 sibling, 0 replies; 25+ messages in thread
From: Aurelien Jarno @ 2016-07-25 15:15 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 2016-06-23 20:48, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/tcg.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 8/9] tcg: Include liveness info in the dumps
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 8/9] tcg: Include liveness info in the dumps Richard Henderson
2016-07-25 15:15 ` Aurelien Jarno
@ 2016-07-25 16:16 ` Aurelien Jarno
1 sibling, 0 replies; 25+ messages in thread
From: Aurelien Jarno @ 2016-07-25 16:16 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 2016-06-23 20:48, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> tcg/tcg.c | 26 ++++++++++++++++++++++++++
> 1 file changed, 26 insertions(+)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index fd92b06..3e4bc99 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -1009,6 +1009,7 @@ void tcg_dump_ops(TCGContext *s)
> const TCGOpDef *def;
> const TCGArg *args;
> TCGOpcode c;
> + long pos = ftell(qemu_logfile);
>
Small additional note: ftell() doesn't work well when the logfile is the
terminal. As such when not dumping to the file, the alignment is wrong.
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 9/9] tcg: Lower indirect registers in a separate pass
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 9/9] tcg: Lower indirect registers in a separate pass Richard Henderson
@ 2016-07-25 19:23 ` Aurelien Jarno
2016-08-03 19:27 ` Richard Henderson
0 siblings, 1 reply; 25+ messages in thread
From: Aurelien Jarno @ 2016-07-25 19:23 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 2016-06-23 20:48, Richard Henderson wrote:
> Rather than rely on recursion during the middle of register allocation,
> lower indirect registers to loads and stores off the indirect base into
> plain temps.
>
> For an x86_64 host, with sufficient registers, this results in identical
> code, modulo the actual register assignments.
>
> For an i686 host, with insufficient registers, this means that temps can
> be (temporarily) spilled to the stack in order to satisfy an allocation.
> This as opposed to the possibility of not being able to spill, to allocate
> a register for the indirect base, in order to perform a spill.
>
> Signed-off-by: Richard Henderson <rth@twiddle.net>
> ---
> include/qemu/log.h | 1 +
> tcg/optimize.c | 31 +-----
> tcg/tcg.c | 306 +++++++++++++++++++++++++++++++++++++++++++----------
> tcg/tcg.h | 4 +
> util/log.c | 5 +-
> 5 files changed, 263 insertions(+), 84 deletions(-)
This patch is a difficult one to review... On the purely technical side
it does what it is supposed to do and I haven't found any issue, though
it's probably very easy to miss one in this kind of code. I have done
tests with various sparc images and I haven't found any obvious
regression on an x86_64 host.
Now on the less technical side, I really like the idea of being able to
transform more or less in place the TCG instruction stream. Your more or
less recent patches towards that direction are great. That said I am a
bit worried that we loop many times on the various ops. We used to have
one forward pass (optimizer) and one backward pass (liveness analysis).
Your patch adds up to two additional passes (one forward and one
backward), this clearly has a cost. Given that indirect registers bring
a lot of performance I think it is worth it. Now I wonder if there is
any way to do the lowering of registers earlier, I mean before the
liveness analysis. This would probably generate plenty of useless ops,
but that are later removed by the liveness analysis. Maybe you have
already try that?
I think it also depends on which direction we want to go with TCG,
either plenty of small independent optimization passes, or keep the
number of passes limited which means more complex code. Contrary to
a compiler we have to do a much more difficult trade-off between the
optimization time and the level of optimization.
Nevertheless I think it's the correct way to go forward for now and
this patch fixes real issues on hosts with limited registers. Maybe just
add a note saying there *might* be better ways to do that.
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
--
Aurelien Jarno GPG: 4096R/1DDD8C9B
aurelien@aurel32.net http://www.aurel32.net
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 5/9] tcg: Reorg TCGOp chaining
2016-07-25 11:23 ` Aurelien Jarno
@ 2016-08-03 18:08 ` Richard Henderson
0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-08-03 18:08 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 07/25/2016 04:53 PM, Aurelien Jarno wrote:
> On 2016-06-23 20:48, Richard Henderson wrote:
>> @@ -1287,18 +1287,10 @@ void tcg_op_remove(TCGContext *s, TCGOp *op)
>> int next = op->next;
>> int prev = op->prev;
>>
>> - if (next >= 0) {
>> - s->gen_op_buf[next].prev = prev;
>> - } else {
>> - s->gen_last_op_idx = prev;
>> - }
>> - if (prev >= 0) {
>> - s->gen_op_buf[prev].next = next;
>> - } else {
>> - s->gen_first_op_idx = next;
>> - }
>> + s->gen_op_buf[next].prev = prev;
>> + s->gen_op_buf[prev].next = next;
>>
>> - memset(op, -1, sizeof(*op));
>> + memset(op, 0, sizeof(*op));
>
> Doing so means you can remove the op at gen_op_buf[0]. The doubled
> linked list is still valid, but then I guess you can't assume anymore
> that the first op is gen_op_buf[0], as it is done elsewhere your patch.
>
> I guess it's unlikely to happen that we have to remove the first op.
> Maybe adding an assert there is enough?
Assert added.
>> - /* Index of the prex/next op, or -1 for the end of the list. */
>> - signed prev : 16;
>> - signed next : 16;
>> + /* Index of the prex/next op, or 0 for the end of the list. */
>
> It's not introduced by your patch, but you might want to fix the typo
> prex -> prev.
Fixed.
> It seems that gen_first_op_idx and gen_last_op_idx are now unused.
> Shouldn't they be removed?
Yep, fixed.
r~
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 7/9] tcg: Compress dead_temps and mem_temps into a single array
2016-07-25 15:15 ` Aurelien Jarno
@ 2016-08-03 18:22 ` Richard Henderson
0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-08-03 18:22 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 07/25/2016 08:45 PM, Aurelien Jarno wrote:
>> > +#define TS_DEAD 1
>> > +#define TS_SYNC 2
> I am not sure that TS_SYNC is the best name for that. There we really
> want to tell that at this moment in the TCG instruction stream the
> operand is in memory. It doesn't implied it is synced. Maybe TS_MEM?
>
> The rest looks fine to me. The other alternative would have been to use
> the TCGTempSet with the bitmap functions like in optimize.c, and use
> only 2 bits per temp. That something that can be done later though.
I changed the name to TS_MEM as suggested.
r~
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: [Qemu-devel] [PATCH v3 9/9] tcg: Lower indirect registers in a separate pass
2016-07-25 19:23 ` Aurelien Jarno
@ 2016-08-03 19:27 ` Richard Henderson
0 siblings, 0 replies; 25+ messages in thread
From: Richard Henderson @ 2016-08-03 19:27 UTC (permalink / raw)
To: Aurelien Jarno; +Cc: qemu-devel, mark.cave-ayland, atar4qemu
On 07/26/2016 12:53 AM, Aurelien Jarno wrote:
> Now on the less technical side, I really like the idea of being able to
> transform more or less in place the TCG instruction stream. Your more or
> less recent patches towards that direction are great. That said I am a
> bit worried that we loop many times on the various ops. We used to have
> one forward pass (optimizer) and one backward pass (liveness analysis).
> Your patch adds up to two additional passes (one forward and one
> backward), this clearly has a cost. Given that indirect registers bring
> a lot of performance I think it is worth it. Now I wonder if there is
> any way to do the lowering of registers earlier, I mean before the
> liveness analysis. This would probably generate plenty of useless ops,
> but that are later removed by the liveness analysis. Maybe you have
> already try that?
No, I did not try that, simply because we don't do liveness analysis of memory.
And that's what we have with lowering indirect registers earlier. Indeed, it
means we're right back where we were before introducing them.
We need liveness analysis on the tcg globals in order to know where to add the
reads and writes. I see no way around that.
The one place where the code could be improved to remove a pass is to have the
indirect lowering pass update liveness at the same time. We need accurate
liveness in order to satisfy the asserts in the final code generation pass, so
we have to do something. I simply thought it was easier to re-run the original
liveness pass rather than complicating the indirect lowering pass.
> I think it also depends on which direction we want to go with TCG,
> either plenty of small independent optimization passes, or keep the
> number of passes limited which means more complex code. Contrary to
> a compiler we have to do a much more difficult trade-off between the
> optimization time and the level of optimization.
Indeed. Fewer passes over large amounts of data is better, but I'm not sure we
have "large" amounts of data for the average TB. On the other hand, smaller
passes can reduce the code size of any one loop so that each fits in icache
when one unified pass might not.
r~
^ permalink raw reply [flat|nested] 25+ messages in thread
end of thread, other threads:[~2016-08-03 19:27 UTC | newest]
Thread overview: 25+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-06-24 3:48 [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Richard Henderson
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 1/9] tcg: Fix name for high-half register Richard Henderson
2016-07-25 11:23 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 2/9] tcg: Optimize spills of constants Richard Henderson
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 3/9] tcg: Require liveness analysis Richard Henderson
2016-07-25 11:23 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 4/9] tcg: Compress liveness data to 16 bits Richard Henderson
2016-07-25 11:23 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 5/9] tcg: Reorg TCGOp chaining Richard Henderson
2016-07-25 11:23 ` Aurelien Jarno
2016-08-03 18:08 ` Richard Henderson
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 6/9] tcg: Fold life data into TCGOp Richard Henderson
2016-07-25 14:07 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 7/9] tcg: Compress dead_temps and mem_temps into a single array Richard Henderson
2016-07-25 15:15 ` Aurelien Jarno
2016-08-03 18:22 ` Richard Henderson
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 8/9] tcg: Include liveness info in the dumps Richard Henderson
2016-07-25 15:15 ` Aurelien Jarno
2016-07-25 16:16 ` Aurelien Jarno
2016-06-24 3:48 ` [Qemu-devel] [PATCH v3 9/9] tcg: Lower indirect registers in a separate pass Richard Henderson
2016-07-25 19:23 ` Aurelien Jarno
2016-08-03 19:27 ` Richard Henderson
2016-06-24 10:31 ` [Qemu-devel] [PATCH v3 0/9] Third try at fixing sparc register allocation Mark Cave-Ayland
2016-07-19 3:39 ` Richard Henderson
2016-07-22 13:40 ` Aurelien Jarno
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).