* [PULL v2 00/47] tcg misc queue
@ 2023-01-06 3:17 Richard Henderson
2023-01-06 3:17 ` [PULL v2 19/47] tcg: Introduce paired register allocation Richard Henderson
` (2 more replies)
0 siblings, 3 replies; 4+ messages in thread
From: Richard Henderson @ 2023-01-06 3:17 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell
Changes in patch 47, to reduce execution time with --enable-debug.
Changes in patch 19, to fix an i386 specific register allocation failure.
r~
The following changes since commit cb9c6a8e5ad6a1f0ce164d352e3102df46986e22:
.gitlab-ci.d/windows: Work-around timeout and OpenGL problems of the MSYS2 jobs (2023-01-04 18:58:33 +0000)
are available in the Git repository at:
https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230105
for you to fetch changes up to d4846c33ebe04d2141dcc613b5558d2f1d8077af:
tests/tcg/multiarch: add vma-pthread.c (2023-01-05 11:41:29 -0800)
----------------------------------------------------------------
Fix race conditions in new user-only vma tracking.
Add tcg backend paired register allocation.
Cleanup tcg backend function call abi.
----------------------------------------------------------------
Ilya Leoshkevich (1):
tests/tcg/multiarch: add vma-pthread.c
Mark Cave-Ayland (1):
tcg: convert tcg/README to rst
Philippe Mathieu-Daudé (5):
tcg/s390x: Fix coding style
tcg: Massage process_op_defs()
tcg: Pass number of arguments to tcg_emit_op() / tcg_op_insert_*()
tcg: Convert typecode_to_ffi from array to function
tcg: Factor init_ffi_layouts() out of tcg_context_init()
Richard Henderson (40):
meson: Move CONFIG_TCG_INTERPRETER to config_host
tcg: Cleanup trailing whitespace
qemu/main-loop: Introduce QEMU_IOTHREAD_LOCK_GUARD
hw/mips: Use QEMU_IOTHREAD_LOCK_GUARD in cpu_mips_irq_request
target/ppc: Use QEMU_IOTHREAD_LOCK_GUARD in ppc_maybe_interrupt
target/ppc: Use QEMU_IOTHREAD_LOCK_GUARD in cpu_interrupt_exittb
target/riscv: Use QEMU_IOTHREAD_LOCK_GUARD in riscv_cpu_update_mip
hw/ppc: Use QEMU_IOTHREAD_LOCK_GUARD in ppc_set_irq
accel/tcg: Use QEMU_IOTHREAD_LOCK_GUARD in io_readx/io_writex
tcg: Tidy tcg_reg_alloc_op
tcg: Remove TCG_TARGET_STACK_GROWSUP
tci: MAX_OPC_PARAM_IARGS is no longer used
tcg: Fix tcg_reg_alloc_dup*
tcg: Centralize updates to reg_to_temp
tcg: Remove check_regs
tcg: Introduce paired register allocation
accel/tcg: Set cflags_next_tb in cpu_common_initfn
target/sparc: Avoid TCGV_{LOW,HIGH}
tcg: Move TCG_{LOW,HIGH} to tcg-internal.h
tcg: Add temp_subindex to TCGTemp
tcg: Simplify calls to temp_sync vs mem_coherent
tcg: Allocate TCGTemp pairs in host memory order
tcg: Move TCG_TYPE_COUNT outside enum
tcg: Introduce tcg_type_size
tcg: Introduce TCGCallReturnKind and TCGCallArgumentKind
tcg: Replace TCG_TARGET_CALL_ALIGN_ARGS with TCG_TARGET_CALL_ARG_I64
tcg: Replace TCG_TARGET_EXTEND_ARGS with TCG_TARGET_CALL_ARG_I32
tcg: Use TCG_CALL_ARG_EVEN for TCI special case
accel/tcg/plugin: Don't search for the function pointer index
accel/tcg/plugin: Avoid duplicate copy in copy_call
accel/tcg/plugin: Use copy_op in append_{udata,mem}_cb
tcg: Vary the allocation size for TCGOp
tcg: Use output_pref wrapper function
tcg: Reorg function calls
tcg: Move ffi_cif pointer into TCGHelperInfo
tcg/aarch64: Merge tcg_out_callr into tcg_out_call
tcg: Add TCGHelperInfo argument to tcg_out_call
accel/tcg: Fix tb_invalidate_phys_page_unwind
accel/tcg: Use g_free_rcu for user-exec interval trees
accel/tcg: Handle false negative lookup in page_check_range
include/exec/helper-head.h | 2 +-
include/qemu/main-loop.h | 29 +
include/tcg/tcg-op.h | 35 +-
include/tcg/tcg.h | 96 +-
tcg/aarch64/tcg-target.h | 4 +-
tcg/arm/tcg-target.h | 4 +-
tcg/i386/tcg-target.h | 2 +
tcg/loongarch64/tcg-target.h | 3 +-
tcg/mips/tcg-target.h | 4 +-
tcg/riscv/tcg-target.h | 7 +-
tcg/s390x/tcg-target.h | 3 +-
tcg/sparc64/tcg-target.h | 3 +-
tcg/tcg-internal.h | 58 +-
tcg/tci/tcg-target.h | 7 +
tests/tcg/multiarch/nop_func.h | 25 +
accel/tcg/cputlb.c | 25 +-
accel/tcg/plugin-gen.c | 54 +-
accel/tcg/tb-maint.c | 78 +-
accel/tcg/user-exec.c | 59 +-
hw/core/cpu-common.c | 1 +
hw/mips/mips_int.c | 11 +-
hw/ppc/ppc.c | 10 +-
target/ppc/excp_helper.c | 11 +-
target/ppc/helper_regs.c | 14 +-
target/riscv/cpu_helper.c | 10 +-
target/sparc/translate.c | 21 +-
tcg/optimize.c | 10 +-
tcg/tcg-op-vec.c | 10 +-
tcg/tcg-op.c | 49 +-
tcg/tcg.c | 1663 +++++++++++++++++++++-------------
tcg/tci.c | 1 -
tests/tcg/multiarch/munmap-pthread.c | 16 +-
tests/tcg/multiarch/vma-pthread.c | 207 +++++
docs/devel/atomics.rst | 2 +
docs/devel/index-tcg.rst | 1 +
docs/devel/tcg-ops.rst | 941 +++++++++++++++++++
docs/devel/tcg.rst | 2 +-
meson.build | 4 +-
tcg/README | 784 ----------------
tcg/aarch64/tcg-target.c.inc | 19 +-
tcg/arm/tcg-target.c.inc | 10 +-
tcg/i386/tcg-target.c.inc | 5 +-
tcg/loongarch64/tcg-target.c.inc | 7 +-
tcg/mips/tcg-target.c.inc | 3 +-
tcg/ppc/tcg-target.c.inc | 36 +-
tcg/riscv/tcg-target.c.inc | 7 +-
tcg/s390x/tcg-target.c.inc | 32 +-
tcg/sparc64/tcg-target.c.inc | 3 +-
tcg/tci/tcg-target.c.inc | 7 +-
tests/tcg/multiarch/Makefile.target | 3 +
50 files changed, 2635 insertions(+), 1763 deletions(-)
create mode 100644 tests/tcg/multiarch/nop_func.h
create mode 100644 tests/tcg/multiarch/vma-pthread.c
create mode 100644 docs/devel/tcg-ops.rst
delete mode 100644 tcg/README
^ permalink raw reply [flat|nested] 4+ messages in thread
* [PULL v2 19/47] tcg: Introduce paired register allocation
2023-01-06 3:17 [PULL v2 00/47] tcg misc queue Richard Henderson
@ 2023-01-06 3:17 ` Richard Henderson
2023-01-06 3:17 ` [PULL v2 47/47] tests/tcg/multiarch: add vma-pthread.c Richard Henderson
2023-01-06 22:15 ` [PULL v2 00/47] tcg misc queue Peter Maydell
2 siblings, 0 replies; 4+ messages in thread
From: Richard Henderson @ 2023-01-06 3:17 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell
There are several instances where we need to be able to
allocate a pair of registers to related inputs/outputs.
Add 'p' and 'm' register constraints for this, in order to
be able to allocate the even/odd register first or second.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
include/tcg/tcg.h | 2 +
tcg/tcg.c | 424 ++++++++++++++++++++++++++++++++++++++++------
2 files changed, 378 insertions(+), 48 deletions(-)
diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index d84bae6e3f..5c2254ce9f 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -951,6 +951,8 @@ typedef struct TCGArgConstraint {
unsigned ct : 16;
unsigned alias_index : 4;
unsigned sort_index : 4;
+ unsigned pair_index : 4;
+ unsigned pair : 2; /* 0: none, 1: first, 2: second, 3: second alias */
bool oalias : 1;
bool ialias : 1;
bool newreg : 1;
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 92141bd79a..2cf24b4453 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1969,15 +1969,32 @@ static void tcg_dump_ops(TCGContext *s, FILE *f, bool have_prefs)
static int get_constraint_priority(const TCGOpDef *def, int k)
{
const TCGArgConstraint *arg_ct = &def->args_ct[k];
- int n;
+ int n = ctpop64(arg_ct->regs);
- if (arg_ct->oalias) {
- /* an alias is equivalent to a single register */
- n = 1;
- } else {
- n = ctpop64(arg_ct->regs);
+ /*
+ * Sort constraints of a single register first, which includes output
+ * aliases (which must exactly match the input already allocated).
+ */
+ if (n == 1 || arg_ct->oalias) {
+ return INT_MAX;
}
- return TCG_TARGET_NB_REGS - n + 1;
+
+ /*
+ * Sort register pairs next, first then second immediately after.
+ * Arbitrarily sort multiple pairs by the index of the first reg;
+ * there shouldn't be many pairs.
+ */
+ switch (arg_ct->pair) {
+ case 1:
+ case 3:
+ return (k + 1) * 2;
+ case 2:
+ return (arg_ct->pair_index + 1) * 2 - 1;
+ }
+
+ /* Finally, sort by decreasing register count. */
+ assert(n > 1);
+ return -n;
}
/* sort from highest priority to lowest */
@@ -2012,7 +2029,8 @@ static void process_op_defs(TCGContext *s)
for (op = 0; op < NB_OPS; op++) {
TCGOpDef *def = &tcg_op_defs[op];
const TCGTargetOpDef *tdefs;
- int i, o, nb_args;
+ bool saw_alias_pair = false;
+ int i, o, i2, o2, nb_args;
if (def->flags & TCG_OPF_NOT_PRESENT) {
continue;
@@ -2053,6 +2071,9 @@ static void process_op_defs(TCGContext *s)
/* The input sets ialias. */
def->args_ct[i].ialias = 1;
def->args_ct[i].alias_index = o;
+ if (def->args_ct[i].pair) {
+ saw_alias_pair = true;
+ }
tcg_debug_assert(ct_str[1] == '\0');
continue;
@@ -2061,6 +2082,38 @@ static void process_op_defs(TCGContext *s)
def->args_ct[i].newreg = true;
ct_str++;
break;
+
+ case 'p': /* plus */
+ /* Allocate to the register after the previous. */
+ tcg_debug_assert(i > (input_p ? def->nb_oargs : 0));
+ o = i - 1;
+ tcg_debug_assert(!def->args_ct[o].pair);
+ tcg_debug_assert(!def->args_ct[o].ct);
+ def->args_ct[i] = (TCGArgConstraint){
+ .pair = 2,
+ .pair_index = o,
+ .regs = def->args_ct[o].regs << 1,
+ };
+ def->args_ct[o].pair = 1;
+ def->args_ct[o].pair_index = i;
+ tcg_debug_assert(ct_str[1] == '\0');
+ continue;
+
+ case 'm': /* minus */
+ /* Allocate to the register before the previous. */
+ tcg_debug_assert(i > (input_p ? def->nb_oargs : 0));
+ o = i - 1;
+ tcg_debug_assert(!def->args_ct[o].pair);
+ tcg_debug_assert(!def->args_ct[o].ct);
+ def->args_ct[i] = (TCGArgConstraint){
+ .pair = 1,
+ .pair_index = o,
+ .regs = def->args_ct[o].regs >> 1,
+ };
+ def->args_ct[o].pair = 2;
+ def->args_ct[o].pair_index = i;
+ tcg_debug_assert(ct_str[1] == '\0');
+ continue;
}
do {
@@ -2084,6 +2137,8 @@ static void process_op_defs(TCGContext *s)
default:
case '0' ... '9':
case '&':
+ case 'p':
+ case 'm':
/* Typo in TCGTargetOpDef constraint. */
g_assert_not_reached();
}
@@ -2093,6 +2148,79 @@ static void process_op_defs(TCGContext *s)
/* TCGTargetOpDef entry with too much information? */
tcg_debug_assert(i == TCG_MAX_OP_ARGS || tdefs->args_ct_str[i] == NULL);
+ /*
+ * Fix up output pairs that are aliased with inputs.
+ * When we created the alias, we copied pair from the output.
+ * There are three cases:
+ * (1a) Pairs of inputs alias pairs of outputs.
+ * (1b) One input aliases the first of a pair of outputs.
+ * (2) One input aliases the second of a pair of outputs.
+ *
+ * Case 1a is handled by making sure that the pair_index'es are
+ * properly updated so that they appear the same as a pair of inputs.
+ *
+ * Case 1b is handled by setting the pair_index of the input to
+ * itself, simply so it doesn't point to an unrelated argument.
+ * Since we don't encounter the "second" during the input allocation
+ * phase, nothing happens with the second half of the input pair.
+ *
+ * Case 2 is handled by setting the second input to pair=3, the
+ * first output to pair=3, and the pair_index'es to match.
+ */
+ if (saw_alias_pair) {
+ for (i = def->nb_oargs; i < nb_args; i++) {
+ /*
+ * Since [0-9pm] must be alone in the constraint string,
+ * the only way they can both be set is if the pair comes
+ * from the output alias.
+ */
+ if (!def->args_ct[i].ialias) {
+ continue;
+ }
+ switch (def->args_ct[i].pair) {
+ case 0:
+ break;
+ case 1:
+ o = def->args_ct[i].alias_index;
+ o2 = def->args_ct[o].pair_index;
+ tcg_debug_assert(def->args_ct[o].pair == 1);
+ tcg_debug_assert(def->args_ct[o2].pair == 2);
+ if (def->args_ct[o2].oalias) {
+ /* Case 1a */
+ i2 = def->args_ct[o2].alias_index;
+ tcg_debug_assert(def->args_ct[i2].pair == 2);
+ def->args_ct[i2].pair_index = i;
+ def->args_ct[i].pair_index = i2;
+ } else {
+ /* Case 1b */
+ def->args_ct[i].pair_index = i;
+ }
+ break;
+ case 2:
+ o = def->args_ct[i].alias_index;
+ o2 = def->args_ct[o].pair_index;
+ tcg_debug_assert(def->args_ct[o].pair == 2);
+ tcg_debug_assert(def->args_ct[o2].pair == 1);
+ if (def->args_ct[o2].oalias) {
+ /* Case 1a */
+ i2 = def->args_ct[o2].alias_index;
+ tcg_debug_assert(def->args_ct[i2].pair == 1);
+ def->args_ct[i2].pair_index = i;
+ def->args_ct[i].pair_index = i2;
+ } else {
+ /* Case 2 */
+ def->args_ct[i].pair = 3;
+ def->args_ct[o2].pair = 3;
+ def->args_ct[i].pair_index = o2;
+ def->args_ct[o2].pair_index = i;
+ }
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ }
+ }
+
/* sort the constraints (XXX: this is just an heuristic) */
sort_constraints(def, 0, def->nb_oargs);
sort_constraints(def, def->nb_oargs, def->nb_iargs);
@@ -3141,6 +3269,52 @@ static TCGReg tcg_reg_alloc(TCGContext *s, TCGRegSet required_regs,
tcg_abort();
}
+static TCGReg tcg_reg_alloc_pair(TCGContext *s, TCGRegSet required_regs,
+ TCGRegSet allocated_regs,
+ TCGRegSet preferred_regs, bool rev)
+{
+ int i, j, k, fmin, n = ARRAY_SIZE(tcg_target_reg_alloc_order);
+ TCGRegSet reg_ct[2];
+ const int *order;
+
+ /* Ensure that if I is not in allocated_regs, I+1 is not either. */
+ reg_ct[1] = required_regs & ~(allocated_regs | (allocated_regs >> 1));
+ tcg_debug_assert(reg_ct[1] != 0);
+ reg_ct[0] = reg_ct[1] & preferred_regs;
+
+ order = rev ? indirect_reg_alloc_order : tcg_target_reg_alloc_order;
+
+ /*
+ * Skip the preferred_regs option if it cannot be satisfied,
+ * or if the preference made no difference.
+ */
+ k = reg_ct[0] == 0 || reg_ct[0] == reg_ct[1];
+
+ /*
+ * Minimize the number of flushes by looking for 2 free registers first,
+ * then a single flush, then two flushes.
+ */
+ for (fmin = 2; fmin >= 0; fmin--) {
+ for (j = k; j < 2; j++) {
+ TCGRegSet set = reg_ct[j];
+
+ for (i = 0; i < n; i++) {
+ TCGReg reg = order[i];
+
+ if (tcg_regset_test_reg(set, reg)) {
+ int f = !s->reg_to_temp[reg] + !s->reg_to_temp[reg + 1];
+ if (f >= fmin) {
+ tcg_reg_free(s, reg, allocated_regs);
+ tcg_reg_free(s, reg + 1, allocated_regs);
+ return reg;
+ }
+ }
+ }
+ }
+ }
+ tcg_abort();
+}
+
/* Make sure the temporary is in a register. If needed, allocate the register
from DESIRED while avoiding ALLOCATED. */
static void temp_load(TCGContext *s, TCGTemp *ts, TCGRegSet desired_regs,
@@ -3550,8 +3724,10 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
/* satisfy input constraints */
for (k = 0; k < nb_iargs; k++) {
- TCGRegSet i_preferred_regs;
- bool allocate_new_reg;
+ TCGRegSet i_preferred_regs, i_required_regs;
+ bool allocate_new_reg, copyto_new_reg;
+ TCGTemp *ts2;
+ int i1, i2;
i = def->args_ct[nb_oargs + k].sort_index;
arg = op->args[i];
@@ -3568,43 +3744,164 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
reg = ts->reg;
i_preferred_regs = 0;
+ i_required_regs = arg_ct->regs;
allocate_new_reg = false;
+ copyto_new_reg = false;
- if (arg_ct->ialias) {
+ switch (arg_ct->pair) {
+ case 0: /* not paired */
+ if (arg_ct->ialias) {
+ i_preferred_regs = op->output_pref[arg_ct->alias_index];
+
+ /*
+ * If the input is readonly, then it cannot also be an
+ * output and aliased to itself. If the input is not
+ * dead after the instruction, we must allocate a new
+ * register and move it.
+ */
+ if (temp_readonly(ts) || !IS_DEAD_ARG(i)) {
+ allocate_new_reg = true;
+ } else if (ts->val_type == TEMP_VAL_REG) {
+ /*
+ * Check if the current register has already been
+ * allocated for another input.
+ */
+ allocate_new_reg =
+ tcg_regset_test_reg(i_allocated_regs, reg);
+ }
+ }
+ if (!allocate_new_reg) {
+ temp_load(s, ts, i_required_regs, i_allocated_regs,
+ i_preferred_regs);
+ reg = ts->reg;
+ allocate_new_reg = !tcg_regset_test_reg(i_required_regs, reg);
+ }
+ if (allocate_new_reg) {
+ /*
+ * Allocate a new register matching the constraint
+ * and move the temporary register into it.
+ */
+ temp_load(s, ts, tcg_target_available_regs[ts->type],
+ i_allocated_regs, 0);
+ reg = tcg_reg_alloc(s, i_required_regs, i_allocated_regs,
+ i_preferred_regs, ts->indirect_base);
+ copyto_new_reg = true;
+ }
+ break;
+
+ case 1:
+ /* First of an input pair; if i1 == i2, the second is an output. */
+ i1 = i;
+ i2 = arg_ct->pair_index;
+ ts2 = i1 != i2 ? arg_temp(op->args[i2]) : NULL;
+
+ /*
+ * It is easier to default to allocating a new pair
+ * and to identify a few cases where it's not required.
+ */
+ if (arg_ct->ialias) {
+ i_preferred_regs = op->output_pref[arg_ct->alias_index];
+ if (IS_DEAD_ARG(i1) &&
+ IS_DEAD_ARG(i2) &&
+ !temp_readonly(ts) &&
+ ts->val_type == TEMP_VAL_REG &&
+ ts->reg < TCG_TARGET_NB_REGS - 1 &&
+ tcg_regset_test_reg(i_required_regs, reg) &&
+ !tcg_regset_test_reg(i_allocated_regs, reg) &&
+ !tcg_regset_test_reg(i_allocated_regs, reg + 1) &&
+ (ts2
+ ? ts2->val_type == TEMP_VAL_REG &&
+ ts2->reg == reg + 1 &&
+ !temp_readonly(ts2)
+ : s->reg_to_temp[reg + 1] == NULL)) {
+ break;
+ }
+ } else {
+ /* Without aliasing, the pair must also be an input. */
+ tcg_debug_assert(ts2);
+ if (ts->val_type == TEMP_VAL_REG &&
+ ts2->val_type == TEMP_VAL_REG &&
+ ts2->reg == reg + 1 &&
+ tcg_regset_test_reg(i_required_regs, reg)) {
+ break;
+ }
+ }
+ reg = tcg_reg_alloc_pair(s, i_required_regs, i_allocated_regs,
+ 0, ts->indirect_base);
+ goto do_pair;
+
+ case 2: /* pair second */
+ reg = new_args[arg_ct->pair_index] + 1;
+ goto do_pair;
+
+ case 3: /* ialias with second output, no first input */
+ tcg_debug_assert(arg_ct->ialias);
i_preferred_regs = op->output_pref[arg_ct->alias_index];
- /*
- * If the input is readonly, then it cannot also be an
- * output and aliased to itself. If the input is not
- * dead after the instruction, we must allocate a new
- * register and move it.
- */
- if (temp_readonly(ts) || !IS_DEAD_ARG(i)) {
- allocate_new_reg = true;
- } else if (ts->val_type == TEMP_VAL_REG) {
- /*
- * Check if the current register has already been
- * allocated for another input.
- */
- allocate_new_reg = tcg_regset_test_reg(i_allocated_regs, reg);
+ if (IS_DEAD_ARG(i) &&
+ !temp_readonly(ts) &&
+ ts->val_type == TEMP_VAL_REG &&
+ reg > 0 &&
+ s->reg_to_temp[reg - 1] == NULL &&
+ tcg_regset_test_reg(i_required_regs, reg) &&
+ !tcg_regset_test_reg(i_allocated_regs, reg) &&
+ !tcg_regset_test_reg(i_allocated_regs, reg - 1)) {
+ tcg_regset_set_reg(i_allocated_regs, reg - 1);
+ break;
}
- }
+ reg = tcg_reg_alloc_pair(s, i_required_regs >> 1,
+ i_allocated_regs, 0,
+ ts->indirect_base);
+ tcg_regset_set_reg(i_allocated_regs, reg);
+ reg += 1;
+ goto do_pair;
- if (!allocate_new_reg) {
- temp_load(s, ts, arg_ct->regs, i_allocated_regs, i_preferred_regs);
- reg = ts->reg;
- allocate_new_reg = !tcg_regset_test_reg(arg_ct->regs, reg);
- }
-
- if (allocate_new_reg) {
+ do_pair:
/*
- * Allocate a new register matching the constraint
- * and move the temporary register into it.
+ * If an aliased input is not dead after the instruction,
+ * we must allocate a new register and move it.
*/
- temp_load(s, ts, tcg_target_available_regs[ts->type],
- i_allocated_regs, 0);
- reg = tcg_reg_alloc(s, arg_ct->regs, i_allocated_regs,
- i_preferred_regs, ts->indirect_base);
+ if (arg_ct->ialias && (!IS_DEAD_ARG(i) || temp_readonly(ts))) {
+ TCGRegSet t_allocated_regs = i_allocated_regs;
+
+ /*
+ * Because of the alias, and the continued life, make sure
+ * that the temp is somewhere *other* than the reg pair,
+ * and we get a copy in reg.
+ */
+ tcg_regset_set_reg(t_allocated_regs, reg);
+ tcg_regset_set_reg(t_allocated_regs, reg + 1);
+ if (ts->val_type == TEMP_VAL_REG && ts->reg == reg) {
+ /* If ts was already in reg, copy it somewhere else. */
+ TCGReg nr;
+ bool ok;
+
+ tcg_debug_assert(ts->kind != TEMP_FIXED);
+ nr = tcg_reg_alloc(s, tcg_target_available_regs[ts->type],
+ t_allocated_regs, 0, ts->indirect_base);
+ ok = tcg_out_mov(s, ts->type, nr, reg);
+ tcg_debug_assert(ok);
+
+ set_temp_val_reg(s, ts, nr);
+ } else {
+ temp_load(s, ts, tcg_target_available_regs[ts->type],
+ t_allocated_regs, 0);
+ copyto_new_reg = true;
+ }
+ } else {
+ /* Preferably allocate to reg, otherwise copy. */
+ i_required_regs = (TCGRegSet)1 << reg;
+ temp_load(s, ts, i_required_regs, i_allocated_regs,
+ i_preferred_regs);
+ copyto_new_reg = ts->reg != reg;
+ }
+ break;
+
+ default:
+ g_assert_not_reached();
+ }
+
+ if (copyto_new_reg) {
if (!tcg_out_mov(s, ts->type, reg, ts->reg)) {
/*
* Cross register class move not supported. Sync the
@@ -3656,15 +3953,46 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
/* ENV should not be modified. */
tcg_debug_assert(!temp_readonly(ts));
- if (arg_ct->oalias && !const_args[arg_ct->alias_index]) {
- reg = new_args[arg_ct->alias_index];
- } else if (arg_ct->newreg) {
- reg = tcg_reg_alloc(s, arg_ct->regs,
- i_allocated_regs | o_allocated_regs,
- op->output_pref[k], ts->indirect_base);
- } else {
- reg = tcg_reg_alloc(s, arg_ct->regs, o_allocated_regs,
- op->output_pref[k], ts->indirect_base);
+ switch (arg_ct->pair) {
+ case 0: /* not paired */
+ if (arg_ct->oalias && !const_args[arg_ct->alias_index]) {
+ reg = new_args[arg_ct->alias_index];
+ } else if (arg_ct->newreg) {
+ reg = tcg_reg_alloc(s, arg_ct->regs,
+ i_allocated_regs | o_allocated_regs,
+ op->output_pref[k], ts->indirect_base);
+ } else {
+ reg = tcg_reg_alloc(s, arg_ct->regs, o_allocated_regs,
+ op->output_pref[k], ts->indirect_base);
+ }
+ break;
+
+ case 1: /* first of pair */
+ tcg_debug_assert(!arg_ct->newreg);
+ if (arg_ct->oalias) {
+ reg = new_args[arg_ct->alias_index];
+ break;
+ }
+ reg = tcg_reg_alloc_pair(s, arg_ct->regs, o_allocated_regs,
+ op->output_pref[k], ts->indirect_base);
+ break;
+
+ case 2: /* second of pair */
+ tcg_debug_assert(!arg_ct->newreg);
+ if (arg_ct->oalias) {
+ reg = new_args[arg_ct->alias_index];
+ } else {
+ reg = new_args[arg_ct->pair_index] + 1;
+ }
+ break;
+
+ case 3: /* first of pair, aliasing with a second input */
+ tcg_debug_assert(!arg_ct->newreg);
+ reg = new_args[arg_ct->pair_index] - 1;
+ break;
+
+ default:
+ g_assert_not_reached();
}
tcg_regset_set_reg(o_allocated_regs, reg);
set_temp_val_reg(s, ts, reg);
--
2.34.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* [PULL v2 47/47] tests/tcg/multiarch: add vma-pthread.c
2023-01-06 3:17 [PULL v2 00/47] tcg misc queue Richard Henderson
2023-01-06 3:17 ` [PULL v2 19/47] tcg: Introduce paired register allocation Richard Henderson
@ 2023-01-06 3:17 ` Richard Henderson
2023-01-06 22:15 ` [PULL v2 00/47] tcg misc queue Peter Maydell
2 siblings, 0 replies; 4+ messages in thread
From: Richard Henderson @ 2023-01-06 3:17 UTC (permalink / raw)
To: qemu-devel; +Cc: peter.maydell, Ilya Leoshkevich, Alex Bennée
From: Ilya Leoshkevich <iii@linux.ibm.com>
Add a test that locklessly changes and exercises page protection bits
from various threads. This helps catch race conditions in the VMA
handling.
Acked-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Message-Id: <20221223120252.513319-1-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tests/tcg/multiarch/nop_func.h | 25 ++++
tests/tcg/multiarch/munmap-pthread.c | 16 +--
tests/tcg/multiarch/vma-pthread.c | 207 +++++++++++++++++++++++++++
tests/tcg/multiarch/Makefile.target | 3 +
4 files changed, 236 insertions(+), 15 deletions(-)
create mode 100644 tests/tcg/multiarch/nop_func.h
create mode 100644 tests/tcg/multiarch/vma-pthread.c
diff --git a/tests/tcg/multiarch/nop_func.h b/tests/tcg/multiarch/nop_func.h
new file mode 100644
index 0000000000..f714d21000
--- /dev/null
+++ b/tests/tcg/multiarch/nop_func.h
@@ -0,0 +1,25 @@
+/*
+ * No-op functions that can be safely copied.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+#ifndef NOP_FUNC_H
+#define NOP_FUNC_H
+
+static const char nop_func[] = {
+#if defined(__aarch64__)
+ 0xc0, 0x03, 0x5f, 0xd6, /* ret */
+#elif defined(__alpha__)
+ 0x01, 0x80, 0xFA, 0x6B, /* ret */
+#elif defined(__arm__)
+ 0x1e, 0xff, 0x2f, 0xe1, /* bx lr */
+#elif defined(__riscv)
+ 0x67, 0x80, 0x00, 0x00, /* ret */
+#elif defined(__s390__)
+ 0x07, 0xfe, /* br %r14 */
+#elif defined(__i386__) || defined(__x86_64__)
+ 0xc3, /* ret */
+#endif
+};
+
+#endif
diff --git a/tests/tcg/multiarch/munmap-pthread.c b/tests/tcg/multiarch/munmap-pthread.c
index d7143b00d5..1c79005846 100644
--- a/tests/tcg/multiarch/munmap-pthread.c
+++ b/tests/tcg/multiarch/munmap-pthread.c
@@ -7,21 +7,7 @@
#include <sys/mman.h>
#include <unistd.h>
-static const char nop_func[] = {
-#if defined(__aarch64__)
- 0xc0, 0x03, 0x5f, 0xd6, /* ret */
-#elif defined(__alpha__)
- 0x01, 0x80, 0xFA, 0x6B, /* ret */
-#elif defined(__arm__)
- 0x1e, 0xff, 0x2f, 0xe1, /* bx lr */
-#elif defined(__riscv)
- 0x67, 0x80, 0x00, 0x00, /* ret */
-#elif defined(__s390__)
- 0x07, 0xfe, /* br %r14 */
-#elif defined(__i386__) || defined(__x86_64__)
- 0xc3, /* ret */
-#endif
-};
+#include "nop_func.h"
static void *thread_mmap_munmap(void *arg)
{
diff --git a/tests/tcg/multiarch/vma-pthread.c b/tests/tcg/multiarch/vma-pthread.c
new file mode 100644
index 0000000000..7045da08fc
--- /dev/null
+++ b/tests/tcg/multiarch/vma-pthread.c
@@ -0,0 +1,207 @@
+/*
+ * Test that VMA updates do not race.
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ *
+ * Map a contiguous chunk of RWX memory. Split it into 8 equally sized
+ * regions, each of which is guaranteed to have a certain combination of
+ * protection bits set.
+ *
+ * Reader, writer and executor threads perform the respective operations on
+ * pages, which are guaranteed to have the respective protection bit set.
+ * Two mutator threads change the non-fixed protection bits randomly.
+ */
+#include <assert.h>
+#include <fcntl.h>
+#include <pthread.h>
+#include <stdbool.h>
+#include <stdlib.h>
+#include <string.h>
+#include <stdio.h>
+#include <sys/mman.h>
+#include <unistd.h>
+
+#include "nop_func.h"
+
+#define PAGE_IDX_BITS 10
+#define PAGE_COUNT (1 << PAGE_IDX_BITS)
+#define PAGE_IDX_MASK (PAGE_COUNT - 1)
+#define REGION_IDX_BITS 3
+#define PAGE_IDX_R_MASK (1 << 7)
+#define PAGE_IDX_W_MASK (1 << 8)
+#define PAGE_IDX_X_MASK (1 << 9)
+#define REGION_MASK (PAGE_IDX_R_MASK | PAGE_IDX_W_MASK | PAGE_IDX_X_MASK)
+#define PAGES_PER_REGION (1 << (PAGE_IDX_BITS - REGION_IDX_BITS))
+
+struct context {
+ int pagesize;
+ char *ptr;
+ int dev_null_fd;
+ volatile int mutator_count;
+};
+
+static void *thread_read(void *arg)
+{
+ struct context *ctx = arg;
+ ssize_t sret;
+ size_t i, j;
+ int ret;
+
+ for (i = 0; ctx->mutator_count; i++) {
+ char *p;
+
+ j = (i & PAGE_IDX_MASK) | PAGE_IDX_R_MASK;
+ p = &ctx->ptr[j * ctx->pagesize];
+
+ /* Read directly. */
+ ret = memcmp(p, nop_func, sizeof(nop_func));
+ if (ret != 0) {
+ fprintf(stderr, "fail direct read %p\n", p);
+ abort();
+ }
+
+ /* Read indirectly. */
+ sret = write(ctx->dev_null_fd, p, 1);
+ if (sret != 1) {
+ if (sret < 0) {
+ fprintf(stderr, "fail indirect read %p (%m)\n", p);
+ } else {
+ fprintf(stderr, "fail indirect read %p (%zd)\n", p, sret);
+ }
+ abort();
+ }
+ }
+
+ return NULL;
+}
+
+static void *thread_write(void *arg)
+{
+ struct context *ctx = arg;
+ struct timespec *ts;
+ size_t i, j;
+ int ret;
+
+ for (i = 0; ctx->mutator_count; i++) {
+ j = (i & PAGE_IDX_MASK) | PAGE_IDX_W_MASK;
+
+ /* Write directly. */
+ memcpy(&ctx->ptr[j * ctx->pagesize], nop_func, sizeof(nop_func));
+
+ /* Write using a syscall. */
+ ts = (struct timespec *)(&ctx->ptr[(j + 1) * ctx->pagesize] -
+ sizeof(struct timespec));
+ ret = clock_gettime(CLOCK_REALTIME, ts);
+ if (ret != 0) {
+ fprintf(stderr, "fail indirect write %p (%m)\n", ts);
+ abort();
+ }
+ }
+
+ return NULL;
+}
+
+static void *thread_execute(void *arg)
+{
+ struct context *ctx = arg;
+ size_t i, j;
+
+ for (i = 0; ctx->mutator_count; i++) {
+ j = (i & PAGE_IDX_MASK) | PAGE_IDX_X_MASK;
+ ((void(*)(void))&ctx->ptr[j * ctx->pagesize])();
+ }
+
+ return NULL;
+}
+
+static void *thread_mutate(void *arg)
+{
+ size_t i, start_idx, end_idx, page_idx, tmp;
+ struct context *ctx = arg;
+ unsigned int seed;
+ int prot, ret;
+
+ seed = (unsigned int)time(NULL);
+ for (i = 0; i < 10000; i++) {
+ start_idx = rand_r(&seed) & PAGE_IDX_MASK;
+ end_idx = rand_r(&seed) & PAGE_IDX_MASK;
+ if (start_idx > end_idx) {
+ tmp = start_idx;
+ start_idx = end_idx;
+ end_idx = tmp;
+ }
+ prot = rand_r(&seed) & (PROT_READ | PROT_WRITE | PROT_EXEC);
+ for (page_idx = start_idx & REGION_MASK; page_idx <= end_idx;
+ page_idx += PAGES_PER_REGION) {
+ if (page_idx & PAGE_IDX_R_MASK) {
+ prot |= PROT_READ;
+ }
+ if (page_idx & PAGE_IDX_W_MASK) {
+ /* FIXME: qemu syscalls check for both read+write. */
+ prot |= PROT_WRITE | PROT_READ;
+ }
+ if (page_idx & PAGE_IDX_X_MASK) {
+ prot |= PROT_EXEC;
+ }
+ }
+ ret = mprotect(&ctx->ptr[start_idx * ctx->pagesize],
+ (end_idx - start_idx + 1) * ctx->pagesize, prot);
+ assert(ret == 0);
+ }
+
+ __atomic_fetch_sub(&ctx->mutator_count, 1, __ATOMIC_SEQ_CST);
+
+ return NULL;
+}
+
+int main(void)
+{
+ pthread_t threads[5];
+ struct context ctx;
+ size_t i;
+ int ret;
+
+ /* Without a template, nothing to test. */
+ if (sizeof(nop_func) == 0) {
+ return EXIT_SUCCESS;
+ }
+
+ /* Initialize memory chunk. */
+ ctx.pagesize = getpagesize();
+ ctx.ptr = mmap(NULL, PAGE_COUNT * ctx.pagesize,
+ PROT_READ | PROT_WRITE | PROT_EXEC,
+ MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
+ assert(ctx.ptr != MAP_FAILED);
+ for (i = 0; i < PAGE_COUNT; i++) {
+ memcpy(&ctx.ptr[i * ctx.pagesize], nop_func, sizeof(nop_func));
+ }
+ ctx.dev_null_fd = open("/dev/null", O_WRONLY);
+ assert(ctx.dev_null_fd >= 0);
+ ctx.mutator_count = 2;
+
+ /* Start threads. */
+ ret = pthread_create(&threads[0], NULL, thread_read, &ctx);
+ assert(ret == 0);
+ ret = pthread_create(&threads[1], NULL, thread_write, &ctx);
+ assert(ret == 0);
+ ret = pthread_create(&threads[2], NULL, thread_execute, &ctx);
+ assert(ret == 0);
+ for (i = 3; i <= 4; i++) {
+ ret = pthread_create(&threads[i], NULL, thread_mutate, &ctx);
+ assert(ret == 0);
+ }
+
+ /* Wait for threads to stop. */
+ for (i = 0; i < sizeof(threads) / sizeof(threads[0]); i++) {
+ ret = pthread_join(threads[i], NULL);
+ assert(ret == 0);
+ }
+
+ /* Destroy memory chunk. */
+ ret = close(ctx.dev_null_fd);
+ assert(ret == 0);
+ ret = munmap(ctx.ptr, PAGE_COUNT * ctx.pagesize);
+ assert(ret == 0);
+
+ return EXIT_SUCCESS;
+}
diff --git a/tests/tcg/multiarch/Makefile.target b/tests/tcg/multiarch/Makefile.target
index 5f0fee1aad..e7213af492 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -39,6 +39,9 @@ signals: LDFLAGS+=-lrt -lpthread
munmap-pthread: CFLAGS+=-pthread
munmap-pthread: LDFLAGS+=-pthread
+vma-pthread: CFLAGS+=-pthread
+vma-pthread: LDFLAGS+=-pthread
+
# We define the runner for test-mmap after the individual
# architectures have defined their supported pages sizes. If no
# additional page sizes are defined we only run the default test.
--
2.34.1
^ permalink raw reply related [flat|nested] 4+ messages in thread
* Re: [PULL v2 00/47] tcg misc queue
2023-01-06 3:17 [PULL v2 00/47] tcg misc queue Richard Henderson
2023-01-06 3:17 ` [PULL v2 19/47] tcg: Introduce paired register allocation Richard Henderson
2023-01-06 3:17 ` [PULL v2 47/47] tests/tcg/multiarch: add vma-pthread.c Richard Henderson
@ 2023-01-06 22:15 ` Peter Maydell
2 siblings, 0 replies; 4+ messages in thread
From: Peter Maydell @ 2023-01-06 22:15 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On Fri, 6 Jan 2023 at 03:17, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Changes in patch 47, to reduce execution time with --enable-debug.
> Changes in patch 19, to fix an i386 specific register allocation failure.
>
>
> r~
>
>
> The following changes since commit cb9c6a8e5ad6a1f0ce164d352e3102df46986e22:
>
> .gitlab-ci.d/windows: Work-around timeout and OpenGL problems of the MSYS2 jobs (2023-01-04 18:58:33 +0000)
>
> are available in the Git repository at:
>
> https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230105
>
> for you to fetch changes up to d4846c33ebe04d2141dcc613b5558d2f1d8077af:
>
> tests/tcg/multiarch: add vma-pthread.c (2023-01-05 11:41:29 -0800)
>
> ----------------------------------------------------------------
> Fix race conditions in new user-only vma tracking.
> Add tcg backend paired register allocation.
> Cleanup tcg backend function call abi.
>
Applied, thanks.
Please update the changelog at https://wiki.qemu.org/ChangeLog/8.0
for any user-visible changes.
-- PMM
^ permalink raw reply [flat|nested] 4+ messages in thread
end of thread, other threads:[~2023-01-06 22:16 UTC | newest]
Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-06 3:17 [PULL v2 00/47] tcg misc queue Richard Henderson
2023-01-06 3:17 ` [PULL v2 19/47] tcg: Introduce paired register allocation Richard Henderson
2023-01-06 3:17 ` [PULL v2 47/47] tests/tcg/multiarch: add vma-pthread.c Richard Henderson
2023-01-06 22:15 ` [PULL v2 00/47] tcg misc queue Peter Maydell
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).