* [PULL 0/4] tcg patch queue @ 2024-01-10 21:52 Richard Henderson 2024-01-10 21:52 ` [PULL 1/4] tcg/i386: convert add/sub of 128 to sub/add of -128 Richard Henderson ` (4 more replies) 0 siblings, 5 replies; 6+ messages in thread From: Richard Henderson @ 2024-01-10 21:52 UTC (permalink / raw) To: qemu-devel The following changes since commit 34eac35f893664eb8545b98142e23d9954722766: Merge tag 'pull-riscv-to-apply-20240110' of https://github.com/alistair23/qemu into staging (2024-01-10 11:41:56 +0000) are available in the Git repository at: https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20240111 for you to fetch changes up to 1d513e06d96697f44de4a1b85c6ff627c443e306: util: fix build with musl libc on ppc64le (2024-01-11 08:48:16 +1100) ---------------------------------------------------------------- tcg/i386: Use more 8-bit immediate forms for add, sub, or, xor tcg/ppc: Use new registers for LQ destination util: fix build with musl libc on ppc64le ---------------------------------------------------------------- Natanael Copa (1): util: fix build with musl libc on ppc64le Paolo Bonzini (2): tcg/i386: convert add/sub of 128 to sub/add of -128 tcg/i386: use 8-bit OR or XOR for unsigned 8-bit immediates Richard Henderson (1): tcg/ppc: Use new registers for LQ destination tcg/ppc/tcg-target-con-set.h | 2 +- tcg/tcg.c | 21 ++++++++++++---- util/cpuinfo-ppc.c | 6 ++--- tcg/i386/tcg-target.c.inc | 60 +++++++++++++++++++++++++++++++++----------- tcg/ppc/tcg-target.c.inc | 3 ++- 5 files changed, 67 insertions(+), 25 deletions(-) ^ permalink raw reply [flat|nested] 6+ messages in thread
* [PULL 1/4] tcg/i386: convert add/sub of 128 to sub/add of -128 2024-01-10 21:52 [PULL 0/4] tcg patch queue Richard Henderson @ 2024-01-10 21:52 ` Richard Henderson 2024-01-10 21:52 ` [PULL 2/4] tcg/i386: use 8-bit OR or XOR for unsigned 8-bit immediates Richard Henderson ` (3 subsequent siblings) 4 siblings, 0 replies; 6+ messages in thread From: Richard Henderson @ 2024-01-10 21:52 UTC (permalink / raw) To: qemu-devel; +Cc: Paolo Bonzini From: Paolo Bonzini <pbonzini@redhat.com> Extend the existing conditional that generates INC/DEC, to also swap an ADD for a SUB and vice versa when the immediate is 128. This facilitates using OPC_ARITH_EvIb instead of OPC_ARITH_EvIz. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20231228120514.70205-1-pbonzini@redhat.com> [rth: Use a switch on C] Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- tcg/i386/tcg-target.c.inc | 49 +++++++++++++++++++++++++++------------ 1 file changed, 34 insertions(+), 15 deletions(-) diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc index a83f8aab30..29e80af78b 100644 --- a/tcg/i386/tcg-target.c.inc +++ b/tcg/i386/tcg-target.c.inc @@ -1316,23 +1316,41 @@ static void tgen_arithi(TCGContext *s, int c, int r0, c &= 7; } - /* ??? While INC is 2 bytes shorter than ADDL $1, they also induce - partial flags update stalls on Pentium4 and are not recommended - by current Intel optimization manuals. */ - if (!cf && (c == ARITH_ADD || c == ARITH_SUB) && (val == 1 || val == -1)) { - int is_inc = (c == ARITH_ADD) ^ (val < 0); - if (TCG_TARGET_REG_BITS == 64) { - /* The single-byte increment encodings are re-tasked as the - REX prefixes. Use the MODRM encoding. */ - tcg_out_modrm(s, OPC_GRP5 + rexw, - (is_inc ? EXT5_INC_Ev : EXT5_DEC_Ev), r0); - } else { - tcg_out8(s, (is_inc ? OPC_INC_r32 : OPC_DEC_r32) + r0); + switch (c) { + case ARITH_ADD: + case ARITH_SUB: + if (!cf) { + /* + * ??? While INC is 2 bytes shorter than ADDL $1, they also induce + * partial flags update stalls on Pentium4 and are not recommended + * by current Intel optimization manuals. + */ + if (val == 1 || val == -1) { + int is_inc = (c == ARITH_ADD) ^ (val < 0); + if (TCG_TARGET_REG_BITS == 64) { + /* + * The single-byte increment encodings are re-tasked + * as the REX prefixes. Use the MODRM encoding. + */ + tcg_out_modrm(s, OPC_GRP5 + rexw, + (is_inc ? EXT5_INC_Ev : EXT5_DEC_Ev), r0); + } else { + tcg_out8(s, (is_inc ? OPC_INC_r32 : OPC_DEC_r32) + r0); + } + return; + } + if (val == 128) { + /* + * Facilitate using an 8-bit immediate. Carry is inverted + * by this transformation, so do it only if cf == 0. + */ + c ^= ARITH_ADD ^ ARITH_SUB; + val = -128; + } } - return; - } + break; - if (c == ARITH_AND) { + case ARITH_AND: if (TCG_TARGET_REG_BITS == 64) { if (val == 0xffffffffu) { tcg_out_ext32u(s, r0, r0); @@ -1351,6 +1369,7 @@ static void tgen_arithi(TCGContext *s, int c, int r0, tcg_out_ext16u(s, r0, r0); return; } + break; } if (val == (int8_t)val) { -- 2.34.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PULL 2/4] tcg/i386: use 8-bit OR or XOR for unsigned 8-bit immediates 2024-01-10 21:52 [PULL 0/4] tcg patch queue Richard Henderson 2024-01-10 21:52 ` [PULL 1/4] tcg/i386: convert add/sub of 128 to sub/add of -128 Richard Henderson @ 2024-01-10 21:52 ` Richard Henderson 2024-01-10 21:52 ` [PULL 3/4] tcg/ppc: Use new registers for LQ destination Richard Henderson ` (2 subsequent siblings) 4 siblings, 0 replies; 6+ messages in thread From: Richard Henderson @ 2024-01-10 21:52 UTC (permalink / raw) To: qemu-devel; +Cc: Paolo Bonzini From: Paolo Bonzini <pbonzini@redhat.com> In the case where OR or XOR has an 8-bit immediate between 128 and 255, we can operate on a low-byte register and shorten the output by two or three bytes (two if a prefix byte is needed for REX.B). Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20231228120524.70239-1-pbonzini@redhat.com> [rth: Incorporate into switch.] Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- tcg/i386/tcg-target.c.inc | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc index 29e80af78b..d268199fc1 100644 --- a/tcg/i386/tcg-target.c.inc +++ b/tcg/i386/tcg-target.c.inc @@ -244,6 +244,7 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece) #define P_VEXL 0x80000 /* Set VEX.L = 1 */ #define P_EVEX 0x100000 /* Requires EVEX encoding */ +#define OPC_ARITH_EbIb (0x80) #define OPC_ARITH_EvIz (0x81) #define OPC_ARITH_EvIb (0x83) #define OPC_ARITH_GvEv (0x03) /* ... plus (ARITH_FOO << 3) */ @@ -1370,6 +1371,16 @@ static void tgen_arithi(TCGContext *s, int c, int r0, return; } break; + + case ARITH_OR: + case ARITH_XOR: + if (val >= 0x80 && val <= 0xff + && (r0 < 4 || TCG_TARGET_REG_BITS == 64)) { + tcg_out_modrm(s, OPC_ARITH_EbIb + P_REXB_RM, c, r0); + tcg_out8(s, val); + return; + } + break; } if (val == (int8_t)val) { -- 2.34.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PULL 3/4] tcg/ppc: Use new registers for LQ destination 2024-01-10 21:52 [PULL 0/4] tcg patch queue Richard Henderson 2024-01-10 21:52 ` [PULL 1/4] tcg/i386: convert add/sub of 128 to sub/add of -128 Richard Henderson 2024-01-10 21:52 ` [PULL 2/4] tcg/i386: use 8-bit OR or XOR for unsigned 8-bit immediates Richard Henderson @ 2024-01-10 21:52 ` Richard Henderson 2024-01-10 21:52 ` [PULL 4/4] util: fix build with musl libc on ppc64le Richard Henderson 2024-01-11 15:16 ` [PULL 0/4] tcg patch queue Peter Maydell 4 siblings, 0 replies; 6+ messages in thread From: Richard Henderson @ 2024-01-10 21:52 UTC (permalink / raw) To: qemu-devel; +Cc: qemu-stable, Philippe Mathieu-Daudé LQ has a constraint that RTp != RA, else SIGILL. Therefore, force the destination of INDEX_op_qemu_*_ld128 to be a new register pair, so that it cannot overlap the input address. This requires new support in process_op_defs and tcg_reg_alloc_op. Cc: qemu-stable@nongnu.org Fixes: 526cd4ec01f ("tcg/ppc: Support 128-bit load/store") Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org> Message-Id: <20240102013456.131846-1-richard.henderson@linaro.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- tcg/ppc/tcg-target-con-set.h | 2 +- tcg/tcg.c | 21 ++++++++++++++++----- tcg/ppc/tcg-target.c.inc | 3 ++- 3 files changed, 19 insertions(+), 7 deletions(-) diff --git a/tcg/ppc/tcg-target-con-set.h b/tcg/ppc/tcg-target-con-set.h index bbd7b21247..cb47b29452 100644 --- a/tcg/ppc/tcg-target-con-set.h +++ b/tcg/ppc/tcg-target-con-set.h @@ -35,7 +35,7 @@ C_O1_I3(v, v, v, v) C_O1_I4(r, r, ri, rZ, rZ) C_O1_I4(r, r, r, ri, ri) C_O2_I1(r, r, r) -C_O2_I1(o, m, r) +C_N1O1_I1(o, m, r) C_O2_I2(r, r, r, r) C_O2_I4(r, r, rI, rZM, r, r) C_O2_I4(r, r, r, r, rI, rZM) diff --git a/tcg/tcg.c b/tcg/tcg.c index 896a36caeb..e2c38f6d11 100644 --- a/tcg/tcg.c +++ b/tcg/tcg.c @@ -653,6 +653,7 @@ static void tcg_out_movext3(TCGContext *s, const TCGMovExtend *i1, #define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4), #define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2), +#define C_N1O1_I1(O1, O2, I1) C_PFX3(c_n1o1_i1_, O1, O2, I1), #define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1), #define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1), @@ -676,6 +677,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode); #undef C_O1_I3 #undef C_O1_I4 #undef C_N1_I2 +#undef C_N1O1_I1 #undef C_N2_I1 #undef C_O2_I1 #undef C_O2_I2 @@ -696,6 +698,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode); #define C_O1_I4(O1, I1, I2, I3, I4) { .args_ct_str = { #O1, #I1, #I2, #I3, #I4 } }, #define C_N1_I2(O1, I1, I2) { .args_ct_str = { "&" #O1, #I1, #I2 } }, +#define C_N1O1_I1(O1, O2, I1) { .args_ct_str = { "&" #O1, #O2, #I1 } }, #define C_N2_I1(O1, O2, I1) { .args_ct_str = { "&" #O1, "&" #O2, #I1 } }, #define C_O2_I1(O1, O2, I1) { .args_ct_str = { #O1, #O2, #I1 } }, @@ -718,6 +721,7 @@ static const TCGTargetOpDef constraint_sets[] = { #undef C_O1_I3 #undef C_O1_I4 #undef C_N1_I2 +#undef C_N1O1_I1 #undef C_N2_I1 #undef C_O2_I1 #undef C_O2_I2 @@ -738,6 +742,7 @@ static const TCGTargetOpDef constraint_sets[] = { #define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4) #define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2) +#define C_N1O1_I1(O1, O2, I1) C_PFX3(c_n1o1_i1_, O1, O2, I1) #define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1) #define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1) @@ -2988,6 +2993,7 @@ static void process_op_defs(TCGContext *s) .pair = 2, .pair_index = o, .regs = def->args_ct[o].regs << 1, + .newreg = def->args_ct[o].newreg, }; def->args_ct[o].pair = 1; def->args_ct[o].pair_index = i; @@ -3004,6 +3010,7 @@ static void process_op_defs(TCGContext *s) .pair = 1, .pair_index = o, .regs = def->args_ct[o].regs >> 1, + .newreg = def->args_ct[o].newreg, }; def->args_ct[o].pair = 2; def->args_ct[o].pair_index = i; @@ -5036,17 +5043,21 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op) break; case 1: /* first of pair */ - tcg_debug_assert(!arg_ct->newreg); if (arg_ct->oalias) { reg = new_args[arg_ct->alias_index]; - break; + } else if (arg_ct->newreg) { + reg = tcg_reg_alloc_pair(s, arg_ct->regs, + i_allocated_regs | o_allocated_regs, + output_pref(op, k), + ts->indirect_base); + } else { + reg = tcg_reg_alloc_pair(s, arg_ct->regs, o_allocated_regs, + output_pref(op, k), + ts->indirect_base); } - reg = tcg_reg_alloc_pair(s, arg_ct->regs, o_allocated_regs, - output_pref(op, k), ts->indirect_base); break; case 2: /* second of pair */ - tcg_debug_assert(!arg_ct->newreg); if (arg_ct->oalias) { reg = new_args[arg_ct->alias_index]; } else { diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc index 856c3b18f5..54816967bc 100644 --- a/tcg/ppc/tcg-target.c.inc +++ b/tcg/ppc/tcg-target.c.inc @@ -2595,6 +2595,7 @@ static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg datalo, TCGReg datahi, tcg_debug_assert(!need_bswap); tcg_debug_assert(datalo & 1); tcg_debug_assert(datahi == datalo - 1); + tcg_debug_assert(!is_ld || datahi != index); insn = is_ld ? LQ : STQ; tcg_out32(s, insn | TAI(datahi, index, 0)); } else { @@ -4071,7 +4072,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op) case INDEX_op_qemu_ld_a32_i128: case INDEX_op_qemu_ld_a64_i128: - return C_O2_I1(o, m, r); + return C_N1O1_I1(o, m, r); case INDEX_op_qemu_st_a32_i128: case INDEX_op_qemu_st_a64_i128: return C_O0_I3(o, m, r); -- 2.34.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* [PULL 4/4] util: fix build with musl libc on ppc64le 2024-01-10 21:52 [PULL 0/4] tcg patch queue Richard Henderson ` (2 preceding siblings ...) 2024-01-10 21:52 ` [PULL 3/4] tcg/ppc: Use new registers for LQ destination Richard Henderson @ 2024-01-10 21:52 ` Richard Henderson 2024-01-11 15:16 ` [PULL 0/4] tcg patch queue Peter Maydell 4 siblings, 0 replies; 6+ messages in thread From: Richard Henderson @ 2024-01-10 21:52 UTC (permalink / raw) To: qemu-devel; +Cc: Natanael Copa, qemu-stable From: Natanael Copa <ncopa@alpinelinux.org> Use PPC_FEATURE2_ISEL and PPC_FEATURE2_VEC_CRYPTO from linux headers instead of the GNU specific PPC_FEATURE2_HAS_ISEL and PPC_FEATURE2_HAS_VEC_CRYPTO. This fixes build with musl libc. Cc: qemu-stable@nongnu.org Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1861 Signed-off-by: Natanael Copa <ncopa@alpinelinux.org> Fixes: 63922f467a ("tcg/ppc: Replace HAVE_ISEL macro with a variable") Fixes: 68f340d4cd ("tcg/ppc: Enable Altivec detection") Message-Id: <20231219105236.7059-1-ncopa@alpinelinux.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> --- util/cpuinfo-ppc.c | 6 +++--- 1 file changed, 3 insertions(+), 3 deletions(-) diff --git a/util/cpuinfo-ppc.c b/util/cpuinfo-ppc.c index 1ea3db0ac8..b2d8893a06 100644 --- a/util/cpuinfo-ppc.c +++ b/util/cpuinfo-ppc.c @@ -6,10 +6,10 @@ #include "qemu/osdep.h" #include "host/cpuinfo.h" +#include <asm/cputable.h> #ifdef CONFIG_GETAUXVAL # include <sys/auxv.h> #else -# include <asm/cputable.h> # include "elf.h" #endif @@ -40,7 +40,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void) info |= CPUINFO_V2_06; } - if (hwcap2 & PPC_FEATURE2_HAS_ISEL) { + if (hwcap2 & PPC_FEATURE2_ISEL) { info |= CPUINFO_ISEL; } if (hwcap & PPC_FEATURE_HAS_ALTIVEC) { @@ -53,7 +53,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void) * always have both anyway, since VSX came with Power7 * and crypto came with Power8. */ - if (hwcap2 & PPC_FEATURE2_HAS_VEC_CRYPTO) { + if (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) { info |= CPUINFO_CRYPTO; } } -- 2.34.1 ^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PULL 0/4] tcg patch queue 2024-01-10 21:52 [PULL 0/4] tcg patch queue Richard Henderson ` (3 preceding siblings ...) 2024-01-10 21:52 ` [PULL 4/4] util: fix build with musl libc on ppc64le Richard Henderson @ 2024-01-11 15:16 ` Peter Maydell 4 siblings, 0 replies; 6+ messages in thread From: Peter Maydell @ 2024-01-11 15:16 UTC (permalink / raw) To: Richard Henderson; +Cc: qemu-devel On Wed, 10 Jan 2024 at 21:52, Richard Henderson <richard.henderson@linaro.org> wrote: > > The following changes since commit 34eac35f893664eb8545b98142e23d9954722766: > > Merge tag 'pull-riscv-to-apply-20240110' of https://github.com/alistair23/qemu into staging (2024-01-10 11:41:56 +0000) > > are available in the Git repository at: > > https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20240111 > > for you to fetch changes up to 1d513e06d96697f44de4a1b85c6ff627c443e306: > > util: fix build with musl libc on ppc64le (2024-01-11 08:48:16 +1100) > > ---------------------------------------------------------------- > tcg/i386: Use more 8-bit immediate forms for add, sub, or, xor > tcg/ppc: Use new registers for LQ destination > util: fix build with musl libc on ppc64le > Applied, thanks. Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0 for any user-visible changes. -- PMM ^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2024-01-11 15:18 UTC | newest] Thread overview: 6+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-01-10 21:52 [PULL 0/4] tcg patch queue Richard Henderson 2024-01-10 21:52 ` [PULL 1/4] tcg/i386: convert add/sub of 128 to sub/add of -128 Richard Henderson 2024-01-10 21:52 ` [PULL 2/4] tcg/i386: use 8-bit OR or XOR for unsigned 8-bit immediates Richard Henderson 2024-01-10 21:52 ` [PULL 3/4] tcg/ppc: Use new registers for LQ destination Richard Henderson 2024-01-10 21:52 ` [PULL 4/4] util: fix build with musl libc on ppc64le Richard Henderson 2024-01-11 15:16 ` [PULL 0/4] tcg patch queue Peter Maydell
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).