* [PULL 0/4] tcg patch queue
@ 2024-01-10 21:52 Richard Henderson
From: Richard Henderson @ 2024-01-10 21:52 UTC
To: qemu-devel
The following changes since commit 34eac35f893664eb8545b98142e23d9954722766:
Merge tag 'pull-riscv-to-apply-20240110' of https://github.com/alistair23/qemu into staging (2024-01-10 11:41:56 +0000)
are available in the Git repository at:
https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20240111
for you to fetch changes up to 1d513e06d96697f44de4a1b85c6ff627c443e306:
util: fix build with musl libc on ppc64le (2024-01-11 08:48:16 +1100)
----------------------------------------------------------------
tcg/i386: Use more 8-bit immediate forms for add, sub, or, xor
tcg/ppc: Use new registers for LQ destination
util: fix build with musl libc on ppc64le
----------------------------------------------------------------
Natanael Copa (1):
util: fix build with musl libc on ppc64le
Paolo Bonzini (2):
tcg/i386: convert add/sub of 128 to sub/add of -128
tcg/i386: use 8-bit OR or XOR for unsigned 8-bit immediates
Richard Henderson (1):
tcg/ppc: Use new registers for LQ destination
tcg/ppc/tcg-target-con-set.h | 2 +-
tcg/tcg.c | 21 ++++++++++++----
util/cpuinfo-ppc.c | 6 ++---
tcg/i386/tcg-target.c.inc | 60 +++++++++++++++++++++++++++++++++-----------
tcg/ppc/tcg-target.c.inc | 3 ++-
5 files changed, 67 insertions(+), 25 deletions(-)
* [PULL 1/4] tcg/i386: convert add/sub of 128 to sub/add of -128
From: Richard Henderson @ 2024-01-10 21:52 UTC
To: qemu-devel; +Cc: Paolo Bonzini
From: Paolo Bonzini <pbonzini@redhat.com>
Extend the existing conditional that generates INC/DEC to also swap ADD
for SUB, and vice versa, when the immediate is 128. Since -128 fits in a
sign-extended 8-bit immediate and 128 does not, this facilitates using
OPC_ARITH_EvIb instead of OPC_ARITH_EvIz.
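The arithmetic behind this swap can be checked with a small standalone C
sketch. It is not QEMU code: fits_imm8, swap_add_sub_128 and exec32 are
hypothetical helpers, and ARITH_ADD/ARITH_SUB are assumed to be the x86
group-1 opcode extensions /0 and /5. It shows that 128 needs a 32-bit
immediate while -128 fits in imm8, that both forms give the same 32-bit
result, and that the carry flag comes out inverted, which is why the
patch applies the swap only when cf == 0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* x86 group-1 opcode extensions (the /digit in the ModRM reg field). */
enum { ARITH_ADD = 0, ARITH_SUB = 5 };

/* True if v fits the sign-extended imm8 used by OPC_ARITH_EvIb (0x83). */
static bool fits_imm8(int32_t v)
{
    return v >= INT8_MIN && v <= INT8_MAX;
}

/*
 * Rewrite ADD $128 as SUB $-128 (and SUB $128 as ADD $-128).
 * The 32-bit result is unchanged, but CF is inverted, so this is only
 * valid when the caller does not consume the carry flag.
 */
static void swap_add_sub_128(int *c, int32_t *val)
{
    assert(*c == ARITH_ADD || *c == ARITH_SUB);
    if (*val == 128) {
        *c ^= ARITH_ADD ^ ARITH_SUB;   /* toggles 0 <-> 5 */
        *val = -128;
    }
}

/* Model a 32-bit ADD/SUB with a sign-extended immediate, returning CF. */
static uint32_t exec32(int c, uint32_t x, int32_t val, bool *cf)
{
    uint32_t imm = (uint32_t)val;      /* sign-extended to 32 bits */
    if (c == ARITH_ADD) {
        uint32_t r = x + imm;
        *cf = r < x;                   /* carry out of bit 31 */
        return r;
    }
    *cf = x < imm;                     /* borrow for SUB */
    return x - imm;
}
```

The `c ^= ARITH_ADD ^ ARITH_SUB` idiom flips between the two opcode
extensions without a branch; it only works because the caller has already
established that c is one of the two.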
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231228120514.70205-1-pbonzini@redhat.com>
[rth: Use a switch on C]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/i386/tcg-target.c.inc | 49 +++++++++++++++++++++++++++------------
1 file changed, 34 insertions(+), 15 deletions(-)
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index a83f8aab30..29e80af78b 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -1316,23 +1316,41 @@ static void tgen_arithi(TCGContext *s, int c, int r0,
c &= 7;
}
- /* ??? While INC is 2 bytes shorter than ADDL $1, they also induce
- partial flags update stalls on Pentium4 and are not recommended
- by current Intel optimization manuals. */
- if (!cf && (c == ARITH_ADD || c == ARITH_SUB) && (val == 1 || val == -1)) {
- int is_inc = (c == ARITH_ADD) ^ (val < 0);
- if (TCG_TARGET_REG_BITS == 64) {
- /* The single-byte increment encodings are re-tasked as the
- REX prefixes. Use the MODRM encoding. */
- tcg_out_modrm(s, OPC_GRP5 + rexw,
- (is_inc ? EXT5_INC_Ev : EXT5_DEC_Ev), r0);
- } else {
- tcg_out8(s, (is_inc ? OPC_INC_r32 : OPC_DEC_r32) + r0);
+ switch (c) {
+ case ARITH_ADD:
+ case ARITH_SUB:
+ if (!cf) {
+ /*
+ * ??? While INC is 2 bytes shorter than ADDL $1, they also induce
+ * partial flags update stalls on Pentium4 and are not recommended
+ * by current Intel optimization manuals.
+ */
+ if (val == 1 || val == -1) {
+ int is_inc = (c == ARITH_ADD) ^ (val < 0);
+ if (TCG_TARGET_REG_BITS == 64) {
+ /*
+ * The single-byte increment encodings are re-tasked
+ * as the REX prefixes. Use the MODRM encoding.
+ */
+ tcg_out_modrm(s, OPC_GRP5 + rexw,
+ (is_inc ? EXT5_INC_Ev : EXT5_DEC_Ev), r0);
+ } else {
+ tcg_out8(s, (is_inc ? OPC_INC_r32 : OPC_DEC_r32) + r0);
+ }
+ return;
+ }
+ if (val == 128) {
+ /*
+ * Facilitate using an 8-bit immediate. Carry is inverted
+ * by this transformation, so do it only if cf == 0.
+ */
+ c ^= ARITH_ADD ^ ARITH_SUB;
+ val = -128;
+ }
}
- return;
- }
+ break;
- if (c == ARITH_AND) {
+ case ARITH_AND:
if (TCG_TARGET_REG_BITS == 64) {
if (val == 0xffffffffu) {
tcg_out_ext32u(s, r0, r0);
@@ -1351,6 +1369,7 @@ static void tgen_arithi(TCGContext *s, int c, int r0,
tcg_out_ext16u(s, r0, r0);
return;
}
+ break;
}
if (val == (int8_t)val) {
--
2.34.1
* [PULL 2/4] tcg/i386: use 8-bit OR or XOR for unsigned 8-bit immediates
From: Richard Henderson @ 2024-01-10 21:52 UTC
To: qemu-devel; +Cc: Paolo Bonzini
From: Paolo Bonzini <pbonzini@redhat.com>
When OR or XOR has an 8-bit immediate between 128 and 255, which cannot
be encoded as a sign-extended 8-bit immediate, we can operate on the
low-byte register and shorten the output by two or three bytes (two if a
prefix byte is needed for REX.B).
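A standalone C sketch of why the byte form is correct for these
immediates, and why the existing imm8 form is not. It is not QEMU code;
sext_imm8, or_low_byte and xor_low_byte are hypothetical helpers that
model only the register value, not the flags. An OR or XOR whose
immediate has bits 31..8 clear cannot change those bits, so operating on
the low byte gives the same result, whereas a sign-extended imm8 of
0x80..0xff would set bits 31..8. On 32-bit hosts only registers 0-3
(AL, CL, DL, BL) have an addressable low byte, because without a REX
prefix encodings 4-7 select AH, CH, DH and BH; this is what the patch's
r0 < 4 test accounts for.

```c
#include <assert.h>
#include <stdint.h>

/* Value that a sign-extended imm8 (OPC_ARITH_EvIb, 0x83) has at 32 bits. */
static uint32_t sext_imm8(uint8_t b)
{
    return b < 0x80 ? b : (uint32_t)b | 0xffffff00u;
}

/*
 * Effect of a byte-sized OR/XOR (OPC_ARITH_EbIb, 0x80) on a 32-bit
 * register: only bits 7..0 change, bits 31..8 are preserved.
 */
static uint32_t or_low_byte(uint32_t x, uint8_t imm)
{
    return (x & ~0xffu) | (uint8_t)(x | imm);
}

static uint32_t xor_low_byte(uint32_t x, uint8_t imm)
{
    return (x & ~0xffu) | (uint8_t)(x ^ imm);
}
```

For immediates 0..127 the existing sign-extended imm8 form already
applies, so the new case only needs to cover 0x80..0xff.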
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20231228120524.70239-1-pbonzini@redhat.com>
[rth: Incorporate into switch.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/i386/tcg-target.c.inc | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index 29e80af78b..d268199fc1 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -244,6 +244,7 @@ static bool tcg_target_const_match(int64_t val, TCGType type, int ct, int vece)
#define P_VEXL 0x80000 /* Set VEX.L = 1 */
#define P_EVEX 0x100000 /* Requires EVEX encoding */
+#define OPC_ARITH_EbIb (0x80)
#define OPC_ARITH_EvIz (0x81)
#define OPC_ARITH_EvIb (0x83)
#define OPC_ARITH_GvEv (0x03) /* ... plus (ARITH_FOO << 3) */
@@ -1370,6 +1371,16 @@ static void tgen_arithi(TCGContext *s, int c, int r0,
return;
}
break;
+
+ case ARITH_OR:
+ case ARITH_XOR:
+ if (val >= 0x80 && val <= 0xff
+ && (r0 < 4 || TCG_TARGET_REG_BITS == 64)) {
+ tcg_out_modrm(s, OPC_ARITH_EbIb + P_REXB_RM, c, r0);
+ tcg_out8(s, val);
+ return;
+ }
+ break;
}
if (val == (int8_t)val) {
--
2.34.1
* [PULL 3/4] tcg/ppc: Use new registers for LQ destination
From: Richard Henderson @ 2024-01-10 21:52 UTC
To: qemu-devel; +Cc: qemu-stable, Philippe Mathieu-Daudé
LQ requires that RTp != RA; otherwise it raises SIGILL.
Therefore, force the destination of INDEX_op_qemu_*_ld128 to be a
new register pair, so that it cannot overlap the input address.
This requires new support in process_op_defs and tcg_reg_alloc_op.
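The allocation change can be modelled in a few lines of C. This is an
illustrative sketch, not TCG's tcg_reg_alloc_pair: alloc_even_pair and
lq_operands_ok are hypothetical, registers are bits in a 32-bit mask,
and the high half of the pair is assumed to sit in the even register,
matching the `datalo & 1` and `datahi == datalo - 1` assertions in the
diff. Without the new-register constraint only already-allocated outputs
are excluded, so an address register whose value dies at this op can be
chosen as RTp; with it, inputs are excluded too, mirroring the
`i_allocated_regs | o_allocated_regs` mask in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* LQ operand rule: RTp (even, high half) must differ from base RA. */
static bool lq_operands_ok(int rtp, int ra)
{
    assert(rtp >= 0 && rtp < 32 && ra >= 0 && ra < 32);
    return (rtp & 1) == 0 && rtp != ra;
}

/*
 * Hypothetical pair allocator: return the lowest even r such that r and
 * r + 1 are both in 'allowed' and neither is in 'busy', or -1 if none.
 */
static int alloc_even_pair(uint32_t allowed, uint32_t busy)
{
    for (int r = 0; r < 32; r += 2) {
        uint32_t pair = 3u << r;
        if ((allowed & pair) == pair && (busy & pair) == 0) {
            return r;
        }
    }
    return -1;
}
```

For example, with allowed registers r4..r7 and the address in r4,
allocating against outputs only yields r4/r5 (an invalid LQ), while
also excluding inputs yields r6/r7.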
Cc: qemu-stable@nongnu.org
Fixes: 526cd4ec01f ("tcg/ppc: Support 128-bit load/store")
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240102013456.131846-1-richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/ppc/tcg-target-con-set.h | 2 +-
tcg/tcg.c | 21 ++++++++++++++++-----
tcg/ppc/tcg-target.c.inc | 3 ++-
3 files changed, 19 insertions(+), 7 deletions(-)
diff --git a/tcg/ppc/tcg-target-con-set.h b/tcg/ppc/tcg-target-con-set.h
index bbd7b21247..cb47b29452 100644
--- a/tcg/ppc/tcg-target-con-set.h
+++ b/tcg/ppc/tcg-target-con-set.h
@@ -35,7 +35,7 @@ C_O1_I3(v, v, v, v)
C_O1_I4(r, r, ri, rZ, rZ)
C_O1_I4(r, r, r, ri, ri)
C_O2_I1(r, r, r)
-C_O2_I1(o, m, r)
+C_N1O1_I1(o, m, r)
C_O2_I2(r, r, r, r)
C_O2_I4(r, r, rI, rZM, r, r)
C_O2_I4(r, r, r, r, rI, rZM)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 896a36caeb..e2c38f6d11 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -653,6 +653,7 @@ static void tcg_out_movext3(TCGContext *s, const TCGMovExtend *i1,
#define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4),
#define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2),
+#define C_N1O1_I1(O1, O2, I1) C_PFX3(c_n1o1_i1_, O1, O2, I1),
#define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1),
#define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1),
@@ -676,6 +677,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
#undef C_O1_I3
#undef C_O1_I4
#undef C_N1_I2
+#undef C_N1O1_I1
#undef C_N2_I1
#undef C_O2_I1
#undef C_O2_I2
@@ -696,6 +698,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
#define C_O1_I4(O1, I1, I2, I3, I4) { .args_ct_str = { #O1, #I1, #I2, #I3, #I4 } },
#define C_N1_I2(O1, I1, I2) { .args_ct_str = { "&" #O1, #I1, #I2 } },
+#define C_N1O1_I1(O1, O2, I1) { .args_ct_str = { "&" #O1, #O2, #I1 } },
#define C_N2_I1(O1, O2, I1) { .args_ct_str = { "&" #O1, "&" #O2, #I1 } },
#define C_O2_I1(O1, O2, I1) { .args_ct_str = { #O1, #O2, #I1 } },
@@ -718,6 +721,7 @@ static const TCGTargetOpDef constraint_sets[] = {
#undef C_O1_I3
#undef C_O1_I4
#undef C_N1_I2
+#undef C_N1O1_I1
#undef C_N2_I1
#undef C_O2_I1
#undef C_O2_I2
@@ -738,6 +742,7 @@ static const TCGTargetOpDef constraint_sets[] = {
#define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4)
#define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2)
+#define C_N1O1_I1(O1, O2, I1) C_PFX3(c_n1o1_i1_, O1, O2, I1)
#define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1)
#define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1)
@@ -2988,6 +2993,7 @@ static void process_op_defs(TCGContext *s)
.pair = 2,
.pair_index = o,
.regs = def->args_ct[o].regs << 1,
+ .newreg = def->args_ct[o].newreg,
};
def->args_ct[o].pair = 1;
def->args_ct[o].pair_index = i;
@@ -3004,6 +3010,7 @@ static void process_op_defs(TCGContext *s)
.pair = 1,
.pair_index = o,
.regs = def->args_ct[o].regs >> 1,
+ .newreg = def->args_ct[o].newreg,
};
def->args_ct[o].pair = 2;
def->args_ct[o].pair_index = i;
@@ -5036,17 +5043,21 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
break;
case 1: /* first of pair */
- tcg_debug_assert(!arg_ct->newreg);
if (arg_ct->oalias) {
reg = new_args[arg_ct->alias_index];
- break;
+ } else if (arg_ct->newreg) {
+ reg = tcg_reg_alloc_pair(s, arg_ct->regs,
+ i_allocated_regs | o_allocated_regs,
+ output_pref(op, k),
+ ts->indirect_base);
+ } else {
+ reg = tcg_reg_alloc_pair(s, arg_ct->regs, o_allocated_regs,
+ output_pref(op, k),
+ ts->indirect_base);
}
- reg = tcg_reg_alloc_pair(s, arg_ct->regs, o_allocated_regs,
- output_pref(op, k), ts->indirect_base);
break;
case 2: /* second of pair */
- tcg_debug_assert(!arg_ct->newreg);
if (arg_ct->oalias) {
reg = new_args[arg_ct->alias_index];
} else {
diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index 856c3b18f5..54816967bc 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2595,6 +2595,7 @@ static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg datalo, TCGReg datahi,
tcg_debug_assert(!need_bswap);
tcg_debug_assert(datalo & 1);
tcg_debug_assert(datahi == datalo - 1);
+ tcg_debug_assert(!is_ld || datahi != index);
insn = is_ld ? LQ : STQ;
tcg_out32(s, insn | TAI(datahi, index, 0));
} else {
@@ -4071,7 +4072,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_qemu_ld_a32_i128:
case INDEX_op_qemu_ld_a64_i128:
- return C_O2_I1(o, m, r);
+ return C_N1O1_I1(o, m, r);
case INDEX_op_qemu_st_a32_i128:
case INDEX_op_qemu_st_a64_i128:
return C_O0_I3(o, m, r);
--
2.34.1
* [PULL 4/4] util: fix build with musl libc on ppc64le
From: Richard Henderson @ 2024-01-10 21:52 UTC
To: qemu-devel; +Cc: Natanael Copa, qemu-stable
From: Natanael Copa <ncopa@alpinelinux.org>
Use PPC_FEATURE2_ISEL and PPC_FEATURE2_VEC_CRYPTO from the Linux headers
instead of the GNU-specific PPC_FEATURE2_HAS_ISEL and
PPC_FEATURE2_HAS_VEC_CRYPTO. This fixes the build with musl libc.
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1861
Signed-off-by: Natanael Copa <ncopa@alpinelinux.org>
Fixes: 63922f467a ("tcg/ppc: Replace HAVE_ISEL macro with a variable")
Fixes: 68f340d4cd ("tcg/ppc: Enable Altivec detection")
Message-Id: <20231219105236.7059-1-ncopa@alpinelinux.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
util/cpuinfo-ppc.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/util/cpuinfo-ppc.c b/util/cpuinfo-ppc.c
index 1ea3db0ac8..b2d8893a06 100644
--- a/util/cpuinfo-ppc.c
+++ b/util/cpuinfo-ppc.c
@@ -6,10 +6,10 @@
#include "qemu/osdep.h"
#include "host/cpuinfo.h"
+#include <asm/cputable.h>
#ifdef CONFIG_GETAUXVAL
# include <sys/auxv.h>
#else
-# include <asm/cputable.h>
# include "elf.h"
#endif
@@ -40,7 +40,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
info |= CPUINFO_V2_06;
}
- if (hwcap2 & PPC_FEATURE2_HAS_ISEL) {
+ if (hwcap2 & PPC_FEATURE2_ISEL) {
info |= CPUINFO_ISEL;
}
if (hwcap & PPC_FEATURE_HAS_ALTIVEC) {
@@ -53,7 +53,7 @@ unsigned __attribute__((constructor)) cpuinfo_init(void)
* always have both anyway, since VSX came with Power7
* and crypto came with Power8.
*/
- if (hwcap2 & PPC_FEATURE2_HAS_VEC_CRYPTO) {
+ if (hwcap2 & PPC_FEATURE2_VEC_CRYPTO) {
info |= CPUINFO_CRYPTO;
}
}
--
2.34.1
* Re: [PULL 0/4] tcg patch queue
From: Peter Maydell @ 2024-01-11 15:16 UTC
To: Richard Henderson; +Cc: qemu-devel
On Wed, 10 Jan 2024 at 21:52, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The following changes since commit 34eac35f893664eb8545b98142e23d9954722766:
>
> Merge tag 'pull-riscv-to-apply-20240110' of https://github.com/alistair23/qemu into staging (2024-01-10 11:41:56 +0000)
>
> are available in the Git repository at:
>
> https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20240111
>
> for you to fetch changes up to 1d513e06d96697f44de4a1b85c6ff627c443e306:
>
> util: fix build with musl libc on ppc64le (2024-01-11 08:48:16 +1100)
>
> ----------------------------------------------------------------
> tcg/i386: Use more 8-bit immediate forms for add, sub, or, xor
> tcg/ppc: Use new registers for LQ destination
> util: fix build with musl libc on ppc64le
>
Applied, thanks.
Please update the changelog at https://wiki.qemu.org/ChangeLog/9.0
for any user-visible changes.
-- PMM