* [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0
@ 2025-12-10 13:16 Paolo Bonzini
2025-12-10 13:16 ` [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction Paolo Bonzini
` (17 more replies)
0 siblings, 18 replies; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
This notably includes the last patches from the original set that implemented
the new decoder (cleaning up the x87 decoder a bit), more removal of temporaries,
and more size reduction for CC computation helpers. On top of that there are a
few simplifications, fixes and optimizations.
The diffstat is large but most of it is moving code around.
Paolo
Paolo Bonzini (18):
target/i386/tcg: fix check for invalid VSIB instruction
target/i386/tcg: ignore V3 in 32-bit mode
target/i386/tcg: update cc_op after PUSHF
target/i386/tcg: mark more instructions that are invalid in 64-bit mode
target/i386/tcg: do not compute all flags for SAHF
target/i386/tcg: remove do_decode_0F
target/i386/tcg: move and expand misplaced comment
target/i386/tcg: simplify effective address calculation
target/i386/tcg: unnest switch statements in disas_insn_x87
target/i386/tcg: move fcom/fcomp differentiation to gen_helper_fp_arith_ST0_FT0
target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for fcom STn and fcomp STn
target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for undocumented fcom/fcomp variants
target/i386/tcg: unify more pop/no-pop x87 instructions
target/i386/tcg: kill tmp1_i64
target/i386/tcg: kill tmp2_i32
target/i386/tcg: commonize code to compute SF/ZF/PF
target/i386/tcg: add a CCOp for SBB x,x
target/i386/tcg: move fetch code out of translate.c
target/i386/cpu.h | 17 +-
target/i386/tcg/decode-new.h | 3 +
target/i386/tcg/cc_helper_template.h.inc | 112 +--
target/i386/cpu-dump.c | 2 +
target/i386/tcg/cc_helper.c | 280 +++++---
target/i386/tcg/translate.c | 824 ++++++++---------------
target/i386/tcg/decode-new.c.inc | 328 ++++++++-
target/i386/tcg/emit.c.inc | 109 ++-
8 files changed, 845 insertions(+), 830 deletions(-)
--
2.52.0
* [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 15:47 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 02/18] target/i386/tcg: ignore V3 in 32-bit mode Paolo Bonzini
` (16 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable
VSIB instructions (VEX class 12) must not have an address prefix.
Checking s->aflag == MO_16 is not enough because in 64-bit mode
the address prefix changes aflag to MO_32. Add a specific check
bit instead.
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.h | 3 +++
target/i386/tcg/decode-new.c.inc | 27 +++++++++++++--------------
2 files changed, 16 insertions(+), 14 deletions(-)
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index 7f23d373ea7..38882b5c6ab 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -181,6 +181,9 @@ typedef enum X86InsnCheck {
/* Vendor-specific checks for Intel/AMD differences */
X86_CHECK_i64_amd = 2048,
X86_CHECK_o64_intel = 4096,
+
+ /* No 0x67 prefix allowed */
+ X86_CHECK_no_adr = 8192,
} X86InsnCheck;
typedef enum X86InsnSpecial {
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 0f8c5d16938..0b85b0f6513 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -623,10 +623,10 @@ static const X86OpEntry opcodes_0F38_00toEF[240] = {
[0x46] = X86_OP_ENTRY3(VPSRAV, V,x, H,x, W,x, vex6 chk(W0) cpuid(AVX2) p_66),
[0x47] = X86_OP_ENTRY3(VPSLLV, V,x, H,x, W,x, vex6 cpuid(AVX2) p_66),
- [0x90] = X86_OP_ENTRY3(VPGATHERD, V,x, H,x, M,d, vex12 cpuid(AVX2) p_66), /* vpgatherdd/q */
- [0x91] = X86_OP_ENTRY3(VPGATHERQ, V,x, H,x, M,q, vex12 cpuid(AVX2) p_66), /* vpgatherqd/q */
- [0x92] = X86_OP_ENTRY3(VPGATHERD, V,x, H,x, M,d, vex12 cpuid(AVX2) p_66), /* vgatherdps/d */
- [0x93] = X86_OP_ENTRY3(VPGATHERQ, V,x, H,x, M,q, vex12 cpuid(AVX2) p_66), /* vgatherqps/d */
+ [0x90] = X86_OP_ENTRY3(VPGATHERD, V,x, H,x, M,d, vex12 chk(no_adr) cpuid(AVX2) p_66), /* vpgatherdd/q */
+ [0x91] = X86_OP_ENTRY3(VPGATHERQ, V,x, H,x, M,q, vex12 chk(no_adr) cpuid(AVX2) p_66), /* vpgatherqd/q */
+ [0x92] = X86_OP_ENTRY3(VPGATHERD, V,x, H,x, M,d, vex12 chk(no_adr) cpuid(AVX2) p_66), /* vgatherdps/d */
+ [0x93] = X86_OP_ENTRY3(VPGATHERQ, V,x, H,x, M,q, vex12 chk(no_adr) cpuid(AVX2) p_66), /* vgatherqps/d */
/* Should be exception type 2 but they do not have legacy SSE equivalents? */
[0x96] = X86_OP_ENTRY3(VFMADDSUB132Px, V,x, H,x, W,x, vex6 cpuid(FMA) p_66),
@@ -2435,8 +2435,8 @@ static bool validate_vex(DisasContext *s, X86DecodedInsn *decode)
break;
case 12:
/* Must have a VSIB byte and no address prefix. */
- assert(s->has_modrm);
- if ((s->modrm & 7) != 4 || s->aflag == MO_16) {
+ assert(s->has_modrm && (decode->e.check & X86_CHECK_no_adr));
+ if ((s->modrm & 7) != 4) {
goto illegal;
}
@@ -2740,15 +2740,14 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
goto illegal_op;
}
}
- if (decode.e.check & X86_CHECK_prot_or_vm86) {
- if (!PE(s)) {
- goto illegal_op;
- }
+ if ((decode.e.check & X86_CHECK_prot_or_vm86) && !PE(s)) {
+ goto illegal_op;
}
- if (decode.e.check & X86_CHECK_no_vm86) {
- if (VM86(s)) {
- goto illegal_op;
- }
+ if ((decode.e.check & X86_CHECK_no_vm86) && VM86(s)) {
+ goto illegal_op;
+ }
+ if ((decode.e.check & X86_CHECK_no_adr) && (s->prefix & PREFIX_ADR)) {
+ goto illegal_op;
}
}
--
2.52.0
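A minimal sketch of why the old test in the patch above fell short, assuming the usual x86 rule that a 0x67 prefix selects 16-bit addressing in a 32-bit code segment and 32-bit addressing in 64-bit mode. The names `AddrSize`, `CpuMode`, `effective_aflag`, `old_check_rejects` and `new_check_rejects` are hypothetical and only model the logic; they are not QEMU identifiers, and 16-bit code segments are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the MO_16/MO_32/MO_64 address sizes. */
typedef enum { ADDR_16, ADDR_32, ADDR_64 } AddrSize;
typedef enum { MODE_32, MODE_64 } CpuMode;

/* Address size after an optional 0x67 prefix (32- and 64-bit modes only). */
static inline AddrSize effective_aflag(CpuMode mode, bool has_adr_prefix)
{
    if (mode == MODE_64) {
        return has_adr_prefix ? ADDR_32 : ADDR_64;
    }
    return has_adr_prefix ? ADDR_16 : ADDR_32;
}

/* Old test: reject only when the effective address size is 16 bits. */
static inline bool old_check_rejects(CpuMode mode, bool has_adr_prefix)
{
    return effective_aflag(mode, has_adr_prefix) == ADDR_16;
}

/* New test: reject whenever the prefix itself is present. */
static inline bool new_check_rejects(CpuMode mode, bool has_adr_prefix)
{
    (void)mode;
    return has_adr_prefix;
}

/* Example checks; call from a test harness or from main(). */
static inline void vsib_check_examples(void)
{
    assert(old_check_rejects(MODE_32, true));   /* caught by both tests */
    assert(!old_check_rejects(MODE_64, true));  /* missed by the old test */
    assert(new_check_rejects(MODE_64, true));   /* caught by the new test */
}
```

In this model the only case where the two tests disagree is a prefixed instruction in 64-bit mode, which is the case the commit message describes.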
* [PATCH 02/18] target/i386/tcg: ignore V3 in 32-bit mode
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
2025-12-10 13:16 ` [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 15:52 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 03/18] target/i386/tcg: update cc_op after PUSHF Paolo Bonzini
` (15 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable
From the manual: "In 64-bit mode all 4 bits may be used. [...]
In 32-bit and 16-bit modes bit 6 must be 1 (if bit 6 is not 1, the
2-byte VEX version will generate LDS instruction and the 3-byte VEX
version will ignore this bit)."
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 0b85b0f6513..c9b4d5ffa32 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -2665,7 +2665,7 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
goto unknown_op;
}
}
- s->vex_v = (~vex3 >> 3) & 0xf;
+ s->vex_v = (~vex3 >> 3) & (CODE64(s) ? 15 : 7);
s->vex_l = (vex3 >> 2) & 1;
s->prefix |= pp_prefix[vex3 & 3] | PREFIX_VEX;
}
--
2.52.0
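The one-line change above can be illustrated in isolation. This is a sketch, not QEMU code: `vex_vvvv` is a hypothetical helper, and `code64` stands in for the `CODE64(s)` test. It assumes, as the diff does, that the register specifier is stored inverted in bits 6..3 of the VEX byte held in `vex3`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Extract the inverted register specifier (vvvv) from the VEX byte.
 * Outside 64-bit mode the top bit of the field (bit 6) is ignored,
 * so only three bits of the specifier are kept.
 */
static inline unsigned vex_vvvv(uint8_t vex3, bool code64)
{
    unsigned mask = code64 ? 0xf : 0x7;

    return ((unsigned)~vex3 >> 3) & mask;
}

/* Example checks; call from a test harness or from main(). */
static inline void vex_vvvv_examples(void)
{
    assert(vex_vvvv(0x78, true) == 0);   /* all vvvv bits set: register 0 */
    assert(vex_vvvv(0x38, true) == 8);   /* bit 6 clear: register 8 */
    assert(vex_vvvv(0x38, false) == 0);  /* same byte, bit 6 ignored */
}
```

With the old `0xf` mask the last case would have selected register 8, which does not exist outside 64-bit mode.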
* [PATCH 03/18] target/i386/tcg: update cc_op after PUSHF
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
2025-12-10 13:16 ` [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction Paolo Bonzini
2025-12-10 13:16 ` [PATCH 02/18] target/i386/tcg: ignore V3 in 32-bit mode Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 15:55 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 04/18] target/i386/tcg: mark more instructions that are invalid in 64-bit mode Paolo Bonzini
` (14 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
PUSHF needs to compute the full eflags; once that is done, set cc_op to
CC_OP_EFLAGS.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/emit.c.inc | 2 ++
1 file changed, 2 insertions(+)
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 1a7fab9333a..22e53f5b000 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -3250,6 +3250,8 @@ static void gen_PUSHF(DisasContext *s, X86DecodedInsn *decode)
gen_update_cc_op(s);
gen_helper_read_eflags(s->T0, tcg_env);
gen_push_v(s, s->T0);
+ decode->cc_src = s->T0;
+ decode->cc_op = CC_OP_EFLAGS;
}
static MemOp gen_shift_count(DisasContext *s, X86DecodedInsn *decode,
--
2.52.0
* [PATCH 04/18] target/i386/tcg: mark more instructions that are invalid in 64-bit mode
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (2 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 03/18] target/i386/tcg: update cc_op after PUSHF Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 15:59 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 05/18] target/i386/tcg: do not compute all flags for SAHF Paolo Bonzini
` (13 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index c9b4d5ffa32..213dbb9637c 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -1698,9 +1698,9 @@ static const X86OpEntry opcodes_root[256] = {
[0xD1] = X86_OP_GROUP1(group2, E,v),
[0xD2] = X86_OP_GROUP2(group2, E,b, 1,b), /* CL */
[0xD3] = X86_OP_GROUP2(group2, E,v, 1,b), /* CL */
- [0xD4] = X86_OP_ENTRY2(AAM, 0,w, I,b),
- [0xD5] = X86_OP_ENTRY2(AAD, 0,w, I,b),
- [0xD6] = X86_OP_ENTRYw(SALC, 0,b),
+ [0xD4] = X86_OP_ENTRY2(AAM, 0,w, I,b, chk(i64)),
+ [0xD5] = X86_OP_ENTRY2(AAD, 0,w, I,b, chk(i64)),
+ [0xD6] = X86_OP_ENTRYw(SALC, 0,b, chk(i64)),
[0xD7] = X86_OP_ENTRY1(XLAT, 0,b, zextT0), /* AL read/written */
[0xE0] = X86_OP_ENTRYr(LOOPNE, J,b), /* implicit: CX with aflag size */
@@ -1834,7 +1834,7 @@ static const X86OpEntry opcodes_root[256] = {
[0xCB] = X86_OP_ENTRY0(RETF),
[0xCC] = X86_OP_ENTRY0(INT3),
[0xCD] = X86_OP_ENTRYr(INT, I,b, chk(vm86_iopl)),
- [0xCE] = X86_OP_ENTRY0(INTO),
+ [0xCE] = X86_OP_ENTRY0(INTO, chk(i64)),
[0xCF] = X86_OP_ENTRY0(IRET, chk(vm86_iopl) svm(IRET)),
/*
--
2.52.0
* [PATCH 05/18] target/i386/tcg: do not compute all flags for SAHF
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (3 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 04/18] target/i386/tcg: mark more instructions that are invalid in 64-bit mode Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 16:03 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 06/18] target/i386/tcg: remove do_decode_0F Paolo Bonzini
` (12 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
Only OF is needed; the other flags are overwritten.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/emit.c.inc | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 22e53f5b000..131aefce53c 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -3778,7 +3778,7 @@ static void gen_SAHF(DisasContext *s, X86DecodedInsn *decode)
return gen_illegal_opcode(s);
}
tcg_gen_shri_tl(s->T0, cpu_regs[R_EAX], 8);
- gen_compute_eflags(s);
+ gen_neg_setcc(s, JCC_O << 1, cpu_cc_src);
tcg_gen_andi_tl(cpu_cc_src, cpu_cc_src, CC_O);
tcg_gen_andi_tl(s->T0, s->T0, CC_S | CC_Z | CC_A | CC_P | CC_C);
tcg_gen_or_tl(cpu_cc_src, cpu_cc_src, s->T0);
--
2.52.0
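The reasoning behind the patch above can be checked with plain integer arithmetic. The sketch below is an assumption-labelled model, not the TCG code: `sahf_flags` is a hypothetical function, the `FLAG_*` constants are the architectural EFLAGS bit values, and only the arithmetic flag bits are modelled.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural EFLAGS bit values for the arithmetic flags. */
enum {
    FLAG_C = 0x0001, FLAG_P = 0x0004, FLAG_A = 0x0010,
    FLAG_Z = 0x0040, FLAG_S = 0x0080, FLAG_O = 0x0800,
};

/* SAHF: take SF/ZF/AF/PF/CF from AH and keep only OF from the old flags. */
static inline uint32_t sahf_flags(uint32_t old_flags, uint8_t ah)
{
    uint32_t from_ah = ah & (FLAG_S | FLAG_Z | FLAG_A | FLAG_P | FLAG_C);

    return (old_flags & FLAG_O) | from_ah;
}

/* Example checks; call from a test harness or from main(). */
static inline void sahf_flags_examples(void)
{
    /* Every old flag except OF is discarded. */
    assert(sahf_flags(0x08d5, 0x00) == FLAG_O);
    /* Reserved bits of AH (1, 3 and 5) do not reach the result. */
    assert(sahf_flags(0x0000, 0xff) == 0x00d5);
}
```

Since the old value contributes nothing but OF, computing the other five flags before the instruction is wasted work, which is what the patch avoids.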
* [PATCH 06/18] target/i386/tcg: remove do_decode_0F
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (4 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 05/18] target/i386/tcg: do not compute all flags for SAHF Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 16:03 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 07/18] target/i386/tcg: move and expand misplaced comment Paolo Bonzini
` (11 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
It is not needed anymore since all prefixes are handled by the
new decoder.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 7 +------
1 file changed, 1 insertion(+), 6 deletions(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 213dbb9637c..ea8e26f7f98 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -1430,15 +1430,10 @@ static const X86OpEntry opcodes_0F[256] = {
[0xff] = X86_OP_ENTRYr(UD, nop,v), /* UD0 */
};
-static void do_decode_0F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
-{
- *entry = opcodes_0F[*b];
-}
-
static void decode_0F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
{
*b = x86_ldub_code(env, s);
- do_decode_0F(s, env, entry, b);
+ *entry = opcodes_0F[*b];
}
static void decode_63(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
--
2.52.0
* [PATCH 07/18] target/i386/tcg: move and expand misplaced comment
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (5 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 06/18] target/i386/tcg: remove do_decode_0F Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 16:04 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 08/18] target/i386/tcg: simplify effective address calculation Paolo Bonzini
` (10 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index ea8e26f7f98..9d17bae7e75 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -1878,16 +1878,11 @@ static const X86OpEntry opcodes_root[256] = {
#undef vex12
#undef vex13
-/*
- * Decode the fixed part of the opcode and place the last
- * in b.
- */
static void decode_root(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
{
*entry = opcodes_root[*b];
}
-
static int decode_modrm(DisasContext *s, CPUX86State *env,
X86DecodedInsn *decode, X86DecodedOp *op)
{
@@ -2222,6 +2217,10 @@ static bool decode_insn(DisasContext *s, CPUX86State *env, X86DecodeFunc decode_
{
X86OpEntry *e = &decode->e;
+ /*
+ * Each step decodes part of the opcode and place the last not-fully-decoded
+ * byte in decode->b. If the modrm byte is read, it is placed in s->modrm.
+ */
decode_func(s, env, e, &decode->b);
while (e->is_decode) {
e->is_decode = false;
--
2.52.0
* [PATCH 08/18] target/i386/tcg: simplify effective address calculation
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (6 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 07/18] target/i386/tcg: move and expand misplaced comment Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 16:15 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 09/18] target/i386/tcg: unnest switch statements in disas_insn_x87 Paolo Bonzini
` (9 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
Split gen_lea_v_seg_dest into three simple phases (extend from
16 bits, add, final extend), with optimization for known-zero bases
to avoid back-to-back extensions.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 64 ++++++++++++-------------------------
1 file changed, 20 insertions(+), 44 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 0cb87d02012..2ab3c2ac663 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -627,54 +627,30 @@ static TCGv eip_cur_tl(DisasContext *s)
static void gen_lea_v_seg_dest(DisasContext *s, MemOp aflag, TCGv dest, TCGv a0,
int def_seg, int ovr_seg)
{
- switch (aflag) {
-#ifdef TARGET_X86_64
- case MO_64:
- if (ovr_seg < 0) {
- tcg_gen_mov_tl(dest, a0);
- return;
+ int easize;
+ bool has_base;
+
+ if (ovr_seg < 0) {
+ ovr_seg = def_seg;
+ }
+
+ has_base = ovr_seg >= 0 && (ADDSEG(s) || ovr_seg >= R_FS);
+ easize = CODE64(s) ? MO_64 : MO_32;
+
+ if (has_base) {
+ if (aflag < easize) {
+ /* Truncate before summing base. */
+ tcg_gen_ext_tl(dest, a0, aflag);
+ a0 = dest;
}
- break;
-#endif
- case MO_32:
- /* 32 bit address */
- if (ovr_seg < 0 && ADDSEG(s)) {
- ovr_seg = def_seg;
- }
- if (ovr_seg < 0) {
- tcg_gen_ext32u_tl(dest, a0);
- return;
- }
- break;
- case MO_16:
- /* 16 bit address */
- tcg_gen_ext16u_tl(dest, a0);
+ tcg_gen_add_tl(dest, a0, cpu_seg_base[ovr_seg]);
a0 = dest;
- if (ovr_seg < 0) {
- if (ADDSEG(s)) {
- ovr_seg = def_seg;
- } else {
- return;
- }
- }
- break;
- default:
- g_assert_not_reached();
+ } else {
+ /* Possibly one extension, but that's it. */
+ easize = aflag;
}
- if (ovr_seg >= 0) {
- TCGv seg = cpu_seg_base[ovr_seg];
-
- if (aflag == MO_64) {
- tcg_gen_add_tl(dest, a0, seg);
- } else if (CODE64(s)) {
- tcg_gen_ext32u_tl(dest, a0);
- tcg_gen_add_tl(dest, dest, seg);
- } else {
- tcg_gen_add_tl(dest, a0, seg);
- tcg_gen_ext32u_tl(dest, dest);
- }
- }
+ tcg_gen_ext_tl(dest, a0, easize);
}
static void gen_lea_v_seg(DisasContext *s, TCGv a0,
--
2.52.0
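The three phases of the rewritten gen_lea_v_seg_dest can be modelled on plain integers. This is a hedged sketch that mirrors the control flow of the diff above, not the TCG code itself: `zext` and `linear_address` are hypothetical names, `abits` is the address size in bits, and `has_base` stands for the result of the segment-base test in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Zero-extend the low `bits` bits of value (bits is 16, 32 or 64). */
static inline uint64_t zext(uint64_t value, unsigned bits)
{
    return bits >= 64 ? value : value & ((UINT64_C(1) << bits) - 1);
}

/* Integer model of: truncate, add the segment base, final extension. */
static inline uint64_t linear_address(uint64_t a0, unsigned abits, bool code64,
                                      bool has_base, uint64_t seg_base)
{
    unsigned easize = code64 ? 64 : 32;

    if (has_base) {
        if (abits < easize) {
            a0 = zext(a0, abits);   /* phase 1: truncate before adding */
        }
        a0 += seg_base;             /* phase 2: add the segment base */
    } else {
        easize = abits;             /* no base: one extension is enough */
    }
    return zext(a0, easize);        /* phase 3: final extension */
}

/* Example checks; call from a test harness or from main(). */
static inline void linear_address_examples(void)
{
    /* 16-bit offset wraps before the base is added. */
    assert(linear_address(0x12345, 16, false, true, 0x10000) == 0x12345);
    /* 32-bit mode: the sum wraps at 4 GiB. */
    assert(linear_address(0xffffffff, 32, false, true, 0x10) == 0xf);
    /* 64-bit mode, 32-bit address: truncate first, then add without wrap. */
    assert(linear_address(UINT64_C(0x1ffffffff), 32, true, true, 0x1000)
           == UINT64_C(0x100000fff));
}
```

When no base is added the first and last extensions would be back to back, so the model, like the patch, collapses them into a single extension to the address size.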
* [PATCH 09/18] target/i386/tcg: unnest switch statements in disas_insn_x87
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (7 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 08/18] target/i386/tcg: simplify effective address calculation Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 16:20 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 10/18] target/i386/tcg: move fcom/fcomp differentiation to gen_helper_fp_arith_ST0_FT0 Paolo Bonzini
` (8 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 290 +++++++++++++++++-------------------
1 file changed, 134 insertions(+), 156 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 2ab3c2ac663..c755329b3d9 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2457,36 +2457,32 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
switch (op) {
case 0x00 ... 0x07: /* fxxxs */
- case 0x10 ... 0x17: /* fixxxl */
- case 0x20 ... 0x27: /* fxxxl */
- case 0x30 ... 0x37: /* fixxx */
- {
- int op1;
- op1 = op & 7;
+ tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LEUL);
+ gen_helper_flds_FT0(tcg_env, s->tmp2_i32);
+ goto fp_arith_ST0_FT0;
- switch (op >> 4) {
- case 0:
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LEUL);
- gen_helper_flds_FT0(tcg_env, s->tmp2_i32);
- break;
- case 1:
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LEUL);
- gen_helper_fildl_FT0(tcg_env, s->tmp2_i32);
- break;
- case 2:
- tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0,
- s->mem_index, MO_LEUQ);
- gen_helper_fldl_FT0(tcg_env, s->tmp1_i64);
- break;
- case 3:
- default:
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LESW);
- gen_helper_fildl_FT0(tcg_env, s->tmp2_i32);
- break;
- }
+ case 0x10 ... 0x17: /* fixxxl */
+ tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LEUL);
+ gen_helper_fildl_FT0(tcg_env, s->tmp2_i32);
+ goto fp_arith_ST0_FT0;
+
+ case 0x20 ... 0x27: /* fxxxl */
+ tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0,
+ s->mem_index, MO_LEUQ);
+ gen_helper_fldl_FT0(tcg_env, s->tmp1_i64);
+ goto fp_arith_ST0_FT0;
+
+ case 0x30 ... 0x37: /* fixxx */
+ tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LESW);
+ gen_helper_fildl_FT0(tcg_env, s->tmp2_i32);
+ goto fp_arith_ST0_FT0;
+
+fp_arith_ST0_FT0:
+ {
+ int op1 = op & 7;
gen_helper_fp_arith_ST0_FT0(op1);
if (op1 == 3) {
@@ -2495,88 +2491,78 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
}
}
break;
+
case 0x08: /* flds */
- case 0x0a: /* fsts */
- case 0x0b: /* fstps */
- case 0x18 ... 0x1b: /* fildl, fisttpl, fistl, fistpl */
- case 0x28 ... 0x2b: /* fldl, fisttpll, fstl, fstpl */
- case 0x38 ... 0x3b: /* filds, fisttps, fists, fistps */
- switch (op & 7) {
- case 0:
- switch (op >> 4) {
- case 0:
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LEUL);
- gen_helper_flds_ST0(tcg_env, s->tmp2_i32);
- break;
- case 1:
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LEUL);
- gen_helper_fildl_ST0(tcg_env, s->tmp2_i32);
- break;
- case 2:
- tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0,
- s->mem_index, MO_LEUQ);
- gen_helper_fldl_ST0(tcg_env, s->tmp1_i64);
- break;
- case 3:
- default:
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LESW);
- gen_helper_fildl_ST0(tcg_env, s->tmp2_i32);
- break;
- }
- break;
- case 1:
- /* XXX: the corresponding CPUID bit must be tested ! */
- switch (op >> 4) {
- case 1:
- gen_helper_fisttl_ST0(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LEUL);
- break;
- case 2:
- gen_helper_fisttll_ST0(s->tmp1_i64, tcg_env);
- tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0,
- s->mem_index, MO_LEUQ);
- break;
- case 3:
- default:
- gen_helper_fistt_ST0(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LEUW);
- break;
- }
+ tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LEUL);
+ gen_helper_flds_ST0(tcg_env, s->tmp2_i32);
+ break;
+ case 0x18: /* fildl */
+ tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LEUL);
+ gen_helper_fildl_ST0(tcg_env, s->tmp2_i32);
+ break;
+ case 0x28: /* fldl */
+ tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0,
+ s->mem_index, MO_LEUQ);
+ gen_helper_fldl_ST0(tcg_env, s->tmp1_i64);
+ break;
+ case 0x38: /* filds */
+ tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LESW);
+ gen_helper_fildl_ST0(tcg_env, s->tmp2_i32);
+ break;
+
+ case 0x19: /* fisttpl */
+ gen_helper_fisttl_ST0(s->tmp2_i32, tcg_env);
+ tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LEUL);
+ gen_helper_fpop(tcg_env);
+ break;
+ case 0x29: /* fisttpll */
+ gen_helper_fisttll_ST0(s->tmp1_i64, tcg_env);
+ tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0,
+ s->mem_index, MO_LEUQ);
+ gen_helper_fpop(tcg_env);
+ break;
+ case 0x39: /* fisttps */
+ gen_helper_fistt_ST0(s->tmp2_i32, tcg_env);
+ tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LEUW);
+ gen_helper_fpop(tcg_env);
+ break;
+
+ case 0x0a: case 0x0b: /* fsts, fstps */
+ gen_helper_fsts_ST0(s->tmp2_i32, tcg_env);
+ tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LEUL);
+ if ((op & 7) == 3) {
+ gen_helper_fpop(tcg_env);
+ }
+ break;
+ case 0x1a: case 0x1b: /* fistl, fistpl */
+ gen_helper_fistl_ST0(s->tmp2_i32, tcg_env);
+ tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LEUL);
+ if ((op & 7) == 3) {
+ gen_helper_fpop(tcg_env);
+ }
+ break;
+ case 0x2a: case 0x2b: /* fstl, fstpl */
+ gen_helper_fstl_ST0(s->tmp1_i64, tcg_env);
+ tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0,
+ s->mem_index, MO_LEUQ);
+ if ((op & 7) == 3) {
+ gen_helper_fpop(tcg_env);
+ }
+ break;
+
+ case 0x3a: case 0x3b: /* fists, fistps */
+ gen_helper_fist_ST0(s->tmp2_i32, tcg_env);
+ tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ s->mem_index, MO_LEUW);
+ if ((op & 7) == 3) {
gen_helper_fpop(tcg_env);
- break;
- default:
- switch (op >> 4) {
- case 0:
- gen_helper_fsts_ST0(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LEUL);
- break;
- case 1:
- gen_helper_fistl_ST0(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LEUL);
- break;
- case 2:
- gen_helper_fstl_ST0(s->tmp1_i64, tcg_env);
- tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0,
- s->mem_index, MO_LEUQ);
- break;
- case 3:
- default:
- gen_helper_fist_ST0(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
- s->mem_index, MO_LEUW);
- break;
- }
- if ((op & 7) == 3) {
- gen_helper_fpop(tcg_env);
- }
- break;
}
break;
case 0x0c: /* fldenv mem */
@@ -2707,39 +2693,37 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
}
break;
case 0x0d: /* grp d9/5 */
- {
- switch (rm) {
- case 0:
- gen_helper_fpush(tcg_env);
- gen_helper_fld1_ST0(tcg_env);
- break;
- case 1:
- gen_helper_fpush(tcg_env);
- gen_helper_fldl2t_ST0(tcg_env);
- break;
- case 2:
- gen_helper_fpush(tcg_env);
- gen_helper_fldl2e_ST0(tcg_env);
- break;
- case 3:
- gen_helper_fpush(tcg_env);
- gen_helper_fldpi_ST0(tcg_env);
- break;
- case 4:
- gen_helper_fpush(tcg_env);
- gen_helper_fldlg2_ST0(tcg_env);
- break;
- case 5:
- gen_helper_fpush(tcg_env);
- gen_helper_fldln2_ST0(tcg_env);
- break;
- case 6:
- gen_helper_fpush(tcg_env);
- gen_helper_fldz_ST0(tcg_env);
- break;
- default:
- goto illegal_op;
- }
+ switch (rm) {
+ case 0:
+ gen_helper_fpush(tcg_env);
+ gen_helper_fld1_ST0(tcg_env);
+ break;
+ case 1:
+ gen_helper_fpush(tcg_env);
+ gen_helper_fldl2t_ST0(tcg_env);
+ break;
+ case 2:
+ gen_helper_fpush(tcg_env);
+ gen_helper_fldl2e_ST0(tcg_env);
+ break;
+ case 3:
+ gen_helper_fpush(tcg_env);
+ gen_helper_fldpi_ST0(tcg_env);
+ break;
+ case 4:
+ gen_helper_fpush(tcg_env);
+ gen_helper_fldlg2_ST0(tcg_env);
+ break;
+ case 5:
+ gen_helper_fpush(tcg_env);
+ gen_helper_fldln2_ST0(tcg_env);
+ break;
+ case 6:
+ gen_helper_fpush(tcg_env);
+ gen_helper_fldz_ST0(tcg_env);
+ break;
+ default:
+ goto illegal_op;
}
break;
case 0x0e: /* grp d9/6 */
@@ -2801,22 +2785,16 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
}
break;
case 0x00: case 0x01: case 0x04 ... 0x07: /* fxxx st, sti */
+ gen_helper_fmov_FT0_STN(tcg_env,
+ tcg_constant_i32(opreg));
+ gen_helper_fp_arith_ST0_FT0(op & 7);
+ break;
+
case 0x20: case 0x21: case 0x24 ... 0x27: /* fxxx sti, st */
case 0x30: case 0x31: case 0x34 ... 0x37: /* fxxxp sti, st */
- {
- int op1;
-
- op1 = op & 7;
- if (op >= 0x20) {
- gen_helper_fp_arith_STN_ST0(op1, opreg);
- if (op >= 0x30) {
- gen_helper_fpop(tcg_env);
- }
- } else {
- gen_helper_fmov_FT0_STN(tcg_env,
- tcg_constant_i32(opreg));
- gen_helper_fp_arith_ST0_FT0(op1);
- }
+ gen_helper_fp_arith_STN_ST0(op & 7, opreg);
+ if (op >= 0x30) {
+ gen_helper_fpop(tcg_env);
}
break;
case 0x02: /* fcom */
--
2.52.0
* [PATCH 10/18] target/i386/tcg: move fcom/fcomp differentiation to gen_helper_fp_arith_ST0_FT0
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (8 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 09/18] target/i386/tcg: unnest switch statements in disas_insn_x87 Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 16:21 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 11/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for fcom STn and fcomp STn Paolo Bonzini
` (7 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
There is only one call site for gen_helper_fp_arith_ST0_FT0(), so there is
no need to check for op1 == 3 in the caller: the helper can emit the pop for
fcomp itself. Once this is done, eliminate the goto to that call site.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 23 ++++++++---------------
1 file changed, 8 insertions(+), 15 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index c755329b3d9..3c55b62bdec 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -1485,6 +1485,7 @@ static void gen_helper_fp_arith_ST0_FT0(int op)
break;
case 3:
gen_helper_fcom_ST0_FT0(tcg_env);
+ gen_helper_fpop(tcg_env);
break;
case 4:
gen_helper_fsub_ST0_FT0(tcg_env);
@@ -2460,36 +2461,28 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
s->mem_index, MO_LEUL);
gen_helper_flds_FT0(tcg_env, s->tmp2_i32);
- goto fp_arith_ST0_FT0;
+ gen_helper_fp_arith_ST0_FT0(op & 7);
+ break;
case 0x10 ... 0x17: /* fixxxl */
tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
s->mem_index, MO_LEUL);
gen_helper_fildl_FT0(tcg_env, s->tmp2_i32);
- goto fp_arith_ST0_FT0;
+ gen_helper_fp_arith_ST0_FT0(op & 7);
+ break;
case 0x20 ... 0x27: /* fxxxl */
tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0,
s->mem_index, MO_LEUQ);
gen_helper_fldl_FT0(tcg_env, s->tmp1_i64);
- goto fp_arith_ST0_FT0;
+ gen_helper_fp_arith_ST0_FT0(op & 7);
+ break;
case 0x30 ... 0x37: /* fixxx */
tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
s->mem_index, MO_LESW);
gen_helper_fildl_FT0(tcg_env, s->tmp2_i32);
- goto fp_arith_ST0_FT0;
-
-fp_arith_ST0_FT0:
- {
- int op1 = op & 7;
-
- gen_helper_fp_arith_ST0_FT0(op1);
- if (op1 == 3) {
- /* fcomp needs pop */
- gen_helper_fpop(tcg_env);
- }
- }
+ gen_helper_fp_arith_ST0_FT0(op & 7);
break;
case 0x08: /* flds */
--
2.52.0
* [PATCH 11/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for fcom STn and fcomp STn
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (9 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 10/18] target/i386/tcg: move fcom/fcomp differentiation to gen_helper_fp_arith_ST0_FT0 Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 16:24 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 12/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for undocumented fcom/fcomp variants Paolo Bonzini
` (6 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
Special-case the undocumented ops instead of the two d8/0 opcodes that
have undocumented variants: just call gen_helper_fp_arith_ST0_FT0 for
all opcodes in the d8/0 encoding.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 4 +---
1 file changed, 1 insertion(+), 3 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 3c55b62bdec..8f50071a4f4 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2777,7 +2777,7 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
break;
}
break;
- case 0x00: case 0x01: case 0x04 ... 0x07: /* fxxx st, sti */
+ case 0x00 ... 0x07: /* fxxx st, sti */
gen_helper_fmov_FT0_STN(tcg_env,
tcg_constant_i32(opreg));
gen_helper_fp_arith_ST0_FT0(op & 7);
@@ -2790,12 +2790,10 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
gen_helper_fpop(tcg_env);
}
break;
- case 0x02: /* fcom */
case 0x22: /* fcom2, undocumented op */
gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
gen_helper_fcom_ST0_FT0(tcg_env);
break;
- case 0x03: /* fcomp */
case 0x23: /* fcomp3, undocumented op */
case 0x32: /* fcomp5, undocumented op */
gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
--
2.52.0
* [PATCH 12/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for undocumented fcom/fcomp variants
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (10 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 11/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for fcom STn and fcomp STn Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 16:26 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 13/18] target/i386/tcg: unify more pop/no-pop x87 instructions Paolo Bonzini
` (5 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
For 0x32, rewrite op to 0x03 (fcomp) so that the shared code pops the stack;
for the others (0x22 and 0x23) nothing special is needed, since op & 7
already selects fcom or fcomp.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 15 +++++----------
1 file changed, 5 insertions(+), 10 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 8f50071a4f4..f47bb5de8b3 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2777,7 +2777,12 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
break;
}
break;
+ case 0x32: /* fcomp5, undocumented op */
+ /* map to fcomp; op & 7 == 2 would not pop */
+ op = 0x03;
+ /* fallthrough */
case 0x00 ... 0x07: /* fxxx st, sti */
+ case 0x22 ... 0x23: /* fcom2 and fcomp3, undocumented ops */
gen_helper_fmov_FT0_STN(tcg_env,
tcg_constant_i32(opreg));
gen_helper_fp_arith_ST0_FT0(op & 7);
@@ -2790,16 +2795,6 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
gen_helper_fpop(tcg_env);
}
break;
- case 0x22: /* fcom2, undocumented op */
- gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
- gen_helper_fcom_ST0_FT0(tcg_env);
- break;
- case 0x23: /* fcomp3, undocumented op */
- case 0x32: /* fcomp5, undocumented op */
- gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
- gen_helper_fcom_ST0_FT0(tcg_env);
- gen_helper_fpop(tcg_env);
- break;
case 0x15: /* da/5 */
switch (rm) {
case 1: /* fucompp */
--
2.52.0
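The effect of the 0x32 remapping can be checked in isolation. The sketch below models only the pop decision of the merged case. It assumes that the shared code pops exactly when (op & 7) == 3, as the comment in the hunk implies; d8_path_pops is a hypothetical name and not part of the patch.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical model of the merged case in gen_x87: returns true when the
 * register stack would be popped after the comparison.  Assumes the shared
 * code pops exactly when (op & 7) == 3, i.e. for fcomp.
 */
static bool d8_path_pops(int op)
{
    /* Only the ops handled by the merged case are meaningful here. */
    assert((op >= 0x00 && op <= 0x07) ||
           op == 0x22 || op == 0x23 || op == 0x32);

    if (op == 0x32) {
        /* fcomp5: op & 7 == 2 would select fcom, which does not pop. */
        op = 0x03;
    }
    return (op & 7) == 3;
}
```

This agrees with the removed code: 0x22 called the fcom helper without a pop, while 0x23 and 0x32 called it and then gen_helper_fpop.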
* [PATCH 13/18] target/i386/tcg: unify more pop/no-pop x87 instructions
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (11 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 12/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for undocumented fcom/fcomp variants Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-10 13:16 ` [PATCH 14/18] target/i386/tcg: kill tmp1_i64 Paolo Bonzini
` (4 subsequent siblings)
17 siblings, 0 replies; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 49 ++++++++++++++-----------------------
1 file changed, 18 insertions(+), 31 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index f47bb5de8b3..8cd70456a51 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2828,44 +2828,55 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
}
break;
case 0x1d: /* fucomi */
+ case 0x3d: /* fucomip */
if (!(s->cpuid_features & CPUID_CMOV)) {
goto illegal_op;
}
gen_update_cc_op(s);
gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
gen_helper_fucomi_ST0_FT0(tcg_env);
+ if (op >= 0x30) {
+ gen_helper_fpop(tcg_env);
+ }
assume_cc_op(s, CC_OP_EFLAGS);
break;
case 0x1e: /* fcomi */
+ case 0x3e: /* fcomip */
if (!(s->cpuid_features & CPUID_CMOV)) {
goto illegal_op;
}
gen_update_cc_op(s);
gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
gen_helper_fcomi_ST0_FT0(tcg_env);
+ if (op >= 0x30) {
+ gen_helper_fpop(tcg_env);
+ }
assume_cc_op(s, CC_OP_EFLAGS);
break;
case 0x28: /* ffree sti */
+ case 0x38: /* ffreep sti, undocumented op */
gen_helper_ffree_STN(tcg_env, tcg_constant_i32(opreg));
+ if (op >= 0x30) {
+ gen_helper_fpop(tcg_env);
+ }
break;
case 0x2a: /* fst sti */
- gen_helper_fmov_STN_ST0(tcg_env, tcg_constant_i32(opreg));
- break;
case 0x2b: /* fstp sti */
case 0x0b: /* fstp1 sti, undocumented op */
case 0x3a: /* fstp8 sti, undocumented op */
case 0x3b: /* fstp9 sti, undocumented op */
gen_helper_fmov_STN_ST0(tcg_env, tcg_constant_i32(opreg));
- gen_helper_fpop(tcg_env);
+ if (op != 0x2a) {
+ gen_helper_fpop(tcg_env);
+ }
break;
case 0x2c: /* fucom st(i) */
- gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
- gen_helper_fucom_ST0_FT0(tcg_env);
- break;
case 0x2d: /* fucomp st(i) */
gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
gen_helper_fucom_ST0_FT0(tcg_env);
- gen_helper_fpop(tcg_env);
+ if (op == 0x2d) {
+ gen_helper_fpop(tcg_env);
+ }
break;
case 0x33: /* de/3 */
switch (rm) {
@@ -2879,10 +2890,6 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
goto illegal_op;
}
break;
- case 0x38: /* ffreep sti, undocumented op */
- gen_helper_ffree_STN(tcg_env, tcg_constant_i32(opreg));
- gen_helper_fpop(tcg_env);
- break;
case 0x3c: /* df/4 */
switch (rm) {
case 0:
@@ -2894,26 +2901,6 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
goto illegal_op;
}
break;
- case 0x3d: /* fucomip */
- if (!(s->cpuid_features & CPUID_CMOV)) {
- goto illegal_op;
- }
- gen_update_cc_op(s);
- gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
- gen_helper_fucomi_ST0_FT0(tcg_env);
- gen_helper_fpop(tcg_env);
- assume_cc_op(s, CC_OP_EFLAGS);
- break;
- case 0x3e: /* fcomip */
- if (!(s->cpuid_features & CPUID_CMOV)) {
- goto illegal_op;
- }
- gen_update_cc_op(s);
- gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
- gen_helper_fcomi_ST0_FT0(tcg_env);
- gen_helper_fpop(tcg_env);
- assume_cc_op(s, CC_OP_EFLAGS);
- break;
case 0x10 ... 0x13: /* fcmovxx */
case 0x18 ... 0x1b:
{
--
2.52.0
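Since this patch carries no commit message, the following sketch summarizes which opcodes in each merged case pop the register stack, transcribed from the conditions in the hunks above. merged_case_pops is a hypothetical name used only for illustration; the real code emits TCG helper calls rather than returning a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of the pop conditions introduced by the patch. */
static bool merged_case_pops(int op)
{
    switch (op) {
    case 0x1d: case 0x3d:               /* fucomi, fucomip */
    case 0x1e: case 0x3e:               /* fcomi, fcomip */
    case 0x28: case 0x38:               /* ffree, ffreep */
        return op >= 0x30;
    case 0x2a:                          /* fst */
    case 0x2b:                          /* fstp */
    case 0x0b: case 0x3a: case 0x3b:    /* fstp1, fstp8, fstp9 */
        return op != 0x2a;
    case 0x2c: case 0x2d:               /* fucom, fucomp */
        return op == 0x2d;
    default:
        assert(!"op is not handled by the merged cases");
        return false;
    }
}
```

In every pair, the popping variant behaves as before the patch; only the duplicated bodies were folded together.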
* [PATCH 14/18] target/i386/tcg: kill tmp1_i64
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (12 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 13/18] target/i386/tcg: unify more pop/no-pop x87 instructions Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 16:28 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 15/18] target/i386/tcg: kill tmp2_i32 Paolo Bonzini
` (3 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 66 ++++++++++++++++++++--------------
target/i386/tcg/emit.c.inc | 72 ++++++++++++++++++++++---------------
2 files changed, 84 insertions(+), 54 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 8cd70456a51..108276f4008 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -136,7 +136,6 @@ typedef struct DisasContext {
/* TCG local register indexes (only used inside old micro ops) */
TCGv_i32 tmp2_i32;
- TCGv_i64 tmp1_i64;
sigjmp_buf jmpbuf;
TCGOp *prev_insn_start;
@@ -2365,14 +2364,18 @@ static void gen_jmp_rel_csize(DisasContext *s, int diff, int tb_num)
static inline void gen_ldq_env_A0(DisasContext *s, int offset)
{
- tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0, s->mem_index, MO_LEUQ);
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, offset);
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_qemu_ld_i64(t, s->A0, s->mem_index, MO_LEUQ);
+ tcg_gen_st_i64(t, tcg_env, offset);
}
static inline void gen_stq_env_A0(DisasContext *s, int offset)
{
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, offset);
- tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0, s->mem_index, MO_LEUQ);
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_ld_i64(t, tcg_env, offset);
+ tcg_gen_qemu_st_i64(t, s->A0, s->mem_index, MO_LEUQ);
}
static inline void gen_ldo_env_A0(DisasContext *s, int offset, bool align)
@@ -2452,6 +2455,7 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
TCGv ea = gen_lea_modrm_1(s, decode->mem, false);
TCGv last_addr = tcg_temp_new();
bool update_fdp = true;
+ TCGv_i64 t64;
tcg_gen_mov_tl(last_addr, ea);
gen_lea_v_seg(s, ea, decode->mem.def_seg, s->override);
@@ -2472,9 +2476,10 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
break;
case 0x20 ... 0x27: /* fxxxl */
- tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0,
+ t64 = tcg_temp_new_i64();
+ tcg_gen_qemu_ld_i64(t64, s->A0,
s->mem_index, MO_LEUQ);
- gen_helper_fldl_FT0(tcg_env, s->tmp1_i64);
+ gen_helper_fldl_FT0(tcg_env, t64);
gen_helper_fp_arith_ST0_FT0(op & 7);
break;
@@ -2496,9 +2501,10 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
gen_helper_fildl_ST0(tcg_env, s->tmp2_i32);
break;
case 0x28: /* fldl */
- tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0,
+ t64 = tcg_temp_new_i64();
+ tcg_gen_qemu_ld_i64(t64, s->A0,
s->mem_index, MO_LEUQ);
- gen_helper_fldl_ST0(tcg_env, s->tmp1_i64);
+ gen_helper_fldl_ST0(tcg_env, t64);
break;
case 0x38: /* filds */
tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
@@ -2513,8 +2519,9 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
gen_helper_fpop(tcg_env);
break;
case 0x29: /* fisttpll */
- gen_helper_fisttll_ST0(s->tmp1_i64, tcg_env);
- tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0,
+ t64 = tcg_temp_new_i64();
+ gen_helper_fisttll_ST0(t64, tcg_env);
+ tcg_gen_qemu_st_i64(t64, s->A0,
s->mem_index, MO_LEUQ);
gen_helper_fpop(tcg_env);
break;
@@ -2542,8 +2549,9 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
}
break;
case 0x2a: case 0x2b: /* fstl, fstpl */
- gen_helper_fstl_ST0(s->tmp1_i64, tcg_env);
- tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0,
+ t64 = tcg_temp_new_i64();
+ gen_helper_fstl_ST0(t64, tcg_env);
+ tcg_gen_qemu_st_i64(t64, s->A0,
s->mem_index, MO_LEUQ);
if ((op & 7) == 3) {
gen_helper_fpop(tcg_env);
@@ -2611,13 +2619,15 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
gen_helper_fpop(tcg_env);
break;
case 0x3d: /* fildll */
- tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0,
+ t64 = tcg_temp_new_i64();
+ tcg_gen_qemu_ld_i64(t64, s->A0,
s->mem_index, MO_LEUQ);
- gen_helper_fildll_ST0(tcg_env, s->tmp1_i64);
+ gen_helper_fildll_ST0(tcg_env, t64);
break;
case 0x3f: /* fistpll */
- gen_helper_fistll_ST0(s->tmp1_i64, tcg_env);
- tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0,
+ t64 = tcg_temp_new_i64();
+ gen_helper_fistll_ST0(t64, tcg_env);
+ tcg_gen_qemu_st_i64(t64, s->A0,
s->mem_index, MO_LEUQ);
gen_helper_fpop(tcg_env);
break;
@@ -2951,6 +2961,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
int modrm = s->modrm;
MemOp ot;
int reg, rm, mod, op;
+ TCGv_i64 t64;
/* now check op code */
switch (b) {
@@ -3142,9 +3153,10 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
|| (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) {
goto illegal_op;
}
+ t64 = tcg_temp_new_i64();
tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
- gen_helper_xgetbv(s->tmp1_i64, tcg_env, s->tmp2_i32);
- tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], s->tmp1_i64);
+ gen_helper_xgetbv(t64, tcg_env, s->tmp2_i32);
+ tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], t64);
break;
case 0xd1: /* xsetbv */
@@ -3156,10 +3168,11 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
if (!check_cpl0(s)) {
break;
}
- tcg_gen_concat_tl_i64(s->tmp1_i64, cpu_regs[R_EAX],
+ t64 = tcg_temp_new_i64();
+ tcg_gen_concat_tl_i64(t64, cpu_regs[R_EAX],
cpu_regs[R_EDX]);
tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
- gen_helper_xsetbv(tcg_env, s->tmp2_i32, s->tmp1_i64);
+ gen_helper_xsetbv(tcg_env, s->tmp2_i32, t64);
/* End TB because translation flags may change. */
s->base.is_jmp = DISAS_EOB_NEXT;
break;
@@ -3319,18 +3332,20 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
if (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ)) {
goto illegal_op;
}
+ t64 = tcg_temp_new_i64();
tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
- gen_helper_rdpkru(s->tmp1_i64, tcg_env, s->tmp2_i32);
- tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], s->tmp1_i64);
+ gen_helper_rdpkru(t64, tcg_env, s->tmp2_i32);
+ tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], t64);
break;
case 0xef: /* wrpkru */
if (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ)) {
goto illegal_op;
}
- tcg_gen_concat_tl_i64(s->tmp1_i64, cpu_regs[R_EAX],
+ t64 = tcg_temp_new_i64();
+ tcg_gen_concat_tl_i64(t64, cpu_regs[R_EAX],
cpu_regs[R_EDX]);
tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
- gen_helper_wrpkru(tcg_env, s->tmp2_i32, s->tmp1_i64);
+ gen_helper_wrpkru(tcg_env, s->tmp2_i32, t64);
break;
CASE_MODRM_OP(6): /* lmsw */
@@ -3722,7 +3737,6 @@ static void i386_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cpu)
dc->T1 = tcg_temp_new();
dc->A0 = tcg_temp_new();
- dc->tmp1_i64 = tcg_temp_new_i64();
dc->tmp2_i32 = tcg_temp_new_i32();
dc->cc_srcT = tcg_temp_new();
}
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 131aefce53c..8dac4d09da1 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -521,10 +521,12 @@ static void gen_3dnow(DisasContext *s, X86DecodedInsn *decode)
gen_helper_enter_mmx(tcg_env);
if (fn == FN_3DNOW_MOVE) {
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, decode->op[1].offset);
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, decode->op[0].offset);
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_ld_i64(t, tcg_env, decode->op[1].offset);
+ tcg_gen_st_i64(t, tcg_env, decode->op[0].offset);
} else {
- fn(tcg_env, OP_PTR0, OP_PTR1);
+ fn(tcg_env, OP_PTR0, OP_PTR1);
}
}
@@ -2596,10 +2598,11 @@ static void gen_MOVQ(DisasContext *s, X86DecodedInsn *decode)
{
int vec_len = vector_len(s, decode);
int lo_ofs = vector_elem_offset(&decode->op[0], MO_64, 0);
+ TCGv_i64 t = tcg_temp_new_i64();
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, decode->op[2].offset);
+ tcg_gen_ld_i64(t, tcg_env, decode->op[2].offset);
if (decode->op[0].has_ea) {
- tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0, s->mem_index, MO_LEUQ);
+ tcg_gen_qemu_st_i64(t, s->A0, s->mem_index, MO_LEUQ);
} else {
/*
* tcg_gen_gvec_dup_i64(MO_64, op0.offset, 8, vec_len, s->tmp1_64) would
@@ -2610,7 +2613,7 @@ static void gen_MOVQ(DisasContext *s, X86DecodedInsn *decode)
* it disqualifies using oprsz < maxsz to emulate VEX128.
*/
tcg_gen_gvec_dup_imm(MO_64, decode->op[0].offset, vec_len, vec_len, 0);
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, lo_ofs);
+ tcg_gen_st_i64(t, tcg_env, lo_ofs);
}
}
@@ -4505,10 +4508,12 @@ static void gen_VMASKMOVPS_st(DisasContext *s, X86DecodedInsn *decode)
static void gen_VMOVHPx_ld(DisasContext *s, X86DecodedInsn *decode)
{
+ TCGv_i64 t = tcg_temp_new_i64();
+
gen_ldq_env_A0(s, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
if (decode->op[0].offset != decode->op[1].offset) {
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(0)));
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
+ tcg_gen_ld_i64(t, tcg_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(0)));
+ tcg_gen_st_i64(t, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
}
}
@@ -4519,33 +4524,39 @@ static void gen_VMOVHPx_st(DisasContext *s, X86DecodedInsn *decode)
static void gen_VMOVHPx(DisasContext *s, X86DecodedInsn *decode)
{
+ TCGv_i64 t = tcg_temp_new_i64();
+
if (decode->op[0].offset != decode->op[2].offset) {
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, decode->op[2].offset + offsetof(XMMReg, XMM_Q(1)));
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
+ tcg_gen_ld_i64(t, tcg_env, decode->op[2].offset + offsetof(XMMReg, XMM_Q(1)));
+ tcg_gen_st_i64(t, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
}
if (decode->op[0].offset != decode->op[1].offset) {
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(0)));
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
+ tcg_gen_ld_i64(t, tcg_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(0)));
+ tcg_gen_st_i64(t, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
}
}
static void gen_VMOVHLPS(DisasContext *s, X86DecodedInsn *decode)
{
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, decode->op[2].offset + offsetof(XMMReg, XMM_Q(1)));
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_ld_i64(t, tcg_env, decode->op[2].offset + offsetof(XMMReg, XMM_Q(1)));
+ tcg_gen_st_i64(t, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
if (decode->op[0].offset != decode->op[1].offset) {
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(1)));
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
+ tcg_gen_ld_i64(t, tcg_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(1)));
+ tcg_gen_st_i64(t, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
}
}
static void gen_VMOVLHPS(DisasContext *s, X86DecodedInsn *decode)
{
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, decode->op[2].offset);
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_ld_i64(t, tcg_env, decode->op[2].offset);
+ tcg_gen_st_i64(t, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(1)));
if (decode->op[0].offset != decode->op[1].offset) {
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(0)));
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
+ tcg_gen_ld_i64(t, tcg_env, decode->op[1].offset + offsetof(XMMReg, XMM_Q(0)));
+ tcg_gen_st_i64(t, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
}
}
@@ -4557,34 +4568,39 @@ static void gen_VMOVLHPS(DisasContext *s, X86DecodedInsn *decode)
static void gen_VMOVLPx(DisasContext *s, X86DecodedInsn *decode)
{
int vec_len = vector_len(s, decode);
+ TCGv_i64 t = tcg_temp_new_i64();
- tcg_gen_ld_i64(s->tmp1_i64, tcg_env, decode->op[2].offset + offsetof(XMMReg, XMM_Q(0)));
+ tcg_gen_ld_i64(t, tcg_env, decode->op[2].offset + offsetof(XMMReg, XMM_Q(0)));
tcg_gen_gvec_mov(MO_64, decode->op[0].offset, decode->op[1].offset, vec_len, vec_len);
- tcg_gen_st_i64(s->tmp1_i64, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
+ tcg_gen_st_i64(t, tcg_env, decode->op[0].offset + offsetof(XMMReg, XMM_Q(0)));
}
static void gen_VMOVLPx_ld(DisasContext *s, X86DecodedInsn *decode)
{
int vec_len = vector_len(s, decode);
+ TCGv_i64 t = tcg_temp_new_i64();
- tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0, s->mem_index, MO_LEUQ);
+ tcg_gen_qemu_ld_i64(t, s->A0, s->mem_index, MO_LEUQ);
tcg_gen_gvec_mov(MO_64, decode->op[0].offset, decode->op[1].offset, vec_len, vec_len);
- tcg_gen_st_i64(s->tmp1_i64, OP_PTR0, offsetof(ZMMReg, ZMM_Q(0)));
+ tcg_gen_st_i64(t, OP_PTR0, offsetof(ZMMReg, ZMM_Q(0)));
}
static void gen_VMOVLPx_st(DisasContext *s, X86DecodedInsn *decode)
{
- tcg_gen_ld_i64(s->tmp1_i64, OP_PTR2, offsetof(ZMMReg, ZMM_Q(0)));
- tcg_gen_qemu_st_i64(s->tmp1_i64, s->A0, s->mem_index, MO_LEUQ);
+ TCGv_i64 t = tcg_temp_new_i64();
+
+ tcg_gen_ld_i64(t, OP_PTR2, offsetof(ZMMReg, ZMM_Q(0)));
+ tcg_gen_qemu_st_i64(t, s->A0, s->mem_index, MO_LEUQ);
}
static void gen_VMOVSD_ld(DisasContext *s, X86DecodedInsn *decode)
{
TCGv_i64 zero = tcg_constant_i64(0);
+ TCGv_i64 t = tcg_temp_new_i64();
- tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0, s->mem_index, MO_LEUQ);
+ tcg_gen_qemu_ld_i64(t, s->A0, s->mem_index, MO_LEUQ);
tcg_gen_st_i64(zero, OP_PTR0, offsetof(ZMMReg, ZMM_Q(1)));
- tcg_gen_st_i64(s->tmp1_i64, OP_PTR0, offsetof(ZMMReg, ZMM_Q(0)));
+ tcg_gen_st_i64(t, OP_PTR0, offsetof(ZMMReg, ZMM_Q(0)));
}
static void gen_VMOVSS(DisasContext *s, X86DecodedInsn *decode)
--
2.52.0
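The pattern in this patch is to replace one long-lived scratch value stored in DisasContext with a fresh temporary allocated where it is needed. As far as I know, current TCG releases temporaries automatically at the end of each translation block, so the new tcg_temp_new_i64() calls need no matching free. The toy model below illustrates only that lifetime rule; TempPool, temp_new, pool_reset and the two emitters are hypothetical names and are not QEMU APIs.

```c
#include <assert.h>

#define MAX_TEMPS 8

/* Toy stand-in for a per-translation-block temporary allocator. */
typedef struct TempPool {
    int next;               /* index of the next unused temporary */
} TempPool;

/* Hand out a fresh temporary; each caller gets its own index. */
static int temp_new(TempPool *p)
{
    assert(p->next < MAX_TEMPS);
    return p->next++;
}

/* End of the translation block: every temporary is released at once. */
static void pool_reset(TempPool *p)
{
    p->next = 0;
}

/* Two emitters that each allocate locally never share a temporary. */
static int emit_load(TempPool *p)
{
    return temp_new(p);
}

static int emit_store(TempPool *p)
{
    return temp_new(p);
}
```

The design trade-off is a few more short-lived temporaries per instruction in exchange for not having to reason about which function currently owns a shared scratch field.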
* [PATCH 15/18] target/i386/tcg: kill tmp2_i32
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (13 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 14/18] target/i386/tcg: kill tmp1_i64 Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 16:29 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 16/18] target/i386/tcg: commonize code to compute SF/ZF/PF Paolo Bonzini
` (2 subsequent siblings)
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 121 +++++++++++++++++++++---------------
1 file changed, 71 insertions(+), 50 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 108276f4008..e91715af817 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -134,9 +134,6 @@ typedef struct DisasContext {
TCGv T0;
TCGv T1;
- /* TCG local register indexes (only used inside old micro ops) */
- TCGv_i32 tmp2_i32;
-
sigjmp_buf jmpbuf;
TCGOp *prev_insn_start;
TCGOp *prev_insn_end;
@@ -2455,6 +2452,7 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
TCGv ea = gen_lea_modrm_1(s, decode->mem, false);
TCGv last_addr = tcg_temp_new();
bool update_fdp = true;
+ TCGv_i32 t32;
TCGv_i64 t64;
tcg_gen_mov_tl(last_addr, ea);
@@ -2462,16 +2460,18 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
switch (op) {
case 0x00 ... 0x07: /* fxxxs */
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ tcg_gen_qemu_ld_i32(t32, s->A0,
s->mem_index, MO_LEUL);
- gen_helper_flds_FT0(tcg_env, s->tmp2_i32);
+ gen_helper_flds_FT0(tcg_env, t32);
gen_helper_fp_arith_ST0_FT0(op & 7);
break;
case 0x10 ... 0x17: /* fixxxl */
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ tcg_gen_qemu_ld_i32(t32, s->A0,
s->mem_index, MO_LEUL);
- gen_helper_fildl_FT0(tcg_env, s->tmp2_i32);
+ gen_helper_fildl_FT0(tcg_env, t32);
gen_helper_fp_arith_ST0_FT0(op & 7);
break;
@@ -2484,21 +2484,24 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
break;
case 0x30 ... 0x37: /* fixxx */
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ tcg_gen_qemu_ld_i32(t32, s->A0,
s->mem_index, MO_LESW);
- gen_helper_fildl_FT0(tcg_env, s->tmp2_i32);
+ gen_helper_fildl_FT0(tcg_env, t32);
gen_helper_fp_arith_ST0_FT0(op & 7);
break;
case 0x08: /* flds */
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ tcg_gen_qemu_ld_i32(t32, s->A0,
s->mem_index, MO_LEUL);
- gen_helper_flds_ST0(tcg_env, s->tmp2_i32);
+ gen_helper_flds_ST0(tcg_env, t32);
break;
case 0x18: /* fildl */
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ tcg_gen_qemu_ld_i32(t32, s->A0,
s->mem_index, MO_LEUL);
- gen_helper_fildl_ST0(tcg_env, s->tmp2_i32);
+ gen_helper_fildl_ST0(tcg_env, t32);
break;
case 0x28: /* fldl */
t64 = tcg_temp_new_i64();
@@ -2507,14 +2510,16 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
gen_helper_fldl_ST0(tcg_env, t64);
break;
case 0x38: /* filds */
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ tcg_gen_qemu_ld_i32(t32, s->A0,
s->mem_index, MO_LESW);
- gen_helper_fildl_ST0(tcg_env, s->tmp2_i32);
+ gen_helper_fildl_ST0(tcg_env, t32);
break;
case 0x19: /* fisttpl */
- gen_helper_fisttl_ST0(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ gen_helper_fisttl_ST0(t32, tcg_env);
+ tcg_gen_qemu_st_i32(t32, s->A0,
s->mem_index, MO_LEUL);
gen_helper_fpop(tcg_env);
break;
@@ -2526,23 +2531,26 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
gen_helper_fpop(tcg_env);
break;
case 0x39: /* fisttps */
- gen_helper_fistt_ST0(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ gen_helper_fistt_ST0(t32, tcg_env);
+ tcg_gen_qemu_st_i32(t32, s->A0,
s->mem_index, MO_LEUW);
gen_helper_fpop(tcg_env);
break;
case 0x0a: case 0x0b: /* fsts, fstps */
- gen_helper_fsts_ST0(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ gen_helper_fsts_ST0(t32, tcg_env);
+ tcg_gen_qemu_st_i32(t32, s->A0,
s->mem_index, MO_LEUL);
if ((op & 7) == 3) {
gen_helper_fpop(tcg_env);
}
break;
case 0x1a: case 0x1b: /* fistl, fistpl */
- gen_helper_fistl_ST0(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ gen_helper_fistl_ST0(t32, tcg_env);
+ tcg_gen_qemu_st_i32(t32, s->A0,
s->mem_index, MO_LEUL);
if ((op & 7) == 3) {
gen_helper_fpop(tcg_env);
@@ -2559,8 +2567,9 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
break;
case 0x3a: case 0x3b: /* fists, fistps */
- gen_helper_fist_ST0(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ gen_helper_fist_ST0(t32, tcg_env);
+ tcg_gen_qemu_st_i32(t32, s->A0,
s->mem_index, MO_LEUW);
if ((op & 7) == 3) {
gen_helper_fpop(tcg_env);
@@ -2572,9 +2581,10 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
update_fip = update_fdp = false;
break;
case 0x0d: /* fldcw mem */
- tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ tcg_gen_qemu_ld_i32(t32, s->A0,
s->mem_index, MO_LEUW);
- gen_helper_fldcw(tcg_env, s->tmp2_i32);
+ gen_helper_fldcw(tcg_env, t32);
update_fip = update_fdp = false;
break;
case 0x0e: /* fnstenv mem */
@@ -2583,8 +2593,9 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
update_fip = update_fdp = false;
break;
case 0x0f: /* fnstcw mem */
- gen_helper_fnstcw(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ gen_helper_fnstcw(t32, tcg_env);
+ tcg_gen_qemu_st_i32(t32, s->A0,
s->mem_index, MO_LEUW);
update_fip = update_fdp = false;
break;
@@ -2606,8 +2617,9 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
update_fip = update_fdp = false;
break;
case 0x2f: /* fnstsw mem */
- gen_helper_fnstsw(s->tmp2_i32, tcg_env);
- tcg_gen_qemu_st_i32(s->tmp2_i32, s->A0,
+ t32 = tcg_temp_new_i32();
+ gen_helper_fnstsw(t32, tcg_env);
+ tcg_gen_qemu_st_i32(t32, s->A0,
s->mem_index, MO_LEUW);
update_fip = update_fdp = false;
break;
@@ -2638,10 +2650,11 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
if (update_fdp) {
int last_seg = s->override >= 0 ? s->override : decode->mem.def_seg;
- tcg_gen_ld_i32(s->tmp2_i32, tcg_env,
+ t32 = tcg_temp_new_i32();
+ tcg_gen_ld_i32(t32, tcg_env,
offsetof(CPUX86State,
segs[last_seg].selector));
- tcg_gen_st16_i32(s->tmp2_i32, tcg_env,
+ tcg_gen_st16_i32(t32, tcg_env,
offsetof(CPUX86State, fpds));
tcg_gen_st_tl(last_addr, tcg_env,
offsetof(CPUX86State, fpdp));
@@ -2903,8 +2916,9 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
case 0x3c: /* df/4 */
switch (rm) {
case 0:
- gen_helper_fnstsw(s->tmp2_i32, tcg_env);
- tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32);
+ TCGv_i32 t32 = tcg_temp_new_i32();
+ gen_helper_fnstsw(t32, tcg_env);
+ tcg_gen_extu_i32_tl(s->T0, t32);
gen_op_mov_reg_v(s, MO_16, R_EAX, s->T0);
break;
default:
@@ -2940,9 +2954,10 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
}
if (update_fip) {
- tcg_gen_ld_i32(s->tmp2_i32, tcg_env,
+ TCGv_i32 t32 = tcg_temp_new_i32();
+ tcg_gen_ld_i32(t32, tcg_env,
offsetof(CPUX86State, segs[R_CS].selector));
- tcg_gen_st16_i32(s->tmp2_i32, tcg_env,
+ tcg_gen_st16_i32(t32, tcg_env,
offsetof(CPUX86State, fpcs));
tcg_gen_st_tl(eip_cur_tl(s),
tcg_env, offsetof(CPUX86State, fpip));
@@ -2961,6 +2976,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
int modrm = s->modrm;
MemOp ot;
int reg, rm, mod, op;
+ TCGv_i32 t32;
TCGv_i64 t64;
/* now check op code */
@@ -3027,10 +3043,11 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
if (!PE(s) || VM86(s))
goto illegal_op;
if (check_cpl0(s)) {
+ t32 = tcg_temp_new_i32();
gen_svm_check_intercept(s, SVM_EXIT_LDTR_WRITE);
gen_ld_modrm(s, decode, MO_16);
- tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
- gen_helper_lldt(tcg_env, s->tmp2_i32);
+ tcg_gen_trunc_tl_i32(t32, s->T0);
+ gen_helper_lldt(tcg_env, t32);
}
break;
case 1: /* str */
@@ -3049,10 +3066,11 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
if (!PE(s) || VM86(s))
goto illegal_op;
if (check_cpl0(s)) {
+ t32 = tcg_temp_new_i32();
gen_svm_check_intercept(s, SVM_EXIT_TR_WRITE);
gen_ld_modrm(s, decode, MO_16);
- tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
- gen_helper_ltr(tcg_env, s->tmp2_i32);
+ tcg_gen_trunc_tl_i32(t32, s->T0);
+ gen_helper_ltr(tcg_env, t32);
}
break;
case 4: /* verr */
@@ -3153,9 +3171,10 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
|| (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) {
goto illegal_op;
}
+ t32 = tcg_temp_new_i32();
t64 = tcg_temp_new_i64();
- tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
- gen_helper_xgetbv(t64, tcg_env, s->tmp2_i32);
+ tcg_gen_trunc_tl_i32(t32, cpu_regs[R_ECX]);
+ gen_helper_xgetbv(t64, tcg_env, t32);
tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], t64);
break;
@@ -3168,11 +3187,12 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
if (!check_cpl0(s)) {
break;
}
+ t32 = tcg_temp_new_i32();
t64 = tcg_temp_new_i64();
tcg_gen_concat_tl_i64(t64, cpu_regs[R_EAX],
cpu_regs[R_EDX]);
- tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
- gen_helper_xsetbv(tcg_env, s->tmp2_i32, t64);
+ tcg_gen_trunc_tl_i32(t32, cpu_regs[R_ECX]);
+ gen_helper_xsetbv(tcg_env, t32, t64);
/* End TB because translation flags may change. */
s->base.is_jmp = DISAS_EOB_NEXT;
break;
@@ -3332,20 +3352,22 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
if (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ)) {
goto illegal_op;
}
+ t32 = tcg_temp_new_i32();
t64 = tcg_temp_new_i64();
- tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
- gen_helper_rdpkru(t64, tcg_env, s->tmp2_i32);
+ tcg_gen_trunc_tl_i32(t32, cpu_regs[R_ECX]);
+ gen_helper_rdpkru(t64, tcg_env, t32);
tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], t64);
break;
case 0xef: /* wrpkru */
if (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ)) {
goto illegal_op;
}
+ t32 = tcg_temp_new_i32();
t64 = tcg_temp_new_i64();
tcg_gen_concat_tl_i64(t64, cpu_regs[R_EAX],
cpu_regs[R_EDX]);
- tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
- gen_helper_wrpkru(tcg_env, s->tmp2_i32, t64);
+ tcg_gen_trunc_tl_i32(t32, cpu_regs[R_ECX]);
+ gen_helper_wrpkru(tcg_env, t32, t64);
break;
CASE_MODRM_OP(6): /* lmsw */
@@ -3737,7 +3759,6 @@ static void i386_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cpu)
dc->T1 = tcg_temp_new();
dc->A0 = tcg_temp_new();
- dc->tmp2_i32 = tcg_temp_new_i32();
dc->cc_srcT = tcg_temp_new();
}
--
2.52.0
* [PATCH 16/18] target/i386/tcg: commonize code to compute SF/ZF/PF
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (14 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 15/18] target/i386/tcg: kill tmp2_i32 Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 18:46 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 17/18] target/i386/tcg: add a CCOp for SBB x,x Paolo Bonzini
2025-12-10 13:16 ` [PATCH 18/18] target/i386/tcg: move fetch code out of translate.c Paolo Bonzini
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
PF/ZF/SF are computed the same way for almost all CC_OP values (ZF and SF
depend only on the operand size). The only exception is PF for CC_OP_BLSI*
and CC_OP_BMILG*, where it was left undefined; but AMD documents that PF is
computed normally for these instructions, so computing it here too is a kind
of bug fix.
Put the common code at the end of helper_cc_compute_all, shaving
another kB from its text.
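As an illustration of why these three flags can share one code path, here is a standalone sketch that derives PF, ZF and SF from a result and an operand width alone. It is not the code in cc_helper.c: compute_pzs is a hypothetical name, and only the flag bit values (the architectural EFLAGS positions) are taken as given.

```c
#include <assert.h>
#include <stdint.h>

/* EFLAGS bit positions for the three flags. */
#define CC_P 0x0004
#define CC_Z 0x0040
#define CC_S 0x0080

/*
 * Hypothetical helper: compute PF, ZF and SF from a result of the given
 * width (8, 16, 32 or 64 bits).  PF looks only at the low byte and is set
 * for an even number of 1 bits; ZF and SF depend on the operand size.
 */
static uint32_t compute_pzs(uint64_t dst, unsigned bits)
{
    assert(bits == 8 || bits == 16 || bits == 32 || bits == 64);

    uint64_t mask = bits == 64 ? UINT64_MAX : (UINT64_C(1) << bits) - 1;
    uint8_t low = (uint8_t)dst;

    dst &= mask;

    /* Fold the low byte down to a single parity bit. */
    low ^= low >> 4;
    low ^= low >> 2;
    low ^= low >> 1;

    uint32_t pf = (low & 1) ? 0 : CC_P;
    uint32_t zf = (dst == 0) ? CC_Z : 0;
    uint32_t sf = ((dst >> (bits - 1)) & 1) ? CC_S : 0;
    return pf | zf | sf;
}
```

Nothing here depends on which instruction produced the result, which is what allows the per-CC_OP helpers to shrink to the AF/CF/OF part only.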
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/cpu.h | 4 +-
target/i386/tcg/cc_helper_template.h.inc | 112 +++------
target/i386/tcg/cc_helper.c | 274 +++++++++++++++--------
3 files changed, 209 insertions(+), 181 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index cee1f692a1c..ecca38ed0b5 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1495,12 +1495,12 @@ typedef enum {
CC_OP_SARL,
CC_OP_SARQ,
- CC_OP_BMILGB, /* Z,S via CC_DST, C = SRC==0; O=0; P,A undefined */
+ CC_OP_BMILGB, /* P,Z,S via CC_DST, C = SRC==0; A=O=0 */
CC_OP_BMILGW,
CC_OP_BMILGL,
CC_OP_BMILGQ,
- CC_OP_BLSIB, /* Z,S via CC_DST, C = SRC!=0; O=0; P,A undefined */
+ CC_OP_BLSIB, /* P,Z,S via CC_DST, C = SRC!=0; A=O=0 */
CC_OP_BLSIW,
CC_OP_BLSIL,
CC_OP_BLSIQ,
diff --git a/target/i386/tcg/cc_helper_template.h.inc b/target/i386/tcg/cc_helper_template.h.inc
index d8fd976ca15..af58c2409f7 100644
--- a/target/i386/tcg/cc_helper_template.h.inc
+++ b/target/i386/tcg/cc_helper_template.h.inc
@@ -1,5 +1,5 @@
/*
- * x86 condition code helpers
+ * x86 condition code helpers for AF/CF/OF
*
* Copyright (c) 2008 Fabrice Bellard
*
@@ -44,14 +44,9 @@
/* dynamic flags computation */
-static uint32_t glue(compute_all_cout, SUFFIX)(DATA_TYPE dst, DATA_TYPE carries)
+static uint32_t glue(compute_aco_cout, SUFFIX)(DATA_TYPE carries)
{
- uint32_t af_cf, pf, zf, sf, of;
-
- /* PF, ZF, SF computed from result. */
- pf = compute_pf(dst);
- zf = (dst == 0) * CC_Z;
- sf = lshift(dst, 8 - DATA_BITS) & CC_S;
+ uint32_t af_cf, of;
/*
* AF, CF, OF computed from carry out vector. To compute AF and CF, rotate it
@@ -62,14 +57,14 @@ static uint32_t glue(compute_all_cout, SUFFIX)(DATA_TYPE dst, DATA_TYPE carries)
*/
af_cf = ((carries << 1) | (carries >> (DATA_BITS - 1))) & (CC_A | CC_C);
of = (lshift(carries, 12 - DATA_BITS) + CC_O / 2) & CC_O;
- return pf + zf + sf + af_cf + of;
+ return af_cf + of;
}
-static uint32_t glue(compute_all_add, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
+static uint32_t glue(compute_aco_add, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
{
DATA_TYPE src2 = dst - src1;
DATA_TYPE carries = ADD_COUT_VEC(src1, src2, dst);
- return glue(compute_all_cout, SUFFIX)(dst, carries);
+ return glue(compute_aco_cout, SUFFIX)(carries);
}
static int glue(compute_c_add, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
@@ -77,12 +72,12 @@ static int glue(compute_c_add, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
return dst < src1;
}
-static uint32_t glue(compute_all_adc, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1,
+static uint32_t glue(compute_aco_adc, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1,
DATA_TYPE src3)
{
DATA_TYPE src2 = dst - src1 - src3;
DATA_TYPE carries = ADD_COUT_VEC(src1, src2, dst);
- return glue(compute_all_cout, SUFFIX)(dst, carries);
+ return glue(compute_aco_cout, SUFFIX)(carries);
}
static int glue(compute_c_adc, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1,
@@ -97,11 +92,11 @@ static int glue(compute_c_adc, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1,
#endif
}
-static uint32_t glue(compute_all_sub, SUFFIX)(DATA_TYPE dst, DATA_TYPE src2)
+static uint32_t glue(compute_aco_sub, SUFFIX)(DATA_TYPE dst, DATA_TYPE src2)
{
DATA_TYPE src1 = dst + src2;
DATA_TYPE carries = SUB_COUT_VEC(src1, src2, dst);
- return glue(compute_all_cout, SUFFIX)(dst, carries);
+ return glue(compute_aco_cout, SUFFIX)(carries);
}
static int glue(compute_c_sub, SUFFIX)(DATA_TYPE dst, DATA_TYPE src2)
@@ -111,12 +106,12 @@ static int glue(compute_c_sub, SUFFIX)(DATA_TYPE dst, DATA_TYPE src2)
return src1 < src2;
}
-static uint32_t glue(compute_all_sbb, SUFFIX)(DATA_TYPE dst, DATA_TYPE src2,
+static uint32_t glue(compute_aco_sbb, SUFFIX)(DATA_TYPE dst, DATA_TYPE src2,
DATA_TYPE src3)
{
DATA_TYPE src1 = dst + src2 + src3;
DATA_TYPE carries = SUB_COUT_VEC(src1, src2, dst);
- return glue(compute_all_cout, SUFFIX)(dst, carries);
+ return glue(compute_aco_cout, SUFFIX)(carries);
}
static int glue(compute_c_sbb, SUFFIX)(DATA_TYPE dst, DATA_TYPE src2,
@@ -134,57 +129,35 @@ static int glue(compute_c_sbb, SUFFIX)(DATA_TYPE dst, DATA_TYPE src2,
#endif
}
-static uint32_t glue(compute_all_logic, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
+static uint32_t glue(compute_aco_inc, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
{
- uint32_t cf, pf, af, zf, sf, of;
-
- cf = 0;
- pf = compute_pf(dst);
- af = 0;
- zf = (dst == 0) * CC_Z;
- sf = lshift(dst, 8 - DATA_BITS) & CC_S;
- of = 0;
- return cf + pf + af + zf + sf + of;
-}
-
-static uint32_t glue(compute_all_inc, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
-{
- uint32_t cf, pf, af, zf, sf, of;
+ uint32_t cf, af, of;
cf = src1;
- pf = compute_pf(dst);
af = (dst ^ (dst - 1)) & CC_A; /* bits 0..3 are all clear */
- zf = (dst == 0) * CC_Z;
- sf = lshift(dst, 8 - DATA_BITS) & CC_S;
of = (dst == SIGN_MASK) * CC_O;
- return cf + pf + af + zf + sf + of;
+ return cf + af + of;
}
-static uint32_t glue(compute_all_dec, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
+static uint32_t glue(compute_aco_dec, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
{
- uint32_t cf, pf, af, zf, sf, of;
+ uint32_t cf, af, of;
cf = src1;
- pf = compute_pf(dst);
af = (dst ^ (dst + 1)) & CC_A; /* bits 0..3 are all set */
- zf = (dst == 0) * CC_Z;
- sf = lshift(dst, 8 - DATA_BITS) & CC_S;
of = (dst == SIGN_MASK - 1) * CC_O;
- return cf + pf + af + zf + sf + of;
+ return cf + af + of;
}
-static uint32_t glue(compute_all_shl, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
+static uint32_t glue(compute_aco_shl, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
{
- uint32_t cf, pf, af, zf, sf, of;
+ uint32_t cf, af, of;
cf = (src1 >> (DATA_BITS - 1)) & CC_C;
- pf = compute_pf(dst);
af = 0; /* undefined */
- zf = (dst == 0) * CC_Z;
- sf = lshift(dst, 8 - DATA_BITS) & CC_S;
/* of is defined iff shift count == 1 */
of = lshift(src1 ^ dst, 12 - DATA_BITS) & CC_O;
- return cf + pf + af + zf + sf + of;
+ return cf + af + of;
}
static int glue(compute_c_shl, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
@@ -192,47 +165,25 @@ static int glue(compute_c_shl, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
return (src1 >> (DATA_BITS - 1)) & CC_C;
}
-static uint32_t glue(compute_all_sar, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
+static uint32_t glue(compute_aco_sar, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
{
- uint32_t cf, pf, af, zf, sf, of;
+ uint32_t cf, af, of;
cf = src1 & 1;
- pf = compute_pf(dst);
af = 0; /* undefined */
- zf = (dst == 0) * CC_Z;
- sf = lshift(dst, 8 - DATA_BITS) & CC_S;
/* of is defined iff shift count == 1 */
of = lshift(src1 ^ dst, 12 - DATA_BITS) & CC_O;
- return cf + pf + af + zf + sf + of;
+ return cf + af + of;
}
-/* NOTE: we compute the flags like the P4. On olders CPUs, only OF and
- CF are modified and it is slower to do that. Note as well that we
- don't truncate SRC1 for computing carry to DATA_TYPE. */
-static uint32_t glue(compute_all_mul, SUFFIX)(DATA_TYPE dst, target_long src1)
+static uint32_t glue(compute_aco_bmilg, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
{
- uint32_t cf, pf, af, zf, sf, of;
-
- cf = (src1 != 0);
- pf = compute_pf(dst);
- af = 0; /* undefined */
- zf = (dst == 0) * CC_Z;
- sf = lshift(dst, 8 - DATA_BITS) & CC_S;
- of = cf * CC_O;
- return cf + pf + af + zf + sf + of;
-}
-
-static uint32_t glue(compute_all_bmilg, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
-{
- uint32_t cf, pf, af, zf, sf, of;
+ uint32_t cf, af, of;
cf = (src1 == 0);
- pf = 0; /* undefined */
af = 0; /* undefined */
- zf = (dst == 0) * CC_Z;
- sf = lshift(dst, 8 - DATA_BITS) & CC_S;
of = 0;
- return cf + pf + af + zf + sf + of;
+ return cf + af + of;
}
static int glue(compute_c_bmilg, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
@@ -240,17 +191,14 @@ static int glue(compute_c_bmilg, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
return src1 == 0;
}
-static int glue(compute_all_blsi, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
+static int glue(compute_aco_blsi, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
{
- uint32_t cf, pf, af, zf, sf, of;
+ uint32_t cf, af, of;
cf = (src1 != 0);
- pf = 0; /* undefined */
af = 0; /* undefined */
- zf = (dst == 0) * CC_Z;
- sf = lshift(dst, 8 - DATA_BITS) & CC_S;
of = 0;
- return cf + pf + af + zf + sf + of;
+ return cf + af + of;
}
static int glue(compute_c_blsi, SUFFIX)(DATA_TYPE dst, DATA_TYPE src1)
diff --git a/target/i386/tcg/cc_helper.c b/target/i386/tcg/cc_helper.c
index f1940b40927..2c4170b5b77 100644
--- a/target/i386/tcg/cc_helper.c
+++ b/target/i386/tcg/cc_helper.c
@@ -73,9 +73,25 @@ target_ulong helper_cc_compute_nz(target_ulong dst, target_ulong src1,
}
}
+/* NOTE: we compute the flags like the P4. On older CPUs, only OF and
+ CF are modified and it is slower to do that. Note as well that we
+ don't truncate SRC1 for computing carry to DATA_TYPE. */
+static inline uint32_t compute_aco_mul(target_long src1)
+{
+ uint32_t cf, af, of;
+
+ cf = (src1 != 0);
+ af = 0; /* undefined */
+ of = cf * CC_O;
+ return cf + af + of;
+}
+
target_ulong helper_cc_compute_all(target_ulong dst, target_ulong src1,
target_ulong src2, int op)
{
+ uint32_t flags = 0;
+ int shift = 0;
+
switch (op) {
default: /* should never happen */
return 0;
@@ -85,90 +101,6 @@ target_ulong helper_cc_compute_all(target_ulong dst, target_ulong src1,
case CC_OP_POPCNT:
return dst ? 0 : CC_Z;
- case CC_OP_MULB:
- return compute_all_mulb(dst, src1);
- case CC_OP_MULW:
- return compute_all_mulw(dst, src1);
- case CC_OP_MULL:
- return compute_all_mull(dst, src1);
-
- case CC_OP_ADDB:
- return compute_all_addb(dst, src1);
- case CC_OP_ADDW:
- return compute_all_addw(dst, src1);
- case CC_OP_ADDL:
- return compute_all_addl(dst, src1);
-
- case CC_OP_ADCB:
- return compute_all_adcb(dst, src1, src2);
- case CC_OP_ADCW:
- return compute_all_adcw(dst, src1, src2);
- case CC_OP_ADCL:
- return compute_all_adcl(dst, src1, src2);
-
- case CC_OP_SUBB:
- return compute_all_subb(dst, src1);
- case CC_OP_SUBW:
- return compute_all_subw(dst, src1);
- case CC_OP_SUBL:
- return compute_all_subl(dst, src1);
-
- case CC_OP_SBBB:
- return compute_all_sbbb(dst, src1, src2);
- case CC_OP_SBBW:
- return compute_all_sbbw(dst, src1, src2);
- case CC_OP_SBBL:
- return compute_all_sbbl(dst, src1, src2);
-
- case CC_OP_LOGICB:
- return compute_all_logicb(dst, src1);
- case CC_OP_LOGICW:
- return compute_all_logicw(dst, src1);
- case CC_OP_LOGICL:
- return compute_all_logicl(dst, src1);
-
- case CC_OP_INCB:
- return compute_all_incb(dst, src1);
- case CC_OP_INCW:
- return compute_all_incw(dst, src1);
- case CC_OP_INCL:
- return compute_all_incl(dst, src1);
-
- case CC_OP_DECB:
- return compute_all_decb(dst, src1);
- case CC_OP_DECW:
- return compute_all_decw(dst, src1);
- case CC_OP_DECL:
- return compute_all_decl(dst, src1);
-
- case CC_OP_SHLB:
- return compute_all_shlb(dst, src1);
- case CC_OP_SHLW:
- return compute_all_shlw(dst, src1);
- case CC_OP_SHLL:
- return compute_all_shll(dst, src1);
-
- case CC_OP_SARB:
- return compute_all_sarb(dst, src1);
- case CC_OP_SARW:
- return compute_all_sarw(dst, src1);
- case CC_OP_SARL:
- return compute_all_sarl(dst, src1);
-
- case CC_OP_BMILGB:
- return compute_all_bmilgb(dst, src1);
- case CC_OP_BMILGW:
- return compute_all_bmilgw(dst, src1);
- case CC_OP_BMILGL:
- return compute_all_bmilgl(dst, src1);
-
- case CC_OP_BLSIB:
- return compute_all_blsib(dst, src1);
- case CC_OP_BLSIW:
- return compute_all_blsiw(dst, src1);
- case CC_OP_BLSIL:
- return compute_all_blsil(dst, src1);
-
case CC_OP_ADCX:
return compute_all_adcx(dst, src1, src2);
case CC_OP_ADOX:
@@ -176,33 +108,181 @@ target_ulong helper_cc_compute_all(target_ulong dst, target_ulong src1,
case CC_OP_ADCOX:
return compute_all_adcox(dst, src1, src2);
+ case CC_OP_MULB:
+ flags = compute_aco_mul(src1);
+ goto psz_b;
+ case CC_OP_MULW:
+ flags = compute_aco_mul(src1);
+ goto psz_w;
+ case CC_OP_MULL:
+ flags = compute_aco_mul(src1);
+ goto psz_l;
+
+ case CC_OP_ADDB:
+ flags = compute_aco_addb(dst, src1);
+ goto psz_b;
+ case CC_OP_ADDW:
+ flags = compute_aco_addw(dst, src1);
+ goto psz_w;
+ case CC_OP_ADDL:
+ flags = compute_aco_addl(dst, src1);
+ goto psz_l;
+
+ case CC_OP_ADCB:
+ flags = compute_aco_adcb(dst, src1, src2);
+ goto psz_b;
+ case CC_OP_ADCW:
+ flags = compute_aco_adcw(dst, src1, src2);
+ goto psz_w;
+ case CC_OP_ADCL:
+ flags = compute_aco_adcl(dst, src1, src2);
+ goto psz_l;
+
+ case CC_OP_SUBB:
+ flags = compute_aco_subb(dst, src1);
+ goto psz_b;
+ case CC_OP_SUBW:
+ flags = compute_aco_subw(dst, src1);
+ goto psz_w;
+ case CC_OP_SUBL:
+ flags = compute_aco_subl(dst, src1);
+ goto psz_l;
+
+ case CC_OP_SBBB:
+ flags = compute_aco_sbbb(dst, src1, src2);
+ goto psz_b;
+ case CC_OP_SBBW:
+ flags = compute_aco_sbbw(dst, src1, src2);
+ goto psz_w;
+ case CC_OP_SBBL:
+ flags = compute_aco_sbbl(dst, src1, src2);
+ goto psz_l;
+
+ case CC_OP_LOGICB:
+ flags = 0;
+ goto psz_b;
+ case CC_OP_LOGICW:
+ flags = 0;
+ goto psz_w;
+ case CC_OP_LOGICL:
+ flags = 0;
+ goto psz_l;
+
+ case CC_OP_INCB:
+ flags = compute_aco_incb(dst, src1);
+ goto psz_b;
+ case CC_OP_INCW:
+ flags = compute_aco_incw(dst, src1);
+ goto psz_w;
+ case CC_OP_INCL:
+ flags = compute_aco_incl(dst, src1);
+ goto psz_l;
+
+ case CC_OP_DECB:
+ flags = compute_aco_decb(dst, src1);
+ goto psz_b;
+ case CC_OP_DECW:
+ flags = compute_aco_decw(dst, src1);
+ goto psz_w;
+ case CC_OP_DECL:
+ flags = compute_aco_decl(dst, src1);
+ goto psz_l;
+
+ case CC_OP_SHLB:
+ flags = compute_aco_shlb(dst, src1);
+ goto psz_b;
+ case CC_OP_SHLW:
+ flags = compute_aco_shlw(dst, src1);
+ goto psz_w;
+ case CC_OP_SHLL:
+ flags = compute_aco_shll(dst, src1);
+ goto psz_l;
+
+ case CC_OP_SARB:
+ flags = compute_aco_sarb(dst, src1);
+ goto psz_b;
+ case CC_OP_SARW:
+ flags = compute_aco_sarw(dst, src1);
+ goto psz_w;
+ case CC_OP_SARL:
+ flags = compute_aco_sarl(dst, src1);
+ goto psz_l;
+
+ case CC_OP_BMILGB:
+ flags = compute_aco_bmilgb(dst, src1);
+ goto psz_b;
+ case CC_OP_BMILGW:
+ flags = compute_aco_bmilgw(dst, src1);
+ goto psz_w;
+ case CC_OP_BMILGL:
+ flags = compute_aco_bmilgl(dst, src1);
+ goto psz_l;
+
+ case CC_OP_BLSIB:
+ flags = compute_aco_blsib(dst, src1);
+ goto psz_b;
+ case CC_OP_BLSIW:
+ flags = compute_aco_blsiw(dst, src1);
+ goto psz_w;
+ case CC_OP_BLSIL:
+ flags = compute_aco_blsil(dst, src1);
+ goto psz_l;
+
#ifdef TARGET_X86_64
case CC_OP_MULQ:
- return compute_all_mulq(dst, src1);
+ flags = compute_aco_mul(src1);
+ goto psz_q;
case CC_OP_ADDQ:
- return compute_all_addq(dst, src1);
+ flags = compute_aco_addq(dst, src1);
+ goto psz_q;
case CC_OP_ADCQ:
- return compute_all_adcq(dst, src1, src2);
+ flags = compute_aco_adcq(dst, src1, src2);
+ goto psz_q;
case CC_OP_SUBQ:
- return compute_all_subq(dst, src1);
+ flags = compute_aco_subq(dst, src1);
+ goto psz_q;
case CC_OP_SBBQ:
- return compute_all_sbbq(dst, src1, src2);
- case CC_OP_LOGICQ:
- return compute_all_logicq(dst, src1);
+ flags = compute_aco_sbbq(dst, src1, src2);
+ goto psz_q;
case CC_OP_INCQ:
- return compute_all_incq(dst, src1);
+ flags = compute_aco_incq(dst, src1);
+ goto psz_q;
case CC_OP_DECQ:
- return compute_all_decq(dst, src1);
+ flags = compute_aco_decq(dst, src1);
+ goto psz_q;
+ case CC_OP_LOGICQ:
+ flags = 0;
+ goto psz_q;
case CC_OP_SHLQ:
- return compute_all_shlq(dst, src1);
+ flags = compute_aco_shlq(dst, src1);
+ goto psz_q;
case CC_OP_SARQ:
- return compute_all_sarq(dst, src1);
+ flags = compute_aco_sarq(dst, src1);
+ goto psz_q;
case CC_OP_BMILGQ:
- return compute_all_bmilgq(dst, src1);
+ flags = compute_aco_bmilgq(dst, src1);
+ goto psz_q;
case CC_OP_BLSIQ:
- return compute_all_blsiq(dst, src1);
+ flags = compute_aco_blsiq(dst, src1);
+ goto psz_q;
#endif
}
+
+psz_b:
+ shift += 8;
+psz_w:
+ shift += 16;
+psz_l:
+#ifdef TARGET_X86_64
+ shift += 32;
+psz_q:
+#endif
+
+ flags += compute_pf(dst);
+ dst <<= shift;
+ flags += dst == 0 ? CC_Z : 0;
+ flags += (target_long)dst < 0 ? CC_S : 0;
+ return flags;
}
uint32_t cpu_cc_compute_all(CPUX86State *env)
--
2.52.0
* [PATCH 17/18] target/i386/tcg: add a CCOp for SBB x,x
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (15 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 16/18] target/i386/tcg: commonize code to compute SF/ZF/PF Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 19:11 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 18/18] target/i386/tcg: move fetch code out of translate.c Paolo Bonzini
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
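SBB x,x produces either all zeros or all ones depending only on the
incoming carry, so every flag can be derived from CC_DST alone. Giving
it its own CCOp is more efficient both when generating code and when
testing flags.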
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/cpu.h | 13 ++++++++++++-
target/i386/cpu-dump.c | 2 ++
target/i386/tcg/cc_helper.c | 6 ++++++
target/i386/tcg/translate.c | 13 +++++++++++++
target/i386/tcg/emit.c.inc | 33 ++++++---------------------------
5 files changed, 39 insertions(+), 28 deletions(-)
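As an illustration of why CC_DST alone is enough, the flag expression used for CC_OP_SBB_SELF can be checked against a long-hand 8-bit computation. This is a sketch under assumptions: sbb_self_flags() and sbb_self_flags_ref() are hypothetical names, and the reference only models the 8-bit case.

```c
#include <assert.h>
#include <stdint.h>

/* EFLAGS bit values as defined by the x86 architecture. */
#define CC_C 0x0001
#define CC_P 0x0004
#define CC_A 0x0010
#define CC_Z 0x0040
#define CC_S 0x0080

/* dst is all zeros (Z and P set) or all ones (S, A, P and C set). */
static uint32_t sbb_self_flags(uint64_t dst)
{
    assert(dst == 0 || dst == UINT64_MAX);
    return (uint32_t)(dst & (CC_Z | CC_A | CC_C | CC_S)) ^ (CC_P | CC_Z);
}

/* Reference: flags of the 8-bit operation x - x - carry_in, the long way. */
static uint32_t sbb_self_flags_ref(uint8_t x, int carry_in)
{
    uint8_t res = (uint8_t)(x - x - carry_in);
    uint32_t p = res;
    uint32_t flags = 0;

    p ^= p >> 4;
    p ^= p >> 2;
    p ^= p >> 1;
    flags |= carry_in ? CC_C : 0;               /* borrow out equals carry in */
    flags |= (p & 1) ? 0 : CC_P;                /* 0x00 and 0xff: even parity */
    flags |= ((x ^ x ^ res) & 0x10) ? CC_A : 0; /* bit 4 of dst^src1^src2 */
    flags |= res == 0 ? CC_Z : 0;
    flags |= (res & 0x80) ? CC_S : 0;
    return flags;                               /* OF is always clear */
}
```

The XOR with CC_P | CC_Z sets PF unconditionally and inverts the ZF bit taken from dst, so the all-ones case ends up with ZF clear and the all-zeros case with ZF set.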
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index ecca38ed0b5..314e773a5d4 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1515,7 +1515,18 @@ typedef enum {
CC_OP_POPCNTL__,
CC_OP_POPCNTQ__,
CC_OP_POPCNT = sizeof(target_ulong) == 8 ? CC_OP_POPCNTQ__ : CC_OP_POPCNTL__,
-#define CC_OP_LAST_BWLQ CC_OP_POPCNTQ__
+
+ /*
+ * Note that only CC_OP_SBB_SELF (i.e. the one with MO_TL size)
+ * is used or implemented, because the translation produces a
+ * sign-extended CC_DST.
+ */
+ CC_OP_SBB_SELFB__, /* S/Z/C/A via CC_DST, O clear, P set. */
+ CC_OP_SBB_SELFW__,
+ CC_OP_SBB_SELFL__,
+ CC_OP_SBB_SELFQ__,
+ CC_OP_SBB_SELF = sizeof(target_ulong) == 8 ? CC_OP_SBB_SELFQ__ : CC_OP_SBB_SELFL__,
+#define CC_OP_LAST_BWLQ CC_OP_SBB_SELFQ__
CC_OP_DYNAMIC, /* must use dynamic code to get cc_op */
} CCOp;
diff --git a/target/i386/cpu-dump.c b/target/i386/cpu-dump.c
index 67bf31e0caa..20a3002f013 100644
--- a/target/i386/cpu-dump.c
+++ b/target/i386/cpu-dump.c
@@ -91,6 +91,8 @@ static const char * const cc_op_str[] = {
[CC_OP_BMILGQ] = "BMILGQ",
[CC_OP_POPCNT] = "POPCNT",
+
+ [CC_OP_SBB_SELF] = "SBBx,x",
};
static void
diff --git a/target/i386/tcg/cc_helper.c b/target/i386/tcg/cc_helper.c
index 2c4170b5b77..91e492196af 100644
--- a/target/i386/tcg/cc_helper.c
+++ b/target/i386/tcg/cc_helper.c
@@ -100,6 +100,9 @@ target_ulong helper_cc_compute_all(target_ulong dst, target_ulong src1,
return src1;
case CC_OP_POPCNT:
return dst ? 0 : CC_Z;
+ case CC_OP_SBB_SELF:
+ /* dst is either all zeros (--Z-P-) or all ones (-S-APC) */
+ return (dst & (CC_Z|CC_A|CC_C|CC_S)) ^ (CC_P | CC_Z);
case CC_OP_ADCX:
return compute_all_adcx(dst, src1, src2);
@@ -326,6 +329,9 @@ target_ulong helper_cc_compute_c(target_ulong dst, target_ulong src1,
case CC_OP_MULQ:
return src1 != 0;
+ case CC_OP_SBB_SELF:
+ return dst & 1;
+
case CC_OP_ADCX:
case CC_OP_ADCOX:
return dst;
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index e91715af817..17ad4ccacaf 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -304,6 +304,7 @@ static const uint8_t cc_op_live_[] = {
[CC_OP_ADOX] = USES_CC_SRC | USES_CC_SRC2,
[CC_OP_ADCOX] = USES_CC_DST | USES_CC_SRC | USES_CC_SRC2,
[CC_OP_POPCNT] = USES_CC_DST,
+ [CC_OP_SBB_SELF] = USES_CC_DST,
};
static uint8_t cc_op_live(CCOp op)
@@ -938,6 +939,9 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv reg)
size = cc_op_size(s->cc_op);
return gen_prepare_val_nz(cpu_cc_src, size, false);
+ case CC_OP_SBB_SELF:
+ return (CCPrepare) { .cond = TCG_COND_NE, .reg = cpu_cc_dst };
+
case CC_OP_ADCX:
case CC_OP_ADCOX:
return (CCPrepare) { .cond = TCG_COND_NE, .reg = cpu_cc_dst,
@@ -999,6 +1003,7 @@ static CCPrepare gen_prepare_eflags_o(DisasContext *s, TCGv reg)
case CC_OP_ADCOX:
return (CCPrepare) { .cond = TCG_COND_NE, .reg = cpu_cc_src2,
.no_setcond = true };
+ case CC_OP_SBB_SELF:
case CC_OP_LOGICB ... CC_OP_LOGICQ:
case CC_OP_POPCNT:
return (CCPrepare) { .cond = TCG_COND_NEVER };
@@ -1078,6 +1083,14 @@ static CCPrepare gen_prepare_cc(DisasContext *s, int b, TCGv reg)
}
break;
+ case CC_OP_SBB_SELF:
+ /* checking for nonzero is usually the most efficient */
+ if (jcc_op == JCC_L || jcc_op == JCC_B || jcc_op == JCC_S) {
+ jcc_op = JCC_Z;
+ inv = !inv;
+ }
+ goto slow_jcc;
+
case CC_OP_LOGICB ... CC_OP_LOGICQ:
/* Mostly used for test+jump */
size = s->cc_op - CC_OP_LOGICB;
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 8dac4d09da1..0fde3d669d9 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -3876,37 +3876,16 @@ static void gen_SBB(DisasContext *s, X86DecodedInsn *decode)
return;
}
- c_in = tcg_temp_new();
- gen_compute_eflags_c(s, c_in);
-
- /*
- * Here the change is as follows:
- * CC_SBB: src1 = T0, src2 = T0, src3 = c_in
- * CC_SUB: src1 = 0, src2 = c_in (no src3)
- *
- * The difference also does not matter:
- * - AF is bit 4 of dst^src1^src2, but bit 4 of src1^src2 is zero in both cases
- * therefore AF comes straight from dst (in fact it is c_in)
- * - for OF, src1 and src2 have the same sign in both cases, meaning there
- * can be no overflow
- */
+ /* SBB x,x has its own CCOp so that's even easier. */
if (decode->e.op2 != X86_TYPE_I && !decode->op[0].has_ea && decode->op[0].n == decode->op[2].n) {
- if (s->cc_op == CC_OP_DYNAMIC) {
- tcg_gen_neg_tl(s->T0, c_in);
- } else {
- /*
- * Do not negate c_in because it will often be dead and only the
- * instruction generated by negsetcond will survive.
- */
- gen_neg_setcc(s, JCC_B << 1, s->T0);
- }
- tcg_gen_movi_tl(s->cc_srcT, 0);
- decode->cc_src = c_in;
- decode->cc_dst = s->T0;
- decode->cc_op = CC_OP_SUBB + ot;
+ gen_neg_setcc(s, JCC_B << 1, s->T0);
+ prepare_update1_cc(decode, s, CC_OP_SBB_SELF);
return;
}
+ c_in = tcg_temp_new();
+ gen_compute_eflags_c(s, c_in);
+
if (s->prefix & PREFIX_LOCK) {
tcg_gen_add_tl(s->T0, s->T1, c_in);
tcg_gen_neg_tl(s->T0, s->T0);
--
2.52.0
* [PATCH 18/18] target/i386/tcg: move fetch code out of translate.c
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
` (16 preceding siblings ...)
2025-12-10 13:16 ` [PATCH 17/18] target/i386/tcg: add a CCOp for SBB x,x Paolo Bonzini
@ 2025-12-10 13:16 ` Paolo Bonzini
2025-12-11 19:29 ` Richard Henderson
17 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-10 13:16 UTC (permalink / raw)
To: qemu-devel
Let translate.c only concern itself with TCG code generation. Move everything
that uses CPUX86State*, as well as gen_lea_modrm_0 now that it is only used
to fill decode->mem, to decode-new.c.inc.
While at it, rename gen_lea_modrm_0 to decode_modrm_address.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 271 ------------------------------
target/i386/tcg/decode-new.c.inc | 277 ++++++++++++++++++++++++++++++-
2 files changed, 274 insertions(+), 274 deletions(-)
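For context, the field extraction that the moved address decoder performs on the ModRM and SIB bytes can be sketched on its own. This is a simplified illustration for 32/64-bit addressing only; the struct and function names are hypothetical, and REX extension bits, VSIB and RIP-relative handling are left out.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical containers for the raw bit fields. */
typedef struct { int mod, reg, rm; } ModRMFields;
typedef struct { int scale, index, base; } SIBFields;

static ModRMFields split_modrm(uint8_t modrm)
{
    ModRMFields f = { (modrm >> 6) & 3, (modrm >> 3) & 7, modrm & 7 };
    return f;
}

static SIBFields split_sib(uint8_t sib)
{
    SIBFields f = { (sib >> 6) & 3, (sib >> 3) & 7, sib & 7 };
    return f;
}

/* With 32/64-bit addressing, rm == 4 announces a SIB byte unless mod == 3. */
static int has_sib(ModRMFields m)
{
    return m.mod != 3 && m.rm == 4;
}

/*
 * Displacement size in bytes for 32/64-bit addressing. base_low3 is the
 * low three bits of the base: the SIB base if there is a SIB byte,
 * otherwise rm.
 */
static int disp_bytes_32(int mod, int base_low3)
{
    assert(mod >= 0 && mod <= 2);
    if (mod == 0) {
        return base_low3 == 5 ? 4 : 0;  /* disp32 with no base register */
    }
    return mod == 1 ? 1 : 4;            /* disp8 or disp32 */
}
```

None of this touches TCG state, which matches the stated reason for moving the decoder: it only consumes instruction bytes and fills in an address description.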
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 17ad4ccacaf..a905efdfbbd 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -1644,182 +1644,6 @@ static TCGv gen_shiftd_rm_T1(DisasContext *s, MemOp ot,
return cc_src;
}
-#define X86_MAX_INSN_LENGTH 15
-
-static uint64_t advance_pc(CPUX86State *env, DisasContext *s, int num_bytes)
-{
- uint64_t pc = s->pc;
-
- /* This is a subsequent insn that crosses a page boundary. */
- if (s->base.num_insns > 1 &&
- !translator_is_same_page(&s->base, s->pc + num_bytes - 1)) {
- siglongjmp(s->jmpbuf, 2);
- }
-
- s->pc += num_bytes;
- if (unlikely(cur_insn_len(s) > X86_MAX_INSN_LENGTH)) {
- /* If the instruction's 16th byte is on a different page than the 1st, a
- * page fault on the second page wins over the general protection fault
- * caused by the instruction being too long.
- * This can happen even if the operand is only one byte long!
- */
- if (((s->pc - 1) ^ (pc - 1)) & TARGET_PAGE_MASK) {
- (void)translator_ldub(env, &s->base,
- (s->pc - 1) & TARGET_PAGE_MASK);
- }
- siglongjmp(s->jmpbuf, 1);
- }
-
- return pc;
-}
-
-static inline uint8_t x86_ldub_code(CPUX86State *env, DisasContext *s)
-{
- return translator_ldub(env, &s->base, advance_pc(env, s, 1));
-}
-
-static inline uint16_t x86_lduw_code(CPUX86State *env, DisasContext *s)
-{
- return translator_lduw(env, &s->base, advance_pc(env, s, 2));
-}
-
-static inline uint32_t x86_ldl_code(CPUX86State *env, DisasContext *s)
-{
- return translator_ldl(env, &s->base, advance_pc(env, s, 4));
-}
-
-#ifdef TARGET_X86_64
-static inline uint64_t x86_ldq_code(CPUX86State *env, DisasContext *s)
-{
- return translator_ldq(env, &s->base, advance_pc(env, s, 8));
-}
-#endif
-
-/* Decompose an address. */
-
-static AddressParts gen_lea_modrm_0(CPUX86State *env, DisasContext *s,
- int modrm, bool is_vsib)
-{
- int def_seg, base, index, scale, mod, rm;
- target_long disp;
- bool havesib;
-
- def_seg = R_DS;
- index = -1;
- scale = 0;
- disp = 0;
-
- mod = (modrm >> 6) & 3;
- rm = modrm & 7;
- base = rm | REX_B(s);
-
- if (mod == 3) {
- /* Normally filtered out earlier, but including this path
- simplifies multi-byte nop, as well as bndcl, bndcu, bndcn. */
- goto done;
- }
-
- switch (s->aflag) {
- case MO_64:
- case MO_32:
- havesib = 0;
- if (rm == 4) {
- int code = x86_ldub_code(env, s);
- scale = (code >> 6) & 3;
- index = ((code >> 3) & 7) | REX_X(s);
- if (index == 4 && !is_vsib) {
- index = -1; /* no index */
- }
- base = (code & 7) | REX_B(s);
- havesib = 1;
- }
-
- switch (mod) {
- case 0:
- if ((base & 7) == 5) {
- base = -1;
- disp = (int32_t)x86_ldl_code(env, s);
- if (CODE64(s) && !havesib) {
- base = -2;
- disp += s->pc + s->rip_offset;
- }
- }
- break;
- case 1:
- disp = (int8_t)x86_ldub_code(env, s);
- break;
- default:
- case 2:
- disp = (int32_t)x86_ldl_code(env, s);
- break;
- }
-
- /* For correct popl handling with esp. */
- if (base == R_ESP && s->popl_esp_hack) {
- disp += s->popl_esp_hack;
- }
- if (base == R_EBP || base == R_ESP) {
- def_seg = R_SS;
- }
- break;
-
- case MO_16:
- if (mod == 0) {
- if (rm == 6) {
- base = -1;
- disp = x86_lduw_code(env, s);
- break;
- }
- } else if (mod == 1) {
- disp = (int8_t)x86_ldub_code(env, s);
- } else {
- disp = (int16_t)x86_lduw_code(env, s);
- }
-
- switch (rm) {
- case 0:
- base = R_EBX;
- index = R_ESI;
- break;
- case 1:
- base = R_EBX;
- index = R_EDI;
- break;
- case 2:
- base = R_EBP;
- index = R_ESI;
- def_seg = R_SS;
- break;
- case 3:
- base = R_EBP;
- index = R_EDI;
- def_seg = R_SS;
- break;
- case 4:
- base = R_ESI;
- break;
- case 5:
- base = R_EDI;
- break;
- case 6:
- base = R_EBP;
- def_seg = R_SS;
- break;
- default:
- case 7:
- base = R_EBX;
- break;
- }
- break;
-
- default:
- g_assert_not_reached();
- }
-
- done:
- return (AddressParts){ def_seg, base, index, scale, disp };
-}
-
/* Compute the address, with a minimum number of TCG ops. */
static TCGv gen_lea_modrm_1(DisasContext *s, AddressParts a, bool is_vsib)
{
@@ -1904,79 +1728,6 @@ static void gen_st_modrm(DisasContext *s, X86DecodedInsn *decode, MemOp ot)
}
}
-static target_ulong insn_get_addr(CPUX86State *env, DisasContext *s, MemOp ot)
-{
- target_ulong ret;
-
- switch (ot) {
- case MO_8:
- ret = x86_ldub_code(env, s);
- break;
- case MO_16:
- ret = x86_lduw_code(env, s);
- break;
- case MO_32:
- ret = x86_ldl_code(env, s);
- break;
-#ifdef TARGET_X86_64
- case MO_64:
- ret = x86_ldq_code(env, s);
- break;
-#endif
- default:
- g_assert_not_reached();
- }
- return ret;
-}
-
-static inline uint32_t insn_get(CPUX86State *env, DisasContext *s, MemOp ot)
-{
- uint32_t ret;
-
- switch (ot) {
- case MO_8:
- ret = x86_ldub_code(env, s);
- break;
- case MO_16:
- ret = x86_lduw_code(env, s);
- break;
- case MO_32:
-#ifdef TARGET_X86_64
- case MO_64:
-#endif
- ret = x86_ldl_code(env, s);
- break;
- default:
- g_assert_not_reached();
- }
- return ret;
-}
-
-static target_long insn_get_signed(CPUX86State *env, DisasContext *s, MemOp ot)
-{
- target_long ret;
-
- switch (ot) {
- case MO_8:
- ret = (int8_t) x86_ldub_code(env, s);
- break;
- case MO_16:
- ret = (int16_t) x86_lduw_code(env, s);
- break;
- case MO_32:
- ret = (int32_t) x86_ldl_code(env, s);
- break;
-#ifdef TARGET_X86_64
- case MO_64:
- ret = x86_ldq_code(env, s);
- break;
-#endif
- default:
- g_assert_not_reached();
- }
- return ret;
-}
-
static void gen_conditional_jump_labels(DisasContext *s, target_long diff,
TCGLabel *not_taken, TCGLabel *taken)
{
@@ -2221,28 +1972,6 @@ static void gen_leave(DisasContext *s)
gen_op_mov_reg_v(s, a_ot, R_ESP, s->T1);
}
-/* Similarly, except that the assumption here is that we don't decode
- the instruction at all -- either a missing opcode, an unimplemented
- feature, or just a bogus instruction stream. */
-static void gen_unknown_opcode(CPUX86State *env, DisasContext *s)
-{
- gen_illegal_opcode(s);
-
- if (qemu_loglevel_mask(LOG_UNIMP)) {
- FILE *logfile = qemu_log_trylock();
- if (logfile) {
- target_ulong pc = s->base.pc_next, end = s->pc;
-
- fprintf(logfile, "ILLOPC: " TARGET_FMT_lx ":", pc);
- for (; pc < end; ++pc) {
- fprintf(logfile, " %02x", translator_ldub(env, &s->base, pc));
- }
- fprintf(logfile, "\n");
- qemu_log_unlock(logfile);
- }
- }
-}
-
/* an interrupt is different from an exception because of the
privilege checks */
static void gen_interrupt(DisasContext *s, uint8_t intno)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 9d17bae7e75..b4aa300ab47 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -279,6 +279,130 @@
#define UNKNOWN_OPCODE ((X86OpEntry) {})
+#define X86_MAX_INSN_LENGTH 15
+
+static uint64_t advance_pc(CPUX86State *env, DisasContext *s, int num_bytes)
+{
+ uint64_t pc = s->pc;
+
+ /* This is a subsequent insn that crosses a page boundary. */
+ if (s->base.num_insns > 1 &&
+ !translator_is_same_page(&s->base, s->pc + num_bytes - 1)) {
+ siglongjmp(s->jmpbuf, 2);
+ }
+
+ s->pc += num_bytes;
+ if (unlikely(cur_insn_len(s) > X86_MAX_INSN_LENGTH)) {
+ /* If the instruction's 16th byte is on a different page than the 1st, a
+ * page fault on the second page wins over the general protection fault
+ * caused by the instruction being too long.
+ * This can happen even if the operand is only one byte long!
+ */
+ if (((s->pc - 1) ^ (pc - 1)) & TARGET_PAGE_MASK) {
+ (void)translator_ldub(env, &s->base,
+ (s->pc - 1) & TARGET_PAGE_MASK);
+ }
+ siglongjmp(s->jmpbuf, 1);
+ }
+
+ return pc;
+}
+
+static inline uint8_t x86_ldub_code(CPUX86State *env, DisasContext *s)
+{
+ return translator_ldub(env, &s->base, advance_pc(env, s, 1));
+}
+
+static inline uint16_t x86_lduw_code(CPUX86State *env, DisasContext *s)
+{
+ return translator_lduw(env, &s->base, advance_pc(env, s, 2));
+}
+
+static inline uint32_t x86_ldl_code(CPUX86State *env, DisasContext *s)
+{
+ return translator_ldl(env, &s->base, advance_pc(env, s, 4));
+}
+
+#ifdef TARGET_X86_64
+static inline uint64_t x86_ldq_code(CPUX86State *env, DisasContext *s)
+{
+ return translator_ldq(env, &s->base, advance_pc(env, s, 8));
+}
+#endif
+
+static target_ulong insn_get_addr(CPUX86State *env, DisasContext *s, MemOp ot)
+{
+ target_ulong ret;
+
+ switch (ot) {
+ case MO_8:
+ ret = x86_ldub_code(env, s);
+ break;
+ case MO_16:
+ ret = x86_lduw_code(env, s);
+ break;
+ case MO_32:
+ ret = x86_ldl_code(env, s);
+ break;
+#ifdef TARGET_X86_64
+ case MO_64:
+ ret = x86_ldq_code(env, s);
+ break;
+#endif
+ default:
+ g_assert_not_reached();
+ }
+ return ret;
+}
+
+static inline uint32_t insn_get(CPUX86State *env, DisasContext *s, MemOp ot)
+{
+ uint32_t ret;
+
+ switch (ot) {
+ case MO_8:
+ ret = x86_ldub_code(env, s);
+ break;
+ case MO_16:
+ ret = x86_lduw_code(env, s);
+ break;
+ case MO_32:
+#ifdef TARGET_X86_64
+ case MO_64:
+#endif
+ ret = x86_ldl_code(env, s);
+ break;
+ default:
+ g_assert_not_reached();
+ }
+ return ret;
+}
+
+static target_long insn_get_signed(CPUX86State *env, DisasContext *s, MemOp ot)
+{
+ target_long ret;
+
+ switch (ot) {
+ case MO_8:
+ ret = (int8_t) x86_ldub_code(env, s);
+ break;
+ case MO_16:
+ ret = (int16_t) x86_lduw_code(env, s);
+ break;
+ case MO_32:
+ ret = (int32_t) x86_ldl_code(env, s);
+ break;
+#ifdef TARGET_X86_64
+ case MO_64:
+ ret = x86_ldq_code(env, s);
+ break;
+#endif
+ default:
+ g_assert_not_reached();
+ }
+ return ret;
+}
+
static uint8_t get_modrm(DisasContext *s, CPUX86State *env)
{
if (!s->has_modrm) {
@@ -1883,6 +2007,130 @@ static void decode_root(DisasContext *s, CPUX86State *env, X86OpEntry *entry, ui
*entry = opcodes_root[*b];
}
+/* Decompose an address. */
+static AddressParts decode_modrm_address(CPUX86State *env, DisasContext *s,
+ int modrm, bool is_vsib)
+{
+ int def_seg, base, index, scale, mod, rm;
+ target_long disp;
+ bool havesib;
+
+ def_seg = R_DS;
+ index = -1;
+ scale = 0;
+ disp = 0;
+
+ mod = (modrm >> 6) & 3;
+ rm = modrm & 7;
+ base = rm | REX_B(s);
+
+ if (mod == 3) {
+ /* Normally filtered out earlier, but including this path
+ simplifies multi-byte nop, as well as bndcl, bndcu, bndcn. */
+ goto done;
+ }
+
+ switch (s->aflag) {
+ case MO_64:
+ case MO_32:
+ havesib = 0;
+ if (rm == 4) {
+ int code = x86_ldub_code(env, s);
+ scale = (code >> 6) & 3;
+ index = ((code >> 3) & 7) | REX_X(s);
+ if (index == 4 && !is_vsib) {
+ index = -1; /* no index */
+ }
+ base = (code & 7) | REX_B(s);
+ havesib = 1;
+ }
+
+ switch (mod) {
+ case 0:
+ if ((base & 7) == 5) {
+ base = -1;
+ disp = (int32_t)x86_ldl_code(env, s);
+ if (CODE64(s) && !havesib) {
+ base = -2;
+ disp += s->pc + s->rip_offset;
+ }
+ }
+ break;
+ case 1:
+ disp = (int8_t)x86_ldub_code(env, s);
+ break;
+ default:
+ case 2:
+ disp = (int32_t)x86_ldl_code(env, s);
+ break;
+ }
+
+ /* For correct popl handling with esp. */
+ if (base == R_ESP && s->popl_esp_hack) {
+ disp += s->popl_esp_hack;
+ }
+ if (base == R_EBP || base == R_ESP) {
+ def_seg = R_SS;
+ }
+ break;
+
+ case MO_16:
+ if (mod == 0) {
+ if (rm == 6) {
+ base = -1;
+ disp = x86_lduw_code(env, s);
+ break;
+ }
+ } else if (mod == 1) {
+ disp = (int8_t)x86_ldub_code(env, s);
+ } else {
+ disp = (int16_t)x86_lduw_code(env, s);
+ }
+
+ switch (rm) {
+ case 0:
+ base = R_EBX;
+ index = R_ESI;
+ break;
+ case 1:
+ base = R_EBX;
+ index = R_EDI;
+ break;
+ case 2:
+ base = R_EBP;
+ index = R_ESI;
+ def_seg = R_SS;
+ break;
+ case 3:
+ base = R_EBP;
+ index = R_EDI;
+ def_seg = R_SS;
+ break;
+ case 4:
+ base = R_ESI;
+ break;
+ case 5:
+ base = R_EDI;
+ break;
+ case 6:
+ base = R_EBP;
+ def_seg = R_SS;
+ break;
+ default:
+ case 7:
+ base = R_EBX;
+ break;
+ }
+ break;
+
+ default:
+ g_assert_not_reached();
+ }
+
+ done:
+ return (AddressParts){ def_seg, base, index, scale, disp };
+}
+
static int decode_modrm(DisasContext *s, CPUX86State *env,
X86DecodedInsn *decode, X86DecodedOp *op)
{
@@ -1895,8 +2143,8 @@ static int decode_modrm(DisasContext *s, CPUX86State *env,
} else {
op->has_ea = true;
op->n = -1;
- decode->mem = gen_lea_modrm_0(env, s, modrm,
- decode->e.vex_class == 12);
+ decode->mem = decode_modrm_address(env, s, get_modrm(s, env),
+ decode->e.vex_class == 12);
}
return modrm;
}
@@ -2516,6 +2764,23 @@ illegal:
return false;
}
+static void dump_unknown_opcode(CPUX86State *env, DisasContext *s)
+{
+ if (qemu_loglevel_mask(LOG_UNIMP)) {
+ FILE *logfile = qemu_log_trylock();
+ if (logfile) {
+ target_ulong pc = s->base.pc_next, end = s->pc;
+
+ fprintf(logfile, "ILLOPC: " TARGET_FMT_lx ":", pc);
+ for (; pc < end; ++pc) {
+ fprintf(logfile, " %02x", translator_ldub(env, &s->base, pc));
+ }
+ fprintf(logfile, "\n");
+ qemu_log_unlock(logfile);
+ }
+ }
+}
+
/*
* Convert one instruction. s->base.is_jmp is set if the translation must
* be stopped.
@@ -2902,5 +3167,11 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
gen_illegal_opcode(s);
return;
unknown_op:
- gen_unknown_opcode(env, s);
+ /*
+ * Similarly, except that the assumption here is that we don't decode
+ * the instruction at all -- either a missing opcode, an unimplemented
+ * feature, or just a bogus instruction stream.
+ */
+ gen_illegal_opcode(s);
+ dump_unknown_opcode(env, s);
}
--
2.52.0
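For readers following decode_modrm_address() above, here is a minimal standalone sketch of how the ModRM and SIB bytes split into fields. The type and function names (ModRMFields, SIBFields, split_modrm, split_sib) are hypothetical and not part of QEMU; rex_x and rex_b are assumed to be 0 or 8, as the or-ing with REX_X(s)/REX_B(s) in the patch suggests.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical illustration types; the patch itself returns AddressParts. */
typedef struct {
    int mod, reg, rm;
} ModRMFields;

typedef struct {
    int scale, index, base;
} SIBFields;

/* ModRM layout: mod in bits 7..6, reg in bits 5..3, rm in bits 2..0. */
static ModRMFields split_modrm(uint8_t modrm)
{
    ModRMFields f = { (modrm >> 6) & 3, (modrm >> 3) & 7, modrm & 7 };
    return f;
}

/*
 * SIB layout: scale in bits 7..6, index in bits 5..3, base in bits 2..0.
 * rex_x and rex_b are assumed to be 0 or 8 (an assumption of this sketch).
 */
static SIBFields split_sib(uint8_t sib, int rex_x, int rex_b, bool is_vsib)
{
    SIBFields f;

    assert((rex_x & ~8) == 0 && (rex_b & ~8) == 0);
    f.scale = (sib >> 6) & 3;
    f.index = ((sib >> 3) & 7) | rex_x;
    if (f.index == 4 && !is_vsib) {
        f.index = -1;               /* index 4 means "no index" unless VSIB */
    }
    f.base = (sib & 7) | rex_b;
    return f;
}
```

As in the patch, an index field of 4 only means "no index" when the instruction does not use VSIB addressing.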
* Re: [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction
2025-12-10 13:16 ` [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction Paolo Bonzini
@ 2025-12-11 15:47 ` Richard Henderson
2025-12-11 20:28 ` Paolo Bonzini
0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 15:47 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel; +Cc: qemu-stable
On 12/10/25 07:16, Paolo Bonzini wrote:
> VSIB instructions (VEX class 12) must not have an address prefix.
> Checking s->aflag == MO_16 is not enough because in 64-bit mode
> the address prefix changes aflag to MO_32. Add a specific check
> bit instead.
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.h | 3 +++
> target/i386/tcg/decode-new.c.inc | 27 +++++++++++++--------------
> 2 files changed, 16 insertions(+), 14 deletions(-)
Where do you see this? I think this is wrong.
In particular,
Table 2-27. Type 12 Class Exception Conditions
- If address size attribute is 16 bit.
and
2.3.12 Vector SIB (VSIB) Memory Addressing
In 16-bit protected mode, VSIB memory addressing is permitted if address size attribute is
overridden to 32 bits.
Therefore, in 16-bit mode, one *must* use the address prefix.
r~
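The rule quoted from the manual can be sketched as follows: what matters is the effective address size, not whether the 0x67 prefix is present. This is only an illustration; the enum and function names are hypothetical and do not correspond to QEMU's aflag handling.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names for this sketch only. */
typedef enum { MODE_16, MODE_32, MODE_64 } CodeMode;
typedef enum { ADDR_16, ADDR_32, ADDR_64 } AddrSize;

/* The address-size prefix (0x67) toggles the default size of the mode. */
static AddrSize effective_addr_size(CodeMode mode, bool prefix_67)
{
    switch (mode) {
    case MODE_16:
        return prefix_67 ? ADDR_32 : ADDR_16;
    case MODE_32:
        return prefix_67 ? ADDR_16 : ADDR_32;
    case MODE_64:
        return prefix_67 ? ADDR_32 : ADDR_64;
    }
    assert(!"invalid code mode");
    return ADDR_16;
}

/* VSIB is rejected only when the effective address size is 16 bits. */
static bool vsib_addr_size_ok(CodeMode mode, bool prefix_67)
{
    return effective_addr_size(mode, prefix_67) != ADDR_16;
}
```

Under this reading the prefix is required in 16-bit code, forbidden in 32-bit code, and harmless in 64-bit code.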
* Re: [PATCH 02/18] target/i386/tcg: ignore V3 in 32-bit mode
2025-12-10 13:16 ` [PATCH 02/18] target/i386/tcg: ignore V3 in 32-bit mode Paolo Bonzini
@ 2025-12-11 15:52 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 15:52 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel; +Cc: qemu-stable
On 12/10/25 07:16, Paolo Bonzini wrote:
> From the manual: "In 64-bit mode all 4 bits may be used. [...]
> In 32-bit and 16-bit modes bit 6 must be 1 (if bit 6 is not 1, the
> 2-byte VEX version will generate LDS instruction and the 3-byte VEX
> version will ignore this bit)."
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.c.inc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
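A small sketch of what "ignore V3" means for the 3-byte VEX form, assuming the usual layout where vvvv occupies bits 6..3 of the last VEX byte in inverted form. The function name vex_vvvv and its parameters are hypothetical, not QEMU code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Extract the register number from the last byte of a 3-byte VEX prefix.
 * vvvv is stored inverted in bits 6..3.  Outside 64-bit mode the top bit
 * (V3, bit 6 of the byte) is ignored, so only 3 bits are kept.
 */
static int vex_vvvv(uint8_t last_vex_byte, bool code64)
{
    unsigned inverted = ~(unsigned)last_vex_byte;
    int v = (inverted >> 3) & 15;

    assert(v >= 0 && v <= 15);
    return code64 ? v : (v & 7);
}
```

With this masking, a byte whose bit 6 is clear selects register 8..15 in 64-bit mode but wraps to 0..7 in 32-bit and 16-bit modes.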
* Re: [PATCH 03/18] target/i386/tcg: update cc_op after PUSHF
2025-12-10 13:16 ` [PATCH 03/18] target/i386/tcg: update cc_op after PUSHF Paolo Bonzini
@ 2025-12-11 15:55 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 15:55 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> PUSHF needs to compute the full eflags, set the cc_op to
> CC_OP_EFLAGS.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/emit.c.inc | 2 ++
> 1 file changed, 2 insertions(+)
>
> diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
> index 1a7fab9333a..22e53f5b000 100644
> --- a/target/i386/tcg/emit.c.inc
> +++ b/target/i386/tcg/emit.c.inc
> @@ -3250,6 +3250,8 @@ static void gen_PUSHF(DisasContext *s, X86DecodedInsn *decode)
> gen_update_cc_op(s);
> gen_helper_read_eflags(s->T0, tcg_env);
> gen_push_v(s, s->T0);
> + decode->cc_src = s->T0;
> + decode->cc_op = CC_OP_EFLAGS;
> }
>
> static MemOp gen_shift_count(DisasContext *s, X86DecodedInsn *decode,
Ah, as an optimization to not duplicate computation of these flags, not a bug fix. You
might expand the commit message by a few words there. Anyway,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
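The optimization described above can be illustrated with a toy lazy-flags model: once the flags have been materialized for the push, recording them as the current state avoids recomputing them later. Everything here (LazyFlags, LAZY_ADD, LAZY_EFLAGS, the computations counter) is a hypothetical stand-in, not QEMU's CC_OP machinery.

```c
#include <assert.h>
#include <stdint.h>

enum { LAZY_ADD, LAZY_EFLAGS };     /* hypothetical stand-ins for CC_OP_* */

typedef struct {
    int op;
    uint32_t src;                   /* holds the flags when op == LAZY_EFLAGS */
    uint32_t dst;                   /* result of the last arithmetic op */
    int computations;               /* counts expensive recomputations */
} LazyFlags;

/* Toy flag computation: only ZF (0x40) and SF (0x80) of a 32-bit result. */
static uint32_t compute_flags(LazyFlags *f)
{
    uint32_t flags = 0;

    assert(f->op == LAZY_ADD || f->op == LAZY_EFLAGS);
    if (f->op == LAZY_EFLAGS) {
        return f->src;              /* already materialized, nothing to do */
    }
    f->computations++;
    if (f->dst == 0) {
        flags |= 0x40;
    }
    if (f->dst & 0x80000000u) {
        flags |= 0x80;
    }
    return flags;
}

/* Like PUSHF in the patch: compute once, then remember the result. */
static uint32_t pushf_like(LazyFlags *f)
{
    uint32_t flags = compute_flags(f);

    f->src = flags;
    f->op = LAZY_EFLAGS;
    return flags;
}
```

A later flag read after pushf_like() returns the cached value and leaves the computations counter unchanged.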
* Re: [PATCH 04/18] target/i386/tcg: mark more instructions that are invalid in 64-bit mode
2025-12-10 13:16 ` [PATCH 04/18] target/i386/tcg: mark more instructions that are invalid in 64-bit mode Paolo Bonzini
@ 2025-12-11 15:59 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 15:59 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.c.inc | 8 ++++----
> 1 file changed, 4 insertions(+), 4 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
>
> diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
> index c9b4d5ffa32..213dbb9637c 100644
> --- a/target/i386/tcg/decode-new.c.inc
> +++ b/target/i386/tcg/decode-new.c.inc
> @@ -1698,9 +1698,9 @@ static const X86OpEntry opcodes_root[256] = {
> [0xD1] = X86_OP_GROUP1(group2, E,v),
> [0xD2] = X86_OP_GROUP2(group2, E,b, 1,b), /* CL */
> [0xD3] = X86_OP_GROUP2(group2, E,v, 1,b), /* CL */
> - [0xD4] = X86_OP_ENTRY2(AAM, 0,w, I,b),
> - [0xD5] = X86_OP_ENTRY2(AAD, 0,w, I,b),
> - [0xD6] = X86_OP_ENTRYw(SALC, 0,b),
> + [0xD4] = X86_OP_ENTRY2(AAM, 0,w, I,b, chk(i64)),
> + [0xD5] = X86_OP_ENTRY2(AAD, 0,w, I,b, chk(i64)),
> + [0xD6] = X86_OP_ENTRYw(SALC, 0,b, chk(i64)),
> [0xD7] = X86_OP_ENTRY1(XLAT, 0,b, zextT0), /* AL read/written */
>
> [0xE0] = X86_OP_ENTRYr(LOOPNE, J,b), /* implicit: CX with aflag size */
> @@ -1834,7 +1834,7 @@ static const X86OpEntry opcodes_root[256] = {
> [0xCB] = X86_OP_ENTRY0(RETF),
> [0xCC] = X86_OP_ENTRY0(INT3),
> [0xCD] = X86_OP_ENTRYr(INT, I,b, chk(vm86_iopl)),
> - [0xCE] = X86_OP_ENTRY0(INTO),
> + [0xCE] = X86_OP_ENTRY0(INTO, chk(i64)),
> [0xCF] = X86_OP_ENTRY0(IRET, chk(vm86_iopl) svm(IRET)),
>
> /*
* Re: [PATCH 05/18] target/i386/tcg: do not compute all flags for SAHF
2025-12-10 13:16 ` [PATCH 05/18] target/i386/tcg: do not compute all flags for SAHF Paolo Bonzini
@ 2025-12-11 16:03 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 16:03 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> Only OF is needed, the others are overwritten.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/emit.c.inc | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
> index 22e53f5b000..131aefce53c 100644
> --- a/target/i386/tcg/emit.c.inc
> +++ b/target/i386/tcg/emit.c.inc
> @@ -3778,7 +3778,7 @@ static void gen_SAHF(DisasContext *s, X86DecodedInsn *decode)
> return gen_illegal_opcode(s);
> }
> tcg_gen_shri_tl(s->T0, cpu_regs[R_EAX], 8);
> - gen_compute_eflags(s);
> + gen_neg_setcc(s, JCC_O << 1, cpu_cc_src);
> tcg_gen_andi_tl(cpu_cc_src, cpu_cc_src, CC_O);
> tcg_gen_andi_tl(s->T0, s->T0, CC_S | CC_Z | CC_A | CC_P | CC_C);
> tcg_gen_or_tl(cpu_cc_src, cpu_cc_src, s->T0);
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
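Why only OF is needed can be seen from the flag merge that SAHF performs. The sketch below models it on plain integers using the architectural EFLAGS bit positions; sahf_flags is a hypothetical helper, not a QEMU function.

```c
#include <assert.h>
#include <stdint.h>

/* Architectural EFLAGS bit positions. */
enum {
    CC_C = 0x0001,
    CC_P = 0x0004,
    CC_A = 0x0010,
    CC_Z = 0x0040,
    CC_S = 0x0080,
    CC_O = 0x0800,
};

/*
 * SAHF loads SF, ZF, AF, PF and CF from AH.  Of the old arithmetic flags
 * only OF survives, so OF is the only one that has to be known beforehand.
 */
static uint32_t sahf_flags(uint32_t old_flags, uint32_t eax)
{
    uint32_t ah = (eax >> 8) & 0xff;

    assert((old_flags & ~0xffffu) == 0);    /* sketch handles low 16 bits only */
    return (old_flags & CC_O) | (ah & (CC_S | CC_Z | CC_A | CC_P | CC_C));
}
```

Every bit of old_flags other than CC_O is discarded, which is why computing the full set beforehand is wasted work.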
* Re: [PATCH 06/18] target/i386/tcg: remove do_decode_0F
2025-12-10 13:16 ` [PATCH 06/18] target/i386/tcg: remove do_decode_0F Paolo Bonzini
@ 2025-12-11 16:03 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 16:03 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> It is not needed anymore since all prefixes are handled by the
> new decoder.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.c.inc | 7 +------
> 1 file changed, 1 insertion(+), 6 deletions(-)
>
> diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
> index 213dbb9637c..ea8e26f7f98 100644
> --- a/target/i386/tcg/decode-new.c.inc
> +++ b/target/i386/tcg/decode-new.c.inc
> @@ -1430,15 +1430,10 @@ static const X86OpEntry opcodes_0F[256] = {
> [0xff] = X86_OP_ENTRYr(UD, nop,v), /* UD0 */
> };
>
> -static void do_decode_0F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
> -{
> - *entry = opcodes_0F[*b];
> -}
> -
> static void decode_0F(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
> {
> *b = x86_ldub_code(env, s);
> - do_decode_0F(s, env, entry, b);
> + *entry = opcodes_0F[*b];
> }
>
> static void decode_63(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 07/18] target/i386/tcg: move and expand misplaced comment
2025-12-10 13:16 ` [PATCH 07/18] target/i386/tcg: move and expand misplaced comment Paolo Bonzini
@ 2025-12-11 16:04 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 16:04 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> @@ -2222,6 +2217,10 @@ static bool decode_insn(DisasContext *s, CPUX86State *env, X86DecodeFunc decode_
> {
> X86OpEntry *e = &decode->e;
>
> + /*
> + * Each step decodes part of the opcode and place the last not-fully-decoded
places
> + * byte in decode->b. If the modrm byte is read, it is placed in s->modrm.
> + */
> decode_func(s, env, e, &decode->b);
> while (e->is_decode) {
> e->is_decode = false;
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 08/18] target/i386/tcg: simplify effective address calculation
2025-12-10 13:16 ` [PATCH 08/18] target/i386/tcg: simplify effective address calculation Paolo Bonzini
@ 2025-12-11 16:15 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 16:15 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> Split gen_lea_v_seg_dest into three simple phases (extend from
> 16 bits, add, final extend), with optimization for known-zero bases
> to avoid back-to-back extensions.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 64 ++++++++++++-------------------------
> 1 file changed, 20 insertions(+), 44 deletions(-)
>
> diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
> index 0cb87d02012..2ab3c2ac663 100644
> --- a/target/i386/tcg/translate.c
> +++ b/target/i386/tcg/translate.c
> @@ -627,54 +627,30 @@ static TCGv eip_cur_tl(DisasContext *s)
> static void gen_lea_v_seg_dest(DisasContext *s, MemOp aflag, TCGv dest, TCGv a0,
> int def_seg, int ovr_seg)
> {
> - switch (aflag) {
> -#ifdef TARGET_X86_64
> - case MO_64:
> - if (ovr_seg < 0) {
> - tcg_gen_mov_tl(dest, a0);
> - return;
> + int easize;
> + bool has_base;
> +
> + if (ovr_seg < 0) {
> + ovr_seg = def_seg;
> + }
> +
> + has_base = ovr_seg >= 0 && (ADDSEG(s) || ovr_seg >= R_FS);
I guess def_seg is -1 for LEA, so ovr_seg can still be -1.
I wonder if it would be clearer to avoid this duplication of segment earlier in decode?
Anyway, for here, maybe clearer as
has_base = ovr_seg >= R_FS || (ovr_seg >= 0 && ADDSEG(s));
even though the end result is the same.
Nice cleanup.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
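That the two spellings of has_base give the same result can be checked exhaustively. The sketch below assumes QEMU's segment numbering (R_ES = 0 through R_GS = 5) and models ADDSEG(s) as a plain bool; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Segment register numbering assumed to match QEMU's cpu.h. */
enum { R_ES, R_CS, R_SS, R_DS, R_FS, R_GS };

/* Form used in the patch. */
static bool has_base_patch(int seg, bool addseg)
{
    assert(seg >= -1 && seg <= R_GS);
    return seg >= 0 && (addseg || seg >= R_FS);
}

/* Form suggested in the review. */
static bool has_base_review(int seg, bool addseg)
{
    assert(seg >= -1 && seg <= R_GS);
    return seg >= R_FS || (seg >= 0 && addseg);
}

/* Compare both forms for every segment (including -1, "none"). */
static bool forms_agree(void)
{
    for (int seg = -1; seg <= R_GS; seg++) {
        for (int addseg = 0; addseg <= 1; addseg++) {
            if (has_base_patch(seg, addseg) != has_base_review(seg, addseg)) {
                return false;
            }
        }
    }
    return true;
}
```

The forms agree because seg >= R_FS already implies seg >= 0, so the choice between them is purely about readability.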
* Re: [PATCH 09/18] target/i386/tcg: unnest switch statements in disas_insn_x87
2025-12-10 13:16 ` [PATCH 09/18] target/i386/tcg: unnest switch statements in disas_insn_x87 Paolo Bonzini
@ 2025-12-11 16:20 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 16:20 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> @@ -2801,22 +2785,16 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
> }
> break;
> case 0x00: case 0x01: case 0x04 ... 0x07: /* fxxx st, sti */
> + gen_helper_fmov_FT0_STN(tcg_env,
> + tcg_constant_i32(opreg));
> + gen_helper_fp_arith_ST0_FT0(op & 7);
> + break;
> +
> case 0x20: case 0x21: case 0x24 ... 0x27: /* fxxx sti, st */
> case 0x30: case 0x31: case 0x34 ... 0x37: /* fxxxp sti, st */
> - {
> - int op1;
> -
> - op1 = op & 7;
> - if (op >= 0x20) {
> - gen_helper_fp_arith_STN_ST0(op1, opreg);
> - if (op >= 0x30) {
> - gen_helper_fpop(tcg_env);
> - }
> - } else {
> - gen_helper_fmov_FT0_STN(tcg_env,
> - tcg_constant_i32(opreg));
> - gen_helper_fp_arith_ST0_FT0(op1);
> - }
> + gen_helper_fp_arith_STN_ST0(op & 7, opreg);
> + if (op >= 0x30) {
> + gen_helper_fpop(tcg_env);
> }
> break;
Leaving the op >= 0x30 check here?
I'd have expected
case 0x20: case 0x21: case 0x24 ... 0x27: /* fxxx sti, st */
gen_helper_fp_arith_STN_ST0(op & 7, opreg);
break;
case 0x30: case 0x31: case 0x34 ... 0x37: /* fxxxp sti, st */
gen_helper_fp_arith_STN_ST0(op & 7, opreg);
gen_helper_fpop(tcg_env);
break;
Anyway,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 10/18] target/i386/tcg: move fcom/fcomp differentiation to gen_helper_fp_arith_ST0_FT0
2025-12-10 13:16 ` [PATCH 10/18] target/i386/tcg: move fcom/fcomp differentiation to gen_helper_fp_arith_ST0_FT0 Paolo Bonzini
@ 2025-12-11 16:21 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 16:21 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> There is only one call site for gen_helper_fp_arith_ST0_FT0(), therefore
> there is no need to check the op1 == 3 in the caller. Once this is done,
> eliminate the goto to that call site.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 23 ++++++++---------------
> 1 file changed, 8 insertions(+), 15 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
>
> diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
> index c755329b3d9..3c55b62bdec 100644
> --- a/target/i386/tcg/translate.c
> +++ b/target/i386/tcg/translate.c
> @@ -1485,6 +1485,7 @@ static void gen_helper_fp_arith_ST0_FT0(int op)
> break;
> case 3:
> gen_helper_fcom_ST0_FT0(tcg_env);
> + gen_helper_fpop(tcg_env);
> break;
> case 4:
> gen_helper_fsub_ST0_FT0(tcg_env);
> @@ -2460,36 +2461,28 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
> tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
> s->mem_index, MO_LEUL);
> gen_helper_flds_FT0(tcg_env, s->tmp2_i32);
> - goto fp_arith_ST0_FT0;
> + gen_helper_fp_arith_ST0_FT0(op & 7);
> + break;
>
> case 0x10 ... 0x17: /* fixxxl */
> tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
> s->mem_index, MO_LEUL);
> gen_helper_fildl_FT0(tcg_env, s->tmp2_i32);
> - goto fp_arith_ST0_FT0;
> + gen_helper_fp_arith_ST0_FT0(op & 7);
> + break;
>
> case 0x20 ... 0x27: /* fxxxl */
> tcg_gen_qemu_ld_i64(s->tmp1_i64, s->A0,
> s->mem_index, MO_LEUQ);
> gen_helper_fldl_FT0(tcg_env, s->tmp1_i64);
> - goto fp_arith_ST0_FT0;
> + gen_helper_fp_arith_ST0_FT0(op & 7);
> + break;
>
> case 0x30 ... 0x37: /* fixxx */
> tcg_gen_qemu_ld_i32(s->tmp2_i32, s->A0,
> s->mem_index, MO_LESW);
> gen_helper_fildl_FT0(tcg_env, s->tmp2_i32);
> - goto fp_arith_ST0_FT0;
> -
> -fp_arith_ST0_FT0:
> - {
> - int op1 = op & 7;
> -
> - gen_helper_fp_arith_ST0_FT0(op1);
> - if (op1 == 3) {
> - /* fcomp needs pop */
> - gen_helper_fpop(tcg_env);
> - }
> - }
> + gen_helper_fp_arith_ST0_FT0(op & 7);
> break;
>
> case 0x08: /* flds */
* Re: [PATCH 11/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for fcom STn and fcomp STn
2025-12-10 13:16 ` [PATCH 11/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for fcom STn and fcomp STn Paolo Bonzini
@ 2025-12-11 16:24 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 16:24 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> Treat specially the undocumented ops, instead of treating specially the
> two d8/0 opcodes that have undocumented variants: just call
> gen_helper_fp_arith_ST0_FT0 for all opcodes in the d8/0 encoding.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 4 +---
> 1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
> index 3c55b62bdec..8f50071a4f4 100644
> --- a/target/i386/tcg/translate.c
> +++ b/target/i386/tcg/translate.c
> @@ -2777,7 +2777,7 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
> break;
> }
> break;
> - case 0x00: case 0x01: case 0x04 ... 0x07: /* fxxx st, sti */
> + case 0x00 ... 0x07: /* fxxx st, sti */
> gen_helper_fmov_FT0_STN(tcg_env,
> tcg_constant_i32(opreg));
> gen_helper_fp_arith_ST0_FT0(op & 7);
> @@ -2790,12 +2790,10 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
> gen_helper_fpop(tcg_env);
> }
> break;
> - case 0x02: /* fcom */
> case 0x22: /* fcom2, undocumented op */
> gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
> gen_helper_fcom_ST0_FT0(tcg_env);
> break;
> - case 0x03: /* fcomp */
> case 0x23: /* fcomp3, undocumented op */
> case 0x32: /* fcomp5, undocumented op */
> gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 12/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for undocumented fcom/fcomp variants
2025-12-10 13:16 ` [PATCH 12/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for undocumented fcom/fcomp variants Paolo Bonzini
@ 2025-12-11 16:26 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 16:26 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> For 0x32 hack the op to be fcomp; for the others there isn't even anything special
> to do.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 15 +++++----------
> 1 file changed, 5 insertions(+), 10 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
>
> diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
> index 8f50071a4f4..f47bb5de8b3 100644
> --- a/target/i386/tcg/translate.c
> +++ b/target/i386/tcg/translate.c
> @@ -2777,7 +2777,12 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
> break;
> }
> break;
> + case 0x32: /* fcomp5, undocumented op */
> + /* map to fcomp; op & 7 == 2 would not pop */
> + op = 0x03;
> + /* fallthrough */
> case 0x00 ... 0x07: /* fxxx st, sti */
> + case 0x22 ... 0x23: /* fcom2 and fcomp3, undocumented ops */
> gen_helper_fmov_FT0_STN(tcg_env,
> tcg_constant_i32(opreg));
> gen_helper_fp_arith_ST0_FT0(op & 7);
> @@ -2790,16 +2795,6 @@ static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
> gen_helper_fpop(tcg_env);
> }
> break;
> - case 0x22: /* fcom2, undocumented op */
> - gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
> - gen_helper_fcom_ST0_FT0(tcg_env);
> - break;
> - case 0x23: /* fcomp3, undocumented op */
> - case 0x32: /* fcomp5, undocumented op */
> - gen_helper_fmov_FT0_STN(tcg_env, tcg_constant_i32(opreg));
> - gen_helper_fcom_ST0_FT0(tcg_env);
> - gen_helper_fpop(tcg_env);
> - break;
> case 0x15: /* da/5 */
> switch (rm) {
> case 1: /* fucompp */
* Re: [PATCH 14/18] target/i386/tcg: kill tmp1_i64
2025-12-10 13:16 ` [PATCH 14/18] target/i386/tcg: kill tmp1_i64 Paolo Bonzini
@ 2025-12-11 16:28 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 16:28 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 66 ++++++++++++++++++++--------------
> target/i386/tcg/emit.c.inc | 72 ++++++++++++++++++++++---------------
> 2 files changed, 84 insertions(+), 54 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 15/18] target/i386/tcg: kill tmp2_i32
2025-12-10 13:16 ` [PATCH 15/18] target/i386/tcg: kill tmp2_i32 Paolo Bonzini
@ 2025-12-11 16:29 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 16:29 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 121 +++++++++++++++++++++---------------
> 1 file changed, 71 insertions(+), 50 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 16/18] target/i386/tcg: commonize code to compute SF/ZF/PF
2025-12-10 13:16 ` [PATCH 16/18] target/i386/tcg: commonize code to compute SF/ZF/PF Paolo Bonzini
@ 2025-12-11 18:46 ` Richard Henderson
2025-12-12 15:45 ` Paolo Bonzini
0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 18:46 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> +psz_b:
> + shift += 8;
> +psz_w:
> + shift += 16;
> +psz_l:
> +#ifdef TARGET_X86_64
> + shift += 32;
> +psz_q:
> +#endif
Oof. Use cc_op_size instead of a set of gotos.
r~
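To make the suggestion concrete: the quoted labels accumulate 8, then 16, then 32 by falling through, which for a 64-bit target adds up to 64 minus the operand width. The same number follows directly from a size index. This is a sketch on plain ints; the size encoding (0 = byte through 3 = quad) and both function names are assumptions, and the real cc_op_size() interface is not reproduced here.

```c
#include <assert.h>

#define TOY_TARGET_BITS 64      /* assumes a 64-bit target for this sketch */

/* Mirrors the fallthrough accumulation of the quoted psz_b/psz_w/psz_l labels. */
static int shift_by_fallthrough(int size)
{
    int shift = 0;

    switch (size) {
    case 0:
        shift += 8;
        /* fallthrough */
    case 1:
        shift += 16;
        /* fallthrough */
    case 2:
        shift += 32;
        /* fallthrough */
    case 3:
        break;
    default:
        assert(!"invalid size");
    }
    return shift;
}

/* Same value computed from the size, with no labels or gotos. */
static int shift_by_size(int size)
{
    assert(size >= 0 && size <= 3);
    return TOY_TARGET_BITS - (8 << size);
}
```

Both functions return 56, 48, 32 and 0 for byte, word, long and quad operands respectively.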
* Re: [PATCH 17/18] target/i386/tcg: add a CCOp for SBB x,x
2025-12-10 13:16 ` [PATCH 17/18] target/i386/tcg: add a CCOp for SBB x,x Paolo Bonzini
@ 2025-12-11 19:11 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 19:11 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> This is more efficient both when generating code and when testing
> flags.
I guess sbb x,x appears quite frequently in x86 setcc computation, and the testing of the
flags is much less important than the straight line code generation?
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index ecca38ed0b5..314e773a5d4 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1515,7 +1515,18 @@ typedef enum {
> CC_OP_POPCNTL__,
> CC_OP_POPCNTQ__,
> CC_OP_POPCNT = sizeof(target_ulong) == 8 ? CC_OP_POPCNTQ__ : CC_OP_POPCNTL__,
> -#define CC_OP_LAST_BWLQ CC_OP_POPCNTQ__
> +
> + /*
> + * Note that only CC_OP_SBB_SELF (i.e. the one with MO_TL size)
> + * is used or implemented, because the translation produces a
> + * sign-extended CC_DST.
> + */
> + CC_OP_SBB_SELFB__, /* S/Z/C/A via CC_DST, O clear, P set. */
> + CC_OP_SBB_SELFW__,
> + CC_OP_SBB_SELFL__,
> + CC_OP_SBB_SELFQ__,
> + CC_OP_SBB_SELF = sizeof(target_ulong) == 8 ? CC_OP_SBB_SELFQ__ : CC_OP_SBB_SELFL__,
> +#define CC_OP_LAST_BWLQ CC_OP_SBB_SELFQ__
The documentation here could be improved to note that CC_DST is always in {-1, 0}. The
fact that you can derive all other flags via masking is less immediately relevant.
Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
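The point about CC_DST being in {-1, 0} can be checked against a straightforward 8-bit model of SBB. In the sketch below, sbb8_flags computes the flags of a - b - carry the long way, and sbb_self_flags derives them from the result alone, following the quoted comment ("S/Z/C/A via CC_DST, O clear, P set"). Both helpers are hypothetical and are not the helpers added by the patch.

```c
#include <assert.h>
#include <stdint.h>

enum {
    CC_C = 0x0001, CC_P = 0x0004, CC_A = 0x0010,
    CC_Z = 0x0040, CC_S = 0x0080, CC_O = 0x0800,
};

/* Reference model: flags of the 8-bit subtraction a - b - cin. */
static uint32_t sbb8_flags(uint8_t a, uint8_t b, int cin)
{
    unsigned wide = (unsigned)a - (unsigned)b - (unsigned)cin;
    uint8_t res = (uint8_t)wide;
    uint32_t f = 0;
    int ones = 0;

    for (int i = 0; i < 8; i++) {
        ones += (res >> i) & 1;
    }
    if (wide & 0x100)                   f |= CC_C;  /* borrow out */
    if ((ones & 1) == 0)                f |= CC_P;  /* even parity */
    if ((a ^ b ^ res) & 0x10)           f |= CC_A;
    if (res == 0)                       f |= CC_Z;
    if (res & 0x80)                     f |= CC_S;
    if ((a ^ b) & (a ^ res) & 0x80)     f |= CC_O;
    return f;
}

/* Shortcut for SBB x,x: the result is 0 or -1 and determines every flag. */
static uint32_t sbb_self_flags(int64_t dst)
{
    assert(dst == 0 || dst == -1);
    return (uint32_t)(dst & (CC_S | CC_C | CC_A)) | (dst ? 0 : CC_Z) | CC_P;
}
```

The shortcut works because x - x - CF is simply -CF, so the operand value never influences the flags.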
* Re: [PATCH 18/18] target/i386/tcg: move fetch code out of translate.c
2025-12-10 13:16 ` [PATCH 18/18] target/i386/tcg: move fetch code out of translate.c Paolo Bonzini
@ 2025-12-11 19:29 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 19:29 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/10/25 07:16, Paolo Bonzini wrote:
> Let translate.c only concern itself with TCG code generation. Move everything
> that uses CPUX86State*, as well as gen_lea_modrm_0 now that it is only used
> to fill decode->mem, to decode-new.c.inc.
>
> While at it also rename gen_lea_modrm_0.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 271 ------------------------------
> target/i386/tcg/decode-new.c.inc | 277 ++++++++++++++++++++++++++++++-
> 2 files changed, 274 insertions(+), 274 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction
2025-12-11 15:47 ` Richard Henderson
@ 2025-12-11 20:28 ` Paolo Bonzini
2025-12-11 22:22 ` Richard Henderson
0 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-11 20:28 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-stable
On Thu, Dec 11, 2025 at 4:47 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> On 12/10/25 07:16, Paolo Bonzini wrote:
> > VSIB instructions (VEX class 12) must not have an address prefix.
> > Checking s->aflag == MO_16 is not enough because in 64-bit mode
> > the address prefix changes aflag to MO_32. Add a specific check
> > bit instead.
> >
> > Cc: qemu-stable@nongnu.org
> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> > ---
> > target/i386/tcg/decode-new.h | 3 +++
> > target/i386/tcg/decode-new.c.inc | 27 +++++++++++++--------------
> > 2 files changed, 16 insertions(+), 14 deletions(-)
>
> Where do you see this? I think this is wrong.
Yes, I was confused by the comment and by QEMU's incorrect decoding logic:
if (CODE32(s) && !VM86(s)) {
which should be changed to
if (PE(s) && !VM86(s)) {
And by the way, this also means that we need either separate helpers
for 32- and 64-bit addresses, or a mask argument.
Paolo
> In particular,
>
> Table 2-27. Type 12 Class Exception Conditions
> - If address size attribute is 16 bit.
>
> and
>
> 2.3.12 Vector SIB (VSIB) Memory Addressing
> In 16-bit protected mode, VSIB memory addressing is permitted if address size attribute is
> overridden to 32 bits.
>
> Therefore, in 16-bit mode, one *must* use the address prefix.
>
>
>
> r~
>
* Re: [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction
2025-12-11 20:28 ` Paolo Bonzini
@ 2025-12-11 22:22 ` Richard Henderson
2025-12-12 2:06 ` Paolo Bonzini
0 siblings, 1 reply; 41+ messages in thread
From: Richard Henderson @ 2025-12-11 22:22 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, qemu-stable
On 12/11/25 14:28, Paolo Bonzini wrote:
> On Thu, Dec 11, 2025 at 4:47 PM Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> On 12/10/25 07:16, Paolo Bonzini wrote:
>>> VSIB instructions (VEX class 12) must not have an address prefix.
>>> Checking s->aflag == MO_16 is not enough because in 64-bit mode
>>> the address prefix changes aflag to MO_32. Add a specific check
>>> bit instead.
>>>
>>> Cc: qemu-stable@nongnu.org
>>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>>> ---
>>> target/i386/tcg/decode-new.h | 3 +++
>>> target/i386/tcg/decode-new.c.inc | 27 +++++++++++++--------------
>>> 2 files changed, 16 insertions(+), 14 deletions(-)
>>
>> Where do you see this? I think this is wrong.
>
> Yes, I was confused by the comment and by QEMU's incorrect decoding logic:
>
> if (CODE32(s) && !VM86(s)) {
>
> which should be changed to
>
> if (PE(s) && !VM86(s)) {
I can't find the language for that. Can you point me at it?
> And by the way, this also means that we need either separate helpers
> for 32- and 64-bit addresses, or a mask argument.
Of course.
r~
* Re: [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction
2025-12-11 22:22 ` Richard Henderson
@ 2025-12-12 2:06 ` Paolo Bonzini
2025-12-12 14:37 ` Richard Henderson
0 siblings, 1 reply; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-12 2:06 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel, qemu-stable
On Thu, Dec 11, 2025 at 23:22, Richard Henderson <richard.henderson@linaro.org> wrote:
> > Yes, I was confused by the comment and by QEMU's incorrect decoding logic:
> >
> > if (CODE32(s) && !VM86(s)) {
> >
> > which should be changed to
> >
> > if (PE(s) && !VM86(s)) {
>
> I can't find the language for that. Can you point me at it?
>
It's the exception condition tables. They all mention that you get #UD for
the VEX prefix in real or vm86 mode.
Several BMI instructions also have language like "This instruction is not
supported in real mode and virtual-8086 mode".
Paolo
> > And by the way, this also means that we need either separate helpers
> > for 32- and 64-bit addresses, or a mask argument.
>
> Of course.
>
>
> r~
>
>
* Re: [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction
2025-12-12 2:06 ` Paolo Bonzini
@ 2025-12-12 14:37 ` Richard Henderson
0 siblings, 0 replies; 41+ messages in thread
From: Richard Henderson @ 2025-12-12 14:37 UTC (permalink / raw)
To: Paolo Bonzini; +Cc: qemu-devel, qemu-stable
On 12/11/25 20:06, Paolo Bonzini wrote:
>
>
> On Thu, Dec 11, 2025 at 23:22, Richard Henderson <richard.henderson@linaro.org> wrote:
>
> > Yes, I was confused by the comment and by QEMU's incorrect decoding logic:
> >
> > if (CODE32(s) && !VM86(s)) {
> >
> > which should be changed to
> >
> > if (PE(s) && !VM86(s)) {
>
> I can't find the language for that. Can you point me at it?
>
>
> It's the exception condition tables. They all mention that you get #UD for the VEX prefix
> in real or vm86 mode.
Ah right, found it. Thanks.
> Several BMI instructions also have language like "This instruction is not supported in
> real mode and virtual-8086 mode".
Amusingly, some of them dropped the "not" in that sentence -- see ADCX.
r~
* Re: [PATCH 16/18] target/i386/tcg: commonize code to compute SF/ZF/PF
2025-12-11 18:46 ` Richard Henderson
@ 2025-12-12 15:45 ` Paolo Bonzini
0 siblings, 0 replies; 41+ messages in thread
From: Paolo Bonzini @ 2025-12-12 15:45 UTC (permalink / raw)
To: Richard Henderson; +Cc: qemu-devel
On Thu, Dec 11, 2025, 19:46 Richard Henderson <richard.henderson@linaro.org>
wrote:
> On 12/10/25 07:16, Paolo Bonzini wrote:
> > +psz_b:
> > + shift += 8;
> > +psz_w:
> > + shift += 16;
> > +psz_l:
> > +#ifdef TARGET_X86_64
> > + shift += 32;
> > +psz_q:
> > +#endif
>
> Oof. Use cc_op_size instead of a set of gotos.
>
I was so proud :) I will check what the code produced with cc_op_size looks
like.
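A rough sketch of the two approaches, assuming `shift` is the number of bits needed to left-align the operand in a 64-bit word (which is what the quoted increments add up to on a 64-bit target). The `enum op_size` codes and both function names are hypothetical and are not QEMU's `cc_op_size` or `MemOp`:

```c
/* Hypothetical size codes (log2 of the operand size in bytes). */
enum op_size { SZ_B = 0, SZ_W = 1, SZ_L = 2, SZ_Q = 3 };

/* Fall-through version, mirroring the quoted psz_b/psz_w/psz_l/psz_q labels. */
static int shift_fallthrough(enum op_size sz)
{
    int shift = 0;

    switch (sz) {
    case SZ_B:
        shift += 8;
        /* fall through */
    case SZ_W:
        shift += 16;
        /* fall through */
    case SZ_L:
        shift += 32;
        /* fall through */
    case SZ_Q:
        break;
    }
    return shift;
}

/* Size-based version: an operand of (8 << sz) bits, left-aligned in 64 bits. */
static int shift_from_size(enum op_size sz)
{
    return 64 - (8 << sz);
}
```

Both functions agree for all four sizes, so a size derived from the CC op can replace the chain of labels without changing the computed shift.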
Paolo
end of thread, other threads:[~2025-12-12 15:46 UTC | newest]
Thread overview: 41+ messages
2025-12-10 13:16 [PATCH 00/18] First round of target/i386/tcg patches for QEMU 11.0 Paolo Bonzini
2025-12-10 13:16 ` [PATCH 01/18] target/i386/tcg: fix check for invalid VSIB instruction Paolo Bonzini
2025-12-11 15:47 ` Richard Henderson
2025-12-11 20:28 ` Paolo Bonzini
2025-12-11 22:22 ` Richard Henderson
2025-12-12 2:06 ` Paolo Bonzini
2025-12-12 14:37 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 02/18] target/i386/tcg: ignore V3 in 32-bit mode Paolo Bonzini
2025-12-11 15:52 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 03/18] target/i386/tcg: update cc_op after PUSHF Paolo Bonzini
2025-12-11 15:55 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 04/18] target/i386/tcg: mark more instructions that are invalid in 64-bit mode Paolo Bonzini
2025-12-11 15:59 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 05/18] target/i386/tcg: do not compute all flags for SAHF Paolo Bonzini
2025-12-11 16:03 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 06/18] target/i386/tcg: remove do_decode_0F Paolo Bonzini
2025-12-11 16:03 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 07/18] target/i386/tcg: move and expand misplaced comment Paolo Bonzini
2025-12-11 16:04 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 08/18] target/i386/tcg: simplify effective address calculation Paolo Bonzini
2025-12-11 16:15 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 09/18] target/i386/tcg: unnest switch statements in disas_insn_x87 Paolo Bonzini
2025-12-11 16:20 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 10/18] target/i386/tcg: move fcom/fcomp differentiation to gen_helper_fp_arith_ST0_FT0 Paolo Bonzini
2025-12-11 16:21 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 11/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for fcom STn and fcomp STn Paolo Bonzini
2025-12-11 16:24 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 12/18] target/i386/tcg: reuse gen_helper_fp_arith_ST0_FT0 for undocumented fcom/fcomp variants Paolo Bonzini
2025-12-11 16:26 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 13/18] target/i386/tcg: unify more pop/no-pop x87 instructions Paolo Bonzini
2025-12-10 13:16 ` [PATCH 14/18] target/i386/tcg: kill tmp1_i64 Paolo Bonzini
2025-12-11 16:28 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 15/18] target/i386/tcg: kill tmp2_i32 Paolo Bonzini
2025-12-11 16:29 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 16/18] target/i386/tcg: commonize code to compute SF/ZF/PF Paolo Bonzini
2025-12-11 18:46 ` Richard Henderson
2025-12-12 15:45 ` Paolo Bonzini
2025-12-10 13:16 ` [PATCH 17/18] target/i386/tcg: add a CCOp for SBB x,x Paolo Bonzini
2025-12-11 19:11 ` Richard Henderson
2025-12-10 13:16 ` [PATCH 18/18] target/i386/tcg: move fetch code out of translate.c Paolo Bonzini
2025-12-11 19:29 ` Richard Henderson