* [PATCH 00/10] target/i386: make decoding entirely table based
@ 2024-06-20 9:54 Paolo Bonzini
From: Paolo Bonzini @ 2024-06-20 9:54 UTC
To: qemu-devel
This is the trailing part of the previous series I sent, with fixes for
BT/BTS/BTR/BTC, plus moving code generation of CMPXCHG8B/CMPXCHG16B to
the new decoder. This way all LOCKable instructions are converted, and
the patch "target/i386: do not check PREFIX_LOCK in old-style decoder"
is correct.
Also sneak in a couple of cleanups for CC_OP_POPCNT. They do not make
the generated code any more efficient, but they simplify the logic for
the BT/BTS/BTR/BTC flags a bit.
Supersedes: <20240608084113.2770363-1-pbonzini@redhat.com>
Paolo Bonzini (10):
target/i386: use cpu_cc_dst for CC_OP_POPCNT
target/i386: give CC_OP_POPCNT low bits corresponding to MO_TL
target/i386: convert bit test instructions to new decoder
target/i386: try not to force EFLAGS computation for CC_OP_ADOX/ADCX
target/i386: decode address before going back to translate.c
target/i386: convert CMPXCHG8B/CMPXCHG16B to new decoder
target/i386: do not check PREFIX_LOCK in old-style decoder
target/i386: list instructions still in translate.c
target/i386: assert that cc_op* and pc_save are preserved
target/i386: remove gen_ext_tl
target/i386/cpu.h | 13 +-
target/i386/tcg/decode-new.h | 19 +-
target/i386/tcg/cc_helper.c | 2 +-
target/i386/tcg/translate.c | 492 ++++++-------------------------
target/i386/tcg/decode-new.c.inc | 136 ++++++---
target/i386/tcg/emit.c.inc | 249 +++++++++++++++-
6 files changed, 467 insertions(+), 444 deletions(-)
--
2.45.2
* [PATCH 01/10] target/i386: use cpu_cc_dst for CC_OP_POPCNT
From: Paolo Bonzini @ 2024-06-20 9:54 UTC
To: qemu-devel
CC_OP_POPCNT computes ZF from one of the cc_* registers, but unlike the
other CCOps that do so it uses cpu_cc_src instead of cpu_cc_dst. Do not
make it the odd one out.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
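Reader's note, not part of the commit message: the change works because
the population count is zero exactly when the input is zero, so the
POPCNT input can be kept in CC_DST and ZF derived from it lazily. Below
is a minimal standalone sketch of that idea. All `sketch_` names are
hypothetical and only model the CC_OP_POPCNT case of
helper_cc_compute_all(); the constant is the ZF bit position in EFLAGS.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_CC_Z 0x0040u  /* ZF bit position in EFLAGS */

/* Lazy flag computation for CC_OP_POPCNT: Z via CC_DST, all else clear. */
static uint32_t sketch_popcnt_flags(uint64_t cc_dst)
{
    return cc_dst ? 0 : SKETCH_CC_Z;
}

/* Model of POPCNT: returns the bit count and the flags derived from CC_DST. */
static unsigned sketch_popcnt(uint64_t input, uint32_t *flags)
{
    unsigned count = 0;

    /* Clear the lowest set bit until nothing is left. */
    for (uint64_t v = input; v; v &= v - 1) {
        count++;
    }

    /* The property the patch relies on: result is zero iff input is zero. */
    assert((count == 0) == (input == 0));

    /* The saved value is the input, not the count; either gives the same Z. */
    *flags = sketch_popcnt_flags(input);
    return count;
}
```

For example, an input of 0 yields a count of 0 with only ZF set, while
any non-zero input yields no flags at all.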
target/i386/cpu.h | 2 +-
target/i386/tcg/cc_helper.c | 2 +-
target/i386/tcg/translate.c | 2 +-
target/i386/tcg/emit.c.inc | 4 ++--
4 files changed, 5 insertions(+), 5 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 7e2a9b56aea..f54cd93b3f9 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1332,7 +1332,7 @@ typedef enum {
CC_OP_BMILGQ,
CC_OP_CLR, /* Z set, all other flags clear. */
- CC_OP_POPCNT, /* Z via CC_SRC, all other flags clear. */
+ CC_OP_POPCNT, /* Z via CC_DST, all other flags clear. */
CC_OP_NB,
} CCOp;
diff --git a/target/i386/tcg/cc_helper.c b/target/i386/tcg/cc_helper.c
index f76e9cb8cfb..301ed954064 100644
--- a/target/i386/tcg/cc_helper.c
+++ b/target/i386/tcg/cc_helper.c
@@ -107,7 +107,7 @@ target_ulong helper_cc_compute_all(target_ulong dst, target_ulong src1,
case CC_OP_CLR:
return CC_Z | CC_P;
case CC_OP_POPCNT:
- return src1 ? 0 : CC_Z;
+ return dst ? 0 : CC_Z;
case CC_OP_MULB:
return compute_all_mulb(dst, src1);
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index ad1819815ab..f32cda4e169 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -324,7 +324,7 @@ static const uint8_t cc_op_live[CC_OP_NB] = {
[CC_OP_ADOX] = USES_CC_SRC | USES_CC_SRC2,
[CC_OP_ADCOX] = USES_CC_DST | USES_CC_SRC | USES_CC_SRC2,
[CC_OP_CLR] = 0,
- [CC_OP_POPCNT] = USES_CC_SRC,
+ [CC_OP_POPCNT] = USES_CC_DST,
};
static void set_cc_op_1(DisasContext *s, CCOp op, bool dirty)
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 11faa70b5e2..fc7477833bc 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -2804,10 +2804,10 @@ static void gen_POPA(DisasContext *s, X86DecodedInsn *decode)
static void gen_POPCNT(DisasContext *s, X86DecodedInsn *decode)
{
- decode->cc_src = tcg_temp_new();
+ decode->cc_dst = tcg_temp_new();
decode->cc_op = CC_OP_POPCNT;
- tcg_gen_mov_tl(decode->cc_src, s->T0);
+ tcg_gen_mov_tl(decode->cc_dst, s->T0);
tcg_gen_ctpop_tl(s->T0, s->T0);
}
--
2.45.2
* [PATCH 02/10] target/i386: give CC_OP_POPCNT low bits corresponding to MO_TL
From: Paolo Bonzini @ 2024-06-20 9:54 UTC
To: qemu-devel
Handle it like the other arithmetic cc_ops. This slightly simplifies
the implementation of the bit test instructions.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
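Reader's note, not part of the commit message: "low bits corresponding
to MO_TL" refers to the convention that the arithmetic cc_ops come in
groups of four (B/W/L/Q), so the operand size can be recovered from the
low two bits of the offset into that run. Presumably this is also why
CC_OP_CLR moves up in the enum: it would otherwise interrupt the run.
The sketch below uses a shortened, hypothetical enum (the `SK_` values
are not the real CCOp values) and assumes a 64-bit target_ulong.

```c
#include <assert.h>

/* Hypothetical, shortened stand-in for CCOp. */
typedef enum {
    SK_CC_OP_DYNAMIC,
    SK_CC_OP_EFLAGS,
    SK_CC_OP_CLR,            /* non-arithmetic ops stay ahead of the run */

    SK_CC_OP_ADDB,           /* first arithmetic group: B, W, L, Q */
    SK_CC_OP_ADDW,
    SK_CC_OP_ADDL,
    SK_CC_OP_ADDQ,

    SK_CC_OP_POPCNTB__,      /* placeholders that only keep the layout */
    SK_CC_OP_POPCNTW__,
    SK_CC_OP_POPCNTL__,
    SK_CC_OP_POPCNTQ__,
    /* Assumption of this sketch: target_ulong is 64 bits wide. */
    SK_CC_OP_POPCNT = SK_CC_OP_POPCNTQ__,

    SK_CC_OP_NB,
} SketchCCOp;

/* Operand size (0 = byte ... 3 = quadword) from the low bits of the offset. */
static unsigned sketch_cc_op_size(SketchCCOp op)
{
    assert(op >= SK_CC_OP_ADDB && op < SK_CC_OP_NB);
    return (unsigned)(op - SK_CC_OP_ADDB) & 3;
}
```

With this layout CC_OP_POPCNT no longer needs a special case wherever
the size is extracted generically, which is what the translate.c hunks
below take advantage of.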
target/i386/cpu.h | 13 +++++++++++--
target/i386/tcg/translate.c | 3 +--
2 files changed, 12 insertions(+), 4 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index f54cd93b3f9..8504a7998fd 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1275,6 +1275,7 @@ typedef enum {
CC_OP_ADCX, /* CC_DST = C, CC_SRC = rest. */
CC_OP_ADOX, /* CC_SRC2 = O, CC_SRC = rest. */
CC_OP_ADCOX, /* CC_DST = C, CC_SRC2 = O, CC_SRC = rest. */
+ CC_OP_CLR, /* Z and P set, all other flags clear. */
CC_OP_MULB, /* modify all flags, C, O = (CC_SRC != 0) */
CC_OP_MULW,
@@ -1331,8 +1332,16 @@ typedef enum {
CC_OP_BMILGL,
CC_OP_BMILGQ,
- CC_OP_CLR, /* Z set, all other flags clear. */
- CC_OP_POPCNT, /* Z via CC_DST, all other flags clear. */
+ /*
+ * Note that only CC_OP_POPCNT (i.e. the one with MO_TL size)
+ * is used or implemented, because the translation needs
+ * to zero-extend CC_DST anyway.
+ */
+ CC_OP_POPCNTB__, /* Z via CC_DST, all other flags clear. */
+ CC_OP_POPCNTW__,
+ CC_OP_POPCNTL__,
+ CC_OP_POPCNTQ__,
+ CC_OP_POPCNT = sizeof(target_ulong) == 8 ? CC_OP_POPCNTQ__ : CC_OP_POPCNTL__,
CC_OP_NB,
} CCOp;
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index f32cda4e169..934c514e64f 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -1019,8 +1019,6 @@ static CCPrepare gen_prepare_eflags_z(DisasContext *s, TCGv reg)
.imm = CC_Z };
case CC_OP_CLR:
return (CCPrepare) { .cond = TCG_COND_ALWAYS };
- case CC_OP_POPCNT:
- return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_src };
default:
{
MemOp size = (s->cc_op - CC_OP_ADDB) & 3;
@@ -3177,6 +3175,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
case CC_OP_SHLB ... CC_OP_SHLQ:
case CC_OP_SARB ... CC_OP_SARQ:
case CC_OP_BMILGB ... CC_OP_BMILGQ:
+ case CC_OP_POPCNT:
/* Z was going to be computed from the non-zero status of CC_DST.
We can get that same Z value (and the new C value) by leaving
CC_DST alone, setting CC_SRC, and using a CC_OP_SAR of the
--
2.45.2
* [PATCH 03/10] target/i386: convert bit test instructions to new decoder
From: Paolo Bonzini @ 2024-06-20 9:54 UTC
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
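Reader's note, not part of the commit message: for the register forms
with a memory destination, the bit offset in the second operand is
signed and may point outside the addressed word. The patch splits it
into a byte displacement (sextract plus andi in gen_load_ea) and a bit
index within the operand (andi in gen_bt_mask). The sketch below shows
the same arithmetic in plain C using floor division. The names are
hypothetical, `ot` follows the MemOp convention (1 = 16 bit, 2 = 32 bit,
3 = 64 bit), and the offset is assumed to be already sign-extended from
the operand width.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int64_t byte_disp;   /* added to the effective address */
    unsigned bit;        /* bit index within the operand */
} SketchBitAddr;

static SketchBitAddr sketch_split_bit_offset(int64_t bitofs, unsigned ot)
{
    assert(ot >= 1 && ot <= 3);

    int64_t bits = (int64_t)8 << ot;    /* operand width in bits */
    int64_t bytes = (int64_t)1 << ot;   /* operand width in bytes */

    /* Floor division: C's "/" truncates toward zero, so adjust negatives. */
    int64_t word = bitofs / bits;
    if (bitofs % bits < 0) {
        word--;
    }

    SketchBitAddr r = {
        .byte_disp = word * bytes,
        .bit = (unsigned)(bitofs - word * bits),
    };
    return r;
}
```

For instance, a 32-bit bit test with offset -1 addresses bit 31 of the
dword four bytes below the effective address, and offset 35 addresses
bit 3 of the dword four bytes above it.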
target/i386/tcg/decode-new.h | 3 +
target/i386/tcg/translate.c | 147 +-----------------------------
target/i386/tcg/decode-new.c.inc | 40 ++++++---
target/i386/tcg/emit.c.inc | 149 ++++++++++++++++++++++++++++++-
4 files changed, 181 insertions(+), 158 deletions(-)
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index f9bf9a60411..e4cdf5e3c4f 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -190,6 +190,9 @@ typedef enum X86InsnSpecial {
/* Always locked if it has a memory operand (XCHG) */
X86_SPECIAL_Locked,
+ /* Like HasLock, but also operand 2 provides bit displacement into memory. */
+ X86_SPECIAL_BitTest,
+
/* Do not load effective address in s->A0 */
X86_SPECIAL_NoLoadEA,
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 934c514e64f..257110ac703 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -708,11 +708,6 @@ static TCGv gen_ext_tl(TCGv dst, TCGv src, MemOp size, bool sign)
return dst;
}
-static void gen_exts(MemOp ot, TCGv reg)
-{
- gen_ext_tl(reg, reg, ot, true);
-}
-
static void gen_op_j_ecx(DisasContext *s, TCGCond cond, TCGLabel *label1)
{
TCGv tmp = gen_ext_tl(NULL, cpu_regs[R_ECX], s->aflag, false);
@@ -2985,7 +2980,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
int prefixes = s->prefix;
MemOp dflag = s->dflag;
MemOp ot;
- int modrm, reg, rm, mod, op, val;
+ int modrm, reg, rm, mod, op;
/* now check op code */
switch (b) {
@@ -3051,146 +3046,6 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
}
break;
- /************************/
- /* bit operations */
- case 0x1ba: /* bt/bts/btr/btc Gv, im */
- ot = dflag;
- modrm = x86_ldub_code(env, s);
- op = (modrm >> 3) & 7;
- mod = (modrm >> 6) & 3;
- rm = (modrm & 7) | REX_B(s);
- if (mod != 3) {
- s->rip_offset = 1;
- gen_lea_modrm(env, s, modrm);
- if (!(s->prefix & PREFIX_LOCK)) {
- gen_op_ld_v(s, ot, s->T0, s->A0);
- }
- } else {
- gen_op_mov_v_reg(s, ot, s->T0, rm);
- }
- /* load shift */
- val = x86_ldub_code(env, s);
- tcg_gen_movi_tl(s->T1, val);
- if (op < 4)
- goto unknown_op;
- op -= 4;
- goto bt_op;
- case 0x1a3: /* bt Gv, Ev */
- op = 0;
- goto do_btx;
- case 0x1ab: /* bts */
- op = 1;
- goto do_btx;
- case 0x1b3: /* btr */
- op = 2;
- goto do_btx;
- case 0x1bb: /* btc */
- op = 3;
- do_btx:
- ot = dflag;
- modrm = x86_ldub_code(env, s);
- reg = ((modrm >> 3) & 7) | REX_R(s);
- mod = (modrm >> 6) & 3;
- rm = (modrm & 7) | REX_B(s);
- gen_op_mov_v_reg(s, MO_32, s->T1, reg);
- if (mod != 3) {
- AddressParts a = gen_lea_modrm_0(env, s, modrm);
- /* specific case: we need to add a displacement */
- gen_exts(ot, s->T1);
- tcg_gen_sari_tl(s->tmp0, s->T1, 3 + ot);
- tcg_gen_shli_tl(s->tmp0, s->tmp0, ot);
- tcg_gen_add_tl(s->A0, gen_lea_modrm_1(s, a, false), s->tmp0);
- gen_lea_v_seg(s, s->A0, a.def_seg, s->override);
- if (!(s->prefix & PREFIX_LOCK)) {
- gen_op_ld_v(s, ot, s->T0, s->A0);
- }
- } else {
- gen_op_mov_v_reg(s, ot, s->T0, rm);
- }
- bt_op:
- tcg_gen_andi_tl(s->T1, s->T1, (1 << (3 + ot)) - 1);
- tcg_gen_movi_tl(s->tmp0, 1);
- tcg_gen_shl_tl(s->tmp0, s->tmp0, s->T1);
- if (s->prefix & PREFIX_LOCK) {
- switch (op) {
- case 0: /* bt */
- /* Needs no atomic ops; we suppressed the normal
- memory load for LOCK above so do it now. */
- gen_op_ld_v(s, ot, s->T0, s->A0);
- break;
- case 1: /* bts */
- tcg_gen_atomic_fetch_or_tl(s->T0, s->A0, s->tmp0,
- s->mem_index, ot | MO_LE);
- break;
- case 2: /* btr */
- tcg_gen_not_tl(s->tmp0, s->tmp0);
- tcg_gen_atomic_fetch_and_tl(s->T0, s->A0, s->tmp0,
- s->mem_index, ot | MO_LE);
- break;
- default:
- case 3: /* btc */
- tcg_gen_atomic_fetch_xor_tl(s->T0, s->A0, s->tmp0,
- s->mem_index, ot | MO_LE);
- break;
- }
- tcg_gen_shr_tl(s->tmp4, s->T0, s->T1);
- } else {
- tcg_gen_shr_tl(s->tmp4, s->T0, s->T1);
- switch (op) {
- case 0: /* bt */
- /* Data already loaded; nothing to do. */
- break;
- case 1: /* bts */
- tcg_gen_or_tl(s->T0, s->T0, s->tmp0);
- break;
- case 2: /* btr */
- tcg_gen_andc_tl(s->T0, s->T0, s->tmp0);
- break;
- default:
- case 3: /* btc */
- tcg_gen_xor_tl(s->T0, s->T0, s->tmp0);
- break;
- }
- if (op != 0) {
- if (mod != 3) {
- gen_op_st_v(s, ot, s->T0, s->A0);
- } else {
- gen_op_mov_reg_v(s, ot, rm, s->T0);
- }
- }
- }
-
- /* Delay all CC updates until after the store above. Note that
- C is the result of the test, Z is unchanged, and the others
- are all undefined. */
- switch (s->cc_op) {
- case CC_OP_MULB ... CC_OP_MULQ:
- case CC_OP_ADDB ... CC_OP_ADDQ:
- case CC_OP_ADCB ... CC_OP_ADCQ:
- case CC_OP_SUBB ... CC_OP_SUBQ:
- case CC_OP_SBBB ... CC_OP_SBBQ:
- case CC_OP_LOGICB ... CC_OP_LOGICQ:
- case CC_OP_INCB ... CC_OP_INCQ:
- case CC_OP_DECB ... CC_OP_DECQ:
- case CC_OP_SHLB ... CC_OP_SHLQ:
- case CC_OP_SARB ... CC_OP_SARQ:
- case CC_OP_BMILGB ... CC_OP_BMILGQ:
- case CC_OP_POPCNT:
- /* Z was going to be computed from the non-zero status of CC_DST.
- We can get that same Z value (and the new C value) by leaving
- CC_DST alone, setting CC_SRC, and using a CC_OP_SAR of the
- same width. */
- tcg_gen_mov_tl(cpu_cc_src, s->tmp4);
- set_cc_op(s, ((s->cc_op - CC_OP_MULB) & 3) + CC_OP_SARB);
- break;
- default:
- /* Otherwise, generate EFLAGS and replace the C bit. */
- gen_compute_eflags(s);
- tcg_gen_deposit_tl(cpu_cc_src, cpu_cc_src, s->tmp4,
- ctz32(CC_C), 1);
- break;
- }
- break;
case 0x100:
modrm = x86_ldub_code(env, s);
mod = (modrm >> 6) & 3;
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 0d846c32c22..33ffcf092ec 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -205,6 +205,7 @@
#define sextT0 .special = X86_SPECIAL_SExtT0,
#define zextT0 .special = X86_SPECIAL_ZExtT0,
#define op0_Mw .special = X86_SPECIAL_Op0_Mw,
+#define btEvGv .special = X86_SPECIAL_BitTest,
#define vex1 .vex_class = 1,
#define vex1_rep3 .vex_class = 1, .vex_special = X86_VEX_REPScalar,
@@ -269,6 +270,24 @@ static inline const X86OpEntry *decode_by_prefix(DisasContext *s, const X86OpEnt
}
}
+static void decode_group8(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+ static const X86GenFunc group8_gen[8] = {
+ NULL, NULL, NULL, NULL,
+ gen_BT, gen_BTS, gen_BTR, gen_BTC,
+ };
+ int op = (get_modrm(s, env) >> 3) & 7;
+ entry->gen = group8_gen[op];
+ if (op == 4) {
+ /* prevent writeback and LOCK for BT */
+ entry->op1 = entry->op0;
+ entry->op0 = X86_TYPE_None;
+ entry->s0 = X86_SIZE_None;
+ } else {
+ entry->special = X86_SPECIAL_HasLock;
+ }
+}
+
static void decode_group15(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
{
static const X86OpEntry group15_reg[8] = {
@@ -1162,12 +1181,14 @@ static const X86OpEntry opcodes_0F[256] = {
[0xa0] = X86_OP_ENTRYr(PUSH, FS, w),
[0xa1] = X86_OP_ENTRYw(POP, FS, w),
[0xa2] = X86_OP_ENTRY0(CPUID),
+ [0xa3] = X86_OP_ENTRYrr(BT, E,v, G,v, btEvGv),
[0xa4] = X86_OP_ENTRY4(SHLD, E,v, 2op,v, G,v),
[0xa5] = X86_OP_ENTRY3(SHLD, E,v, 2op,v, G,v),
[0xb0] = X86_OP_ENTRY2(CMPXCHG,E,b, G,b, lock),
[0xb1] = X86_OP_ENTRY2(CMPXCHG,E,v, G,v, lock),
[0xb2] = X86_OP_ENTRY3(LSS, G,v, EM,p, None, None),
+ [0xb3] = X86_OP_ENTRY2(BTR, E,v, G,v, btEvGv),
[0xb4] = X86_OP_ENTRY3(LFS, G,v, EM,p, None, None),
[0xb5] = X86_OP_ENTRY3(LGS, G,v, EM,p, None, None),
[0xb6] = X86_OP_ENTRY3(MOV, G,v, E,b, None, None, zextT0), /* MOVZX */
@@ -1294,6 +1315,7 @@ static const X86OpEntry opcodes_0F[256] = {
[0xa8] = X86_OP_ENTRYr(PUSH, GS, w),
[0xa9] = X86_OP_ENTRYw(POP, GS, w),
[0xaa] = X86_OP_ENTRY0(RSM, chk(smm) svm(RSM)),
+ [0xab] = X86_OP_ENTRY2(BTS, E,v, G,v, btEvGv),
[0xac] = X86_OP_ENTRY4(SHRD, E,v, 2op,v, G,v),
[0xad] = X86_OP_ENTRY3(SHRD, E,v, 2op,v, G,v),
[0xae] = X86_OP_GROUP0(group15),
@@ -1306,6 +1328,8 @@ static const X86OpEntry opcodes_0F[256] = {
[0xb8] = X86_OP_GROUP0(0FB8),
/* decoded as modrm, which is visible as a difference between page fault and #UD */
[0xb9] = X86_OP_ENTRYr(UD, nop,v), /* UD1 */
+ [0xba] = X86_OP_GROUP2(group8, E,v, I,b),
+ [0xbb] = X86_OP_ENTRY2(BTC, E,v, G,v, btEvGv),
[0xbc] = X86_OP_GROUP0(0FBC),
[0xbd] = X86_OP_GROUP0(0FBD),
[0xbe] = X86_OP_ENTRY3(MOV, G,v, E,b, None, None, sextT0), /* MOVSX */
@@ -2424,6 +2448,7 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
CPUX86State *env = cpu_env(cpu);
X86DecodedInsn decode;
X86DecodeFunc decode_func = decode_root;
+ bool accept_lock = false;
uint8_t cc_live, b;
s->pc = s->base.pc_next;
@@ -2597,10 +2622,6 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
switch (b) {
case 0x00 ... 0x01: /* mostly privileged instructions */
case 0x1a ... 0x1b: /* MPX */
- case 0xa3: /* bt */
- case 0xab: /* bts */
- case 0xb3: /* btr */
- case 0xba ... 0xbb: /* grp8, btc */
case 0xc7: /* grp9 */
disas_insn_old(s, cpu, b + 0x100);
return;
@@ -2662,9 +2683,10 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
if (decode.op[0].has_ea) {
s->prefix |= PREFIX_LOCK;
}
- decode.e.special = X86_SPECIAL_HasLock;
/* fallthrough */
case X86_SPECIAL_HasLock:
+ case X86_SPECIAL_BitTest:
+ accept_lock = decode.op[0].has_ea;
break;
case X86_SPECIAL_Op0_Rd:
@@ -2706,10 +2728,8 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
break;
}
- if (s->prefix & PREFIX_LOCK) {
- if (decode.e.special != X86_SPECIAL_HasLock || !decode.op[0].has_ea) {
- goto illegal_op;
- }
+ if ((s->prefix & PREFIX_LOCK) && !accept_lock) {
+ goto illegal_op;
}
if (!validate_vex(s, &decode)) {
@@ -2755,7 +2775,7 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
if (decode.e.special != X86_SPECIAL_NoLoadEA &&
(decode.op[0].has_ea || decode.op[1].has_ea || decode.op[2].has_ea)) {
- gen_load_ea(s, &decode.mem, decode.e.vex_class == 12);
+ gen_load_ea(s, &decode);
}
if (s->prefix & PREFIX_LOCK) {
gen_load(s, &decode, 2, s->T1);
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index fc7477833bc..a25dff0570c 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -73,9 +73,25 @@ static void gen_NM_exception(DisasContext *s)
gen_exception(s, EXCP07_PREX);
}
-static void gen_load_ea(DisasContext *s, AddressParts *mem, bool is_vsib)
+static void gen_load_ea(DisasContext *s, X86DecodedInsn *decode)
{
- TCGv ea = gen_lea_modrm_1(s, *mem, is_vsib);
+ AddressParts *mem = &decode->mem;
+ TCGv ea;
+
+ ea = gen_lea_modrm_1(s, *mem, decode->e.vex_class == 12);
+ if (decode->e.special == X86_SPECIAL_BitTest) {
+ MemOp ot = decode->op[1].ot;
+ int poslen = 8 << ot;
+ TCGv ofs = tcg_temp_new();
+
+ /* Extract memory displacement from T1. */
+ assert(decode->op[2].unit == X86_OP_INT);
+ tcg_gen_sextract_tl(ofs, s->T1, 3, poslen - 3);
+ tcg_gen_andi_tl(ofs, ofs, -1 << ot);
+ tcg_gen_add_tl(s->A0, ea, ofs);
+ ea = s->A0;
+ }
+
gen_lea_v_seg(s, ea, mem->def_seg, s->override);
}
@@ -407,6 +423,32 @@ static void prepare_update3_cc(X86DecodedInsn *decode, DisasContext *s, CCOp op,
decode->cc_op = op;
}
+/* Set up decode->cc_* to modify CF while keeping other flags unchanged. */
+static void prepare_update_cf(X86DecodedInsn *decode, DisasContext *s, TCGv cf)
+{
+ switch (s->cc_op) {
+ case CC_OP_ADOX:
+ case CC_OP_ADCOX:
+ decode->cc_src2 = cpu_cc_src2;
+ decode->cc_src = cpu_cc_src;
+ decode->cc_op = CC_OP_ADCOX;
+ break;
+
+ case CC_OP_EFLAGS:
+ case CC_OP_ADCX:
+ decode->cc_src = cpu_cc_src;
+ decode->cc_op = CC_OP_ADCX;
+ break;
+
+ default:
+ decode->cc_src = tcg_temp_new();
+ gen_mov_eflags(s, decode->cc_src);
+ decode->cc_op = CC_OP_ADCX;
+ break;
+ }
+ decode->cc_dst = cf;
+}
+
static void gen_store_sse(DisasContext *s, X86DecodedInsn *decode, int src_ofs)
{
MemOp ot = decode->op[0].ot;
@@ -1385,6 +1427,109 @@ static void gen_BSWAP(DisasContext *s, X86DecodedInsn *decode)
tcg_gen_bswap32_tl(s->T0, s->T0, TCG_BSWAP_OZ);
}
+static TCGv gen_bt_mask(DisasContext *s, X86DecodedInsn *decode)
+{
+ MemOp ot = decode->op[1].ot;
+ TCGv mask = tcg_temp_new();
+
+ tcg_gen_andi_tl(s->T1, s->T1, (8 << ot) - 1);
+ tcg_gen_shl_tl(mask, tcg_constant_tl(1), s->T1);
+ return mask;
+}
+
+/* Expects truncated bit index in s->T1, 1 << s->T1 in MASK. */
+static void gen_bt_flags(DisasContext *s, X86DecodedInsn *decode, TCGv src, TCGv mask)
+{
+ TCGv cf;
+
+ /*
+ * C is the result of the test, Z is unchanged, and the others
+ * are all undefined.
+ */
+ switch (s->cc_op) {
+ case CC_OP_DYNAMIC:
+ case CC_OP_CLR:
+ case CC_OP_EFLAGS:
+ case CC_OP_ADCX:
+ case CC_OP_ADOX:
+ case CC_OP_ADCOX:
+ /* Generate EFLAGS and replace the C bit. */
+ cf = tcg_temp_new();
+ tcg_gen_setcond_tl(TCG_COND_TSTNE, cf, src, mask);
+ prepare_update_cf(decode, s, cf);
+ break;
+ default:
+ /*
+ * Z was going to be computed from the non-zero status of CC_DST.
+ * We can get that same Z value (and the new C value) by leaving
+ * CC_DST alone, setting CC_SRC, and using a CC_OP_SAR of the
+ * same width.
+ */
+ decode->cc_src = tcg_temp_new();
+ decode->cc_dst = cpu_cc_dst;
+ decode->cc_op = ((s->cc_op - CC_OP_MULB) & 3) + CC_OP_SARB;
+ tcg_gen_shr_tl(decode->cc_src, src, s->T1);
+ break;
+ }
+}
+
+static void gen_BT(DisasContext *s, X86DecodedInsn *decode)
+{
+ TCGv mask = gen_bt_mask(s, decode);
+
+ gen_bt_flags(s, decode, s->T0, mask);
+}
+
+static void gen_BTC(DisasContext *s, X86DecodedInsn *decode)
+{
+ MemOp ot = decode->op[0].ot;
+ TCGv old = tcg_temp_new();
+ TCGv mask = gen_bt_mask(s, decode);
+
+ if (s->prefix & PREFIX_LOCK) {
+ tcg_gen_atomic_fetch_xor_tl(old, s->A0, mask, s->mem_index, ot | MO_LE);
+ } else {
+ tcg_gen_mov_tl(old, s->T0);
+ tcg_gen_xor_tl(s->T0, s->T0, mask);
+ }
+
+ gen_bt_flags(s, decode, old, mask);
+}
+
+static void gen_BTR(DisasContext *s, X86DecodedInsn *decode)
+{
+ MemOp ot = decode->op[0].ot;
+ TCGv old = tcg_temp_new();
+ TCGv mask = gen_bt_mask(s, decode);
+
+ if (s->prefix & PREFIX_LOCK) {
+ TCGv maskc = tcg_temp_new();
+ tcg_gen_not_tl(maskc, mask);
+ tcg_gen_atomic_fetch_and_tl(old, s->A0, maskc, s->mem_index, ot | MO_LE);
+ } else {
+ tcg_gen_mov_tl(old, s->T0);
+ tcg_gen_andc_tl(s->T0, s->T0, mask);
+ }
+
+ gen_bt_flags(s, decode, old, mask);
+}
+
+static void gen_BTS(DisasContext *s, X86DecodedInsn *decode)
+{
+ MemOp ot = decode->op[0].ot;
+ TCGv old = tcg_temp_new();
+ TCGv mask = gen_bt_mask(s, decode);
+
+ if (s->prefix & PREFIX_LOCK) {
+ tcg_gen_atomic_fetch_or_tl(old, s->A0, mask, s->mem_index, ot | MO_LE);
+ } else {
+ tcg_gen_mov_tl(old, s->T0);
+ tcg_gen_or_tl(s->T0, s->T0, mask);
+ }
+
+ gen_bt_flags(s, decode, old, mask);
+}
+
static void gen_BZHI(DisasContext *s, X86DecodedInsn *decode)
{
MemOp ot = decode->op[0].ot;
--
2.45.2
* [PATCH 04/10] target/i386: try not to force EFLAGS computation for CC_OP_ADOX/ADCX
From: Paolo Bonzini @ 2024-06-20 9:54 UTC
To: qemu-devel
When computing the "other" flag (CF for CC_OP_ADOX, OF for CC_OP_ADCX),
take into account that it is already in the right position of cpu_cc_src,
just like for CC_OP_EFLAGS. There is no need to call gen_compute_eflags().
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
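Reader's note, not part of the commit message: the layout documented in
cpu.h is CC_DST = C and CC_SRC = rest for CC_OP_ADCX, and CC_SRC2 = O
and CC_SRC = rest for CC_OP_ADOX. The "rest" is an EFLAGS image, so the
flag that is not tracked separately can be tested in place. The sketch
below models just that selection; the struct and the `sketch_`/`SK_`
names are hypothetical, while the flag masks are the EFLAGS bit
positions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SK_CC_C 0x0001u   /* carry flag bit in EFLAGS */
#define SK_CC_Z 0x0040u   /* zero flag bit in EFLAGS */
#define SK_CC_O 0x0800u   /* overflow flag bit in EFLAGS */

typedef enum { SK_EFLAGS, SK_ADCX, SK_ADOX, SK_ADCOX } SketchOp;

typedef struct {
    SketchOp op;
    uint64_t dst;    /* C for ADCX/ADCOX */
    uint64_t src;    /* EFLAGS image holding the remaining flags */
    uint64_t src2;   /* O for ADOX/ADCOX */
} SketchCC;

static bool sketch_cf(const SketchCC *cc)
{
    switch (cc->op) {
    case SK_ADCX:
    case SK_ADCOX:
        return cc->dst != 0;
    case SK_EFLAGS:
    case SK_ADOX:
        /* CF is already in place in the EFLAGS image. */
        return (cc->src & SK_CC_C) != 0;
    }
    assert(!"cc_op not modeled by this sketch");
    return false;
}

static bool sketch_of(const SketchCC *cc)
{
    switch (cc->op) {
    case SK_ADOX:
    case SK_ADCOX:
        return cc->src2 != 0;
    case SK_EFLAGS:
    case SK_ADCX:
        /* OF is already in place in the EFLAGS image. */
        return (cc->src & SK_CC_O) != 0;
    }
    assert(!"cc_op not modeled by this sketch");
    return false;
}
```

In both "in place" cases no full flag materialization is needed, which
is the point of skipping gen_compute_eflags().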
target/i386/tcg/translate.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 257110ac703..08db40681fa 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -928,6 +928,7 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv reg)
.no_setcond = true };
case CC_OP_EFLAGS:
+ case CC_OP_ADOX:
case CC_OP_SARB ... CC_OP_SARQ:
/* CC_SRC & 1 */
return (CCPrepare) { .cond = TCG_COND_TSTNE,
@@ -994,6 +995,9 @@ static CCPrepare gen_prepare_eflags_o(DisasContext *s, TCGv reg)
return (CCPrepare) { .cond = TCG_COND_NE, .reg = cpu_cc_src };
default:
gen_compute_eflags(s);
+ /* fallthrough */
+ case CC_OP_EFLAGS:
+ case CC_OP_ADCX:
return (CCPrepare) { .cond = TCG_COND_TSTNE, .reg = cpu_cc_src,
.imm = CC_O };
}
--
2.45.2
* [PATCH 05/10] target/i386: decode address before going back to translate.c
From: Paolo Bonzini @ 2024-06-20 9:54 UTC
To: qemu-devel; +Cc: Richard Henderson
There are now relatively few unconverted opcodes in translate.c (there
are 13 of them including 8 for x87), and all of them have the same
format with a mod/rm byte and no immediate. A good next step is
to remove the early bail out to disas_insn_x87/disas_insn_old,
instead giving these legacy translator functions the same prototype
as the other gen_* functions.
To do this, the X86DecodedInsn can be passed down to the places that
used to fetch address bytes from the instruction stream. To make
sure that everything is done cleanly, the CPUX86State* argument is
removed.
As part of the unification, the gen_lea_modrm() name is now free,
so rename gen_load_ea() to gen_lea_modrm(). The name is just as good,
and it makes the changes to translate.c easier to review.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
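Reader's note, not part of the commit message: the key idea is that the
address is decoded once into an AddressParts stored in the decoded
instruction, and the legacy code then consumes decode->mem instead of
fetching mod/rm bytes itself. The sketch below models that hand-off with
hypothetical `Sketch` types. The field names follow AddressParts from
the patch, but the -1 "no register" convention and the use of `scale` as
a shift count are assumptions of this sketch.

```c
#include <assert.h>
#include <stdint.h>

typedef struct SketchAddressParts {
    int def_seg;
    int base;       /* register number, or -1 for none (assumption) */
    int index;      /* register number, or -1 for none (assumption) */
    int scale;      /* shift count applied to the index (assumption) */
    int64_t disp;
} SketchAddressParts;

typedef struct SketchDecodedInsn {
    SketchAddressParts mem;   /* filled in once by the decoder */
    uint8_t b;                /* opcode byte */
} SketchDecodedInsn;

/*
 * Consumer side: computes the effective address purely from the decoded
 * instruction, with no access to the instruction stream.
 */
static uint64_t sketch_lea(const uint64_t *regs, const SketchDecodedInsn *decode)
{
    const SketchAddressParts *a = &decode->mem;
    uint64_t ea = (uint64_t)a->disp;

    assert(a->scale >= 0 && a->scale <= 3);
    if (a->base >= 0) {
        ea += regs[a->base];
    }
    if (a->index >= 0) {
        ea += regs[a->index] << a->scale;
    }
    return ea;
}
```

Because the consumer only takes the decoded instruction, it can share
the prototype of the other gen_* functions, which is what the patch does
for gen_x87 and the remaining translate.c helpers.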
target/i386/tcg/decode-new.h | 14 ++-
target/i386/tcg/translate.c | 152 +++++++++++++------------------
target/i386/tcg/decode-new.c.inc | 53 ++++++-----
target/i386/tcg/emit.c.inc | 2 +-
4 files changed, 103 insertions(+), 118 deletions(-)
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index e4cdf5e3c4f..bebc77bd54b 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -264,12 +264,13 @@ typedef enum X86VEXSpecial {
typedef struct X86OpEntry X86OpEntry;
typedef struct X86DecodedInsn X86DecodedInsn;
+struct DisasContext;
/* Decode function for multibyte opcodes. */
-typedef void (*X86DecodeFunc)(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b);
+typedef void (*X86DecodeFunc)(struct DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b);
/* Code generation function. */
-typedef void (*X86GenFunc)(DisasContext *s, X86DecodedInsn *decode);
+typedef void (*X86GenFunc)(struct DisasContext *s, X86DecodedInsn *decode);
struct X86OpEntry {
/* Based on the is_decode flags. */
@@ -316,6 +317,14 @@ typedef struct X86DecodedOp {
};
} X86DecodedOp;
+typedef struct AddressParts {
+ int def_seg;
+ int base;
+ int index;
+ int scale;
+ target_long disp;
+} AddressParts;
+
struct X86DecodedInsn {
X86OpEntry e;
X86DecodedOp op[3];
@@ -333,3 +342,4 @@ struct X86DecodedInsn {
uint8_t b;
};
+static void gen_lea_modrm(struct DisasContext *s, X86DecodedInsn *decode);
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 08db40681fa..1d845ff66bb 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -29,6 +29,7 @@
#include "exec/helper-proto.h"
#include "exec/helper-gen.h"
#include "helper-tcg.h"
+#include "decode-new.h"
#include "exec/log.h"
@@ -1529,14 +1530,6 @@ static inline uint64_t x86_ldq_code(CPUX86State *env, DisasContext *s)
/* Decompose an address. */
-typedef struct AddressParts {
- int def_seg;
- int base;
- int index;
- int scale;
- target_long disp;
-} AddressParts;
-
static AddressParts gen_lea_modrm_0(CPUX86State *env, DisasContext *s,
int modrm)
{
@@ -1695,24 +1688,11 @@ static TCGv gen_lea_modrm_1(DisasContext *s, AddressParts a, bool is_vsib)
return ea;
}
-static void gen_lea_modrm(CPUX86State *env, DisasContext *s, int modrm)
-{
- AddressParts a = gen_lea_modrm_0(env, s, modrm);
- TCGv ea = gen_lea_modrm_1(s, a, false);
- gen_lea_v_seg(s, ea, a.def_seg, s->override);
-}
-
-static void gen_nop_modrm(CPUX86State *env, DisasContext *s, int modrm)
-{
- (void)gen_lea_modrm_0(env, s, modrm);
-}
-
/* Used for BNDCL, BNDCU, BNDCN. */
-static void gen_bndck(CPUX86State *env, DisasContext *s, int modrm,
+static void gen_bndck(DisasContext *s, X86DecodedInsn *decode,
TCGCond cond, TCGv_i64 bndv)
{
- AddressParts a = gen_lea_modrm_0(env, s, modrm);
- TCGv ea = gen_lea_modrm_1(s, a, false);
+ TCGv ea = gen_lea_modrm_1(s, decode->mem, false);
tcg_gen_extu_tl_i64(s->tmp1_i64, ea);
if (!CODE64(s)) {
@@ -1724,8 +1704,9 @@ static void gen_bndck(CPUX86State *env, DisasContext *s, int modrm,
}
/* generate modrm load of memory or register. */
-static void gen_ld_modrm(CPUX86State *env, DisasContext *s, int modrm, MemOp ot)
+static void gen_ld_modrm(DisasContext *s, X86DecodedInsn *decode, MemOp ot)
{
+ int modrm = s->modrm;
int mod, rm;
mod = (modrm >> 6) & 3;
@@ -1733,14 +1714,15 @@ static void gen_ld_modrm(CPUX86State *env, DisasContext *s, int modrm, MemOp ot)
if (mod == 3) {
gen_op_mov_v_reg(s, ot, s->T0, rm);
} else {
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
gen_op_ld_v(s, ot, s->T0, s->A0);
}
}
/* generate modrm store of memory or register. */
-static void gen_st_modrm(CPUX86State *env, DisasContext *s, int modrm, MemOp ot)
+static void gen_st_modrm(DisasContext *s, X86DecodedInsn *decode, MemOp ot)
{
+ int modrm = s->modrm;
int mod, rm;
mod = (modrm >> 6) & 3;
@@ -1748,7 +1730,7 @@ static void gen_st_modrm(CPUX86State *env, DisasContext *s, int modrm, MemOp ot)
if (mod == 3) {
gen_op_mov_reg_v(s, ot, rm, s->T0);
} else {
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
gen_op_st_v(s, ot, s->T0, s->A0);
}
}
@@ -2316,12 +2298,12 @@ static void gen_sty_env_A0(DisasContext *s, int offset, bool align)
tcg_gen_qemu_st_i128(t, s->tmp0, mem_index, mop);
}
-static void gen_cmpxchg8b(DisasContext *s, CPUX86State *env, int modrm)
+static void gen_cmpxchg8b(DisasContext *s, X86DecodedInsn *decode)
{
TCGv_i64 cmp, val, old;
TCGv Z;
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
cmp = tcg_temp_new_i64();
val = tcg_temp_new_i64();
@@ -2370,13 +2352,13 @@ static void gen_cmpxchg8b(DisasContext *s, CPUX86State *env, int modrm)
}
#ifdef TARGET_X86_64
-static void gen_cmpxchg16b(DisasContext *s, CPUX86State *env, int modrm)
+static void gen_cmpxchg16b(DisasContext *s, X86DecodedInsn *decode)
{
MemOp mop = MO_TE | MO_128 | MO_ALIGN;
TCGv_i64 t0, t1;
TCGv_i128 cmp, val;
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
cmp = tcg_temp_new_i128();
val = tcg_temp_new_i128();
@@ -2414,31 +2396,32 @@ static void gen_cmpxchg16b(DisasContext *s, CPUX86State *env, int modrm)
}
#endif
-static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
+#include "emit.c.inc"
+
+static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
{
- CPUX86State *env = cpu_env(cpu);
bool update_fip = true;
- int modrm, mod, rm, op;
+ int b = decode->b;
+ int modrm = s->modrm;
+ int mod, rm, op;
if (s->flags & (HF_EM_MASK | HF_TS_MASK)) {
/* if CR0.EM or CR0.TS are set, generate an FPU exception */
/* XXX: what to do if illegal op ? */
gen_exception(s, EXCP07_PREX);
- return true;
+ return;
}
- modrm = x86_ldub_code(env, s);
mod = (modrm >> 6) & 3;
rm = modrm & 7;
op = ((b & 7) << 3) | ((modrm >> 3) & 7);
if (mod != 3) {
/* memory op */
- AddressParts a = gen_lea_modrm_0(env, s, modrm);
- TCGv ea = gen_lea_modrm_1(s, a, false);
+ TCGv ea = gen_lea_modrm_1(s, decode->mem, false);
TCGv last_addr = tcg_temp_new();
bool update_fdp = true;
tcg_gen_mov_tl(last_addr, ea);
- gen_lea_v_seg(s, ea, a.def_seg, s->override);
+ gen_lea_v_seg(s, ea, decode->mem.def_seg, s->override);
switch (op) {
case 0x00 ... 0x07: /* fxxxs */
@@ -2628,11 +2611,11 @@ static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
gen_helper_fpop(tcg_env);
break;
default:
- return false;
+ goto illegal_op;
}
if (update_fdp) {
- int last_seg = s->override >= 0 ? s->override : a.def_seg;
+ int last_seg = s->override >= 0 ? s->override : decode->mem.def_seg;
tcg_gen_ld_i32(s->tmp2_i32, tcg_env,
offsetof(CPUX86State,
@@ -2669,7 +2652,7 @@ static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
update_fip = false;
break;
default:
- return false;
+ goto illegal_op;
}
break;
case 0x0c: /* grp d9/4 */
@@ -2688,7 +2671,7 @@ static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
gen_helper_fxam_ST0(tcg_env);
break;
default:
- return false;
+ goto illegal_op;
}
break;
case 0x0d: /* grp d9/5 */
@@ -2723,7 +2706,7 @@ static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
gen_helper_fldz_ST0(tcg_env);
break;
default:
- return false;
+ goto illegal_op;
}
}
break;
@@ -2825,7 +2808,7 @@ static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
gen_helper_fpop(tcg_env);
break;
default:
- return false;
+ goto illegal_op;
}
break;
case 0x1c:
@@ -2845,7 +2828,7 @@ static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
case 4: /* fsetpm (287 only, just do nop here) */
break;
default:
- return false;
+ goto illegal_op;
}
break;
case 0x1d: /* fucomi */
@@ -2897,7 +2880,7 @@ static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
gen_helper_fpop(tcg_env);
break;
default:
- return false;
+ goto illegal_op;
}
break;
case 0x38: /* ffreep sti, undocumented op */
@@ -2912,7 +2895,7 @@ static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
gen_op_mov_reg_v(s, MO_16, R_EAX, s->T0);
break;
default:
- return false;
+ goto illegal_op;
}
break;
case 0x3d: /* fucomip */
@@ -2959,7 +2942,7 @@ static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
}
break;
default:
- return false;
+ goto illegal_op;
}
}
@@ -2971,25 +2954,24 @@ static bool disas_insn_x87(DisasContext *s, CPUState *cpu, int b)
tcg_gen_st_tl(eip_cur_tl(s),
tcg_env, offsetof(CPUX86State, fpip));
}
- return true;
+ return;
illegal_op:
gen_illegal_opcode(s);
- return true;
}
-static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
+static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
{
- CPUX86State *env = cpu_env(cpu);
int prefixes = s->prefix;
MemOp dflag = s->dflag;
+ int b = decode->b + 0x100;
+ int modrm = s->modrm;
MemOp ot;
- int modrm, reg, rm, mod, op;
+ int reg, rm, mod, op;
/* now check op code */
switch (b) {
case 0x1c7: /* cmpxchg8b */
- modrm = x86_ldub_code(env, s);
mod = (modrm >> 6) & 3;
switch ((modrm >> 3) & 7) {
case 1: /* CMPXCHG8, CMPXCHG16 */
@@ -3001,14 +2983,14 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
if (!(s->cpuid_ext_features & CPUID_EXT_CX16)) {
goto illegal_op;
}
- gen_cmpxchg16b(s, env, modrm);
+ gen_cmpxchg16b(s, decode);
break;
}
#endif
if (!(s->cpuid_features & CPUID_CX8)) {
goto illegal_op;
}
- gen_cmpxchg8b(s, env, modrm);
+ gen_cmpxchg8b(s, decode);
break;
case 7: /* RDSEED, RDPID with f3 prefix */
@@ -3051,7 +3033,6 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
break;
case 0x100:
- modrm = x86_ldub_code(env, s);
mod = (modrm >> 6) & 3;
op = (modrm >> 3) & 7;
switch(op) {
@@ -3065,14 +3046,14 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
tcg_gen_ld32u_tl(s->T0, tcg_env,
offsetof(CPUX86State, ldt.selector));
ot = mod == 3 ? dflag : MO_16;
- gen_st_modrm(env, s, modrm, ot);
+ gen_st_modrm(s, decode, ot);
break;
case 2: /* lldt */
if (!PE(s) || VM86(s))
goto illegal_op;
if (check_cpl0(s)) {
gen_svm_check_intercept(s, SVM_EXIT_LDTR_WRITE);
- gen_ld_modrm(env, s, modrm, MO_16);
+ gen_ld_modrm(s, decode, MO_16);
tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
gen_helper_lldt(tcg_env, s->tmp2_i32);
}
@@ -3087,14 +3068,14 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
tcg_gen_ld32u_tl(s->T0, tcg_env,
offsetof(CPUX86State, tr.selector));
ot = mod == 3 ? dflag : MO_16;
- gen_st_modrm(env, s, modrm, ot);
+ gen_st_modrm(s, decode, ot);
break;
case 3: /* ltr */
if (!PE(s) || VM86(s))
goto illegal_op;
if (check_cpl0(s)) {
gen_svm_check_intercept(s, SVM_EXIT_TR_WRITE);
- gen_ld_modrm(env, s, modrm, MO_16);
+ gen_ld_modrm(s, decode, MO_16);
tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
gen_helper_ltr(tcg_env, s->tmp2_i32);
}
@@ -3103,7 +3084,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
case 5: /* verw */
if (!PE(s) || VM86(s))
goto illegal_op;
- gen_ld_modrm(env, s, modrm, MO_16);
+ gen_ld_modrm(s, decode, MO_16);
gen_update_cc_op(s);
if (op == 4) {
gen_helper_verr(tcg_env, s->T0);
@@ -3113,19 +3094,18 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
assume_cc_op(s, CC_OP_EFLAGS);
break;
default:
- goto unknown_op;
+ goto illegal_op;
}
break;
case 0x101:
- modrm = x86_ldub_code(env, s);
switch (modrm) {
CASE_MODRM_MEM_OP(0): /* sgdt */
if (s->flags & HF_UMIP_MASK && !check_cpl0(s)) {
break;
}
gen_svm_check_intercept(s, SVM_EXIT_GDTR_READ);
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
tcg_gen_ld32u_tl(s->T0,
tcg_env, offsetof(CPUX86State, gdt.limit));
gen_op_st_v(s, MO_16, s->T0, s->A0);
@@ -3181,7 +3161,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
break;
}
gen_svm_check_intercept(s, SVM_EXIT_IDTR_READ);
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
tcg_gen_ld32u_tl(s->T0, tcg_env, offsetof(CPUX86State, idt.limit));
gen_op_st_v(s, MO_16, s->T0, s->A0);
gen_add_A0_im(s, 2);
@@ -3331,7 +3311,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
break;
}
gen_svm_check_intercept(s, SVM_EXIT_GDTR_WRITE);
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
gen_op_ld_v(s, MO_16, s->T1, s->A0);
gen_add_A0_im(s, 2);
gen_op_ld_v(s, CODE64(s) + MO_32, s->T0, s->A0);
@@ -3347,7 +3327,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
break;
}
gen_svm_check_intercept(s, SVM_EXIT_IDTR_WRITE);
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
gen_op_ld_v(s, MO_16, s->T1, s->A0);
gen_add_A0_im(s, 2);
gen_op_ld_v(s, CODE64(s) + MO_32, s->T0, s->A0);
@@ -3371,7 +3351,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
*/
mod = (modrm >> 6) & 3;
ot = (mod != 3 ? MO_16 : s->dflag);
- gen_st_modrm(env, s, modrm, ot);
+ gen_st_modrm(s, decode, ot);
break;
case 0xee: /* rdpkru */
if (s->prefix & (PREFIX_LOCK | PREFIX_DATA
@@ -3398,7 +3378,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
break;
}
gen_svm_check_intercept(s, SVM_EXIT_WRITE_CR0);
- gen_ld_modrm(env, s, modrm, MO_16);
+ gen_ld_modrm(s, decode, MO_16);
/*
* Only the 4 lower bits of CR0 are modified.
* PE cannot be set to zero if already set to one.
@@ -3416,7 +3396,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
break;
}
gen_svm_check_intercept(s, SVM_EXIT_INVLPG);
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
gen_helper_flush_page(tcg_env, s->A0);
s->base.is_jmp = DISAS_EOB_NEXT;
break;
@@ -3449,12 +3429,11 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
break;
default:
- goto unknown_op;
+ goto illegal_op;
}
break;
case 0x11a:
- modrm = x86_ldub_code(env, s);
if (s->flags & HF_MPX_EN_MASK) {
mod = (modrm >> 6) & 3;
reg = ((modrm >> 3) & 7) | REX_R(s);
@@ -3465,7 +3444,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
|| s->aflag == MO_16) {
goto illegal_op;
}
- gen_bndck(env, s, modrm, TCG_COND_LTU, cpu_bndl[reg]);
+ gen_bndck(s, decode, TCG_COND_LTU, cpu_bndl[reg]);
} else if (prefixes & PREFIX_REPNZ) {
/* bndcu */
if (reg >= 4
@@ -3475,7 +3454,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
}
TCGv_i64 notu = tcg_temp_new_i64();
tcg_gen_not_i64(notu, cpu_bndu[reg]);
- gen_bndck(env, s, modrm, TCG_COND_GTU, notu);
+ gen_bndck(s, decode, TCG_COND_GTU, notu);
} else if (prefixes & PREFIX_DATA) {
/* bndmov -- from reg/mem */
if (reg >= 4 || s->aflag == MO_16) {
@@ -3491,7 +3470,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
tcg_gen_mov_i64(cpu_bndu[reg], cpu_bndu[reg2]);
}
} else {
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
if (CODE64(s)) {
tcg_gen_qemu_ld_i64(cpu_bndl[reg], s->A0,
s->mem_index, MO_LEUQ);
@@ -3510,7 +3489,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
}
} else if (mod != 3) {
/* bndldx */
- AddressParts a = gen_lea_modrm_0(env, s, modrm);
+ AddressParts a = decode->mem;
if (reg >= 4
|| (prefixes & PREFIX_LOCK)
|| s->aflag == MO_16
@@ -3540,10 +3519,8 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
gen_set_hflag(s, HF_MPX_IU_MASK);
}
}
- gen_nop_modrm(env, s, modrm);
break;
case 0x11b:
- modrm = x86_ldub_code(env, s);
if (s->flags & HF_MPX_EN_MASK) {
mod = (modrm >> 6) & 3;
reg = ((modrm >> 3) & 7) | REX_R(s);
@@ -3554,7 +3531,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
|| s->aflag == MO_16) {
goto illegal_op;
}
- AddressParts a = gen_lea_modrm_0(env, s, modrm);
+ AddressParts a = decode->mem;
if (a.base >= 0) {
tcg_gen_extu_tl_i64(cpu_bndl[reg], cpu_regs[a.base]);
if (!CODE64(s)) {
@@ -3567,7 +3544,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
/* rip-relative generates #ud */
goto illegal_op;
}
- tcg_gen_not_tl(s->A0, gen_lea_modrm_1(s, a, false));
+ tcg_gen_not_tl(s->A0, gen_lea_modrm_1(s, decode->mem, false));
if (!CODE64(s)) {
tcg_gen_ext32u_tl(s->A0, s->A0);
}
@@ -3582,7 +3559,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
|| s->aflag == MO_16) {
goto illegal_op;
}
- gen_bndck(env, s, modrm, TCG_COND_GTU, cpu_bndu[reg]);
+ gen_bndck(s, decode, TCG_COND_GTU, cpu_bndu[reg]);
} else if (prefixes & PREFIX_DATA) {
/* bndmov -- to reg/mem */
if (reg >= 4 || s->aflag == MO_16) {
@@ -3598,7 +3575,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
tcg_gen_mov_i64(cpu_bndu[reg2], cpu_bndu[reg]);
}
} else {
- gen_lea_modrm(env, s, modrm);
+ gen_lea_modrm(s, decode);
if (CODE64(s)) {
tcg_gen_qemu_st_i64(cpu_bndl[reg], s->A0,
s->mem_index, MO_LEUQ);
@@ -3615,7 +3592,7 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
}
} else if (mod != 3) {
/* bndstx */
- AddressParts a = gen_lea_modrm_0(env, s, modrm);
+ AddressParts a = decode->mem;
if (reg >= 4
|| (prefixes & PREFIX_LOCK)
|| s->aflag == MO_16
@@ -3642,7 +3619,6 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
}
}
}
- gen_nop_modrm(env, s, modrm);
break;
default:
g_assert_not_reached();
@@ -3651,12 +3627,8 @@ static void disas_insn_old(DisasContext *s, CPUState *cpu, int b)
illegal_op:
gen_illegal_opcode(s);
return;
- unknown_op:
- gen_unknown_opcode(env, s);
}
-#include "decode-new.h"
-#include "emit.c.inc"
#include "decode-new.c.inc"
void tcg_x86_init(void)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 33ffcf092ec..45f4aed4611 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -1092,6 +1092,8 @@ static void decode_MOV_CR_DR(DisasContext *s, CPUX86State *env, X86OpEntry *entr
}
static const X86OpEntry opcodes_0F[256] = {
+ [0x00] = X86_OP_ENTRY1(multi0F, nop,v, nolea), /* unconverted */
+ [0x01] = X86_OP_ENTRY1(multi0F, nop,v, nolea), /* unconverted */
[0x02] = X86_OP_ENTRYwr(LAR, G,v, E,w, chk(prot)),
[0x03] = X86_OP_ENTRYwr(LSL, G,v, E,w, chk(prot)),
[0x05] = X86_OP_ENTRY0(SYSCALL, chk(o64_intel)),
@@ -1201,6 +1203,7 @@ static const X86OpEntry opcodes_0F[256] = {
[0xc4] = X86_OP_ENTRY4(PINSRW, V,dq,H,dq,E,w, vex5 mmx p_00_66),
[0xc5] = X86_OP_ENTRY3(PEXTRW, G,d, U,dq,I,b, vex5 mmx p_00_66),
[0xc6] = X86_OP_ENTRY4(VSHUF, V,x, H,x, W,x, vex4 p_00_66),
+ [0xc7] = X86_OP_ENTRY1(multi0F, nop,v, nolea), /* unconverted */
[0xd0] = X86_OP_ENTRY3(VADDSUB, V,x, H,x, W,x, vex2 cpuid(SSE3) p_66_f2),
[0xd1] = X86_OP_ENTRY3(PSRLW_r, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
@@ -1243,6 +1246,8 @@ static const X86OpEntry opcodes_0F[256] = {
[0x18] = X86_OP_ENTRY1(NOP, nop,v), /* prefetch/reserved NOP */
[0x19] = X86_OP_ENTRY1(NOP, nop,v), /* reserved NOP */
+ [0x1a] = X86_OP_ENTRY1(multi0F, nop,v, nolea), /* unconverted MPX */
+ [0x1b] = X86_OP_ENTRY1(multi0F, nop,v, nolea), /* unconverted MPX */
[0x1c] = X86_OP_ENTRY1(NOP, nop,v), /* reserved NOP */
[0x1d] = X86_OP_ENTRY1(NOP, nop,v), /* reserved NOP */
[0x1e] = X86_OP_ENTRY1(NOP, nop,v), /* reserved NOP */
@@ -1780,6 +1785,19 @@ static const X86OpEntry opcodes_root[256] = {
[0xCE] = X86_OP_ENTRY0(INTO),
[0xCF] = X86_OP_ENTRY0(IRET, chk(vm86_iopl) svm(IRET)),
+ /*
+ * x87 is nolea because it needs the address without segment base,
+ * in order to store it in fdp.
+ */
+ [0xD8] = X86_OP_ENTRY1(x87, nop,v, nolea),
+ [0xD9] = X86_OP_ENTRY1(x87, nop,v, nolea),
+ [0xDA] = X86_OP_ENTRY1(x87, nop,v, nolea),
+ [0xDB] = X86_OP_ENTRY1(x87, nop,v, nolea),
+ [0xDC] = X86_OP_ENTRY1(x87, nop,v, nolea),
+ [0xDD] = X86_OP_ENTRY1(x87, nop,v, nolea),
+ [0xDE] = X86_OP_ENTRY1(x87, nop,v, nolea),
+ [0xDF] = X86_OP_ENTRY1(x87, nop,v, nolea),
+
[0xE8] = X86_OP_ENTRYr(CALL, J,z_f64),
[0xE9] = X86_OP_ENTRYr(JMP, J,z_f64),
[0xEA] = X86_OP_ENTRYrr(JMPF, I_unsigned,p, I_unsigned,w, chk(i64)),
@@ -2608,30 +2626,6 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
}
}
- /* Go back to old decoder for unconverted opcodes. */
- if (!(s->prefix & PREFIX_VEX)) {
- if ((b & ~7) == 0xd8) {
- if (!disas_insn_x87(s, cpu, b)) {
- goto unknown_op;
- }
- return;
- }
-
- if (b == 0x0f) {
- b = x86_ldub_code(env, s);
- switch (b) {
- case 0x00 ... 0x01: /* mostly privileged instructions */
- case 0x1a ... 0x1b: /* MPX */
- case 0xc7: /* grp9 */
- disas_insn_old(s, cpu, b + 0x100);
- return;
- default:
- decode_func = do_decode_0F;
- break;
- }
- }
- }
-
memset(&decode, 0, sizeof(decode));
decode.cc_op = -1;
decode.b = b;
@@ -2728,6 +2722,15 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
break;
}
+ /*
+ * hack for old decoder: 0F C7 has both instructions that accept LOCK
+ * and instructions that don't, but also needs X86_SPECIAL_NoLoadEA.
+ * Keep this here until CMPXCHG8B/CMPXCHG16B is separated from the
+ * other unconverted opcodes.
+ */
+ if (decode.e.gen == gen_multi0F) {
+ accept_lock = true;
+ }
if ((s->prefix & PREFIX_LOCK) && !accept_lock) {
goto illegal_op;
}
@@ -2775,7 +2778,7 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
if (decode.e.special != X86_SPECIAL_NoLoadEA &&
(decode.op[0].has_ea || decode.op[1].has_ea || decode.op[2].has_ea)) {
- gen_load_ea(s, &decode);
+ gen_lea_modrm(s, &decode);
}
if (s->prefix & PREFIX_LOCK) {
gen_load(s, &decode, 2, s->T1);
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index a25dff0570c..edadb51ae89 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -73,7 +73,7 @@ static void gen_NM_exception(DisasContext *s)
gen_exception(s, EXCP07_PREX);
}
-static void gen_load_ea(DisasContext *s, X86DecodedInsn *decode)
+static void gen_lea_modrm(DisasContext *s, X86DecodedInsn *decode)
{
AddressParts *mem = &decode->mem;
TCGv ea;
--
2.45.2
* [PATCH 06/10] target/i386: convert CMPXCHG8B/CMPXCHG16B to new decoder
2024-06-20 9:54 [PATCH 00/10] target/i386: make decoding entirely table based Paolo Bonzini
` (4 preceding siblings ...)
2024-06-20 9:54 ` [PATCH 05/10] target/i386: decode address before going back to translate.c Paolo Bonzini
@ 2024-06-20 9:54 ` Paolo Bonzini
2024-06-20 16:14 ` Richard Henderson
2024-06-20 9:54 ` [PATCH 07/10] target/i386: do not check PREFIX_LOCK in old-style decoder Paolo Bonzini
` (3 subsequent siblings)
9 siblings, 1 reply; 16+ messages in thread
From: Paolo Bonzini @ 2024-06-20 9:54 UTC (permalink / raw)
To: qemu-devel
This moves the last LOCK-enabled instructions to the new decoder. It is now
possible to assume that gen_multi0F is called only after checking that
PREFIX_LOCK was not specified.
The gen_cmpxchg8b and gen_cmpxchg16b functions even have the correct
prototype already; the only thing that needs to be done is removing the
gen_lea_modrm() call.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
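For readers skimming the series, the dispatch that decode_group9() performs
can be modelled outside QEMU roughly as below. This is only a sketch: every
sketch_* / SKETCH_* name is hypothetical and stands in for the real
X86OpEntry tables, and accepts_lock merely mirrors the "lock" flag on the
two CMPXCHG entries. Only the modrm field extraction and the REX.W choice
are taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-ins for table entries; not QEMU's X86OpEntry. */
typedef enum {
    SKETCH_UNCHANGED,   /* decode_group9() leaves the entry as it was */
    SKETCH_MULTI0F,     /* register forms, still handled by gen_multi0F */
    SKETCH_CMPXCHG8B,
    SKETCH_CMPXCHG16B,
} SketchInsn;

typedef struct {
    SketchInsn insn;
    bool accepts_lock;  /* models the "lock" flag of the table entry */
} SketchEntry;

/* Pick an entry from the modrm byte of 0F C7, as decode_group9() does. */
static SketchEntry sketch_decode_group9(uint8_t modrm, bool rex_w)
{
    SketchEntry e = { SKETCH_UNCHANGED, false };
    int mod = modrm >> 6;           /* bits 7-6 */
    int op = (modrm >> 3) & 7;      /* bits 5-3 */

    if (mod == 3) {
        /* Register operand: RDRAND, RDSEED, RDPID stay unconverted. */
        e.insn = SKETCH_MULTI0F;
    } else if (op == 1) {
        /* Memory operand, /1: operand size selects the 8B or 16B form. */
        e.insn = rex_w ? SKETCH_CMPXCHG16B : SKETCH_CMPXCHG8B;
        e.accepts_lock = true;
    }
    return e;
}

/* The single, central LOCK check that runs after decoding. */
static bool sketch_lock_ok(SketchEntry e, bool has_lock)
{
    return !has_lock || e.accepts_lock;
}
```

With this split, a LOCK prefix on a register form (for example modrm 0xc8)
is rejected by the central check, so the code behind gen_multi0F no longer
has to look at PREFIX_LOCK itself.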
target/i386/tcg/decode-new.h | 2 +
target/i386/tcg/translate.c | 121 +------------------------------
target/i386/tcg/decode-new.c.inc | 34 ++++++---
target/i386/tcg/emit.c.inc | 96 ++++++++++++++++++++++++
4 files changed, 124 insertions(+), 129 deletions(-)
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index bebc77bd54b..7f23d373ea7 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -114,6 +114,8 @@ typedef enum X86CPUIDFeature {
X86_FEAT_CLWB,
X86_FEAT_CMOV,
X86_FEAT_CMPCCXADD,
+ X86_FEAT_CX8,
+ X86_FEAT_CX16,
X86_FEAT_F16C,
X86_FEAT_FMA,
X86_FEAT_FSGSBASE,
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 1d845ff66bb..c60f18c7482 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2298,104 +2298,6 @@ static void gen_sty_env_A0(DisasContext *s, int offset, bool align)
tcg_gen_qemu_st_i128(t, s->tmp0, mem_index, mop);
}
-static void gen_cmpxchg8b(DisasContext *s, X86DecodedInsn *decode)
-{
- TCGv_i64 cmp, val, old;
- TCGv Z;
-
- gen_lea_modrm(s, decode);
-
- cmp = tcg_temp_new_i64();
- val = tcg_temp_new_i64();
- old = tcg_temp_new_i64();
-
- /* Construct the comparison values from the register pair. */
- tcg_gen_concat_tl_i64(cmp, cpu_regs[R_EAX], cpu_regs[R_EDX]);
- tcg_gen_concat_tl_i64(val, cpu_regs[R_EBX], cpu_regs[R_ECX]);
-
- /* Only require atomic with LOCK; non-parallel handled in generator. */
- if (s->prefix & PREFIX_LOCK) {
- tcg_gen_atomic_cmpxchg_i64(old, s->A0, cmp, val, s->mem_index, MO_TEUQ);
- } else {
- tcg_gen_nonatomic_cmpxchg_i64(old, s->A0, cmp, val,
- s->mem_index, MO_TEUQ);
- }
-
- /* Set tmp0 to match the required value of Z. */
- tcg_gen_setcond_i64(TCG_COND_EQ, cmp, old, cmp);
- Z = tcg_temp_new();
- tcg_gen_trunc_i64_tl(Z, cmp);
-
- /*
- * Extract the result values for the register pair.
- * For 32-bit, we may do this unconditionally, because on success (Z=1),
- * the old value matches the previous value in EDX:EAX. For x86_64,
- * the store must be conditional, because we must leave the source
- * registers unchanged on success, and zero-extend the writeback
- * on failure (Z=0).
- */
- if (TARGET_LONG_BITS == 32) {
- tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], old);
- } else {
- TCGv zero = tcg_constant_tl(0);
-
- tcg_gen_extr_i64_tl(s->T0, s->T1, old);
- tcg_gen_movcond_tl(TCG_COND_EQ, cpu_regs[R_EAX], Z, zero,
- s->T0, cpu_regs[R_EAX]);
- tcg_gen_movcond_tl(TCG_COND_EQ, cpu_regs[R_EDX], Z, zero,
- s->T1, cpu_regs[R_EDX]);
- }
-
- /* Update Z. */
- gen_compute_eflags(s);
- tcg_gen_deposit_tl(cpu_cc_src, cpu_cc_src, Z, ctz32(CC_Z), 1);
-}
-
-#ifdef TARGET_X86_64
-static void gen_cmpxchg16b(DisasContext *s, X86DecodedInsn *decode)
-{
- MemOp mop = MO_TE | MO_128 | MO_ALIGN;
- TCGv_i64 t0, t1;
- TCGv_i128 cmp, val;
-
- gen_lea_modrm(s, decode);
-
- cmp = tcg_temp_new_i128();
- val = tcg_temp_new_i128();
- tcg_gen_concat_i64_i128(cmp, cpu_regs[R_EAX], cpu_regs[R_EDX]);
- tcg_gen_concat_i64_i128(val, cpu_regs[R_EBX], cpu_regs[R_ECX]);
-
- /* Only require atomic with LOCK; non-parallel handled in generator. */
- if (s->prefix & PREFIX_LOCK) {
- tcg_gen_atomic_cmpxchg_i128(val, s->A0, cmp, val, s->mem_index, mop);
- } else {
- tcg_gen_nonatomic_cmpxchg_i128(val, s->A0, cmp, val, s->mem_index, mop);
- }
-
- tcg_gen_extr_i128_i64(s->T0, s->T1, val);
-
- /* Determine success after the fact. */
- t0 = tcg_temp_new_i64();
- t1 = tcg_temp_new_i64();
- tcg_gen_xor_i64(t0, s->T0, cpu_regs[R_EAX]);
- tcg_gen_xor_i64(t1, s->T1, cpu_regs[R_EDX]);
- tcg_gen_or_i64(t0, t0, t1);
-
- /* Update Z. */
- gen_compute_eflags(s);
- tcg_gen_setcondi_i64(TCG_COND_EQ, t0, t0, 0);
- tcg_gen_deposit_tl(cpu_cc_src, cpu_cc_src, t0, ctz32(CC_Z), 1);
-
- /*
- * Extract the result values for the register pair. We may do this
- * unconditionally, because on success (Z=1), the old value matches
- * the previous value in RDX:RAX.
- */
- tcg_gen_mov_i64(cpu_regs[R_EAX], s->T0);
- tcg_gen_mov_i64(cpu_regs[R_EDX], s->T1);
-}
-#endif
-
#include "emit.c.inc"
static void gen_x87(DisasContext *s, X86DecodedInsn *decode)
@@ -2971,29 +2873,10 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
/* now check op code */
switch (b) {
- case 0x1c7: /* cmpxchg8b */
+ case 0x1c7: /* RDSEED, RDPID with f3 prefix */
mod = (modrm >> 6) & 3;
switch ((modrm >> 3) & 7) {
- case 1: /* CMPXCHG8, CMPXCHG16 */
- if (mod == 3) {
- goto illegal_op;
- }
-#ifdef TARGET_X86_64
- if (dflag == MO_64) {
- if (!(s->cpuid_ext_features & CPUID_EXT_CX16)) {
- goto illegal_op;
- }
- gen_cmpxchg16b(s, decode);
- break;
- }
-#endif
- if (!(s->cpuid_features & CPUID_CX8)) {
- goto illegal_op;
- }
- gen_cmpxchg8b(s, decode);
- break;
-
- case 7: /* RDSEED, RDPID with f3 prefix */
+ case 7:
if (mod != 3 ||
(s->prefix & (PREFIX_LOCK | PREFIX_REPNZ))) {
goto illegal_op;
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 45f4aed4611..fa51aadfcf2 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -288,6 +288,25 @@ static void decode_group8(DisasContext *s, CPUX86State *env, X86OpEntry *entry,
}
}
+static void decode_group9(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+ static const X86OpEntry group9_reg =
+ X86_OP_ENTRY0(multi0F); /* unconverted */
+ static const X86OpEntry cmpxchg8b =
+ X86_OP_ENTRY1(CMPXCHG8B, M,q, lock p_00 cpuid(CX8));
+ static const X86OpEntry cmpxchg16b =
+ X86_OP_ENTRY1(CMPXCHG16B, M,dq, lock p_00 cpuid(CX16));
+
+ int modrm = get_modrm(s, env);
+ int op = (modrm >> 3) & 7;
+
+ if ((modrm >> 6) == 3) {
+ *entry = group9_reg;
+ } else if (op == 1) {
+ *entry = REX_W(s) ? cmpxchg16b : cmpxchg8b;
+ }
+}
+
static void decode_group15(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
{
static const X86OpEntry group15_reg[8] = {
@@ -1203,7 +1222,7 @@ static const X86OpEntry opcodes_0F[256] = {
[0xc4] = X86_OP_ENTRY4(PINSRW, V,dq,H,dq,E,w, vex5 mmx p_00_66),
[0xc5] = X86_OP_ENTRY3(PEXTRW, G,d, U,dq,I,b, vex5 mmx p_00_66),
[0xc6] = X86_OP_ENTRY4(VSHUF, V,x, H,x, W,x, vex4 p_00_66),
- [0xc7] = X86_OP_ENTRY1(multi0F, nop,v, nolea), /* unconverted */
+ [0xc7] = X86_OP_GROUP0(group9),
[0xd0] = X86_OP_ENTRY3(VADDSUB, V,x, H,x, W,x, vex2 cpuid(SSE3) p_66_f2),
[0xd1] = X86_OP_ENTRY3(PSRLW_r, V,x, H,x, W,x, vex4 mmx avx2_256 p_00_66),
@@ -2241,8 +2260,12 @@ static bool has_cpuid_feature(DisasContext *s, X86CPUIDFeature cpuid)
return (s->cpuid_features & CPUID_CMOV);
case X86_FEAT_CLFLUSH:
return (s->cpuid_features & CPUID_CLFLUSH);
+ case X86_FEAT_CX8:
+ return (s->cpuid_features & CPUID_CX8);
case X86_FEAT_FXSR:
return (s->cpuid_features & CPUID_FXSR);
+ case X86_FEAT_CX16:
+ return (s->cpuid_ext_features & CPUID_EXT_CX16);
case X86_FEAT_F16C:
return (s->cpuid_ext_features & CPUID_EXT_F16C);
case X86_FEAT_FMA:
@@ -2722,15 +2745,6 @@ static void disas_insn(DisasContext *s, CPUState *cpu)
break;
}
- /*
- * hack for old decoder: 0F C7 has both instructions that accept LOCK
- * and instructions that don't, but also needs X86_SPECIAL_NoLoadEA.
- * Keep this here until CMPXCHG8B/CMPXCHG16B is separated from the
- * other unconverted opcodes.
- */
- if (decode.e.gen == gen_multi0F) {
- accept_lock = true;
- }
if ((s->prefix & PREFIX_LOCK) && !accept_lock) {
goto illegal_op;
}
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index edadb51ae89..98f10ad8b4e 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -1782,6 +1782,102 @@ static void gen_CMPXCHG(DisasContext *s, X86DecodedInsn *decode)
decode->cc_op = CC_OP_SUBB + ot;
}
+static void gen_CMPXCHG16B(DisasContext *s, X86DecodedInsn *decode)
+{
+#ifdef TARGET_X86_64
+ MemOp mop = MO_TE | MO_128 | MO_ALIGN;
+ TCGv_i64 t0, t1;
+ TCGv_i128 cmp, val;
+
+ cmp = tcg_temp_new_i128();
+ val = tcg_temp_new_i128();
+ tcg_gen_concat_i64_i128(cmp, cpu_regs[R_EAX], cpu_regs[R_EDX]);
+ tcg_gen_concat_i64_i128(val, cpu_regs[R_EBX], cpu_regs[R_ECX]);
+
+ /* Only require atomic with LOCK; non-parallel handled in generator. */
+ if (s->prefix & PREFIX_LOCK) {
+ tcg_gen_atomic_cmpxchg_i128(val, s->A0, cmp, val, s->mem_index, mop);
+ } else {
+ tcg_gen_nonatomic_cmpxchg_i128(val, s->A0, cmp, val, s->mem_index, mop);
+ }
+
+ tcg_gen_extr_i128_i64(s->T0, s->T1, val);
+
+ /* Determine success after the fact. */
+ t0 = tcg_temp_new_i64();
+ t1 = tcg_temp_new_i64();
+ tcg_gen_xor_i64(t0, s->T0, cpu_regs[R_EAX]);
+ tcg_gen_xor_i64(t1, s->T1, cpu_regs[R_EDX]);
+ tcg_gen_or_i64(t0, t0, t1);
+
+ /* Update Z. */
+ gen_compute_eflags(s);
+ tcg_gen_setcondi_i64(TCG_COND_EQ, t0, t0, 0);
+ tcg_gen_deposit_tl(cpu_cc_src, cpu_cc_src, t0, ctz32(CC_Z), 1);
+
+ /*
+ * Extract the result values for the register pair. We may do this
+ * unconditionally, because on success (Z=1), the old value matches
+ * the previous value in RDX:RAX.
+ */
+ tcg_gen_mov_i64(cpu_regs[R_EAX], s->T0);
+ tcg_gen_mov_i64(cpu_regs[R_EDX], s->T1);
+#else
+ abort();
+#endif
+}
+
+static void gen_CMPXCHG8B(DisasContext *s, X86DecodedInsn *decode)
+{
+ TCGv_i64 cmp, val, old;
+ TCGv Z;
+
+ cmp = tcg_temp_new_i64();
+ val = tcg_temp_new_i64();
+ old = tcg_temp_new_i64();
+
+ /* Construct the comparison values from the register pair. */
+ tcg_gen_concat_tl_i64(cmp, cpu_regs[R_EAX], cpu_regs[R_EDX]);
+ tcg_gen_concat_tl_i64(val, cpu_regs[R_EBX], cpu_regs[R_ECX]);
+
+ /* Only require atomic with LOCK; non-parallel handled in generator. */
+ if (s->prefix & PREFIX_LOCK) {
+ tcg_gen_atomic_cmpxchg_i64(old, s->A0, cmp, val, s->mem_index, MO_TEUQ);
+ } else {
+ tcg_gen_nonatomic_cmpxchg_i64(old, s->A0, cmp, val,
+ s->mem_index, MO_TEUQ);
+ }
+
+ /* Set tmp0 to match the required value of Z. */
+ tcg_gen_setcond_i64(TCG_COND_EQ, cmp, old, cmp);
+ Z = tcg_temp_new();
+ tcg_gen_trunc_i64_tl(Z, cmp);
+
+ /*
+ * Extract the result values for the register pair.
+ * For 32-bit, we may do this unconditionally, because on success (Z=1),
+ * the old value matches the previous value in EDX:EAX. For x86_64,
+ * the store must be conditional, because we must leave the source
+ * registers unchanged on success, and zero-extend the writeback
+ * on failure (Z=0).
+ */
+ if (TARGET_LONG_BITS == 32) {
+ tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], old);
+ } else {
+ TCGv zero = tcg_constant_tl(0);
+
+ tcg_gen_extr_i64_tl(s->T0, s->T1, old);
+ tcg_gen_movcond_tl(TCG_COND_EQ, cpu_regs[R_EAX], Z, zero,
+ s->T0, cpu_regs[R_EAX]);
+ tcg_gen_movcond_tl(TCG_COND_EQ, cpu_regs[R_EDX], Z, zero,
+ s->T1, cpu_regs[R_EDX]);
+ }
+
+ /* Update Z. */
+ gen_compute_eflags(s);
+ tcg_gen_deposit_tl(cpu_cc_src, cpu_cc_src, Z, ctz32(CC_Z), 1);
+}
+
static void gen_CPUID(DisasContext *s, X86DecodedInsn *decode)
{
gen_update_cc_op(s);
--
2.45.2
* [PATCH 07/10] target/i386: do not check PREFIX_LOCK in old-style decoder
2024-06-20 9:54 [PATCH 00/10] target/i386: make decoding entirely table based Paolo Bonzini
` (5 preceding siblings ...)
2024-06-20 9:54 ` [PATCH 06/10] target/i386: convert CMPXCHG8B/CMPXCHG16B to new decoder Paolo Bonzini
@ 2024-06-20 9:54 ` Paolo Bonzini
2024-06-20 9:54 ` [PATCH 08/10] target/i386: list instructions still in translate.c Paolo Bonzini
` (2 subsequent siblings)
9 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2024-06-20 9:54 UTC (permalink / raw)
To: qemu-devel; +Cc: Richard Henderson
PREFIX_LOCK is already checked in disas_insn() before gen_multi0F() is called.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
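The reasoning behind dropping the per-instruction checks can be sketched
with a toy model. The prefix bit values and all sketch_* names below are
made up for illustration and do not match QEMU's definitions; the handler
is modelled loosely on the RDRAND case touched by this patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical prefix bits; QEMU's PREFIX_* values differ. */
#define SKETCH_PREFIX_LOCK   0x1
#define SKETCH_PREFIX_REPZ   0x2
#define SKETCH_PREFIX_REPNZ  0x4

typedef enum { SKETCH_OK, SKETCH_ILLEGAL } SketchResult;

/*
 * Handler after the patch: it only rejects the prefixes that the
 * central code does not know about.  LOCK is deliberately not tested.
 */
static SketchResult sketch_gen_rdrand(int prefix)
{
    if (prefix & (SKETCH_PREFIX_REPZ | SKETCH_PREFIX_REPNZ)) {
        return SKETCH_ILLEGAL;
    }
    return SKETCH_OK;
}

/* Central path: LOCK is rejected once, before any handler runs. */
static SketchResult sketch_disas(int prefix, bool accept_lock)
{
    if ((prefix & SKETCH_PREFIX_LOCK) && !accept_lock) {
        return SKETCH_ILLEGAL;
    }
    return sketch_gen_rdrand(prefix);
}
```

Called on its own, sketch_gen_rdrand() would accept a LOCK prefix; that is
only safe because every path into it goes through sketch_disas() first,
which is the invariant the previous patch establishes for gen_multi0F.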
target/i386/tcg/translate.c | 26 ++++++++------------------
1 file changed, 8 insertions(+), 18 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index c60f18c7482..501a1ef9313 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2878,7 +2878,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
switch ((modrm >> 3) & 7) {
case 7:
if (mod != 3 ||
- (s->prefix & (PREFIX_LOCK | PREFIX_REPNZ))) {
+ (s->prefix & PREFIX_REPNZ)) {
goto illegal_op;
}
if (s->prefix & PREFIX_REPZ) {
@@ -2898,7 +2898,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
case 6: /* RDRAND */
if (mod != 3 ||
- (s->prefix & (PREFIX_LOCK | PREFIX_REPZ | PREFIX_REPNZ)) ||
+ (s->prefix & (PREFIX_REPZ | PREFIX_REPNZ)) ||
!(s->cpuid_ext_features & CPUID_EXT_RDRAND)) {
goto illegal_op;
}
@@ -3058,8 +3058,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
case 0xd0: /* xgetbv */
if ((s->cpuid_ext_features & CPUID_EXT_XSAVE) == 0
- || (s->prefix & (PREFIX_LOCK | PREFIX_DATA
- | PREFIX_REPZ | PREFIX_REPNZ))) {
+ || (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) {
goto illegal_op;
}
tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
@@ -3069,8 +3068,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
case 0xd1: /* xsetbv */
if ((s->cpuid_ext_features & CPUID_EXT_XSAVE) == 0
- || (s->prefix & (PREFIX_LOCK | PREFIX_DATA
- | PREFIX_REPZ | PREFIX_REPNZ))) {
+ || (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ))) {
goto illegal_op;
}
gen_svm_check_intercept(s, SVM_EXIT_XSETBV);
@@ -3237,8 +3235,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
gen_st_modrm(s, decode, ot);
break;
case 0xee: /* rdpkru */
- if (s->prefix & (PREFIX_LOCK | PREFIX_DATA
- | PREFIX_REPZ | PREFIX_REPNZ)) {
+ if (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ)) {
goto illegal_op;
}
tcg_gen_trunc_tl_i32(s->tmp2_i32, cpu_regs[R_ECX]);
@@ -3246,8 +3243,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
tcg_gen_extr_i64_tl(cpu_regs[R_EAX], cpu_regs[R_EDX], s->tmp1_i64);
break;
case 0xef: /* wrpkru */
- if (s->prefix & (PREFIX_LOCK | PREFIX_DATA
- | PREFIX_REPZ | PREFIX_REPNZ)) {
+ if (s->prefix & (PREFIX_DATA | PREFIX_REPZ | PREFIX_REPNZ)) {
goto illegal_op;
}
tcg_gen_concat_tl_i64(s->tmp1_i64, cpu_regs[R_EAX],
@@ -3323,7 +3319,6 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
if (prefixes & PREFIX_REPZ) {
/* bndcl */
if (reg >= 4
- || (prefixes & PREFIX_LOCK)
|| s->aflag == MO_16) {
goto illegal_op;
}
@@ -3331,7 +3326,6 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
} else if (prefixes & PREFIX_REPNZ) {
/* bndcu */
if (reg >= 4
- || (prefixes & PREFIX_LOCK)
|| s->aflag == MO_16) {
goto illegal_op;
}
@@ -3345,7 +3339,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
}
if (mod == 3) {
int reg2 = (modrm & 7) | REX_B(s);
- if (reg2 >= 4 || (prefixes & PREFIX_LOCK)) {
+ if (reg2 >= 4) {
goto illegal_op;
}
if (s->flags & HF_MPX_IU_MASK) {
@@ -3374,7 +3368,6 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
/* bndldx */
AddressParts a = decode->mem;
if (reg >= 4
- || (prefixes & PREFIX_LOCK)
|| s->aflag == MO_16
|| a.base < -1) {
goto illegal_op;
@@ -3410,7 +3403,6 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
if (mod != 3 && (prefixes & PREFIX_REPZ)) {
/* bndmk */
if (reg >= 4
- || (prefixes & PREFIX_LOCK)
|| s->aflag == MO_16) {
goto illegal_op;
}
@@ -3438,7 +3430,6 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
} else if (prefixes & PREFIX_REPNZ) {
/* bndcn */
if (reg >= 4
- || (prefixes & PREFIX_LOCK)
|| s->aflag == MO_16) {
goto illegal_op;
}
@@ -3450,7 +3441,7 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
}
if (mod == 3) {
int reg2 = (modrm & 7) | REX_B(s);
- if (reg2 >= 4 || (prefixes & PREFIX_LOCK)) {
+ if (reg2 >= 4) {
goto illegal_op;
}
if (s->flags & HF_MPX_IU_MASK) {
@@ -3477,7 +3468,6 @@ static void gen_multi0F(DisasContext *s, X86DecodedInsn *decode)
/* bndstx */
AddressParts a = decode->mem;
if (reg >= 4
- || (prefixes & PREFIX_LOCK)
|| s->aflag == MO_16
|| a.base < -1) {
goto illegal_op;
--
2.45.2
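The effect of the hunks above can be pictured with a small standalone sketch: once the LOCK prefix is validated in one central place, the per-instruction checks only need to mention the prefixes that are specific to them. Everything here is hypothetical and for illustration only (the SK_PREFIX_* values, SketchOpcodeEntry and both functions are not QEMU names), and the central rule shown (LOCK needs a lockable opcode with a memory operand) is a simplification.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical prefix bits; the real definitions live in translate.c. */
enum {
    SK_PREFIX_REPZ  = 1u << 0,
    SK_PREFIX_REPNZ = 1u << 1,
    SK_PREFIX_LOCK  = 1u << 2,
    SK_PREFIX_DATA  = 1u << 3,
};

/* Hypothetical per-opcode table entry: may the instruction take LOCK? */
typedef struct {
    bool lockable;
} SketchOpcodeEntry;

/* Central check, run once before any per-instruction code is reached. */
static bool sketch_lock_ok(const SketchOpcodeEntry *e, uint32_t prefixes,
                           bool has_memory_operand)
{
    if (!(prefixes & SK_PREFIX_LOCK)) {
        return true;
    }
    return e->lockable && has_memory_operand;
}

/* Per-instruction check: LOCK no longer needs to be listed here. */
static bool sketch_wrpkru_prefixes_ok(uint32_t prefixes)
{
    return !(prefixes & (SK_PREFIX_DATA | SK_PREFIX_REPZ | SK_PREFIX_REPNZ));
}
```

With this split, an instruction that is not lockable is rejected by sketch_lock_ok() before sketch_wrpkru_prefixes_ok() is ever consulted.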
* [PATCH 08/10] target/i386: list instructions still in translate.c
2024-06-20 9:54 [PATCH 00/10] target/i386: make decoding entirely table based Paolo Bonzini
` (6 preceding siblings ...)
2024-06-20 9:54 ` [PATCH 07/10] target/i386: do not check PREFIX_LOCK in old-style decoder Paolo Bonzini
@ 2024-06-20 9:54 ` Paolo Bonzini
2024-06-20 9:54 ` [PATCH 09/10] target/i386: assert that cc_op* and pc_save are preserved Paolo Bonzini
2024-06-20 9:54 ` [PATCH 10/10] target/i386: remove gen_ext_tl Paolo Bonzini
9 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2024-06-20 9:54 UTC (permalink / raw)
To: qemu-devel; +Cc: Richard Henderson
Group them so that it is easier to figure out which two-byte opcodes to
tackle together.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 31 +++++++++++++++++++++++++++++++
1 file changed, 31 insertions(+)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index fa51aadfcf2..f01a4f1f1fe 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -129,6 +129,37 @@
*
* (^) these are the two cases in which Intel and AMD disagree on the
* primary exception class
+ *
+ * Instructions still in translate.c
+ * ---------------------------------
+ * Generation of TCG opcodes for almost all instructions is in emit.c.inc;
+ * this file interprets the prefixes and opcode bytes down to individual
+ * instruction mnemonics. There is only a handful of opcodes still using
+ * a switch statement to decode modrm bits 3-5 and prefixes after decoding
+ * is complete; these are relics of the older x86 decoder and their code
+ * generation is performed in translate.c.
+ *
+ * These unconverted opcodes also perform their own effective address
+ * generation using the gen_lea_modrm() function.
+ *
+ * There is nothing particularly complicated about them; simply, they don't
+ * need any nasty hacks in the decoder, and they shouldn't get in the way
+ * of the implementation of new x86 instructions, so they are left alone
+ * for the time being.
+ *
+ * x87:
+ * 0xD8 - 0xDF
+ *
+ * privileged/system:
+ * 0x0F 0x00 group 6 (SLDT, STR, LLDT, LTR, VERR, VERW)
+ * 0x0F 0x01 group 7 (SGDT, SIDT, LGDT, LIDT, SMSW, LMSW, INVLPG,
+ * MONITOR, MWAIT, CLAC, STAC, XGETBV, XSETBV,
+ * SWAPGS, RDTSCP)
+ * 0x0F 0xC7 (reg operand) group 9 (RDRAND, RDSEED, RDPID)
+ *
+ * MPX:
+ * 0x0F 0x1A BNDLDX, BNDMOV, BNDCL, BNDCU
+ * 0x0F 0x1B BNDSTX, BNDMOV, BNDMK, BNDCN
*/
#define X86_OP_NONE { 0 },
--
2.45.2
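For readers unfamiliar with the "modrm bits 3-5" mentioned in the new comment, the following standalone sketch splits a ModRM byte into its three fields. The struct and function names are hypothetical and are not taken from QEMU; only the bit layout is architectural.

```c
#include <stdint.h>

/* Hypothetical container for the three ModRM fields. */
typedef struct {
    unsigned mod; /* bits 7-6: addressing mode, 3 means register operand */
    unsigned reg; /* bits 5-3: selects the operation within a group */
    unsigned rm;  /* bits 2-0: register or memory operand selector */
} SketchModRM;

/* Split a ModRM byte into its mod, reg and rm fields. */
static SketchModRM sketch_split_modrm(uint8_t modrm)
{
    SketchModRM m = {
        .mod = (modrm >> 6) & 3,
        .reg = (modrm >> 3) & 7,
        .rm  = modrm & 7,
    };
    return m;
}
```

For example, the byte 0xD0 (binary 11 010 000) splits into mod = 3, reg = 2, rm = 0; this is the kind of value a group decoder switches on.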
* [PATCH 09/10] target/i386: assert that cc_op* and pc_save are preserved
2024-06-20 9:54 [PATCH 00/10] target/i386: make decoding entirely table based Paolo Bonzini
` (7 preceding siblings ...)
2024-06-20 9:54 ` [PATCH 08/10] target/i386: list instructions still in translate.c Paolo Bonzini
@ 2024-06-20 9:54 ` Paolo Bonzini
2024-06-20 9:54 ` [PATCH 10/10] target/i386: remove gen_ext_tl Paolo Bonzini
9 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2024-06-20 9:54 UTC (permalink / raw)
To: qemu-devel; +Cc: Richard Henderson
All decoding is now done before any code generation, so there is no
longer any need to save and restore cc_op* and pc_save. For the time
being, assert that they are indeed left unchanged.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 12 +++---------
1 file changed, 3 insertions(+), 9 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 501a1ef9313..d11c5e1dc13 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -3709,15 +3709,9 @@ static void i386_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
case 2:
/* Restore state that may affect the next instruction. */
dc->pc = dc->base.pc_next;
- /*
- * TODO: These save/restore can be removed after the table-based
- * decoder is complete; we will be decoding the insn completely
- * before any code generation that might affect these variables.
- */
- dc->cc_op_dirty = orig_cc_op_dirty;
- dc->cc_op = orig_cc_op;
- dc->pc_save = orig_pc_save;
- /* END TODO */
+ assert(dc->cc_op_dirty == orig_cc_op_dirty);
+ assert(dc->cc_op == orig_cc_op);
+ assert(dc->pc_save == orig_pc_save);
dc->base.num_insns--;
tcg_remove_ops_after(dc->prev_insn_end);
dc->base.insn_start = dc->prev_insn_start;
--
2.45.2
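The pattern in this patch (snapshot some state, run a step that is expected to leave it alone, then assert instead of restoring) can be sketched in isolation as follows. SketchCtx, SketchDecodeFn and the helper functions are hypothetical stand-ins, not the QEMU types; the real code operates on DisasContext inside i386_tr_translate_insn.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, reduced stand-in for the translator state. */
typedef struct {
    int cc_op;
    bool cc_op_dirty;
    uint64_t pc_save;
} SketchCtx;

/* Returns true if code was generated, false if decoding bailed out. */
typedef bool (*SketchDecodeFn)(SketchCtx *ctx);

static bool sketch_try_insn(SketchCtx *ctx, SketchDecodeFn decode)
{
    const SketchCtx orig = *ctx;

    if (decode(ctx)) {
        return true; /* code was generated; the state may have changed */
    }
    /* Decoding bailed out before code generation: nothing to restore. */
    assert(ctx->cc_op_dirty == orig.cc_op_dirty);
    assert(ctx->cc_op == orig.cc_op);
    assert(ctx->pc_save == orig.pc_save);
    return false;
}

/* A decoder that gives up without touching the state. */
static bool sketch_decode_fails_cleanly(SketchCtx *ctx)
{
    (void)ctx;
    return false;
}

/* A decoder that succeeds and updates the flags bookkeeping. */
static bool sketch_decode_and_emit(SketchCtx *ctx)
{
    ctx->cc_op = 1;
    ctx->cc_op_dirty = true;
    return true;
}
```

The asserts document the invariant and catch a regression early, which is the same reasoning the commit message gives for keeping them "for the time being".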
* [PATCH 10/10] target/i386: remove gen_ext_tl
2024-06-20 9:54 [PATCH 00/10] target/i386: make decoding entirely table based Paolo Bonzini
` (8 preceding siblings ...)
2024-06-20 9:54 ` [PATCH 09/10] target/i386: assert that cc_op* and pc_save are preserved Paolo Bonzini
@ 2024-06-20 9:54 ` Paolo Bonzini
9 siblings, 0 replies; 16+ messages in thread
From: Paolo Bonzini @ 2024-06-20 9:54 UTC (permalink / raw)
To: qemu-devel; +Cc: Richard Henderson
With the introduction of tcg_gen_ext_tl, most uses of gen_ext_tl can be
converted directly because they do not pass a NULL destination.
tcg_gen_ext_tl already drops no-ops like "tcg_gen_ext_tl(tcgv, tcgv, MO_TL)",
so the only thing that gen_ext_tl added on top was avoiding the creation of
a useless temporary. That can be done in the only place where it matters,
which is gen_op_j_ecx.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 41 +++++++++++++++----------------------
1 file changed, 17 insertions(+), 24 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index d11c5e1dc13..5c9c992400e 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -697,23 +697,16 @@ static inline TCGv gen_compute_Dshift(DisasContext *s, MemOp ot)
return dshift;
};
-static TCGv gen_ext_tl(TCGv dst, TCGv src, MemOp size, bool sign)
-{
- if (size == MO_TL) {
- return src;
- }
- if (!dst) {
- dst = tcg_temp_new();
- }
- tcg_gen_ext_tl(dst, src, size | (sign ? MO_SIGN : 0));
- return dst;
-}
-
static void gen_op_j_ecx(DisasContext *s, TCGCond cond, TCGLabel *label1)
{
- TCGv tmp = gen_ext_tl(NULL, cpu_regs[R_ECX], s->aflag, false);
-
- tcg_gen_brcondi_tl(cond, tmp, 0, label1);
+ TCGv lhs;
+ if (s->aflag == MO_TL) {
+ lhs = cpu_regs[R_ECX];
+ } else {
+ lhs = tcg_temp_new();
+ tcg_gen_ext_tl(lhs, cpu_regs[R_ECX], s->aflag);
+ }
+ tcg_gen_brcondi_tl(cond, lhs, 0, label1);
}
static inline void gen_op_jz_ecx(DisasContext *s, TCGLabel *label1)
@@ -886,16 +879,16 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv reg)
case CC_OP_SUBB ... CC_OP_SUBQ:
/* (DATA_TYPE)CC_SRCT < (DATA_TYPE)CC_SRC */
size = s->cc_op - CC_OP_SUBB;
- gen_ext_tl(s->cc_srcT, s->cc_srcT, size, false);
- gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
+ tcg_gen_ext_tl(s->cc_srcT, s->cc_srcT, size);
+ tcg_gen_ext_tl(cpu_cc_src, cpu_cc_src, size);
return (CCPrepare) { .cond = TCG_COND_LTU, .reg = s->cc_srcT,
.reg2 = cpu_cc_src, .use_reg2 = true };
case CC_OP_ADDB ... CC_OP_ADDQ:
/* (DATA_TYPE)CC_DST < (DATA_TYPE)CC_SRC */
size = s->cc_op - CC_OP_ADDB;
- gen_ext_tl(cpu_cc_dst, cpu_cc_dst, size, false);
- gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
+ tcg_gen_ext_tl(cpu_cc_dst, cpu_cc_dst, size);
+ tcg_gen_ext_tl(cpu_cc_src, cpu_cc_src, size);
return (CCPrepare) { .cond = TCG_COND_LTU, .reg = cpu_cc_dst,
.reg2 = cpu_cc_src, .use_reg2 = true };
@@ -920,7 +913,7 @@ static CCPrepare gen_prepare_eflags_c(DisasContext *s, TCGv reg)
case CC_OP_BMILGB ... CC_OP_BMILGQ:
size = s->cc_op - CC_OP_BMILGB;
- gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
+ tcg_gen_ext_tl(cpu_cc_src, cpu_cc_src, size);
return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_src };
case CC_OP_ADCX:
@@ -1050,8 +1043,8 @@ static CCPrepare gen_prepare_cc(DisasContext *s, int b, TCGv reg)
size = s->cc_op - CC_OP_SUBB;
switch (jcc_op) {
case JCC_BE:
- gen_ext_tl(s->cc_srcT, s->cc_srcT, size, false);
- gen_ext_tl(cpu_cc_src, cpu_cc_src, size, false);
+ tcg_gen_ext_tl(s->cc_srcT, s->cc_srcT, size);
+ tcg_gen_ext_tl(cpu_cc_src, cpu_cc_src, size);
cc = (CCPrepare) { .cond = TCG_COND_LEU, .reg = s->cc_srcT,
.reg2 = cpu_cc_src, .use_reg2 = true };
break;
@@ -1061,8 +1054,8 @@ static CCPrepare gen_prepare_cc(DisasContext *s, int b, TCGv reg)
case JCC_LE:
cond = TCG_COND_LE;
fast_jcc_l:
- gen_ext_tl(s->cc_srcT, s->cc_srcT, size, true);
- gen_ext_tl(cpu_cc_src, cpu_cc_src, size, true);
+ tcg_gen_ext_tl(s->cc_srcT, s->cc_srcT, size | MO_SIGN);
+ tcg_gen_ext_tl(cpu_cc_src, cpu_cc_src, size | MO_SIGN);
cc = (CCPrepare) { .cond = cond, .reg = s->cc_srcT,
.reg2 = cpu_cc_src, .use_reg2 = true };
break;
--
2.45.2
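As a plain-C model of what the extension computes at run time, the sketch below zero- or sign-extends the low bits of a 64-bit value and returns the input unchanged at full width. It does not use the TCG API; the SK_* size codes and sketch_ext64() are hypothetical names, and the size encoding (log2 of the width in bytes) is an assumption made for the example.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical size codes: log2 of the operand width in bytes. */
enum { SK_8 = 0, SK_16 = 1, SK_32 = 2, SK_64 = 3 };

/* Extend the low (8 << size) bits of v to 64 bits. */
static uint64_t sketch_ext64(uint64_t v, unsigned size, bool sign)
{
    unsigned bits = 8u << size;

    if (bits >= 64) {
        return v; /* full width: the extension is a no-op */
    }

    uint64_t low = v & ((UINT64_C(1) << bits) - 1);
    if (!sign) {
        return low;
    }

    /* Sign-extend using only unsigned arithmetic. */
    uint64_t sign_bit = UINT64_C(1) << (bits - 1);
    return (low ^ sign_bit) - sign_bit;
}
```

In the gen_op_j_ecx case only zero extension is needed: with a 16-bit address size, a count register holding 0x10000 compares equal to zero because only the low 16 bits are tested.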
* Re: [PATCH 01/10] target/i386: use cpu_cc_dst for CC_OP_POPCNT
2024-06-20 9:54 ` [PATCH 01/10] target/i386: use cpu_cc_dst for CC_OP_POPCNT Paolo Bonzini
@ 2024-06-20 15:02 ` Richard Henderson
0 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2024-06-20 15:02 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 6/20/24 02:54, Paolo Bonzini wrote:
> CC_OP_POPCNT computes ZF from one of the cc_op_* registers, but it is
> the only such CCOp that uses cpu_cc_src instead of cpu_cc_dst like the
> others. Do not make it the odd one out.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/cpu.h | 2 +-
> target/i386/tcg/cc_helper.c | 2 +-
> target/i386/tcg/translate.c | 2 +-
> target/i386/tcg/emit.c.inc | 4 ++--
> 4 files changed, 5 insertions(+), 5 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 02/10] target/i386: give CC_OP_POPCNT low bits corresponding to MO_TL
2024-06-20 9:54 ` [PATCH 02/10] target/i386: give CC_OP_POPCNT low bits corresponding to MO_TL Paolo Bonzini
@ 2024-06-20 15:10 ` Richard Henderson
0 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2024-06-20 15:10 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 6/20/24 02:54, Paolo Bonzini wrote:
> Handle it like the other arithmetic cc_ops. This slightly simplifies
> the implementation of the bit test instructions.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/cpu.h | 13 +++++++++++--
> target/i386/tcg/translate.c | 3 +--
> 2 files changed, 12 insertions(+), 4 deletions(-)
>
> diff --git a/target/i386/cpu.h b/target/i386/cpu.h
> index f54cd93b3f9..8504a7998fd 100644
> --- a/target/i386/cpu.h
> +++ b/target/i386/cpu.h
> @@ -1275,6 +1275,7 @@ typedef enum {
> CC_OP_ADCX, /* CC_DST = C, CC_SRC = rest. */
> CC_OP_ADOX, /* CC_SRC2 = O, CC_SRC = rest. */
> CC_OP_ADCOX, /* CC_DST = C, CC_SRC2 = O, CC_SRC = rest. */
> + CC_OP_CLR, /* Z and P set, all other flags clear. */
>
> CC_OP_MULB, /* modify all flags, C, O = (CC_SRC != 0) */
> CC_OP_MULW,
> @@ -1331,8 +1332,16 @@ typedef enum {
> CC_OP_BMILGL,
> CC_OP_BMILGQ,
>
> - CC_OP_CLR, /* Z set, all other flags clear. */
> - CC_OP_POPCNT, /* Z via CC_DST, all other flags clear. */
> + /*
> + * Note that only CC_OP_POPCNT (i.e. the one with MO_TL size)
> + * is used or implemented, because the translation needs
> + * to zero-extend CC_DST anyway.
> + */
> + CC_OP_POPCNTB__, /* Z via CC_DST, all other flags clear. */
> + CC_OP_POPCNTW__,
> + CC_OP_POPCNTL__,
> + CC_OP_POPCNTQ__,
> + CC_OP_POPCNT = sizeof(target_ulong) == 8 ? CC_OP_POPCNTQ__ : CC_OP_POPCNTL__,
>
> CC_OP_NB,
> } CCOp;
> diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
> index f32cda4e169..934c514e64f 100644
> --- a/target/i386/tcg/translate.c
> +++ b/target/i386/tcg/translate.c
> @@ -1019,8 +1019,6 @@ static CCPrepare gen_prepare_eflags_z(DisasContext *s, TCGv reg)
> .imm = CC_Z };
> case CC_OP_CLR:
> return (CCPrepare) { .cond = TCG_COND_ALWAYS };
> - case CC_OP_POPCNT:
> - return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_src };
The previous patch should have changed this to cpu_cc_dst.
Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
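The layout discussed in this patch can be illustrated with a shortened, hypothetical enum: each sized group has four consecutive entries in B, W, L, Q order, so subtracting the first entry of the group yields the operand size, and the POPCNT alias picks the entry matching the target word size. The SK_* names and sketch_target_ulong are invented for this sketch, and a 64-bit target is assumed.

```c
#include <stdint.h>

typedef uint64_t sketch_target_ulong; /* assumption: 64-bit target */

/* Hypothetical, shortened CCOp-style enum with two sized groups. */
typedef enum {
    SK_CC_OP_SUBB,
    SK_CC_OP_SUBW,
    SK_CC_OP_SUBL,
    SK_CC_OP_SUBQ,

    SK_CC_OP_POPCNTB__,
    SK_CC_OP_POPCNTW__,
    SK_CC_OP_POPCNTL__,
    SK_CC_OP_POPCNTQ__,
    /* Only the target-word-sized entry is meant to be used. */
    SK_CC_OP_POPCNT = sizeof(sketch_target_ulong) == 8
                      ? SK_CC_OP_POPCNTQ__ : SK_CC_OP_POPCNTL__,
} SketchCCOp;

/* Offset within a group: 0 = byte, 1 = word, 2 = long, 3 = quad. */
static unsigned sketch_cc_op_size(SketchCCOp op, SketchCCOp group_first)
{
    return (unsigned)(op - group_first);
}
```

This mirrors the "size = s->cc_op - CC_OP_SUBB" idiom visible elsewhere in the series; with the alias in place, the same subtraction works for the POPCNT group.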
* Re: [PATCH 03/10] target/i386: convert bit test instructions to new decoder
2024-06-20 9:54 ` [PATCH 03/10] target/i386: convert bit test instructions to new decoder Paolo Bonzini
@ 2024-06-20 15:22 ` Richard Henderson
0 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2024-06-20 15:22 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 6/20/24 02:54, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.h | 3 +
> target/i386/tcg/translate.c | 147 +-----------------------------
> target/i386/tcg/decode-new.c.inc | 40 ++++++---
> target/i386/tcg/emit.c.inc | 149 ++++++++++++++++++++++++++++++-
> 4 files changed, 181 insertions(+), 158 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 04/10] target/i386: try not to force EFLAGS computation for CC_OP_ADOX/ADCX
2024-06-20 9:54 ` [PATCH 04/10] target/i386: try not to force EFLAGS computation for CC_OP_ADOX/ADCX Paolo Bonzini
@ 2024-06-20 15:56 ` Richard Henderson
0 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2024-06-20 15:56 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 6/20/24 02:54, Paolo Bonzini wrote:
> When computing the "other" flag (CF for CC_OP_ADOX, OF for CC_OP_ADCX),
> take into account that it is already in the right position of cpu_cc_src,
> just like for CC_OP_EFLAGS. There is no need to call gen_compute_eflags().
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 4 ++++
> 1 file changed, 4 insertions(+)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 06/10] target/i386: convert CMPXCHG8B/CMPXCHG16B to new decoder
2024-06-20 9:54 ` [PATCH 06/10] target/i386: convert CMPXCHG8B/CMPXCHG16B to new decoder Paolo Bonzini
@ 2024-06-20 16:14 ` Richard Henderson
0 siblings, 0 replies; 16+ messages in thread
From: Richard Henderson @ 2024-06-20 16:14 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 6/20/24 02:54, Paolo Bonzini wrote:
> This moves the last LOCK-enabled instructions to the new decoder. It is now
> possible to assume that gen_multi0F is called only after checking that
> PREFIX_LOCK was not specified.
>
> The gen_cmpxchg8b and gen_cmpxchg16b functions even have the correct
> prototype already; the only thing that needs to be done is removing the
> gen_lea_modrm() call.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.h | 2 +
> target/i386/tcg/translate.c | 121 +------------------------------
> target/i386/tcg/decode-new.c.inc | 34 ++++++---
> target/i386/tcg/emit.c.inc | 96 ++++++++++++++++++++++++
> 4 files changed, 124 insertions(+), 129 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~