* [PATCH 00/22] target/i386: first part of TCG changes for 9.0
@ 2023-12-22 18:15 Paolo Bonzini
2023-12-22 18:15 ` [PATCH 01/22] target/i386: optimize computation of JL and JLE from flags Paolo Bonzini
` (21 more replies)
0 siblings, 22 replies; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
The main things here are a bunch of cleanups to the common logic of
the new decoder, some changes to translate.c that prepare for redoing
one-byte opcodes in the new framework, and the implementation of the
CMPccXADD instruction that is new in Sierra Forest processors.
These are all relatively innocuous changes, and easy to bisect in
case things go wrong.
Paolo
Paolo Bonzini (22):
target/i386: optimize computation of JL and JLE from flags
target/i386: speedup JO/SETO after MUL or IMUL
target/i386: remove unnecessary arguments from raise_interrupt
target/i386: remove unnecessary truncations
target/i386: clean up cpu_cc_compute_all
target/i386: document more deviations from the manual
target/i386: reimplement check for validity of LOCK prefix
target/i386: avoid trunc and ext for MULX and RORX
target/i386: rename zext0/zext2 and make them closer to the manual
target/i386: add X86_SPECIALs for MOVSX and MOVZX
target/i386: do not decode string source/destination into decode->mem
target/i386: do not clobber A0 in POP translation
target/i386: do not clobber T0 on string operations
target/i386: split eflags computation out of gen_compute_eflags
target/i386: do not use s->tmp4 for push
target/i386: do not use s->tmp0 for jumps on ECX ==/!= 0
target/i386: extract gen_far_call/jmp, reordering temporaries
target/i386: prepare for implementation of STOS/SCAS in new decoder
target/i386: move operand load and writeback out of gen_cmovcc1
target/i386: adjust decoding of J operand
target/i386: introduce flags writeback mechanism
target/i386: implement CMPccXADD
target/i386/cpu.c | 2 +-
target/i386/cpu.h | 5 +-
target/i386/tcg/cc_helper.c | 6 +-
target/i386/tcg/decode-new.c.inc | 152 +++++++++++++------
target/i386/tcg/decode-new.h | 29 +++-
target/i386/tcg/emit.c.inc | 224 +++++++++++++++++++++------
target/i386/tcg/excp_helper.c | 7 +-
target/i386/tcg/fpu_helper.c | 10 +-
target/i386/tcg/helper-tcg.h | 3 +-
target/i386/tcg/int_helper.c | 8 +-
target/i386/tcg/misc_helper.c | 4 +-
target/i386/tcg/seg_helper.c | 8 +-
target/i386/tcg/translate.c | 250 ++++++++++++++++++-------------
tests/tcg/i386/Makefile.target | 2 +-
tests/tcg/i386/test-flags.c | 37 +++++
15 files changed, 509 insertions(+), 238 deletions(-)
create mode 100644 tests/tcg/i386/test-flags.c
--
2.43.0
^ permalink raw reply [flat|nested] 43+ messages in thread
* [PATCH 01/22] target/i386: optimize computation of JL and JLE from flags
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 20:53 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 02/22] target/i386: speedup JO/SETO after MUL or IMUL Paolo Bonzini
` (20 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
Take advantage of the fact that there can be no 1 bits between SF and OF.
If they were adjacent, you could sum SF and get a carry only if SF was
already set. Then the value of OF in the sum is the XOR of OF itself,
the carry (which is SF) and 0 (the value of the OF bit in the addend):
this is OF^SF exactly.
Because OF and SF are not adjacent, just place more 1 bits to the
left so that the carry propagates, which means summing CC_O - CC_S.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 10 ++++------
1 file changed, 4 insertions(+), 6 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 037bc47e7c2..8fb80011a22 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -1126,10 +1126,9 @@ static CCPrepare gen_prepare_cc(DisasContext *s, int b, TCGv reg)
if (reg == cpu_cc_src) {
reg = s->tmp0;
}
- tcg_gen_shri_tl(reg, cpu_cc_src, 4); /* CC_O -> CC_S */
- tcg_gen_xor_tl(reg, reg, cpu_cc_src);
+ tcg_gen_addi_tl(reg, cpu_cc_src, CC_O - CC_S);
cc = (CCPrepare) { .cond = TCG_COND_NE, .reg = reg,
- .mask = CC_S };
+ .mask = CC_O };
break;
default:
case JCC_LE:
@@ -1137,10 +1136,9 @@ static CCPrepare gen_prepare_cc(DisasContext *s, int b, TCGv reg)
if (reg == cpu_cc_src) {
reg = s->tmp0;
}
- tcg_gen_shri_tl(reg, cpu_cc_src, 4); /* CC_O -> CC_S */
- tcg_gen_xor_tl(reg, reg, cpu_cc_src);
+ tcg_gen_addi_tl(reg, cpu_cc_src, CC_O - CC_S);
cc = (CCPrepare) { .cond = TCG_COND_NE, .reg = reg,
- .mask = CC_S | CC_Z };
+ .mask = CC_O | CC_Z };
break;
}
break;
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 02/22] target/i386: speedup JO/SETO after MUL or IMUL
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
2023-12-22 18:15 ` [PATCH 01/22] target/i386: optimize computation of JL and JLE from flags Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 20:56 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 03/22] target/i386: remove unnecessary arguments from raise_interrupt Paolo Bonzini
` (19 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
OF is equal to the carry flag, so use the same CCPrepare.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 8fb80011a22..a16eb8d4008 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -1020,6 +1020,9 @@ static CCPrepare gen_prepare_eflags_o(DisasContext *s, TCGv reg)
case CC_OP_CLR:
case CC_OP_POPCNT:
return (CCPrepare) { .cond = TCG_COND_NEVER, .mask = -1 };
+ case CC_OP_MULB ... CC_OP_MULQ:
+ return (CCPrepare) { .cond = TCG_COND_NE,
+ .reg = cpu_cc_src, .mask = -1 };
default:
gen_compute_eflags(s);
return (CCPrepare) { .cond = TCG_COND_NE, .reg = cpu_cc_src,
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 03/22] target/i386: remove unnecessary arguments from raise_interrupt
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
2023-12-22 18:15 ` [PATCH 01/22] target/i386: optimize computation of JL and JLE from flags Paolo Bonzini
2023-12-22 18:15 ` [PATCH 02/22] target/i386: speedup JO/SETO after MUL or IMUL Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 20:58 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 04/22] target/i386: remove unnecessary truncations Paolo Bonzini
` (18 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
is_int is always 1, and error_code is always zero.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/excp_helper.c | 7 +++----
target/i386/tcg/helper-tcg.h | 3 +--
target/i386/tcg/misc_helper.c | 2 +-
3 files changed, 5 insertions(+), 7 deletions(-)
diff --git a/target/i386/tcg/excp_helper.c b/target/i386/tcg/excp_helper.c
index 7c3c8dc7fe8..65e37ae2a0c 100644
--- a/target/i386/tcg/excp_helper.c
+++ b/target/i386/tcg/excp_helper.c
@@ -28,7 +28,7 @@
G_NORETURN void helper_raise_interrupt(CPUX86State *env, int intno,
int next_eip_addend)
{
- raise_interrupt(env, intno, 1, 0, next_eip_addend);
+ raise_interrupt(env, intno, next_eip_addend);
}
G_NORETURN void helper_raise_exception(CPUX86State *env, int exception_index)
@@ -112,10 +112,9 @@ void raise_interrupt2(CPUX86State *env, int intno,
/* shortcuts to generate exceptions */
-G_NORETURN void raise_interrupt(CPUX86State *env, int intno, int is_int,
- int error_code, int next_eip_addend)
+G_NORETURN void raise_interrupt(CPUX86State *env, int intno, int next_eip_addend)
{
- raise_interrupt2(env, intno, is_int, error_code, next_eip_addend, 0);
+ raise_interrupt2(env, intno, 1, 0, next_eip_addend, 0);
}
G_NORETURN void raise_exception_err(CPUX86State *env, int exception_index,
diff --git a/target/i386/tcg/helper-tcg.h b/target/i386/tcg/helper-tcg.h
index cd1723389ad..ce34b737bb0 100644
--- a/target/i386/tcg/helper-tcg.h
+++ b/target/i386/tcg/helper-tcg.h
@@ -65,8 +65,7 @@ G_NORETURN void raise_exception_err(CPUX86State *env, int exception_index,
int error_code);
G_NORETURN void raise_exception_err_ra(CPUX86State *env, int exception_index,
int error_code, uintptr_t retaddr);
-G_NORETURN void raise_interrupt(CPUX86State *nenv, int intno, int is_int,
- int error_code, int next_eip_addend);
+G_NORETURN void raise_interrupt(CPUX86State *nenv, int intno, int next_eip_addend);
G_NORETURN void handle_unaligned_access(CPUX86State *env, vaddr vaddr,
MMUAccessType access_type,
uintptr_t retaddr);
diff --git a/target/i386/tcg/misc_helper.c b/target/i386/tcg/misc_helper.c
index babff061864..66b332a83c1 100644
--- a/target/i386/tcg/misc_helper.c
+++ b/target/i386/tcg/misc_helper.c
@@ -43,7 +43,7 @@ void helper_into(CPUX86State *env, int next_eip_addend)
eflags = cpu_cc_compute_all(env, CC_OP);
if (eflags & CC_O) {
- raise_interrupt(env, EXCP04_INTO, 1, 0, next_eip_addend);
+ raise_interrupt(env, EXCP04_INTO, next_eip_addend);
}
}
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 04/22] target/i386: remove unnecessary truncations
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (2 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 03/22] target/i386: remove unnecessary arguments from raise_interrupt Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 21:13 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 05/22] target/i386: clean up cpu_cc_compute_all Paolo Bonzini
` (17 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
gen_lea_v_seg (called by gen_add_A0_ds_seg) already zeroes any
bits of s->A0 beyond s->aflag. It does so before summing the
segment base and, if not in 64-bit mode, also after summing it.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/emit.c.inc | 4 +---
target/i386/tcg/translate.c | 2 --
2 files changed, 1 insertion(+), 5 deletions(-)
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 82da5488d47..d444d83e534 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -1242,9 +1242,7 @@ static void gen_LDMXCSR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod
static void gen_MASKMOV(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
- tcg_gen_mov_tl(s->A0, cpu_regs[R_EDI]);
- gen_extu(s->aflag, s->A0);
- gen_add_A0_ds_seg(s);
+ gen_lea_v_seg(s, s->aflag, cpu_regs[R_EDI], R_DS, s->override);
if (s->prefix & PREFIX_DATA) {
gen_helper_maskmov_xmm(tcg_env, OP_PTR1, OP_PTR2, s->A0);
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index a16eb8d4008..73b83e07e23 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -4183,7 +4183,6 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
tcg_gen_mov_tl(s->A0, cpu_regs[R_EBX]);
tcg_gen_ext8u_tl(s->T0, cpu_regs[R_EAX]);
tcg_gen_add_tl(s->A0, s->A0, s->T0);
- gen_extu(s->aflag, s->A0);
gen_add_A0_ds_seg(s);
gen_op_ld_v(s, MO_8, s->T0, s->A0);
gen_op_mov_reg_v(s, MO_8, R_EAX, s->T0);
@@ -5835,7 +5834,6 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
gen_update_cc_op(s);
gen_update_eip_cur(s);
tcg_gen_mov_tl(s->A0, cpu_regs[R_EAX]);
- gen_extu(s->aflag, s->A0);
gen_add_A0_ds_seg(s);
gen_helper_monitor(tcg_env, s->A0);
break;
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 05/22] target/i386: clean up cpu_cc_compute_all
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (3 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 04/22] target/i386: remove unnecessary truncations Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 21:27 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 06/22] target/i386: document more deviations from the manual Paolo Bonzini
` (16 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
cpu_cc_compute_all() has an argument that is always equal to CC_OP for historical
reasons (dating back to commit a7812ae4123, "TCG variable type checking.", 2008-11-17,
which added the argument to helper_cc_compute_all). It does not make sense for the
argumnt to have any other value, so remove it and clean up some lines that are not
too long anymore.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/cpu.h | 4 ++--
target/i386/tcg/cc_helper.c | 6 +++---
target/i386/tcg/fpu_helper.c | 10 ++++------
target/i386/tcg/int_helper.c | 8 ++++----
target/i386/tcg/misc_helper.c | 2 +-
target/i386/tcg/seg_helper.c | 8 ++++----
6 files changed, 18 insertions(+), 20 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index ef987f344cf..ecdd4518c64 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -2344,13 +2344,13 @@ void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int bank,
uint64_t status, uint64_t mcg_status, uint64_t addr,
uint64_t misc, int flags);
-uint32_t cpu_cc_compute_all(CPUX86State *env1, int op);
+uint32_t cpu_cc_compute_all(CPUX86State *env1);
static inline uint32_t cpu_compute_eflags(CPUX86State *env)
{
uint32_t eflags = env->eflags;
if (tcg_enabled()) {
- eflags |= cpu_cc_compute_all(env, CC_OP) | (env->df & DF_MASK);
+ eflags |= cpu_cc_compute_all(env) | (env->df & DF_MASK);
}
return eflags;
}
diff --git a/target/i386/tcg/cc_helper.c b/target/i386/tcg/cc_helper.c
index c310bd842f1..f76e9cb8cfb 100644
--- a/target/i386/tcg/cc_helper.c
+++ b/target/i386/tcg/cc_helper.c
@@ -220,9 +220,9 @@ target_ulong helper_cc_compute_all(target_ulong dst, target_ulong src1,
}
}
-uint32_t cpu_cc_compute_all(CPUX86State *env, int op)
+uint32_t cpu_cc_compute_all(CPUX86State *env)
{
- return helper_cc_compute_all(CC_DST, CC_SRC, CC_SRC2, op);
+ return helper_cc_compute_all(CC_DST, CC_SRC, CC_SRC2, CC_OP);
}
target_ulong helper_cc_compute_c(target_ulong dst, target_ulong src1,
@@ -335,7 +335,7 @@ target_ulong helper_read_eflags(CPUX86State *env)
{
uint32_t eflags;
- eflags = cpu_cc_compute_all(env, CC_OP);
+ eflags = cpu_cc_compute_all(env);
eflags |= (env->df & DF_MASK);
eflags |= env->eflags & ~(VM_MASK | RF_MASK);
return eflags;
diff --git a/target/i386/tcg/fpu_helper.c b/target/i386/tcg/fpu_helper.c
index 4430d3d380c..4b965a5d6c4 100644
--- a/target/i386/tcg/fpu_helper.c
+++ b/target/i386/tcg/fpu_helper.c
@@ -484,9 +484,8 @@ void helper_fcomi_ST0_FT0(CPUX86State *env)
FloatRelation ret;
ret = floatx80_compare(ST0, FT0, &env->fp_status);
- eflags = cpu_cc_compute_all(env, CC_OP);
- eflags = (eflags & ~(CC_Z | CC_P | CC_C)) | fcomi_ccval[ret + 1];
- CC_SRC = eflags;
+ eflags = cpu_cc_compute_all(env) & ~(CC_Z | CC_P | CC_C);
+ CC_SRC = eflags | fcomi_ccval[ret + 1];
merge_exception_flags(env, old_flags);
}
@@ -497,9 +496,8 @@ void helper_fucomi_ST0_FT0(CPUX86State *env)
FloatRelation ret;
ret = floatx80_compare_quiet(ST0, FT0, &env->fp_status);
- eflags = cpu_cc_compute_all(env, CC_OP);
- eflags = (eflags & ~(CC_Z | CC_P | CC_C)) | fcomi_ccval[ret + 1];
- CC_SRC = eflags;
+ eflags = cpu_cc_compute_all(env) & ~(CC_Z | CC_P | CC_C);
+ CC_SRC = eflags | fcomi_ccval[ret + 1];
merge_exception_flags(env, old_flags);
}
diff --git a/target/i386/tcg/int_helper.c b/target/i386/tcg/int_helper.c
index 05418f181f1..ab85dc55400 100644
--- a/target/i386/tcg/int_helper.c
+++ b/target/i386/tcg/int_helper.c
@@ -190,7 +190,7 @@ void helper_aaa(CPUX86State *env)
int al, ah, af;
int eflags;
- eflags = cpu_cc_compute_all(env, CC_OP);
+ eflags = cpu_cc_compute_all(env);
af = eflags & CC_A;
al = env->regs[R_EAX] & 0xff;
ah = (env->regs[R_EAX] >> 8) & 0xff;
@@ -214,7 +214,7 @@ void helper_aas(CPUX86State *env)
int al, ah, af;
int eflags;
- eflags = cpu_cc_compute_all(env, CC_OP);
+ eflags = cpu_cc_compute_all(env);
af = eflags & CC_A;
al = env->regs[R_EAX] & 0xff;
ah = (env->regs[R_EAX] >> 8) & 0xff;
@@ -237,7 +237,7 @@ void helper_daa(CPUX86State *env)
int old_al, al, af, cf;
int eflags;
- eflags = cpu_cc_compute_all(env, CC_OP);
+ eflags = cpu_cc_compute_all(env);
cf = eflags & CC_C;
af = eflags & CC_A;
old_al = al = env->regs[R_EAX] & 0xff;
@@ -264,7 +264,7 @@ void helper_das(CPUX86State *env)
int al, al1, af, cf;
int eflags;
- eflags = cpu_cc_compute_all(env, CC_OP);
+ eflags = cpu_cc_compute_all(env);
cf = eflags & CC_C;
af = eflags & CC_A;
al = env->regs[R_EAX] & 0xff;
diff --git a/target/i386/tcg/misc_helper.c b/target/i386/tcg/misc_helper.c
index 66b332a83c1..b0f0f7b893b 100644
--- a/target/i386/tcg/misc_helper.c
+++ b/target/i386/tcg/misc_helper.c
@@ -41,7 +41,7 @@ void helper_into(CPUX86State *env, int next_eip_addend)
{
int eflags;
- eflags = cpu_cc_compute_all(env, CC_OP);
+ eflags = cpu_cc_compute_all(env);
if (eflags & CC_O) {
raise_interrupt(env, EXCP04_INTO, next_eip_addend);
}
diff --git a/target/i386/tcg/seg_helper.c b/target/i386/tcg/seg_helper.c
index eb29a1fd4e7..34ccabd8ce3 100644
--- a/target/i386/tcg/seg_helper.c
+++ b/target/i386/tcg/seg_helper.c
@@ -2230,7 +2230,7 @@ target_ulong helper_lsl(CPUX86State *env, target_ulong selector1)
int rpl, dpl, cpl, type;
selector = selector1 & 0xffff;
- eflags = cpu_cc_compute_all(env, CC_OP);
+ eflags = cpu_cc_compute_all(env);
if ((selector & 0xfffc) == 0) {
goto fail;
}
@@ -2277,7 +2277,7 @@ target_ulong helper_lar(CPUX86State *env, target_ulong selector1)
int rpl, dpl, cpl, type;
selector = selector1 & 0xffff;
- eflags = cpu_cc_compute_all(env, CC_OP);
+ eflags = cpu_cc_compute_all(env);
if ((selector & 0xfffc) == 0) {
goto fail;
}
@@ -2326,7 +2326,7 @@ void helper_verr(CPUX86State *env, target_ulong selector1)
int rpl, dpl, cpl;
selector = selector1 & 0xffff;
- eflags = cpu_cc_compute_all(env, CC_OP);
+ eflags = cpu_cc_compute_all(env);
if ((selector & 0xfffc) == 0) {
goto fail;
}
@@ -2364,7 +2364,7 @@ void helper_verw(CPUX86State *env, target_ulong selector1)
int rpl, dpl, cpl;
selector = selector1 & 0xffff;
- eflags = cpu_cc_compute_all(env, CC_OP);
+ eflags = cpu_cc_compute_all(env);
if ((selector & 0xfffc) == 0) {
goto fail;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 06/22] target/i386: document more deviations from the manual
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (4 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 05/22] target/i386: clean up cpu_cc_compute_all Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 21:34 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 07/22] target/i386: reimplement check for validity of LOCK prefix Paolo Bonzini
` (15 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 12 ++++++++++++
1 file changed, 12 insertions(+)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 2bdbb1bba0f..232c6a45c96 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -26,6 +26,13 @@
* size (X86_SIZE_*) codes used in the manual. There are a few differences
* though.
*
+ * Operand sizes
+ * -------------
+ *
+ * The manual lists d64 ("cannot encode 32-bit size in 64-bit mode") and f64
+ * ("cannot encode 16-bit or 32-bit size in 64-bit mode") as modifiers of the
+ * "v" or "z" sizes. The decoder simply makes them separate operand sizes.
+ *
* Vector operands
* ---------------
*
@@ -44,6 +51,11 @@
* if the difference is expressed via prefixes. Individual instructions
* are separated by prefix in the generator functions.
*
+ * There is a custom size "xh" used to address half of a SSE/AVX operand.
+ * This points to a 64-bit operand for SSE operations, 128-bit operand
+ * for 256-bit AVX operands, etc. It is used for conversion operations
+ * such as VCVTPH2PS or VCVTSS2SD.
+ *
* There are a couple cases in which instructions (e.g. MOVD) write the
* whole XMM or MM register but are established incorrectly in the manual
* as "d" or "q". These have to be fixed for the decoder to work correctly.
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 07/22] target/i386: reimplement check for validity of LOCK prefix
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (5 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 06/22] target/i386: document more deviations from the manual Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 21:55 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 08/22] target/i386: avoid trunc and ext for MULX and RORX Paolo Bonzini
` (14 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
The previous check erroneously allowed CMP to be modified with LOCK.
Instead, tag explicitly the instructions that do support LOCK.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 17 ++++++++++-------
target/i386/tcg/decode-new.h | 3 +++
target/i386/tcg/emit.c.inc | 5 -----
3 files changed, 13 insertions(+), 12 deletions(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 232c6a45c96..5eb2e9d0224 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -151,6 +151,7 @@
#define cpuid(feat) .cpuid = X86_FEAT_##feat,
#define xchg .special = X86_SPECIAL_Locked,
+#define lock .special = X86_SPECIAL_HasLock,
#define mmx .special = X86_SPECIAL_MMX,
#define zext0 .special = X86_SPECIAL_ZExtOp0,
#define zext2 .special = X86_SPECIAL_ZExtOp2,
@@ -1103,10 +1104,6 @@ static int decode_modrm(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod
{
int modrm = get_modrm(s, env);
if ((modrm >> 6) == 3) {
- if (s->prefix & PREFIX_LOCK) {
- decode->e.gen = gen_illegal;
- return 0xff;
- }
op->n = (modrm & 7);
if (type != X86_TYPE_Q && type != X86_TYPE_N) {
op->n |= REX_B(s);
@@ -1881,6 +1878,9 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
if (decode.op[0].has_ea) {
s->prefix |= PREFIX_LOCK;
}
+ decode.e.special = X86_SPECIAL_HasLock;
+ /* fallthrough */
+ case X86_SPECIAL_HasLock:
break;
case X86_SPECIAL_ZExtOp0:
@@ -1909,6 +1909,12 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
break;
}
+ if (s->prefix & PREFIX_LOCK) {
+ if (decode.e.special != X86_SPECIAL_HasLock || !decode.op[0].has_ea) {
+ goto illegal_op;
+ }
+ }
+
if (!validate_vex(s, &decode)) {
return;
}
@@ -1952,9 +1958,6 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
gen_load_ea(s, &decode.mem, decode.e.vex_class == 12);
}
if (s->prefix & PREFIX_LOCK) {
- if (decode.op[0].unit != X86_OP_INT || !decode.op[0].has_ea) {
- goto illegal_op;
- }
gen_load(s, &decode, 2, s->T1);
decode.e.gen(s, env, &decode);
} else {
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index e6c904a3192..611bfddd957 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -158,6 +158,9 @@ typedef enum X86InsnCheck {
typedef enum X86InsnSpecial {
X86_SPECIAL_None,
+ /* Accepts LOCK prefix; LOCKed operations do not load or writeback operand 0 */
+ X86_SPECIAL_HasLock,
+
/* Always locked if it has a memory operand (XCHG) */
X86_SPECIAL_Locked,
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index d444d83e534..98c4c9569ef 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -55,11 +55,6 @@ static void gen_NM_exception(DisasContext *s)
gen_exception(s, EXCP07_PREX);
}
-static void gen_illegal(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
-{
- gen_illegal_opcode(s);
-}
-
static void gen_load_ea(DisasContext *s, AddressParts *mem, bool is_vsib)
{
TCGv ea = gen_lea_modrm_1(s, *mem, is_vsib);
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 08/22] target/i386: avoid trunc and ext for MULX and RORX
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (6 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 07/22] target/i386: reimplement check for validity of LOCK prefix Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 21:50 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 09/22] target/i386: rename zext0/zext2 and make them closer to the manual Paolo Bonzini
` (13 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
Use _tl operations for 32-bit operands on 32-bit targets, and only go
through trunc and extu ops for 64-bit targets. While the trunc/ext
ops should be pretty much free after optimization, the optimizer also
does not like having the same temporary used in multiple EBBs.
Therefore it is nicer to not use tmpN* unless necessary.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/emit.c.inc | 37 +++++++++++++++++++++++++------------
1 file changed, 25 insertions(+), 12 deletions(-)
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 98c4c9569ef..f5e44117eab 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -1348,7 +1348,8 @@ static void gen_MULX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
/* low part of result in VEX.vvvv, high in MODRM */
switch (ot) {
- default:
+ case MO_32:
+#ifdef TARGET_X86_64
tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
tcg_gen_mulu2_i32(s->tmp2_i32, s->tmp3_i32,
@@ -1356,13 +1357,15 @@ static void gen_MULX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
tcg_gen_extu_i32_tl(cpu_regs[s->vex_v], s->tmp2_i32);
tcg_gen_extu_i32_tl(s->T0, s->tmp3_i32);
break;
-#ifdef TARGET_X86_64
- case MO_64:
- tcg_gen_mulu2_i64(cpu_regs[s->vex_v], s->T0, s->T0, s->T1);
- break;
-#endif
- }
+ case MO_64:
+#endif
+ tcg_gen_mulu2_tl(cpu_regs[s->vex_v], s->T0, s->T0, s->T1);
+ break;
+
+ default:
+ g_assert_not_reached();
+ }
}
static void gen_PALIGNR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
@@ -1765,14 +1768,24 @@ static void gen_PSLLDQ_i(DisasContext *s, CPUX86State *env, X86DecodedInsn *deco
static void gen_RORX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
MemOp ot = decode->op[0].ot;
- int b = decode->immediate;
+ int mask = ot == MO_64 ? 63 : 31;
+ int b = decode->immediate & mask;
- if (ot == MO_64) {
- tcg_gen_rotri_tl(s->T0, s->T0, b & 63);
- } else {
+ switch (ot) {
+ case MO_32:
+#ifdef TARGET_X86_64
tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
- tcg_gen_rotri_i32(s->tmp2_i32, s->tmp2_i32, b & 31);
+ tcg_gen_rotri_i32(s->tmp2_i32, s->tmp2_i32, b);
tcg_gen_extu_i32_tl(s->T0, s->tmp2_i32);
+ break;
+
+ case MO_64:
+#endif
+ tcg_gen_rotri_tl(s->T0, s->T0, b);
+ break;
+
+ default:
+ g_assert_not_reached();
}
}
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 09/22] target/i386: rename zext0/zext2 and make them closer to the manual
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (7 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 08/22] target/i386: avoid trunc and ext for MULX and RORX Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 22:04 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 10/22] target/i386: add X86_SPECIALs for MOVSX and MOVZX Paolo Bonzini
` (12 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
X86_SPECIAL_ZExtOp0 and X86_SPECIAL_ZExtOp2 are poorly named; they are a hack
that is needed by scalar insertion and extraction instructions, and not really
related to zero extension: for PEXTR the zero extension is done by the generation
functions, for PINSR the high bits are not used at all and in fact are *not*
filled with zeroes when loaded into s->T1.
Rename the values to match the effect described in the manual, and explain
better in the comments.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 16 ++++++++--------
target/i386/tcg/decode-new.h | 17 +++++++++++++----
2 files changed, 21 insertions(+), 12 deletions(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 5eb2e9d0224..00fdb243857 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -153,8 +153,8 @@
#define xchg .special = X86_SPECIAL_Locked,
#define lock .special = X86_SPECIAL_HasLock,
#define mmx .special = X86_SPECIAL_MMX,
-#define zext0 .special = X86_SPECIAL_ZExtOp0,
-#define zext2 .special = X86_SPECIAL_ZExtOp2,
+#define op0_Rd .special = X86_SPECIAL_Op0_Rd,
+#define op2_Ry .special = X86_SPECIAL_Op2_Ry,
#define avx_movx .special = X86_SPECIAL_AVXExtMov,
#define vex1 .vex_class = 1,
@@ -632,13 +632,13 @@ static const X86OpEntry opcodes_0F3A[256] = {
[0x05] = X86_OP_ENTRY3(VPERMILPD_i, V,x, W,x, I,b, vex6 chk(W0) cpuid(AVX) p_66),
[0x06] = X86_OP_ENTRY4(VPERM2x128, V,qq, H,qq, W,qq, vex6 chk(W0) cpuid(AVX) p_66),
- [0x14] = X86_OP_ENTRY3(PEXTRB, E,b, V,dq, I,b, vex5 cpuid(SSE41) zext0 p_66),
- [0x15] = X86_OP_ENTRY3(PEXTRW, E,w, V,dq, I,b, vex5 cpuid(SSE41) zext0 p_66),
+ [0x14] = X86_OP_ENTRY3(PEXTRB, E,b, V,dq, I,b, vex5 cpuid(SSE41) op0_Rd p_66),
+ [0x15] = X86_OP_ENTRY3(PEXTRW, E,w, V,dq, I,b, vex5 cpuid(SSE41) op0_Rd p_66),
[0x16] = X86_OP_ENTRY3(PEXTR, E,y, V,dq, I,b, vex5 cpuid(SSE41) p_66),
[0x17] = X86_OP_ENTRY3(VEXTRACTPS, E,d, V,dq, I,b, vex5 cpuid(SSE41) p_66),
[0x1d] = X86_OP_ENTRY3(VCVTPS2PH, W,xh, V,x, I,b, vex11 chk(W0) cpuid(F16C) p_66),
- [0x20] = X86_OP_ENTRY4(PINSRB, V,dq, H,dq, E,b, vex5 cpuid(SSE41) zext2 p_66),
+ [0x20] = X86_OP_ENTRY4(PINSRB, V,dq, H,dq, E,b, vex5 cpuid(SSE41) op2_Ry p_66),
[0x21] = X86_OP_GROUP0(VINSERTPS),
[0x22] = X86_OP_ENTRY4(PINSR, V,dq, H,dq, E,y, vex5 cpuid(SSE41) p_66),
@@ -1883,17 +1883,17 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
case X86_SPECIAL_HasLock:
break;
- case X86_SPECIAL_ZExtOp0:
+ case X86_SPECIAL_Op0_Rd:
assert(decode.op[0].unit == X86_OP_INT);
if (!decode.op[0].has_ea) {
decode.op[0].ot = MO_32;
}
break;
- case X86_SPECIAL_ZExtOp2:
+ case X86_SPECIAL_Op2_Ry:
assert(decode.op[2].unit == X86_OP_INT);
if (!decode.op[2].has_ea) {
- decode.op[2].ot = MO_32;
+ decode.op[2].ot = s->dflag == MO_16 ? MO_32 : s->dflag;
}
break;
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index 611bfddd957..b253f7457ae 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -165,11 +165,20 @@ typedef enum X86InsnSpecial {
X86_SPECIAL_Locked,
/*
- * Register operand 0/2 is zero extended to 32 bits. Rd/Mb or Rd/Mw
- * in the manual.
+ * Rd/Mb or Rd/Mw in the manual: register operand 0 is treated as 32 bits
+ * (and writeback zero-extends it to 64 bits if applicable). PREFIX_DATA
+ * does not trigger 16-bit writeback and, as a side effect, high-byte
+ * registers are never used.
*/
- X86_SPECIAL_ZExtOp0,
- X86_SPECIAL_ZExtOp2,
+ X86_SPECIAL_Op0_Rd,
+
+ /*
+ * Ry/Mb in the manual (PINSRB). However, the high bits are never used by
+ * the instruction in either the register or memory cases; the *real* effect
+ * of this modifier is that high-byte registers are never used, even without
+ * a REX prefix. Therefore, PINSRW does not need it despite having Ry/Mw.
+ */
+ X86_SPECIAL_Op2_Ry,
/*
* Register operand 2 is extended to full width, while a memory operand
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 10/22] target/i386: add X86_SPECIALs for MOVSX and MOVZX
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (8 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 09/22] target/i386: rename zext0/zext2 and make them closer to the manual Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 22:08 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 11/22] target/i386: do not decode string source/destination into decode->mem Paolo Bonzini
` (11 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
Usually the registers are just moved into s->T0 without much care for
their operand size. However, in some cases we can get more efficient
code if the operand fetching logic syncs with the emission function
on what is nicer.
All the current uses are mostly demonstrative and only reduce the code
in the emission functions, because the instructions do not support
memory operands. However the logic is generic and applies to several
more instructions such as MOVSXD (aka movslq), one-byte shift
instructions, multiplications, XLAT, and indirect calls/jumps.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 18 ++++++++++----
target/i386/tcg/decode-new.h | 4 +++
target/i386/tcg/emit.c.inc | 42 +++++++++++++++++---------------
3 files changed, 40 insertions(+), 24 deletions(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 00fdb243857..d7a86d96c0c 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -156,6 +156,8 @@
#define op0_Rd .special = X86_SPECIAL_Op0_Rd,
#define op2_Ry .special = X86_SPECIAL_Op2_Ry,
#define avx_movx .special = X86_SPECIAL_AVXExtMov,
+#define sextT0 .special = X86_SPECIAL_SExtT0,
+#define zextT0 .special = X86_SPECIAL_ZExtT0,
#define vex1 .vex_class = 1,
#define vex1_rep3 .vex_class = 1, .vex_special = X86_VEX_REPScalar,
@@ -571,8 +573,8 @@ static const X86OpEntry opcodes_0F38_F0toFF[16][5] = {
[5] = {
X86_OP_ENTRY3(BZHI, G,y, E,y, B,y, vex13 cpuid(BMI1)),
{},
- X86_OP_ENTRY3(PEXT, G,y, B,y, E,y, vex13 cpuid(BMI2)),
- X86_OP_ENTRY3(PDEP, G,y, B,y, E,y, vex13 cpuid(BMI2)),
+ X86_OP_ENTRY3(PEXT, G,y, B,y, E,y, vex13 zextT0 cpuid(BMI2)),
+ X86_OP_ENTRY3(PDEP, G,y, B,y, E,y, vex13 zextT0 cpuid(BMI2)),
{},
},
[6] = {
@@ -583,10 +585,10 @@ static const X86OpEntry opcodes_0F38_F0toFF[16][5] = {
{},
},
[7] = {
- X86_OP_ENTRY3(BEXTR, G,y, E,y, B,y, vex13 cpuid(BMI1)),
+ X86_OP_ENTRY3(BEXTR, G,y, E,y, B,y, vex13 zextT0 cpuid(BMI1)),
X86_OP_ENTRY3(SHLX, G,y, E,y, B,y, vex13 cpuid(BMI1)),
- X86_OP_ENTRY3(SARX, G,y, E,y, B,y, vex13 cpuid(BMI1)),
- X86_OP_ENTRY3(SHRX, G,y, E,y, B,y, vex13 cpuid(BMI1)),
+ X86_OP_ENTRY3(SARX, G,y, E,y, B,y, vex13 sextT0 cpuid(BMI1)),
+ X86_OP_ENTRY3(SHRX, G,y, E,y, B,y, vex13 zextT0 cpuid(BMI1)),
{},
},
};
@@ -1905,6 +1907,12 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
}
break;
+ case X86_SPECIAL_SExtT0:
+ case X86_SPECIAL_ZExtT0:
+ /* Handled in gen_load. */
+ assert(decode.op[1].unit == X86_OP_INT);
+ break;
+
default:
break;
}
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index b253f7457ae..70b6717227f 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -191,6 +191,10 @@ typedef enum X86InsnSpecial {
* become P/P/Q/N, and size "x" becomes "q".
*/
X86_SPECIAL_MMX,
+
+ /* When loaded into s->T0, register operand 1 is zero/sign extended. */
+ X86_SPECIAL_SExtT0,
+ X86_SPECIAL_ZExtT0,
} X86InsnSpecial;
/*
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index f5e44117eab..4c2006fdd09 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -232,9 +232,30 @@ static void gen_load(DisasContext *s, X86DecodedInsn *decode, int opn, TCGv v)
break;
case X86_OP_INT:
if (op->has_ea) {
- gen_op_ld_v(s, op->ot, v, s->A0);
+ if (v == s->T0 && decode->e.special == X86_SPECIAL_SExtT0) {
+ gen_op_ld_v(s, op->ot | MO_SIGN, v, s->A0);
+ } else {
+ gen_op_ld_v(s, op->ot, v, s->A0);
+ }
+
+ } else if (op->ot == MO_8 && byte_reg_is_xH(s, op->n)) {
+ if (v == s->T0 && decode->e.special == X86_SPECIAL_SExtT0) {
+ tcg_gen_sextract_tl(v, cpu_regs[op->n - 4], 8, 8);
+ } else {
+ tcg_gen_extract_tl(v, cpu_regs[op->n - 4], 8, 8);
+ }
+
+ } else if (op->ot < MO_TL && v == s->T0 &&
+ (decode->e.special == X86_SPECIAL_SExtT0 ||
+ decode->e.special == X86_SPECIAL_ZExtT0)) {
+ if (decode->e.special == X86_SPECIAL_SExtT0) {
+ tcg_gen_ext_tl(v, cpu_regs[op->n], op->ot | MO_SIGN);
+ } else {
+ tcg_gen_ext_tl(v, cpu_regs[op->n], op->ot);
+ }
+
} else {
- gen_op_mov_v_reg(s, op->ot, v, op->n);
+ tcg_gen_mov_tl(v, cpu_regs[op->n]);
}
break;
case X86_OP_IMM:
@@ -1084,9 +1105,6 @@ static void gen_BEXTR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
* Shifts larger than operand size get zeros.
*/
tcg_gen_ext8u_tl(s->A0, s->T1);
- if (TARGET_LONG_BITS == 64 && ot == MO_32) {
- tcg_gen_ext32u_tl(s->T0, s->T0);
- }
tcg_gen_shr_tl(s->T0, s->T0, s->A0);
tcg_gen_movcond_tl(TCG_COND_LEU, s->T0, s->A0, bound, s->T0, zero);
@@ -1428,19 +1446,11 @@ static void gen_PCMPISTRM(DisasContext *s, CPUX86State *env, X86DecodedInsn *dec
static void gen_PDEP(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
- MemOp ot = decode->op[1].ot;
- if (ot < MO_64) {
- tcg_gen_ext32u_tl(s->T0, s->T0);
- }
gen_helper_pdep(s->T0, s->T0, s->T1);
}
static void gen_PEXT(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
- MemOp ot = decode->op[1].ot;
- if (ot < MO_64) {
- tcg_gen_ext32u_tl(s->T0, s->T0);
- }
gen_helper_pext(s->T0, s->T0, s->T1);
}
@@ -1796,9 +1806,6 @@ static void gen_SARX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
mask = ot == MO_64 ? 63 : 31;
tcg_gen_andi_tl(s->T1, s->T1, mask);
- if (ot != MO_64) {
- tcg_gen_ext32s_tl(s->T0, s->T0);
- }
tcg_gen_sar_tl(s->T0, s->T0, s->T1);
}
@@ -1873,9 +1880,6 @@ static void gen_SHRX(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
mask = ot == MO_64 ? 63 : 31;
tcg_gen_andi_tl(s->T1, s->T1, mask);
- if (ot != MO_64) {
- tcg_gen_ext32u_tl(s->T0, s->T0);
- }
tcg_gen_shr_tl(s->T0, s->T0, s->T1);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 11/22] target/i386: do not decode string source/destination into decode->mem
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (9 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 10/22] target/i386: add X86_SPECIALs for MOVSX and MOVZX Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 22:09 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 12/22] target/i386: do not clobber A0 in POP translation Paolo Bonzini
` (10 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
decode->mem is only used if one operand has has_ea == true. String
operations will not use decode->mem and will load A0 on their own, because
they are the only case of two memory operands in a single instruction.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 20 ++------------------
1 file changed, 2 insertions(+), 18 deletions(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index d7a86d96c0c..99d18d2871e 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -1212,6 +1212,8 @@ static bool decode_op(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
case X86_TYPE_None: /* Implicit or absent */
case X86_TYPE_A: /* Implicit */
case X86_TYPE_F: /* EFLAGS/RFLAGS */
+ case X86_TYPE_X: /* string source */
+ case X86_TYPE_Y: /* string destination */
break;
case X86_TYPE_B: /* VEX.vvvv selects a GPR */
@@ -1346,24 +1348,6 @@ static bool decode_op(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
op->n = insn_get(env, s, op->ot) >> 4;
break;
- case X86_TYPE_X: /* string source */
- op->n = -1;
- decode->mem = (AddressParts) {
- .def_seg = R_DS,
- .base = R_ESI,
- .index = -1,
- };
- break;
-
- case X86_TYPE_Y: /* string destination */
- op->n = -1;
- decode->mem = (AddressParts) {
- .def_seg = R_ES,
- .base = R_EDI,
- .index = -1,
- };
- break;
-
case X86_TYPE_2op:
*op = decode->op[0];
break;
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 12/22] target/i386: do not clobber A0 in POP translation
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (10 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 11/22] target/i386: do not decode string source/destination into decode->mem Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-22 18:15 ` [PATCH 13/22] target/i386: do not clobber T0 on string operations Paolo Bonzini
` (9 subsequent siblings)
21 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel; +Cc: Richard Henderson
The new decoder likes to compute the address in A0 very early, so the
gen_lea_v_seg in gen_pop_T0 would clobber the address of the memory
operand. Instead use T0 since it is already available and will be
overwritten immediately after.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 34 ++++++++++++++++++++--------------
1 file changed, 20 insertions(+), 14 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 73b83e07e23..efef4e74d4c 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -635,17 +635,17 @@ static TCGv eip_cur_tl(DisasContext *s)
}
}
-/* Compute SEG:REG into A0. SEG is selected from the override segment
+/* Compute SEG:REG into DEST. SEG is selected from the override segment
(OVR_SEG) and the default segment (DEF_SEG). OVR_SEG may be -1 to
indicate no override. */
-static void gen_lea_v_seg(DisasContext *s, MemOp aflag, TCGv a0,
- int def_seg, int ovr_seg)
+static void gen_lea_v_seg_dest(DisasContext *s, MemOp aflag, TCGv dest, TCGv a0,
+ int def_seg, int ovr_seg)
{
switch (aflag) {
#ifdef TARGET_X86_64
case MO_64:
if (ovr_seg < 0) {
- tcg_gen_mov_tl(s->A0, a0);
+ tcg_gen_mov_tl(dest, a0);
return;
}
break;
@@ -656,14 +656,14 @@ static void gen_lea_v_seg(DisasContext *s, MemOp aflag, TCGv a0,
ovr_seg = def_seg;
}
if (ovr_seg < 0) {
- tcg_gen_ext32u_tl(s->A0, a0);
+ tcg_gen_ext32u_tl(dest, a0);
return;
}
break;
case MO_16:
/* 16 bit address */
- tcg_gen_ext16u_tl(s->A0, a0);
- a0 = s->A0;
+ tcg_gen_ext16u_tl(dest, a0);
+ a0 = dest;
if (ovr_seg < 0) {
if (ADDSEG(s)) {
ovr_seg = def_seg;
@@ -680,17 +680,23 @@ static void gen_lea_v_seg(DisasContext *s, MemOp aflag, TCGv a0,
TCGv seg = cpu_seg_base[ovr_seg];
if (aflag == MO_64) {
- tcg_gen_add_tl(s->A0, a0, seg);
+ tcg_gen_add_tl(dest, a0, seg);
} else if (CODE64(s)) {
- tcg_gen_ext32u_tl(s->A0, a0);
- tcg_gen_add_tl(s->A0, s->A0, seg);
+ tcg_gen_ext32u_tl(dest, a0);
+ tcg_gen_add_tl(dest, dest, seg);
} else {
- tcg_gen_add_tl(s->A0, a0, seg);
- tcg_gen_ext32u_tl(s->A0, s->A0);
+ tcg_gen_add_tl(dest, a0, seg);
+ tcg_gen_ext32u_tl(dest, dest);
}
}
}
+static void gen_lea_v_seg(DisasContext *s, MemOp aflag, TCGv a0,
+ int def_seg, int ovr_seg)
+{
+ gen_lea_v_seg_dest(s, aflag, s->A0, a0, def_seg, ovr_seg);
+}
+
static inline void gen_string_movl_A0_ESI(DisasContext *s)
{
gen_lea_v_seg(s, s->aflag, cpu_regs[R_ESI], R_DS, s->override);
@@ -2576,8 +2582,8 @@ static MemOp gen_pop_T0(DisasContext *s)
{
MemOp d_ot = mo_pushpop(s, s->dflag);
- gen_lea_v_seg(s, mo_stacksize(s), cpu_regs[R_ESP], R_SS, -1);
- gen_op_ld_v(s, d_ot, s->T0, s->A0);
+ gen_lea_v_seg_dest(s, mo_stacksize(s), s->T0, cpu_regs[R_ESP], R_SS, -1);
+ gen_op_ld_v(s, d_ot, s->T0, s->T0);
return d_ot;
}
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 13/22] target/i386: do not clobber T0 on string operations
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (11 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 12/22] target/i386: do not clobber A0 in POP translation Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 22:11 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 14/22] target/i386: split eflags computation out of gen_compute_eflags Paolo Bonzini
` (8 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
The new decoder would rather have the operand in T0 when expanding SCAS, rather
than use R_EAX directly as gen_scas currently does. This makes SCAS more similar
to CMP and SUB, in that CC_DST = T0 - T1.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 45 ++++++++++++++++++++-----------------
1 file changed, 24 insertions(+), 21 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index efef4e74d4c..00ed0cc9a31 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -522,9 +522,9 @@ void gen_op_add_reg_im(DisasContext *s, MemOp size, int reg, int32_t val)
gen_op_mov_reg_v(s, size, reg, s->tmp0);
}
-static inline void gen_op_add_reg_T0(DisasContext *s, MemOp size, int reg)
+static inline void gen_op_add_reg(DisasContext *s, MemOp size, int reg, TCGv val)
{
- tcg_gen_add_tl(s->tmp0, cpu_regs[reg], s->T0);
+ tcg_gen_add_tl(s->tmp0, cpu_regs[reg], val);
gen_op_mov_reg_v(s, size, reg, s->tmp0);
}
@@ -707,10 +707,12 @@ static inline void gen_string_movl_A0_EDI(DisasContext *s)
gen_lea_v_seg(s, s->aflag, cpu_regs[R_EDI], R_ES, -1);
}
-static inline void gen_op_movl_T0_Dshift(DisasContext *s, MemOp ot)
+static inline TCGv gen_compute_Dshift(DisasContext *s, MemOp ot)
{
- tcg_gen_ld32s_tl(s->T0, tcg_env, offsetof(CPUX86State, df));
- tcg_gen_shli_tl(s->T0, s->T0, ot);
+ TCGv dshift = tcg_temp_new();
+ tcg_gen_ld32s_tl(dshift, tcg_env, offsetof(CPUX86State, df));
+ tcg_gen_shli_tl(dshift, dshift, ot);
+ return dshift;
};
static TCGv gen_ext_tl(TCGv dst, TCGv src, MemOp size, bool sign)
@@ -818,13 +820,16 @@ static bool gen_check_io(DisasContext *s, MemOp ot, TCGv_i32 port,
static void gen_movs(DisasContext *s, MemOp ot)
{
+ TCGv dshift;
+
gen_string_movl_A0_ESI(s);
gen_op_ld_v(s, ot, s->T0, s->A0);
gen_string_movl_A0_EDI(s);
gen_op_st_v(s, ot, s->T0, s->A0);
- gen_op_movl_T0_Dshift(s, ot);
- gen_op_add_reg_T0(s, s->aflag, R_ESI);
- gen_op_add_reg_T0(s, s->aflag, R_EDI);
+
+ dshift = gen_compute_Dshift(s, ot);
+ gen_op_add_reg(s, s->aflag, R_ESI, dshift);
+ gen_op_add_reg(s, s->aflag, R_EDI, dshift);
}
static void gen_op_update1_cc(DisasContext *s)
@@ -1249,8 +1254,7 @@ static void gen_stos(DisasContext *s, MemOp ot)
gen_op_mov_v_reg(s, MO_32, s->T0, R_EAX);
gen_string_movl_A0_EDI(s);
gen_op_st_v(s, ot, s->T0, s->A0);
- gen_op_movl_T0_Dshift(s, ot);
- gen_op_add_reg_T0(s, s->aflag, R_EDI);
+ gen_op_add_reg(s, s->aflag, R_EDI, gen_compute_Dshift(s, ot));
}
static void gen_lods(DisasContext *s, MemOp ot)
@@ -1258,8 +1262,7 @@ static void gen_lods(DisasContext *s, MemOp ot)
gen_string_movl_A0_ESI(s);
gen_op_ld_v(s, ot, s->T0, s->A0);
gen_op_mov_reg_v(s, ot, R_EAX, s->T0);
- gen_op_movl_T0_Dshift(s, ot);
- gen_op_add_reg_T0(s, s->aflag, R_ESI);
+ gen_op_add_reg(s, s->aflag, R_ESI, gen_compute_Dshift(s, ot));
}
static void gen_scas(DisasContext *s, MemOp ot)
@@ -1267,19 +1270,21 @@ static void gen_scas(DisasContext *s, MemOp ot)
gen_string_movl_A0_EDI(s);
gen_op_ld_v(s, ot, s->T1, s->A0);
gen_op(s, OP_CMPL, ot, R_EAX);
- gen_op_movl_T0_Dshift(s, ot);
- gen_op_add_reg_T0(s, s->aflag, R_EDI);
+ gen_op_add_reg(s, s->aflag, R_EDI, gen_compute_Dshift(s, ot));
}
static void gen_cmps(DisasContext *s, MemOp ot)
{
+ TCGv dshift;
+
gen_string_movl_A0_EDI(s);
gen_op_ld_v(s, ot, s->T1, s->A0);
gen_string_movl_A0_ESI(s);
gen_op(s, OP_CMPL, ot, OR_TMP0);
- gen_op_movl_T0_Dshift(s, ot);
- gen_op_add_reg_T0(s, s->aflag, R_ESI);
- gen_op_add_reg_T0(s, s->aflag, R_EDI);
+
+ dshift = gen_compute_Dshift(s, ot);
+ gen_op_add_reg(s, s->aflag, R_ESI, dshift);
+ gen_op_add_reg(s, s->aflag, R_EDI, dshift);
}
static void gen_bpt_io(DisasContext *s, TCGv_i32 t_port, int ot)
@@ -1307,8 +1312,7 @@ static void gen_ins(DisasContext *s, MemOp ot)
tcg_gen_andi_i32(s->tmp2_i32, s->tmp2_i32, 0xffff);
gen_helper_in_func(ot, s->T0, s->tmp2_i32);
gen_op_st_v(s, ot, s->T0, s->A0);
- gen_op_movl_T0_Dshift(s, ot);
- gen_op_add_reg_T0(s, s->aflag, R_EDI);
+ gen_op_add_reg(s, s->aflag, R_EDI, gen_compute_Dshift(s, ot));
gen_bpt_io(s, s->tmp2_i32, ot);
}
@@ -1321,8 +1325,7 @@ static void gen_outs(DisasContext *s, MemOp ot)
tcg_gen_andi_i32(s->tmp2_i32, s->tmp2_i32, 0xffff);
tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T0);
gen_helper_out_func(ot, s->tmp2_i32, s->tmp3_i32);
- gen_op_movl_T0_Dshift(s, ot);
- gen_op_add_reg_T0(s, s->aflag, R_ESI);
+ gen_op_add_reg(s, s->aflag, R_ESI, gen_compute_Dshift(s, ot));
gen_bpt_io(s, s->tmp2_i32, ot);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 14/22] target/i386: split eflags computation out of gen_compute_eflags
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (12 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 13/22] target/i386: do not clobber T0 on string operations Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 22:13 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 15/22] target/i386: do not use s->tmp4 for push Paolo Bonzini
` (7 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
The new x86 decoder wants the gen_* functions to compute EFLAGS before
writeback, which can be an issue for instructions with a memory
destination such as ARPL or shifts.
Extract code to compute the EFLAGS without clobbering CC_SRC, in case
the memory write causes a fault. The flags writeback mechanism will
take care of copying the result to CC_SRC.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 28 +++++++++++++++++++---------
1 file changed, 19 insertions(+), 9 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 00ed0cc9a31..b79c312465b 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -862,22 +862,22 @@ static void gen_op_update_neg_cc(DisasContext *s)
tcg_gen_movi_tl(s->cc_srcT, 0);
}
-/* compute all eflags to cc_src */
-static void gen_compute_eflags(DisasContext *s)
+/* compute all eflags to reg */
+static void gen_mov_eflags(DisasContext *s, TCGv reg)
{
- TCGv zero, dst, src1, src2;
+ TCGv dst, src1, src2;
+ TCGv_i32 cc_op;
int live, dead;
if (s->cc_op == CC_OP_EFLAGS) {
+ tcg_gen_mov_tl(reg, cpu_cc_src);
return;
}
if (s->cc_op == CC_OP_CLR) {
- tcg_gen_movi_tl(cpu_cc_src, CC_Z | CC_P);
- set_cc_op(s, CC_OP_EFLAGS);
+ tcg_gen_movi_tl(reg, CC_Z | CC_P);
return;
}
- zero = NULL;
dst = cpu_cc_dst;
src1 = cpu_cc_src;
src2 = cpu_cc_src2;
@@ -886,7 +886,7 @@ static void gen_compute_eflags(DisasContext *s)
live = cc_op_live[s->cc_op] & ~USES_CC_SRCT;
dead = live ^ (USES_CC_DST | USES_CC_SRC | USES_CC_SRC2);
if (dead) {
- zero = tcg_constant_tl(0);
+ TCGv zero = tcg_constant_tl(0);
if (dead & USES_CC_DST) {
dst = zero;
}
@@ -898,8 +898,18 @@ static void gen_compute_eflags(DisasContext *s)
}
}
- gen_update_cc_op(s);
- gen_helper_cc_compute_all(cpu_cc_src, dst, src1, src2, cpu_cc_op);
+ if (s->cc_op != CC_OP_DYNAMIC) {
+ cc_op = tcg_constant_i32(s->cc_op);
+ } else {
+ cc_op = cpu_cc_op;
+ }
+ gen_helper_cc_compute_all(reg, dst, src1, src2, cc_op);
+}
+
+/* compute all eflags to cc_src */
+static void gen_compute_eflags(DisasContext *s)
+{
+ gen_mov_eflags(s, cpu_cc_src);
set_cc_op(s, CC_OP_EFLAGS);
}
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 15/22] target/i386: do not use s->tmp4 for push
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (13 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 14/22] target/i386: split eflags computation out of gen_compute_eflags Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 22:14 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 16/22] target/i386: do not use s->tmp0 for jumps on ECX ==/!= 0 Paolo Bonzini
` (6 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
Just create a temporary for the occasion.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index b79c312465b..afe0fa6c65f 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2580,7 +2580,7 @@ static void gen_push_v(DisasContext *s, TCGv val)
if (!CODE64(s)) {
if (ADDSEG(s)) {
- new_esp = s->tmp4;
+ new_esp = tcg_temp_new();
tcg_gen_mov_tl(new_esp, s->A0);
}
gen_lea_v_seg(s, a_ot, s->A0, R_SS, -1);
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 16/22] target/i386: do not use s->tmp0 for jumps on ECX ==/!= 0
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (14 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 15/22] target/i386: do not use s->tmp4 for push Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 22:15 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 17/22] target/i386: extract gen_far_call/jmp, reordering temporaries Paolo Bonzini
` (5 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
Create a new temporary, to ease the register allocator's work.
Creation of the temporary is pushed into gen_ext_tl, which
also allows NULL as the first parameter now.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 9 ++++++---
1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index afe0fa6c65f..e5f71170967 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -720,6 +720,9 @@ static TCGv gen_ext_tl(TCGv dst, TCGv src, MemOp size, bool sign)
if (size == MO_TL) {
return src;
}
+ if (!dst) {
+ dst = tcg_temp_new();
+ }
tcg_gen_ext_tl(dst, src, size | (sign ? MO_SIGN : 0));
return dst;
}
@@ -736,9 +739,9 @@ static void gen_exts(MemOp ot, TCGv reg)
static void gen_op_j_ecx(DisasContext *s, TCGCond cond, TCGLabel *label1)
{
- tcg_gen_mov_tl(s->tmp0, cpu_regs[R_ECX]);
- gen_extu(s->aflag, s->tmp0);
- tcg_gen_brcondi_tl(cond, s->tmp0, 0, label1);
+ TCGv tmp = gen_ext_tl(NULL, cpu_regs[R_ECX], s->aflag, false);
+
+ tcg_gen_brcondi_tl(cond, tmp, 0, label1);
}
static inline void gen_op_jz_ecx(DisasContext *s, TCGLabel *label1)
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 17/22] target/i386: extract gen_far_call/jmp, reordering temporaries
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (15 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 16/22] target/i386: do not use s->tmp0 for jumps on ECX ==/!= 0 Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 22:25 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 18/22] target/i386: prepare for implementation of STOS/SCAS in new decoder Paolo Bonzini
` (4 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
Extract the code into new functions, and swap T0/T1 so that T0 corresponds
to the first immediate in the instruction stream.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 90 ++++++++++++++++++++-----------------
1 file changed, 50 insertions(+), 40 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index e5f71170967..edbad0ad746 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2525,12 +2525,12 @@ static inline void gen_op_movl_T0_seg(DisasContext *s, X86Seg seg_reg)
offsetof(CPUX86State,segs[seg_reg].selector));
}
-static inline void gen_op_movl_seg_T0_vm(DisasContext *s, X86Seg seg_reg)
+static inline void gen_op_movl_seg_real(DisasContext *s, X86Seg seg_reg, TCGv seg)
{
- tcg_gen_ext16u_tl(s->T0, s->T0);
- tcg_gen_st32_tl(s->T0, tcg_env,
+ tcg_gen_ext16u_tl(seg, seg);
+ tcg_gen_st32_tl(seg, tcg_env,
offsetof(CPUX86State,segs[seg_reg].selector));
- tcg_gen_shli_tl(cpu_seg_base[seg_reg], s->T0, 4);
+ tcg_gen_shli_tl(cpu_seg_base[seg_reg], seg, 4);
}
/* move T0 to seg_reg and compute if the CPU state may change. Never
@@ -2550,13 +2550,43 @@ static void gen_movl_seg_T0(DisasContext *s, X86Seg seg_reg)
s->base.is_jmp = DISAS_EOB_NEXT;
}
} else {
- gen_op_movl_seg_T0_vm(s, seg_reg);
+ gen_op_movl_seg_real(s, seg_reg, s->T0);
if (seg_reg == R_SS) {
s->base.is_jmp = DISAS_EOB_INHIBIT_IRQ;
}
}
}
+static void gen_far_call(DisasContext *s)
+{
+ if (PE(s) && !VM86(s)) {
+ tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T1);
+ gen_helper_lcall_protected(tcg_env, s->tmp2_i32, s->T0,
+ tcg_constant_i32(s->dflag - 1),
+ eip_next_tl(s));
+ } else {
+ tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
+ tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
+ gen_helper_lcall_real(tcg_env, s->tmp3_i32, s->tmp2_i32,
+ tcg_constant_i32(s->dflag - 1),
+ eip_next_i32(s));
+ }
+ s->base.is_jmp = DISAS_JUMP;
+}
+
+static void gen_far_jmp(DisasContext *s)
+{
+ if (PE(s) && !VM86(s)) {
+ tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T1);
+ gen_helper_ljmp_protected(tcg_env, s->tmp2_i32, s->T0,
+ eip_next_tl(s));
+ } else {
+ gen_op_movl_seg_real(s, R_CS, s->T1);
+ gen_op_jmp_v(s, s->T0);
+ }
+ s->base.is_jmp = DISAS_JUMP;
+}
+
static void gen_svm_check_intercept(DisasContext *s, uint32_t type)
{
/* no SVM activated; fast case */
@@ -3637,23 +3667,10 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
if (mod == 3) {
goto illegal_op;
}
- gen_op_ld_v(s, ot, s->T1, s->A0);
+ gen_op_ld_v(s, ot, s->T0, s->A0);
gen_add_A0_im(s, 1 << ot);
- gen_op_ld_v(s, MO_16, s->T0, s->A0);
- do_lcall:
- if (PE(s) && !VM86(s)) {
- tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
- gen_helper_lcall_protected(tcg_env, s->tmp2_i32, s->T1,
- tcg_constant_i32(dflag - 1),
- eip_next_tl(s));
- } else {
- tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
- tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
- gen_helper_lcall_real(tcg_env, s->tmp2_i32, s->tmp3_i32,
- tcg_constant_i32(dflag - 1),
- eip_next_i32(s));
- }
- s->base.is_jmp = DISAS_JUMP;
+ gen_op_ld_v(s, MO_16, s->T1, s->A0);
+ gen_far_call(s);
break;
case 4: /* jmp Ev */
if (dflag == MO_16) {
@@ -3667,19 +3684,10 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
if (mod == 3) {
goto illegal_op;
}
- gen_op_ld_v(s, ot, s->T1, s->A0);
+ gen_op_ld_v(s, ot, s->T0, s->A0);
gen_add_A0_im(s, 1 << ot);
- gen_op_ld_v(s, MO_16, s->T0, s->A0);
- do_ljmp:
- if (PE(s) && !VM86(s)) {
- tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
- gen_helper_ljmp_protected(tcg_env, s->tmp2_i32, s->T1,
- eip_next_tl(s));
- } else {
- gen_op_movl_seg_T0_vm(s, R_CS);
- gen_op_jmp_v(s, s->T1);
- }
- s->base.is_jmp = DISAS_JUMP;
+ gen_op_ld_v(s, MO_16, s->T1, s->A0);
+ gen_far_jmp(s);
break;
case 6: /* push Ev */
gen_push_v(s, s->T0);
@@ -5117,7 +5125,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
/* pop selector */
gen_add_A0_im(s, 1 << dflag);
gen_op_ld_v(s, dflag, s->T0, s->A0);
- gen_op_movl_seg_T0_vm(s, R_CS);
+ gen_op_movl_seg_real(s, R_CS, s->T0);
/* add stack offset */
gen_stack_update(s, val + (2 << dflag));
}
@@ -5161,10 +5169,11 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
offset = insn_get(env, s, ot);
selector = insn_get(env, s, MO_16);
- tcg_gen_movi_tl(s->T0, selector);
- tcg_gen_movi_tl(s->T1, offset);
+ tcg_gen_movi_tl(s->T0, offset);
+ tcg_gen_movi_tl(s->T1, selector);
}
- goto do_lcall;
+ gen_far_call(s);
+ break;
case 0xe9: /* jmp im */
{
int diff = (dflag != MO_16
@@ -5184,10 +5193,11 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
offset = insn_get(env, s, ot);
selector = insn_get(env, s, MO_16);
- tcg_gen_movi_tl(s->T0, selector);
- tcg_gen_movi_tl(s->T1, offset);
+ tcg_gen_movi_tl(s->T0, offset);
+ tcg_gen_movi_tl(s->T1, selector);
}
- goto do_ljmp;
+ gen_far_jmp(s);
+ break;
case 0xeb: /* jmp Jb */
{
int diff = (int8_t)insn_get(env, s, MO_8);
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 18/22] target/i386: prepare for implementation of STOS/SCAS in new decoder
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (16 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 17/22] target/i386: extract gen_far_call/jmp, reordering temporaries Paolo Bonzini
@ 2023-12-22 18:15 ` Paolo Bonzini
2023-12-28 22:27 ` Richard Henderson
2023-12-22 18:16 ` [PATCH 19/22] target/i386: move operand load and writeback out of gen_cmovcc1 Paolo Bonzini
` (3 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:15 UTC (permalink / raw)
To: qemu-devel
Do not use gen_op, and pull the load from the accumulator into
disas_insn.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index edbad0ad746..c7d48088418 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -1264,7 +1264,6 @@ static TCGLabel *gen_jz_ecx_string(DisasContext *s)
static void gen_stos(DisasContext *s, MemOp ot)
{
- gen_op_mov_v_reg(s, MO_32, s->T0, R_EAX);
gen_string_movl_A0_EDI(s);
gen_op_st_v(s, ot, s->T0, s->A0);
gen_op_add_reg(s, s->aflag, R_EDI, gen_compute_Dshift(s, ot));
@@ -1282,7 +1281,11 @@ static void gen_scas(DisasContext *s, MemOp ot)
{
gen_string_movl_A0_EDI(s);
gen_op_ld_v(s, ot, s->T1, s->A0);
- gen_op(s, OP_CMPL, ot, R_EAX);
+ tcg_gen_mov_tl(cpu_cc_src, s->T1);
+ tcg_gen_mov_tl(s->cc_srcT, s->T0);
+ tcg_gen_sub_tl(cpu_cc_dst, s->T0, s->T1);
+ set_cc_op(s, CC_OP_SUBB + ot);
+
gen_op_add_reg(s, s->aflag, R_EDI, gen_compute_Dshift(s, ot));
}
@@ -4960,6 +4963,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
case 0xaa: /* stosS */
case 0xab:
ot = mo_b_d(b, dflag);
+ gen_op_mov_v_reg(s, MO_32, s->T0, R_EAX);
if (prefixes & (PREFIX_REPZ | PREFIX_REPNZ)) {
gen_repz_stos(s, ot);
} else {
@@ -4978,6 +4982,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
case 0xae: /* scasS */
case 0xaf:
ot = mo_b_d(b, dflag);
+ gen_op_mov_v_reg(s, MO_32, s->T0, R_EAX);
if (prefixes & PREFIX_REPNZ) {
gen_repz_scas(s, ot, 1);
} else if (prefixes & PREFIX_REPZ) {
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 19/22] target/i386: move operand load and writeback out of gen_cmovcc1
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (17 preceding siblings ...)
2023-12-22 18:15 ` [PATCH 18/22] target/i386: prepare for implementation of STOS/SCAS in new decoder Paolo Bonzini
@ 2023-12-22 18:16 ` Paolo Bonzini
2023-12-28 22:29 ` Richard Henderson
2023-12-22 18:16 ` [PATCH 20/22] target/i386: adjust decoding of J operand Paolo Bonzini
` (2 subsequent siblings)
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:16 UTC (permalink / raw)
To: qemu-devel
Similar to gen_setcc1, make gen_cmovcc1 receive TCGv. This is more friendly
to simultaneous implementation in the old and the new decoder.
A small wart is that s->T0 of CMOV is currently the *second* argument (which
would ordinarily be in T1). Therefore, the condition has to be inverted in
order to overwrite s->T0 with cpu_regs[reg] if the MOV is not performed.
This only applies to the old decoder, and this code will go away soon.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/translate.c | 16 ++++++----------
1 file changed, 6 insertions(+), 10 deletions(-)
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index c7d48088418..53b98d5e6ac 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -2500,14 +2500,10 @@ static void gen_jcc(DisasContext *s, int b, int diff)
gen_jmp_rel(s, s->dflag, diff, 0);
}
-static void gen_cmovcc1(CPUX86State *env, DisasContext *s, MemOp ot, int b,
- int modrm, int reg)
+static void gen_cmovcc1(DisasContext *s, int b, TCGv dest, TCGv src)
{
- CCPrepare cc;
+ CCPrepare cc = gen_prepare_cc(s, b, s->T1);
- gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
-
- cc = gen_prepare_cc(s, b, s->T1);
if (cc.mask != -1) {
TCGv t0 = tcg_temp_new();
tcg_gen_andi_tl(t0, cc.reg, cc.mask);
@@ -2517,9 +2513,7 @@ static void gen_cmovcc1(CPUX86State *env, DisasContext *s, MemOp ot, int b,
cc.reg2 = tcg_constant_tl(cc.imm);
}
- tcg_gen_movcond_tl(cc.cond, s->T0, cc.reg, cc.reg2,
- s->T0, cpu_regs[reg]);
- gen_op_mov_reg_v(s, ot, reg, s->T0);
+ tcg_gen_movcond_tl(cc.cond, dest, cc.reg, cc.reg2, src, dest);
}
static inline void gen_op_movl_T0_seg(DisasContext *s, X86Seg seg_reg)
@@ -5238,7 +5232,9 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
ot = dflag;
modrm = x86_ldub_code(env, s);
reg = ((modrm >> 3) & 7) | REX_R(s);
- gen_cmovcc1(env, s, ot, b, modrm, reg);
+ gen_ldst_modrm(env, s, modrm, ot, OR_TMP0, 0);
+ gen_cmovcc1(s, b ^ 1, s->T0, cpu_regs[reg]);
+ gen_op_mov_reg_v(s, ot, reg, s->T0);
break;
/************************/
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 20/22] target/i386: adjust decoding of J operand
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (18 preceding siblings ...)
2023-12-22 18:16 ` [PATCH 19/22] target/i386: move operand load and writeback out of gen_cmovcc1 Paolo Bonzini
@ 2023-12-22 18:16 ` Paolo Bonzini
2023-12-22 18:16 ` [PATCH 21/22] target/i386: introduce flags writeback mechanism Paolo Bonzini
2023-12-22 18:16 ` [PATCH 22/22] target/i386: implement CMPccXADD Paolo Bonzini
21 siblings, 0 replies; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:16 UTC (permalink / raw)
To: qemu-devel; +Cc: Richard Henderson
gen_jcc() has been changed to accept a relative offset since the
new decoder was written. Adjust the J operand, which is meant
to be used with jump instructions such as gen_jcc(), to not
include the program counter and to not truncate the result, as
both operations are now performed by common code.
The result is that J is now the same as the I operand.
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 10 ----------
1 file changed, 10 deletions(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 99d18d2871e..f30889dbc0a 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -1329,19 +1329,9 @@ static bool decode_op(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode,
}
case X86_TYPE_I: /* Immediate */
- op->unit = X86_OP_IMM;
- decode->immediate = insn_get_signed(env, s, op->ot);
- break;
-
case X86_TYPE_J: /* Relative offset for a jump */
op->unit = X86_OP_IMM;
decode->immediate = insn_get_signed(env, s, op->ot);
- decode->immediate += s->pc - s->cs_base;
- if (s->dflag == MO_16) {
- decode->immediate &= 0xffff;
- } else if (!CODE64(s)) {
- decode->immediate &= 0xffffffffu;
- }
break;
case X86_TYPE_L: /* The upper 4 bits of the immediate select a 128-bit register */
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 21/22] target/i386: introduce flags writeback mechanism
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (19 preceding siblings ...)
2023-12-22 18:16 ` [PATCH 20/22] target/i386: adjust decoding of J operand Paolo Bonzini
@ 2023-12-22 18:16 ` Paolo Bonzini
2023-12-28 22:46 ` Richard Henderson
2023-12-22 18:16 ` [PATCH 22/22] target/i386: implement CMPccXADD Paolo Bonzini
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:16 UTC (permalink / raw)
To: qemu-devel
ALU instructions can write to both memory and flags. If the CC_SRC*
and CC_DST locations have been written already when a memory access
causes a fault, the value in CC_SRC* and CC_DST might be interpreted
with the wrong CC_OP (the one that is in effect before the instruction.
Besides just using the wrong result for the flags, something like
subtracting -1 can have disastrous effects if the current CC_OP is
CC_OP_EFLAGS: this is because QEMU does not expect bits outside the ALU
flags to be set in CC_SRC, and env->eflags can end up set to all-ones.
In the case of the attached testcase, this sets IOPL to 3 and would
cause an assertion failure if SUB is moved to the new decoder.
This mechanism is not really needed for BMI instructions, which can
only write to a register, but put it to use anyway for cleanliness.
In the case of BZHI, the code has to be modified slightly to ensure
that decode->cc_src is written, otherwise the new assertions trigger.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/cpu.h | 1 +
target/i386/tcg/decode-new.c.inc | 34 +++++++++++++++++++++++++++++
target/i386/tcg/decode-new.h | 4 ++++
target/i386/tcg/emit.c.inc | 36 ++++++++++++++++++++-----------
tests/tcg/i386/Makefile.target | 2 +-
tests/tcg/i386/test-flags.c | 37 ++++++++++++++++++++++++++++++++
6 files changed, 101 insertions(+), 13 deletions(-)
create mode 100644 tests/tcg/i386/test-flags.c
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index ecdd4518c64..7f0786e8b98 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1285,6 +1285,7 @@ typedef enum {
CC_OP_NB,
} CCOp;
+QEMU_BUILD_BUG_ON(CC_OP_NB >= 128);
typedef struct SegmentCache {
uint32_t selector;
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index f30889dbc0a..717d7307722 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -1662,6 +1662,7 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
bool first = true;
X86DecodedInsn decode;
X86DecodeFunc decode_func = decode_root;
+ uint8_t cc_live;
s->has_modrm = false;
@@ -1815,6 +1816,7 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
}
memset(&decode, 0, sizeof(decode));
+ decode.cc_op = -1;
decode.b = b;
if (!decode_insn(s, env, decode_func, &decode)) {
goto illegal_op;
@@ -1953,6 +1955,38 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
decode.e.gen(s, env, &decode);
gen_writeback(s, &decode, 0, s->T0);
}
+
+ /*
+ * Write back flags after last memory access. Some newer ALU instructions, as
+ * well as SSE instructions, write flags in the gen_* function, but that can
+ * cause incorrect tracking of CC_OP for instructions that write to both memory
+ * and flags.
+ */
+ if (decode.cc_op != -1) {
+ if (decode.cc_dst) {
+ tcg_gen_mov_tl(cpu_cc_dst, decode.cc_dst);
+ }
+ if (decode.cc_src) {
+ tcg_gen_mov_tl(cpu_cc_src, decode.cc_src);
+ }
+ if (decode.cc_src2) {
+ tcg_gen_mov_tl(cpu_cc_src2, decode.cc_src2);
+ }
+ if (decode.cc_op == CC_OP_DYNAMIC) {
+ tcg_gen_mov_i32(cpu_cc_op, decode.cc_op_dynamic);
+ }
+ set_cc_op(s, decode.cc_op);
+ cc_live = cc_op_live[decode.cc_op];
+ } else {
+ cc_live = 0;
+ }
+ if (decode.cc_op != CC_OP_DYNAMIC) {
+ assert(!decode.cc_op_dynamic);
+ assert(!!decode.cc_dst == !!(cc_live & USES_CC_DST));
+ assert(!!decode.cc_src == !!(cc_live & USES_CC_SRC));
+ assert(!!decode.cc_src2 == !!(cc_live & USES_CC_SRC2));
+ }
+
return;
gp_fault:
gen_exception_gpf(s);
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index 70b6717227f..25220fc4362 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -283,6 +283,10 @@ struct X86DecodedInsn {
target_ulong immediate;
AddressParts mem;
+ TCGv cc_dst, cc_src, cc_src2;
+ TCGv_i32 cc_op_dynamic;
+ int8_t cc_op;
+
uint8_t b;
};
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 4c2006fdd09..fd120e7b9b4 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -339,6 +339,19 @@ static inline int vector_len(DisasContext *s, X86DecodedInsn *decode)
return s->vex_l ? 32 : 16;
}
+static void prepare_update1_cc(X86DecodedInsn *decode, DisasContext *s, CCOp op)
+{
+ decode->cc_dst = s->T0;
+ decode->cc_op = op;
+}
+
+static void prepare_update2_cc(X86DecodedInsn *decode, DisasContext *s, CCOp op)
+{
+ decode->cc_src = s->T1;
+ decode->cc_dst = s->T0;
+ decode->cc_op = op;
+}
+
static void gen_store_sse(DisasContext *s, X86DecodedInsn *decode, int src_ofs)
{
MemOp ot = decode->op[0].ot;
@@ -1027,6 +1040,7 @@ static void gen_##uname(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod
VSIB_AVX(VPGATHERD, vpgatherd)
VSIB_AVX(VPGATHERQ, vpgatherq)
+/* ADCX/ADOX do not have memory operands and can use set_cc_op. */
static void gen_ADCOX(DisasContext *s, CPUX86State *env, MemOp ot, int cc_op)
{
int opposite_cc_op;
@@ -1089,8 +1103,7 @@ static void gen_ANDN(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
MemOp ot = decode->op[0].ot;
tcg_gen_andc_tl(s->T0, s->T1, s->T0);
- gen_op_update1_cc(s);
- set_cc_op(s, CC_OP_LOGICB + ot);
+ prepare_update1_cc(decode, s, CC_OP_LOGICB + ot);
}
static void gen_BEXTR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
@@ -1118,10 +1131,10 @@ static void gen_BEXTR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
tcg_gen_movcond_tl(TCG_COND_LEU, s->T1, s->A0, bound, s->T1, zero);
tcg_gen_andc_tl(s->T0, s->T0, s->T1);
- gen_op_update1_cc(s);
- set_cc_op(s, CC_OP_LOGICB + ot);
+ prepare_update1_cc(decode, s, CC_OP_LOGICB + ot);
}
+/* BLSI do not have memory operands and can use set_cc_op. */
static void gen_BLSI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
MemOp ot = decode->op[0].ot;
@@ -1133,6 +1146,7 @@ static void gen_BLSI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
set_cc_op(s, CC_OP_BMILGB + ot);
}
+/* BLSMSK do not have memory operands and can use set_cc_op. */
static void gen_BLSMSK(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
MemOp ot = decode->op[0].ot;
@@ -1144,6 +1158,7 @@ static void gen_BLSMSK(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode
set_cc_op(s, CC_OP_BMILGB + ot);
}
+/* BLSR do not have memory operands and can use set_cc_op. */
static void gen_BLSR(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
MemOp ot = decode->op[0].ot;
@@ -1164,18 +1179,15 @@ static void gen_BZHI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
tcg_gen_ext8u_tl(s->T1, s->T1);
+ tcg_gen_shl_tl(s->A0, mone, s->T1);
+ tcg_gen_movcond_tl(TCG_COND_LEU, s->A0, s->T1, bound, s->A0, zero);
+ tcg_gen_andc_tl(s->T0, s->T0, s->A0);
/*
* Note that since we're using BMILG (in order to get O
* cleared) we need to store the inverse into C.
*/
- tcg_gen_setcond_tl(TCG_COND_LEU, cpu_cc_src, s->T1, bound);
-
- tcg_gen_shl_tl(s->A0, mone, s->T1);
- tcg_gen_movcond_tl(TCG_COND_LEU, s->A0, s->T1, bound, s->A0, zero);
- tcg_gen_andc_tl(s->T0, s->T0, s->A0);
-
- gen_op_update1_cc(s);
- set_cc_op(s, CC_OP_BMILGB + ot);
+ tcg_gen_setcond_tl(TCG_COND_LEU, s->T1, s->T1, bound);
+ prepare_update2_cc(decode, s, CC_OP_BMILGB + ot);
}
static void gen_CRC32(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
diff --git a/tests/tcg/i386/Makefile.target b/tests/tcg/i386/Makefile.target
index 3dec7c6c423..9906f9e116b 100644
--- a/tests/tcg/i386/Makefile.target
+++ b/tests/tcg/i386/Makefile.target
@@ -13,7 +13,7 @@ config-cc.mak: Makefile
I386_SRCS=$(notdir $(wildcard $(I386_SRC)/*.c))
ALL_X86_TESTS=$(I386_SRCS:.c=)
-SKIP_I386_TESTS=test-i386-ssse3 test-avx test-3dnow test-mmx
+SKIP_I386_TESTS=test-i386-ssse3 test-avx test-3dnow test-mmx test-flags
X86_64_TESTS:=$(filter test-i386-adcox test-i386-bmi2 $(SKIP_I386_TESTS), $(ALL_X86_TESTS))
test-i386-sse-exceptions: CFLAGS += -msse4.1 -mfpmath=sse
diff --git a/tests/tcg/i386/test-flags.c b/tests/tcg/i386/test-flags.c
new file mode 100644
index 00000000000..c379e296275
--- /dev/null
+++ b/tests/tcg/i386/test-flags.c
@@ -0,0 +1,37 @@
+#define _GNU_SOURCE
+#include <sys/mman.h>
+#include <signal.h>
+#include <stdio.h>
+#include <assert.h>
+
+volatile unsigned long flags;
+volatile unsigned long flags_after;
+int *addr;
+
+void sigsegv(int sig, siginfo_t *info, ucontext_t *uc)
+{
+ flags = uc->uc_mcontext.gregs[REG_EFL];
+ mprotect(addr, 4096, PROT_READ|PROT_WRITE);
+}
+
+int main()
+{
+ struct sigaction sa = { .sa_handler = (void *)sigsegv, .sa_flags = SA_SIGINFO };
+ sigaction(SIGSEGV, &sa, NULL);
+
+ /* fault in the page then protect it */
+ addr = mmap (NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANON, -1, 0);
+ *addr = 0x1234;
+ mprotect(addr, 4096, PROT_READ);
+
+ asm("# set flags to all ones \n"
+ "mov $-1, %%eax \n"
+ "movq addr, %%rdi \n"
+ "sahf \n"
+ "sub %%eax, (%%rdi) \n"
+ "pushf \n"
+ "pop flags_after(%%rip) \n" : : : "eax", "edi", "memory");
+
+ /* OF can have any value before the SUB instruction. */
+ assert((flags & 0xff) == 0xd7 && (flags_after & 0x8ff) == 0x17);
+}
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* [PATCH 22/22] target/i386: implement CMPccXADD
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
` (20 preceding siblings ...)
2023-12-22 18:16 ` [PATCH 21/22] target/i386: introduce flags writeback mechanism Paolo Bonzini
@ 2023-12-22 18:16 ` Paolo Bonzini
2023-12-28 23:04 ` Richard Henderson
21 siblings, 1 reply; 43+ messages in thread
From: Paolo Bonzini @ 2023-12-22 18:16 UTC (permalink / raw)
To: qemu-devel
The main difficulty here is that a page fault when writing to the destination
must not overwrite the flags. Therefore, the compute-flags helper must be
called with a temporary destination instead of using gen_jcc1*.
For simplicity, I am using an unconditional cmpxchg operation, that becomes
a NOP if the comparison fails.
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/cpu.c | 2 +-
target/i386/tcg/decode-new.c.inc | 25 ++++++++
target/i386/tcg/decode-new.h | 1 +
target/i386/tcg/emit.c.inc | 104 +++++++++++++++++++++++++++++++
target/i386/tcg/translate.c | 2 +
5 files changed, 133 insertions(+), 1 deletion(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 95d5f16cd5e..fd47ee7defb 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -738,7 +738,7 @@ void x86_cpu_vendor_words2str(char *dst, uint32_t vendor1,
#define TCG_7_0_EDX_FEATURES (CPUID_7_0_EDX_FSRM | CPUID_7_0_EDX_KERNEL_FEATURES)
#define TCG_7_1_EAX_FEATURES (CPUID_7_1_EAX_FZRM | CPUID_7_1_EAX_FSRS | \
- CPUID_7_1_EAX_FSRC)
+ CPUID_7_1_EAX_FSRC | CPUID_7_1_EAX_CMPCCXADD)
#define TCG_7_1_EDX_FEATURES 0
#define TCG_7_2_EDX_FEATURES 0
#define TCG_APM_FEATURES 0
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 717d7307722..426c4594120 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -538,6 +538,28 @@ static const X86OpEntry opcodes_0F38_00toEF[240] = {
[0xdd] = X86_OP_ENTRY3(VAESENCLAST, V,x, H,x, W,x, vex4 cpuid(AES) p_66),
[0xde] = X86_OP_ENTRY3(VAESDEC, V,x, H,x, W,x, vex4 cpuid(AES) p_66),
[0xdf] = X86_OP_ENTRY3(VAESDECLAST, V,x, H,x, W,x, vex4 cpuid(AES) p_66),
+
+ /*
+ * REG selects srcdest2 operand, VEX.vvvv selects src3. VEX class not found
+ * in manual, assumed to be 13 from the VEX.L0 constraint.
+ */
+ [0xe0] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xe1] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xe2] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xe3] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xe4] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xe5] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xe6] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xe7] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+
+ [0xe8] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xe9] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xea] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xeb] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xec] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xed] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xee] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
+ [0xef] = X86_OP_ENTRY3(CMPccXADD, M,y, G,y, B,y, vex13 xchg chk(o64) cpuid(CMPCCXADD) p_66),
};
/* five rows for no prefix, 66, F3, F2, 66+F2 */
@@ -1503,6 +1525,9 @@ static bool has_cpuid_feature(DisasContext *s, X86CPUIDFeature cpuid)
return (s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_AVX2);
case X86_FEAT_SHA_NI:
return (s->cpuid_7_0_ebx_features & CPUID_7_0_EBX_SHA_NI);
+
+ case X86_FEAT_CMPCCXADD:
+ return (s->cpuid_7_1_eax_features & CPUID_7_1_EAX_CMPCCXADD);
}
g_assert_not_reached();
}
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index 25220fc4362..15e6bfef4b1 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -104,6 +104,7 @@ typedef enum X86CPUIDFeature {
X86_FEAT_AVX2,
X86_FEAT_BMI1,
X86_FEAT_BMI2,
+ X86_FEAT_CMPCCXADD,
X86_FEAT_F16C,
X86_FEAT_FMA,
X86_FEAT_MOVBE,
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index fd120e7b9b4..f05f79e1f62 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -1190,6 +1190,109 @@ static void gen_BZHI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
prepare_update2_cc(decode, s, CC_OP_BMILGB + ot);
}
+static void gen_CMPccXADD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+ TCGLabel *label_top = gen_new_label();
+ TCGLabel *label_bottom = gen_new_label();
+ TCGv oldv = tcg_temp_new();
+ TCGv newv = tcg_temp_new();
+ TCGv cmpv = tcg_temp_new();
+ TCGCond cond;
+
+ TCGv cmp_lhs, cmp_rhs;
+ MemOp ot, ot_full;
+
+ int jcc_op = (decode->b >> 1) & 7;
+ static const TCGCond cond_table[8] = {
+ [JCC_O] = TCG_COND_LT, /* test sign bit by comparing against 0 */
+ [JCC_B] = TCG_COND_LTU,
+ [JCC_Z] = TCG_COND_EQ,
+ [JCC_BE] = TCG_COND_LEU,
+ [JCC_S] = TCG_COND_LT, /* test sign bit by comparing against 0 */
+ [JCC_P] = TCG_COND_EQ, /* even parity - tests low bit of popcount */
+ [JCC_L] = TCG_COND_LT,
+ [JCC_LE] = TCG_COND_LE,
+ };
+
+ cond = cond_table[jcc_op];
+ if (decode->b & 1) {
+ cond = tcg_invert_cond(cond);
+ }
+
+ ot = decode->op[0].ot;
+ ot_full = ot | MO_LE;
+ if (jcc_op >= JCC_S) {
+ /*
+ * Sign-extend values before subtracting for S, P (zero/sign extension
+ * does not matter there) L, LE and their inverses.
+ */
+ ot_full |= MO_SIGN;
+ }
+
+ /*
+ * cmpv will be moved to cc_src *after* cpu_regs[] is written back, so use
+ * tcg_gen_ext_tl instead of gen_ext_tl.
+ */
+ tcg_gen_ext_tl(cmpv, cpu_regs[decode->op[1].n], ot_full);
+
+ /*
+ * Cmpxchg loop starts here.
+ * - s->T1: addition operand (from decoder)
+ * - s->A0: dest address (from decoder)
+ * - s->cc_srcT: memory operand (lhs for comparison)
+ * - cmpv: rhs for comparison
+ */
+ gen_set_label(label_top);
+ gen_op_ld_v(s, ot_full, s->cc_srcT, s->A0);
+ tcg_gen_sub_tl(s->T0, s->cc_srcT, cmpv);
+
+ /* Compute the comparison result by hand, to avoid clobbering cc_*. */
+ switch (jcc_op) {
+ case JCC_O:
+ /* (src1 ^ src2) & (src1 ^ dst). newv is only used here for a moment */
+ tcg_gen_xor_tl(newv, s->cc_srcT, s->T0);
+ tcg_gen_xor_tl(s->tmp0, s->cc_srcT, cmpv);
+ tcg_gen_and_tl(s->tmp0, s->tmp0, newv);
+ tcg_gen_sextract_tl(s->tmp0, s->tmp0, 0, 8 << ot);
+ cmp_lhs = s->tmp0, cmp_rhs = tcg_constant_tl(0);
+ break;
+
+ case JCC_P:
+ tcg_gen_ext8u_tl(s->tmp0, s->T0);
+ tcg_gen_ctpop_tl(s->tmp0, s->tmp0);
+ tcg_gen_andi_tl(s->tmp0, s->tmp0, 1);
+ cmp_lhs = s->tmp0, cmp_rhs = tcg_constant_tl(0);
+ break;
+
+ case JCC_S:
+ cmp_lhs = s->T0, cmp_rhs = tcg_constant_tl(0);
+ break;
+
+ default:
+ cmp_lhs = s->cc_srcT, cmp_rhs = cmpv;
+ break;
+ }
+
+ /* Compute new value: if condition does not hold, just store back s->cc_srcT */
+ tcg_gen_add_tl(newv, s->cc_srcT, s->T1);
+ tcg_gen_movcond_tl(cond, newv, cmp_lhs, cmp_rhs, newv, s->cc_srcT);
+ tcg_gen_atomic_cmpxchg_tl(oldv, s->A0, s->cc_srcT, newv, s->mem_index, ot_full);
+
+ /* Exit unconditionally if cmpxchg succeeded. */
+ tcg_gen_brcond_tl(TCG_COND_EQ, oldv, s->cc_srcT, label_bottom);
+
+ /* Try again if there was actually a store to make. */
+ tcg_gen_brcond_tl(cond, cmp_lhs, cmp_rhs, label_top);
+ gen_set_label(label_bottom);
+
+ /* Store old value to registers only after a successful store. */
+ gen_writeback(s, decode, 1, s->cc_srcT);
+
+ decode->cc_dst = s->T0;
+ decode->cc_src = cmpv;
+ decode->cc_op = CC_OP_SUBB + ot;
+}
+
static void gen_CRC32(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
MemOp ot = decode->op[2].ot;
diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 53b98d5e6ac..c6bb215d68a 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -122,6 +122,7 @@ typedef struct DisasContext {
int cpuid_ext3_features;
int cpuid_7_0_ebx_features;
int cpuid_7_0_ecx_features;
+ int cpuid_7_1_eax_features;
int cpuid_xsave_features;
/* TCG local temps */
@@ -6973,6 +6974,7 @@ static void i386_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cpu)
dc->cpuid_ext3_features = env->features[FEAT_8000_0001_ECX];
dc->cpuid_7_0_ebx_features = env->features[FEAT_7_0_EBX];
dc->cpuid_7_0_ecx_features = env->features[FEAT_7_0_ECX];
+ dc->cpuid_7_1_eax_features = env->features[FEAT_7_1_EAX];
dc->cpuid_xsave_features = env->features[FEAT_XSAVE];
dc->jmp_opt = !((cflags & CF_NO_GOTO_TB) ||
(flags & (HF_TF_MASK | HF_INHIBIT_IRQ_MASK)));
--
2.43.0
^ permalink raw reply related [flat|nested] 43+ messages in thread
* Re: [PATCH 01/22] target/i386: optimize computation of JL and JLE from flags
2023-12-22 18:15 ` [PATCH 01/22] target/i386: optimize computation of JL and JLE from flags Paolo Bonzini
@ 2023-12-28 20:53 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 20:53 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> Take advantage of the fact that there can be no 1 bits between SF and OF.
> If they were adjacent, you could sum SF and get a carry only if SF was
> already set. Then the value of OF in the sum is the XOR of OF itself,
> the carry (which is SF) and 0 (the value of the OF bit in the addend):
> this is OF^SF exactly.
>
> Because OF and SF are not adjacent, just place more 1 bits to the
> left so that the carry propagates, which means summing CC_O - CC_S.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 10 ++++------
> 1 file changed, 4 insertions(+), 6 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 02/22] target/i386: speedup JO/SETO after MUL or IMUL
2023-12-22 18:15 ` [PATCH 02/22] target/i386: speedup JO/SETO after MUL or IMUL Paolo Bonzini
@ 2023-12-28 20:56 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 20:56 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> OF is equal to the carry flag, so use the same CCPrepare.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 3 +++
> 1 file changed, 3 insertions(+)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 03/22] target/i386: remove unnecessary arguments from raise_interrupt
2023-12-22 18:15 ` [PATCH 03/22] target/i386: remove unnecessary arguments from raise_interrupt Paolo Bonzini
@ 2023-12-28 20:58 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 20:58 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> is_int is always 1, and error_code is always zero.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/excp_helper.c | 7 +++----
> target/i386/tcg/helper-tcg.h | 3 +--
> target/i386/tcg/misc_helper.c | 2 +-
> 3 files changed, 5 insertions(+), 7 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 04/22] target/i386: remove unnecessary truncations
2023-12-22 18:15 ` [PATCH 04/22] target/i386: remove unnecessary truncations Paolo Bonzini
@ 2023-12-28 21:13 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 21:13 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> gen_lea_v_seg (called by gen_add_A0_ds_seg) already zeroes any
> bits of s->A0 beyond s->aflag. It does so before summing the
> segment base and, if not in 64-bit mode, also after summing it.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/emit.c.inc | 4 +---
> target/i386/tcg/translate.c | 2 --
> 2 files changed, 1 insertion(+), 5 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 05/22] target/i386: clean up cpu_cc_compute_all
2023-12-22 18:15 ` [PATCH 05/22] target/i386: clean up cpu_cc_compute_all Paolo Bonzini
@ 2023-12-28 21:27 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 21:27 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> cpu_cc_compute_all() has an argument that is always equal to CC_OP for historical
> reasons (dating back to commit a7812ae4123, "TCG variable type checking.", 2008-11-17,
> which added the argument to helper_cc_compute_all). It does not make sense for the
> argumnt to have any other value, so remove it and clean up some lines that are not
argument
> too long anymore.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/cpu.h | 4 ++--
> target/i386/tcg/cc_helper.c | 6 +++---
> target/i386/tcg/fpu_helper.c | 10 ++++------
> target/i386/tcg/int_helper.c | 8 ++++----
> target/i386/tcg/misc_helper.c | 2 +-
> target/i386/tcg/seg_helper.c | 8 ++++----
> 6 files changed, 18 insertions(+), 20 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 06/22] target/i386: document more deviations from the manual
2023-12-22 18:15 ` [PATCH 06/22] target/i386: document more deviations from the manual Paolo Bonzini
@ 2023-12-28 21:34 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 21:34 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.c.inc | 12 ++++++++++++
> 1 file changed, 12 insertions(+)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
>
> diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
> index 2bdbb1bba0f..232c6a45c96 100644
> --- a/target/i386/tcg/decode-new.c.inc
> +++ b/target/i386/tcg/decode-new.c.inc
> @@ -26,6 +26,13 @@
> * size (X86_SIZE_*) codes used in the manual. There are a few differences
> * though.
> *
> + * Operand sizes
> + * -------------
> + *
> + * The manual lists d64 ("cannot encode 32-bit size in 64-bit mode") and f64
> + * ("cannot encode 16-bit or 32-bit size in 64-bit mode") as modifiers of the
> + * "v" or "z" sizes. The decoder simply makes them separate operand sizes.
> + *
> * Vector operands
> * ---------------
> *
> @@ -44,6 +51,11 @@
> * if the difference is expressed via prefixes. Individual instructions
> * are separated by prefix in the generator functions.
> *
> + * There is a custom size "xh" used to address half of a SSE/AVX operand.
> + * This points to a 64-bit operand for SSE operations, 128-bit operand
> + * for 256-bit AVX operands, etc. It is used for conversion operations
> + * such as VCVTPH2PS or VCVTSS2SD.
> + *
> * There are a couple cases in which instructions (e.g. MOVD) write the
> * whole XMM or MM register but are established incorrectly in the manual
> * as "d" or "q". These have to be fixed for the decoder to work correctly.
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 08/22] target/i386: avoid trunc and ext for MULX and RORX
2023-12-22 18:15 ` [PATCH 08/22] target/i386: avoid trunc and ext for MULX and RORX Paolo Bonzini
@ 2023-12-28 21:50 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 21:50 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> Use _tl operations for 32-bit operands on 32-bit targets, and only go
> through trunc and extu ops for 64-bit targets. While the trunc/ext
> ops should be pretty much free after optimization, the optimizer also
> does not like having the same temporary used in multiple EBBs.
> Therefore it is nicer to not use tmpN* unless necessary.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/emit.c.inc | 37 +++++++++++++++++++++++++------------
> 1 file changed, 25 insertions(+), 12 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 07/22] target/i386: reimplement check for validity of LOCK prefix
2023-12-22 18:15 ` [PATCH 07/22] target/i386: reimplement check for validity of LOCK prefix Paolo Bonzini
@ 2023-12-28 21:55 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 21:55 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> The previous check erroneously allowed CMP to be modified with LOCK.
> Instead, tag explicitly the instructions that do support LOCK.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.c.inc | 17 ++++++++++-------
> target/i386/tcg/decode-new.h | 3 +++
> target/i386/tcg/emit.c.inc | 5 -----
> 3 files changed, 13 insertions(+), 12 deletions(-)
It's hard to see how this fits together, because there don't seem to be any uses of
X86_SPECIAL_{Locked,HasLock} yet.
But the illegal test in disas_insn_new looks more plausibly correct than the one in
decode_modrm, so
Acked-by: Richard Henderson <richard.henderson@linaro.org>
r~
>
> diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
> index 232c6a45c96..5eb2e9d0224 100644
> --- a/target/i386/tcg/decode-new.c.inc
> +++ b/target/i386/tcg/decode-new.c.inc
> @@ -151,6 +151,7 @@
>
> #define cpuid(feat) .cpuid = X86_FEAT_##feat,
> #define xchg .special = X86_SPECIAL_Locked,
> +#define lock .special = X86_SPECIAL_HasLock,
> #define mmx .special = X86_SPECIAL_MMX,
> #define zext0 .special = X86_SPECIAL_ZExtOp0,
> #define zext2 .special = X86_SPECIAL_ZExtOp2,
> @@ -1103,10 +1104,6 @@ static int decode_modrm(DisasContext *s, CPUX86State *env, X86DecodedInsn *decod
> {
> int modrm = get_modrm(s, env);
> if ((modrm >> 6) == 3) {
> - if (s->prefix & PREFIX_LOCK) {
> - decode->e.gen = gen_illegal;
> - return 0xff;
> - }
> op->n = (modrm & 7);
> if (type != X86_TYPE_Q && type != X86_TYPE_N) {
> op->n |= REX_B(s);
> @@ -1881,6 +1878,9 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
> if (decode.op[0].has_ea) {
> s->prefix |= PREFIX_LOCK;
> }
> + decode.e.special = X86_SPECIAL_HasLock;
> + /* fallthrough */
> + case X86_SPECIAL_HasLock:
> break;
>
> case X86_SPECIAL_ZExtOp0:
> @@ -1909,6 +1909,12 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
> break;
> }
>
> + if (s->prefix & PREFIX_LOCK) {
> + if (decode.e.special != X86_SPECIAL_HasLock || !decode.op[0].has_ea) {
> + goto illegal_op;
> + }
> + }
> +
> if (!validate_vex(s, &decode)) {
> return;
> }
> @@ -1952,9 +1958,6 @@ static void disas_insn_new(DisasContext *s, CPUState *cpu, int b)
> gen_load_ea(s, &decode.mem, decode.e.vex_class == 12);
> }
> if (s->prefix & PREFIX_LOCK) {
> - if (decode.op[0].unit != X86_OP_INT || !decode.op[0].has_ea) {
> - goto illegal_op;
> - }
> gen_load(s, &decode, 2, s->T1);
> decode.e.gen(s, env, &decode);
> } else {
> diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
> index e6c904a3192..611bfddd957 100644
> --- a/target/i386/tcg/decode-new.h
> +++ b/target/i386/tcg/decode-new.h
> @@ -158,6 +158,9 @@ typedef enum X86InsnCheck {
> typedef enum X86InsnSpecial {
> X86_SPECIAL_None,
>
> + /* Accepts LOCK prefix; LOCKed operations do not load or writeback operand 0 */
> + X86_SPECIAL_HasLock,
> +
> /* Always locked if it has a memory operand (XCHG) */
> X86_SPECIAL_Locked,
>
> diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
> index d444d83e534..98c4c9569ef 100644
> --- a/target/i386/tcg/emit.c.inc
> +++ b/target/i386/tcg/emit.c.inc
> @@ -55,11 +55,6 @@ static void gen_NM_exception(DisasContext *s)
> gen_exception(s, EXCP07_PREX);
> }
>
> -static void gen_illegal(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
> -{
> - gen_illegal_opcode(s);
> -}
> -
> static void gen_load_ea(DisasContext *s, AddressParts *mem, bool is_vsib)
> {
> TCGv ea = gen_lea_modrm_1(s, *mem, is_vsib);
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 09/22] target/i386: rename zext0/zext2 and make them closer to the manual
2023-12-22 18:15 ` [PATCH 09/22] target/i386: rename zext0/zext2 and make them closer to the manual Paolo Bonzini
@ 2023-12-28 22:04 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:04 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> X86_SPECIAL_ZExtOp0 and X86_SPECIAL_ZExtOp2 are poorly named; they are a hack
> that is needed by scalar insertion and extraction instructions, and not really
> related to zero extension: for PEXTR the zero extension is done by the generation
> functions, for PINSR the high bits are not used at all and in fact are*not*
> filled with zeroes when loaded into s->T1.
>
> Rename the values to match the effect described in the manual, and explain
> better in the comments.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.c.inc | 16 ++++++++--------
> target/i386/tcg/decode-new.h | 17 +++++++++++++----
> 2 files changed, 21 insertions(+), 12 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 10/22] target/i386: add X86_SPECIALs for MOVSX and MOVZX
2023-12-22 18:15 ` [PATCH 10/22] target/i386: add X86_SPECIALs for MOVSX and MOVZX Paolo Bonzini
@ 2023-12-28 22:08 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:08 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> Usually the registers are just moved into s->T0 without much care for
> their operand size. However, in some cases we can get more efficient
> code if the operand fetching logic syncs with the emission function
> on what is nicer.
>
> All the current uses are mostly demonstrative and only reduce the code
> in the emission functions, because the instructions do not support
> memory operands. However the logic is generic and applies to several
> more instructions such as MOVSXD (aka movslq), one-byte shift
> instructions, multiplications, XLAT, and indirect calls/jumps.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.c.inc | 18 ++++++++++----
> target/i386/tcg/decode-new.h | 4 +++
> target/i386/tcg/emit.c.inc | 42 +++++++++++++++++---------------
> 3 files changed, 40 insertions(+), 24 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 11/22] target/i386: do not decode string source/destination into decode->mem
2023-12-22 18:15 ` [PATCH 11/22] target/i386: do not decode string source/destination into decode->mem Paolo Bonzini
@ 2023-12-28 22:09 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:09 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> decode->mem is only used if one operand has has_ea == true. String
> operations will not use decode->mem and will load A0 on their own, because
> they are the only case of two memory operands in a single instruction.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.c.inc | 20 ++------------------
> 1 file changed, 2 insertions(+), 18 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 13/22] target/i386: do not clobber T0 on string operations
2023-12-22 18:15 ` [PATCH 13/22] target/i386: do not clobber T0 on string operations Paolo Bonzini
@ 2023-12-28 22:11 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:11 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> The new decoder would rather have the operand in T0 when expanding SCAS, rather
> than use R_EAX directly as gen_scas currently does. This makes SCAS more similar
> to CMP and SUB, in that CC_DST = T0 - T1.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 45 ++++++++++++++++++++-----------------
> 1 file changed, 24 insertions(+), 21 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 14/22] target/i386: split eflags computation out of gen_compute_eflags
2023-12-22 18:15 ` [PATCH 14/22] target/i386: split eflags computation out of gen_compute_eflags Paolo Bonzini
@ 2023-12-28 22:13 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:13 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> The new x86 decoder wants the gen_* functions to compute EFLAGS before
> writeback, which can be an issue for instructions with a memory
> destination such as ARPL or shifts.
>
> Extract code to compute the EFLAGS without clobbering CC_SRC, in case
> the memory write causes a fault. The flags writeback mechanism will
> take care of copying the result to CC_SRC.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 28 +++++++++++++++++++---------
> 1 file changed, 19 insertions(+), 9 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 15/22] target/i386: do not use s->tmp4 for push
2023-12-22 18:15 ` [PATCH 15/22] target/i386: do not use s->tmp4 for push Paolo Bonzini
@ 2023-12-28 22:14 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:14 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> Just create a temporary for the occasion.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 16/22] target/i386: do not use s->tmp0 for jumps on ECX ==/!= 0
2023-12-22 18:15 ` [PATCH 16/22] target/i386: do not use s->tmp0 for jumps on ECX ==/!= 0 Paolo Bonzini
@ 2023-12-28 22:15 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:15 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> Create a new temporary, to ease the register allocator's work.
>
> Creation of the temporary is pushed into gen_ext_tl, which
> also allows NULL as the first parameter now.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 9 ++++++---
> 1 file changed, 6 insertions(+), 3 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 17/22] target/i386: extract gen_far_call/jmp, reordering temporaries
2023-12-22 18:15 ` [PATCH 17/22] target/i386: extract gen_far_call/jmp, reordering temporaries Paolo Bonzini
@ 2023-12-28 22:25 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:25 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> Extract the code into new functions, and swap T0/T1 so that T0 corresponds
> to the first immediate in the instruction stream.
>
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 90 ++++++++++++++++++++-----------------
> 1 file changed, 50 insertions(+), 40 deletions(-)
>
> diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
> index e5f71170967..edbad0ad746 100644
> --- a/target/i386/tcg/translate.c
> +++ b/target/i386/tcg/translate.c
> @@ -2525,12 +2525,12 @@ static inline void gen_op_movl_T0_seg(DisasContext *s, X86Seg seg_reg)
> offsetof(CPUX86State,segs[seg_reg].selector));
> }
>
> -static inline void gen_op_movl_seg_T0_vm(DisasContext *s, X86Seg seg_reg)
> +static inline void gen_op_movl_seg_real(DisasContext *s, X86Seg seg_reg, TCGv seg)
In general, you probably want to drop inline markers as you come across them, and just let
the compiler inline as it chooses.
> {
> - tcg_gen_ext16u_tl(s->T0, s->T0);
> - tcg_gen_st32_tl(s->T0, tcg_env,
> + tcg_gen_ext16u_tl(seg, seg);
While cleaning, maybe better to not silently modify the input?
> +static void gen_far_call(DisasContext *s)
> +{
> + if (PE(s) && !VM86(s)) {
> + tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T1);
> + gen_helper_lcall_protected(tcg_env, s->tmp2_i32, s->T0,
> + tcg_constant_i32(s->dflag - 1),
> + eip_next_tl(s));
> + } else {
> + tcg_gen_trunc_tl_i32(s->tmp3_i32, s->T1);
> + tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T0);
New temps?
> + gen_helper_lcall_real(tcg_env, s->tmp3_i32, s->tmp2_i32,
> + tcg_constant_i32(s->dflag - 1),
> + eip_next_i32(s));
> + }
> + s->base.is_jmp = DISAS_JUMP;
> +}
> +
> +static void gen_far_jmp(DisasContext *s)
> +{
> + if (PE(s) && !VM86(s)) {
> + tcg_gen_trunc_tl_i32(s->tmp2_i32, s->T1);
Likewise.
Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 18/22] target/i386: prepare for implementation of STOS/SCAS in new decoder
2023-12-22 18:15 ` [PATCH 18/22] target/i386: prepare for implementation of STOS/SCAS in new decoder Paolo Bonzini
@ 2023-12-28 22:27 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:27 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:15, Paolo Bonzini wrote:
> Do not use gen_op, and pull the load from the accumulator into
> disas_insn.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 9 +++++++--
> 1 file changed, 7 insertions(+), 2 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 19/22] target/i386: move operand load and writeback out of gen_cmovcc1
2023-12-22 18:16 ` [PATCH 19/22] target/i386: move operand load and writeback out of gen_cmovcc1 Paolo Bonzini
@ 2023-12-28 22:29 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:29 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:16, Paolo Bonzini wrote:
> Similar to gen_setcc1, make gen_cmovcc1 receive TCGv. This is more friendly
> to simultaneous implementation in the old and the new decoder.
>
> A small wart is that s->T0 of CMOV is currently the*second* argument (which
> would ordinarily be in T1). Therefore, the condition has to be inverted in
> order to overwrite s->T0 with cpu_regs[reg] if the MOV is not performed.
>
> This only applies to the old decoder, and this code will go away soon.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/tcg/translate.c | 16 ++++++----------
> 1 file changed, 6 insertions(+), 10 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 21/22] target/i386: introduce flags writeback mechanism
2023-12-22 18:16 ` [PATCH 21/22] target/i386: introduce flags writeback mechanism Paolo Bonzini
@ 2023-12-28 22:46 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 22:46 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:16, Paolo Bonzini wrote:
> ALU instructions can write to both memory and flags. If the CC_SRC*
> and CC_DST locations have been written already when a memory access
> causes a fault, the value in CC_SRC* and CC_DST might be interpreted
> with the wrong CC_OP (the one that is in effect before the instruction.
>
> Besides just using the wrong result for the flags, something like
> subtracting -1 can have disastrous effects if the current CC_OP is
> CC_OP_EFLAGS: this is because QEMU does not expect bits outside the ALU
> flags to be set in CC_SRC, and env->eflags can end up set to all-ones.
> In the case of the attached testcase, this sets IOPL to 3 and would
> cause an assertion failure if SUB is moved to the new decoder.
>
> This mechanism is not really needed for BMI instructions, which can
> only write to a register, but put it to use anyway for cleanliness.
> In the case of BZHI, the code has to be modified slightly to ensure
> that decode->cc_src is written, otherwise the new assertions trigger.
>
> Signed-off-by: Paolo Bonzini<pbonzini@redhat.com>
> ---
> target/i386/cpu.h | 1 +
> target/i386/tcg/decode-new.c.inc | 34 +++++++++++++++++++++++++++++
> target/i386/tcg/decode-new.h | 4 ++++
> target/i386/tcg/emit.c.inc | 36 ++++++++++++++++++++-----------
> tests/tcg/i386/Makefile.target | 2 +-
> tests/tcg/i386/test-flags.c | 37 ++++++++++++++++++++++++++++++++
> 6 files changed, 101 insertions(+), 13 deletions(-)
> create mode 100644 tests/tcg/i386/test-flags.c
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
* Re: [PATCH 22/22] target/i386: implement CMPccXADD
2023-12-22 18:16 ` [PATCH 22/22] target/i386: implement CMPccXADD Paolo Bonzini
@ 2023-12-28 23:04 ` Richard Henderson
0 siblings, 0 replies; 43+ messages in thread
From: Richard Henderson @ 2023-12-28 23:04 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel
On 12/23/23 05:16, Paolo Bonzini wrote:
> + case JCC_S:
> + cmp_lhs = s->T0, cmp_rhs = tcg_constant_tl(0);
> + break;
I think you need an sextract here, when ot != full word size, same as JCC_O.
Otherwise,
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
^ permalink raw reply [flat|nested] 43+ messages in thread
end of thread, other threads:[~2023-12-28 23:05 UTC | newest]
Thread overview: 43+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-22 18:15 [PATCH 00/22] target/i386: first part of TCG changes for 9.0 Paolo Bonzini
2023-12-22 18:15 ` [PATCH 01/22] target/i386: optimize computation of JL and JLE from flags Paolo Bonzini
2023-12-28 20:53 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 02/22] target/i386: speedup JO/SETO after MUL or IMUL Paolo Bonzini
2023-12-28 20:56 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 03/22] target/i386: remove unnecessary arguments from raise_interrupt Paolo Bonzini
2023-12-28 20:58 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 04/22] target/i386: remove unnecessary truncations Paolo Bonzini
2023-12-28 21:13 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 05/22] target/i386: clean up cpu_cc_compute_all Paolo Bonzini
2023-12-28 21:27 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 06/22] target/i386: document more deviations from the manual Paolo Bonzini
2023-12-28 21:34 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 07/22] target/i386: reimplement check for validity of LOCK prefix Paolo Bonzini
2023-12-28 21:55 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 08/22] target/i386: avoid trunc and ext for MULX and RORX Paolo Bonzini
2023-12-28 21:50 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 09/22] target/i386: rename zext0/zext2 and make them closer to the manual Paolo Bonzini
2023-12-28 22:04 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 10/22] target/i386: add X86_SPECIALs for MOVSX and MOVZX Paolo Bonzini
2023-12-28 22:08 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 11/22] target/i386: do not decode string source/destination into decode->mem Paolo Bonzini
2023-12-28 22:09 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 12/22] target/i386: do not clobber A0 in POP translation Paolo Bonzini
2023-12-22 18:15 ` [PATCH 13/22] target/i386: do not clobber T0 on string operations Paolo Bonzini
2023-12-28 22:11 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 14/22] target/i386: split eflags computation out of gen_compute_eflags Paolo Bonzini
2023-12-28 22:13 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 15/22] target/i386: do not use s->tmp4 for push Paolo Bonzini
2023-12-28 22:14 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 16/22] target/i386: do not use s->tmp0 for jumps on ECX ==/!= 0 Paolo Bonzini
2023-12-28 22:15 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 17/22] target/i386: extract gen_far_call/jmp, reordering temporaries Paolo Bonzini
2023-12-28 22:25 ` Richard Henderson
2023-12-22 18:15 ` [PATCH 18/22] target/i386: prepare for implementation of STOS/SCAS in new decoder Paolo Bonzini
2023-12-28 22:27 ` Richard Henderson
2023-12-22 18:16 ` [PATCH 19/22] target/i386: move operand load and writeback out of gen_cmovcc1 Paolo Bonzini
2023-12-28 22:29 ` Richard Henderson
2023-12-22 18:16 ` [PATCH 20/22] target/i386: adjust decoding of J operand Paolo Bonzini
2023-12-22 18:16 ` [PATCH 21/22] target/i386: introduce flags writeback mechanism Paolo Bonzini
2023-12-28 22:46 ` Richard Henderson
2023-12-22 18:16 ` [PATCH 22/22] target/i386: implement CMPccXADD Paolo Bonzini
2023-12-28 23:04 ` Richard Henderson
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).