* [PATCH 0/2] target/i386: reimplement fp2fp conversion instructions
@ 2023-08-29 16:53 Paolo Bonzini
2023-08-29 16:53 ` [PATCH 1/2] target/i386: generalize operand size "ph" for use in CVTPS2PD Paolo Bonzini
2023-08-29 16:53 ` [PATCH 2/2] target/i386: fix memory operand size for CVTPS2PD Paolo Bonzini
0 siblings, 2 replies; 5+ messages in thread
From: Paolo Bonzini @ 2023-08-29 16:53 UTC (permalink / raw)
To: qemu-devel
CVTPS2PD loads only a half-register from memory, unlike the other
operations under 0x0F 0x5A. It therefore differs from the other
"unary" floating-point operations, which load a full register
from memory in their packed form.
To fix it, reimplement the four operations under 0x0F 0x5A
(CVTSS2SD, CVTSD2SS, CVTPS2PD, CVTPD2PS) individually.
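To make the size difference concrete, here is a minimal sketch of the
two packed conversions in their 128-bit (non-VEX.L) forms. It is not
QEMU code: model_cvtps2pd and model_cvtpd2ps are hypothetical names
introduced only for illustration, and the sketch assumes a 4-byte float
and an 8-byte double. Each function returns the number of bytes it reads
from memory.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model, not QEMU code: 128-bit (non-VEX.L) forms only. */

/* CVTPS2PD xmm, m64: two floats in, two doubles out. */
static size_t model_cvtps2pd(double dst[2], const void *mem)
{
    float src[2];

    assert(dst != NULL && mem != NULL);
    memcpy(src, mem, sizeof(src));      /* reads only half a register */
    dst[0] = (double)src[0];
    dst[1] = (double)src[1];
    return sizeof(src);
}

/* CVTPD2PS xmm, m128: two doubles in, two floats out, upper half zeroed. */
static size_t model_cvtpd2ps(float dst[4], const void *mem)
{
    double src[2];

    assert(dst != NULL && mem != NULL);
    memcpy(src, mem, sizeof(src));      /* reads a full register */
    dst[0] = (float)src[0];
    dst[1] = (float)src[1];
    dst[2] = 0.0f;
    dst[3] = 0.0f;
    return sizeof(src);
}
```

Under these assumptions the first function touches 8 bytes and the
second 16. An emulator that treats both as full-register loads would
read 8 bytes too many for CVTPS2PD.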
Paolo
Paolo Bonzini (2):
target/i386: generalize operand size "ph" for use in CVTPS2PD
target/i386: fix memory operand size for CVTPS2PD
target/i386/tcg/decode-new.c.inc | 20 +++++++++++++++-----
target/i386/tcg/decode-new.h | 2 +-
target/i386/tcg/emit.c.inc | 30 +++++++++++++++++++++++++-----
3 files changed, 41 insertions(+), 11 deletions(-)
--
2.41.0
* [PATCH 1/2] target/i386: generalize operand size "ph" for use in CVTPS2PD
2023-08-29 16:53 [PATCH 0/2] target/i386: reimplement fp2fp conversion instructions Paolo Bonzini
@ 2023-08-29 16:53 ` Paolo Bonzini
2023-08-29 17:32 ` Richard Henderson
2023-08-29 16:53 ` [PATCH 2/2] target/i386: fix memory operand size for CVTPS2PD Paolo Bonzini
1 sibling, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2023-08-29 16:53 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable
CVTPS2PD loads only a half-register from memory, like CVTPH2PS. It can
reuse the "ph" packed half-precision size to load a half-register.
Rename the size to "xh", because it is now a variation of "x" and is
no longer used only for half-precision values.
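As a rough illustration of how "xh" relates to "x", the sketch below
mirrors the size selection visible in the decode_op_size hunk of this
patch. The enum and function names (sketch_size, sketch_operand_bits)
are hypothetical stand-ins, not the decoder's real identifiers, and the
"x" case is assumed from the context lines of that hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the decoder's size codes. */
enum sketch_size {
    SKETCH_SIZE_x,      /* full vector register */
    SKETCH_SIZE_xh,     /* half of the vector register */
};

/* Operand width in bits, depending on VEX.L. */
static unsigned sketch_operand_bits(enum sketch_size size, bool vex_l)
{
    switch (size) {
    case SKETCH_SIZE_x:
        return vex_l ? 256 : 128;
    case SKETCH_SIZE_xh:
        return vex_l ? 128 : 64;
    }
    assert(!"unknown operand size");
    return 0;
}
```

In this model "xh" is always half the width of "x" for the same VEX.L
setting, regardless of the element type being loaded.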
Cc: qemu-stable@nongnu.org
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 6 +++---
target/i386/tcg/decode-new.h | 2 +-
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 8f93a239ddb..43c39aad2aa 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -337,7 +337,7 @@ static const X86OpEntry opcodes_0F38_00toEF[240] = {
[0x07] = X86_OP_ENTRY3(PHSUBSW, V,x, H,x, W,x, vex4 cpuid(SSSE3) mmx avx2_256 p_00_66),
[0x10] = X86_OP_ENTRY2(PBLENDVB, V,x, W,x, vex4 cpuid(SSE41) avx2_256 p_66),
- [0x13] = X86_OP_ENTRY2(VCVTPH2PS, V,x, W,ph, vex11 cpuid(F16C) p_66),
+ [0x13] = X86_OP_ENTRY2(VCVTPH2PS, V,x, W,xh, vex11 cpuid(F16C) p_66),
[0x14] = X86_OP_ENTRY2(BLENDVPS, V,x, W,x, vex4 cpuid(SSE41) p_66),
[0x15] = X86_OP_ENTRY2(BLENDVPD, V,x, W,x, vex4 cpuid(SSE41) p_66),
/* Listed incorrectly as type 4 */
@@ -565,7 +565,7 @@ static const X86OpEntry opcodes_0F3A[256] = {
[0x15] = X86_OP_ENTRY3(PEXTRW, E,w, V,dq, I,b, vex5 cpuid(SSE41) zext0 p_66),
[0x16] = X86_OP_ENTRY3(PEXTR, E,y, V,dq, I,b, vex5 cpuid(SSE41) p_66),
[0x17] = X86_OP_ENTRY3(VEXTRACTPS, E,d, V,dq, I,b, vex5 cpuid(SSE41) p_66),
- [0x1d] = X86_OP_ENTRY3(VCVTPS2PH, W,ph, V,x, I,b, vex11 cpuid(F16C) p_66),
+ [0x1d] = X86_OP_ENTRY3(VCVTPS2PH, W,xh, V,x, I,b, vex11 cpuid(F16C) p_66),
[0x20] = X86_OP_ENTRY4(PINSRB, V,dq, H,dq, E,b, vex5 cpuid(SSE41) zext2 p_66),
[0x21] = X86_OP_GROUP0(VINSERTPS),
@@ -1104,7 +1104,7 @@ static bool decode_op_size(DisasContext *s, X86OpEntry *e, X86OpSize size, MemOp
*ot = s->vex_l ? MO_256 : MO_128;
return true;
- case X86_SIZE_ph: /* SSE/AVX packed half precision */
+ case X86_SIZE_xh: /* SSE/AVX packed half register */
*ot = s->vex_l ? MO_128 : MO_64;
return true;
diff --git a/target/i386/tcg/decode-new.h b/target/i386/tcg/decode-new.h
index cb6b8bcf678..a542ec16813 100644
--- a/target/i386/tcg/decode-new.h
+++ b/target/i386/tcg/decode-new.h
@@ -92,7 +92,7 @@ typedef enum X86OpSize {
/* Custom */
X86_SIZE_d64,
X86_SIZE_f64,
- X86_SIZE_ph, /* SSE/AVX packed half precision */
+ X86_SIZE_xh, /* SSE/AVX packed half register */
} X86OpSize;
typedef enum X86CPUIDFeature {
--
2.41.0
* [PATCH 2/2] target/i386: fix memory operand size for CVTPS2PD
2023-08-29 16:53 [PATCH 0/2] target/i386: reimplement fp2fp conversion instructions Paolo Bonzini
2023-08-29 16:53 ` [PATCH 1/2] target/i386: generalize operand size "ph" for use in CVTPS2PD Paolo Bonzini
@ 2023-08-29 16:53 ` Paolo Bonzini
2023-08-29 17:36 ` Richard Henderson
1 sibling, 1 reply; 5+ messages in thread
From: Paolo Bonzini @ 2023-08-29 16:53 UTC (permalink / raw)
To: qemu-devel; +Cc: qemu-stable
CVTPS2PD loads only a half-register from memory, unlike the other
operations under 0x0F 0x5A. "Unpack" the group into separate
emission functions instead of using gen_unary_fp_sse.
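The idea of unpacking the group can be sketched as a four-entry table
indexed by mandatory prefix. This is a simplified model rather than the
decoder itself: the sk_* names and flag values are hypothetical, the
none/66/F3/F2 ordering is assumed from the order of the table in this
patch, and mem_bits records the architectural memory operand width of
the legacy SSE form.

```c
#include <assert.h>

/* Hypothetical prefix flags; the real decoder uses its own constants. */
enum {
    SK_PREFIX_DATA  = 1 << 0,   /* 0x66 */
    SK_PREFIX_REPZ  = 1 << 1,   /* 0xF3 */
    SK_PREFIX_REPNZ = 1 << 2,   /* 0xF2 */
};

struct sk_entry {
    const char *name;
    unsigned mem_bits;  /* memory operand width, legacy SSE form */
    int operands;       /* 2 for packed forms, 3 for scalar forms */
};

/* One entry per mandatory prefix: none, 66, F3, F2. */
static const struct sk_entry sk_0f5a[4] = {
    { "CVTPS2PD",  64, 2 },
    { "CVTPD2PS", 128, 2 },
    { "CVTSS2SD",  32, 3 },
    { "CVTSD2SS",  64, 3 },
};

/* Pick the entry for opcode 0x0F 0x5A; F2 wins over F3, which wins over 66. */
static const struct sk_entry *sk_pick_0f5a(unsigned prefix)
{
    assert((prefix & ~7u) == 0);
    if (prefix & SK_PREFIX_REPNZ) {
        return &sk_0f5a[3];
    } else if (prefix & SK_PREFIX_REPZ) {
        return &sk_0f5a[2];
    } else if (prefix & SK_PREFIX_DATA) {
        return &sk_0f5a[1];
    }
    return &sk_0f5a[0];
}
```

Because each instruction has its own entry, the unprefixed form can
carry a half-register memory operand while the 0x66 form keeps the
full-register one.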
Cc: qemu-stable@nongnu.org
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1377
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
target/i386/tcg/decode-new.c.inc | 14 ++++++++++++--
target/i386/tcg/emit.c.inc | 30 +++++++++++++++++++++++++-----
2 files changed, 37 insertions(+), 7 deletions(-)
diff --git a/target/i386/tcg/decode-new.c.inc b/target/i386/tcg/decode-new.c.inc
index 43c39aad2aa..0db19cda3b7 100644
--- a/target/i386/tcg/decode-new.c.inc
+++ b/target/i386/tcg/decode-new.c.inc
@@ -805,10 +805,20 @@ static void decode_sse_unary(DisasContext *s, CPUX86State *env, X86OpEntry *entr
case 0x51: entry->gen = gen_VSQRT; break;
case 0x52: entry->gen = gen_VRSQRT; break;
case 0x53: entry->gen = gen_VRCP; break;
- case 0x5A: entry->gen = gen_VCVTfp2fp; break;
}
}
+static void decode_0F5A(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
+{
+ static const X86OpEntry opcodes_0F5A[4] = {
+ X86_OP_ENTRY2(VCVTPS2PD, V,x, W,xh, vex2), /* VCVTPS2PD */
+ X86_OP_ENTRY2(VCVTPD2PS, V,x, W,x, vex2), /* VCVTPD2PS */
+ X86_OP_ENTRY3(VCVTSS2SD, V,x, H,x, W,x, vex2_rep3), /* VCVTSS2SD */
+ X86_OP_ENTRY3(VCVTSD2SS, V,x, H,x, W,x, vex2_rep3), /* VCVTSD2SS */
+ };
+ *entry = *decode_by_prefix(s, opcodes_0F5A);
+}
+
static void decode_0F5B(DisasContext *s, CPUX86State *env, X86OpEntry *entry, uint8_t *b)
{
static const X86OpEntry opcodes_0F5B[4] = {
@@ -891,7 +901,7 @@ static const X86OpEntry opcodes_0F[256] = {
[0x58] = X86_OP_ENTRY3(VADD, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
[0x59] = X86_OP_ENTRY3(VMUL, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
- [0x5a] = X86_OP_GROUP3(sse_unary, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2), /* CVTPS2PD */
+ [0x5a] = X86_OP_GROUP0(0F5A),
[0x5b] = X86_OP_GROUP0(0F5B),
[0x5c] = X86_OP_ENTRY3(VSUB, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
[0x5d] = X86_OP_ENTRY3(VMIN, V,x, H,x, W,x, vex2_rep3 p_00_66_f3_f2),
diff --git a/target/i386/tcg/emit.c.inc b/target/i386/tcg/emit.c.inc
index 4fe8dec4274..45a3e55cbfb 100644
--- a/target/i386/tcg/emit.c.inc
+++ b/target/i386/tcg/emit.c.inc
@@ -1914,12 +1914,22 @@ static void gen_VCOMI(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
set_cc_op(s, CC_OP_EFLAGS);
}
-static void gen_VCVTfp2fp(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+static void gen_VCVTPD2PS(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
- gen_unary_fp_sse(s, env, decode,
- gen_helper_cvtpd2ps_xmm, gen_helper_cvtps2pd_xmm,
- gen_helper_cvtpd2ps_ymm, gen_helper_cvtps2pd_ymm,
- gen_helper_cvtsd2ss, gen_helper_cvtss2sd);
+ if (s->vex_l) {
+ gen_helper_cvtpd2ps_ymm(cpu_env, OP_PTR0, OP_PTR2);
+ } else {
+ gen_helper_cvtpd2ps_xmm(cpu_env, OP_PTR0, OP_PTR2);
+ }
+}
+
+static void gen_VCVTPS2PD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+ if (s->vex_l) {
+ gen_helper_cvtps2pd_ymm(cpu_env, OP_PTR0, OP_PTR2);
+ } else {
+ gen_helper_cvtps2pd_xmm(cpu_env, OP_PTR0, OP_PTR2);
+ }
}
static void gen_VCVTPS2PH(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
@@ -1936,6 +1946,16 @@ static void gen_VCVTPS2PH(DisasContext *s, CPUX86State *env, X86DecodedInsn *dec
}
}
+static void gen_VCVTSD2SS(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+ gen_helper_cvtsd2ss(cpu_env, OP_PTR0, OP_PTR1, OP_PTR2);
+}
+
+static void gen_VCVTSS2SD(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
+{
+ gen_helper_cvtss2sd(cpu_env, OP_PTR0, OP_PTR1, OP_PTR2);
+}
+
static void gen_VCVTSI2Sx(DisasContext *s, CPUX86State *env, X86DecodedInsn *decode)
{
int vec_len = vector_len(s, decode);
--
2.41.0
* Re: [PATCH 1/2] target/i386: generalize operand size "ph" for use in CVTPS2PD
2023-08-29 16:53 ` [PATCH 1/2] target/i386: generalize operand size "ph" for use in CVTPS2PD Paolo Bonzini
@ 2023-08-29 17:32 ` Richard Henderson
0 siblings, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2023-08-29 17:32 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel; +Cc: qemu-stable
On 8/29/23 09:53, Paolo Bonzini wrote:
> CVTPS2PD loads only a half-register from memory, like CVTPH2PS. It can
> reuse the "ph" packed half-precision size to load a half-register.
> Rename the size to "xh", because it is now a variation of "x" and is
> no longer used only for half-precision values.
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.c.inc | 6 +++---
> target/i386/tcg/decode-new.h | 2 +-
> 2 files changed, 4 insertions(+), 4 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~
* Re: [PATCH 2/2] target/i386: fix memory operand size for CVTPS2PD
2023-08-29 16:53 ` [PATCH 2/2] target/i386: fix memory operand size for CVTPS2PD Paolo Bonzini
@ 2023-08-29 17:36 ` Richard Henderson
0 siblings, 0 replies; 5+ messages in thread
From: Richard Henderson @ 2023-08-29 17:36 UTC (permalink / raw)
To: Paolo Bonzini, qemu-devel; +Cc: qemu-stable
On 8/29/23 09:53, Paolo Bonzini wrote:
> CVTPS2PD loads only a half-register from memory, unlike the other
> operations under 0x0F 0x5A. "Unpack" the group into separate
> emission functions instead of using gen_unary_fp_sse.
>
> Cc: qemu-stable@nongnu.org
> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1377
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
> target/i386/tcg/decode-new.c.inc | 14 ++++++++++++--
> target/i386/tcg/emit.c.inc | 30 +++++++++++++++++++++++++-----
> 2 files changed, 37 insertions(+), 7 deletions(-)
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
r~