* [PATCH v2 0/7] target/sparc: vis fixes
@ 2024-05-02 16:55 Richard Henderson
2024-05-02 16:55 ` [PATCH v2 1/7] linux-user/sparc: Add more hwcap bits for sparc64 Richard Henderson
` (8 more replies)
0 siblings, 9 replies; 18+ messages in thread
From: Richard Henderson @ 2024-05-02 16:55 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland
Split out from my vis4 patch set, with just the bug fixes.
I've fixed the issue in patch 6, as noticed by Mark, but
include the follow-up that cleans up all of the macros by
removing them.
r~
Richard Henderson (7):
linux-user/sparc: Add more hwcap bits for sparc64
target/sparc: Fix FEXPAND
target/sparc: Fix FMUL8x16
target/sparc: Fix FMUL8x16A{U,L}
target/sparc: Fix FMULD8*X16
target/sparc: Fix FPMERGE
target/sparc: Split out do_ms16b
target/sparc/helper.h | 11 +--
target/sparc/insns.decode | 2 +-
linux-user/elfload.c | 48 +++++++---
target/sparc/translate.c | 129 ++++++++++++++++++++++---
target/sparc/vis_helper.c | 195 ++++++++++----------------------------
5 files changed, 207 insertions(+), 178 deletions(-)
--
2.34.1
^ permalink raw reply [flat|nested] 18+ messages in thread
* [PATCH v2 1/7] linux-user/sparc: Add more hwcap bits for sparc64
2024-05-02 16:55 [PATCH v2 0/7] target/sparc: vis fixes Richard Henderson
@ 2024-05-02 16:55 ` Richard Henderson
2024-05-02 16:55 ` [PATCH v2 2/7] target/sparc: Fix FEXPAND Richard Henderson
` (7 subsequent siblings)
8 siblings, 0 replies; 18+ messages in thread
From: Richard Henderson @ 2024-05-02 16:55 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland
Supply HWCAP_SPARC_V8PLUS, HWCAP_SPARC_MUL32, HWCAP_SPARC_DIV32,
HWCAP_SPARC_POPC, HWCAP_SPARC_FSMULD, HWCAP_SPARC_VIS, HWCAP_SPARC_VIS2.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
linux-user/elfload.c | 48 +++++++++++++++++++++++++++++++-------------
1 file changed, 34 insertions(+), 14 deletions(-)
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index f9461d2844..14f08b64a1 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -968,24 +968,44 @@ const char *elf_hwcap2_str(uint32_t bit)
#endif /* TARGET_ARM */
#ifdef TARGET_SPARC
-#ifdef TARGET_SPARC64
-#define ELF_HWCAP (HWCAP_SPARC_FLUSH | HWCAP_SPARC_STBAR | HWCAP_SPARC_SWAP \
- | HWCAP_SPARC_MULDIV | HWCAP_SPARC_V9)
-#ifndef TARGET_ABI32
-#define elf_check_arch(x) ( (x) == EM_SPARCV9 || (x) == EM_SPARC32PLUS )
+#ifndef TARGET_SPARC64
+# define ELF_CLASS ELFCLASS32
+# define ELF_ARCH EM_SPARC
+#elif defined(TARGET_ABI32)
+# define ELF_CLASS ELFCLASS32
+# define elf_check_arch(x) ((x) == EM_SPARC32PLUS || (x) == EM_SPARC)
#else
-#define elf_check_arch(x) ( (x) == EM_SPARC32PLUS || (x) == EM_SPARC )
+# define ELF_CLASS ELFCLASS64
+# define ELF_ARCH EM_SPARCV9
#endif
-#define ELF_CLASS ELFCLASS64
-#define ELF_ARCH EM_SPARCV9
-#else
-#define ELF_HWCAP (HWCAP_SPARC_FLUSH | HWCAP_SPARC_STBAR | HWCAP_SPARC_SWAP \
- | HWCAP_SPARC_MULDIV)
-#define ELF_CLASS ELFCLASS32
-#define ELF_ARCH EM_SPARC
-#endif /* TARGET_SPARC64 */
+#include "elf.h"
+
+#define ELF_HWCAP get_elf_hwcap()
+
+static uint32_t get_elf_hwcap(void)
+{
+ /* There are not many sparc32 hwcap bits -- we have all of them. */
+ uint32_t r = HWCAP_SPARC_FLUSH | HWCAP_SPARC_STBAR |
+ HWCAP_SPARC_SWAP | HWCAP_SPARC_MULDIV;
+
+#ifdef TARGET_SPARC64
+ CPUSPARCState *env = cpu_env(thread_cpu);
+ uint32_t features = env->def.features;
+
+ r |= HWCAP_SPARC_V9 | HWCAP_SPARC_V8PLUS;
+ /* 32x32 multiply and divide are efficient. */
+ r |= HWCAP_SPARC_MUL32 | HWCAP_SPARC_DIV32;
+ /* We don't have an internal feature bit for this. */
+ r |= HWCAP_SPARC_POPC;
+ r |= features & CPU_FEATURE_FSMULD ? HWCAP_SPARC_FSMULD : 0;
+ r |= features & CPU_FEATURE_VIS1 ? HWCAP_SPARC_VIS : 0;
+ r |= features & CPU_FEATURE_VIS2 ? HWCAP_SPARC_VIS2 : 0;
+#endif
+
+ return r;
+}
static inline void init_thread(struct target_pt_regs *regs,
struct image_info *infop)
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 2/7] target/sparc: Fix FEXPAND
2024-05-02 16:55 [PATCH v2 0/7] target/sparc: vis fixes Richard Henderson
2024-05-02 16:55 ` [PATCH v2 1/7] linux-user/sparc: Add more hwcap bits for sparc64 Richard Henderson
@ 2024-05-02 16:55 ` Richard Henderson
2024-05-03 18:34 ` Philippe Mathieu-Daudé
2024-05-02 16:55 ` [PATCH v2 3/7] target/sparc: Fix FMUL8x16 Richard Henderson
` (6 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2024-05-02 16:55 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland
This is a 2-operand instruction, not 3-operand.
Worse, we took the source from the wrong operand.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/sparc/helper.h | 2 +-
target/sparc/insns.decode | 2 +-
target/sparc/translate.c | 20 +++++++++++++++++++-
target/sparc/vis_helper.c | 6 +++---
4 files changed, 24 insertions(+), 6 deletions(-)
diff --git a/target/sparc/helper.h b/target/sparc/helper.h
index b8087d0d2b..57ab755ffd 100644
--- a/target/sparc/helper.h
+++ b/target/sparc/helper.h
@@ -102,7 +102,7 @@ DEF_HELPER_FLAGS_2(fmul8sux16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(fmul8ulx16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(fmuld8sux16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(fmuld8ulx16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
-DEF_HELPER_FLAGS_2(fexpand, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_1(fexpand, TCG_CALL_NO_RWG_SE, i64, i32)
DEF_HELPER_FLAGS_3(pdist, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)
DEF_HELPER_FLAGS_2(fpack16, TCG_CALL_NO_RWG_SE, i32, i64, i64)
DEF_HELPER_FLAGS_3(fpack32, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)
diff --git a/target/sparc/insns.decode b/target/sparc/insns.decode
index 2d26404cb2..e2d8a07dc4 100644
--- a/target/sparc/insns.decode
+++ b/target/sparc/insns.decode
@@ -352,7 +352,7 @@ FCMPEq 10 000 cc:2 110101 rs1:5 0 0101 0111 rs2:5
FALIGNDATAg 10 ..... 110110 ..... 0 0100 1000 ..... @r_r_r
FPMERGE 10 ..... 110110 ..... 0 0100 1011 ..... @r_r_r
BSHUFFLE 10 ..... 110110 ..... 0 0100 1100 ..... @r_r_r
- FEXPAND 10 ..... 110110 ..... 0 0100 1101 ..... @r_r_r
+ FEXPAND 10 ..... 110110 00000 0 0100 1101 ..... @r_r2
FSRCd 10 ..... 110110 ..... 0 0111 0100 00000 @r_r1 # FSRC1d
FSRCs 10 ..... 110110 ..... 0 0111 0101 00000 @r_r1 # FSRC1s
diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 571b3e3f03..dfcfe855a1 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -4358,6 +4358,25 @@ TRANS(FSQRTd, ALL, do_env_dd, a, gen_helper_fsqrtd)
TRANS(FxTOd, 64, do_env_dd, a, gen_helper_fxtod)
TRANS(FdTOx, 64, do_env_dd, a, gen_helper_fdtox)
+static bool do_df(DisasContext *dc, arg_r_r *a,
+ void (*func)(TCGv_i64, TCGv_i32))
+{
+ TCGv_i64 dst;
+ TCGv_i32 src;
+
+ if (gen_trap_ifnofpu(dc)) {
+ return true;
+ }
+
+ dst = tcg_temp_new_i64();
+ src = gen_load_fpr_F(dc, a->rs);
+ func(dst, src);
+ gen_store_fpr_D(dc, a->rd, dst);
+ return advance_pc(dc);
+}
+
+TRANS(FEXPAND, VIS1, do_df, a, gen_helper_fexpand)
+
static bool do_env_df(DisasContext *dc, arg_r_r *a,
void (*func)(TCGv_i64, TCGv_env, TCGv_i32))
{
@@ -4589,7 +4608,6 @@ TRANS(FMUL8ULx16, VIS1, do_ddd, a, gen_helper_fmul8ulx16)
TRANS(FMULD8SUx16, VIS1, do_ddd, a, gen_helper_fmuld8sux16)
TRANS(FMULD8ULx16, VIS1, do_ddd, a, gen_helper_fmuld8ulx16)
TRANS(FPMERGE, VIS1, do_ddd, a, gen_helper_fpmerge)
-TRANS(FEXPAND, VIS1, do_ddd, a, gen_helper_fexpand)
TRANS(FPADD16, VIS1, do_ddd, a, tcg_gen_vec_add16_i64)
TRANS(FPADD32, VIS1, do_ddd, a, tcg_gen_vec_add32_i64)
diff --git a/target/sparc/vis_helper.c b/target/sparc/vis_helper.c
index 7763b16c24..db2e6dd6c1 100644
--- a/target/sparc/vis_helper.c
+++ b/target/sparc/vis_helper.c
@@ -260,13 +260,13 @@ uint64_t helper_fmuld8ulx16(uint64_t src1, uint64_t src2)
return d.ll;
}
-uint64_t helper_fexpand(uint64_t src1, uint64_t src2)
+uint64_t helper_fexpand(uint32_t src2)
{
VIS32 s;
VIS64 d;
- s.l = (uint32_t)src1;
- d.ll = src2;
+ s.l = src2;
+ d.ll = 0;
d.VIS_W64(0) = s.VIS_B32(0) << 4;
d.VIS_W64(1) = s.VIS_B32(1) << 4;
d.VIS_W64(2) = s.VIS_B32(2) << 4;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 3/7] target/sparc: Fix FMUL8x16
2024-05-02 16:55 [PATCH v2 0/7] target/sparc: vis fixes Richard Henderson
2024-05-02 16:55 ` [PATCH v2 1/7] linux-user/sparc: Add more hwcap bits for sparc64 Richard Henderson
2024-05-02 16:55 ` [PATCH v2 2/7] target/sparc: Fix FEXPAND Richard Henderson
@ 2024-05-02 16:55 ` Richard Henderson
2024-05-03 18:31 ` Philippe Mathieu-Daudé
2024-05-02 16:55 ` [PATCH v2 4/7] target/sparc: Fix FMUL8x16A{U,L} Richard Henderson
` (5 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2024-05-02 16:55 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland
This instruction has f32 as source1, which alters the
decoding of the register number, which means we've been
passing the wrong data for odd register numbers.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/sparc/helper.h | 2 +-
target/sparc/translate.c | 21 ++++++++++++++++++++-
target/sparc/vis_helper.c | 9 +++++----
3 files changed, 26 insertions(+), 6 deletions(-)
diff --git a/target/sparc/helper.h b/target/sparc/helper.h
index 57ab755ffd..27dc604cac 100644
--- a/target/sparc/helper.h
+++ b/target/sparc/helper.h
@@ -95,7 +95,7 @@ DEF_HELPER_FLAGS_2(fdtox, TCG_CALL_NO_WG, s64, env, f64)
DEF_HELPER_FLAGS_2(fqtox, TCG_CALL_NO_WG, s64, env, i128)
DEF_HELPER_FLAGS_2(fpmerge, TCG_CALL_NO_RWG_SE, i64, i64, i64)
-DEF_HELPER_FLAGS_2(fmul8x16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(fmul8x16, TCG_CALL_NO_RWG_SE, i64, i32, i64)
DEF_HELPER_FLAGS_2(fmul8x16al, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(fmul8x16au, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(fmul8sux16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index dfcfe855a1..c4adc148d2 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -4583,6 +4583,26 @@ TRANS(FSUBs, ALL, do_env_fff, a, gen_helper_fsubs)
TRANS(FMULs, ALL, do_env_fff, a, gen_helper_fmuls)
TRANS(FDIVs, ALL, do_env_fff, a, gen_helper_fdivs)
+static bool do_dfd(DisasContext *dc, arg_r_r_r *a,
+ void (*func)(TCGv_i64, TCGv_i32, TCGv_i64))
+{
+ TCGv_i64 dst, src2;
+ TCGv_i32 src1;
+
+ if (gen_trap_ifnofpu(dc)) {
+ return true;
+ }
+
+ dst = gen_dest_fpr_D(dc, a->rd);
+ src1 = gen_load_fpr_F(dc, a->rs1);
+ src2 = gen_load_fpr_D(dc, a->rs2);
+ func(dst, src1, src2);
+ gen_store_fpr_D(dc, a->rd, dst);
+ return advance_pc(dc);
+}
+
+TRANS(FMUL8x16, VIS1, do_dfd, a, gen_helper_fmul8x16)
+
static bool do_ddd(DisasContext *dc, arg_r_r_r *a,
void (*func)(TCGv_i64, TCGv_i64, TCGv_i64))
{
@@ -4600,7 +4620,6 @@ static bool do_ddd(DisasContext *dc, arg_r_r_r *a,
return advance_pc(dc);
}
-TRANS(FMUL8x16, VIS1, do_ddd, a, gen_helper_fmul8x16)
TRANS(FMUL8x16AU, VIS1, do_ddd, a, gen_helper_fmul8x16au)
TRANS(FMUL8x16AL, VIS1, do_ddd, a, gen_helper_fmul8x16al)
TRANS(FMUL8SUx16, VIS1, do_ddd, a, gen_helper_fmul8sux16)
diff --git a/target/sparc/vis_helper.c b/target/sparc/vis_helper.c
index db2e6dd6c1..7728ffe9c6 100644
--- a/target/sparc/vis_helper.c
+++ b/target/sparc/vis_helper.c
@@ -94,16 +94,17 @@ uint64_t helper_fpmerge(uint64_t src1, uint64_t src2)
return d.ll;
}
-uint64_t helper_fmul8x16(uint64_t src1, uint64_t src2)
+uint64_t helper_fmul8x16(uint32_t src1, uint64_t src2)
{
- VIS64 s, d;
+ VIS64 d;
+ VIS32 s;
uint32_t tmp;
- s.ll = src1;
+ s.l = src1;
d.ll = src2;
#define PMUL(r) \
- tmp = (int32_t)d.VIS_SW64(r) * (int32_t)s.VIS_B64(r); \
+ tmp = (int32_t)d.VIS_SW64(r) * (int32_t)s.VIS_B32(r); \
if ((tmp & 0xff) > 0x7f) { \
tmp += 0x100; \
} \
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 4/7] target/sparc: Fix FMUL8x16A{U,L}
2024-05-02 16:55 [PATCH v2 0/7] target/sparc: vis fixes Richard Henderson
` (2 preceding siblings ...)
2024-05-02 16:55 ` [PATCH v2 3/7] target/sparc: Fix FMUL8x16 Richard Henderson
@ 2024-05-02 16:55 ` Richard Henderson
2024-05-03 18:32 ` Philippe Mathieu-Daudé
2024-05-02 16:55 ` [PATCH v2 5/7] target/sparc: Fix FMULD8*X16 Richard Henderson
` (4 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2024-05-02 16:55 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland
These instructions have f32 inputs, which changes the decode
of the register numbers. While we're fixing things, use a
common helper for both insns, extracting the 16-bit scalar
in tcg beforehand.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/sparc/helper.h | 3 +--
target/sparc/translate.c | 38 +++++++++++++++++++++++++++----
target/sparc/vis_helper.c | 47 +++++++++++----------------------------
3 files changed, 48 insertions(+), 40 deletions(-)
diff --git a/target/sparc/helper.h b/target/sparc/helper.h
index 27dc604cac..9cde2b69a5 100644
--- a/target/sparc/helper.h
+++ b/target/sparc/helper.h
@@ -96,8 +96,7 @@ DEF_HELPER_FLAGS_2(fqtox, TCG_CALL_NO_WG, s64, env, i128)
DEF_HELPER_FLAGS_2(fpmerge, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(fmul8x16, TCG_CALL_NO_RWG_SE, i64, i32, i64)
-DEF_HELPER_FLAGS_2(fmul8x16al, TCG_CALL_NO_RWG_SE, i64, i64, i64)
-DEF_HELPER_FLAGS_2(fmul8x16au, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(fmul8x16a, TCG_CALL_NO_RWG_SE, i64, i32, s32)
DEF_HELPER_FLAGS_2(fmul8sux16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(fmul8ulx16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(fmuld8sux16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index c4adc148d2..a8ada6934a 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -45,6 +45,7 @@
# define gen_helper_clear_softint(E, S) qemu_build_not_reached()
# define gen_helper_done(E) qemu_build_not_reached()
# define gen_helper_flushw(E) qemu_build_not_reached()
+# define gen_helper_fmul8x16a(D, S1, S2) qemu_build_not_reached()
# define gen_helper_rdccr(D, E) qemu_build_not_reached()
# define gen_helper_rdcwp(D, E) qemu_build_not_reached()
# define gen_helper_restored(E) qemu_build_not_reached()
@@ -72,8 +73,6 @@
# define gen_helper_fexpand ({ qemu_build_not_reached(); NULL; })
# define gen_helper_fmul8sux16 ({ qemu_build_not_reached(); NULL; })
# define gen_helper_fmul8ulx16 ({ qemu_build_not_reached(); NULL; })
-# define gen_helper_fmul8x16al ({ qemu_build_not_reached(); NULL; })
-# define gen_helper_fmul8x16au ({ qemu_build_not_reached(); NULL; })
# define gen_helper_fmul8x16 ({ qemu_build_not_reached(); NULL; })
# define gen_helper_fmuld8sux16 ({ qemu_build_not_reached(); NULL; })
# define gen_helper_fmuld8ulx16 ({ qemu_build_not_reached(); NULL; })
@@ -719,6 +718,18 @@ static void gen_op_bshuffle(TCGv_i64 dst, TCGv_i64 src1, TCGv_i64 src2)
#endif
}
+static void gen_op_fmul8x16al(TCGv_i64 dst, TCGv_i32 src1, TCGv_i32 src2)
+{
+ tcg_gen_ext16s_i32(src2, src2);
+ gen_helper_fmul8x16a(dst, src1, src2);
+}
+
+static void gen_op_fmul8x16au(TCGv_i64 dst, TCGv_i32 src1, TCGv_i32 src2)
+{
+ tcg_gen_sari_i32(src2, src2, 16);
+ gen_helper_fmul8x16a(dst, src1, src2);
+}
+
static void finishing_insn(DisasContext *dc)
{
/*
@@ -4583,6 +4594,27 @@ TRANS(FSUBs, ALL, do_env_fff, a, gen_helper_fsubs)
TRANS(FMULs, ALL, do_env_fff, a, gen_helper_fmuls)
TRANS(FDIVs, ALL, do_env_fff, a, gen_helper_fdivs)
+static bool do_dff(DisasContext *dc, arg_r_r_r *a,
+ void (*func)(TCGv_i64, TCGv_i32, TCGv_i32))
+{
+ TCGv_i64 dst;
+ TCGv_i32 src1, src2;
+
+ if (gen_trap_ifnofpu(dc)) {
+ return true;
+ }
+
+ dst = gen_dest_fpr_D(dc, a->rd);
+ src1 = gen_load_fpr_F(dc, a->rs1);
+ src2 = gen_load_fpr_F(dc, a->rs2);
+ func(dst, src1, src2);
+ gen_store_fpr_D(dc, a->rd, dst);
+ return advance_pc(dc);
+}
+
+TRANS(FMUL8x16AU, VIS1, do_dff, a, gen_op_fmul8x16au)
+TRANS(FMUL8x16AL, VIS1, do_dff, a, gen_op_fmul8x16al)
+
static bool do_dfd(DisasContext *dc, arg_r_r_r *a,
void (*func)(TCGv_i64, TCGv_i32, TCGv_i64))
{
@@ -4620,8 +4652,6 @@ static bool do_ddd(DisasContext *dc, arg_r_r_r *a,
return advance_pc(dc);
}
-TRANS(FMUL8x16AU, VIS1, do_ddd, a, gen_helper_fmul8x16au)
-TRANS(FMUL8x16AL, VIS1, do_ddd, a, gen_helper_fmul8x16al)
TRANS(FMUL8SUx16, VIS1, do_ddd, a, gen_helper_fmul8sux16)
TRANS(FMUL8ULx16, VIS1, do_ddd, a, gen_helper_fmul8ulx16)
TRANS(FMULD8SUx16, VIS1, do_ddd, a, gen_helper_fmuld8sux16)
diff --git a/target/sparc/vis_helper.c b/target/sparc/vis_helper.c
index 7728ffe9c6..ff2f43c23f 100644
--- a/target/sparc/vis_helper.c
+++ b/target/sparc/vis_helper.c
@@ -119,44 +119,23 @@ uint64_t helper_fmul8x16(uint32_t src1, uint64_t src2)
return d.ll;
}
-uint64_t helper_fmul8x16al(uint64_t src1, uint64_t src2)
+uint64_t helper_fmul8x16a(uint32_t src1, int32_t src2)
{
- VIS64 s, d;
+ VIS32 s;
+ VIS64 d;
uint32_t tmp;
- s.ll = src1;
- d.ll = src2;
+ s.l = src1;
+ d.ll = 0;
-#define PMUL(r) \
- tmp = (int32_t)d.VIS_SW64(1) * (int32_t)s.VIS_B64(r); \
- if ((tmp & 0xff) > 0x7f) { \
- tmp += 0x100; \
- } \
- d.VIS_W64(r) = tmp >> 8;
-
- PMUL(0);
- PMUL(1);
- PMUL(2);
- PMUL(3);
-#undef PMUL
-
- return d.ll;
-}
-
-uint64_t helper_fmul8x16au(uint64_t src1, uint64_t src2)
-{
- VIS64 s, d;
- uint32_t tmp;
-
- s.ll = src1;
- d.ll = src2;
-
-#define PMUL(r) \
- tmp = (int32_t)d.VIS_SW64(0) * (int32_t)s.VIS_B64(r); \
- if ((tmp & 0xff) > 0x7f) { \
- tmp += 0x100; \
- } \
- d.VIS_W64(r) = tmp >> 8;
+#define PMUL(r) \
+ do { \
+ tmp = src2 * (int32_t)s.VIS_B32(r); \
+ if ((tmp & 0xff) > 0x7f) { \
+ tmp += 0x100; \
+ } \
+ d.VIS_W64(r) = tmp >> 8; \
+ } while (0)
PMUL(0);
PMUL(1);
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 5/7] target/sparc: Fix FMULD8*X16
2024-05-02 16:55 [PATCH v2 0/7] target/sparc: vis fixes Richard Henderson
` (3 preceding siblings ...)
2024-05-02 16:55 ` [PATCH v2 4/7] target/sparc: Fix FMUL8x16A{U,L} Richard Henderson
@ 2024-05-02 16:55 ` Richard Henderson
2024-05-06 14:12 ` Philippe Mathieu-Daudé
2024-05-02 16:55 ` [PATCH v2 6/7] target/sparc: Fix FPMERGE Richard Henderson
` (3 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2024-05-02 16:55 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland
Not only do these instructions have f32 inputs, they also do not
perform rounding. Since these are relatively simple, implement
them properly inline.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/sparc/helper.h | 2 --
target/sparc/translate.c | 48 +++++++++++++++++++++++++++++++++++----
target/sparc/vis_helper.c | 46 -------------------------------------
3 files changed, 44 insertions(+), 52 deletions(-)
diff --git a/target/sparc/helper.h b/target/sparc/helper.h
index 9cde2b69a5..fcb9c617b7 100644
--- a/target/sparc/helper.h
+++ b/target/sparc/helper.h
@@ -99,8 +99,6 @@ DEF_HELPER_FLAGS_2(fmul8x16, TCG_CALL_NO_RWG_SE, i64, i32, i64)
DEF_HELPER_FLAGS_2(fmul8x16a, TCG_CALL_NO_RWG_SE, i64, i32, s32)
DEF_HELPER_FLAGS_2(fmul8sux16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_2(fmul8ulx16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
-DEF_HELPER_FLAGS_2(fmuld8sux16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
-DEF_HELPER_FLAGS_2(fmuld8ulx16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
DEF_HELPER_FLAGS_1(fexpand, TCG_CALL_NO_RWG_SE, i64, i32)
DEF_HELPER_FLAGS_3(pdist, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)
DEF_HELPER_FLAGS_2(fpack16, TCG_CALL_NO_RWG_SE, i32, i64, i64)
diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index a8ada6934a..8a2894bb9f 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -74,8 +74,6 @@
# define gen_helper_fmul8sux16 ({ qemu_build_not_reached(); NULL; })
# define gen_helper_fmul8ulx16 ({ qemu_build_not_reached(); NULL; })
# define gen_helper_fmul8x16 ({ qemu_build_not_reached(); NULL; })
-# define gen_helper_fmuld8sux16 ({ qemu_build_not_reached(); NULL; })
-# define gen_helper_fmuld8ulx16 ({ qemu_build_not_reached(); NULL; })
# define gen_helper_fpmerge ({ qemu_build_not_reached(); NULL; })
# define gen_helper_fqtox ({ qemu_build_not_reached(); NULL; })
# define gen_helper_fstox ({ qemu_build_not_reached(); NULL; })
@@ -730,6 +728,48 @@ static void gen_op_fmul8x16au(TCGv_i64 dst, TCGv_i32 src1, TCGv_i32 src2)
gen_helper_fmul8x16a(dst, src1, src2);
}
+static void gen_op_fmuld8ulx16(TCGv_i64 dst, TCGv_i32 src1, TCGv_i32 src2)
+{
+ TCGv_i32 t0 = tcg_temp_new_i32();
+ TCGv_i32 t1 = tcg_temp_new_i32();
+ TCGv_i32 t2 = tcg_temp_new_i32();
+
+ tcg_gen_ext8u_i32(t0, src1);
+ tcg_gen_ext16s_i32(t1, src2);
+ tcg_gen_mul_i32(t0, t0, t1);
+
+ tcg_gen_extract_i32(t1, src1, 16, 8);
+ tcg_gen_sextract_i32(t2, src2, 16, 16);
+ tcg_gen_mul_i32(t1, t1, t2);
+
+ tcg_gen_concat_i32_i64(dst, t0, t1);
+}
+
+static void gen_op_fmuld8sux16(TCGv_i64 dst, TCGv_i32 src1, TCGv_i32 src2)
+{
+ TCGv_i32 t0 = tcg_temp_new_i32();
+ TCGv_i32 t1 = tcg_temp_new_i32();
+ TCGv_i32 t2 = tcg_temp_new_i32();
+
+ /*
+ * The insn description talks about extracting the upper 8 bits
+ * of the signed 16-bit input rs1, performing the multiply, then
+ * shifting left by 8 bits. Instead, zap the lower 8 bits of
+ * the rs1 input, which avoids the need for two shifts.
+ */
+ tcg_gen_ext16s_i32(t0, src1);
+ tcg_gen_andi_i32(t0, t0, ~0xff);
+ tcg_gen_ext16s_i32(t1, src2);
+ tcg_gen_mul_i32(t0, t0, t1);
+
+ tcg_gen_sextract_i32(t1, src1, 16, 16);
+ tcg_gen_andi_i32(t1, t1, ~0xff);
+ tcg_gen_sextract_i32(t2, src2, 16, 16);
+ tcg_gen_mul_i32(t1, t1, t2);
+
+ tcg_gen_concat_i32_i64(dst, t0, t1);
+}
+
static void finishing_insn(DisasContext *dc)
{
/*
@@ -4614,6 +4654,8 @@ static bool do_dff(DisasContext *dc, arg_r_r_r *a,
TRANS(FMUL8x16AU, VIS1, do_dff, a, gen_op_fmul8x16au)
TRANS(FMUL8x16AL, VIS1, do_dff, a, gen_op_fmul8x16al)
+TRANS(FMULD8SUx16, VIS1, do_dff, a, gen_op_fmuld8sux16)
+TRANS(FMULD8ULx16, VIS1, do_dff, a, gen_op_fmuld8ulx16)
static bool do_dfd(DisasContext *dc, arg_r_r_r *a,
void (*func)(TCGv_i64, TCGv_i32, TCGv_i64))
@@ -4654,8 +4696,6 @@ static bool do_ddd(DisasContext *dc, arg_r_r_r *a,
TRANS(FMUL8SUx16, VIS1, do_ddd, a, gen_helper_fmul8sux16)
TRANS(FMUL8ULx16, VIS1, do_ddd, a, gen_helper_fmul8ulx16)
-TRANS(FMULD8SUx16, VIS1, do_ddd, a, gen_helper_fmuld8sux16)
-TRANS(FMULD8ULx16, VIS1, do_ddd, a, gen_helper_fmuld8ulx16)
TRANS(FPMERGE, VIS1, do_ddd, a, gen_helper_fpmerge)
TRANS(FPADD16, VIS1, do_ddd, a, tcg_gen_vec_add16_i64)
diff --git a/target/sparc/vis_helper.c b/target/sparc/vis_helper.c
index ff2f43c23f..61c61c7fea 100644
--- a/target/sparc/vis_helper.c
+++ b/target/sparc/vis_helper.c
@@ -194,52 +194,6 @@ uint64_t helper_fmul8ulx16(uint64_t src1, uint64_t src2)
return d.ll;
}
-uint64_t helper_fmuld8sux16(uint64_t src1, uint64_t src2)
-{
- VIS64 s, d;
- uint32_t tmp;
-
- s.ll = src1;
- d.ll = src2;
-
-#define PMUL(r) \
- tmp = (int32_t)d.VIS_SW64(r) * ((int32_t)s.VIS_SW64(r) >> 8); \
- if ((tmp & 0xff) > 0x7f) { \
- tmp += 0x100; \
- } \
- d.VIS_L64(r) = tmp;
-
- /* Reverse calculation order to handle overlap */
- PMUL(1);
- PMUL(0);
-#undef PMUL
-
- return d.ll;
-}
-
-uint64_t helper_fmuld8ulx16(uint64_t src1, uint64_t src2)
-{
- VIS64 s, d;
- uint32_t tmp;
-
- s.ll = src1;
- d.ll = src2;
-
-#define PMUL(r) \
- tmp = (int32_t)d.VIS_SW64(r) * ((uint32_t)s.VIS_B64(r * 2)); \
- if ((tmp & 0xff) > 0x7f) { \
- tmp += 0x100; \
- } \
- d.VIS_L64(r) = tmp;
-
- /* Reverse calculation order to handle overlap */
- PMUL(1);
- PMUL(0);
-#undef PMUL
-
- return d.ll;
-}
-
uint64_t helper_fexpand(uint32_t src2)
{
VIS32 s;
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 6/7] target/sparc: Fix FPMERGE
2024-05-02 16:55 [PATCH v2 0/7] target/sparc: vis fixes Richard Henderson
` (4 preceding siblings ...)
2024-05-02 16:55 ` [PATCH v2 5/7] target/sparc: Fix FMULD8*X16 Richard Henderson
@ 2024-05-02 16:55 ` Richard Henderson
2024-05-03 18:27 ` Philippe Mathieu-Daudé
2024-05-02 16:55 ` [PATCH v2 7/7] target/sparc: Split out do_ms16b Richard Henderson
` (2 subsequent siblings)
8 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2024-05-02 16:55 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland
This instruction has f32 inputs, which changes the decode
of the register numbers.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/sparc/helper.h | 2 +-
target/sparc/translate.c | 2 +-
target/sparc/vis_helper.c | 27 ++++++++++++++-------------
3 files changed, 16 insertions(+), 15 deletions(-)
diff --git a/target/sparc/helper.h b/target/sparc/helper.h
index fcb9c617b7..97fbf6f66c 100644
--- a/target/sparc/helper.h
+++ b/target/sparc/helper.h
@@ -94,7 +94,7 @@ DEF_HELPER_FLAGS_2(fstox, TCG_CALL_NO_WG, s64, env, f32)
DEF_HELPER_FLAGS_2(fdtox, TCG_CALL_NO_WG, s64, env, f64)
DEF_HELPER_FLAGS_2(fqtox, TCG_CALL_NO_WG, s64, env, i128)
-DEF_HELPER_FLAGS_2(fpmerge, TCG_CALL_NO_RWG_SE, i64, i64, i64)
+DEF_HELPER_FLAGS_2(fpmerge, TCG_CALL_NO_RWG_SE, i64, i32, i32)
DEF_HELPER_FLAGS_2(fmul8x16, TCG_CALL_NO_RWG_SE, i64, i32, i64)
DEF_HELPER_FLAGS_2(fmul8x16a, TCG_CALL_NO_RWG_SE, i64, i32, s32)
DEF_HELPER_FLAGS_2(fmul8sux16, TCG_CALL_NO_RWG_SE, i64, i64, i64)
diff --git a/target/sparc/translate.c b/target/sparc/translate.c
index 8a2894bb9f..99c6f3cc72 100644
--- a/target/sparc/translate.c
+++ b/target/sparc/translate.c
@@ -4656,6 +4656,7 @@ TRANS(FMUL8x16AU, VIS1, do_dff, a, gen_op_fmul8x16au)
TRANS(FMUL8x16AL, VIS1, do_dff, a, gen_op_fmul8x16al)
TRANS(FMULD8SUx16, VIS1, do_dff, a, gen_op_fmuld8sux16)
TRANS(FMULD8ULx16, VIS1, do_dff, a, gen_op_fmuld8ulx16)
+TRANS(FPMERGE, VIS1, do_dff, a, gen_helper_fpmerge)
static bool do_dfd(DisasContext *dc, arg_r_r_r *a,
void (*func)(TCGv_i64, TCGv_i32, TCGv_i64))
@@ -4696,7 +4697,6 @@ static bool do_ddd(DisasContext *dc, arg_r_r_r *a,
TRANS(FMUL8SUx16, VIS1, do_ddd, a, gen_helper_fmul8sux16)
TRANS(FMUL8ULx16, VIS1, do_ddd, a, gen_helper_fmul8ulx16)
-TRANS(FPMERGE, VIS1, do_ddd, a, gen_helper_fpmerge)
TRANS(FPADD16, VIS1, do_ddd, a, tcg_gen_vec_add16_i64)
TRANS(FPADD32, VIS1, do_ddd, a, tcg_gen_vec_add32_i64)
diff --git a/target/sparc/vis_helper.c b/target/sparc/vis_helper.c
index 61c61c7fea..14c665cad6 100644
--- a/target/sparc/vis_helper.c
+++ b/target/sparc/vis_helper.c
@@ -74,22 +74,23 @@ typedef union {
float32 f;
} VIS32;
-uint64_t helper_fpmerge(uint64_t src1, uint64_t src2)
+uint64_t helper_fpmerge(uint32_t src1, uint32_t src2)
{
- VIS64 s, d;
+ VIS32 s1, s2;
+ VIS64 d;
- s.ll = src1;
- d.ll = src2;
+ s1.l = src1;
+ s2.l = src2;
+ d.ll = 0;
- /* Reverse calculation order to handle overlap */
- d.VIS_B64(7) = s.VIS_B64(3);
- d.VIS_B64(6) = d.VIS_B64(3);
- d.VIS_B64(5) = s.VIS_B64(2);
- d.VIS_B64(4) = d.VIS_B64(2);
- d.VIS_B64(3) = s.VIS_B64(1);
- d.VIS_B64(2) = d.VIS_B64(1);
- d.VIS_B64(1) = s.VIS_B64(0);
- /* d.VIS_B64(0) = d.VIS_B64(0); */
+ d.VIS_B64(7) = s1.VIS_B32(3);
+ d.VIS_B64(6) = s2.VIS_B32(3);
+ d.VIS_B64(5) = s1.VIS_B32(2);
+ d.VIS_B64(4) = s2.VIS_B32(2);
+ d.VIS_B64(3) = s1.VIS_B32(1);
+ d.VIS_B64(2) = s2.VIS_B32(1);
+ d.VIS_B64(1) = s1.VIS_B32(0);
+ d.VIS_B64(0) = s2.VIS_B32(0);
return d.ll;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* [PATCH v2 7/7] target/sparc: Split out do_ms16b
2024-05-02 16:55 [PATCH v2 0/7] target/sparc: vis fixes Richard Henderson
` (5 preceding siblings ...)
2024-05-02 16:55 ` [PATCH v2 6/7] target/sparc: Fix FPMERGE Richard Henderson
@ 2024-05-02 16:55 ` Richard Henderson
2024-05-03 19:11 ` Philippe Mathieu-Daudé
2024-05-03 18:18 ` [PATCH v2 0/7] target/sparc: vis fixes Philippe Mathieu-Daudé
2024-05-05 20:05 ` Mark Cave-Ayland
8 siblings, 1 reply; 18+ messages in thread
From: Richard Henderson @ 2024-05-02 16:55 UTC (permalink / raw)
To: qemu-devel; +Cc: mark.cave-ayland
The unit operation for fmul8x16 and friends is described in the
manual as "MS16b". Split that out for clarity. Improve rounding
with an unconditional addition of 0.5 as a fixed-point integer.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
target/sparc/vis_helper.c | 78 ++++++++++++---------------------------
1 file changed, 24 insertions(+), 54 deletions(-)
diff --git a/target/sparc/vis_helper.c b/target/sparc/vis_helper.c
index 14c665cad6..e15c6bb34e 100644
--- a/target/sparc/vis_helper.c
+++ b/target/sparc/vis_helper.c
@@ -44,6 +44,7 @@ target_ulong helper_array8(target_ulong pixel_addr, target_ulong cubesize)
#if HOST_BIG_ENDIAN
#define VIS_B64(n) b[7 - (n)]
+#define VIS_SB64(n) sb[7 - (n)]
#define VIS_W64(n) w[3 - (n)]
#define VIS_SW64(n) sw[3 - (n)]
#define VIS_L64(n) l[1 - (n)]
@@ -51,6 +52,7 @@ target_ulong helper_array8(target_ulong pixel_addr, target_ulong cubesize)
#define VIS_W32(n) w[1 - (n)]
#else
#define VIS_B64(n) b[n]
+#define VIS_SB64(n) sb[n]
#define VIS_W64(n) w[n]
#define VIS_SW64(n) sw[n]
#define VIS_L64(n) l[n]
@@ -60,6 +62,7 @@ target_ulong helper_array8(target_ulong pixel_addr, target_ulong cubesize)
typedef union {
uint8_t b[8];
+ int8_t sb[8];
uint16_t w[4];
int16_t sw[4];
uint32_t l[2];
@@ -95,27 +98,23 @@ uint64_t helper_fpmerge(uint32_t src1, uint32_t src2)
return d.ll;
}
+static inline int do_ms16b(int x, int y)
+{
+ return ((x * y) + 0x80) >> 8;
+}
+
uint64_t helper_fmul8x16(uint32_t src1, uint64_t src2)
{
VIS64 d;
VIS32 s;
- uint32_t tmp;
s.l = src1;
d.ll = src2;
-#define PMUL(r) \
- tmp = (int32_t)d.VIS_SW64(r) * (int32_t)s.VIS_B32(r); \
- if ((tmp & 0xff) > 0x7f) { \
- tmp += 0x100; \
- } \
- d.VIS_W64(r) = tmp >> 8;
-
- PMUL(0);
- PMUL(1);
- PMUL(2);
- PMUL(3);
-#undef PMUL
+ d.VIS_W64(0) = do_ms16b(s.VIS_B32(0), d.VIS_SW64(0));
+ d.VIS_W64(1) = do_ms16b(s.VIS_B32(1), d.VIS_SW64(1));
+ d.VIS_W64(2) = do_ms16b(s.VIS_B32(2), d.VIS_SW64(2));
+ d.VIS_W64(3) = do_ms16b(s.VIS_B32(3), d.VIS_SW64(3));
return d.ll;
}
@@ -124,25 +123,14 @@ uint64_t helper_fmul8x16a(uint32_t src1, int32_t src2)
{
VIS32 s;
VIS64 d;
- uint32_t tmp;
s.l = src1;
d.ll = 0;
-#define PMUL(r) \
- do { \
- tmp = src2 * (int32_t)s.VIS_B32(r); \
- if ((tmp & 0xff) > 0x7f) { \
- tmp += 0x100; \
- } \
- d.VIS_W64(r) = tmp >> 8; \
- } while (0)
-
- PMUL(0);
- PMUL(1);
- PMUL(2);
- PMUL(3);
-#undef PMUL
+ d.VIS_W64(0) = do_ms16b(s.VIS_B32(0), src2);
+ d.VIS_W64(1) = do_ms16b(s.VIS_B32(1), src2);
+ d.VIS_W64(2) = do_ms16b(s.VIS_B32(2), src2);
+ d.VIS_W64(3) = do_ms16b(s.VIS_B32(3), src2);
return d.ll;
}
@@ -150,23 +138,14 @@ uint64_t helper_fmul8x16a(uint32_t src1, int32_t src2)
uint64_t helper_fmul8sux16(uint64_t src1, uint64_t src2)
{
VIS64 s, d;
- uint32_t tmp;
s.ll = src1;
d.ll = src2;
-#define PMUL(r) \
- tmp = (int32_t)d.VIS_SW64(r) * ((int32_t)s.VIS_SW64(r) >> 8); \
- if ((tmp & 0xff) > 0x7f) { \
- tmp += 0x100; \
- } \
- d.VIS_W64(r) = tmp >> 8;
-
- PMUL(0);
- PMUL(1);
- PMUL(2);
- PMUL(3);
-#undef PMUL
+ d.VIS_W64(0) = do_ms16b(s.VIS_SB64(1), d.VIS_SW64(0));
+ d.VIS_W64(1) = do_ms16b(s.VIS_SB64(3), d.VIS_SW64(1));
+ d.VIS_W64(2) = do_ms16b(s.VIS_SB64(5), d.VIS_SW64(2));
+ d.VIS_W64(3) = do_ms16b(s.VIS_SB64(7), d.VIS_SW64(3));
return d.ll;
}
@@ -174,23 +153,14 @@ uint64_t helper_fmul8sux16(uint64_t src1, uint64_t src2)
uint64_t helper_fmul8ulx16(uint64_t src1, uint64_t src2)
{
VIS64 s, d;
- uint32_t tmp;
s.ll = src1;
d.ll = src2;
-#define PMUL(r) \
- tmp = (int32_t)d.VIS_SW64(r) * ((uint32_t)s.VIS_B64(r * 2)); \
- if ((tmp & 0xff) > 0x7f) { \
- tmp += 0x100; \
- } \
- d.VIS_W64(r) = tmp >> 8;
-
- PMUL(0);
- PMUL(1);
- PMUL(2);
- PMUL(3);
-#undef PMUL
+ d.VIS_W64(0) = do_ms16b(s.VIS_B64(0), d.VIS_SW64(0));
+ d.VIS_W64(1) = do_ms16b(s.VIS_B64(2), d.VIS_SW64(1));
+ d.VIS_W64(2) = do_ms16b(s.VIS_B64(4), d.VIS_SW64(2));
+ d.VIS_W64(3) = do_ms16b(s.VIS_B64(6), d.VIS_SW64(3));
return d.ll;
}
--
2.34.1
^ permalink raw reply related [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/7] target/sparc: vis fixes
2024-05-02 16:55 [PATCH v2 0/7] target/sparc: vis fixes Richard Henderson
` (6 preceding siblings ...)
2024-05-02 16:55 ` [PATCH v2 7/7] target/sparc: Split out do_ms16b Richard Henderson
@ 2024-05-03 18:18 ` Philippe Mathieu-Daudé
2024-05-05 20:13 ` Mark Cave-Ayland
2024-05-05 20:05 ` Mark Cave-Ayland
8 siblings, 1 reply; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 18:18 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland
On 2/5/24 18:55, Richard Henderson wrote:
> Split out from my vis4 patch set, with just the bug fixes.
> I've fixed the issue in patch 6, as noticed by Mark, but
> include the follow-up that cleans up all of the macros by
> removing them.
>
>
> r~
>
>
> Richard Henderson (7):
> linux-user/sparc: Add more hwcap bits for sparc64
> target/sparc: Fix FEXPAND
> target/sparc: Fix FMUL8x16
> target/sparc: Fix FMUL8x16A{U,L}
> target/sparc: Fix FMULD8*X16
> target/sparc: Fix FPMERGE
> target/sparc: Split out do_ms16b
I'm wondering about the coverage here, since various patches
fix bugs since VIS intro in commit e9ebed4d41 from 2007, so
being broken for 17 years.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 6/7] target/sparc: Fix FPMERGE
2024-05-02 16:55 ` [PATCH v2 6/7] target/sparc: Fix FPMERGE Richard Henderson
@ 2024-05-03 18:27 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 18:27 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland
On 2/5/24 18:55, Richard Henderson wrote:
> This instruction has f32 inputs, which changes the decode
> of the register numbers.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/sparc/helper.h | 2 +-
> target/sparc/translate.c | 2 +-
> target/sparc/vis_helper.c | 27 ++++++++++++++-------------
> 3 files changed, 16 insertions(+), 15 deletions(-)
Looking at the manual, this call for a gvec implementation.
But then I realized this is v2 and v1 already has it :P
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 3/7] target/sparc: Fix FMUL8x16
2024-05-02 16:55 ` [PATCH v2 3/7] target/sparc: Fix FMUL8x16 Richard Henderson
@ 2024-05-03 18:31 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 18:31 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland
On 2/5/24 18:55, Richard Henderson wrote:
> This instruction has f32 as source1, which alters the
> decoding of the register number, which means we've been
> passing the wrong data for odd register numbers.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/sparc/helper.h | 2 +-
> target/sparc/translate.c | 21 ++++++++++++++++++++-
> target/sparc/vis_helper.c | 9 +++++----
> 3 files changed, 26 insertions(+), 6 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 4/7] target/sparc: Fix FMUL8x16A{U,L}
2024-05-02 16:55 ` [PATCH v2 4/7] target/sparc: Fix FMUL8x16A{U,L} Richard Henderson
@ 2024-05-03 18:32 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 18:32 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland
On 2/5/24 18:55, Richard Henderson wrote:
> These instructions have f32 inputs, which changes the decode
> of the register numbers. While we're fixing things, use a
> common helper for both insns, extracting the 16-bit scalar
> in tcg beforehand.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/sparc/helper.h | 3 +--
> target/sparc/translate.c | 38 +++++++++++++++++++++++++++----
> target/sparc/vis_helper.c | 47 +++++++++++----------------------------
> 3 files changed, 48 insertions(+), 40 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 2/7] target/sparc: Fix FEXPAND
2024-05-02 16:55 ` [PATCH v2 2/7] target/sparc: Fix FEXPAND Richard Henderson
@ 2024-05-03 18:34 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 18:34 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland
On 2/5/24 18:55, Richard Henderson wrote:
> This is a 2-operand instruction, not 3-operand.
> Worse, we took the source from the wrong operand.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/sparc/helper.h | 2 +-
> target/sparc/insns.decode | 2 +-
> target/sparc/translate.c | 20 +++++++++++++++++++-
> target/sparc/vis_helper.c | 6 +++---
> 4 files changed, 24 insertions(+), 6 deletions(-)
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 7/7] target/sparc: Split out do_ms16b
2024-05-02 16:55 ` [PATCH v2 7/7] target/sparc: Split out do_ms16b Richard Henderson
@ 2024-05-03 19:11 ` Philippe Mathieu-Daudé
2024-05-03 19:12 ` Philippe Mathieu-Daudé
0 siblings, 1 reply; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 19:11 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland
On 2/5/24 18:55, Richard Henderson wrote:
> The unit operation for fmul8x16 and friends is described in the
> manual as "MS16b". Split that out for clarity. Improve rounding
> with an unconditional addition of 0.5 as a fixed-point integer.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/sparc/vis_helper.c | 78 ++++++++++++---------------------------
> 1 file changed, 24 insertions(+), 54 deletions(-)
> @@ -150,23 +138,14 @@ uint64_t helper_fmul8x16a(uint32_t src1, int32_t src2)
> uint64_t helper_fmul8sux16(uint64_t src1, uint64_t src2)
> {
> VIS64 s, d;
> - uint32_t tmp;
>
> s.ll = src1;
> d.ll = src2;
>
> -#define PMUL(r) \
> - tmp = (int32_t)d.VIS_SW64(r) * ((int32_t)s.VIS_SW64(r) >> 8); \
> - if ((tmp & 0xff) > 0x7f) { \
> - tmp += 0x100; \
> - } \
> - d.VIS_W64(r) = tmp >> 8;
> -
> - PMUL(0);
> - PMUL(1);
> - PMUL(2);
> - PMUL(3);
> -#undef PMUL
> + d.VIS_W64(0) = do_ms16b(s.VIS_SB64(1), d.VIS_SW64(0));
s.VIS_SB64(1) = upper bit, OK.
> + d.VIS_W64(1) = do_ms16b(s.VIS_SB64(3), d.VIS_SW64(1));
> + d.VIS_W64(2) = do_ms16b(s.VIS_SB64(5), d.VIS_SW64(2));
> + d.VIS_W64(3) = do_ms16b(s.VIS_SB64(7), d.VIS_SW64(3));
>
> return d.ll;
> }
> @@ -174,23 +153,14 @@ uint64_t helper_fmul8sux16(uint64_t src1, uint64_t src2)
> uint64_t helper_fmul8ulx16(uint64_t src1, uint64_t src2)
> {
> VIS64 s, d;
> - uint32_t tmp;
>
> s.ll = src1;
> d.ll = src2;
>
> -#define PMUL(r) \
> - tmp = (int32_t)d.VIS_SW64(r) * ((uint32_t)s.VIS_B64(r * 2)); \
> - if ((tmp & 0xff) > 0x7f) { \
> - tmp += 0x100; \
> - } \
> - d.VIS_W64(r) = tmp >> 8;
> -
> - PMUL(0);
> - PMUL(1);
> - PMUL(2);
> - PMUL(3);
> -#undef PMUL
> + d.VIS_W64(0) = do_ms16b(s.VIS_B64(0), d.VIS_SW64(0));
s.VIS_B64(0) for lower bit, OK.
> + d.VIS_W64(1) = do_ms16b(s.VIS_B64(2), d.VIS_SW64(1));
> + d.VIS_W64(2) = do_ms16b(s.VIS_B64(4), d.VIS_SW64(2));
> + d.VIS_W64(3) = do_ms16b(s.VIS_B64(6), d.VIS_SW64(3));
>
> return d.ll;
> }
Maybe add a comment for high/low bits in fmul8sux16/fmul8ulx16,
as it was not obvious at first. Otherwise,
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 7/7] target/sparc: Split out do_ms16b
2024-05-03 19:11 ` Philippe Mathieu-Daudé
@ 2024-05-03 19:12 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-03 19:12 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland
On 3/5/24 21:11, Philippe Mathieu-Daudé wrote:
> On 2/5/24 18:55, Richard Henderson wrote:
>> The unit operation for fmul8x16 and friends is described in the
>> manual as "MS16b". Split that out for clarity. Improve rounding
>> with an unconditional addition of 0.5 as a fixed-point integer.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/sparc/vis_helper.c | 78 ++++++++++++---------------------------
>> 1 file changed, 24 insertions(+), 54 deletions(-)
>
>
>> @@ -150,23 +138,14 @@ uint64_t helper_fmul8x16a(uint32_t src1, int32_t
>> src2)
>> uint64_t helper_fmul8sux16(uint64_t src1, uint64_t src2)
>> {
>> VIS64 s, d;
>> - uint32_t tmp;
>> s.ll = src1;
>> d.ll = src2;
>> -#define
>> PMUL(r) \
>> - tmp = (int32_t)d.VIS_SW64(r) * ((int32_t)s.VIS_SW64(r) >>
>> 8); \
>> - if ((tmp & 0xff) > 0x7f)
>> { \
>> - tmp +=
>> 0x100; \
>> -
>> } \
>> - d.VIS_W64(r) = tmp >> 8;
>> -
>> - PMUL(0);
>> - PMUL(1);
>> - PMUL(2);
>> - PMUL(3);
>> -#undef PMUL
>> + d.VIS_W64(0) = do_ms16b(s.VIS_SB64(1), d.VIS_SW64(0));
>
> s.VIS_SB64(1) = upper bit, OK.
I meant "bits" (plural)!
>
>> + d.VIS_W64(1) = do_ms16b(s.VIS_SB64(3), d.VIS_SW64(1));
>> + d.VIS_W64(2) = do_ms16b(s.VIS_SB64(5), d.VIS_SW64(2));
>> + d.VIS_W64(3) = do_ms16b(s.VIS_SB64(7), d.VIS_SW64(3));
>> return d.ll;
>> }
>> @@ -174,23 +153,14 @@ uint64_t helper_fmul8sux16(uint64_t src1,
>> uint64_t src2)
>> uint64_t helper_fmul8ulx16(uint64_t src1, uint64_t src2)
>> {
>> VIS64 s, d;
>> - uint32_t tmp;
>> s.ll = src1;
>> d.ll = src2;
>> -#define
>> PMUL(r) \
>> - tmp = (int32_t)d.VIS_SW64(r) * ((uint32_t)s.VIS_B64(r *
>> 2)); \
>> - if ((tmp & 0xff) > 0x7f)
>> { \
>> - tmp +=
>> 0x100; \
>> -
>> } \
>> - d.VIS_W64(r) = tmp >> 8;
>> -
>> - PMUL(0);
>> - PMUL(1);
>> - PMUL(2);
>> - PMUL(3);
>> -#undef PMUL
>> + d.VIS_W64(0) = do_ms16b(s.VIS_B64(0), d.VIS_SW64(0));
>
> s.VIS_B64(0) for lower bit, OK.
Ditto.
>
>> + d.VIS_W64(1) = do_ms16b(s.VIS_B64(2), d.VIS_SW64(1));
>> + d.VIS_W64(2) = do_ms16b(s.VIS_B64(4), d.VIS_SW64(2));
>> + d.VIS_W64(3) = do_ms16b(s.VIS_B64(6), d.VIS_SW64(3));
>> return d.ll;
>> }
>
> Maybe add a comment for high/low bits in fmul8sux16/fmul8ulx16,
> as it was not obvious at first. Otherwise,
>
> Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
>
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/7] target/sparc: vis fixes
2024-05-02 16:55 [PATCH v2 0/7] target/sparc: vis fixes Richard Henderson
` (7 preceding siblings ...)
2024-05-03 18:18 ` [PATCH v2 0/7] target/sparc: vis fixes Philippe Mathieu-Daudé
@ 2024-05-05 20:05 ` Mark Cave-Ayland
8 siblings, 0 replies; 18+ messages in thread
From: Mark Cave-Ayland @ 2024-05-05 20:05 UTC (permalink / raw)
To: Richard Henderson, qemu-devel
On 02/05/2024 17:55, Richard Henderson wrote:
> Split out from my vis4 patch set, with just the bug fixes.
> I've fixed the issue in patch 6, as noticed by Mark, but
> include the follow-up that cleans up all of the macros by
> removing them.
>
> r~
>
>
> Richard Henderson (7):
> linux-user/sparc: Add more hwcap bits for sparc64
> target/sparc: Fix FEXPAND
> target/sparc: Fix FMUL8x16
> target/sparc: Fix FMUL8x16A{U,L}
> target/sparc: Fix FMULD8*X16
> target/sparc: Fix FPMERGE
> target/sparc: Split out do_ms16b
>
> target/sparc/helper.h | 11 +--
> target/sparc/insns.decode | 2 +-
> linux-user/elfload.c | 48 +++++++---
> target/sparc/translate.c | 129 ++++++++++++++++++++++---
> target/sparc/vis_helper.c | 195 ++++++++++----------------------------
> 5 files changed, 207 insertions(+), 178 deletions(-)
Thanks Richard. I've applied v2 to my qmeu-sparc branch, and will send a PR assuming
everything passes in GitLab.
ATB,
Mark.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 0/7] target/sparc: vis fixes
2024-05-03 18:18 ` [PATCH v2 0/7] target/sparc: vis fixes Philippe Mathieu-Daudé
@ 2024-05-05 20:13 ` Mark Cave-Ayland
0 siblings, 0 replies; 18+ messages in thread
From: Mark Cave-Ayland @ 2024-05-05 20:13 UTC (permalink / raw)
To: Philippe Mathieu-Daudé, Richard Henderson, qemu-devel
On 03/05/2024 19:18, Philippe Mathieu-Daudé wrote:
> On 2/5/24 18:55, Richard Henderson wrote:
>> Split out from my vis4 patch set, with just the bug fixes.
>> I've fixed the issue in patch 6, as noticed by Mark, but
>> include the follow-up that cleans up all of the macros by
>> removing them.
>>
>>
>> r~
>>
>>
>> Richard Henderson (7):
>> linux-user/sparc: Add more hwcap bits for sparc64
>> target/sparc: Fix FEXPAND
>> target/sparc: Fix FMUL8x16
>> target/sparc: Fix FMUL8x16A{U,L}
>> target/sparc: Fix FMULD8*X16
>> target/sparc: Fix FPMERGE
>> target/sparc: Split out do_ms16b
>
> I'm wondering about the coverage here, since various patches
> fix bugs since VIS intro in commit e9ebed4d41 from 2007, so
> being broken for 17 years.
I've definitely seen the VIS instructions in use in my test images, however I can't
recall exactly whether it was the particular ones fixed in this series. I'm certainly
keen for some more VIS instruction coverage though, especially for VIS2 and later
which I'm unlikely to come across in my day-to-day testing.
ATB,
Mark.
^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: [PATCH v2 5/7] target/sparc: Fix FMULD8*X16
2024-05-02 16:55 ` [PATCH v2 5/7] target/sparc: Fix FMULD8*X16 Richard Henderson
@ 2024-05-06 14:12 ` Philippe Mathieu-Daudé
0 siblings, 0 replies; 18+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-05-06 14:12 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: mark.cave-ayland
On 2/5/24 18:55, Richard Henderson wrote:
> Not only do these instructions have f32 inputs, they also do not
> perform rounding. Since these are relatively simple, implement
> them properly inline.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/sparc/helper.h | 2 --
> target/sparc/translate.c | 48 +++++++++++++++++++++++++++++++++++----
> target/sparc/vis_helper.c | 46 -------------------------------------
> 3 files changed, 44 insertions(+), 52 deletions(-)
> +static void gen_op_fmuld8ulx16(TCGv_i64 dst, TCGv_i32 src1, TCGv_i32 src2)
> +{
> + TCGv_i32 t0 = tcg_temp_new_i32();
> + TCGv_i32 t1 = tcg_temp_new_i32();
> + TCGv_i32 t2 = tcg_temp_new_i32();
> +
> + tcg_gen_ext8u_i32(t0, src1);
> + tcg_gen_ext16s_i32(t1, src2);
> + tcg_gen_mul_i32(t0, t0, t1);
> +
> + tcg_gen_extract_i32(t1, src1, 16, 8);
> + tcg_gen_sextract_i32(t2, src2, 16, 16);
> + tcg_gen_mul_i32(t1, t1, t2);
> +
> + tcg_gen_concat_i32_i64(dst, t0, t1);
> +}
> +
> +static void gen_op_fmuld8sux16(TCGv_i64 dst, TCGv_i32 src1, TCGv_i32 src2)
> +{
> + TCGv_i32 t0 = tcg_temp_new_i32();
> + TCGv_i32 t1 = tcg_temp_new_i32();
> + TCGv_i32 t2 = tcg_temp_new_i32();
> +
> + /*
> + * The insn description talks about extracting the upper 8 bits
> + * of the signed 16-bit input rs1, performing the multiply, then
> + * shifting left by 8 bits. Instead, zap the lower 8 bits of
> + * the rs1 input, which avoids the need for two shifts.
Nice.
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
> + */
> + tcg_gen_ext16s_i32(t0, src1);
> + tcg_gen_andi_i32(t0, t0, ~0xff);
> + tcg_gen_ext16s_i32(t1, src2);
> + tcg_gen_mul_i32(t0, t0, t1);
> +
> + tcg_gen_sextract_i32(t1, src1, 16, 16);
> + tcg_gen_andi_i32(t1, t1, ~0xff);
> + tcg_gen_sextract_i32(t2, src2, 16, 16);
> + tcg_gen_mul_i32(t1, t1, t2);
> +
> + tcg_gen_concat_i32_i64(dst, t0, t1);
> +}
^ permalink raw reply [flat|nested] 18+ messages in thread
end of thread, other threads:[~2024-05-06 14:13 UTC | newest]
Thread overview: 18+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-02 16:55 [PATCH v2 0/7] target/sparc: vis fixes Richard Henderson
2024-05-02 16:55 ` [PATCH v2 1/7] linux-user/sparc: Add more hwcap bits for sparc64 Richard Henderson
2024-05-02 16:55 ` [PATCH v2 2/7] target/sparc: Fix FEXPAND Richard Henderson
2024-05-03 18:34 ` Philippe Mathieu-Daudé
2024-05-02 16:55 ` [PATCH v2 3/7] target/sparc: Fix FMUL8x16 Richard Henderson
2024-05-03 18:31 ` Philippe Mathieu-Daudé
2024-05-02 16:55 ` [PATCH v2 4/7] target/sparc: Fix FMUL8x16A{U,L} Richard Henderson
2024-05-03 18:32 ` Philippe Mathieu-Daudé
2024-05-02 16:55 ` [PATCH v2 5/7] target/sparc: Fix FMULD8*X16 Richard Henderson
2024-05-06 14:12 ` Philippe Mathieu-Daudé
2024-05-02 16:55 ` [PATCH v2 6/7] target/sparc: Fix FPMERGE Richard Henderson
2024-05-03 18:27 ` Philippe Mathieu-Daudé
2024-05-02 16:55 ` [PATCH v2 7/7] target/sparc: Split out do_ms16b Richard Henderson
2024-05-03 19:11 ` Philippe Mathieu-Daudé
2024-05-03 19:12 ` Philippe Mathieu-Daudé
2024-05-03 18:18 ` [PATCH v2 0/7] target/sparc: vis fixes Philippe Mathieu-Daudé
2024-05-05 20:13 ` Mark Cave-Ayland
2024-05-05 20:05 ` Mark Cave-Ayland
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).