[PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store

* [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store
@ 2023-09-16 22:01 Richard Henderson
  2023-09-16 22:01 ` [PATCH 1/7] tcg: Add C_N2_I1 Richard Henderson
                   ` (7 more replies)
  0 siblings, 8 replies; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan

For TCG-generated code, load into new registers so that the outputs
never overlap the input address; this lets us simplify address
generation for 64-bit user-only.

For TCG out-of-line code, implement the host/ headers for atomic 128-bit
load and store, reducing the cases for which we must raise EXCP_ATOMIC.


r~

Based-on: 20230916171223.521545-1-richard.henderson@linaro.org
("[PULL v2 00/39] tcg patch queue")

Richard Henderson (7):
  tcg: Add C_N2_I1
  tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
  util: Add cpuinfo for loongarch64
  tcg/loongarch64: Use cpuinfo.h
  host/include/loongarch64: Add atomic16 load and store
  accel/tcg: Remove redundant case in store_atom_16
  accel/tcg: Fix condition for store_atom_insert_al16

 .../include/loongarch64/host/atomic128-ldst.h | 52 +++++++++++++++++++
 host/include/loongarch64/host/cpuinfo.h       | 21 ++++++++
 .../loongarch64/host/load-extract-al16-al8.h  | 39 ++++++++++++++
 .../loongarch64/host/store-insert-al16.h      | 12 +++++
 tcg/loongarch64/tcg-target-con-set.h          |  2 +-
 tcg/loongarch64/tcg-target.h                  |  8 +--
 accel/tcg/cputlb.c                            |  2 +-
 tcg/tcg.c                                     |  5 ++
 util/cpuinfo-loongarch.c                      | 35 +++++++++++++
 accel/tcg/ldst_atomicity.c.inc                | 14 ++---
 tcg/loongarch64/tcg-target.c.inc              | 25 +++++----
 util/meson.build                              |  2 +
 12 files changed, 189 insertions(+), 28 deletions(-)
 create mode 100644 host/include/loongarch64/host/atomic128-ldst.h
 create mode 100644 host/include/loongarch64/host/cpuinfo.h
 create mode 100644 host/include/loongarch64/host/load-extract-al16-al8.h
 create mode 100644 host/include/loongarch64/host/store-insert-al16.h
 create mode 100644 util/cpuinfo-loongarch.c

-- 
2.34.1




* [PATCH 1/7] tcg: Add C_N2_I1
  2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
  2023-09-30 11:39   ` Jiajie Chen
  2023-09-16 22:01 ` [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128 Richard Henderson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan

Add a constraint set with two outputs, both allocated in new registers.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/tcg.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/tcg/tcg.c b/tcg/tcg.c
index 604fa9bf3e..fdbf79689a 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -644,6 +644,7 @@ static void tcg_out_movext3(TCGContext *s, const TCGMovExtend *i1,
 #define C_O1_I4(O1, I1, I2, I3, I4)     C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4),
 
 #define C_N1_I2(O1, I1, I2)             C_PFX3(c_n1_i2_, O1, I1, I2),
+#define C_N2_I1(O1, O2, I1)             C_PFX3(c_n2_i1_, O1, O2, I1),
 
 #define C_O2_I1(O1, O2, I1)             C_PFX3(c_o2_i1_, O1, O2, I1),
 #define C_O2_I2(O1, O2, I1, I2)         C_PFX4(c_o2_i2_, O1, O2, I1, I2),
@@ -666,6 +667,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
 #undef C_O1_I3
 #undef C_O1_I4
 #undef C_N1_I2
+#undef C_N2_I1
 #undef C_O2_I1
 #undef C_O2_I2
 #undef C_O2_I3
@@ -685,6 +687,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
 #define C_O1_I4(O1, I1, I2, I3, I4)     { .args_ct_str = { #O1, #I1, #I2, #I3, #I4 } },
 
 #define C_N1_I2(O1, I1, I2)             { .args_ct_str = { "&" #O1, #I1, #I2 } },
+#define C_N2_I1(O1, O2, I1)             { .args_ct_str = { "&" #O1, "&" #O2, #I1 } },
 
 #define C_O2_I1(O1, O2, I1)             { .args_ct_str = { #O1, #O2, #I1 } },
 #define C_O2_I2(O1, O2, I1, I2)         { .args_ct_str = { #O1, #O2, #I1, #I2 } },
@@ -706,6 +709,7 @@ static const TCGTargetOpDef constraint_sets[] = {
 #undef C_O1_I3
 #undef C_O1_I4
 #undef C_N1_I2
+#undef C_N2_I1
 #undef C_O2_I1
 #undef C_O2_I2
 #undef C_O2_I3
@@ -725,6 +729,7 @@ static const TCGTargetOpDef constraint_sets[] = {
 #define C_O1_I4(O1, I1, I2, I3, I4)     C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4)
 
 #define C_N1_I2(O1, I1, I2)             C_PFX3(c_n1_i2_, O1, I1, I2)
+#define C_N2_I1(O1, O2, I1)             C_PFX3(c_n2_i1_, O1, O2, I1)
 
 #define C_O2_I1(O1, O2, I1)             C_PFX3(c_o2_i1_, O1, O2, I1)
 #define C_O2_I2(O1, O2, I1, I2)         C_PFX4(c_o2_i2_, O1, O2, I1, I2)
-- 
2.34.1
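
As a rough illustration of what the new macro expands to, here is a minimal
standalone sketch. It assumes only what the diff above shows: the "new
register" variants prepend "&" to each output constraint string. The type
OpDefSketch and the helper wants_new_reg() are hypothetical names for this
example and are not part of tcg/tcg.c.

```c
#include <assert.h>

/* Hypothetical stand-in for TCGTargetOpDef; only the strings are modelled. */
typedef struct {
    const char *args_ct_str[3];
} OpDefSketch;

/* Plain two-output set: constraint strings are passed through unchanged. */
#define C_O2_I1(O1, O2, I1)  { .args_ct_str = { #O1, #O2, #I1 } }
/* New-register variant: "&" is prepended to both output constraints. */
#define C_N2_I1(O1, O2, I1)  { .args_ct_str = { "&" #O1, "&" #O2, #I1 } }

static const OpDefSketch o2_i1 = C_O2_I1(r, r, r);
static const OpDefSketch n2_i1 = C_N2_I1(r, r, r);

/* Return 1 if argument idx of the set is marked with the "&" prefix. */
static int wants_new_reg(const OpDefSketch *def, int idx)
{
    assert(idx >= 0 && idx < 3);
    return def->args_ct_str[idx][0] == '&';
}
```

With these definitions, both outputs of n2_i1 carry the prefix while the
input and all arguments of o2_i1 do not.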




* [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
  2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
  2023-09-16 22:01 ` [PATCH 1/7] tcg: Add C_N2_I1 Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
  2023-09-30 11:39   ` Jiajie Chen
  2023-09-16 22:01 ` [PATCH 3/7] util: Add cpuinfo for loongarch64 Richard Henderson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan

Use new registers for the output, so that we never overlap
the input address, which could happen for user-only.
This avoids a "tmp = addr + 0" in that case.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/loongarch64/tcg-target-con-set.h |  2 +-
 tcg/loongarch64/tcg-target.c.inc     | 17 +++++++++++------
 2 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 77d62e38e7..cae6c2aad6 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -38,4 +38,4 @@ C_O1_I2(w, w, wM)
 C_O1_I2(w, w, wA)
 C_O1_I3(w, w, w, w)
 C_O1_I4(r, rZ, rJ, rZ, rZ)
-C_O2_I1(r, r, r)
+C_N2_I1(r, r, r)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index b701df50db..40074c46b8 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1105,13 +1105,18 @@ static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi
         }
     } else {
         /* Otherwise use a pair of LD/ST. */
-        tcg_out_opc_add_d(s, TCG_REG_TMP0, h.base, h.index);
+        TCGReg base = h.base;
+        if (h.index != TCG_REG_ZERO) {
+            base = TCG_REG_TMP0;
+            tcg_out_opc_add_d(s, base, h.base, h.index);
+        }
         if (is_ld) {
-            tcg_out_opc_ld_d(s, data_lo, TCG_REG_TMP0, 0);
-            tcg_out_opc_ld_d(s, data_hi, TCG_REG_TMP0, 8);
+            tcg_debug_assert(base != data_lo);
+            tcg_out_opc_ld_d(s, data_lo, base, 0);
+            tcg_out_opc_ld_d(s, data_hi, base, 8);
         } else {
-            tcg_out_opc_st_d(s, data_lo, TCG_REG_TMP0, 0);
-            tcg_out_opc_st_d(s, data_hi, TCG_REG_TMP0, 8);
+            tcg_out_opc_st_d(s, data_lo, base, 0);
+            tcg_out_opc_st_d(s, data_hi, base, 8);
         }
     }
 
@@ -2049,7 +2054,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_qemu_ld_a32_i128:
     case INDEX_op_qemu_ld_a64_i128:
-        return C_O2_I1(r, r, r);
+        return C_N2_I1(r, r, r);
 
     case INDEX_op_qemu_st_a32_i128:
     case INDEX_op_qemu_st_a64_i128:
-- 
2.34.1
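
To see why the load path asserts base != data_lo once TCG_REG_TMP0 is no
longer always used, consider this toy model of a two-instruction load. It is
only a sketch: ToyMachine, toy_ld() and toy_load_pair_ok() are hypothetical
names, memory is word-addressed (so the second offset is 1 rather than 8),
and nothing here is QEMU code.

```c
#include <assert.h>
#include <stdint.h>

enum { NREGS = 4, MEMWORDS = 8 };

typedef struct {
    uint64_t reg[NREGS];
    uint64_t mem[MEMWORDS];   /* word-addressed toy memory */
} ToyMachine;

/* reg[rd] = mem[reg[base] + off], mimicking a single load instruction. */
static void toy_ld(ToyMachine *m, int rd, int base, int off)
{
    uint64_t addr = m->reg[base] + (uint64_t)off;
    assert(addr < MEMWORDS);
    m->reg[rd] = m->mem[addr];
}

/*
 * Issue the pair of loads in the same order as the patch (low half first).
 * Returns 1 if the registers end up holding mem[addr] and mem[addr + 1].
 */
static int toy_load_pair_ok(int data_lo, int data_hi, int base, uint64_t addr)
{
    ToyMachine m = { .mem = { 3, 2, 1, 0, 7, 6, 5, 4 } };

    assert(addr + 1 < MEMWORDS);
    m.reg[base] = addr;
    toy_ld(&m, data_lo, base, 0);   /* clobbers base if data_lo == base */
    toy_ld(&m, data_hi, base, 1);
    return m.reg[data_lo] == m.mem[addr] && m.reg[data_hi] == m.mem[addr + 1];
}
```

In this model only an overlap between data_lo and the base breaks the second
load; an overlap between data_hi and the base is harmless because the base is
overwritten last. That matches the single assertion in the hunk above.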




* [PATCH 3/7] util: Add cpuinfo for loongarch64
  2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
  2023-09-16 22:01 ` [PATCH 1/7] tcg: Add C_N2_I1 Richard Henderson
  2023-09-16 22:01 ` [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128 Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
  2023-09-30 11:40   ` Jiajie Chen
  2023-09-16 22:01 ` [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h Richard Henderson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/loongarch64/host/cpuinfo.h | 21 +++++++++++++++
 util/cpuinfo-loongarch.c                | 35 +++++++++++++++++++++++++
 util/meson.build                        |  2 ++
 3 files changed, 58 insertions(+)
 create mode 100644 host/include/loongarch64/host/cpuinfo.h
 create mode 100644 util/cpuinfo-loongarch.c

diff --git a/host/include/loongarch64/host/cpuinfo.h b/host/include/loongarch64/host/cpuinfo.h
new file mode 100644
index 0000000000..fab664a10b
--- /dev/null
+++ b/host/include/loongarch64/host/cpuinfo.h
@@ -0,0 +1,21 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Host specific cpu identification for LoongArch
+ */
+
+#ifndef HOST_CPUINFO_H
+#define HOST_CPUINFO_H
+
+#define CPUINFO_ALWAYS          (1u << 0)  /* so cpuinfo is nonzero */
+#define CPUINFO_LSX             (1u << 1)
+
+/* Initialized with a constructor. */
+extern unsigned cpuinfo;
+
+/*
+ * We cannot rely on constructor ordering, so other constructors must
+ * use the function interface rather than the variable above.
+ */
+unsigned cpuinfo_init(void);
+
+#endif /* HOST_CPUINFO_H */
diff --git a/util/cpuinfo-loongarch.c b/util/cpuinfo-loongarch.c
new file mode 100644
index 0000000000..08b6d7460c
--- /dev/null
+++ b/util/cpuinfo-loongarch.c
@@ -0,0 +1,35 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Host specific cpu identification for LoongArch.
+ */
+
+#include "qemu/osdep.h"
+#include "host/cpuinfo.h"
+
+#ifdef CONFIG_GETAUXVAL
+# include <sys/auxv.h>
+#else
+# include "elf.h"
+#endif
+#include <asm/hwcap.h>
+
+unsigned cpuinfo;
+
+/* Called both as constructor and (possibly) via other constructors. */
+unsigned __attribute__((constructor)) cpuinfo_init(void)
+{
+    unsigned info = cpuinfo;
+    unsigned long hwcap;
+
+    if (info) {
+        return info;
+    }
+
+    hwcap = qemu_getauxval(AT_HWCAP);
+
+    info = CPUINFO_ALWAYS;
+    info |= (hwcap & HWCAP_LOONGARCH_LSX ? CPUINFO_LSX : 0);
+
+    cpuinfo = info;
+    return info;
+}
diff --git a/util/meson.build b/util/meson.build
index c4827fd70a..b136f02aa0 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -112,6 +112,8 @@ if cpu == 'aarch64'
   util_ss.add(files('cpuinfo-aarch64.c'))
 elif cpu in ['x86', 'x86_64']
   util_ss.add(files('cpuinfo-i386.c'))
+elif cpu == 'loongarch64'
+  util_ss.add(files('cpuinfo-loongarch.c'))
 elif cpu in ['ppc', 'ppc64']
   util_ss.add(files('cpuinfo-ppc.c'))
 endif
-- 
2.34.1
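
The probe-once pattern used by cpuinfo_init() can be sketched in isolation as
follows. This is an assumption-laden illustration, not the patch itself: the
sketch_ names, FAKE_HWCAP_LSX and fake_hwcap() are hypothetical, the hwcap
value is hard-coded instead of coming from qemu_getauxval(AT_HWCAP), and the
constructor attribute is left out so the call order is explicit.

```c
#include <assert.h>

#define SKETCH_ALWAYS   (1u << 0)   /* keeps the cached value nonzero */
#define SKETCH_LSX      (1u << 1)
#define FAKE_HWCAP_LSX  (1ul << 4)  /* hypothetical hwcap bit for this sketch */

static unsigned sketch_cpuinfo;       /* 0 means "not probed yet" */
static unsigned sketch_probe_count;   /* how many times the probe ran */

/* Hypothetical stand-in for the real hwcap query. */
static unsigned long fake_hwcap(void)
{
    sketch_probe_count++;
    return FAKE_HWCAP_LSX;
}

/* Safe to call from any initializer: probes once, then returns the cache. */
static unsigned sketch_cpuinfo_init(void)
{
    unsigned info = sketch_cpuinfo;
    unsigned long hwcap;

    if (info) {
        return info;
    }

    hwcap = fake_hwcap();
    info = SKETCH_ALWAYS;
    info |= (hwcap & FAKE_HWCAP_LSX ? SKETCH_LSX : 0);
    assert(info != 0);

    sketch_cpuinfo = info;
    return info;
}
```

Because the ALWAYS bit makes the cached word nonzero even when no optional
feature is present, zero can double as the "not yet initialized" marker and
repeated calls do not probe again.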




* [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h
  2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
                   ` (2 preceding siblings ...)
  2023-09-16 22:01 ` [PATCH 3/7] util: Add cpuinfo for loongarch64 Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
  2023-09-30 11:41   ` Jiajie Chen
  2023-09-16 22:01 ` [PATCH 5/7] host/include/loongarch64: Add atomic16 load and store Richard Henderson
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/loongarch64/tcg-target.h     | 8 ++++----
 tcg/loongarch64/tcg-target.c.inc | 8 +-------
 2 files changed, 5 insertions(+), 11 deletions(-)

diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 03017672f6..1bea15b02e 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -29,6 +29,8 @@
 #ifndef LOONGARCH_TCG_TARGET_H
 #define LOONGARCH_TCG_TARGET_H
 
+#include "host/cpuinfo.h"
+
 #define TCG_TARGET_INSN_UNIT_SIZE 4
 #define TCG_TARGET_NB_REGS 64
 
@@ -85,8 +87,6 @@ typedef enum {
     TCG_VEC_TMP0 = TCG_REG_V23,
 } TCGReg;
 
-extern bool use_lsx_instructions;
-
 /* used for function call generation */
 #define TCG_REG_CALL_STACK              TCG_REG_SP
 #define TCG_TARGET_STACK_ALIGN          16
@@ -171,10 +171,10 @@ extern bool use_lsx_instructions;
 #define TCG_TARGET_HAS_muluh_i64        1
 #define TCG_TARGET_HAS_mulsh_i64        1
 
-#define TCG_TARGET_HAS_qemu_ldst_i128   use_lsx_instructions
+#define TCG_TARGET_HAS_qemu_ldst_i128   (cpuinfo & CPUINFO_LSX)
 
 #define TCG_TARGET_HAS_v64              0
-#define TCG_TARGET_HAS_v128             use_lsx_instructions
+#define TCG_TARGET_HAS_v128             (cpuinfo & CPUINFO_LSX)
 #define TCG_TARGET_HAS_v256             0
 
 #define TCG_TARGET_HAS_not_vec          1
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 40074c46b8..52f2c26ce1 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -32,8 +32,6 @@
 #include "../tcg-ldst.c.inc"
 #include <asm/hwcap.h>
 
-bool use_lsx_instructions;
-
 #ifdef CONFIG_DEBUG_TCG
 static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
     "zero",
@@ -2316,10 +2314,6 @@ static void tcg_target_init(TCGContext *s)
         exit(EXIT_FAILURE);
     }
 
-    if (hwcap & HWCAP_LOONGARCH_LSX) {
-        use_lsx_instructions = 1;
-    }
-
     tcg_target_available_regs[TCG_TYPE_I32] = ALL_GENERAL_REGS;
     tcg_target_available_regs[TCG_TYPE_I64] = ALL_GENERAL_REGS;
 
@@ -2335,7 +2329,7 @@ static void tcg_target_init(TCGContext *s)
     tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S8);
     tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S9);
 
-    if (use_lsx_instructions) {
+    if (cpuinfo & CPUINFO_LSX) {
         tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
         tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V24);
         tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V25);
-- 
2.34.1




* [PATCH 5/7] host/include/loongarch64: Add atomic16 load and store
  2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
                   ` (3 preceding siblings ...)
  2023-09-16 22:01 ` [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
  2023-09-16 22:01 ` [PATCH 6/7] accel/tcg: Remove redundant case in store_atom_16 Richard Henderson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan

While loongarch64 does not have a 128-bit cmpxchg, it does
have 128-bit atomic load and store via the vector unit.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 .../include/loongarch64/host/atomic128-ldst.h | 52 +++++++++++++++++++
 .../loongarch64/host/load-extract-al16-al8.h  | 39 ++++++++++++++
 .../loongarch64/host/store-insert-al16.h      | 12 +++++
 3 files changed, 103 insertions(+)
 create mode 100644 host/include/loongarch64/host/atomic128-ldst.h
 create mode 100644 host/include/loongarch64/host/load-extract-al16-al8.h
 create mode 100644 host/include/loongarch64/host/store-insert-al16.h

diff --git a/host/include/loongarch64/host/atomic128-ldst.h b/host/include/loongarch64/host/atomic128-ldst.h
new file mode 100644
index 0000000000..9a4a8f8b9e
--- /dev/null
+++ b/host/include/loongarch64/host/atomic128-ldst.h
@@ -0,0 +1,52 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Load/store for 128-bit atomic operations, LoongArch version.
+ *
+ * See docs/devel/atomics.rst for discussion about the guarantees each
+ * atomic primitive is meant to provide.
+ */
+
+#ifndef LOONGARCH_ATOMIC128_LDST_H
+#define LOONGARCH_ATOMIC128_LDST_H
+
+#include "host/cpuinfo.h"
+#include "tcg/debug-assert.h"
+
+#define HAVE_ATOMIC128_RO  likely(cpuinfo & CPUINFO_LSX)
+#define HAVE_ATOMIC128_RW  HAVE_ATOMIC128_RO
+
+/*
+ * As of gcc 13 and clang 16, there is no compiler support for LSX at all.
+ * Use inline assembly throughout.
+ */
+
+static inline Int128 atomic16_read_ro(const Int128 *ptr)
+{
+    uint64_t l, h;
+
+    tcg_debug_assert(HAVE_ATOMIC128_RO);
+    asm("vld $vr0, %2, 0\n\t"
+        "vpickve2gr.d %0, $vr0, 0\n\t"
+        "vpickve2gr.d %1, $vr0, 1"
+	: "=r"(l), "=r"(h) : "r"(ptr), "m"(*ptr) : "f0");
+
+    return int128_make128(l, h);
+}
+
+static inline Int128 atomic16_read_rw(Int128 *ptr)
+{
+    return atomic16_read_ro(ptr);
+}
+
+static inline void atomic16_set(Int128 *ptr, Int128 val)
+{
+    uint64_t l = int128_getlo(val), h = int128_gethi(val);
+
+    tcg_debug_assert(HAVE_ATOMIC128_RW);
+    asm("vinsgr2vr.d $vr0, %1, 0\n\t"
+        "vinsgr2vr.d $vr0, %2, 1\n\t"
+        "vst $vr0, %3, 0"
+	: "=m"(*ptr) : "r"(l), "r"(h), "r"(ptr) : "f0");
+}
+
+#endif /* LOONGARCH_ATOMIC128_LDST_H */
diff --git a/host/include/loongarch64/host/load-extract-al16-al8.h b/host/include/loongarch64/host/load-extract-al16-al8.h
new file mode 100644
index 0000000000..d1fb59d8af
--- /dev/null
+++ b/host/include/loongarch64/host/load-extract-al16-al8.h
@@ -0,0 +1,39 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Atomic extract 64 from 128-bit, LoongArch version.
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef LOONGARCH_LOAD_EXTRACT_AL16_AL8_H
+#define LOONGARCH_LOAD_EXTRACT_AL16_AL8_H
+
+#include "host/cpuinfo.h"
+#include "tcg/debug-assert.h"
+
+/**
+ * load_atom_extract_al16_or_al8:
+ * @pv: host address
+ * @s: object size in bytes, @s <= 8.
+ *
+ * Load @s bytes from @pv, when pv % s != 0.  If [pv, pv+s-1] does not
+ * cross a 16-byte boundary then the access must be 16-byte atomic,
+ * otherwise the access must be 8-byte atomic.
+ */
+static inline uint64_t load_atom_extract_al16_or_al8(void *pv, int s)
+{
+    uintptr_t pi = (uintptr_t)pv;
+    Int128 *ptr_align = (Int128 *)(pi & ~7);
+    int shr = (pi & 7) * 8;
+    uint64_t l, h;
+
+    tcg_debug_assert(HAVE_ATOMIC128_RO);
+    asm("vld $vr0, %2, 0\n\t"
+        "vpickve2gr.d %0, $vr0, 0\n\t"
+        "vpickve2gr.d %1, $vr0, 1"
+	: "=r"(l), "=r"(h) : "r"(ptr_align), "m"(*ptr_align) : "f0");
+
+    return (l >> shr) | (h << (-shr & 63));
+}
+
+#endif /* LOONGARCH_LOAD_EXTRACT_AL16_AL8_H */
diff --git a/host/include/loongarch64/host/store-insert-al16.h b/host/include/loongarch64/host/store-insert-al16.h
new file mode 100644
index 0000000000..919fd8d744
--- /dev/null
+++ b/host/include/loongarch64/host/store-insert-al16.h
@@ -0,0 +1,12 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Atomic store insert into 128-bit, LoongArch version.
+ */
+
+#ifndef LOONGARCH_STORE_INSERT_AL16_H
+#define LOONGARCH_STORE_INSERT_AL16_H
+
+void store_atom_insert_al16(Int128 *ps, Int128 val, Int128 msk)
+    QEMU_ERROR("unsupported atomic");
+
+#endif /* LOONGARCH_STORE_INSERT_AL16_H */
-- 
2.34.1
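
The shift arithmetic at the end of load_atom_extract_al16_or_al8() can be
checked without any vector hardware. The sketch below assumes a little-endian
layout and takes the two 64-bit halves as plain arguments; extract_from_pair()
is a hypothetical helper for this example. The byte offset is restricted to
1..7, which matches the documented precondition pv % s != 0 (an offset of 0
would make the second shift a no-op and wrongly OR in the whole high half).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Given the low and high 64-bit halves of a 16-byte little-endian block,
 * return the 8 bytes starting byte_off bytes into the low half.
 */
static uint64_t extract_from_pair(uint64_t l, uint64_t h, unsigned byte_off)
{
    unsigned shr;

    assert(byte_off >= 1 && byte_off <= 7);
    shr = byte_off * 8;
    /* -shr & 63 equals 64 - shr for shr in 8..56. */
    return (l >> shr) | (h << (-shr & 63));
}
```

For example, with the bytes 11 22 33 44 55 66 77 88 in the low half and
99 00 aa bb cc dd ee ff in the high half, an offset of 3 selects
44 55 66 77 88 99 00 aa. The result always holds 8 bytes; a caller wanting
fewer than 8 is assumed to truncate it.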




* [PATCH 6/7] accel/tcg: Remove redundant case in store_atom_16
  2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
                   ` (4 preceding siblings ...)
  2023-09-16 22:01 ` [PATCH 5/7] host/include/loongarch64: Add atomic16 load and store Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
  2023-09-16 22:01 ` [PATCH 7/7] accel/tcg: Fix condition for store_atom_insert_al16 Richard Henderson
  2023-09-30  2:13 ` [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
  7 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan

We handled the HAVE_ATOMIC128_RW case with atomic16_set at the top of
the function; the only thing left for a host without that support is
to fall through to cpu_loop_exit_atomic.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/ldst_atomicity.c.inc | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 1b793e6935..23d43f62a2 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -1103,10 +1103,6 @@ static void store_atom_16(CPUArchState *env, uintptr_t ra,
         }
         break;
     case MO_128:
-        if (HAVE_ATOMIC128_RW) {
-            atomic16_set(pv, val);
-            return;
-        }
         break;
     default:
         g_assert_not_reached();
-- 
2.34.1




* [PATCH 7/7] accel/tcg: Fix condition for store_atom_insert_al16
  2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
                   ` (5 preceding siblings ...)
  2023-09-16 22:01 ` [PATCH 6/7] accel/tcg: Remove redundant case in store_atom_16 Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
  2023-09-30  2:13 ` [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
  7 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
  To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan

Storing bytes under a mask is fundamentally a cmpxchg, not a straight store.
Use HAVE_CMPXCHG128 instead of HAVE_ATOMIC128_RW.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c             |  2 +-
 accel/tcg/ldst_atomicity.c.inc | 10 +++++-----
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 3270f65c20..3b76626666 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2849,7 +2849,7 @@ static uint64_t do_st16_leN(CPUArchState *env, MMULookupPageData *p,
 
     case MO_ATOM_WITHIN16_PAIR:
         /* Since size > 8, this is the half that must be atomic. */
-        if (!HAVE_ATOMIC128_RW) {
+        if (!HAVE_CMPXCHG128) {
             cpu_loop_exit_atomic(env_cpu(env), ra);
         }
         return store_whole_le16(p->haddr, p->size, val_le);
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 23d43f62a2..5c6e116cfe 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -825,7 +825,7 @@ static uint64_t store_whole_le16(void *pv, int size, Int128 val_le)
     int sh = o * 8;
     Int128 m, v;
 
-    qemu_build_assert(HAVE_ATOMIC128_RW);
+    qemu_build_assert(HAVE_CMPXCHG128);
 
     /* Like MAKE_64BIT_MASK(0, sz), but larger. */
     if (sz <= 64) {
@@ -887,7 +887,7 @@ static void store_atom_2(CPUArchState *env, uintptr_t ra,
             return;
         }
     } else if ((pi & 15) == 7) {
-        if (HAVE_ATOMIC128_RW) {
+        if (HAVE_CMPXCHG128) {
             Int128 v = int128_lshift(int128_make64(val), 56);
             Int128 m = int128_lshift(int128_make64(0xffff), 56);
             store_atom_insert_al16(pv - 7, v, m);
@@ -956,7 +956,7 @@ static void store_atom_4(CPUArchState *env, uintptr_t ra,
                 return;
             }
         } else {
-            if (HAVE_ATOMIC128_RW) {
+            if (HAVE_CMPXCHG128) {
                 store_whole_le16(pv, 4, int128_make64(cpu_to_le32(val)));
                 return;
             }
@@ -1021,7 +1021,7 @@ static void store_atom_8(CPUArchState *env, uintptr_t ra,
         }
         break;
     case MO_64:
-        if (HAVE_ATOMIC128_RW) {
+        if (HAVE_CMPXCHG128) {
             store_whole_le16(pv, 8, int128_make64(cpu_to_le64(val)));
             return;
         }
@@ -1076,7 +1076,7 @@ static void store_atom_16(CPUArchState *env, uintptr_t ra,
         }
         break;
     case -MO_64:
-        if (HAVE_ATOMIC128_RW) {
+        if (HAVE_CMPXCHG128) {
             uint64_t val_le;
             int s2 = pi & 15;
             int s1 = 16 - s2;
-- 
2.34.1
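
The reason a masked store needs compare-and-swap rather than a plain atomic
store can be sketched with a 64-bit word and C11 atomics. This is a scaled-down
stand-in, not the 128-bit store_atom_insert_al16() itself; the helper name
insert_under_mask() and the choice of memory orders are assumptions made for
the example.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Replace only the bits selected by msk with val, leaving the rest as
 * whatever another thread last wrote.  The bits outside the mask must be
 * read and written back unchanged, so this is a read-modify-write.
 */
static void insert_under_mask(_Atomic uint64_t *p, uint64_t val, uint64_t msk)
{
    uint64_t old, upd;

    assert((val & ~msk) == 0);
    old = atomic_load_explicit(p, memory_order_relaxed);
    do {
        upd = (old & ~msk) | val;
        /* On failure, old is refreshed with the current contents. */
    } while (!atomic_compare_exchange_weak_explicit(p, &old, upd,
                                                    memory_order_seq_cst,
                                                    memory_order_relaxed));
}
```

A host that offers only an atomic load and an atomic store of the full width
cannot implement this loop safely, since a concurrent write to the unmasked
bytes between the load and the store would be lost. That is the distinction
the switch from HAVE_ATOMIC128_RW to HAVE_CMPXCHG128 draws.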




* Re: [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store
  2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
                   ` (6 preceding siblings ...)
  2023-09-16 22:01 ` [PATCH 7/7] accel/tcg: Fix condition for store_atom_insert_al16 Richard Henderson
@ 2023-09-30  2:13 ` Richard Henderson
  2023-09-30 19:04   ` WANG Xuerui
  7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2023-09-30  2:13 UTC (permalink / raw)
  To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan

Ping.

r~

On 9/16/23 15:01, Richard Henderson wrote:
> For tcg generated code, use new registers with load so that we never
> overlap the input address, so that we can simplify address build for
> 64-bit user-only.
> 
> For tcg out-of-line code, implement the host/ headers for atomic 128-bit
> load and store, reducing the cases for which we must raise EXCP_ATOMIC.
> 
> 
> r~
> 
> Based-on: 20230916171223.521545-1-richard.henderson@linaro.org
> ("[PULL v2 00/39] tcg patch queue")
> 
> Richard Henderson (7):
>    tcg: Add C_N2_I1
>    tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
>    util: Add cpuinfo for loongarch64
>    tcg/loongarch64: Use cpuinfo.h
>    host/include/loongarch64: Add atomic16 load and store
>    accel/tcg: Remove redundant case in store_atom_16
>    accel/tcg: Fix condition for store_atom_insert_al16
> 
>   .../include/loongarch64/host/atomic128-ldst.h | 52 +++++++++++++++++++
>   host/include/loongarch64/host/cpuinfo.h       | 21 ++++++++
>   .../loongarch64/host/load-extract-al16-al8.h  | 39 ++++++++++++++
>   .../loongarch64/host/store-insert-al16.h      | 12 +++++
>   tcg/loongarch64/tcg-target-con-set.h          |  2 +-
>   tcg/loongarch64/tcg-target.h                  |  8 +--
>   accel/tcg/cputlb.c                            |  2 +-
>   tcg/tcg.c                                     |  5 ++
>   util/cpuinfo-loongarch.c                      | 35 +++++++++++++
>   accel/tcg/ldst_atomicity.c.inc                | 14 ++---
>   tcg/loongarch64/tcg-target.c.inc              | 25 +++++----
>   util/meson.build                              |  2 +
>   12 files changed, 189 insertions(+), 28 deletions(-)
>   create mode 100644 host/include/loongarch64/host/atomic128-ldst.h
>   create mode 100644 host/include/loongarch64/host/cpuinfo.h
>   create mode 100644 host/include/loongarch64/host/load-extract-al16-al8.h
>   create mode 100644 host/include/loongarch64/host/store-insert-al16.h
>   create mode 100644 util/cpuinfo-loongarch.c
> 




* Re: [PATCH 1/7] tcg: Add C_N2_I1
  2023-09-16 22:01 ` [PATCH 1/7] tcg: Add C_N2_I1 Richard Henderson
@ 2023-09-30 11:39   ` Jiajie Chen
  0 siblings, 0 replies; 14+ messages in thread
From: Jiajie Chen @ 2023-09-30 11:39 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: git, gaosong, yangxiaojuan

On 2023/9/17 06:01, Richard Henderson wrote:
> Constraint with two outputs, both in new registers.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/tcg.c | 5 +++++
>   1 file changed, 5 insertions(+)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 604fa9bf3e..fdbf79689a 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -644,6 +644,7 @@ static void tcg_out_movext3(TCGContext *s, const TCGMovExtend *i1,
>   #define C_O1_I4(O1, I1, I2, I3, I4)     C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4),
>   
>   #define C_N1_I2(O1, I1, I2)             C_PFX3(c_n1_i2_, O1, I1, I2),
> +#define C_N2_I1(O1, O2, I1)             C_PFX3(c_n2_i1_, O1, O2, I1),
>   
>   #define C_O2_I1(O1, O2, I1)             C_PFX3(c_o2_i1_, O1, O2, I1),
>   #define C_O2_I2(O1, O2, I1, I2)         C_PFX4(c_o2_i2_, O1, O2, I1, I2),
> @@ -666,6 +667,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
>   #undef C_O1_I3
>   #undef C_O1_I4
>   #undef C_N1_I2
> +#undef C_N2_I1
>   #undef C_O2_I1
>   #undef C_O2_I2
>   #undef C_O2_I3
> @@ -685,6 +687,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
>   #define C_O1_I4(O1, I1, I2, I3, I4)     { .args_ct_str = { #O1, #I1, #I2, #I3, #I4 } },
>   
>   #define C_N1_I2(O1, I1, I2)             { .args_ct_str = { "&" #O1, #I1, #I2 } },
> +#define C_N2_I1(O1, O2, I1)             { .args_ct_str = { "&" #O1, "&" #O2, #I1 } },
>   
>   #define C_O2_I1(O1, O2, I1)             { .args_ct_str = { #O1, #O2, #I1 } },
>   #define C_O2_I2(O1, O2, I1, I2)         { .args_ct_str = { #O1, #O2, #I1, #I2 } },
> @@ -706,6 +709,7 @@ static const TCGTargetOpDef constraint_sets[] = {
>   #undef C_O1_I3
>   #undef C_O1_I4
>   #undef C_N1_I2
> +#undef C_N2_I1
>   #undef C_O2_I1
>   #undef C_O2_I2
>   #undef C_O2_I3
> @@ -725,6 +729,7 @@ static const TCGTargetOpDef constraint_sets[] = {
>   #define C_O1_I4(O1, I1, I2, I3, I4)     C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4)
>   
>   #define C_N1_I2(O1, I1, I2)             C_PFX3(c_n1_i2_, O1, I1, I2)
> +#define C_N2_I1(O1, O2, I1)             C_PFX3(c_n2_i1_, O1, O2, I1)
>   
>   #define C_O2_I1(O1, O2, I1)             C_PFX3(c_o2_i1_, O1, O2, I1)
>   #define C_O2_I2(O1, O2, I1, I2)         C_PFX4(c_o2_i2_, O1, O2, I1, I2)


Reviewed-by: Jiajie Chen <c@jia.je>




* Re: [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
  2023-09-16 22:01 ` [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128 Richard Henderson
@ 2023-09-30 11:39   ` Jiajie Chen
  0 siblings, 0 replies; 14+ messages in thread
From: Jiajie Chen @ 2023-09-30 11:39 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: git, gaosong, yangxiaojuan


On 2023/9/17 06:01, Richard Henderson wrote:
> Use new registers for the output, so that we never overlap
> the input address, which could happen for user-only.
> This avoids a "tmp = addr + 0" in that case.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/loongarch64/tcg-target-con-set.h |  2 +-
>   tcg/loongarch64/tcg-target.c.inc     | 17 +++++++++++------
>   2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
> index 77d62e38e7..cae6c2aad6 100644
> --- a/tcg/loongarch64/tcg-target-con-set.h
> +++ b/tcg/loongarch64/tcg-target-con-set.h
> @@ -38,4 +38,4 @@ C_O1_I2(w, w, wM)
>   C_O1_I2(w, w, wA)
>   C_O1_I3(w, w, w, w)
>   C_O1_I4(r, rZ, rJ, rZ, rZ)
> -C_O2_I1(r, r, r)
> +C_N2_I1(r, r, r)
> diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
> index b701df50db..40074c46b8 100644
> --- a/tcg/loongarch64/tcg-target.c.inc
> +++ b/tcg/loongarch64/tcg-target.c.inc
> @@ -1105,13 +1105,18 @@ static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi
>           }
>       } else {
>           /* Otherwise use a pair of LD/ST. */
> -        tcg_out_opc_add_d(s, TCG_REG_TMP0, h.base, h.index);
> +        TCGReg base = h.base;
> +        if (h.index != TCG_REG_ZERO) {
> +            base = TCG_REG_TMP0;
> +            tcg_out_opc_add_d(s, base, h.base, h.index);
> +        }
>           if (is_ld) {
> -            tcg_out_opc_ld_d(s, data_lo, TCG_REG_TMP0, 0);
> -            tcg_out_opc_ld_d(s, data_hi, TCG_REG_TMP0, 8);
> +            tcg_debug_assert(base != data_lo);
> +            tcg_out_opc_ld_d(s, data_lo, base, 0);
> +            tcg_out_opc_ld_d(s, data_hi, base, 8);
>           } else {
> -            tcg_out_opc_st_d(s, data_lo, TCG_REG_TMP0, 0);
> -            tcg_out_opc_st_d(s, data_hi, TCG_REG_TMP0, 8);
> +            tcg_out_opc_st_d(s, data_lo, base, 0);
> +            tcg_out_opc_st_d(s, data_hi, base, 8);
>           }
>       }
>   
> @@ -2049,7 +2054,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>   
>       case INDEX_op_qemu_ld_a32_i128:
>       case INDEX_op_qemu_ld_a64_i128:
> -        return C_O2_I1(r, r, r);
> +        return C_N2_I1(r, r, r);
>   
>       case INDEX_op_qemu_st_a32_i128:
>       case INDEX_op_qemu_st_a64_i128:


Reviewed-by: Jiajie Chen <c@jia.je>
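
For readers following the constraint change: the LD/LD fallback reads the
base register twice, so the first load must not write to it. The second
output may still alias the base, because it is written last. Below is a toy
model of that ordering, not QEMU code. ToyCpu, toy_ld_d and toy_ld_pair are
hypothetical names, and registers hold offsets into a small buffer rather
than host addresses.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Toy register file; every name in this sketch is hypothetical. */
enum { TOY_NREGS = 8 };

typedef struct {
    uint64_t reg[TOY_NREGS];
    const uint8_t *mem;      /* registers hold byte offsets into mem */
} ToyCpu;

/* Model of a 64-bit load: rd = *(mem + reg[base] + ofs). */
void toy_ld_d(ToyCpu *c, int rd, int base, int64_t ofs)
{
    uint64_t v;

    memcpy(&v, c->mem + c->reg[base] + ofs, sizeof(v));
    c->reg[rd] = v;
}

/* Model of the pair-of-loads path for a 128-bit load. */
void toy_ld_pair(ToyCpu *c, int data_lo, int data_hi, int base)
{
    /* The first load would clobber the address needed by the second. */
    assert(base != data_lo);
    toy_ld_d(c, data_lo, base, 0);
    toy_ld_d(c, data_hi, base, 8);
}
```

With the outputs allocated as new registers, the assertion holds by
construction, so no "tmp = addr + 0" copy is needed when the index is zero.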





* Re: [PATCH 3/7] util: Add cpuinfo for loongarch64
  2023-09-16 22:01 ` [PATCH 3/7] util: Add cpuinfo for loongarch64 Richard Henderson
@ 2023-09-30 11:40   ` Jiajie Chen
  0 siblings, 0 replies; 14+ messages in thread
From: Jiajie Chen @ 2023-09-30 11:40 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: git, gaosong, yangxiaojuan


On 2023/9/17 06:01, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   host/include/loongarch64/host/cpuinfo.h | 21 +++++++++++++++
>   util/cpuinfo-loongarch.c                | 35 +++++++++++++++++++++++++
>   util/meson.build                        |  2 ++
>   3 files changed, 58 insertions(+)
>   create mode 100644 host/include/loongarch64/host/cpuinfo.h
>   create mode 100644 util/cpuinfo-loongarch.c
>
> diff --git a/host/include/loongarch64/host/cpuinfo.h b/host/include/loongarch64/host/cpuinfo.h
> new file mode 100644
> index 0000000000..fab664a10b
> --- /dev/null
> +++ b/host/include/loongarch64/host/cpuinfo.h
> @@ -0,0 +1,21 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + * Host specific cpu identification for LoongArch
> + */
> +
> +#ifndef HOST_CPUINFO_H
> +#define HOST_CPUINFO_H
> +
> +#define CPUINFO_ALWAYS          (1u << 0)  /* so cpuinfo is nonzero */
> +#define CPUINFO_LSX             (1u << 1)
> +
> +/* Initialized with a constructor. */
> +extern unsigned cpuinfo;
> +
> +/*
> + * We cannot rely on constructor ordering, so other constructors must
> + * use the function interface rather than the variable above.
> + */
> +unsigned cpuinfo_init(void);
> +
> +#endif /* HOST_CPUINFO_H */
> diff --git a/util/cpuinfo-loongarch.c b/util/cpuinfo-loongarch.c
> new file mode 100644
> index 0000000000..08b6d7460c
> --- /dev/null
> +++ b/util/cpuinfo-loongarch.c
> @@ -0,0 +1,35 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + * Host specific cpu identification for LoongArch.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "host/cpuinfo.h"
> +
> +#ifdef CONFIG_GETAUXVAL
> +# include <sys/auxv.h>
> +#else
> +# include "elf.h"
> +#endif
> +#include <asm/hwcap.h>
> +
> +unsigned cpuinfo;
> +
> +/* Called both as constructor and (possibly) via other constructors. */
> +unsigned __attribute__((constructor)) cpuinfo_init(void)
> +{
> +    unsigned info = cpuinfo;
> +    unsigned long hwcap;
> +
> +    if (info) {
> +        return info;
> +    }
> +
> +    hwcap = qemu_getauxval(AT_HWCAP);
> +
> +    info = CPUINFO_ALWAYS;
> +    info |= (hwcap & HWCAP_LOONGARCH_LSX ? CPUINFO_LSX : 0);
> +
> +    cpuinfo = info;
> +    return info;
> +}
> diff --git a/util/meson.build b/util/meson.build
> index c4827fd70a..b136f02aa0 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -112,6 +112,8 @@ if cpu == 'aarch64'
>     util_ss.add(files('cpuinfo-aarch64.c'))
>   elif cpu in ['x86', 'x86_64']
>     util_ss.add(files('cpuinfo-i386.c'))
> +elif cpu == 'loongarch64'
> +  util_ss.add(files('cpuinfo-loongarch.c'))
>   elif cpu in ['ppc', 'ppc64']
>     util_ss.add(files('cpuinfo-ppc.c'))
>   endif


Reviewed-by: Jiajie Chen <c@jia.je>
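
The caching pattern in cpuinfo_init() can be sketched in isolation. This is
a minimal stand-alone sketch, not the QEMU implementation: fake_hwcap() and
FAKE_HWCAP_LSX are hypothetical stand-ins for qemu_getauxval(AT_HWCAP) and
HWCAP_LOONGARCH_LSX, and the FEAT_* names are invented for illustration.

```c
#include <assert.h>

#define FEAT_ALWAYS     (1u << 0)   /* keeps the cached word nonzero */
#define FEAT_LSX        (1u << 1)

#define FAKE_HWCAP_LSX  (1ul << 4)  /* hypothetical hwcap bit */

static unsigned feat_cache;         /* zero means "not probed yet" */
static unsigned probe_calls;        /* counts how often we really probe */

/* Hypothetical stand-in for the auxiliary-vector query. */
static unsigned long fake_hwcap(void)
{
    probe_calls++;
    return FAKE_HWCAP_LSX;
}

/* Safe to call from any constructor, in any order, any number of times. */
unsigned feat_init(void)
{
    unsigned info = feat_cache;
    unsigned long hwcap;

    if (info) {
        return info;                /* already probed */
    }

    hwcap = fake_hwcap();

    info = FEAT_ALWAYS;
    info |= (hwcap & FAKE_HWCAP_LSX ? FEAT_LSX : 0);

    feat_cache = info;
    return info;
}
```

Because the "always" bit is set unconditionally, a zero cache can only mean
"not yet initialized", which is what lets other constructors call the
function without depending on constructor ordering.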





* Re: [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h
  2023-09-16 22:01 ` [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h Richard Henderson
@ 2023-09-30 11:41   ` Jiajie Chen
  0 siblings, 0 replies; 14+ messages in thread
From: Jiajie Chen @ 2023-09-30 11:41 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: git, gaosong, yangxiaojuan


On 2023/9/17 06:01, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>   tcg/loongarch64/tcg-target.h     | 8 ++++----
>   tcg/loongarch64/tcg-target.c.inc | 8 +-------
>   2 files changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
> index 03017672f6..1bea15b02e 100644
> --- a/tcg/loongarch64/tcg-target.h
> +++ b/tcg/loongarch64/tcg-target.h
> @@ -29,6 +29,8 @@
>   #ifndef LOONGARCH_TCG_TARGET_H
>   #define LOONGARCH_TCG_TARGET_H
>   
> +#include "host/cpuinfo.h"
> +
>   #define TCG_TARGET_INSN_UNIT_SIZE 4
>   #define TCG_TARGET_NB_REGS 64
>   
> @@ -85,8 +87,6 @@ typedef enum {
>       TCG_VEC_TMP0 = TCG_REG_V23,
>   } TCGReg;
>   
> -extern bool use_lsx_instructions;
> -
>   /* used for function call generation */
>   #define TCG_REG_CALL_STACK              TCG_REG_SP
>   #define TCG_TARGET_STACK_ALIGN          16
> @@ -171,10 +171,10 @@ extern bool use_lsx_instructions;
>   #define TCG_TARGET_HAS_muluh_i64        1
>   #define TCG_TARGET_HAS_mulsh_i64        1
>   
> -#define TCG_TARGET_HAS_qemu_ldst_i128   use_lsx_instructions
> +#define TCG_TARGET_HAS_qemu_ldst_i128   (cpuinfo & CPUINFO_LSX)
>   
>   #define TCG_TARGET_HAS_v64              0
> -#define TCG_TARGET_HAS_v128             use_lsx_instructions
> +#define TCG_TARGET_HAS_v128             (cpuinfo & CPUINFO_LSX)
>   #define TCG_TARGET_HAS_v256             0
>   
>   #define TCG_TARGET_HAS_not_vec          1
> diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
> index 40074c46b8..52f2c26ce1 100644
> --- a/tcg/loongarch64/tcg-target.c.inc
> +++ b/tcg/loongarch64/tcg-target.c.inc
> @@ -32,8 +32,6 @@
>   #include "../tcg-ldst.c.inc"
>   #include <asm/hwcap.h>
>   
> -bool use_lsx_instructions;
> -
>   #ifdef CONFIG_DEBUG_TCG
>   static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
>       "zero",
> @@ -2316,10 +2314,6 @@ static void tcg_target_init(TCGContext *s)
>           exit(EXIT_FAILURE);
>       }
>   
> -    if (hwcap & HWCAP_LOONGARCH_LSX) {
> -        use_lsx_instructions = 1;
> -    }
> -
>       tcg_target_available_regs[TCG_TYPE_I32] = ALL_GENERAL_REGS;
>       tcg_target_available_regs[TCG_TYPE_I64] = ALL_GENERAL_REGS;
>   
> @@ -2335,7 +2329,7 @@ static void tcg_target_init(TCGContext *s)
>       tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S8);
>       tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S9);
>   
> -    if (use_lsx_instructions) {
> +    if (cpuinfo & CPUINFO_LSX) {
>           tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
>           tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V24);
>           tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V25);


Reviewed-by: Jiajie Chen <c@jia.je>
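
One detail worth keeping in mind when reading the new macros: an expression
of the form (cpuinfo & CPUINFO_LSX) evaluates to the mask bit itself, not to
1, so it is only interchangeable with the old bool in boolean contexts. A
small sketch with hypothetical DEMO_* names follows; it does not check how
each user in QEMU consumes the macro.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the feature bits, for illustration only. */
#define DEMO_ALWAYS    (1u << 0)
#define DEMO_LSX       (1u << 1)

static unsigned demo_cpuinfo = DEMO_ALWAYS | DEMO_LSX;

/* Same shape as the new TCG_TARGET_HAS_v128 definition. */
#define DEMO_HAS_V128  (demo_cpuinfo & DEMO_LSX)

/* Conversion to bool normalizes any nonzero value to true. */
bool demo_has_v128(void)
{
    return DEMO_HAS_V128;
}

/* Explicit normalization for callers that need exactly 0 or 1. */
int demo_has_v128_01(void)
{
    return DEMO_HAS_V128 ? 1 : 0;
}
```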





* Re: [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store
  2023-09-30  2:13 ` [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
@ 2023-09-30 19:04   ` WANG Xuerui
  0 siblings, 0 replies; 14+ messages in thread
From: WANG Xuerui @ 2023-09-30 19:04 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel; +Cc: git, c, gaosong, yangxiaojuan

On 9/30/23 10:13, Richard Henderson wrote:
> Ping.
> 
> r~
> 
> On 9/16/23 15:01, Richard Henderson wrote:
>> For tcg generated code, use new registers for the load outputs so that
>> they never overlap the input address; this lets us simplify the address
>> build for 64-bit user-only.
>>
>> For tcg out-of-line code, implement the host/ headers for atomic 128-bit
>> load and store, reducing the cases for which we must raise EXCP_ATOMIC.
>>
>>
>> r~
>>
>> Based-on: 20230916171223.521545-1-richard.henderson@linaro.org
>> ("[PULL v2 00/39] tcg patch queue")
>>
>> Richard Henderson (7):
>>    tcg: Add C_N2_I1
>>    tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
>>    util: Add cpuinfo for loongarch64
>>    tcg/loongarch64: Use cpuinfo.h
>>    host/include/loongarch64: Add atomic16 load and store
>>    accel/tcg: Remove redundant case in store_atom_16
>>    accel/tcg: Fix condition for store_atom_insert_al16
>>
>>   .../include/loongarch64/host/atomic128-ldst.h | 52 +++++++++++++++++++
>>   host/include/loongarch64/host/cpuinfo.h       | 21 ++++++++
>>   .../loongarch64/host/load-extract-al16-al8.h  | 39 ++++++++++++++
>>   .../loongarch64/host/store-insert-al16.h      | 12 +++++
>>   tcg/loongarch64/tcg-target-con-set.h          |  2 +-
>>   tcg/loongarch64/tcg-target.h                  |  8 +--
>>   accel/tcg/cputlb.c                            |  2 +-
>>   tcg/tcg.c                                     |  5 ++
>>   util/cpuinfo-loongarch.c                      | 35 +++++++++++++
>>   accel/tcg/ldst_atomicity.c.inc                | 14 ++---
>>   tcg/loongarch64/tcg-target.c.inc              | 25 +++++----
>>   util/meson.build                              |  2 +
>>   12 files changed, 189 insertions(+), 28 deletions(-)
>>   create mode 100644 host/include/loongarch64/host/atomic128-ldst.h
>>   create mode 100644 host/include/loongarch64/host/cpuinfo.h
>>   create mode 100644 host/include/loongarch64/host/load-extract-al16-al8.h
>>   create mode 100644 host/include/loongarch64/host/store-insert-al16.h
>>   create mode 100644 util/cpuinfo-loongarch.c

Sorry for the delay; I've skimmed through the series and tested on 
Loongson 3C5000L hardware, so

Reviewed-by: WANG Xuerui <git@xen0n.name>



end of thread (~2023-09-30 19:05 UTC)

Thread overview: 14+ messages
2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
2023-09-16 22:01 ` [PATCH 1/7] tcg: Add C_N2_I1 Richard Henderson
2023-09-30 11:39   ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128 Richard Henderson
2023-09-30 11:39   ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 3/7] util: Add cpuinfo for loongarch64 Richard Henderson
2023-09-30 11:40   ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h Richard Henderson
2023-09-30 11:41   ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 5/7] host/include/loongarch64: Add atomic16 load and store Richard Henderson
2023-09-16 22:01 ` [PATCH 6/7] accel/tcg: Remove redundant case in store_atom_16 Richard Henderson
2023-09-16 22:01 ` [PATCH 7/7] accel/tcg: Fix condition for store_atom_insert_al16 Richard Henderson
2023-09-30  2:13 ` [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
2023-09-30 19:04   ` WANG Xuerui
