* [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store
@ 2023-09-16 22:01 Richard Henderson
2023-09-16 22:01 ` [PATCH 1/7] tcg: Add C_N2_I1 Richard Henderson
` (7 more replies)
0 siblings, 8 replies; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan
For tcg generated code, load into new registers so that the outputs
never overlap the input address; this simplifies the address build for
64-bit user-only.
For tcg out-of-line code, implement the host/ headers for atomic 128-bit
load and store, reducing the cases for which we must raise EXCP_ATOMIC.
r~
Based-on: 20230916171223.521545-1-richard.henderson@linaro.org
("[PULL v2 00/39] tcg patch queue")
Richard Henderson (7):
tcg: Add C_N2_I1
tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
util: Add cpuinfo for loongarch64
tcg/loongarch64: Use cpuinfo.h
host/include/loongarch64: Add atomic16 load and store
accel/tcg: Remove redundant case in store_atom_16
accel/tcg: Fix condition for store_atom_insert_al16
.../include/loongarch64/host/atomic128-ldst.h | 52 +++++++++++++++++++
host/include/loongarch64/host/cpuinfo.h | 21 ++++++++
.../loongarch64/host/load-extract-al16-al8.h | 39 ++++++++++++++
.../loongarch64/host/store-insert-al16.h | 12 +++++
tcg/loongarch64/tcg-target-con-set.h | 2 +-
tcg/loongarch64/tcg-target.h | 8 +--
accel/tcg/cputlb.c | 2 +-
tcg/tcg.c | 5 ++
util/cpuinfo-loongarch.c | 35 +++++++++++++
accel/tcg/ldst_atomicity.c.inc | 14 ++---
tcg/loongarch64/tcg-target.c.inc | 25 +++++----
util/meson.build | 2 +
12 files changed, 189 insertions(+), 28 deletions(-)
create mode 100644 host/include/loongarch64/host/atomic128-ldst.h
create mode 100644 host/include/loongarch64/host/cpuinfo.h
create mode 100644 host/include/loongarch64/host/load-extract-al16-al8.h
create mode 100644 host/include/loongarch64/host/store-insert-al16.h
create mode 100644 util/cpuinfo-loongarch.c
--
2.34.1
^ permalink raw reply [flat|nested] 14+ messages in thread
* [PATCH 1/7] tcg: Add C_N2_I1
2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
2023-09-30 11:39 ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128 Richard Henderson
` (6 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan
Constraint with two outputs, both in new registers.
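As a rough illustration of what the new constraint set expresses, here is a standalone sketch (not the code in tcg/tcg.c). The ConstraintSet struct and the wants_new_register() helper are hypothetical stand-ins; only the two macro bodies mirror the patch, minus the trailing comma. The "&" prefix marks an output that must be placed in a new register.

```c
/* Hypothetical stand-in for TCGTargetOpDef; only the strings matter here. */
typedef struct {
    const char *args_ct_str[3];
} ConstraintSet;

/* Same expansions as in the patch, without the trailing comma. */
#define C_O2_I1(O1, O2, I1)  { .args_ct_str = { #O1, #O2, #I1 } }
#define C_N2_I1(O1, O2, I1)  { .args_ct_str = { "&" #O1, "&" #O2, #I1 } }

/* Expands to { "r", "r", "r" }: outputs may reuse the input register. */
static const ConstraintSet plain_outputs = C_O2_I1(r, r, r);

/* Expands to { "&r", "&r", "r" }: both outputs want new registers. */
static const ConstraintSet new_outputs = C_N2_I1(r, r, r);

/* Hypothetical helper: nonzero if argument n carries the "&" marker. */
static int wants_new_register(const ConstraintSet *cs, int n)
{
    return cs->args_ct_str[n][0] == '&';
}
```

With these definitions, wants_new_register(&new_outputs, 0) and wants_new_register(&new_outputs, 1) are nonzero, while the input at index 2 carries no marker.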
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/tcg.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 604fa9bf3e..fdbf79689a 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -644,6 +644,7 @@ static void tcg_out_movext3(TCGContext *s, const TCGMovExtend *i1,
#define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4),
#define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2),
+#define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1),
#define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1),
#define C_O2_I2(O1, O2, I1, I2) C_PFX4(c_o2_i2_, O1, O2, I1, I2),
@@ -666,6 +667,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
#undef C_O1_I3
#undef C_O1_I4
#undef C_N1_I2
+#undef C_N2_I1
#undef C_O2_I1
#undef C_O2_I2
#undef C_O2_I3
@@ -685,6 +687,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
#define C_O1_I4(O1, I1, I2, I3, I4) { .args_ct_str = { #O1, #I1, #I2, #I3, #I4 } },
#define C_N1_I2(O1, I1, I2) { .args_ct_str = { "&" #O1, #I1, #I2 } },
+#define C_N2_I1(O1, O2, I1) { .args_ct_str = { "&" #O1, "&" #O2, #I1 } },
#define C_O2_I1(O1, O2, I1) { .args_ct_str = { #O1, #O2, #I1 } },
#define C_O2_I2(O1, O2, I1, I2) { .args_ct_str = { #O1, #O2, #I1, #I2 } },
@@ -706,6 +709,7 @@ static const TCGTargetOpDef constraint_sets[] = {
#undef C_O1_I3
#undef C_O1_I4
#undef C_N1_I2
+#undef C_N2_I1
#undef C_O2_I1
#undef C_O2_I2
#undef C_O2_I3
@@ -725,6 +729,7 @@ static const TCGTargetOpDef constraint_sets[] = {
#define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4)
#define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2)
+#define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1)
#define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1)
#define C_O2_I2(O1, O2, I1, I2) C_PFX4(c_o2_i2_, O1, O2, I1, I2)
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
2023-09-16 22:01 ` [PATCH 1/7] tcg: Add C_N2_I1 Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
2023-09-30 11:39 ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 3/7] util: Add cpuinfo for loongarch64 Richard Henderson
` (5 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan
Use new registers for the output, so that we never overlap
the input address, which could happen for user-only.
This avoids a "tmp = addr + 0" in that case.
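The reasoning can be modelled in a few lines. This is a simplified sketch, not the backend code: the register numbers, the AddrPlan struct, and both helpers are made up for illustration. It shows that when there is no index register the base can be used directly, and that this is only safe for a load pair if the first destination differs from the base, which is what the new-register constraint guarantees.

```c
#include <stdbool.h>

/* Hypothetical register numbers, for illustration only. */
enum { REG_ZERO = 0, REG_TMP0 = 1, REG_A0 = 4, REG_A1 = 5 };

typedef struct {
    int base;        /* register the LD/ST pair will address */
    bool needs_add;  /* whether an add must be emitted first */
} AddrPlan;

/* Use the guest base directly unless an index must be added in. */
static AddrPlan plan_pair_address(int h_base, int h_index)
{
    AddrPlan p = { .base = h_base, .needs_add = false };

    if (h_index != REG_ZERO) {
        p.base = REG_TMP0;
        p.needs_add = true;
    }
    return p;
}

/*
 * The first load of the pair writes data_lo before the second load
 * reads the base again, so the two must not be the same register.
 */
static bool load_pair_is_safe(AddrPlan p, int data_lo)
{
    return p.base != data_lo;
}
```

In this model, a zero index yields a plan with no add, and the plan is unsafe exactly when data_lo aliases the base, which is the case the patch rules out via the constraint and checks with tcg_debug_assert.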
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/loongarch64/tcg-target-con-set.h | 2 +-
tcg/loongarch64/tcg-target.c.inc | 17 +++++++++++------
2 files changed, 12 insertions(+), 7 deletions(-)
diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
index 77d62e38e7..cae6c2aad6 100644
--- a/tcg/loongarch64/tcg-target-con-set.h
+++ b/tcg/loongarch64/tcg-target-con-set.h
@@ -38,4 +38,4 @@ C_O1_I2(w, w, wM)
C_O1_I2(w, w, wA)
C_O1_I3(w, w, w, w)
C_O1_I4(r, rZ, rJ, rZ, rZ)
-C_O2_I1(r, r, r)
+C_N2_I1(r, r, r)
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index b701df50db..40074c46b8 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -1105,13 +1105,18 @@ static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi
}
} else {
/* Otherwise use a pair of LD/ST. */
- tcg_out_opc_add_d(s, TCG_REG_TMP0, h.base, h.index);
+ TCGReg base = h.base;
+ if (h.index != TCG_REG_ZERO) {
+ base = TCG_REG_TMP0;
+ tcg_out_opc_add_d(s, base, h.base, h.index);
+ }
if (is_ld) {
- tcg_out_opc_ld_d(s, data_lo, TCG_REG_TMP0, 0);
- tcg_out_opc_ld_d(s, data_hi, TCG_REG_TMP0, 8);
+ tcg_debug_assert(base != data_lo);
+ tcg_out_opc_ld_d(s, data_lo, base, 0);
+ tcg_out_opc_ld_d(s, data_hi, base, 8);
} else {
- tcg_out_opc_st_d(s, data_lo, TCG_REG_TMP0, 0);
- tcg_out_opc_st_d(s, data_hi, TCG_REG_TMP0, 8);
+ tcg_out_opc_st_d(s, data_lo, base, 0);
+ tcg_out_opc_st_d(s, data_hi, base, 8);
}
}
@@ -2049,7 +2054,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
case INDEX_op_qemu_ld_a32_i128:
case INDEX_op_qemu_ld_a64_i128:
- return C_O2_I1(r, r, r);
+ return C_N2_I1(r, r, r);
case INDEX_op_qemu_st_a32_i128:
case INDEX_op_qemu_st_a64_i128:
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 3/7] util: Add cpuinfo for loongarch64
2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
2023-09-16 22:01 ` [PATCH 1/7] tcg: Add C_N2_I1 Richard Henderson
2023-09-16 22:01 ` [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128 Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
2023-09-30 11:40 ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h Richard Henderson
` (4 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
host/include/loongarch64/host/cpuinfo.h | 21 +++++++++++++++
util/cpuinfo-loongarch.c | 35 +++++++++++++++++++++++++
util/meson.build | 2 ++
3 files changed, 58 insertions(+)
create mode 100644 host/include/loongarch64/host/cpuinfo.h
create mode 100644 util/cpuinfo-loongarch.c
diff --git a/host/include/loongarch64/host/cpuinfo.h b/host/include/loongarch64/host/cpuinfo.h
new file mode 100644
index 0000000000..fab664a10b
--- /dev/null
+++ b/host/include/loongarch64/host/cpuinfo.h
@@ -0,0 +1,21 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Host specific cpu identification for LoongArch
+ */
+
+#ifndef HOST_CPUINFO_H
+#define HOST_CPUINFO_H
+
+#define CPUINFO_ALWAYS (1u << 0) /* so cpuinfo is nonzero */
+#define CPUINFO_LSX (1u << 1)
+
+/* Initialized with a constructor. */
+extern unsigned cpuinfo;
+
+/*
+ * We cannot rely on constructor ordering, so other constructors must
+ * use the function interface rather than the variable above.
+ */
+unsigned cpuinfo_init(void);
+
+#endif /* HOST_CPUINFO_H */
diff --git a/util/cpuinfo-loongarch.c b/util/cpuinfo-loongarch.c
new file mode 100644
index 0000000000..08b6d7460c
--- /dev/null
+++ b/util/cpuinfo-loongarch.c
@@ -0,0 +1,35 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Host specific cpu identification for LoongArch.
+ */
+
+#include "qemu/osdep.h"
+#include "host/cpuinfo.h"
+
+#ifdef CONFIG_GETAUXVAL
+# include <sys/auxv.h>
+#else
+# include "elf.h"
+#endif
+#include <asm/hwcap.h>
+
+unsigned cpuinfo;
+
+/* Called both as constructor and (possibly) via other constructors. */
+unsigned __attribute__((constructor)) cpuinfo_init(void)
+{
+ unsigned info = cpuinfo;
+ unsigned long hwcap;
+
+ if (info) {
+ return info;
+ }
+
+ hwcap = qemu_getauxval(AT_HWCAP);
+
+ info = CPUINFO_ALWAYS;
+ info |= (hwcap & HWCAP_LOONGARCH_LSX ? CPUINFO_LSX : 0);
+
+ cpuinfo = info;
+ return info;
+}
diff --git a/util/meson.build b/util/meson.build
index c4827fd70a..b136f02aa0 100644
--- a/util/meson.build
+++ b/util/meson.build
@@ -112,6 +112,8 @@ if cpu == 'aarch64'
util_ss.add(files('cpuinfo-aarch64.c'))
elif cpu in ['x86', 'x86_64']
util_ss.add(files('cpuinfo-i386.c'))
+elif cpu == 'loongarch64'
+ util_ss.add(files('cpuinfo-loongarch.c'))
elif cpu in ['ppc', 'ppc64']
util_ss.add(files('cpuinfo-ppc.c'))
endif
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h
2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
` (2 preceding siblings ...)
2023-09-16 22:01 ` [PATCH 3/7] util: Add cpuinfo for loongarch64 Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
2023-09-30 11:41 ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 5/7] host/include/loongarch64: Add atomic16 load and store Richard Henderson
` (3 subsequent siblings)
7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
tcg/loongarch64/tcg-target.h | 8 ++++----
tcg/loongarch64/tcg-target.c.inc | 8 +-------
2 files changed, 5 insertions(+), 11 deletions(-)
diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
index 03017672f6..1bea15b02e 100644
--- a/tcg/loongarch64/tcg-target.h
+++ b/tcg/loongarch64/tcg-target.h
@@ -29,6 +29,8 @@
#ifndef LOONGARCH_TCG_TARGET_H
#define LOONGARCH_TCG_TARGET_H
+#include "host/cpuinfo.h"
+
#define TCG_TARGET_INSN_UNIT_SIZE 4
#define TCG_TARGET_NB_REGS 64
@@ -85,8 +87,6 @@ typedef enum {
TCG_VEC_TMP0 = TCG_REG_V23,
} TCGReg;
-extern bool use_lsx_instructions;
-
/* used for function call generation */
#define TCG_REG_CALL_STACK TCG_REG_SP
#define TCG_TARGET_STACK_ALIGN 16
@@ -171,10 +171,10 @@ extern bool use_lsx_instructions;
#define TCG_TARGET_HAS_muluh_i64 1
#define TCG_TARGET_HAS_mulsh_i64 1
-#define TCG_TARGET_HAS_qemu_ldst_i128 use_lsx_instructions
+#define TCG_TARGET_HAS_qemu_ldst_i128 (cpuinfo & CPUINFO_LSX)
#define TCG_TARGET_HAS_v64 0
-#define TCG_TARGET_HAS_v128 use_lsx_instructions
+#define TCG_TARGET_HAS_v128 (cpuinfo & CPUINFO_LSX)
#define TCG_TARGET_HAS_v256 0
#define TCG_TARGET_HAS_not_vec 1
diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
index 40074c46b8..52f2c26ce1 100644
--- a/tcg/loongarch64/tcg-target.c.inc
+++ b/tcg/loongarch64/tcg-target.c.inc
@@ -32,8 +32,6 @@
#include "../tcg-ldst.c.inc"
#include <asm/hwcap.h>
-bool use_lsx_instructions;
-
#ifdef CONFIG_DEBUG_TCG
static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
"zero",
@@ -2316,10 +2314,6 @@ static void tcg_target_init(TCGContext *s)
exit(EXIT_FAILURE);
}
- if (hwcap & HWCAP_LOONGARCH_LSX) {
- use_lsx_instructions = 1;
- }
-
tcg_target_available_regs[TCG_TYPE_I32] = ALL_GENERAL_REGS;
tcg_target_available_regs[TCG_TYPE_I64] = ALL_GENERAL_REGS;
@@ -2335,7 +2329,7 @@ static void tcg_target_init(TCGContext *s)
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S8);
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S9);
- if (use_lsx_instructions) {
+ if (cpuinfo & CPUINFO_LSX) {
tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V24);
tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V25);
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 5/7] host/include/loongarch64: Add atomic16 load and store
2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
` (3 preceding siblings ...)
2023-09-16 22:01 ` [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
2023-09-16 22:01 ` [PATCH 6/7] accel/tcg: Remove redundant case in store_atom_16 Richard Henderson
` (2 subsequent siblings)
7 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan
While loongarch64 does not have a 128-bit cmpxchg, it does
have 128-bit atomic load and store via the vector unit.
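For the load-extract helper added in this patch, the arithmetic that follows the vector load can be tried out in portable C. This is a sketch of the combining step only, with no atomicity: extract_unaligned64() is a hypothetical name, and the two halves are assumed to come from a little-endian 16-byte load at the 8-byte-aligned address.

```c
#include <stdint.h>

/*
 * l and h are the low and high 64-bit halves of a 16-byte load from
 * (addr & ~7).  byte_off is (addr & 7) and is assumed to be 1..7; with
 * a zero offset the left shift count would be 0 and all of h would be
 * ORed into the result.
 */
static uint64_t extract_unaligned64(uint64_t l, uint64_t h, unsigned byte_off)
{
    unsigned shr = byte_off * 8;

    /* Low part comes from l, the missing high bytes come from h. */
    return (l >> shr) | (h << (-shr & 63));
}
```

For example, with l = 0x0807060504030201 and h = 0x100f0e0d0c0b0a09, an offset of 3 gives 0x0b0a090807060504, that is, bytes 3 through 10 of the 16-byte window. The result always holds 8 bytes; narrowing it to the requested size is left to the caller in this sketch.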
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
.../include/loongarch64/host/atomic128-ldst.h | 52 +++++++++++++++++++
.../loongarch64/host/load-extract-al16-al8.h | 39 ++++++++++++++
.../loongarch64/host/store-insert-al16.h | 12 +++++
3 files changed, 103 insertions(+)
create mode 100644 host/include/loongarch64/host/atomic128-ldst.h
create mode 100644 host/include/loongarch64/host/load-extract-al16-al8.h
create mode 100644 host/include/loongarch64/host/store-insert-al16.h
diff --git a/host/include/loongarch64/host/atomic128-ldst.h b/host/include/loongarch64/host/atomic128-ldst.h
new file mode 100644
index 0000000000..9a4a8f8b9e
--- /dev/null
+++ b/host/include/loongarch64/host/atomic128-ldst.h
@@ -0,0 +1,52 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Load/store for 128-bit atomic operations, LoongArch version.
+ *
+ * See docs/devel/atomics.rst for discussion about the guarantees each
+ * atomic primitive is meant to provide.
+ */
+
+#ifndef LOONGARCH_ATOMIC128_LDST_H
+#define LOONGARCH_ATOMIC128_LDST_H
+
+#include "host/cpuinfo.h"
+#include "tcg/debug-assert.h"
+
+#define HAVE_ATOMIC128_RO likely(cpuinfo & CPUINFO_LSX)
+#define HAVE_ATOMIC128_RW HAVE_ATOMIC128_RO
+
+/*
+ * As of gcc 13 and clang 16, there is no compiler support for LSX at all.
+ * Use inline assembly throughout.
+ */
+
+static inline Int128 atomic16_read_ro(const Int128 *ptr)
+{
+ uint64_t l, h;
+
+ tcg_debug_assert(HAVE_ATOMIC128_RO);
+ asm("vld $vr0, %2, 0\n\t"
+ "vpickve2gr.d %0, $vr0, 0\n\t"
+ "vpickve2gr.d %1, $vr0, 1"
+ : "=r"(l), "=r"(h) : "r"(ptr), "m"(*ptr) : "f0");
+
+ return int128_make128(l, h);
+}
+
+static inline Int128 atomic16_read_rw(Int128 *ptr)
+{
+ return atomic16_read_ro(ptr);
+}
+
+static inline void atomic16_set(Int128 *ptr, Int128 val)
+{
+ uint64_t l = int128_getlo(val), h = int128_gethi(val);
+
+ tcg_debug_assert(HAVE_ATOMIC128_RW);
+ asm("vinsgr2vr.d $vr0, %1, 0\n\t"
+ "vinsgr2vr.d $vr0, %2, 1\n\t"
+ "vst $vr0, %3, 0"
+ : "=m"(*ptr) : "r"(l), "r"(h), "r"(ptr) : "f0");
+}
+
+#endif /* LOONGARCH_ATOMIC128_LDST_H */
diff --git a/host/include/loongarch64/host/load-extract-al16-al8.h b/host/include/loongarch64/host/load-extract-al16-al8.h
new file mode 100644
index 0000000000..d1fb59d8af
--- /dev/null
+++ b/host/include/loongarch64/host/load-extract-al16-al8.h
@@ -0,0 +1,39 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Atomic extract 64 from 128-bit, LoongArch version.
+ *
+ * Copyright (C) 2023 Linaro, Ltd.
+ */
+
+#ifndef LOONGARCH_LOAD_EXTRACT_AL16_AL8_H
+#define LOONGARCH_LOAD_EXTRACT_AL16_AL8_H
+
+#include "host/cpuinfo.h"
+#include "tcg/debug-assert.h"
+
+/**
+ * load_atom_extract_al16_or_al8:
+ * @pv: host address
+ * @s: object size in bytes, @s <= 8.
+ *
+ * Load @s bytes from @pv, when pv % s != 0. If [p, p+s-1] does not
+ * cross a 16-byte boundary then the access must be 16-byte atomic,
+ * otherwise the access must be 8-byte atomic.
+ */
+static inline uint64_t load_atom_extract_al16_or_al8(void *pv, int s)
+{
+ uintptr_t pi = (uintptr_t)pv;
+ Int128 *ptr_align = (Int128 *)(pi & ~7);
+ int shr = (pi & 7) * 8;
+ uint64_t l, h;
+
+ tcg_debug_assert(HAVE_ATOMIC128_RO);
+ asm("vld $vr0, %2, 0\n\t"
+ "vpickve2gr.d %0, $vr0, 0\n\t"
+ "vpickve2gr.d %1, $vr0, 1"
+ : "=r"(l), "=r"(h) : "r"(ptr_align), "m"(*ptr_align) : "f0");
+
+ return (l >> shr) | (h << (-shr & 63));
+}
+
+#endif /* LOONGARCH_LOAD_EXTRACT_AL16_AL8_H */
diff --git a/host/include/loongarch64/host/store-insert-al16.h b/host/include/loongarch64/host/store-insert-al16.h
new file mode 100644
index 0000000000..919fd8d744
--- /dev/null
+++ b/host/include/loongarch64/host/store-insert-al16.h
@@ -0,0 +1,12 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Atomic store insert into 128-bit, LoongArch version.
+ */
+
+#ifndef LOONGARCH_STORE_INSERT_AL16_H
+#define LOONGARCH_STORE_INSERT_AL16_H
+
+void store_atom_insert_al16(Int128 *ps, Int128 val, Int128 msk)
+ QEMU_ERROR("unsupported atomic");
+
+#endif /* LOONGARCH_STORE_INSERT_AL16_H */
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 6/7] accel/tcg: Remove redundant case in store_atom_16
2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
` (4 preceding siblings ...)
2023-09-16 22:01 ` [PATCH 5/7] host/include/loongarch64: Add atomic16 load and store Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
2023-09-16 22:01 ` [PATCH 7/7] accel/tcg: Fix condition for store_atom_insert_al16 Richard Henderson
2023-09-30 2:13 ` [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
7 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan
We handled the HAVE_ATOMIC128_RW case with atomic16_set at the top of
the function; the only thing left for a host without that support is
to fall through to cpu_loop_exit_atomic.
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/ldst_atomicity.c.inc | 4 ----
1 file changed, 4 deletions(-)
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 1b793e6935..23d43f62a2 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -1103,10 +1103,6 @@ static void store_atom_16(CPUArchState *env, uintptr_t ra,
}
break;
case MO_128:
- if (HAVE_ATOMIC128_RW) {
- atomic16_set(pv, val);
- return;
- }
break;
default:
g_assert_not_reached();
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* [PATCH 7/7] accel/tcg: Fix condition for store_atom_insert_al16
2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
` (5 preceding siblings ...)
2023-09-16 22:01 ` [PATCH 6/7] accel/tcg: Remove redundant case in store_atom_16 Richard Henderson
@ 2023-09-16 22:01 ` Richard Henderson
2023-09-30 2:13 ` [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
7 siblings, 0 replies; 14+ messages in thread
From: Richard Henderson @ 2023-09-16 22:01 UTC (permalink / raw)
To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan
Storing bytes under a mask is fundamentally a cmpxchg, not a straight store.
Use HAVE_CMPXCHG128 instead of HAVE_ATOMIC128_RW.
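To see why a masked store needs a compare-and-swap, consider a 64-bit analogue written with C11 atomics. This is a sketch only: store_under_mask64() is a hypothetical name, and the real helper works on 128 bits. The bytes outside the mask must keep whatever value another thread last wrote, so the update has to be a read-modify-write that retries on interference; a host with only an atomic 128-bit store cannot do this.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Replace only the bits of *p selected by msk with those of val. */
static void store_under_mask64(_Atomic uint64_t *p, uint64_t val, uint64_t msk)
{
    uint64_t old = atomic_load_explicit(p, memory_order_relaxed);
    uint64_t new_val;

    do {
        /* Keep the unmasked bits exactly as currently observed. */
        new_val = (old & ~msk) | (val & msk);
        /* On failure, old is refreshed with the current contents. */
    } while (!atomic_compare_exchange_weak(p, &old, new_val));
}
```

A plain atomic store of new_val would silently overwrite a concurrent update to the unmasked bytes, which is why the condition in this patch tests for cmpxchg support rather than for atomic load/store.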
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
accel/tcg/cputlb.c | 2 +-
accel/tcg/ldst_atomicity.c.inc | 10 +++++-----
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index 3270f65c20..3b76626666 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -2849,7 +2849,7 @@ static uint64_t do_st16_leN(CPUArchState *env, MMULookupPageData *p,
case MO_ATOM_WITHIN16_PAIR:
/* Since size > 8, this is the half that must be atomic. */
- if (!HAVE_ATOMIC128_RW) {
+ if (!HAVE_CMPXCHG128) {
cpu_loop_exit_atomic(env_cpu(env), ra);
}
return store_whole_le16(p->haddr, p->size, val_le);
diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 23d43f62a2..5c6e116cfe 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -825,7 +825,7 @@ static uint64_t store_whole_le16(void *pv, int size, Int128 val_le)
int sh = o * 8;
Int128 m, v;
- qemu_build_assert(HAVE_ATOMIC128_RW);
+ qemu_build_assert(HAVE_CMPXCHG128);
/* Like MAKE_64BIT_MASK(0, sz), but larger. */
if (sz <= 64) {
@@ -887,7 +887,7 @@ static void store_atom_2(CPUArchState *env, uintptr_t ra,
return;
}
} else if ((pi & 15) == 7) {
- if (HAVE_ATOMIC128_RW) {
+ if (HAVE_CMPXCHG128) {
Int128 v = int128_lshift(int128_make64(val), 56);
Int128 m = int128_lshift(int128_make64(0xffff), 56);
store_atom_insert_al16(pv - 7, v, m);
@@ -956,7 +956,7 @@ static void store_atom_4(CPUArchState *env, uintptr_t ra,
return;
}
} else {
- if (HAVE_ATOMIC128_RW) {
+ if (HAVE_CMPXCHG128) {
store_whole_le16(pv, 4, int128_make64(cpu_to_le32(val)));
return;
}
@@ -1021,7 +1021,7 @@ static void store_atom_8(CPUArchState *env, uintptr_t ra,
}
break;
case MO_64:
- if (HAVE_ATOMIC128_RW) {
+ if (HAVE_CMPXCHG128) {
store_whole_le16(pv, 8, int128_make64(cpu_to_le64(val)));
return;
}
@@ -1076,7 +1076,7 @@ static void store_atom_16(CPUArchState *env, uintptr_t ra,
}
break;
case -MO_64:
- if (HAVE_ATOMIC128_RW) {
+ if (HAVE_CMPXCHG128) {
uint64_t val_le;
int s2 = pi & 15;
int s1 = 16 - s2;
--
2.34.1
^ permalink raw reply related [flat|nested] 14+ messages in thread
* Re: [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store
2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
` (6 preceding siblings ...)
2023-09-16 22:01 ` [PATCH 7/7] accel/tcg: Fix condition for store_atom_insert_al16 Richard Henderson
@ 2023-09-30 2:13 ` Richard Henderson
2023-09-30 19:04 ` WANG Xuerui
7 siblings, 1 reply; 14+ messages in thread
From: Richard Henderson @ 2023-09-30 2:13 UTC (permalink / raw)
To: qemu-devel; +Cc: git, c, gaosong, yangxiaojuan
Ping.
r~
On 9/16/23 15:01, Richard Henderson wrote:
> For tcg generated code, load into new registers so that the outputs
> never overlap the input address; this simplifies the address build for
> 64-bit user-only.
>
> For tcg out-of-line code, implement the host/ headers for atomic 128-bit
> load and store, reducing the cases for which we must raise EXCP_ATOMIC.
>
>
> r~
>
> Based-on: 20230916171223.521545-1-richard.henderson@linaro.org
> ("[PULL v2 00/39] tcg patch queue")
>
> Richard Henderson (7):
> tcg: Add C_N2_I1
> tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
> util: Add cpuinfo for loongarch64
> tcg/loongarch64: Use cpuinfo.h
> host/include/loongarch64: Add atomic16 load and store
> accel/tcg: Remove redundant case in store_atom_16
> accel/tcg: Fix condition for store_atom_insert_al16
>
> .../include/loongarch64/host/atomic128-ldst.h | 52 +++++++++++++++++++
> host/include/loongarch64/host/cpuinfo.h | 21 ++++++++
> .../loongarch64/host/load-extract-al16-al8.h | 39 ++++++++++++++
> .../loongarch64/host/store-insert-al16.h | 12 +++++
> tcg/loongarch64/tcg-target-con-set.h | 2 +-
> tcg/loongarch64/tcg-target.h | 8 +--
> accel/tcg/cputlb.c | 2 +-
> tcg/tcg.c | 5 ++
> util/cpuinfo-loongarch.c | 35 +++++++++++++
> accel/tcg/ldst_atomicity.c.inc | 14 ++---
> tcg/loongarch64/tcg-target.c.inc | 25 +++++----
> util/meson.build | 2 +
> 12 files changed, 189 insertions(+), 28 deletions(-)
> create mode 100644 host/include/loongarch64/host/atomic128-ldst.h
> create mode 100644 host/include/loongarch64/host/cpuinfo.h
> create mode 100644 host/include/loongarch64/host/load-extract-al16-al8.h
> create mode 100644 host/include/loongarch64/host/store-insert-al16.h
> create mode 100644 util/cpuinfo-loongarch.c
>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 1/7] tcg: Add C_N2_I1
2023-09-16 22:01 ` [PATCH 1/7] tcg: Add C_N2_I1 Richard Henderson
@ 2023-09-30 11:39 ` Jiajie Chen
0 siblings, 0 replies; 14+ messages in thread
From: Jiajie Chen @ 2023-09-30 11:39 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: git, gaosong, yangxiaojuan
On 2023/9/17 06:01, Richard Henderson wrote:
> Constraint with two outputs, both in new registers.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> tcg/tcg.c | 5 +++++
> 1 file changed, 5 insertions(+)
>
> diff --git a/tcg/tcg.c b/tcg/tcg.c
> index 604fa9bf3e..fdbf79689a 100644
> --- a/tcg/tcg.c
> +++ b/tcg/tcg.c
> @@ -644,6 +644,7 @@ static void tcg_out_movext3(TCGContext *s, const TCGMovExtend *i1,
> #define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4),
>
> #define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2),
> +#define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1),
>
> #define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1),
> #define C_O2_I2(O1, O2, I1, I2) C_PFX4(c_o2_i2_, O1, O2, I1, I2),
> @@ -666,6 +667,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
> #undef C_O1_I3
> #undef C_O1_I4
> #undef C_N1_I2
> +#undef C_N2_I1
> #undef C_O2_I1
> #undef C_O2_I2
> #undef C_O2_I3
> @@ -685,6 +687,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
> #define C_O1_I4(O1, I1, I2, I3, I4) { .args_ct_str = { #O1, #I1, #I2, #I3, #I4 } },
>
> #define C_N1_I2(O1, I1, I2) { .args_ct_str = { "&" #O1, #I1, #I2 } },
> +#define C_N2_I1(O1, O2, I1) { .args_ct_str = { "&" #O1, "&" #O2, #I1 } },
>
> #define C_O2_I1(O1, O2, I1) { .args_ct_str = { #O1, #O2, #I1 } },
> #define C_O2_I2(O1, O2, I1, I2) { .args_ct_str = { #O1, #O2, #I1, #I2 } },
> @@ -706,6 +709,7 @@ static const TCGTargetOpDef constraint_sets[] = {
> #undef C_O1_I3
> #undef C_O1_I4
> #undef C_N1_I2
> +#undef C_N2_I1
> #undef C_O2_I1
> #undef C_O2_I2
> #undef C_O2_I3
> @@ -725,6 +729,7 @@ static const TCGTargetOpDef constraint_sets[] = {
> #define C_O1_I4(O1, I1, I2, I3, I4) C_PFX5(c_o1_i4_, O1, I1, I2, I3, I4)
>
> #define C_N1_I2(O1, I1, I2) C_PFX3(c_n1_i2_, O1, I1, I2)
> +#define C_N2_I1(O1, O2, I1) C_PFX3(c_n2_i1_, O1, O2, I1)
>
> #define C_O2_I1(O1, O2, I1) C_PFX3(c_o2_i1_, O1, O2, I1)
> #define C_O2_I2(O1, O2, I1, I2) C_PFX4(c_o2_i2_, O1, O2, I1, I2)
Reviewed-by: Jiajie Chen <c@jia.je>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
2023-09-16 22:01 ` [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128 Richard Henderson
@ 2023-09-30 11:39 ` Jiajie Chen
0 siblings, 0 replies; 14+ messages in thread
From: Jiajie Chen @ 2023-09-30 11:39 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: git, gaosong, yangxiaojuan
On 2023/9/17 06:01, Richard Henderson wrote:
> Use new registers for the output, so that we never overlap
> the input address, which could happen for user-only.
> This avoids a "tmp = addr + 0" in that case.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> tcg/loongarch64/tcg-target-con-set.h | 2 +-
> tcg/loongarch64/tcg-target.c.inc | 17 +++++++++++------
> 2 files changed, 12 insertions(+), 7 deletions(-)
>
> diff --git a/tcg/loongarch64/tcg-target-con-set.h b/tcg/loongarch64/tcg-target-con-set.h
> index 77d62e38e7..cae6c2aad6 100644
> --- a/tcg/loongarch64/tcg-target-con-set.h
> +++ b/tcg/loongarch64/tcg-target-con-set.h
> @@ -38,4 +38,4 @@ C_O1_I2(w, w, wM)
> C_O1_I2(w, w, wA)
> C_O1_I3(w, w, w, w)
> C_O1_I4(r, rZ, rJ, rZ, rZ)
> -C_O2_I1(r, r, r)
> +C_N2_I1(r, r, r)
> diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
> index b701df50db..40074c46b8 100644
> --- a/tcg/loongarch64/tcg-target.c.inc
> +++ b/tcg/loongarch64/tcg-target.c.inc
> @@ -1105,13 +1105,18 @@ static void tcg_out_qemu_ldst_i128(TCGContext *s, TCGReg data_lo, TCGReg data_hi
> }
> } else {
> /* Otherwise use a pair of LD/ST. */
> - tcg_out_opc_add_d(s, TCG_REG_TMP0, h.base, h.index);
> + TCGReg base = h.base;
> + if (h.index != TCG_REG_ZERO) {
> + base = TCG_REG_TMP0;
> + tcg_out_opc_add_d(s, base, h.base, h.index);
> + }
> if (is_ld) {
> - tcg_out_opc_ld_d(s, data_lo, TCG_REG_TMP0, 0);
> - tcg_out_opc_ld_d(s, data_hi, TCG_REG_TMP0, 8);
> + tcg_debug_assert(base != data_lo);
> + tcg_out_opc_ld_d(s, data_lo, base, 0);
> + tcg_out_opc_ld_d(s, data_hi, base, 8);
> } else {
> - tcg_out_opc_st_d(s, data_lo, TCG_REG_TMP0, 0);
> - tcg_out_opc_st_d(s, data_hi, TCG_REG_TMP0, 8);
> + tcg_out_opc_st_d(s, data_lo, base, 0);
> + tcg_out_opc_st_d(s, data_hi, base, 8);
> }
> }
>
> @@ -2049,7 +2054,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
>
> case INDEX_op_qemu_ld_a32_i128:
> case INDEX_op_qemu_ld_a64_i128:
> - return C_O2_I1(r, r, r);
> + return C_N2_I1(r, r, r);
>
> case INDEX_op_qemu_st_a32_i128:
> case INDEX_op_qemu_st_a64_i128:
Reviewed-by: Jiajie Chen <c@jia.je>
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH 3/7] util: Add cpuinfo for loongarch64
2023-09-16 22:01 ` [PATCH 3/7] util: Add cpuinfo for loongarch64 Richard Henderson
@ 2023-09-30 11:40 ` Jiajie Chen
0 siblings, 0 replies; 14+ messages in thread
From: Jiajie Chen @ 2023-09-30 11:40 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: git, gaosong, yangxiaojuan
On 2023/9/17 06:01, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> host/include/loongarch64/host/cpuinfo.h | 21 +++++++++++++++
> util/cpuinfo-loongarch.c | 35 +++++++++++++++++++++++++
> util/meson.build | 2 ++
> 3 files changed, 58 insertions(+)
> create mode 100644 host/include/loongarch64/host/cpuinfo.h
> create mode 100644 util/cpuinfo-loongarch.c
>
> diff --git a/host/include/loongarch64/host/cpuinfo.h b/host/include/loongarch64/host/cpuinfo.h
> new file mode 100644
> index 0000000000..fab664a10b
> --- /dev/null
> +++ b/host/include/loongarch64/host/cpuinfo.h
> @@ -0,0 +1,21 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + * Host specific cpu identification for LoongArch
> + */
> +
> +#ifndef HOST_CPUINFO_H
> +#define HOST_CPUINFO_H
> +
> +#define CPUINFO_ALWAYS (1u << 0) /* so cpuinfo is nonzero */
> +#define CPUINFO_LSX (1u << 1)
> +
> +/* Initialized with a constructor. */
> +extern unsigned cpuinfo;
> +
> +/*
> + * We cannot rely on constructor ordering, so other constructors must
> + * use the function interface rather than the variable above.
> + */
> +unsigned cpuinfo_init(void);
> +
> +#endif /* HOST_CPUINFO_H */
> diff --git a/util/cpuinfo-loongarch.c b/util/cpuinfo-loongarch.c
> new file mode 100644
> index 0000000000..08b6d7460c
> --- /dev/null
> +++ b/util/cpuinfo-loongarch.c
> @@ -0,0 +1,35 @@
> +/*
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + * Host specific cpu identification for LoongArch.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "host/cpuinfo.h"
> +
> +#ifdef CONFIG_GETAUXVAL
> +# include <sys/auxv.h>
> +#else
> +# include "elf.h"
> +#endif
> +#include <asm/hwcap.h>
> +
> +unsigned cpuinfo;
> +
> +/* Called both as constructor and (possibly) via other constructors. */
> +unsigned __attribute__((constructor)) cpuinfo_init(void)
> +{
> + unsigned info = cpuinfo;
> + unsigned long hwcap;
> +
> + if (info) {
> + return info;
> + }
> +
> + hwcap = qemu_getauxval(AT_HWCAP);
> +
> + info = CPUINFO_ALWAYS;
> + info |= (hwcap & HWCAP_LOONGARCH_LSX ? CPUINFO_LSX : 0);
> +
> + cpuinfo = info;
> + return info;
> +}
> diff --git a/util/meson.build b/util/meson.build
> index c4827fd70a..b136f02aa0 100644
> --- a/util/meson.build
> +++ b/util/meson.build
> @@ -112,6 +112,8 @@ if cpu == 'aarch64'
> util_ss.add(files('cpuinfo-aarch64.c'))
> elif cpu in ['x86', 'x86_64']
> util_ss.add(files('cpuinfo-i386.c'))
> +elif cpu == 'loongarch64'
> + util_ss.add(files('cpuinfo-loongarch.c'))
> elif cpu in ['ppc', 'ppc64']
> util_ss.add(files('cpuinfo-ppc.c'))
> endif
Reviewed-by: Jiajie Chen <c@jia.je>
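
For readers following the thread, the idempotent constructor pattern in the quoted cpuinfo_init() can be sketched in isolation. This is a minimal illustration, not the patch itself: demo_read_hwcap() is a hypothetical stand-in for qemu_getauxval(AT_HWCAP), DEMO_HWCAP_LSX is a stand-in for HWCAP_LOONGARCH_LSX so the sketch builds on any host, and demo_probe_count exists only to make the caching visible. It assumes a GCC- or Clang-compatible compiler for __attribute__((constructor)).

```c
/* Same bit layout as the quoted header. */
#define CPUINFO_ALWAYS (1u << 0)   /* keeps the cached value nonzero */
#define CPUINFO_LSX    (1u << 1)

/* Assumption: stand-in for HWCAP_LOONGARCH_LSX from <asm/hwcap.h>. */
#define DEMO_HWCAP_LSX (1ul << 4)

static unsigned demo_cpuinfo;      /* zero until the first probe */
static unsigned demo_probe_count;  /* demo only: counts real probes */

/* Hypothetical replacement for qemu_getauxval(AT_HWCAP). */
static unsigned long demo_read_hwcap(void)
{
    demo_probe_count++;
    return DEMO_HWCAP_LSX;         /* pretend the host reports LSX */
}

/*
 * Runs as a constructor, but is also safe to call directly from other
 * constructors: a nonzero cached value short-circuits the probe.
 */
unsigned __attribute__((constructor)) demo_cpuinfo_init(void)
{
    unsigned info = demo_cpuinfo;
    unsigned long hwcap;

    if (info) {
        return info;               /* already initialized */
    }

    hwcap = demo_read_hwcap();

    info = CPUINFO_ALWAYS;
    info |= (hwcap & DEMO_HWCAP_LSX ? CPUINFO_LSX : 0);

    demo_cpuinfo = info;
    return info;
}
```

Because CPUINFO_ALWAYS is always set, a host without LSX still yields a nonzero value, so "not yet probed" and "probed, no features" stay distinguishable and the hwcap read happens only once regardless of constructor ordering.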
* Re: [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h
2023-09-16 22:01 ` [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h Richard Henderson
@ 2023-09-30 11:41 ` Jiajie Chen
0 siblings, 0 replies; 14+ messages in thread
From: Jiajie Chen @ 2023-09-30 11:41 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: git, gaosong, yangxiaojuan
On 2023/9/17 06:01, Richard Henderson wrote:
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> tcg/loongarch64/tcg-target.h | 8 ++++----
> tcg/loongarch64/tcg-target.c.inc | 8 +-------
> 2 files changed, 5 insertions(+), 11 deletions(-)
>
> diff --git a/tcg/loongarch64/tcg-target.h b/tcg/loongarch64/tcg-target.h
> index 03017672f6..1bea15b02e 100644
> --- a/tcg/loongarch64/tcg-target.h
> +++ b/tcg/loongarch64/tcg-target.h
> @@ -29,6 +29,8 @@
> #ifndef LOONGARCH_TCG_TARGET_H
> #define LOONGARCH_TCG_TARGET_H
>
> +#include "host/cpuinfo.h"
> +
> #define TCG_TARGET_INSN_UNIT_SIZE 4
> #define TCG_TARGET_NB_REGS 64
>
> @@ -85,8 +87,6 @@ typedef enum {
> TCG_VEC_TMP0 = TCG_REG_V23,
> } TCGReg;
>
> -extern bool use_lsx_instructions;
> -
> /* used for function call generation */
> #define TCG_REG_CALL_STACK TCG_REG_SP
> #define TCG_TARGET_STACK_ALIGN 16
> @@ -171,10 +171,10 @@ extern bool use_lsx_instructions;
> #define TCG_TARGET_HAS_muluh_i64 1
> #define TCG_TARGET_HAS_mulsh_i64 1
>
> -#define TCG_TARGET_HAS_qemu_ldst_i128 use_lsx_instructions
> +#define TCG_TARGET_HAS_qemu_ldst_i128 (cpuinfo & CPUINFO_LSX)
>
> #define TCG_TARGET_HAS_v64 0
> -#define TCG_TARGET_HAS_v128 use_lsx_instructions
> +#define TCG_TARGET_HAS_v128 (cpuinfo & CPUINFO_LSX)
> #define TCG_TARGET_HAS_v256 0
>
> #define TCG_TARGET_HAS_not_vec 1
> diff --git a/tcg/loongarch64/tcg-target.c.inc b/tcg/loongarch64/tcg-target.c.inc
> index 40074c46b8..52f2c26ce1 100644
> --- a/tcg/loongarch64/tcg-target.c.inc
> +++ b/tcg/loongarch64/tcg-target.c.inc
> @@ -32,8 +32,6 @@
> #include "../tcg-ldst.c.inc"
> #include <asm/hwcap.h>
>
> -bool use_lsx_instructions;
> -
> #ifdef CONFIG_DEBUG_TCG
> static const char * const tcg_target_reg_names[TCG_TARGET_NB_REGS] = {
> "zero",
> @@ -2316,10 +2314,6 @@ static void tcg_target_init(TCGContext *s)
> exit(EXIT_FAILURE);
> }
>
> - if (hwcap & HWCAP_LOONGARCH_LSX) {
> - use_lsx_instructions = 1;
> - }
> -
> tcg_target_available_regs[TCG_TYPE_I32] = ALL_GENERAL_REGS;
> tcg_target_available_regs[TCG_TYPE_I64] = ALL_GENERAL_REGS;
>
> @@ -2335,7 +2329,7 @@ static void tcg_target_init(TCGContext *s)
> tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S8);
> tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_S9);
>
> - if (use_lsx_instructions) {
> + if (cpuinfo & CPUINFO_LSX) {
> tcg_target_available_regs[TCG_TYPE_V128] = ALL_VECTOR_REGS;
> tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V24);
> tcg_regset_reset_reg(tcg_target_call_clobber_regs, TCG_REG_V25);
Reviewed-by: Jiajie Chen <c@jia.je>
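
One detail of replacing the bool flag with a mask test can be shown with a small sketch. The names demo_cpuinfo, DEMO_HAS_V128 and the two helpers are hypothetical and only mirror the shape of the quoted macros; the preset value of demo_cpuinfo is an assumption standing in for the result of the cpuinfo constructor.

```c
#include <stdbool.h>

/* Same bit layout as the cpuinfo header from patch 3. */
#define CPUINFO_ALWAYS (1u << 0)
#define CPUINFO_LSX    (1u << 1)

/* Assumption: pretend the probe already ran and found LSX. */
static unsigned demo_cpuinfo = CPUINFO_ALWAYS | CPUINFO_LSX;

/* Hypothetical analogue of TCG_TARGET_HAS_v128 after the patch. */
#define DEMO_HAS_V128 (demo_cpuinfo & CPUINFO_LSX)

/* Using the macro as a condition works for any nonzero value. */
static int demo_vector_regs_available(void)
{
    return DEMO_HAS_V128 ? 32 : 0;  /* 32 is an illustrative count */
}

/* Converting to bool normalizes the masked bit to true/false. */
static bool demo_has_v128(void)
{
    return DEMO_HAS_V128;
}
```

The masked expression evaluates to the bit value (2 here) rather than 1, so the sketch only uses it in boolean contexts or converts it to bool; comparing it against 1 would give the wrong answer.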
* Re: [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store
2023-09-30 2:13 ` [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
@ 2023-09-30 19:04 ` WANG Xuerui
0 siblings, 0 replies; 14+ messages in thread
From: WANG Xuerui @ 2023-09-30 19:04 UTC (permalink / raw)
To: Richard Henderson, qemu-devel; +Cc: git, c, gaosong, yangxiaojuan
On 9/30/23 10:13, Richard Henderson wrote:
> Ping.
>
> r~
>
> On 9/16/23 15:01, Richard Henderson wrote:
>> For tcg generated code, use new registers with load so that we never
>> overlap the input address, so that we can simplify address build for
>> 64-bit user-only.
>>
>> For tcg out-of-line code, implement the host/ headers for atomic 128-bit
>> load and store, reducing the cases for which we must raise EXCP_ATOMIC.
>>
>>
>> r~
>>
>> Based-on: 20230916171223.521545-1-richard.henderson@linaro.org
>> ("[PULL v2 00/39] tcg patch queue")
>>
>> Richard Henderson (7):
>> tcg: Add C_N2_I1
>> tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128
>> util: Add cpuinfo for loongarch64
>> tcg/loongarch64: Use cpuinfo.h
>> host/include/loongarch64: Add atomic16 load and store
>> accel/tcg: Remove redundant case in store_atom_16
>> accel/tcg: Fix condition for store_atom_insert_al16
>>
>> .../include/loongarch64/host/atomic128-ldst.h | 52 +++++++++++++++++++
>> host/include/loongarch64/host/cpuinfo.h | 21 ++++++++
>> .../loongarch64/host/load-extract-al16-al8.h | 39 ++++++++++++++
>> .../loongarch64/host/store-insert-al16.h | 12 +++++
>> tcg/loongarch64/tcg-target-con-set.h | 2 +-
>> tcg/loongarch64/tcg-target.h | 8 +--
>> accel/tcg/cputlb.c | 2 +-
>> tcg/tcg.c | 5 ++
>> util/cpuinfo-loongarch.c | 35 +++++++++++++
>> accel/tcg/ldst_atomicity.c.inc | 14 ++---
>> tcg/loongarch64/tcg-target.c.inc | 25 +++++----
>> util/meson.build | 2 +
>> 12 files changed, 189 insertions(+), 28 deletions(-)
>> create mode 100644 host/include/loongarch64/host/atomic128-ldst.h
>> create mode 100644 host/include/loongarch64/host/cpuinfo.h
>> create mode 100644 host/include/loongarch64/host/load-extract-al16-al8.h
>> create mode 100644 host/include/loongarch64/host/store-insert-al16.h
>> create mode 100644 util/cpuinfo-loongarch.c
Sorry for the delay; I've skimmed through the series and tested on
Loongson 3C5000L hardware, so
Reviewed-by: WANG Xuerui <git@xen0n.name>
Thread overview: 14+ messages
2023-09-16 22:01 [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
2023-09-16 22:01 ` [PATCH 1/7] tcg: Add C_N2_I1 Richard Henderson
2023-09-30 11:39 ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 2/7] tcg/loongarch64: Use C_N2_I1 for INDEX_op_qemu_ld_a*_i128 Richard Henderson
2023-09-30 11:39 ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 3/7] util: Add cpuinfo for loongarch64 Richard Henderson
2023-09-30 11:40 ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 4/7] tcg/loongarch64: Use cpuinfo.h Richard Henderson
2023-09-30 11:41 ` Jiajie Chen
2023-09-16 22:01 ` [PATCH 5/7] host/include/loongarch64: Add atomic16 load and store Richard Henderson
2023-09-16 22:01 ` [PATCH 6/7] accel/tcg: Remove redundant case in store_atom_16 Richard Henderson
2023-09-16 22:01 ` [PATCH 7/7] accel/tcg: Fix condition for store_atom_insert_al16 Richard Henderson
2023-09-30 2:13 ` [PATCH 0/7] tcg/loongarch64: Improvements for 128-bit load/store Richard Henderson
2023-09-30 19:04 ` WANG Xuerui