* [PULL v2 00/28] tcg patch queue
@ 2023-05-24 1:14 Richard Henderson
2023-05-24 1:14 ` [PULL v2 26/28] qemu/atomic128: Add runtime test for FEAT_LSE2 Richard Henderson
2023-05-24 3:25 ` [PULL v2 00/28] tcg patch queue Richard Henderson
0 siblings, 2 replies; 3+ messages in thread
From: Richard Henderson @ 2023-05-24 1:14 UTC (permalink / raw)
To: qemu-devel
v2: Testing revealed a missing earlyclobber in the aa64 inline asm,
which showed up with macOS testing.
r~
The following changes since commit aa33508196f4e2da04625bee36e1f7be5b9267e7:
Merge tag 'mem-2023-05-23' of https://github.com/davidhildenbrand/qemu into staging (2023-05-23 10:57:25 -0700)
are available in the Git repository at:
https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230523-2
for you to fetch changes up to a57663c5a38c26516bde24ecb3992adff4861a31:
tcg: Remove USE_TCG_OPTIMIZATIONS (2023-05-24 01:10:44 +0000)
----------------------------------------------------------------
util: Host cpu detection for x86 and aa64
util: Use cpu detection for bufferiszero
migration: Use cpu detection for xbzrle
tcg: Replace and remove cpu_atomic_{ld,st}o*
host/include: Split qemu/atomic128.h
tcg: Remove DEBUG_DISAS
tcg: Remove USE_TCG_OPTIMIZATIONS
----------------------------------------------------------------
Richard Henderson (28):
util: Introduce host-specific cpuinfo.h
util: Add cpuinfo-i386.c
util: Add i386 CPUINFO_ATOMIC_VMOVDQU
tcg/i386: Use host/cpuinfo.h
util/bufferiszero: Use i386 host/cpuinfo.h
migration/xbzrle: Shuffle function order
migration/xbzrle: Use i386 host/cpuinfo.h
migration: Build migration_files once
util: Add cpuinfo-aarch64.c
include/host: Split out atomic128-cas.h
include/host: Split out atomic128-ldst.h
meson: Fix detect atomic128 support with optimization
include/qemu: Move CONFIG_ATOMIC128_OPT handling to atomic128.h
target/ppc: Use tcg_gen_qemu_{ld,st}_i128 for LQARX, LQ, STQ
target/s390x: Use tcg_gen_qemu_{ld,st}_i128 for LPQ, STPQ
accel/tcg: Unify cpu_{ld,st}*_{be,le}_mmu
target/s390x: Use cpu_{ld,st}*_mmu in do_csst
target/s390x: Always use cpu_atomic_cmpxchgl_be_mmu in do_csst
accel/tcg: Remove cpu_atomic_{ld,st}o_*_mmu
accel/tcg: Remove prot argument to atomic_mmu_lookup
accel/tcg: Eliminate #if on HAVE_ATOMIC128 and HAVE_CMPXCHG128
qemu/atomic128: Split atomic16_read
accel/tcg: Correctly use atomic128.h in ldst_atomicity.c.inc
tcg: Split out tcg/debug-assert.h
qemu/atomic128: Improve cmpxchg fallback for atomic16_set
qemu/atomic128: Add runtime test for FEAT_LSE2
tcg: Remove DEBUG_DISAS
tcg: Remove USE_TCG_OPTIMIZATIONS
accel/tcg/atomic_template.h | 93 +-----
host/include/aarch64/host/atomic128-cas.h | 45 +++
host/include/aarch64/host/atomic128-ldst.h | 79 +++++
host/include/aarch64/host/cpuinfo.h | 22 ++
host/include/generic/host/atomic128-cas.h | 47 +++
host/include/generic/host/atomic128-ldst.h | 81 +++++
host/include/generic/host/cpuinfo.h | 4 +
host/include/i386/host/cpuinfo.h | 39 +++
host/include/x86_64/host/cpuinfo.h | 1 +
include/exec/cpu_ldst.h | 67 +----
include/exec/exec-all.h | 3 -
include/qemu/atomic128.h | 146 ++-------
include/tcg/debug-assert.h | 17 ++
include/tcg/tcg.h | 9 +-
migration/xbzrle.h | 5 +-
target/ppc/cpu.h | 1 -
target/ppc/helper.h | 9 -
target/s390x/cpu.h | 3 -
target/s390x/helper.h | 4 -
tcg/aarch64/tcg-target.h | 6 +-
tcg/i386/tcg-target.h | 28 +-
accel/tcg/cpu-exec.c | 2 -
accel/tcg/cputlb.c | 211 ++++---------
accel/tcg/translate-all.c | 2 -
accel/tcg/translator.c | 2 -
accel/tcg/user-exec.c | 332 ++++++--------------
migration/ram.c | 34 +--
migration/xbzrle.c | 268 +++++++++--------
target/arm/tcg/m_helper.c | 4 +-
target/ppc/mem_helper.c | 48 ---
target/ppc/translate.c | 34 +--
target/s390x/tcg/mem_helper.c | 137 ++-------
target/s390x/tcg/translate.c | 30 +-
target/sh4/translate.c | 2 -
target/sparc/ldst_helper.c | 18 +-
target/sparc/translate.c | 2 -
tcg/tcg.c | 14 +-
tests/bench/xbzrle-bench.c | 469 -----------------------------
tests/unit/test-xbzrle.c | 49 +--
util/bufferiszero.c | 127 +++-----
util/cpuinfo-aarch64.c | 67 +++++
util/cpuinfo-i386.c | 99 ++++++
MAINTAINERS | 3 +
accel/tcg/atomic_common.c.inc | 14 -
accel/tcg/ldst_atomicity.c.inc | 135 ++-------
accel/tcg/ldst_common.c.inc | 24 +-
meson.build | 12 +-
migration/meson.build | 1 -
target/ppc/translate/fixedpoint-impl.c.inc | 51 +---
target/s390x/tcg/insn-data.h.inc | 2 +-
tcg/aarch64/tcg-target.c.inc | 40 ---
tcg/i386/tcg-target.c.inc | 123 +-------
tests/bench/meson.build | 6 -
util/meson.build | 6 +
54 files changed, 1035 insertions(+), 2042 deletions(-)
create mode 100644 host/include/aarch64/host/atomic128-cas.h
create mode 100644 host/include/aarch64/host/atomic128-ldst.h
create mode 100644 host/include/aarch64/host/cpuinfo.h
create mode 100644 host/include/generic/host/atomic128-cas.h
create mode 100644 host/include/generic/host/atomic128-ldst.h
create mode 100644 host/include/generic/host/cpuinfo.h
create mode 100644 host/include/i386/host/cpuinfo.h
create mode 100644 host/include/x86_64/host/cpuinfo.h
create mode 100644 include/tcg/debug-assert.h
delete mode 100644 tests/bench/xbzrle-bench.c
create mode 100644 util/cpuinfo-aarch64.c
create mode 100644 util/cpuinfo-i386.c
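
The summary above lists host cpu detection for aa64 (util/cpuinfo-aarch64.c). As a rough illustration of what such a probe can look like on Linux, the sketch below reads the auxiliary vector once and caches a feature mask. It is not the code from this series: every `sketch_` / `SKETCH_` name and bit value is invented for illustration, and it assumes the kernel reports FEAT_LSE2 through the HWCAP_USCAT bit of AT_HWCAP. On hosts other than aarch64 Linux it compiles to a stub that never reports LSE2.

```c
#include <assert.h>
#include <stdint.h>

#if defined(__linux__) && defined(__aarch64__)
#include <sys/auxv.h>
#ifndef HWCAP_USCAT
#define HWCAP_USCAT (1UL << 25)   /* value from the Linux arm64 uapi headers */
#endif
#endif

/* Hypothetical feature bits; bit 0 is always set so that 0 means "not probed". */
#define SKETCH_CPUINFO_ALWAYS (1u << 0)
#define SKETCH_CPUINFO_LSE2   (1u << 1)

static unsigned sketch_cpuinfo;

static unsigned sketch_cpuinfo_init(void)
{
    unsigned info = sketch_cpuinfo;

    if (info) {
        return info;              /* already probed */
    }
    info = SKETCH_CPUINFO_ALWAYS;

#if defined(__linux__) && defined(__aarch64__)
    /* Assumption: the kernel advertises FEAT_LSE2 as HWCAP_USCAT. */
    if (getauxval(AT_HWCAP) & HWCAP_USCAT) {
        info |= SKETCH_CPUINFO_LSE2;
    }
#endif

    sketch_cpuinfo = info;
    return info;
}

/* Runtime predicate in the spirit of HAVE_ATOMIC128_RO in patch 26/28. */
static int sketch_have_lse2(void)
{
    unsigned info = sketch_cpuinfo_init();

    assert(info & SKETCH_CPUINFO_ALWAYS);
    return (info & SKETCH_CPUINFO_LSE2) != 0;
}
```

The point of the always-set bit is that a zero mask can double as the "not yet initialized" state, so callers need no separate flag. The sketch makes no attempt at thread safety beyond the probe being idempotent.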
* [PULL v2 26/28] qemu/atomic128: Add runtime test for FEAT_LSE2
2023-05-24 1:14 [PULL v2 00/28] tcg patch queue Richard Henderson
@ 2023-05-24 1:14 ` Richard Henderson
2023-05-24 3:25 ` [PULL v2 00/28] tcg patch queue Richard Henderson
1 sibling, 0 replies; 3+ messages in thread
From: Richard Henderson @ 2023-05-24 1:14 UTC (permalink / raw)
To: qemu-devel; +Cc: Alex Bennée

With FEAT_LSE2, load and store of int128 is directly supported.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 host/include/aarch64/host/atomic128-ldst.h | 53 ++++++++++++++++------
 1 file changed, 40 insertions(+), 13 deletions(-)

diff --git a/host/include/aarch64/host/atomic128-ldst.h b/host/include/aarch64/host/atomic128-ldst.h
index 4b1360de39..a08f62c40a 100644
--- a/host/include/aarch64/host/atomic128-ldst.h
+++ b/host/include/aarch64/host/atomic128-ldst.h
@@ -11,27 +11,48 @@
 #ifndef AARCH64_ATOMIC128_LDST_H
 #define AARCH64_ATOMIC128_LDST_H
 
+#include "host/cpuinfo.h"
+#include "tcg/debug-assert.h"
+
 /*
  * Through gcc 10, aarch64 has no support for 128-bit atomics.
  * Through clang 16, without -march=armv8.4-a, __atomic_load_16
  * is incorrectly expanded to a read-write operation.
+ *
+ * Anyway, this method allows runtime detection of FEAT_LSE2.
  */
 
-#define HAVE_ATOMIC128_RO 0
+#define HAVE_ATOMIC128_RO (cpuinfo & CPUINFO_LSE2)
 #define HAVE_ATOMIC128_RW 1
 
-Int128 QEMU_ERROR("unsupported atomic") atomic16_read_ro(const Int128 *ptr);
+static inline Int128 atomic16_read_ro(const Int128 *ptr)
+{
+    uint64_t l, h;
+
+    tcg_debug_assert(HAVE_ATOMIC128_RO);
+    /* With FEAT_LSE2, 16-byte aligned LDP is atomic. */
+    asm("ldp %[l], %[h], %[mem]"
+        : [l] "=r"(l), [h] "=r"(h) : [mem] "m"(*ptr));
+
+    return int128_make128(l, h);
+}
 
 static inline Int128 atomic16_read_rw(Int128 *ptr)
 {
     uint64_t l, h;
     uint32_t tmp;
 
-    /* The load must be paired with the store to guarantee not tearing. */
-    asm("0: ldxp %[l], %[h], %[mem]\n\t"
-        "stxp %w[tmp], %[l], %[h], %[mem]\n\t"
-        "cbnz %w[tmp], 0b"
-        : [mem] "+m"(*ptr), [tmp] "=r"(tmp), [l] "=r"(l), [h] "=r"(h));
+    if (cpuinfo & CPUINFO_LSE2) {
+        /* With FEAT_LSE2, 16-byte aligned LDP is atomic. */
+        asm("ldp %[l], %[h], %[mem]"
+            : [l] "=r"(l), [h] "=r"(h) : [mem] "m"(*ptr));
+    } else {
+        /* The load must be paired with the store to guarantee not tearing. */
+        asm("0: ldxp %[l], %[h], %[mem]\n\t"
+            "stxp %w[tmp], %[l], %[h], %[mem]\n\t"
+            "cbnz %w[tmp], 0b"
+            : [mem] "+m"(*ptr), [tmp] "=&r"(tmp), [l] "=&r"(l), [h] "=&r"(h));
+    }
 
     return int128_make128(l, h);
 }
@@ -41,12 +62,18 @@ static inline void atomic16_set(Int128 *ptr, Int128 val)
     uint64_t l = int128_getlo(val), h = int128_gethi(val);
     uint64_t t1, t2;
 
-    /* Load into temporaries to acquire the exclusive access lock. */
-    asm("0: ldxp %[t1], %[t2], %[mem]\n\t"
-        "stxp %w[t1], %[l], %[h], %[mem]\n\t"
-        "cbnz %w[t1], 0b"
-        : [mem] "+m"(*ptr), [t1] "=&r"(t1), [t2] "=&r"(t2)
-        : [l] "r"(l), [h] "r"(h));
+    if (cpuinfo & CPUINFO_LSE2) {
+        /* With FEAT_LSE2, 16-byte aligned STP is atomic. */
+        asm("stp %[l], %[h], %[mem]"
+            : [mem] "=m"(*ptr) : [l] "r"(l), [h] "r"(h));
+    } else {
+        /* Load into temporaries to acquire the exclusive access lock. */
+        asm("0: ldxp %[t1], %[t2], %[mem]\n\t"
+            "stxp %w[t1], %[l], %[h], %[mem]\n\t"
+            "cbnz %w[t1], 0b"
+            : [mem] "+m"(*ptr), [t1] "=&r"(t1), [t2] "=&r"(t2)
+            : [l] "r"(l), [h] "r"(h));
+    }
 }
 
 #endif /* AARCH64_ATOMIC128_LDST_H */
-- 
2.34.1
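
The `=&r` constraints on the `ldxp`/`stxp` loop in the patch above are the earlyclobber fix mentioned in the v2 note. A minimal standalone sketch of the same loop follows, for readers who want to see the constraint in isolation. Assumptions are labeled: `sketch_pair` and `sketch_load_pair` are hypothetical names, the sketch uses the `Q` memory constraint (base register, no offset) where the patch uses `m`, and hosts other than aarch64 get a plain, non-atomic fallback purely so the file builds.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Int128: two halves, 16-byte aligned. */
typedef struct {
    uint64_t lo, hi;
} __attribute__((aligned(16))) sketch_pair;

static inline sketch_pair sketch_load_pair(sketch_pair *p)
{
    uint64_t l, h;

    /* The exclusive pair instructions require a 16-byte aligned address. */
    assert(((uintptr_t)p & 15) == 0);

#if defined(__aarch64__)
    uint32_t tmp;

    /*
     * ldxp writes l and h before stxp (and any retry of the loop) reads
     * the address of *p again.  The '&' (earlyclobber) modifier tells the
     * compiler that these outputs are written before the inputs are
     * finished with, so they must not share a register with the address.
     */
    asm("0: ldxp %[l], %[h], %[mem]\n\t"
        "stxp %w[tmp], %[l], %[h], %[mem]\n\t"
        "cbnz %w[tmp], 0b"
        : [mem] "+Q"(*p), [tmp] "=&r"(tmp), [l] "=&r"(l), [h] "=&r"(h));
#else
    /* Not atomic: fallback only so the sketch builds on other hosts. */
    l = p->lo;
    h = p->hi;
#endif

    return (sketch_pair){ l, h };
}
```

Without the `&`, the compiler is free to place an output in the register that holds the address, since it assumes all inputs are consumed before any output is written. Whether it actually does so depends on register allocation, which would be consistent with the problem only showing up on some hosts.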
* Re: [PULL v2 00/28] tcg patch queue
2023-05-24 1:14 [PULL v2 00/28] tcg patch queue Richard Henderson
2023-05-24 1:14 ` [PULL v2 26/28] qemu/atomic128: Add runtime test for FEAT_LSE2 Richard Henderson
@ 2023-05-24 3:25 ` Richard Henderson
1 sibling, 0 replies; 3+ messages in thread
From: Richard Henderson @ 2023-05-24 3:25 UTC (permalink / raw)
To: qemu-devel

On 5/23/23 18:14, Richard Henderson wrote:
> v2: Testing revealed a missing earlyclobber in the aa64 inline asm,
> which showed up with macOS testing.
>
> r~
>
> The following changes since commit aa33508196f4e2da04625bee36e1f7be5b9267e7:
>
> Merge tag 'mem-2023-05-23' of https://github.com/davidhildenbrand/qemu into staging (2023-05-23 10:57:25 -0700)
>
> are available in the Git repository at:
>
> https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230523-2
>
> for you to fetch changes up to a57663c5a38c26516bde24ecb3992adff4861a31:
>
> tcg: Remove USE_TCG_OPTIMIZATIONS (2023-05-24 01:10:44 +0000)
>
> ----------------------------------------------------------------
> util: Host cpu detection for x86 and aa64
> util: Use cpu detection for bufferiszero
> migration: Use cpu detection for xbzrle
> tcg: Replace and remove cpu_atomic_{ld,st}o*
> host/include: Split qemu/atomic128.h
> tcg: Remove DEBUG_DISAS
> tcg: Remove USE_TCG_OPTIMIZATIONS

Applied, thanks. Please update https://wiki.qemu.org/ChangeLog/8.1 as appropriate.

r~