[PATCH 0/7] target/arm: Implement FEAT

* [PATCH 0/7] target/arm: Implement FEAT_LSE128
@ 2025-08-15 12:26 Richard Henderson
  2025-08-15 12:26 ` [PATCH 1/7] qemu/atomic: Finish renaming atomic128-cas.h headers Richard Henderson
                   ` (7 more replies)
  0 siblings, 8 replies; 16+ messages in thread
From: Richard Henderson @ 2025-08-15 12:26 UTC
  To: qemu-devel

This extension adds instructions for atomic 128-bit swap, fetch-and,
and fetch-or.  These are fairly easy to implement on top of the
existing host support for 128-bit compare-and-swap.

Unlike for 64-bit operations, I did not implement the multitude of
atomic fetch-op and op-fetch functions.  Those can wait until there
is a need for them.
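
As an illustration only (not code from this series): any fetch-op can
be built from compare-and-swap by retrying until the value that was
observed is the value that gets replaced.  The sketch below uses 64-bit
C11 atomics so that it builds on any host; the sketch_* names are
made up for this example, and the series itself works on 16-byte values.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: fetch-and built only from compare-and-swap. */
static uint64_t sketch_fetch_and(_Atomic uint64_t *ptr, uint64_t val)
{
    assert(ptr);
    uint64_t old = atomic_load(ptr);

    /* A failed CAS refreshes 'old' with the current contents; retry. */
    while (!atomic_compare_exchange_weak(ptr, &old, old & val)) {
        continue;
    }
    return old;
}

/* Hypothetical helper: same loop, with OR as the combining step. */
static uint64_t sketch_fetch_or(_Atomic uint64_t *ptr, uint64_t val)
{
    assert(ptr);
    uint64_t old = atomic_load(ptr);

    while (!atomic_compare_exchange_weak(ptr, &old, old | val)) {
        continue;
    }
    return old;
}

/* Hypothetical helper: swap is the degenerate case, new value is 'val'. */
static uint64_t sketch_xchg(_Atomic uint64_t *ptr, uint64_t val)
{
    assert(ptr);
    uint64_t old = atomic_load(ptr);

    while (!atomic_compare_exchange_weak(ptr, &old, val)) {
        continue;
    }
    return old;
}
```

Each helper returns the value that was in memory before the update,
which is what a fetch-op is expected to hand back to the caller.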


r~


Richard Henderson (7):
  qemu/atomic: Finish renaming atomic128-cas.h headers
  qemu/atomic: Add atomic16 primitives for xchg, fetch_and, fetch_or
  accel/tcg: Add cpu_atomic_*_mmu for 16-byte xchg, fetch_and, fetch_or
  tcg: Add tcg_gen_atomic_{xchg,fetch_and,fetch_or}_i128
  target/arm: Rename isar_feature_aa64_atomics
  target/arm: Implement FEAT_LSE128
  target/arm: Enable FEAT_LSE128 for -cpu max

 accel/tcg/atomic_template.h                   |  80 +++++++++++++-
 accel/tcg/tcg-runtime.h                       |  12 +++
 host/include/aarch64/host/atomic128-cas.h     |  45 --------
 include/accel/tcg/cpu-ldst-common.h           |  13 ++-
 include/tcg/tcg-op-common.h                   |   7 ++
 include/tcg/tcg-op.h                          |   3 +
 target/arm/cpu-features.h                     |   7 +-
 linux-user/elfload.c                          |   3 +-
 target/arm/tcg/cpu64.c                        |   2 +-
 target/arm/tcg/translate-a64.c                |  73 ++++++++++---
 tcg/tcg-op-ldst.c                             |  97 ++++++++++++++++-
 accel/tcg/atomic_common.c.inc                 |   9 ++
 docs/system/arm/emulation.rst                 |   1 +
 host/include/aarch64/host/atomic128-cas.h.inc | 102 ++++++++++++++++++
 host/include/generic/host/atomic128-cas.h.inc |  96 +++++++++++++++++
 target/arm/tcg/a64.decode                     |   7 ++
 16 files changed, 487 insertions(+), 70 deletions(-)
 delete mode 100644 host/include/aarch64/host/atomic128-cas.h
 create mode 100644 host/include/aarch64/host/atomic128-cas.h.inc

-- 
2.43.0




* [PATCH 1/7] qemu/atomic: Finish renaming atomic128-cas.h headers
  2025-08-15 12:26 [PATCH 0/7] target/arm: Implement FEAT_LSE128 Richard Henderson
@ 2025-08-15 12:26 ` Richard Henderson
  2025-08-19 14:01   ` Peter Maydell
  2025-08-15 12:26 ` [PATCH 2/7] qemu/atomic: Add atomic16 primitives for xchg, fetch_and, fetch_or Richard Henderson
                   ` (6 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2025-08-15 12:26 UTC
  To: qemu-devel; +Cc: qemu-stable

The aarch64 header was not renamed with the others, meaning it
was skipped in favor of the generic version.

Cc: qemu-stable@nongnu.org
Fixes: 15606965400b ("qemu/atomic: Rename atomic128-cas.h headers using .h.inc suffix")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 .../include/aarch64/host/{atomic128-cas.h => atomic128-cas.h.inc} | 0
 1 file changed, 0 insertions(+), 0 deletions(-)
 rename host/include/aarch64/host/{atomic128-cas.h => atomic128-cas.h.inc} (100%)

diff --git a/host/include/aarch64/host/atomic128-cas.h b/host/include/aarch64/host/atomic128-cas.h.inc
similarity index 100%
rename from host/include/aarch64/host/atomic128-cas.h
rename to host/include/aarch64/host/atomic128-cas.h.inc
-- 
2.43.0




* [PATCH 2/7] qemu/atomic: Add atomic16 primitives for xchg, fetch_and, fetch_or
  2025-08-15 12:26 [PATCH 0/7] target/arm: Implement FEAT_LSE128 Richard Henderson
  2025-08-15 12:26 ` [PATCH 1/7] qemu/atomic: Finish renaming atomic128-cas.h headers Richard Henderson
@ 2025-08-15 12:26 ` Richard Henderson
  2025-08-19 14:13   ` Peter Maydell
  2025-08-15 12:26 ` [PATCH 3/7] accel/tcg: Add cpu_atomic_*_mmu for 16-byte " Richard Henderson
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2025-08-15 12:26 UTC
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
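Illustration only, not part of the patch: a non-atomic reference model
of what the three primitives are expected to compute can be handy when
checking the host-specific versions.  U128 and the ref_* names are
hypothetical stand-ins; the patch itself uses Int128 and its accessors.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Int128: two explicit 64-bit halves. */
typedef struct {
    uint64_t lo, hi;
} U128;

/* Reference (NOT atomic) semantics: store the new value, return the old. */
static U128 ref_xchg(U128 *ptr, U128 val)
{
    assert(ptr);
    U128 old = *ptr;

    *ptr = val;
    return old;
}

/* Reference (NOT atomic) semantics: AND each half, return the old value. */
static U128 ref_fetch_and(U128 *ptr, U128 val)
{
    assert(ptr);
    U128 old = *ptr;

    ptr->lo = old.lo & val.lo;
    ptr->hi = old.hi & val.hi;
    return old;
}

/* Reference (NOT atomic) semantics: OR each half, return the old value. */
static U128 ref_fetch_or(U128 *ptr, U128 val)
{
    assert(ptr);
    U128 old = *ptr;

    ptr->lo = old.lo | val.lo;
    ptr->hi = old.hi | val.hi;
    return old;
}
```

The model works half by half, in the same way the aarch64 version
applies the operation to the low and high registers separately.
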
 host/include/aarch64/host/atomic128-cas.h.inc | 57 +++++++++++
 host/include/generic/host/atomic128-cas.h.inc | 96 +++++++++++++++++++
 2 files changed, 153 insertions(+)

diff --git a/host/include/aarch64/host/atomic128-cas.h.inc b/host/include/aarch64/host/atomic128-cas.h.inc
index 991da4ef54..aec27df182 100644
--- a/host/include/aarch64/host/atomic128-cas.h.inc
+++ b/host/include/aarch64/host/atomic128-cas.h.inc
@@ -38,6 +38,63 @@ static inline Int128 atomic16_cmpxchg(Int128 *ptr, Int128 cmp, Int128 new)
     return int128_make128(oldl, oldh);
 }
 
+static inline Int128 atomic16_xchg(Int128 *ptr, Int128 new)
+{
+    uint64_t newl = int128_getlo(new), newh = int128_gethi(new);
+    uint64_t oldl, oldh;
+    uint32_t tmp;
+
+    asm("0: ldaxp %[oldl], %[oldh], %[mem]\n\t"
+        "stlxp %w[tmp], %[newl], %[newh], %[mem]\n\t"
+        "cbnz %w[tmp], 0b"
+        : [mem] "+m"(*ptr), [tmp] "=&r"(tmp),
+          [oldl] "=&r"(oldl), [oldh] "=&r"(oldh)
+        : [newl] "r"(newl), [newh] "r"(newh)
+        : "memory");
+
+    return int128_make128(oldl, oldh);
+}
+
+static inline Int128 atomic16_fetch_and(Int128 *ptr, Int128 new)
+{
+    uint64_t newl = int128_getlo(new), newh = int128_gethi(new);
+    uint64_t oldl, oldh, tmpl, tmph;
+    uint32_t tmp;
+
+    asm("0: ldaxp %[oldl], %[oldh], %[mem]\n\t"
+        "and %[tmpl], %[oldl], %[newl]\n\t"
+        "and %[tmph], %[oldh], %[newh]\n\t"
+        "stlxp %w[tmp], %[tmpl], %[tmph], %[mem]\n\t"
+        "cbnz %w[tmp], 0b"
+        : [mem] "+m"(*ptr), [tmp] "=&r"(tmp),
+          [oldl] "=&r"(oldl), [oldh] "=&r"(oldh),
+          [tmpl] "=&r"(tmpl), [tmph] "=&r"(tmph)
+        : [newl] "r"(newl), [newh] "r"(newh)
+        : "memory");
+
+    return int128_make128(oldl, oldh);
+}
+
+static inline Int128 atomic16_fetch_or(Int128 *ptr, Int128 new)
+{
+    uint64_t newl = int128_getlo(new), newh = int128_gethi(new);
+    uint64_t oldl, oldh, tmpl, tmph;
+    uint32_t tmp;
+
+    asm("0: ldaxp %[oldl], %[oldh], %[mem]\n\t"
+        "orr %[tmpl], %[oldl], %[newl]\n\t"
+        "orr %[tmph], %[oldh], %[newh]\n\t"
+        "stlxp %w[tmp], %[tmpl], %[tmph], %[mem]\n\t"
+        "cbnz %w[tmp], 0b"
+        : [mem] "+m"(*ptr), [tmp] "=&r"(tmp),
+          [oldl] "=&r"(oldl), [oldh] "=&r"(oldh),
+          [tmpl] "=&r"(tmpl), [tmph] "=&r"(tmph)
+        : [newl] "r"(newl), [newh] "r"(newh)
+        : "memory");
+
+    return int128_make128(oldl, oldh);
+}
+
 # define CONFIG_CMPXCHG128 1
 # define HAVE_CMPXCHG128 1
 #endif
diff --git a/host/include/generic/host/atomic128-cas.h.inc b/host/include/generic/host/atomic128-cas.h.inc
index 6b40cc2271..990162c56f 100644
--- a/host/include/generic/host/atomic128-cas.h.inc
+++ b/host/include/generic/host/atomic128-cas.h.inc
@@ -23,6 +23,51 @@ atomic16_cmpxchg(Int128 *ptr, Int128 cmp, Int128 new)
     r.i = qatomic_cmpxchg__nocheck(ptr_align, c.i, n.i);
     return r.s;
 }
+
+/*
+ * Since we're looping anyway, use weak compare and swap.
+ * If the host supports weak, this will eliminate a second loop hidden
+ * within the atomic operation itself; otherwise the weak parameter is
+ * ignored.
+ */
+static inline Int128 ATTRIBUTE_ATOMIC128_OPT
+atomic16_xchg(Int128 *ptr, Int128 new)
+{
+    __int128_t *ptr_align = __builtin_assume_aligned(ptr, 16);
+    Int128 old = *ptr_align;
+
+    while (!__atomic_compare_exchange_n(ptr_align, &old, new, true,
+                                        __ATOMIC_SEQ_CST, 0)) {
+        continue;
+    }
+    return old;
+}
+
+static inline Int128 ATTRIBUTE_ATOMIC128_OPT
+atomic16_fetch_and(Int128 *ptr, Int128 val)
+{
+    __int128_t *ptr_align = __builtin_assume_aligned(ptr, 16);
+    Int128 old = *ptr_align;
+
+    while (!__atomic_compare_exchange_n(ptr_align, &old, old & val, true,
+                                        __ATOMIC_SEQ_CST, 0)) {
+        continue;
+    }
+    return old;
+}
+
+static inline Int128 ATTRIBUTE_ATOMIC128_OPT
+atomic16_fetch_or(Int128 *ptr, Int128 val)
+{
+    __int128_t *ptr_align = __builtin_assume_aligned(ptr, 16);
+    Int128 old = *ptr_align;
+
+    while (!__atomic_compare_exchange_n(ptr_align, &old, old | val, true,
+                                        __ATOMIC_SEQ_CST, 0)) {
+        continue;
+    }
+    return old;
+}
 # define HAVE_CMPXCHG128 1
 #elif defined(CONFIG_CMPXCHG128)
 static inline Int128 ATTRIBUTE_ATOMIC128_OPT
@@ -36,6 +81,57 @@ atomic16_cmpxchg(Int128 *ptr, Int128 cmp, Int128 new)
     r.i = __sync_val_compare_and_swap_16(ptr_align, c.i, n.i);
     return r.s;
 }
+
+static inline Int128 ATTRIBUTE_ATOMIC128_OPT
+atomic16_xchg(Int128 *ptr, Int128 new)
+{
+    Int128Aligned *ptr_align = __builtin_assume_aligned(ptr, 16);
+    Int128Alias o, n;
+
+    n.s = new;
+    o.s = *ptr_align;
+    while (1) {
+        __int128 c = __sync_val_compare_and_swap_16(ptr_align, o.i, n.i);
+        if (c == o.i) {
+            return o.s;
+        }
+        o.i = c;
+    }
+}
+
+static inline Int128 ATTRIBUTE_ATOMIC128_OPT
+atomic16_fetch_and(Int128 *ptr, Int128 val)
+{
+    Int128Aligned *ptr_align = __builtin_assume_aligned(ptr, 16);
+    Int128Alias o, v;
+
+    v.s = val;
+    o.s = *ptr_align;
+    while (1) {
+        __int128 c = __sync_val_compare_and_swap_16(ptr_align, o.i, o.i & v.i);
+        if (c == o.i) {
+            return o.s;
+        }
+        o.i = c;
+    }
+}
+
+static inline Int128 ATTRIBUTE_ATOMIC128_OPT
+atomic16_fetch_or(Int128 *ptr, Int128 val)
+{
+    Int128Aligned *ptr_align = __builtin_assume_aligned(ptr, 16);
+    Int128Alias o, v;
+
+    v.s = val;
+    o.s = *ptr_align;
+    while (1) {
+        __int128 c = __sync_val_compare_and_swap_16(ptr_align, o.i, o.i | v.i);
+        if (c == o.i) {
+            return o.s;
+        }
+        o.i = c;
+    }
+}
 # define HAVE_CMPXCHG128 1
 #else
 /* Fallback definition that must be optimized away, or error.  */
-- 
2.43.0




* [PATCH 3/7] accel/tcg: Add cpu_atomic_*_mmu for 16-byte xchg, fetch_and, fetch_or
  2025-08-15 12:26 [PATCH 0/7] target/arm: Implement FEAT_LSE128 Richard Henderson
  2025-08-15 12:26 ` [PATCH 1/7] qemu/atomic: Finish renaming atomic128-cas.h headers Richard Henderson
  2025-08-15 12:26 ` [PATCH 2/7] qemu/atomic: Add atomic16 primitives for xchg, fetch_and, fetch_or Richard Henderson
@ 2025-08-15 12:26 ` Richard Henderson
  2025-08-19 14:15   ` Peter Maydell
  2025-08-15 12:26 ` [PATCH 4/7] tcg: Add tcg_gen_atomic_{xchg,fetch_and,fetch_or}_i128 Richard Henderson
                   ` (4 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2025-08-15 12:26 UTC
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
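Illustration only, not part of the patch: the byte-swapped variants
swap the operand before the operation and swap the old value again
before returning it.  The sketch below models that pattern without any
atomicity; U128 and the sketch_* names are hypothetical, and the real
helpers use the BSWAP macro from atomic_template.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for Int128: two explicit 64-bit halves. */
typedef struct {
    uint64_t lo, hi;
} U128;

/* Portable 64-bit byte reversal: swap bytes, then 16-bit, then 32-bit units. */
static uint64_t sketch_bswap64(uint64_t x)
{
    x = ((x & 0x00ff00ff00ff00ffULL) << 8) |
        ((x >> 8) & 0x00ff00ff00ff00ffULL);
    x = ((x & 0x0000ffff0000ffffULL) << 16) |
        ((x >> 16) & 0x0000ffff0000ffffULL);
    return (x << 32) | (x >> 32);
}

/* 128-bit byte reversal: reverse each half and exchange the halves. */
static U128 sketch_bswap128(U128 x)
{
    U128 r = { sketch_bswap64(x.hi), sketch_bswap64(x.lo) };
    return r;
}

/*
 * NOT atomic.  Models "ret = op(haddr, BSWAP(val)); return BSWAP(ret);"
 * for the fetch-and case.
 */
static U128 sketch_fetch_and_swapped(U128 *mem, U128 val)
{
    assert(mem);
    U128 old = *mem;
    U128 v = sketch_bswap128(val);

    mem->lo = old.lo & v.lo;
    mem->hi = old.hi & v.hi;
    return sketch_bswap128(old);
}
```

Because AND and OR act on each bit independently, swapping the operand
first gives the same memory contents as swapping the result afterwards.
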
 accel/tcg/atomic_template.h         | 80 +++++++++++++++++++++++++++--
 include/accel/tcg/cpu-ldst-common.h | 13 +++--
 2 files changed, 86 insertions(+), 7 deletions(-)

diff --git a/accel/tcg/atomic_template.h b/accel/tcg/atomic_template.h
index 08a475c10c..ae5203b439 100644
--- a/accel/tcg/atomic_template.h
+++ b/accel/tcg/atomic_template.h
@@ -100,7 +100,6 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, vaddr addr,
     return ret;
 }
 
-#if DATA_SIZE < 16
 ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, vaddr addr, ABI_TYPE val,
                            MemOpIdx oi, uintptr_t retaddr)
 {
@@ -108,7 +107,11 @@ ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, vaddr addr, ABI_TYPE val,
                                          DATA_SIZE, retaddr);
     DATA_TYPE ret;
 
+#if DATA_SIZE == 16
+    ret = atomic16_xchg(haddr, val);
+#else
     ret = qatomic_xchg__nocheck(haddr, val);
+#endif
     ATOMIC_MMU_CLEANUP;
     atomic_trace_rmw_post(env, addr,
                           VALUE_LOW(ret),
@@ -119,6 +122,39 @@ ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, vaddr addr, ABI_TYPE val,
     return ret;
 }
 
+#if DATA_SIZE == 16
+ABI_TYPE ATOMIC_NAME(fetch_and)(CPUArchState *env, vaddr addr, ABI_TYPE val,
+                                MemOpIdx oi, uintptr_t retaddr)
+{
+    DATA_TYPE *haddr = atomic_mmu_lookup(env_cpu(env), addr, oi,
+                                         DATA_SIZE, retaddr);
+    DATA_TYPE ret = atomic16_fetch_and(haddr, val);
+    ATOMIC_MMU_CLEANUP;
+    atomic_trace_rmw_post(env, addr,
+                          VALUE_LOW(ret),
+                          VALUE_HIGH(ret),
+                          VALUE_LOW(val),
+                          VALUE_HIGH(val),
+                          oi);
+    return ret;
+}
+
+ABI_TYPE ATOMIC_NAME(fetch_or)(CPUArchState *env, vaddr addr, ABI_TYPE val,
+                               MemOpIdx oi, uintptr_t retaddr)
+{
+    DATA_TYPE *haddr = atomic_mmu_lookup(env_cpu(env), addr, oi,
+                                         DATA_SIZE, retaddr);
+    DATA_TYPE ret = atomic16_fetch_or(haddr, val);
+    ATOMIC_MMU_CLEANUP;
+    atomic_trace_rmw_post(env, addr,
+                          VALUE_LOW(ret),
+                          VALUE_HIGH(ret),
+                          VALUE_LOW(val),
+                          VALUE_HIGH(val),
+                          oi);
+    return ret;
+}
+#else
 #define GEN_ATOMIC_HELPER(X)                                        \
 ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, vaddr addr,              \
                         ABI_TYPE val, MemOpIdx oi, uintptr_t retaddr) \
@@ -188,7 +224,7 @@ GEN_ATOMIC_HELPER_FN(smax_fetch, MAX, SDATA_TYPE, new)
 GEN_ATOMIC_HELPER_FN(umax_fetch, MAX,  DATA_TYPE, new)
 
 #undef GEN_ATOMIC_HELPER_FN
-#endif /* DATA SIZE < 16 */
+#endif /* DATA_SIZE == 16 */
 
 #undef END
 
@@ -225,7 +261,6 @@ ABI_TYPE ATOMIC_NAME(cmpxchg)(CPUArchState *env, vaddr addr,
     return BSWAP(ret);
 }
 
-#if DATA_SIZE < 16
 ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, vaddr addr, ABI_TYPE val,
                            MemOpIdx oi, uintptr_t retaddr)
 {
@@ -233,7 +268,11 @@ ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, vaddr addr, ABI_TYPE val,
                                          DATA_SIZE, retaddr);
     ABI_TYPE ret;
 
+#if DATA_SIZE == 16
+    ret = atomic16_xchg(haddr, BSWAP(val));
+#else
     ret = qatomic_xchg__nocheck(haddr, BSWAP(val));
+#endif
     ATOMIC_MMU_CLEANUP;
     atomic_trace_rmw_post(env, addr,
                           VALUE_LOW(ret),
@@ -244,6 +283,39 @@ ABI_TYPE ATOMIC_NAME(xchg)(CPUArchState *env, vaddr addr, ABI_TYPE val,
     return BSWAP(ret);
 }
 
+#if DATA_SIZE == 16
+ABI_TYPE ATOMIC_NAME(fetch_and)(CPUArchState *env, vaddr addr, ABI_TYPE val,
+                                MemOpIdx oi, uintptr_t retaddr)
+{
+    DATA_TYPE *haddr = atomic_mmu_lookup(env_cpu(env), addr, oi,
+                                         DATA_SIZE, retaddr);
+    DATA_TYPE ret = atomic16_fetch_and(haddr, BSWAP(val));
+    ATOMIC_MMU_CLEANUP;
+    atomic_trace_rmw_post(env, addr,
+                          VALUE_LOW(ret),
+                          VALUE_HIGH(ret),
+                          VALUE_LOW(val),
+                          VALUE_HIGH(val),
+                          oi);
+    return BSWAP(ret);
+}
+
+ABI_TYPE ATOMIC_NAME(fetch_or)(CPUArchState *env, vaddr addr, ABI_TYPE val,
+                               MemOpIdx oi, uintptr_t retaddr)
+{
+    DATA_TYPE *haddr = atomic_mmu_lookup(env_cpu(env), addr, oi,
+                                         DATA_SIZE, retaddr);
+    DATA_TYPE ret = atomic16_fetch_or(haddr, BSWAP(val));
+    ATOMIC_MMU_CLEANUP;
+    atomic_trace_rmw_post(env, addr,
+                          VALUE_LOW(ret),
+                          VALUE_HIGH(ret),
+                          VALUE_LOW(val),
+                          VALUE_HIGH(val),
+                          oi);
+    return BSWAP(ret);
+}
+#else
 #define GEN_ATOMIC_HELPER(X)                                        \
 ABI_TYPE ATOMIC_NAME(X)(CPUArchState *env, vaddr addr,              \
                         ABI_TYPE val, MemOpIdx oi, uintptr_t retaddr) \
@@ -317,7 +389,7 @@ GEN_ATOMIC_HELPER_FN(add_fetch, ADD, DATA_TYPE, new)
 #undef ADD
 
 #undef GEN_ATOMIC_HELPER_FN
-#endif /* DATA_SIZE < 16 */
+#endif /* DATA_SIZE == 16 */
 
 #undef END
 #endif /* DATA_SIZE > 1 */
diff --git a/include/accel/tcg/cpu-ldst-common.h b/include/accel/tcg/cpu-ldst-common.h
index 8bf17c2fab..17a3250ded 100644
--- a/include/accel/tcg/cpu-ldst-common.h
+++ b/include/accel/tcg/cpu-ldst-common.h
@@ -100,9 +100,6 @@ GEN_ATOMIC_HELPER_ALL(umax_fetch)
 
 GEN_ATOMIC_HELPER_ALL(xchg)
 
-#undef GEN_ATOMIC_HELPER_ALL
-#undef GEN_ATOMIC_HELPER
-
 Int128 cpu_atomic_cmpxchgo_le_mmu(CPUArchState *env, vaddr addr,
                                   Int128 cmpv, Int128 newv,
                                   MemOpIdx oi, uintptr_t retaddr);
@@ -110,6 +107,16 @@ Int128 cpu_atomic_cmpxchgo_be_mmu(CPUArchState *env, vaddr addr,
                                   Int128 cmpv, Int128 newv,
                                   MemOpIdx oi, uintptr_t retaddr);
 
+GEN_ATOMIC_HELPER(xchg, Int128, o_le)
+GEN_ATOMIC_HELPER(xchg, Int128, o_be)
+GEN_ATOMIC_HELPER(fetch_and, Int128, o_le)
+GEN_ATOMIC_HELPER(fetch_and, Int128, o_be)
+GEN_ATOMIC_HELPER(fetch_or, Int128, o_le)
+GEN_ATOMIC_HELPER(fetch_or, Int128, o_be)
+
+#undef GEN_ATOMIC_HELPER_ALL
+#undef GEN_ATOMIC_HELPER
+
 uint8_t cpu_ldb_code_mmu(CPUArchState *env, vaddr addr,
                          MemOpIdx oi, uintptr_t ra);
 uint16_t cpu_ldw_code_mmu(CPUArchState *env, vaddr addr,
-- 
2.43.0




* [PATCH 4/7] tcg: Add tcg_gen_atomic_{xchg,fetch_and,fetch_or}_i128
  2025-08-15 12:26 [PATCH 0/7] target/arm: Implement FEAT_LSE128 Richard Henderson
                   ` (2 preceding siblings ...)
  2025-08-15 12:26 ` [PATCH 3/7] accel/tcg: Add cpu_atomic_*_mmu for 16-byte " Richard Henderson
@ 2025-08-15 12:26 ` Richard Henderson
  2025-08-19 14:17   ` [PATCH 4/7] tcg: Add tcg_gen_atomic_{xchg, fetch_and, fetch_or}_i128 Peter Maydell
  2025-08-15 12:26 ` [PATCH 5/7] target/arm: Rename isar_feature_aa64_atomics Richard Henderson
                   ` (3 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2025-08-15 12:26 UTC
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
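Illustration only, not part of the patch: the generators pick a helper
from a table indexed by the size and byte-swap bits of the memop, and
take a fallback path when the entry is empty.  The sketch below shows
that dispatch shape in isolation; every SK_* and sk_* name and value is
hypothetical and does not match the real MemOp encoding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical encoding: low 3 bits are the size, bit 3 selects byte swap. */
enum {
    SK_8, SK_16, SK_32, SK_64, SK_128,
    SK_SIZE = 7,
    SK_BSWAP = 8,
};

static_assert(SK_128 <= SK_SIZE, "largest size must fit in the size mask");

typedef int (*sk_helper)(void);

static int sk_helper_le(void) { return 1; }
static int sk_helper_be(void) { return 2; }

/* Entries not listed are NULL, meaning "no atomic helper available". */
static const sk_helper sk_table[(SK_SIZE | SK_BSWAP) + 1] = {
    [SK_128] = sk_helper_le,
    [SK_128 | SK_BSWAP] = sk_helper_be,
};

/* Returns the helper's result, or -1 to model the fallback path. */
static int sk_dispatch(unsigned memop)
{
    sk_helper fn = sk_table[memop & (SK_SIZE | SK_BSWAP)];

    return fn ? fn() : -1;
}
```

Masking the index first means unrelated flag bits in the memop cannot
push the lookup outside the table.
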
 accel/tcg/tcg-runtime.h       | 12 +++++
 include/tcg/tcg-op-common.h   |  7 +++
 include/tcg/tcg-op.h          |  3 ++
 tcg/tcg-op-ldst.c             | 97 +++++++++++++++++++++++++++++++++--
 accel/tcg/atomic_common.c.inc |  9 ++++
 5 files changed, 125 insertions(+), 3 deletions(-)

diff --git a/accel/tcg/tcg-runtime.h b/accel/tcg/tcg-runtime.h
index c23b5e66c4..8436599b9f 100644
--- a/accel/tcg/tcg-runtime.h
+++ b/accel/tcg/tcg-runtime.h
@@ -63,6 +63,18 @@ DEF_HELPER_FLAGS_5(atomic_cmpxchgo_be, TCG_CALL_NO_WG,
                    i128, env, i64, i128, i128, i32)
 DEF_HELPER_FLAGS_5(atomic_cmpxchgo_le, TCG_CALL_NO_WG,
                    i128, env, i64, i128, i128, i32)
+DEF_HELPER_FLAGS_4(atomic_xchgo_be, TCG_CALL_NO_WG,
+                   i128, env, i64, i128, i32)
+DEF_HELPER_FLAGS_4(atomic_xchgo_le, TCG_CALL_NO_WG,
+                   i128, env, i64, i128, i32)
+DEF_HELPER_FLAGS_4(atomic_fetch_ando_be, TCG_CALL_NO_WG,
+                   i128, env, i64, i128, i32)
+DEF_HELPER_FLAGS_4(atomic_fetch_ando_le, TCG_CALL_NO_WG,
+                   i128, env, i64, i128, i32)
+DEF_HELPER_FLAGS_4(atomic_fetch_oro_be, TCG_CALL_NO_WG,
+                   i128, env, i64, i128, i32)
+DEF_HELPER_FLAGS_4(atomic_fetch_oro_le, TCG_CALL_NO_WG,
+                   i128, env, i64, i128, i32)
 #endif
 
 DEF_HELPER_FLAGS_5(nonatomic_cmpxchgo, TCG_CALL_NO_WG,
diff --git a/include/tcg/tcg-op-common.h b/include/tcg/tcg-op-common.h
index e1071adebf..f752ef440b 100644
--- a/include/tcg/tcg-op-common.h
+++ b/include/tcg/tcg-op-common.h
@@ -344,6 +344,8 @@ void tcg_gen_atomic_xchg_i32_chk(TCGv_i32, TCGTemp *, TCGv_i32,
                                  TCGArg, MemOp, TCGType);
 void tcg_gen_atomic_xchg_i64_chk(TCGv_i64, TCGTemp *, TCGv_i64,
                                  TCGArg, MemOp, TCGType);
+void tcg_gen_atomic_xchg_i128_chk(TCGv_i128, TCGTemp *, TCGv_i128,
+                                  TCGArg, MemOp, TCGType);
 
 void tcg_gen_atomic_fetch_add_i32_chk(TCGv_i32, TCGTemp *, TCGv_i32,
                                       TCGArg, MemOp, TCGType);
@@ -411,6 +413,11 @@ void tcg_gen_atomic_umax_fetch_i32_chk(TCGv_i32, TCGTemp *, TCGv_i32,
 void tcg_gen_atomic_umax_fetch_i64_chk(TCGv_i64, TCGTemp *, TCGv_i64,
                                        TCGArg, MemOp, TCGType);
 
+void tcg_gen_atomic_fetch_and_i128_chk(TCGv_i128, TCGTemp *, TCGv_i128,
+                                       TCGArg, MemOp, TCGType);
+void tcg_gen_atomic_fetch_or_i128_chk(TCGv_i128, TCGTemp *, TCGv_i128,
+                                      TCGArg, MemOp, TCGType);
+
 /* Vector ops */
 
 void tcg_gen_mov_vec(TCGv_vec, TCGv_vec);
diff --git a/include/tcg/tcg-op.h b/include/tcg/tcg-op.h
index c912578fdd..232733cb71 100644
--- a/include/tcg/tcg-op.h
+++ b/include/tcg/tcg-op.h
@@ -134,13 +134,16 @@ DEF_ATOMIC3(tcg_gen_nonatomic_cmpxchg, i128)
 
 DEF_ATOMIC2(tcg_gen_atomic_xchg, i32)
 DEF_ATOMIC2(tcg_gen_atomic_xchg, i64)
+DEF_ATOMIC2(tcg_gen_atomic_xchg, i128)
 
 DEF_ATOMIC2(tcg_gen_atomic_fetch_add, i32)
 DEF_ATOMIC2(tcg_gen_atomic_fetch_add, i64)
 DEF_ATOMIC2(tcg_gen_atomic_fetch_and, i32)
 DEF_ATOMIC2(tcg_gen_atomic_fetch_and, i64)
+DEF_ATOMIC2(tcg_gen_atomic_fetch_and, i128)
 DEF_ATOMIC2(tcg_gen_atomic_fetch_or, i32)
 DEF_ATOMIC2(tcg_gen_atomic_fetch_or, i64)
+DEF_ATOMIC2(tcg_gen_atomic_fetch_or, i128)
 DEF_ATOMIC2(tcg_gen_atomic_fetch_xor, i32)
 DEF_ATOMIC2(tcg_gen_atomic_fetch_xor, i64)
 DEF_ATOMIC2(tcg_gen_atomic_fetch_smin, i32)
diff --git a/tcg/tcg-op-ldst.c b/tcg/tcg-op-ldst.c
index 548496002d..67c15fd4d0 100644
--- a/tcg/tcg-op-ldst.c
+++ b/tcg/tcg-op-ldst.c
@@ -801,6 +801,8 @@ typedef void (*gen_atomic_op_i32)(TCGv_i32, TCGv_env, TCGv_i64,
                                   TCGv_i32, TCGv_i32);
 typedef void (*gen_atomic_op_i64)(TCGv_i64, TCGv_env, TCGv_i64,
                                   TCGv_i64, TCGv_i32);
+typedef void (*gen_atomic_op_i128)(TCGv_i128, TCGv_env, TCGv_i64,
+                                   TCGv_i128, TCGv_i32);
 
 #ifdef CONFIG_ATOMIC64
 # define WITH_ATOMIC64(X) X,
@@ -1201,6 +1203,94 @@ static void do_atomic_op_i64(TCGv_i64 ret, TCGTemp *addr, TCGv_i64 val,
     }
 }
 
+static void do_nonatomic_op_i128(TCGv_i128 ret, TCGTemp *addr, TCGv_i128 val,
+                                 TCGArg idx, MemOp memop, bool new_val,
+                                 void (*gen)(TCGv_i64, TCGv_i64, TCGv_i64))
+{
+    TCGv_i128 t = tcg_temp_ebb_new_i128();
+    TCGv_i128 r = tcg_temp_ebb_new_i128();
+
+    tcg_gen_qemu_ld_i128_int(r, addr, idx, memop);
+    gen(TCGV128_LOW(t), TCGV128_LOW(r), TCGV128_LOW(val));
+    gen(TCGV128_HIGH(t), TCGV128_HIGH(r), TCGV128_HIGH(val));
+    tcg_gen_qemu_st_i128_int(t, addr, idx, memop);
+
+    tcg_gen_mov_i128(ret, r);
+    tcg_temp_free_i128(t);
+    tcg_temp_free_i128(r);
+}
+
+static void do_atomic_op_i128(TCGv_i128 ret, TCGTemp *addr, TCGv_i128 val,
+                              TCGArg idx, MemOp memop, void * const table[])
+{
+    gen_atomic_op_i128 gen = table[memop & (MO_SIZE | MO_BSWAP)];
+
+    if (gen) {
+        MemOpIdx oi = make_memop_idx(memop & ~MO_SIGN, idx);
+        TCGv_i64 a64 = maybe_extend_addr64(addr);
+        gen(ret, tcg_env, a64, val, tcg_constant_i32(oi));
+        maybe_free_addr64(a64);
+        return;
+    }
+
+    gen_helper_exit_atomic(tcg_env);
+    /* Produce a result */
+    tcg_gen_movi_i64(TCGV128_LOW(ret), 0);
+    tcg_gen_movi_i64(TCGV128_HIGH(ret), 0);
+}
+
+#define GEN_ATOMIC_HELPER128(NAME, OP, NEW)                             \
+static void * const table_##NAME[(MO_SIZE | MO_BSWAP) + 1] = {          \
+    [MO_8] = gen_helper_atomic_##NAME##b,                               \
+    [MO_16 | MO_LE] = gen_helper_atomic_##NAME##w_le,                   \
+    [MO_16 | MO_BE] = gen_helper_atomic_##NAME##w_be,                   \
+    [MO_32 | MO_LE] = gen_helper_atomic_##NAME##l_le,                   \
+    [MO_32 | MO_BE] = gen_helper_atomic_##NAME##l_be,                   \
+    WITH_ATOMIC64([MO_64 | MO_LE] = gen_helper_atomic_##NAME##q_le)     \
+    WITH_ATOMIC64([MO_64 | MO_BE] = gen_helper_atomic_##NAME##q_be)     \
+    WITH_ATOMIC128([MO_128 | MO_LE] = gen_helper_atomic_##NAME##o_le)   \
+    WITH_ATOMIC128([MO_128 | MO_BE] = gen_helper_atomic_##NAME##o_be)   \
+};                                                                      \
+void tcg_gen_atomic_##NAME##_i32_chk(TCGv_i32 ret, TCGTemp *addr,       \
+                                     TCGv_i32 val, TCGArg idx,          \
+                                     MemOp memop, TCGType addr_type)    \
+{                                                                       \
+    tcg_debug_assert(addr_type == tcg_ctx->addr_type);                  \
+    tcg_debug_assert((memop & MO_SIZE) <= MO_32);                       \
+    if (tcg_ctx->gen_tb->cflags & CF_PARALLEL) {                        \
+        do_atomic_op_i32(ret, addr, val, idx, memop, table_##NAME);     \
+    } else {                                                            \
+        do_nonatomic_op_i32(ret, addr, val, idx, memop, NEW,            \
+                            tcg_gen_##OP##_i32);                        \
+    }                                                                   \
+}                                                                       \
+void tcg_gen_atomic_##NAME##_i64_chk(TCGv_i64 ret, TCGTemp *addr,       \
+                                     TCGv_i64 val, TCGArg idx,          \
+                                     MemOp memop, TCGType addr_type)    \
+{                                                                       \
+    tcg_debug_assert(addr_type == tcg_ctx->addr_type);                  \
+    tcg_debug_assert((memop & MO_SIZE) <= MO_64);                       \
+    if (tcg_ctx->gen_tb->cflags & CF_PARALLEL) {                        \
+        do_atomic_op_i64(ret, addr, val, idx, memop, table_##NAME);     \
+    } else {                                                            \
+        do_nonatomic_op_i64(ret, addr, val, idx, memop, NEW,            \
+                            tcg_gen_##OP##_i64);                        \
+    }                                                                   \
+}                                                                       \
+void tcg_gen_atomic_##NAME##_i128_chk(TCGv_i128 ret, TCGTemp *addr,     \
+                                      TCGv_i128 val, TCGArg idx,        \
+                                      MemOp memop, TCGType addr_type)   \
+{                                                                       \
+    tcg_debug_assert(addr_type == tcg_ctx->addr_type);                  \
+    tcg_debug_assert((memop & MO_SIZE) == MO_128);                      \
+    if (tcg_ctx->gen_tb->cflags & CF_PARALLEL) {                        \
+        do_atomic_op_i128(ret, addr, val, idx, memop, table_##NAME);    \
+    } else {                                                            \
+        do_nonatomic_op_i128(ret, addr, val, idx, memop, NEW,           \
+                             tcg_gen_##OP##_i64);                       \
+    }                                                                   \
+}
+
 #define GEN_ATOMIC_HELPER(NAME, OP, NEW)                                \
 static void * const table_##NAME[(MO_SIZE | MO_BSWAP) + 1] = {          \
     [MO_8] = gen_helper_atomic_##NAME##b,                               \
@@ -1239,8 +1329,8 @@ void tcg_gen_atomic_##NAME##_i64_chk(TCGv_i64 ret, TCGTemp *addr,       \
 }
 
 GEN_ATOMIC_HELPER(fetch_add, add, 0)
-GEN_ATOMIC_HELPER(fetch_and, and, 0)
-GEN_ATOMIC_HELPER(fetch_or, or, 0)
+GEN_ATOMIC_HELPER128(fetch_and, and, 0)
+GEN_ATOMIC_HELPER128(fetch_or, or, 0)
 GEN_ATOMIC_HELPER(fetch_xor, xor, 0)
 GEN_ATOMIC_HELPER(fetch_smin, smin, 0)
 GEN_ATOMIC_HELPER(fetch_umin, umin, 0)
@@ -1266,6 +1356,7 @@ static void tcg_gen_mov2_i64(TCGv_i64 r, TCGv_i64 a, TCGv_i64 b)
     tcg_gen_mov_i64(r, b);
 }
 
-GEN_ATOMIC_HELPER(xchg, mov2, 0)
+GEN_ATOMIC_HELPER128(xchg, mov2, 0)
 
 #undef GEN_ATOMIC_HELPER
+#undef GEN_ATOMIC_HELPER128
diff --git a/accel/tcg/atomic_common.c.inc b/accel/tcg/atomic_common.c.inc
index 6056598c23..bca93a0ac4 100644
--- a/accel/tcg/atomic_common.c.inc
+++ b/accel/tcg/atomic_common.c.inc
@@ -122,5 +122,14 @@ GEN_ATOMIC_HELPERS(umax_fetch)
 
 GEN_ATOMIC_HELPERS(xchg)
 
+#if HAVE_CMPXCHG128
+ATOMIC_HELPER(xchgo_be, Int128)
+ATOMIC_HELPER(xchgo_le, Int128)
+ATOMIC_HELPER(fetch_ando_be, Int128)
+ATOMIC_HELPER(fetch_ando_le, Int128)
+ATOMIC_HELPER(fetch_oro_be, Int128)
+ATOMIC_HELPER(fetch_oro_le, Int128)
+#endif
+
 #undef ATOMIC_HELPER
 #undef GEN_ATOMIC_HELPERS
-- 
2.43.0




* [PATCH 5/7] target/arm: Rename isar_feature_aa64_atomics
  2025-08-15 12:26 [PATCH 0/7] target/arm: Implement FEAT_LSE128 Richard Henderson
                   ` (3 preceding siblings ...)
  2025-08-15 12:26 ` [PATCH 4/7] tcg: Add tcg_gen_atomic_{xchg,fetch_and,fetch_or}_i128 Richard Henderson
@ 2025-08-15 12:26 ` Richard Henderson
  2025-08-19 14:27   ` Peter Maydell
  2025-08-15 12:26 ` [PATCH 6/7] target/arm: Implement FEAT_LSE128 Richard Henderson
                   ` (2 subsequent siblings)
  7 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2025-08-15 12:26 UTC
  To: qemu-devel

This is FEAT_LSE -- rename the predicate to match.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu-features.h      |  2 +-
 linux-user/elfload.c           |  2 +-
 target/arm/tcg/translate-a64.c | 24 ++++++++++++------------
 3 files changed, 14 insertions(+), 14 deletions(-)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index 5876162428..e3d4c3d382 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -406,7 +406,7 @@ static inline bool isar_feature_aa64_crc32(const ARMISARegisters *id)
     return FIELD_EX64_IDREG(id, ID_AA64ISAR0, CRC32) != 0;
 }
 
-static inline bool isar_feature_aa64_atomics(const ARMISARegisters *id)
+static inline bool isar_feature_aa64_lse(const ARMISARegisters *id)
 {
     return FIELD_EX64_IDREG(id, ID_AA64ISAR0, ATOMIC) != 0;
 }
diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index ea214105ff..9f36ec06a4 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -862,7 +862,7 @@ uint32_t get_elf_hwcap(void)
     GET_FEATURE_ID(aa64_sm3, ARM_HWCAP_A64_SM3);
     GET_FEATURE_ID(aa64_sm4, ARM_HWCAP_A64_SM4);
     GET_FEATURE_ID(aa64_fp16, ARM_HWCAP_A64_FPHP | ARM_HWCAP_A64_ASIMDHP);
-    GET_FEATURE_ID(aa64_atomics, ARM_HWCAP_A64_ATOMICS);
+    GET_FEATURE_ID(aa64_lse, ARM_HWCAP_A64_ATOMICS);
     GET_FEATURE_ID(aa64_lse2, ARM_HWCAP_A64_USCAT);
     GET_FEATURE_ID(aa64_rdm, ARM_HWCAP_A64_ASIMDRDM);
     GET_FEATURE_ID(aa64_dp, ARM_HWCAP_A64_ASIMDDP);
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index dbf47595db..d0639e29cf 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -3237,7 +3237,7 @@ static bool trans_LDXP(DisasContext *s, arg_stxr *a)
 
 static bool trans_CASP(DisasContext *s, arg_CASP *a)
 {
-    if (!dc_isar_feature(aa64_atomics, s)) {
+    if (!dc_isar_feature(aa64_lse, s)) {
         return false;
     }
     if (((a->rt | a->rs) & 1) != 0) {
@@ -3250,7 +3250,7 @@ static bool trans_CASP(DisasContext *s, arg_CASP *a)
 
 static bool trans_CAS(DisasContext *s, arg_CAS *a)
 {
-    if (!dc_isar_feature(aa64_atomics, s)) {
+    if (!dc_isar_feature(aa64_lse, s)) {
         return false;
     }
     gen_compare_and_swap(s, a->rs, a->rt, a->rn, a->sz);
@@ -3743,15 +3743,15 @@ static bool do_atomic_ld(DisasContext *s, arg_atomic *a, AtomicThreeOpFn *fn,
     return true;
 }
 
-TRANS_FEAT(LDADD, aa64_atomics, do_atomic_ld, a, tcg_gen_atomic_fetch_add_i64, 0, false)
-TRANS_FEAT(LDCLR, aa64_atomics, do_atomic_ld, a, tcg_gen_atomic_fetch_and_i64, 0, true)
-TRANS_FEAT(LDEOR, aa64_atomics, do_atomic_ld, a, tcg_gen_atomic_fetch_xor_i64, 0, false)
-TRANS_FEAT(LDSET, aa64_atomics, do_atomic_ld, a, tcg_gen_atomic_fetch_or_i64, 0, false)
-TRANS_FEAT(LDSMAX, aa64_atomics, do_atomic_ld, a, tcg_gen_atomic_fetch_smax_i64, MO_SIGN, false)
-TRANS_FEAT(LDSMIN, aa64_atomics, do_atomic_ld, a, tcg_gen_atomic_fetch_smin_i64, MO_SIGN, false)
-TRANS_FEAT(LDUMAX, aa64_atomics, do_atomic_ld, a, tcg_gen_atomic_fetch_umax_i64, 0, false)
-TRANS_FEAT(LDUMIN, aa64_atomics, do_atomic_ld, a, tcg_gen_atomic_fetch_umin_i64, 0, false)
-TRANS_FEAT(SWP, aa64_atomics, do_atomic_ld, a, tcg_gen_atomic_xchg_i64, 0, false)
+TRANS_FEAT(LDADD, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_fetch_add_i64, 0, false)
+TRANS_FEAT(LDCLR, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_fetch_and_i64, 0, true)
+TRANS_FEAT(LDEOR, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_fetch_xor_i64, 0, false)
+TRANS_FEAT(LDSET, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_fetch_or_i64, 0, false)
+TRANS_FEAT(LDSMAX, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_fetch_smax_i64, MO_SIGN, false)
+TRANS_FEAT(LDSMIN, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_fetch_smin_i64, MO_SIGN, false)
+TRANS_FEAT(LDUMAX, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_fetch_umax_i64, 0, false)
+TRANS_FEAT(LDUMIN, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_fetch_umin_i64, 0, false)
+TRANS_FEAT(SWP, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_xchg_i64, 0, false)
 
 static bool trans_LDAPR(DisasContext *s, arg_LDAPR *a)
 {
@@ -3759,7 +3759,7 @@ static bool trans_LDAPR(DisasContext *s, arg_LDAPR *a)
     TCGv_i64 clean_addr;
     MemOp mop;
 
-    if (!dc_isar_feature(aa64_atomics, s) ||
+    if (!dc_isar_feature(aa64_lse, s) ||
         !dc_isar_feature(aa64_rcpc_8_3, s)) {
         return false;
     }
-- 
2.43.0




* [PATCH 6/7] target/arm: Implement FEAT_LSE128
  2025-08-15 12:26 [PATCH 0/7] target/arm: Implement FEAT_LSE128 Richard Henderson
                   ` (4 preceding siblings ...)
  2025-08-15 12:26 ` [PATCH 5/7] target/arm: Rename isar_feature_aa64_atomics Richard Henderson
@ 2025-08-15 12:26 ` Richard Henderson
  2025-08-19 14:34   ` Peter Maydell
  2025-08-15 12:26 ` [PATCH 7/7] target/arm: Enable FEAT_LSE128 for -cpu max Richard Henderson
  2025-08-26 11:31 ` [PATCH 0/7] target/arm: Implement FEAT_LSE128 Peter Maydell
  7 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2025-08-15 12:26 UTC (permalink / raw)
  To: qemu-devel

This feature contains the LDCLRP, LDSETP, and SWPP instructions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu-features.h      |  5 ++++
 target/arm/tcg/translate-a64.c | 49 ++++++++++++++++++++++++++++++++++
 target/arm/tcg/a64.decode      |  7 +++++
 3 files changed, 61 insertions(+)

diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
index e3d4c3d382..182b301c86 100644
--- a/target/arm/cpu-features.h
+++ b/target/arm/cpu-features.h
@@ -411,6 +411,11 @@ static inline bool isar_feature_aa64_lse(const ARMISARegisters *id)
     return FIELD_EX64_IDREG(id, ID_AA64ISAR0, ATOMIC) != 0;
 }
 
+static inline bool isar_feature_aa64_lse128(const ARMISARegisters *id)
+{
+    return FIELD_EX64_IDREG(id, ID_AA64ISAR0, ATOMIC) >= 3;
+}
+
 static inline bool isar_feature_aa64_rdm(const ARMISARegisters *id)
 {
     return FIELD_EX64_IDREG(id, ID_AA64ISAR0, RDM) != 0;
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index d0639e29cf..976bf4df32 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -3753,6 +3753,55 @@ TRANS_FEAT(LDUMAX, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_fetch_umax_i64, 0,
 TRANS_FEAT(LDUMIN, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_fetch_umin_i64, 0, false)
 TRANS_FEAT(SWP, aa64_lse, do_atomic_ld, a, tcg_gen_atomic_xchg_i64, 0, false)
 
+typedef void Atomic128ThreeOpFn(TCGv_i128, TCGv_i64, TCGv_i128, TCGArg, MemOp);
+
+static bool do_atomic128_ld(DisasContext *s, arg_atomic128 *a,
+                            Atomic128ThreeOpFn *fn, bool invert)
+{
+    MemOp mop;
+    int rlo, rhi;
+    TCGv_i64 clean_addr, tlo, thi;
+    TCGv_i128 t16;
+
+    if (a->rt == 31 || a->rt2 == 31 || a->rt == a->rt2) {
+        return false;
+    }
+    if (a->rn == 31) {
+        gen_check_sp_alignment(s);
+    }
+    mop = check_atomic_align(s, a->rn, MO_128);
+    clean_addr = gen_mte_check1(s, cpu_reg_sp(s, a->rn), false,
+                                a->rn != 31, mop);
+
+    rlo = (s->be_data == MO_LE ? a->rt : a->rt2);
+    rhi = (s->be_data == MO_LE ? a->rt2 : a->rt);
+
+    tlo = read_cpu_reg(s, rlo, true);
+    thi = read_cpu_reg(s, rhi, true);
+    if (invert) {
+        tcg_gen_not_i64(tlo, tlo);
+        tcg_gen_not_i64(thi, thi);
+    }
+    /*
+     * The tcg atomic primitives are all full barriers.  Therefore we
+     * can ignore the Acquire and Release bits of this instruction.
+     */
+    t16 = tcg_temp_new_i128();
+    tcg_gen_concat_i64_i128(t16, tlo, thi);
+
+    fn(t16, clean_addr, t16, get_mem_index(s), mop);
+
+    tcg_gen_extr_i128_i64(cpu_reg(s, rlo), cpu_reg(s, rhi), t16);
+    return true;
+}
+
+TRANS_FEAT(LDCLRP, aa64_lse128, do_atomic128_ld,
+           a, tcg_gen_atomic_fetch_and_i128, true)
+TRANS_FEAT(LDSETP, aa64_lse128, do_atomic128_ld,
+           a, tcg_gen_atomic_fetch_or_i128, false)
+TRANS_FEAT(SWPP, aa64_lse128, do_atomic128_ld,
+           a, tcg_gen_atomic_xchg_i128, false)
+
 static bool trans_LDAPR(DisasContext *s, arg_LDAPR *a)
 {
     bool iss_sf = ldst_iss_sf(a->sz, false, false);
diff --git a/target/arm/tcg/a64.decode b/target/arm/tcg/a64.decode
index 8c798cde2b..70ed9610af 100644
--- a/target/arm/tcg/a64.decode
+++ b/target/arm/tcg/a64.decode
@@ -536,6 +536,13 @@ SWP             .. 111 0 00 . . 1 ..... 1000 00 ..... ..... @atomic
 
 LDAPR           sz:2 111 0 00 1 0 1 11111 1100 00 rn:5 rt:5
 
+# Atomic 128-bit memory operations
+&atomic128      rn rt rt2 a r
+@atomic128      ........ a:1 r:1 . rt2:5 ...... rn:5 rt:5   &atomic128
+LDCLRP          00011001 .   .   1 ..... 000100 ..... ..... @atomic128
+LDSETP          00011001 .   .   1 ..... 001100 ..... ..... @atomic128
+SWPP            00011001 .   .   1 ..... 100000 ..... ..... @atomic128
+
 # Load/store register (pointer authentication)
 
 # LDRA immediate is 10 bits signed and scaled, but the bits aren't all contiguous
-- 
2.43.0




* [PATCH 7/7] target/arm: Enable FEAT_LSE128 for -cpu max
  2025-08-15 12:26 [PATCH 0/7] target/arm: Implement FEAT_LSE128 Richard Henderson
                   ` (5 preceding siblings ...)
  2025-08-15 12:26 ` [PATCH 6/7] target/arm: Implement FEAT_LSE128 Richard Henderson
@ 2025-08-15 12:26 ` Richard Henderson
  2025-08-19 14:27   ` Peter Maydell
  2025-08-26 11:31 ` [PATCH 0/7] target/arm: Implement FEAT_LSE128 Peter Maydell
  7 siblings, 1 reply; 16+ messages in thread
From: Richard Henderson @ 2025-08-15 12:26 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/elfload.c          | 1 +
 target/arm/tcg/cpu64.c        | 2 +-
 docs/system/arm/emulation.rst | 1 +
 3 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/linux-user/elfload.c b/linux-user/elfload.c
index 9f36ec06a4..85f67d3d44 100644
--- a/linux-user/elfload.c
+++ b/linux-user/elfload.c
@@ -923,6 +923,7 @@ uint64_t get_elf_hwcap2(void)
     GET_FEATURE_ID(aa64_sme_b16b16, ARM_HWCAP2_A64_SME_B16B16);
     GET_FEATURE_ID(aa64_sme_f16f16, ARM_HWCAP2_A64_SME_F16F16);
     GET_FEATURE_ID(aa64_sve_b16b16, ARM_HWCAP2_A64_SVE_B16B16);
+    GET_FEATURE_ID(aa64_lse128, ARM_HWCAP2_A64_LSE128);
 
     return hwcaps;
 }
diff --git a/target/arm/tcg/cpu64.c b/target/arm/tcg/cpu64.c
index 35cddbafa4..370818d11b 100644
--- a/target/arm/tcg/cpu64.c
+++ b/target/arm/tcg/cpu64.c
@@ -1145,7 +1145,7 @@ void aarch64_max_tcg_initfn(Object *obj)
     t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1);     /* FEAT_SHA1 */
     t = FIELD_DP64(t, ID_AA64ISAR0, SHA2, 2);     /* FEAT_SHA512 */
     t = FIELD_DP64(t, ID_AA64ISAR0, CRC32, 1);    /* FEAT_CRC32 */
-    t = FIELD_DP64(t, ID_AA64ISAR0, ATOMIC, 2);   /* FEAT_LSE */
+    t = FIELD_DP64(t, ID_AA64ISAR0, ATOMIC, 3);   /* FEAT_LSE, FEAT_LSE128 */
     t = FIELD_DP64(t, ID_AA64ISAR0, RDM, 1);      /* FEAT_RDM */
     t = FIELD_DP64(t, ID_AA64ISAR0, SHA3, 1);     /* FEAT_SHA3 */
     t = FIELD_DP64(t, ID_AA64ISAR0, SM3, 1);      /* FEAT_SM3 */
diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 890dc6fee2..a5c0e61393 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -88,6 +88,7 @@ the following architecture extensions:
 - FEAT_LRCPC2 (Load-acquire RCpc instructions v2)
 - FEAT_LSE (Large System Extensions)
 - FEAT_LSE2 (Large System Extensions v2)
+- FEAT_LSE128 (128-bit Atomics)
 - FEAT_LVA (Large Virtual Address space)
 - FEAT_MixedEnd (Mixed-endian support)
 - FEAT_MixedEndEL0 (Mixed-endian support at EL0)
-- 
2.43.0




* Re: [PATCH 1/7] qemu/atomic: Finish renaming atomic128-cas.h headers
  2025-08-15 12:26 ` [PATCH 1/7] qemu/atomic: Finish renaming atomic128-cas.h headers Richard Henderson
@ 2025-08-19 14:01   ` Peter Maydell
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2025-08-19 14:01 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel, qemu-stable

On Fri, 15 Aug 2025 at 13:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The aarch64 header was not renamed with the others, meaning it
> was skipped in favor of the generic version.
>
> Cc: qemu-stable@nongnu.org
> Fixes: 15606965400b ("qemu/atomic: Rename atomic128-cas.h headers using .h.inc suffix")
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  .../include/aarch64/host/{atomic128-cas.h => atomic128-cas.h.inc} | 0
>  1 file changed, 0 insertions(+), 0 deletions(-)
>  rename host/include/aarch64/host/{atomic128-cas.h => atomic128-cas.h.inc} (100%)
>
> diff --git a/host/include/aarch64/host/atomic128-cas.h b/host/include/aarch64/host/atomic128-cas.h.inc
> similarity index 100%
> rename from host/include/aarch64/host/atomic128-cas.h
> rename to host/include/aarch64/host/atomic128-cas.h.inc
> --
> 2.43.0

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 2/7] qemu/atomic: Add atomic16 primitives for xchg, fetch_and, fetch_or
  2025-08-15 12:26 ` [PATCH 2/7] qemu/atomic: Add atomic16 primitives for xchg, fetch_and, fetch_or Richard Henderson
@ 2025-08-19 14:13   ` Peter Maydell
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2025-08-19 14:13 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, 15 Aug 2025 at 13:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  host/include/aarch64/host/atomic128-cas.h.inc | 57 +++++++++++
>  host/include/generic/host/atomic128-cas.h.inc | 96 +++++++++++++++++++
>  2 files changed, 153 insertions(+)
>
> diff --git a/host/include/aarch64/host/atomic128-cas.h.inc b/host/include/aarch64/host/atomic128-cas.h.inc
> index 991da4ef54..aec27df182 100644
> --- a/host/include/aarch64/host/atomic128-cas.h.inc
> +++ b/host/include/aarch64/host/atomic128-cas.h.inc
> @@ -38,6 +38,63 @@ static inline Int128 atomic16_cmpxchg(Int128 *ptr, Int128 cmp, Int128 new)
>      return int128_make128(oldl, oldh);
>  }
>
> +static inline Int128 atomic16_xchg(Int128 *ptr, Int128 new)
> +{
> +    uint64_t newl = int128_getlo(new), newh = int128_gethi(new);
> +    uint64_t oldl, oldh;
> +    uint32_t tmp;
> +
> +    asm("0: ldaxp %[oldl], %[oldh], %[mem]\n\t"

This looks like it won't do the right thing on a big-endian
host, but nor will the existing atomic16_cmpxchg(), so
I assume we don't care about that particular unicorn.

> +        "stlxp %w[tmp], %[newl], %[newh], %[mem]\n\t"
> +        "cbnz %w[tmp], 0b"
> +        : [mem] "+m"(*ptr), [tmp] "=&r"(tmp),
> +          [oldl] "=&r"(oldl), [oldh] "=&r"(oldh)
> +        : [newl] "r"(newl), [newh] "r"(newh)
> +        : "memory");
> +    return int128_make128(oldl, oldh);
> +}

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

-- PMM



* Re: [PATCH 3/7] accel/tcg: Add cpu_atomic_*_mmu for 16-byte xchg, fetch_and, fetch_or
  2025-08-15 12:26 ` [PATCH 3/7] accel/tcg: Add cpu_atomic_*_mmu for 16-byte " Richard Henderson
@ 2025-08-19 14:15   ` Peter Maydell
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2025-08-19 14:15 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, 15 Aug 2025 at 13:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---


Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 4/7] tcg: Add tcg_gen_atomic_{xchg,fetch_and,fetch_or}_i128
  2025-08-15 12:26 ` [PATCH 4/7] tcg: Add tcg_gen_atomic_{xchg,fetch_and,fetch_or}_i128 Richard Henderson
@ 2025-08-19 14:17   ` Peter Maydell
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2025-08-19 14:17 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, 15 Aug 2025 at 13:29, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/tcg-runtime.h       | 12 +++++
>  include/tcg/tcg-op-common.h   |  7 +++
>  include/tcg/tcg-op.h          |  3 ++
>  tcg/tcg-op-ldst.c             | 97 +++++++++++++++++++++++++++++++++--
>  accel/tcg/atomic_common.c.inc |  9 ++++
>  5 files changed, 125 insertions(+), 3 deletions(-)
>

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 5/7] target/arm: Rename isar_feature_aa64_atomics
  2025-08-15 12:26 ` [PATCH 5/7] target/arm: Rename isar_feature_aa64_atomics Richard Henderson
@ 2025-08-19 14:27   ` Peter Maydell
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2025-08-19 14:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, 15 Aug 2025 at 13:29, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This is FEAT_LSE -- rename the predicate to match.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu-features.h      |  2 +-
>  linux-user/elfload.c           |  2 +-
>  target/arm/tcg/translate-a64.c | 24 ++++++++++++------------
>  3 files changed, 14 insertions(+), 14 deletions(-)
>
> diff --git a/target/arm/cpu-features.h b/target/arm/cpu-features.h
> index 5876162428..e3d4c3d382 100644
> --- a/target/arm/cpu-features.h
> +++ b/target/arm/cpu-features.h
> @@ -406,7 +406,7 @@ static inline bool isar_feature_aa64_crc32(const ARMISARegisters *id)
>      return FIELD_EX64_IDREG(id, ID_AA64ISAR0, CRC32) != 0;
>  }
>
> -static inline bool isar_feature_aa64_atomics(const ARMISARegisters *id)
> +static inline bool isar_feature_aa64_lse(const ARMISARegisters *id)
>  {
>      return FIELD_EX64_IDREG(id, ID_AA64ISAR0, ATOMIC) != 0;
>  }

The Arm ARM says that FEAT_LSE is for ATOMIC >= 2.
Older versions of the Arm ARM also say that the old name
ARMv8.1-Atomics was for >= 2, though, so this is something
we've always got wrong.

I just checked and all the CPUs we define do correctly set
the ATOMIC field to either 0 or 2 and not the reserved value
1, so we can add another patch that corrects this feature
function to do the correct check.
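
A minimal sketch of the corrected check described above, written as
standalone C rather than against QEMU's FIELD_EX64_IDREG macro.  The
helper names (atomic_field, has_feat_lse, has_feat_lse128) are
hypothetical.  The field position, bits [23:20] of ID_AA64ISAR0_EL1, is
an assumption taken from the Arm ARM, and the threshold of 3 for
FEAT_LSE128 matches the predicate added in patch 6.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed layout: ID_AA64ISAR0_EL1.Atomic occupies bits [23:20]. */
static inline unsigned atomic_field(uint64_t id_aa64isar0)
{
    return (unsigned)((id_aa64isar0 >> 20) & 0xf);
}

/* FEAT_LSE: Atomic >= 2.  The value 1 is reserved, so "!= 0" is too lax. */
static inline bool has_feat_lse(uint64_t id_aa64isar0)
{
    return atomic_field(id_aa64isar0) >= 2;
}

/* FEAT_LSE128: Atomic >= 3, which therefore implies FEAT_LSE. */
static inline bool has_feat_lse128(uint64_t id_aa64isar0)
{
    return atomic_field(id_aa64isar0) >= 3;
}
```

With the ">=" form, a CPU advertising ATOMIC == 3 satisfies both
predicates, while the reserved value 1 satisfies neither.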

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 7/7] target/arm: Enable FEAT_LSE128 for -cpu max
  2025-08-15 12:26 ` [PATCH 7/7] target/arm: Enable FEAT_LSE128 for -cpu max Richard Henderson
@ 2025-08-19 14:27   ` Peter Maydell
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2025-08-19 14:27 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, 15 Aug 2025 at 13:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  linux-user/elfload.c          | 1 +
>  target/arm/tcg/cpu64.c        | 2 +-
>  docs/system/arm/emulation.rst | 1 +
>  3 files changed, 3 insertions(+), 1 deletion(-)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM



* Re: [PATCH 6/7] target/arm: Implement FEAT_LSE128
  2025-08-15 12:26 ` [PATCH 6/7] target/arm: Implement FEAT_LSE128 Richard Henderson
@ 2025-08-19 14:34   ` Peter Maydell
  0 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2025-08-19 14:34 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, 15 Aug 2025 at 13:29, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This feature contains the LDCLRP, LDSETP, and SWPP instructions.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu-features.h      |  5 ++++
>  target/arm/tcg/translate-a64.c | 49 ++++++++++++++++++++++++++++++++++
>  target/arm/tcg/a64.decode      |  7 +++++
>  3 files changed, 61 insertions(+)

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM
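
In the patch under review, do_atomic128_ld is instantiated with
invert = true only for LDCLRP, so the register pair is complemented
before tcg_gen_atomic_fetch_and_i128 is applied, and in all three cases
the old memory value is written back to the register pair.  A
non-atomic reference model of that data effect might look like the
following.  The U128 type and the *_model function names are
hypothetical, and the sketch deliberately ignores atomicity, ordering,
alignment checks and endianness.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical 128-bit value held as two 64-bit halves. */
typedef struct {
    uint64_t lo, hi;
} U128;

/* SWPP model: store the operand, return the previous memory value. */
static U128 swpp_model(U128 *mem, U128 val)
{
    U128 old = *mem;
    *mem = val;
    return old;
}

/* LDSETP model: memory |= operand, return the previous memory value. */
static U128 ldsetp_model(U128 *mem, U128 val)
{
    U128 old = *mem;
    mem->lo = old.lo | val.lo;
    mem->hi = old.hi | val.hi;
    return old;
}

/* LDCLRP model: memory &= ~operand (fetch-and with inverted operand). */
static U128 ldclrp_model(U128 *mem, U128 val)
{
    U128 old = *mem;
    mem->lo = old.lo & ~val.lo;
    mem->hi = old.hi & ~val.hi;
    return old;
}
```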



* Re: [PATCH 0/7] target/arm: Implement FEAT_LSE128
  2025-08-15 12:26 [PATCH 0/7] target/arm: Implement FEAT_LSE128 Richard Henderson
                   ` (6 preceding siblings ...)
  2025-08-15 12:26 ` [PATCH 7/7] target/arm: Enable FEAT_LSE128 for -cpu max Richard Henderson
@ 2025-08-26 11:31 ` Peter Maydell
  7 siblings, 0 replies; 16+ messages in thread
From: Peter Maydell @ 2025-08-26 11:31 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Fri, 15 Aug 2025 at 13:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This extension has instructions for atomic 128-bit swap, fetch-and,
> and fetch-or.  This is fairly easy to implement with existing host
> support for 128-bit compare-and-swap.
>
> Unlike for 64-bit operations, I did not implement the multitude of
> atomic fetch-op and op-fetch functions.  Those can wait until there
> is a need for them.
>
>
> r~
>
>
> Richard Henderson (7):
>   qemu/atomic: Finish renaming atomic128-cas.h headers
>   qemu/atomic: Add atomic16 primitives for xchg, fetch_and, fetch_or
>   accel/tcg: Add cpu_atomic_*_mmu for 16-byte xchg, fetch_and, fetch_or
>   tcg: Add tcg_gen_atomic_{xchg,fetch_and,fetch_or}_i128
>   target/arm: Rename isar_feature_aa64_atomics
>   target/arm: Implement FEAT_LSE128
>   target/arm: Enable FEAT_LSE128 for -cpu max



Applied to target-arm.next for 10.2, thanks.

-- PMM
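
The cover letter's remark that swap, fetch-and and fetch-or are easy to
build on top of compare-and-swap can be illustrated with a retry loop.
This is only a sketch of the general pattern, not the code from the
series: it works on uint64_t so that it builds without 128-bit atomic
support, it assumes the GCC/Clang __atomic builtins, and the *_via_cas
helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Fetch-and built only from compare-and-swap; returns the old value. */
static uint64_t fetch_and_via_cas(uint64_t *ptr, uint64_t val)
{
    uint64_t old = __atomic_load_n(ptr, __ATOMIC_RELAXED);

    /* On failure the builtin reloads 'old', so 'old & val' is recomputed. */
    while (!__atomic_compare_exchange_n(ptr, &old, old & val, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
    }
    return old;
}

/* Fetch-or: same loop with a different combining operation. */
static uint64_t fetch_or_via_cas(uint64_t *ptr, uint64_t val)
{
    uint64_t old = __atomic_load_n(ptr, __ATOMIC_RELAXED);

    while (!__atomic_compare_exchange_n(ptr, &old, old | val, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
    }
    return old;
}

/* Exchange: the new value does not depend on the old one. */
static uint64_t xchg_via_cas(uint64_t *ptr, uint64_t val)
{
    uint64_t old = __atomic_load_n(ptr, __ATOMIC_RELAXED);

    while (!__atomic_compare_exchange_n(ptr, &old, val, false,
                                        __ATOMIC_SEQ_CST, __ATOMIC_RELAXED)) {
    }
    return old;
}
```

The same shape carries over to a 16-byte value wherever the host offers
a 128-bit compare-and-swap, which is the dependency the cover letter
names.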


