qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v2 00/13] Fixes for user-only munmap races
@ 2024-07-10  3:28 Richard Henderson
  2024-07-10  3:28 ` [PATCH v2 01/13] accel/tcg: Move {set, clear}_helper_retaddr to cpu_ldst.h Richard Henderson
                   ` (12 more replies)
  0 siblings, 13 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

Supercedes: 20240702234155.2106399-1-richard.henderson@linaro.org
("[PATCH 0/2] target/arm: Fix unwind from dc zva and FEAT_MOPS")
Supercedes: 20240702234659.2106870-1-richard.henderson@linaro.org
("[PATCH 0/4] target/ppc: Cleanups for dcbz")

After looking at the first dc zva patch set again, I can see no
difference between the memset used by dc dva and the plain memory
accesses used by SVE and SME.  In all cases it's a host memory
access that might fault even after probe_access, due to a race.

So I've dropped memset_ra and memmove_ra, and instead expose the
basic set/clear_helper_retaddr interface.  This allows one set/clear
to cover entire loops, instead of trebling the overhead of each
individual access.

I've included the ppc dcbz cleanups, so that the final improvement
applies cleanly.

I've updated s390x, though it isn't as clean as I would like.

I've tidied the riscv use of tlb_vaddr_to_host, which Peter noticed.
The usage was incorrect in general.  There is no race condition
here because it still uses cpu_ld*_data_ra in the end and not a
bare host memory access.  But the ongoing work to improve riscv
vector memory instructions should take note.


r~


BALATON Zoltan (1):
  target/ppc/mem_helper.c: Remove a conditional from dcbz_common()

Richard Henderson (12):
  accel/tcg: Move {set,clear}_helper_retaddr to cpu_ldst.h
  target/arm: Use cpu_env in cpu_untagged_addr
  target/arm: Use set/clear_helper_retaddr in helper-a64.c
  target/arm: Use set/clear_helper_retaddr in SVE and SME helpers
  target/ppc: Hoist dcbz_size out of dcbz_common
  target/ppc: Split out helper_dbczl for 970
  target/ppc: Merge helper_{dcbz,dcbzep}
  target/ppc: Improve helper_dcbz for user-only
  target/s390x: Use user_or_likely in do_access_memset
  target/s390x: Use user_or_likely in access_memmove
  target/s390x: Use set/clear_helper_retaddr in mem_helper.c
  target/riscv: Simplify probing in vext_ldff

 accel/tcg/user-retaddr.h      |  28 ---------
 include/exec/cpu_ldst.h       |  34 +++++++++++
 target/arm/cpu.h              |   4 +-
 target/ppc/helper.h           |   6 +-
 accel/tcg/cpu-exec.c          |   3 -
 accel/tcg/user-exec.c         |   1 -
 target/arm/tcg/helper-a64.c   |  14 ++++-
 target/arm/tcg/sme_helper.c   |  16 ++++++
 target/arm/tcg/sve_helper.c   |  26 +++++++++
 target/ppc/mem_helper.c       |  51 +++++++++--------
 target/ppc/translate.c        |  24 ++++----
 target/riscv/vector_helper.c  |  34 +++++------
 target/s390x/tcg/mem_helper.c | 103 +++++++++++++++++++++-------------
 13 files changed, 219 insertions(+), 125 deletions(-)
 delete mode 100644 accel/tcg/user-retaddr.h

-- 
2.43.0



^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCH v2 01/13] accel/tcg: Move {set, clear}_helper_retaddr to cpu_ldst.h
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-12 12:48   ` Peter Maydell
  2024-07-10  3:28 ` [PATCH v2 02/13] target/arm: Use cpu_env in cpu_untagged_addr Richard Henderson
                   ` (11 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

Use of these in helpers goes hand-in-hand with tlb_vaddr_to_host
and other probing functions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/user-retaddr.h | 28 ----------------------------
 include/exec/cpu_ldst.h  | 34 ++++++++++++++++++++++++++++++++++
 accel/tcg/cpu-exec.c     |  3 ---
 accel/tcg/user-exec.c    |  1 -
 4 files changed, 34 insertions(+), 32 deletions(-)
 delete mode 100644 accel/tcg/user-retaddr.h

diff --git a/accel/tcg/user-retaddr.h b/accel/tcg/user-retaddr.h
deleted file mode 100644
index e0f57e1994..0000000000
--- a/accel/tcg/user-retaddr.h
+++ /dev/null
@@ -1,28 +0,0 @@
-#ifndef ACCEL_TCG_USER_RETADDR_H
-#define ACCEL_TCG_USER_RETADDR_H
-
-#include "qemu/atomic.h"
-
-extern __thread uintptr_t helper_retaddr;
-
-static inline void set_helper_retaddr(uintptr_t ra)
-{
-    helper_retaddr = ra;
-    /*
-     * Ensure that this write is visible to the SIGSEGV handler that
-     * may be invoked due to a subsequent invalid memory operation.
-     */
-    signal_barrier();
-}
-
-static inline void clear_helper_retaddr(void)
-{
-    /*
-     * Ensure that previous memory operations have succeeded before
-     * removing the data visible to the signal handler.
-     */
-    signal_barrier();
-    helper_retaddr = 0;
-}
-
-#endif
diff --git a/include/exec/cpu_ldst.h b/include/exec/cpu_ldst.h
index 71009f84f5..dac12bd8eb 100644
--- a/include/exec/cpu_ldst.h
+++ b/include/exec/cpu_ldst.h
@@ -379,4 +379,38 @@ void *tlb_vaddr_to_host(CPUArchState *env, abi_ptr addr,
                         MMUAccessType access_type, int mmu_idx);
 #endif
 
+/*
+ * For user-only, helpers that use guest to host address translation
+ * must protect the actual host memory access by recording 'retaddr'
+ * for the signal handler.  This is required for a race condition in
+ * which another thread unmaps the page between a probe and the
+ * actual access.
+ */
+#ifdef CONFIG_USER_ONLY
+extern __thread uintptr_t helper_retaddr;
+
+static inline void set_helper_retaddr(uintptr_t ra)
+{
+    helper_retaddr = ra;
+    /*
+     * Ensure that this write is visible to the SIGSEGV handler that
+     * may be invoked due to a subsequent invalid memory operation.
+     */
+    signal_barrier();
+}
+
+static inline void clear_helper_retaddr(void)
+{
+    /*
+     * Ensure that previous memory operations have succeeded before
+     * removing the data visible to the signal handler.
+     */
+    signal_barrier();
+    helper_retaddr = 0;
+}
+#else
+#define set_helper_retaddr(ra)   do { } while (0)
+#define clear_helper_retaddr()   do { } while (0)
+#endif
+
 #endif /* CPU_LDST_H */
diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
index 6711b58e0b..0aa42591bf 100644
--- a/accel/tcg/cpu-exec.c
+++ b/accel/tcg/cpu-exec.c
@@ -41,9 +41,6 @@
 #include "tb-context.h"
 #include "internal-common.h"
 #include "internal-target.h"
-#if defined(CONFIG_USER_ONLY)
-#include "user-retaddr.h"
-#endif
 
 /* -icount align implementation. */
 
diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index 80d24540ed..7ddc47b0ba 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -33,7 +33,6 @@
 #include "tcg/tcg-ldst.h"
 #include "internal-common.h"
 #include "internal-target.h"
-#include "user-retaddr.h"
 
 __thread uintptr_t helper_retaddr;
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 02/13] target/arm: Use cpu_env in cpu_untagged_addr
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
  2024-07-10  3:28 ` [PATCH v2 01/13] accel/tcg: Move {set, clear}_helper_retaddr to cpu_ldst.h Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-12 12:49   ` Peter Maydell
  2024-07-10  3:28 ` [PATCH v2 03/13] target/arm: Use set/clear_helper_retaddr in helper-a64.c Richard Henderson
                   ` (10 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

In a completely artifical memset benchmark object_dynamic_cast_assert
dominates the profile, even above guest address resolution and
the underlying host memset.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/cpu.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index d8eb986a04..ccfb9349a3 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3309,8 +3309,8 @@ extern const uint64_t pred_esz_masks[5];
  */
 static inline target_ulong cpu_untagged_addr(CPUState *cs, target_ulong x)
 {
-    ARMCPU *cpu = ARM_CPU(cs);
-    if (cpu->env.tagged_addr_enable) {
+    CPUARMState *env = cpu_env(cs);
+    if (env->tagged_addr_enable) {
         /*
          * TBI is enabled for userspace but not kernelspace addresses.
          * Only clear the tag if bit 55 is clear.
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 03/13] target/arm: Use set/clear_helper_retaddr in helper-a64.c
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
  2024-07-10  3:28 ` [PATCH v2 01/13] accel/tcg: Move {set, clear}_helper_retaddr to cpu_ldst.h Richard Henderson
  2024-07-10  3:28 ` [PATCH v2 02/13] target/arm: Use cpu_env in cpu_untagged_addr Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-12 12:53   ` Peter Maydell
  2024-07-10  3:28 ` [PATCH v2 04/13] target/arm: Use set/clear_helper_retaddr in SVE and SME helpers Richard Henderson
                   ` (9 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

Use these in helper_dc_dva and the FEAT_MOPS routines to
avoid a race condition with munmap in another thread.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/helper-a64.c | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
index 0ea8668ab4..c60d2a7ec9 100644
--- a/target/arm/tcg/helper-a64.c
+++ b/target/arm/tcg/helper-a64.c
@@ -928,6 +928,8 @@ uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp)
 
 void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
 {
+    uintptr_t ra = GETPC();
+
     /*
      * Implement DC ZVA, which zeroes a fixed-length block of memory.
      * Note that we do not implement the (architecturally mandated)
@@ -948,8 +950,6 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
 
 #ifndef CONFIG_USER_ONLY
     if (unlikely(!mem)) {
-        uintptr_t ra = GETPC();
-
         /*
          * Trap if accessing an invalid page.  DC_ZVA requires that we supply
          * the original pointer for an invalid page.  But watchpoints require
@@ -971,7 +971,9 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
     }
 #endif
 
+    set_helper_retaddr(ra);
     memset(mem, 0, blocklen);
+    clear_helper_retaddr();
 }
 
 void HELPER(unaligned_access)(CPUARMState *env, uint64_t addr,
@@ -1120,7 +1122,9 @@ static uint64_t set_step(CPUARMState *env, uint64_t toaddr,
     }
 #endif
     /* Easy case: just memset the host memory */
+    set_helper_retaddr(ra);
     memset(mem, data, setsize);
+    clear_helper_retaddr();
     return setsize;
 }
 
@@ -1163,7 +1167,9 @@ static uint64_t set_step_tags(CPUARMState *env, uint64_t toaddr,
     }
 #endif
     /* Easy case: just memset the host memory */
+    set_helper_retaddr(ra);
     memset(mem, data, setsize);
+    clear_helper_retaddr();
     mte_mops_set_tags(env, toaddr, setsize, *mtedesc);
     return setsize;
 }
@@ -1497,7 +1503,9 @@ static uint64_t copy_step(CPUARMState *env, uint64_t toaddr, uint64_t fromaddr,
     }
 #endif
     /* Easy case: just memmove the host memory */
+    set_helper_retaddr(ra);
     memmove(wmem, rmem, copysize);
+    clear_helper_retaddr();
     return copysize;
 }
 
@@ -1572,7 +1580,9 @@ static uint64_t copy_step_rev(CPUARMState *env, uint64_t toaddr,
      * Easy case: just memmove the host memory. Note that wmem and
      * rmem here point to the *last* byte to copy.
      */
+    set_helper_retaddr(ra);
     memmove(wmem - (copysize - 1), rmem - (copysize - 1), copysize);
+    clear_helper_retaddr();
     return copysize;
 }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 04/13] target/arm: Use set/clear_helper_retaddr in SVE and SME helpers
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
                   ` (2 preceding siblings ...)
  2024-07-10  3:28 ` [PATCH v2 03/13] target/arm: Use set/clear_helper_retaddr in helper-a64.c Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-12 13:00   ` Peter Maydell
  2024-07-10  3:28 ` [PATCH v2 05/13] target/ppc/mem_helper.c: Remove a conditional from dcbz_common() Richard Henderson
                   ` (8 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

Avoid a race condition with munmap in another thread.
Use around blocks that exclusively use "host_fn".
Keep the blocks as small as possible, but without setting
and clearing for every operation on one page.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/sme_helper.c | 16 ++++++++++++++++
 target/arm/tcg/sve_helper.c | 26 ++++++++++++++++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/target/arm/tcg/sme_helper.c b/target/arm/tcg/sme_helper.c
index e2e0575039..ab40ced38f 100644
--- a/target/arm/tcg/sme_helper.c
+++ b/target/arm/tcg/sme_helper.c
@@ -517,6 +517,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg,
         clr_fn(za, 0, reg_off);
     }
 
+    set_helper_retaddr(ra);
+
     while (reg_off <= reg_last) {
         uint64_t pg = vg[reg_off >> 6];
         do {
@@ -529,6 +531,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg,
         } while (reg_off <= reg_last && (reg_off & 63));
     }
 
+    clear_helper_retaddr();
+
     /*
      * Use the slow path to manage the cross-page misalignment.
      * But we know this is RAM and cannot trap.
@@ -543,6 +547,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg,
         reg_last = info.reg_off_last[1];
         host = info.page[1].host;
 
+        set_helper_retaddr(ra);
+
         do {
             uint64_t pg = vg[reg_off >> 6];
             do {
@@ -554,6 +560,8 @@ void sme_ld1(CPUARMState *env, void *za, uint64_t *vg,
                 reg_off += esize;
             } while (reg_off & 63);
         } while (reg_off <= reg_last);
+
+        clear_helper_retaddr();
     }
 }
 
@@ -701,6 +709,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg,
     reg_last = info.reg_off_last[0];
     host = info.page[0].host;
 
+    set_helper_retaddr(ra);
+
     while (reg_off <= reg_last) {
         uint64_t pg = vg[reg_off >> 6];
         do {
@@ -711,6 +721,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg,
         } while (reg_off <= reg_last && (reg_off & 63));
     }
 
+    clear_helper_retaddr();
+
     /*
      * Use the slow path to manage the cross-page misalignment.
      * But we know this is RAM and cannot trap.
@@ -725,6 +737,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg,
         reg_last = info.reg_off_last[1];
         host = info.page[1].host;
 
+        set_helper_retaddr(ra);
+
         do {
             uint64_t pg = vg[reg_off >> 6];
             do {
@@ -734,6 +748,8 @@ void sme_st1(CPUARMState *env, void *za, uint64_t *vg,
                 reg_off += 1 << esz;
             } while (reg_off & 63);
         } while (reg_off <= reg_last);
+
+        clear_helper_retaddr();
     }
 }
 
diff --git a/target/arm/tcg/sve_helper.c b/target/arm/tcg/sve_helper.c
index dd49e67d7a..8d0af4bb1c 100644
--- a/target/arm/tcg/sve_helper.c
+++ b/target/arm/tcg/sve_helper.c
@@ -5738,6 +5738,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const target_ulong addr,
     reg_last = info.reg_off_last[0];
     host = info.page[0].host;
 
+    set_helper_retaddr(retaddr);
+
     while (reg_off <= reg_last) {
         uint64_t pg = vg[reg_off >> 6];
         do {
@@ -5752,6 +5754,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const target_ulong addr,
         } while (reg_off <= reg_last && (reg_off & 63));
     }
 
+    clear_helper_retaddr();
+
     /*
      * Use the slow path to manage the cross-page misalignment.
      * But we know this is RAM and cannot trap.
@@ -5771,6 +5775,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const target_ulong addr,
         reg_last = info.reg_off_last[1];
         host = info.page[1].host;
 
+        set_helper_retaddr(retaddr);
+
         do {
             uint64_t pg = vg[reg_off >> 6];
             do {
@@ -5784,6 +5790,8 @@ void sve_ldN_r(CPUARMState *env, uint64_t *vg, const target_ulong addr,
                 mem_off += N << msz;
             } while (reg_off & 63);
         } while (reg_off <= reg_last);
+
+        clear_helper_retaddr();
     }
 }
 
@@ -6093,6 +6101,8 @@ void sve_ldnfff1_r(CPUARMState *env, void *vg, const target_ulong addr,
     reg_last = info.reg_off_last[0];
     host = info.page[0].host;
 
+    set_helper_retaddr(retaddr);
+
     do {
         uint64_t pg = *(uint64_t *)(vg + (reg_off >> 3));
         do {
@@ -6113,6 +6123,8 @@ void sve_ldnfff1_r(CPUARMState *env, void *vg, const target_ulong addr,
         } while (reg_off <= reg_last && (reg_off & 63));
     } while (reg_off <= reg_last);
 
+    clear_helper_retaddr();
+
     /*
      * MemSingleNF is allowed to fail for any reason.  We have special
      * code above to handle the first element crossing a page boundary.
@@ -6348,6 +6360,8 @@ void sve_stN_r(CPUARMState *env, uint64_t *vg, target_ulong addr,
     reg_last = info.reg_off_last[0];
     host = info.page[0].host;
 
+    set_helper_retaddr(retaddr);
+
     while (reg_off <= reg_last) {
         uint64_t pg = vg[reg_off >> 6];
         do {
@@ -6362,6 +6376,8 @@ void sve_stN_r(CPUARMState *env, uint64_t *vg, target_ulong addr,
         } while (reg_off <= reg_last && (reg_off & 63));
     }
 
+    clear_helper_retaddr();
+
     /*
      * Use the slow path to manage the cross-page misalignment.
      * But we know this is RAM and cannot trap.
@@ -6381,6 +6397,8 @@ void sve_stN_r(CPUARMState *env, uint64_t *vg, target_ulong addr,
         reg_last = info.reg_off_last[1];
         host = info.page[1].host;
 
+        set_helper_retaddr(retaddr);
+
         do {
             uint64_t pg = vg[reg_off >> 6];
             do {
@@ -6394,6 +6412,8 @@ void sve_stN_r(CPUARMState *env, uint64_t *vg, target_ulong addr,
                 mem_off += N << msz;
             } while (reg_off & 63);
         } while (reg_off <= reg_last);
+
+        clear_helper_retaddr();
     }
 }
 
@@ -6560,7 +6580,9 @@ void sve_ld1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
                     if (unlikely(info.flags & TLB_MMIO)) {
                         tlb_fn(env, &scratch, reg_off, addr, retaddr);
                     } else {
+                        set_helper_retaddr(retaddr);
                         host_fn(&scratch, reg_off, info.host);
+                        clear_helper_retaddr();
                     }
                 } else {
                     /* Element crosses the page boundary. */
@@ -6782,7 +6804,9 @@ void sve_ldff1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
                     goto fault;
                 }
 
+                set_helper_retaddr(retaddr);
                 host_fn(vd, reg_off, info.host);
+                clear_helper_retaddr();
             }
             reg_off += esize;
         } while (reg_off & 63);
@@ -6986,7 +7010,9 @@ void sve_st1_z(CPUARMState *env, void *vd, uint64_t *vg, void *vm,
     do {
         void *h = host[i];
         if (likely(h != NULL)) {
+            set_helper_retaddr(retaddr);
             host_fn(vd, reg_off, h);
+            clear_helper_retaddr();
         } else if ((vg[reg_off >> 6] >> (reg_off & 63)) & 1) {
             target_ulong addr = base + (off_fn(vm, reg_off) << scale);
             tlb_fn(env, vd, reg_off, addr, retaddr);
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 05/13] target/ppc/mem_helper.c: Remove a conditional from dcbz_common()
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
                   ` (3 preceding siblings ...)
  2024-07-10  3:28 ` [PATCH v2 04/13] target/arm: Use set/clear_helper_retaddr in SVE and SME helpers Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-10  3:28 ` [PATCH v2 06/13] target/ppc: Hoist dcbz_size out of dcbz_common Richard Henderson
                   ` (7 subsequent siblings)
  12 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

From: BALATON Zoltan <balaton@eik.bme.hu>

Instead of passing a bool and select a value within dcbz_common() let
the callers pass in the right value to avoid this conditional
statement. On PPC dcbz is often used to zero memory and some code uses
it a lot. This change improves the run time of a test case that copies
memory with a dcbz call in every iteration from 6.23 to 5.83 seconds.

Signed-off-by: BALATON Zoltan <balaton@eik.bme.hu>
Message-Id: <20240622204833.5F7C74E6000@zero.eik.bme.hu>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/mem_helper.c | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index f88155ad45..361fd72226 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -271,12 +271,11 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb,
 }
 
 static void dcbz_common(CPUPPCState *env, target_ulong addr,
-                        uint32_t opcode, bool epid, uintptr_t retaddr)
+                        uint32_t opcode, int mmu_idx, uintptr_t retaddr)
 {
     target_ulong mask, dcbz_size = env->dcache_line_size;
     uint32_t i;
     void *haddr;
-    int mmu_idx = epid ? PPC_TLB_EPID_STORE : ppc_env_mmu_index(env, false);
 
 #if defined(TARGET_PPC64)
     /* Check for dcbz vs dcbzl on 970 */
@@ -309,12 +308,12 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
 
 void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-    dcbz_common(env, addr, opcode, false, GETPC());
+    dcbz_common(env, addr, opcode, ppc_env_mmu_index(env, false), GETPC());
 }
 
 void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-    dcbz_common(env, addr, opcode, true, GETPC());
+    dcbz_common(env, addr, opcode, PPC_TLB_EPID_STORE, GETPC());
 }
 
 void helper_icbi(CPUPPCState *env, target_ulong addr)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 06/13] target/ppc: Hoist dcbz_size out of dcbz_common
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
                   ` (4 preceding siblings ...)
  2024-07-10  3:28 ` [PATCH v2 05/13] target/ppc/mem_helper.c: Remove a conditional from dcbz_common() Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-10 12:11   ` BALATON Zoltan
  2024-07-10  3:28 ` [PATCH v2 07/13] target/ppc: Split out helper_dbczl for 970 Richard Henderson
                   ` (6 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

The 970 logic does not apply to dcbzep, which is an e500 insn.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/mem_helper.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 361fd72226..5067919ff8 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -271,22 +271,12 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb,
 }
 
 static void dcbz_common(CPUPPCState *env, target_ulong addr,
-                        uint32_t opcode, int mmu_idx, uintptr_t retaddr)
+                        int dcbz_size, int mmu_idx, uintptr_t retaddr)
 {
-    target_ulong mask, dcbz_size = env->dcache_line_size;
-    uint32_t i;
+    target_ulong mask = ~(target_ulong)(dcbz_size - 1);
     void *haddr;
 
-#if defined(TARGET_PPC64)
-    /* Check for dcbz vs dcbzl on 970 */
-    if (env->excp_model == POWERPC_EXCP_970 &&
-        !(opcode & 0x00200000) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
-        dcbz_size = 32;
-    }
-#endif
-
     /* Align address */
-    mask = ~(dcbz_size - 1);
     addr &= mask;
 
     /* Check reservation */
@@ -300,7 +290,7 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
         memset(haddr, 0, dcbz_size);
     } else {
         /* Slow path */
-        for (i = 0; i < dcbz_size; i += 8) {
+        for (int i = 0; i < dcbz_size; i += 8) {
             cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr);
         }
     }
@@ -308,12 +298,22 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
 
 void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-    dcbz_common(env, addr, opcode, ppc_env_mmu_index(env, false), GETPC());
+    int dcbz_size = env->dcache_line_size;
+
+#if defined(TARGET_PPC64)
+    /* Check for dcbz vs dcbzl on 970 */
+    if (env->excp_model == POWERPC_EXCP_970 &&
+        !(opcode & 0x00200000) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
+        dcbz_size = 32;
+    }
+#endif
+
+    dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
 }
 
 void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
 {
-    dcbz_common(env, addr, opcode, PPC_TLB_EPID_STORE, GETPC());
+    dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC());
 }
 
 void helper_icbi(CPUPPCState *env, target_ulong addr)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 07/13] target/ppc: Split out helper_dbczl for 970
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
                   ` (5 preceding siblings ...)
  2024-07-10  3:28 ` [PATCH v2 06/13] target/ppc: Hoist dcbz_size out of dcbz_common Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-10 12:17   ` BALATON Zoltan
  2024-07-10  3:28 ` [PATCH v2 08/13] target/ppc: Merge helper_{dcbz,dcbzep} Richard Henderson
                   ` (5 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

We can determine at translation time whether the insn is or
is not dbczl.  We must retain a runtime check against the
HID5 register, but we can move that to a separate function
that never affects other ppc models.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h     |  7 +++++--
 target/ppc/mem_helper.c | 34 +++++++++++++++++++++-------------
 target/ppc/translate.c  | 24 ++++++++++++++----------
 3 files changed, 40 insertions(+), 25 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 76b8f25c77..afc56855ff 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -46,8 +46,11 @@ DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32)
 DEF_HELPER_4(lsw, void, env, tl, i32, i32)
 DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32)
 DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32)
-DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, i32)
-DEF_HELPER_FLAGS_3(dcbzep, TCG_CALL_NO_WG, void, env, tl, i32)
+DEF_HELPER_FLAGS_2(dcbz, TCG_CALL_NO_WG, void, env, tl)
+DEF_HELPER_FLAGS_2(dcbzep, TCG_CALL_NO_WG, void, env, tl)
+#ifdef TARGET_PPC64
+DEF_HELPER_FLAGS_2(dcbzl, TCG_CALL_NO_WG, void, env, tl)
+#endif
 DEF_HELPER_FLAGS_2(icbi, TCG_CALL_NO_WG, void, env, tl)
 DEF_HELPER_FLAGS_2(icbiep, TCG_CALL_NO_WG, void, env, tl)
 DEF_HELPER_5(lscbx, tl, env, tl, i32, i32, i32)
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 5067919ff8..d4957efd6e 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -296,26 +296,34 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
     }
 }
 
-void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
+void helper_dcbz(CPUPPCState *env, target_ulong addr)
 {
-    int dcbz_size = env->dcache_line_size;
-
-#if defined(TARGET_PPC64)
-    /* Check for dcbz vs dcbzl on 970 */
-    if (env->excp_model == POWERPC_EXCP_970 &&
-        !(opcode & 0x00200000) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
-        dcbz_size = 32;
-    }
-#endif
-
-    dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
+    dcbz_common(env, addr, env->dcache_line_size,
+                ppc_env_mmu_index(env, false), GETPC());
 }
 
-void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
+void helper_dcbzep(CPUPPCState *env, target_ulong addr)
 {
     dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC());
 }
 
+#ifdef TARGET_PPC64
+void helper_dcbzl(CPUPPCState *env, target_ulong addr)
+{
+    int dcbz_size = env->dcache_line_size;
+
+    /*
+     * The translator checked for POWERPC_EXCP_970.
+     * All that's left is to check HID5.
+     */
+    if (((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
+        dcbz_size = 32;
+    }
+
+    dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
+}
+#endif
+
 void helper_icbi(CPUPPCState *env, target_ulong addr)
 {
     addr &= ~(env->dcache_line_size - 1);
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 0bc16d7251..2664c94522 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -200,6 +200,7 @@ struct DisasContext {
     uint32_t flags;
     uint64_t insns_flags;
     uint64_t insns_flags2;
+    powerpc_excp_t excp_model;
 };
 
 #define DISAS_EXIT         DISAS_TARGET_0  /* exit to main loop, pc updated */
@@ -4445,27 +4446,29 @@ static void gen_dcblc(DisasContext *ctx)
 /* dcbz */
 static void gen_dcbz(DisasContext *ctx)
 {
-    TCGv tcgv_addr;
-    TCGv_i32 tcgv_op;
+    TCGv tcgv_addr = tcg_temp_new();
 
     gen_set_access_type(ctx, ACCESS_CACHE);
-    tcgv_addr = tcg_temp_new();
-    tcgv_op = tcg_constant_i32(ctx->opcode & 0x03FF000);
     gen_addr_reg_index(ctx, tcgv_addr);
-    gen_helper_dcbz(tcg_env, tcgv_addr, tcgv_op);
+
+#ifdef TARGET_PPC64
+    if (ctx->excp_model == POWERPC_EXCP_970 && !(ctx->opcode & 0x00200000)) {
+        gen_helper_dcbzl(tcg_env, tcgv_addr);
+        return;
+    }
+#endif
+
+    gen_helper_dcbz(tcg_env, tcgv_addr);
 }
 
 /* dcbzep */
 static void gen_dcbzep(DisasContext *ctx)
 {
-    TCGv tcgv_addr;
-    TCGv_i32 tcgv_op;
+    TCGv tcgv_addr = tcg_temp_new();
 
     gen_set_access_type(ctx, ACCESS_CACHE);
-    tcgv_addr = tcg_temp_new();
-    tcgv_op = tcg_constant_i32(ctx->opcode & 0x03FF000);
     gen_addr_reg_index(ctx, tcgv_addr);
-    gen_helper_dcbzep(tcg_env, tcgv_addr, tcgv_op);
+    gen_helper_dcbzep(tcg_env, tcgv_addr);
 }
 
 /* dst / dstt */
@@ -6480,6 +6483,7 @@ static void ppc_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
     ctx->hv = (hflags >> HFLAGS_HV) & 1;
     ctx->insns_flags = env->insns_flags;
     ctx->insns_flags2 = env->insns_flags2;
+    ctx->excp_model = env->excp_model;
     ctx->access_type = -1;
     ctx->need_access_type = !mmu_is_64bit(env->mmu_model);
     ctx->le_mode = (hflags >> HFLAGS_LE) & 1;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 08/13] target/ppc: Merge helper_{dcbz,dcbzep}
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
                   ` (6 preceding siblings ...)
  2024-07-10  3:28 ` [PATCH v2 07/13] target/ppc: Split out helper_dbczl for 970 Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-10 12:20   ` BALATON Zoltan
  2024-07-10  3:28 ` [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only Richard Henderson
                   ` (4 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

Merge the two and pass the mmu_idx directly from translation.
Swap the argument order in dcbz_common to avoid extra swaps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/helper.h     |  3 +--
 target/ppc/mem_helper.c | 14 ++++----------
 target/ppc/translate.c  |  4 ++--
 3 files changed, 7 insertions(+), 14 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index afc56855ff..4fa089cbf9 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -46,8 +46,7 @@ DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32)
 DEF_HELPER_4(lsw, void, env, tl, i32, i32)
 DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32)
 DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32)
-DEF_HELPER_FLAGS_2(dcbz, TCG_CALL_NO_WG, void, env, tl)
-DEF_HELPER_FLAGS_2(dcbzep, TCG_CALL_NO_WG, void, env, tl)
+DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, int)
 #ifdef TARGET_PPC64
 DEF_HELPER_FLAGS_2(dcbzl, TCG_CALL_NO_WG, void, env, tl)
 #endif
diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index d4957efd6e..24bae3b80c 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -271,7 +271,7 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb,
 }
 
 static void dcbz_common(CPUPPCState *env, target_ulong addr,
-                        int dcbz_size, int mmu_idx, uintptr_t retaddr)
+                        int mmu_idx, int dcbz_size, uintptr_t retaddr)
 {
     target_ulong mask = ~(target_ulong)(dcbz_size - 1);
     void *haddr;
@@ -296,15 +296,9 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
     }
 }
 
-void helper_dcbz(CPUPPCState *env, target_ulong addr)
+void helper_dcbz(CPUPPCState *env, target_ulong addr, int mmu_idx)
 {
-    dcbz_common(env, addr, env->dcache_line_size,
-                ppc_env_mmu_index(env, false), GETPC());
-}
-
-void helper_dcbzep(CPUPPCState *env, target_ulong addr)
-{
-    dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC());
+    dcbz_common(env, addr, mmu_idx, env->dcache_line_size, GETPC());
 }
 
 #ifdef TARGET_PPC64
@@ -320,7 +314,7 @@ void helper_dcbzl(CPUPPCState *env, target_ulong addr)
         dcbz_size = 32;
     }
 
-    dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
+    dcbz_common(env, addr, ppc_env_mmu_index(env, false), dcbz_size, GETPC());
 }
 #endif
 
diff --git a/target/ppc/translate.c b/target/ppc/translate.c
index 2664c94522..285734065b 100644
--- a/target/ppc/translate.c
+++ b/target/ppc/translate.c
@@ -4458,7 +4458,7 @@ static void gen_dcbz(DisasContext *ctx)
     }
 #endif
 
-    gen_helper_dcbz(tcg_env, tcgv_addr);
+    gen_helper_dcbz(tcg_env, tcgv_addr, tcg_constant_i32(ctx->mem_idx));
 }
 
 /* dcbzep */
@@ -4468,7 +4468,7 @@ static void gen_dcbzep(DisasContext *ctx)
 
     gen_set_access_type(ctx, ACCESS_CACHE);
     gen_addr_reg_index(ctx, tcgv_addr);
-    gen_helper_dcbzep(tcg_env, tcgv_addr);
+    gen_helper_dcbz(tcg_env, tcgv_addr, tcg_constant_i32(PPC_TLB_EPID_STORE));
 }
 
 /* dst / dstt */
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
                   ` (7 preceding siblings ...)
  2024-07-10  3:28 ` [PATCH v2 08/13] target/ppc: Merge helper_{dcbz,dcbzep} Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-10 12:25   ` BALATON Zoltan
  2024-07-10  3:28 ` [PATCH v2 10/13] target/s390x: Use user_or_likely in do_access_memset Richard Henderson
                   ` (3 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

Mark the reserve_addr check unlikely.  Use tlb_vaddr_to_host
instead of probe_write, relying on the memset itself to test
for page writability.  Use set/clear_helper_retaddr so that
we can properly unwind on segfault.

With this, a trivial loop around guest memset will spend
nearly 50% of runtime within helper_dcbz and host memset.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/ppc/mem_helper.c | 14 ++++++++++----
 1 file changed, 10 insertions(+), 4 deletions(-)

diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
index 24bae3b80c..fa4c4f9fa9 100644
--- a/target/ppc/mem_helper.c
+++ b/target/ppc/mem_helper.c
@@ -280,20 +280,26 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
     addr &= mask;
 
     /* Check reservation */
-    if ((env->reserve_addr & mask) == addr)  {
+    if (unlikely((env->reserve_addr & mask) == addr))  {
         env->reserve_addr = (target_ulong)-1ULL;
     }
 
     /* Try fast path translate */
+#ifdef CONFIG_USER_ONLY
+    haddr = tlb_vaddr_to_host(env, addr, MMU_DATA_STORE, mmu_idx);
+#else
     haddr = probe_write(env, addr, dcbz_size, mmu_idx, retaddr);
-    if (haddr) {
-        memset(haddr, 0, dcbz_size);
-    } else {
+    if (unlikely(!haddr)) {
         /* Slow path */
         for (int i = 0; i < dcbz_size; i += 8) {
             cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr);
         }
     }
+#endif
+
+    set_helper_retaddr(retaddr);
+    memset(haddr, 0, dcbz_size);
+    clear_helper_retaddr();
 }
 
 void helper_dcbz(CPUPPCState *env, target_ulong addr, int mmu_idx)
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 10/13] target/s390x: Use user_or_likely in do_access_memset
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
                   ` (8 preceding siblings ...)
  2024-07-10  3:28 ` [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-12 13:02   ` Peter Maydell
  2024-07-10  3:28 ` [PATCH v2 11/13] target/s390x: Use user_or_likely in access_memmove Richard Henderson
                   ` (2 subsequent siblings)
  12 siblings, 1 reply; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

Eliminate the ifdef by using a predicate that is
always true with CONFIG_USER_ONLY.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/tcg/mem_helper.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 6cdbc34178..5311a15a09 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -225,10 +225,7 @@ static void do_access_memset(CPUS390XState *env, vaddr vaddr, char *haddr,
                              uint8_t byte, uint16_t size, int mmu_idx,
                              uintptr_t ra)
 {
-#ifdef CONFIG_USER_ONLY
-    memset(haddr, byte, size);
-#else
-    if (likely(haddr)) {
+    if (user_or_likely(haddr)) {
         memset(haddr, byte, size);
     } else {
         MemOpIdx oi = make_memop_idx(MO_UB, mmu_idx);
@@ -236,7 +233,6 @@ static void do_access_memset(CPUS390XState *env, vaddr vaddr, char *haddr,
             cpu_stb_mmu(env, vaddr + i, byte, oi, ra);
         }
     }
-#endif
 }
 
 static void access_memset(CPUS390XState *env, S390Access *desta,
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 11/13] target/s390x: Use user_or_likely in access_memmove
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
                   ` (9 preceding siblings ...)
  2024-07-10  3:28 ` [PATCH v2 10/13] target/s390x: Use user_or_likely in do_access_memset Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-10  3:28 ` [PATCH v2 12/13] target/s390x: Use set/clear_helper_retaddr in mem_helper.c Richard Henderson
  2024-07-10  3:28 ` [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff Richard Henderson
  12 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

Invert the conditional, indent the block, and use the macro
that expands to true for user-only.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/tcg/mem_helper.c | 54 +++++++++++++++++------------------
 1 file changed, 26 insertions(+), 28 deletions(-)

diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index 5311a15a09..a76eb6182f 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -296,41 +296,39 @@ static void access_memmove(CPUS390XState *env, S390Access *desta,
                            S390Access *srca, uintptr_t ra)
 {
     int len = desta->size1 + desta->size2;
-    int diff;
 
     assert(len == srca->size1 + srca->size2);
 
     /* Fallback to slow access in case we don't have access to all host pages */
-    if (unlikely(!desta->haddr1 || (desta->size2 && !desta->haddr2) ||
-                 !srca->haddr1 || (srca->size2 && !srca->haddr2))) {
-        int i;
+    if (user_or_likely(desta->haddr &&
+                       srca->haddr &&
+                       (!desta->size2 || desta->haddr2) &&
+                       (!srca->size2 || srca->haddr2))) {
+        int diff = desta->size1 - srca->size1;
 
-        for (i = 0; i < len; i++) {
-            uint8_t byte = access_get_byte(env, srca, i, ra);
-
-            access_set_byte(env, desta, i, byte, ra);
-        }
-        return;
-    }
-
-    diff = desta->size1 - srca->size1;
-    if (likely(diff == 0)) {
-        memmove(desta->haddr1, srca->haddr1, srca->size1);
-        if (unlikely(srca->size2)) {
-            memmove(desta->haddr2, srca->haddr2, srca->size2);
-        }
-    } else if (diff > 0) {
-        memmove(desta->haddr1, srca->haddr1, srca->size1);
-        memmove(desta->haddr1 + srca->size1, srca->haddr2, diff);
-        if (likely(desta->size2)) {
-            memmove(desta->haddr2, srca->haddr2 + diff, desta->size2);
+        if (likely(diff == 0)) {
+            memmove(desta->haddr1, srca->haddr1, srca->size1);
+            if (unlikely(srca->size2)) {
+                memmove(desta->haddr2, srca->haddr2, srca->size2);
+            }
+        } else if (diff > 0) {
+            memmove(desta->haddr1, srca->haddr1, srca->size1);
+            memmove(desta->haddr1 + srca->size1, srca->haddr2, diff);
+            if (likely(desta->size2)) {
+                memmove(desta->haddr2, srca->haddr2 + diff, desta->size2);
+            }
+        } else {
+            diff = -diff;
+            memmove(desta->haddr1, srca->haddr1, desta->size1);
+            memmove(desta->haddr2, srca->haddr1 + desta->size1, diff);
+            if (likely(srca->size2)) {
+                memmove(desta->haddr2 + diff, srca->haddr2, srca->size2);
+            }
         }
     } else {
-        diff = -diff;
-        memmove(desta->haddr1, srca->haddr1, desta->size1);
-        memmove(desta->haddr2, srca->haddr1 + desta->size1, diff);
-        if (likely(srca->size2)) {
-            memmove(desta->haddr2 + diff, srca->haddr2, srca->size2);
+        for (int i = 0; i < len; i++) {
+            uint8_t byte = access_get_byte(env, srca, i, ra);
+            access_set_byte(env, desta, i, byte, ra);
         }
     }
 }
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 12/13] target/s390x: Use set/clear_helper_retaddr in mem_helper.c
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
                   ` (10 preceding siblings ...)
  2024-07-10  3:28 ` [PATCH v2 11/13] target/s390x: Use user_or_likely in access_memmove Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-10  3:28 ` [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff Richard Henderson
  12 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

Avoid a race condition with munmap in another thread.
For access_memset and access_memmove, manage the value
within the helper.  For uses of access_{get,set}_byte,
manage the value across the for loops.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/tcg/mem_helper.c | 43 ++++++++++++++++++++++++++++++-----
 1 file changed, 37 insertions(+), 6 deletions(-)

diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
index a76eb6182f..eb1f8880cd 100644
--- a/target/s390x/tcg/mem_helper.c
+++ b/target/s390x/tcg/mem_helper.c
@@ -238,14 +238,14 @@ static void do_access_memset(CPUS390XState *env, vaddr vaddr, char *haddr,
 static void access_memset(CPUS390XState *env, S390Access *desta,
                           uint8_t byte, uintptr_t ra)
 {
-
+    set_helper_retaddr(ra);
     do_access_memset(env, desta->vaddr1, desta->haddr1, byte, desta->size1,
                      desta->mmu_idx, ra);
-    if (likely(!desta->size2)) {
-        return;
+    if (unlikely(desta->size2)) {
+        do_access_memset(env, desta->vaddr2, desta->haddr2, byte,
+                         desta->size2, desta->mmu_idx, ra);
     }
-    do_access_memset(env, desta->vaddr2, desta->haddr2, byte, desta->size2,
-                     desta->mmu_idx, ra);
+    clear_helper_retaddr();
 }
 
 static uint8_t access_get_byte(CPUS390XState *env, S390Access *access,
@@ -366,6 +366,8 @@ static uint32_t do_helper_nc(CPUS390XState *env, uint32_t l, uint64_t dest,
     access_prepare(&srca1, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&srca2, env, dest, l, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&desta, env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
+    set_helper_retaddr(ra);
+
     for (i = 0; i < l; i++) {
         const uint8_t x = access_get_byte(env, &srca1, i, ra) &
                           access_get_byte(env, &srca2, i, ra);
@@ -373,6 +375,8 @@ static uint32_t do_helper_nc(CPUS390XState *env, uint32_t l, uint64_t dest,
         c |= x;
         access_set_byte(env, &desta, i, x, ra);
     }
+
+    clear_helper_retaddr();
     return c != 0;
 }
 
@@ -407,6 +411,7 @@ static uint32_t do_helper_xc(CPUS390XState *env, uint32_t l, uint64_t dest,
         return 0;
     }
 
+    set_helper_retaddr(ra);
     for (i = 0; i < l; i++) {
         const uint8_t x = access_get_byte(env, &srca1, i, ra) ^
                           access_get_byte(env, &srca2, i, ra);
@@ -414,6 +419,7 @@ static uint32_t do_helper_xc(CPUS390XState *env, uint32_t l, uint64_t dest,
         c |= x;
         access_set_byte(env, &desta, i, x, ra);
     }
+    clear_helper_retaddr();
     return c != 0;
 }
 
@@ -441,6 +447,8 @@ static uint32_t do_helper_oc(CPUS390XState *env, uint32_t l, uint64_t dest,
     access_prepare(&srca1, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&srca2, env, dest, l, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&desta, env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
+    set_helper_retaddr(ra);
+
     for (i = 0; i < l; i++) {
         const uint8_t x = access_get_byte(env, &srca1, i, ra) |
                           access_get_byte(env, &srca2, i, ra);
@@ -448,6 +456,8 @@ static uint32_t do_helper_oc(CPUS390XState *env, uint32_t l, uint64_t dest,
         c |= x;
         access_set_byte(env, &desta, i, x, ra);
     }
+
+    clear_helper_retaddr();
     return c != 0;
 }
 
@@ -484,11 +494,13 @@ static uint32_t do_helper_mvc(CPUS390XState *env, uint32_t l, uint64_t dest,
     } else if (!is_destructive_overlap(env, dest, src, l)) {
         access_memmove(env, &desta, &srca, ra);
     } else {
+        set_helper_retaddr(ra);
         for (i = 0; i < l; i++) {
             uint8_t byte = access_get_byte(env, &srca, i, ra);
 
             access_set_byte(env, &desta, i, byte, ra);
         }
+        clear_helper_retaddr();
     }
 
     return env->cc_op;
@@ -514,10 +526,12 @@ void HELPER(mvcrl)(CPUS390XState *env, uint64_t l, uint64_t dest, uint64_t src)
     access_prepare(&srca, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&desta, env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
 
+    set_helper_retaddr(ra);
     for (i = l - 1; i >= 0; i--) {
         uint8_t byte = access_get_byte(env, &srca, i, ra);
         access_set_byte(env, &desta, i, byte, ra);
     }
+    clear_helper_retaddr();
 }
 
 /* move inverse  */
@@ -534,11 +548,13 @@ void HELPER(mvcin)(CPUS390XState *env, uint32_t l, uint64_t dest, uint64_t src)
     src = wrap_address(env, src - l + 1);
     access_prepare(&srca, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&desta, env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
+
+    set_helper_retaddr(ra);
     for (i = 0; i < l; i++) {
         const uint8_t x = access_get_byte(env, &srca, l - i - 1, ra);
-
         access_set_byte(env, &desta, i, x, ra);
     }
+    clear_helper_retaddr();
 }
 
 /* move numerics  */
@@ -555,12 +571,15 @@ void HELPER(mvn)(CPUS390XState *env, uint32_t l, uint64_t dest, uint64_t src)
     access_prepare(&srca1, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&srca2, env, dest, l, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&desta, env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
+
+    set_helper_retaddr(ra);
     for (i = 0; i < l; i++) {
         const uint8_t x = (access_get_byte(env, &srca1, i, ra) & 0x0f) |
                           (access_get_byte(env, &srca2, i, ra) & 0xf0);
 
         access_set_byte(env, &desta, i, x, ra);
     }
+    clear_helper_retaddr();
 }
 
 /* move with offset  */
@@ -580,6 +599,8 @@ void HELPER(mvo)(CPUS390XState *env, uint32_t l, uint64_t dest, uint64_t src)
 
     /* Handle rightmost byte */
     byte_dest = cpu_ldub_data_ra(env, dest + len_dest - 1, ra);
+
+    set_helper_retaddr(ra);
     byte_src = access_get_byte(env, &srca, len_src - 1, ra);
     byte_dest = (byte_dest & 0x0f) | (byte_src << 4);
     access_set_byte(env, &desta, len_dest - 1, byte_dest, ra);
@@ -595,6 +616,7 @@ void HELPER(mvo)(CPUS390XState *env, uint32_t l, uint64_t dest, uint64_t src)
         byte_dest |= byte_src << 4;
         access_set_byte(env, &desta, i, byte_dest, ra);
     }
+    clear_helper_retaddr();
 }
 
 /* move zones  */
@@ -611,12 +633,15 @@ void HELPER(mvz)(CPUS390XState *env, uint32_t l, uint64_t dest, uint64_t src)
     access_prepare(&srca1, env, src, l, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&srca2, env, dest, l, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&desta, env, dest, l, MMU_DATA_STORE, mmu_idx, ra);
+
+    set_helper_retaddr(ra);
     for (i = 0; i < l; i++) {
         const uint8_t x = (access_get_byte(env, &srca1, i, ra) & 0xf0) |
                           (access_get_byte(env, &srca2, i, ra) & 0x0f);
 
         access_set_byte(env, &desta, i, x, ra);
     }
+    clear_helper_retaddr();
 }
 
 /* compare unsigned byte arrays */
@@ -961,15 +986,19 @@ uint32_t HELPER(mvst)(CPUS390XState *env, uint32_t r1, uint32_t r2)
      */
     access_prepare(&srca, env, s, len, MMU_DATA_LOAD, mmu_idx, ra);
     access_prepare(&desta, env, d, len, MMU_DATA_STORE, mmu_idx, ra);
+
+    set_helper_retaddr(ra);
     for (i = 0; i < len; i++) {
         const uint8_t v = access_get_byte(env, &srca, i, ra);
 
         access_set_byte(env, &desta, i, v, ra);
         if (v == c) {
+            clear_helper_retaddr();
             set_address_zero(env, r1, d + i);
             return 1;
         }
     }
+    clear_helper_retaddr();
     set_address_zero(env, r1, d + len);
     set_address_zero(env, r2, s + len);
     return 3;
@@ -1060,6 +1089,7 @@ static inline uint32_t do_mvcl(CPUS390XState *env,
         *dest = wrap_address(env, *dest + len);
     } else {
         access_prepare(&desta, env, *dest, len, MMU_DATA_STORE, mmu_idx, ra);
+        set_helper_retaddr(ra);
 
         /* The remaining length selects the padding byte. */
         for (i = 0; i < len; (*destlen)--, i++) {
@@ -1069,6 +1099,7 @@ static inline uint32_t do_mvcl(CPUS390XState *env,
                 access_set_byte(env, &desta, i, pad >> 8, ra);
             }
         }
+        clear_helper_retaddr();
         *dest = wrap_address(env, *dest + len);
     }
 
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff
  2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
                   ` (11 preceding siblings ...)
  2024-07-10  3:28 ` [PATCH v2 12/13] target/s390x: Use set/clear_helper_retaddr in mem_helper.c Richard Henderson
@ 2024-07-10  3:28 ` Richard Henderson
  2024-07-10  4:09   ` Alistair Francis
  2024-07-15  7:06   ` Max Chou
  12 siblings, 2 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-10  3:28 UTC (permalink / raw)
  To: qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton, max.chou

The current pairing of tlb_vaddr_to_host with extra is either
inefficient (user-only, with page_check_range) or incorrect
(system, with probe_pages).

For proper non-fault behaviour, use probe_access_flags with
its nonfault parameter set to true.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/vector_helper.c | 34 ++++++++++++++++++----------------
 1 file changed, 18 insertions(+), 16 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 1b4d5a8e37..4d72eb74d3 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -474,7 +474,6 @@ vext_ldff(void *vd, void *v0, target_ulong base,
           vext_ldst_elem_fn *ldst_elem,
           uint32_t log2_esz, uintptr_t ra)
 {
-    void *host;
     uint32_t i, k, vl = 0;
     uint32_t nf = vext_nf(desc);
     uint32_t vm = vext_vm(desc);
@@ -493,27 +492,31 @@ vext_ldff(void *vd, void *v0, target_ulong base,
         }
         addr = adjust_addr(env, base + i * (nf << log2_esz));
         if (i == 0) {
+            /* Allow fault on first element. */
             probe_pages(env, addr, nf << log2_esz, ra, MMU_DATA_LOAD);
         } else {
-            /* if it triggers an exception, no need to check watchpoint */
             remain = nf << log2_esz;
             while (remain > 0) {
+                void *host;
+                int flags;
+
                 offset = -(addr | TARGET_PAGE_MASK);
-                host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_index);
-                if (host) {
-#ifdef CONFIG_USER_ONLY
-                    if (!page_check_range(addr, offset, PAGE_READ)) {
-                        vl = i;
-                        goto ProbeSuccess;
-                    }
-#else
-                    probe_pages(env, addr, offset, ra, MMU_DATA_LOAD);
-#endif
-                } else {
+
+                /* Probe nonfault on subsequent elements. */
+                flags = probe_access_flags(env, addr, offset, MMU_DATA_LOAD,
+                                           mmu_index, true, &host, 0);
+                if (flags) {
+                    /*
+                     * Stop any flag bit set:
+                     *   invalid (unmapped)
+                     *   mmio (transaction failed)
+                     *   watchpoint (debug)
+                     * In all cases, handle as the first load next time.
+                     */
                     vl = i;
-                    goto ProbeSuccess;
+                    break;
                 }
-                if (remain <=  offset) {
+                if (remain <= offset) {
                     break;
                 }
                 remain -= offset;
@@ -521,7 +524,6 @@ vext_ldff(void *vd, void *v0, target_ulong base,
             }
         }
     }
-ProbeSuccess:
     /* load bytes from guest memory */
     if (vl != 0) {
         env->vl = vl;
-- 
2.43.0



^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff
  2024-07-10  3:28 ` [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff Richard Henderson
@ 2024-07-10  4:09   ` Alistair Francis
  2024-07-15  7:06   ` Max Chou
  1 sibling, 0 replies; 29+ messages in thread
From: Alistair Francis @ 2024-07-10  4:09 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton,
	max.chou

On Wed, Jul 10, 2024 at 1:30 PM Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The current pairing of tlb_vaddr_to_host with extra is either
> inefficient (user-only, with page_check_range) or incorrect
> (system, with probe_pages).
>
> For proper non-fault behaviour, use probe_access_flags with
> its nonfault parameter set to true.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Acked-by: Alistair Francis <alistair.francis@wdc.com>

Alistair

> ---
>  target/riscv/vector_helper.c | 34 ++++++++++++++++++----------------
>  1 file changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 1b4d5a8e37..4d72eb74d3 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -474,7 +474,6 @@ vext_ldff(void *vd, void *v0, target_ulong base,
>            vext_ldst_elem_fn *ldst_elem,
>            uint32_t log2_esz, uintptr_t ra)
>  {
> -    void *host;
>      uint32_t i, k, vl = 0;
>      uint32_t nf = vext_nf(desc);
>      uint32_t vm = vext_vm(desc);
> @@ -493,27 +492,31 @@ vext_ldff(void *vd, void *v0, target_ulong base,
>          }
>          addr = adjust_addr(env, base + i * (nf << log2_esz));
>          if (i == 0) {
> +            /* Allow fault on first element. */
>              probe_pages(env, addr, nf << log2_esz, ra, MMU_DATA_LOAD);
>          } else {
> -            /* if it triggers an exception, no need to check watchpoint */
>              remain = nf << log2_esz;
>              while (remain > 0) {
> +                void *host;
> +                int flags;
> +
>                  offset = -(addr | TARGET_PAGE_MASK);
> -                host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_index);
> -                if (host) {
> -#ifdef CONFIG_USER_ONLY
> -                    if (!page_check_range(addr, offset, PAGE_READ)) {
> -                        vl = i;
> -                        goto ProbeSuccess;
> -                    }
> -#else
> -                    probe_pages(env, addr, offset, ra, MMU_DATA_LOAD);
> -#endif
> -                } else {
> +
> +                /* Probe nonfault on subsequent elements. */
> +                flags = probe_access_flags(env, addr, offset, MMU_DATA_LOAD,
> +                                           mmu_index, true, &host, 0);
> +                if (flags) {
> +                    /*
> +                     * Stop any flag bit set:
> +                     *   invalid (unmapped)
> +                     *   mmio (transaction failed)
> +                     *   watchpoint (debug)
> +                     * In all cases, handle as the first load next time.
> +                     */
>                      vl = i;
> -                    goto ProbeSuccess;
> +                    break;
>                  }
> -                if (remain <=  offset) {
> +                if (remain <= offset) {
>                      break;
>                  }
>                  remain -= offset;
> @@ -521,7 +524,6 @@ vext_ldff(void *vd, void *v0, target_ulong base,
>              }
>          }
>      }
> -ProbeSuccess:
>      /* load bytes from guest memory */
>      if (vl != 0) {
>          env->vl = vl;
> --
> 2.43.0
>
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 06/13] target/ppc: Hoist dcbz_size out of dcbz_common
  2024-07-10  3:28 ` [PATCH v2 06/13] target/ppc: Hoist dcbz_size out of dcbz_common Richard Henderson
@ 2024-07-10 12:11   ` BALATON Zoltan
  2024-07-10 14:36     ` Richard Henderson
  0 siblings, 1 reply; 29+ messages in thread
From: BALATON Zoltan @ 2024-07-10 12:11 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, max.chou

On Tue, 9 Jul 2024, Richard Henderson wrote:
> The 970 logic does not apply to dcbzep, which is an e500 insn.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>

As all callers pass GETPC() to retaddr maybe that could be moved into 
dcbz_common() as i had in my patch.

Regards,
BALATON Zoltan

> ---
> target/ppc/mem_helper.c | 30 +++++++++++++++---------------
> 1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index 361fd72226..5067919ff8 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -271,22 +271,12 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb,
> }
>
> static void dcbz_common(CPUPPCState *env, target_ulong addr,
> -                        uint32_t opcode, int mmu_idx, uintptr_t retaddr)
> +                        int dcbz_size, int mmu_idx, uintptr_t retaddr)
> {
> -    target_ulong mask, dcbz_size = env->dcache_line_size;
> -    uint32_t i;
> +    target_ulong mask = ~(target_ulong)(dcbz_size - 1);
>     void *haddr;
>
> -#if defined(TARGET_PPC64)
> -    /* Check for dcbz vs dcbzl on 970 */
> -    if (env->excp_model == POWERPC_EXCP_970 &&
> -        !(opcode & 0x00200000) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
> -        dcbz_size = 32;
> -    }
> -#endif
> -
>     /* Align address */
> -    mask = ~(dcbz_size - 1);
>     addr &= mask;
>
>     /* Check reservation */
> @@ -300,7 +290,7 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
>         memset(haddr, 0, dcbz_size);
>     } else {
>         /* Slow path */
> -        for (i = 0; i < dcbz_size; i += 8) {
> +        for (int i = 0; i < dcbz_size; i += 8) {
>             cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr);
>         }
>     }
> @@ -308,12 +298,22 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
>
> void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
> {
> -    dcbz_common(env, addr, opcode, ppc_env_mmu_index(env, false), GETPC());
> +    int dcbz_size = env->dcache_line_size;
> +
> +#if defined(TARGET_PPC64)
> +    /* Check for dcbz vs dcbzl on 970 */
> +    if (env->excp_model == POWERPC_EXCP_970 &&
> +        !(opcode & 0x00200000) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
> +        dcbz_size = 32;
> +    }
> +#endif
> +
> +    dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
> }
>
> void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
> {
> -    dcbz_common(env, addr, opcode, PPC_TLB_EPID_STORE, GETPC());
> +    dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC());
> }
>
> void helper_icbi(CPUPPCState *env, target_ulong addr)
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 07/13] target/ppc: Split out helper_dbczl for 970
  2024-07-10  3:28 ` [PATCH v2 07/13] target/ppc: Split out helper_dbczl for 970 Richard Henderson
@ 2024-07-10 12:17   ` BALATON Zoltan
  0 siblings, 0 replies; 29+ messages in thread
From: BALATON Zoltan @ 2024-07-10 12:17 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, max.chou

On Tue, 9 Jul 2024, Richard Henderson wrote:
> We can determine at translation time whether the insn is or
> is not dbczl.  We must retain a runtime check against the
> HID5 register, but we can move that to a separate function
> that never affects other ppc models.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>

Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>

Regards,
BALATON Zoltan

> ---
> target/ppc/helper.h     |  7 +++++--
> target/ppc/mem_helper.c | 34 +++++++++++++++++++++-------------
> target/ppc/translate.c  | 24 ++++++++++++++----------
> 3 files changed, 40 insertions(+), 25 deletions(-)
>
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index 76b8f25c77..afc56855ff 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -46,8 +46,11 @@ DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32)
> DEF_HELPER_4(lsw, void, env, tl, i32, i32)
> DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32)
> DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32)
> -DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, i32)
> -DEF_HELPER_FLAGS_3(dcbzep, TCG_CALL_NO_WG, void, env, tl, i32)
> +DEF_HELPER_FLAGS_2(dcbz, TCG_CALL_NO_WG, void, env, tl)
> +DEF_HELPER_FLAGS_2(dcbzep, TCG_CALL_NO_WG, void, env, tl)
> +#ifdef TARGET_PPC64
> +DEF_HELPER_FLAGS_2(dcbzl, TCG_CALL_NO_WG, void, env, tl)
> +#endif
> DEF_HELPER_FLAGS_2(icbi, TCG_CALL_NO_WG, void, env, tl)
> DEF_HELPER_FLAGS_2(icbiep, TCG_CALL_NO_WG, void, env, tl)
> DEF_HELPER_5(lscbx, tl, env, tl, i32, i32, i32)
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index 5067919ff8..d4957efd6e 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -296,26 +296,34 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
>     }
> }
>
> -void helper_dcbz(CPUPPCState *env, target_ulong addr, uint32_t opcode)
> +void helper_dcbz(CPUPPCState *env, target_ulong addr)
> {
> -    int dcbz_size = env->dcache_line_size;
> -
> -#if defined(TARGET_PPC64)
> -    /* Check for dcbz vs dcbzl on 970 */
> -    if (env->excp_model == POWERPC_EXCP_970 &&
> -        !(opcode & 0x00200000) && ((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
> -        dcbz_size = 32;
> -    }
> -#endif
> -
> -    dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
> +    dcbz_common(env, addr, env->dcache_line_size,
> +                ppc_env_mmu_index(env, false), GETPC());
> }
>
> -void helper_dcbzep(CPUPPCState *env, target_ulong addr, uint32_t opcode)
> +void helper_dcbzep(CPUPPCState *env, target_ulong addr)
> {
>     dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC());
> }
>
> +#ifdef TARGET_PPC64
> +void helper_dcbzl(CPUPPCState *env, target_ulong addr)
> +{
> +    int dcbz_size = env->dcache_line_size;
> +
> +    /*
> +     * The translator checked for POWERPC_EXCP_970.
> +     * All that's left is to check HID5.
> +     */
> +    if (((env->spr[SPR_970_HID5] >> 7) & 0x3) == 1) {
> +        dcbz_size = 32;
> +    }
> +
> +    dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
> +}
> +#endif
> +
> void helper_icbi(CPUPPCState *env, target_ulong addr)
> {
>     addr &= ~(env->dcache_line_size - 1);
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 0bc16d7251..2664c94522 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -200,6 +200,7 @@ struct DisasContext {
>     uint32_t flags;
>     uint64_t insns_flags;
>     uint64_t insns_flags2;
> +    powerpc_excp_t excp_model;
> };
>
> #define DISAS_EXIT         DISAS_TARGET_0  /* exit to main loop, pc updated */
> @@ -4445,27 +4446,29 @@ static void gen_dcblc(DisasContext *ctx)
> /* dcbz */
> static void gen_dcbz(DisasContext *ctx)
> {
> -    TCGv tcgv_addr;
> -    TCGv_i32 tcgv_op;
> +    TCGv tcgv_addr = tcg_temp_new();
>
>     gen_set_access_type(ctx, ACCESS_CACHE);
> -    tcgv_addr = tcg_temp_new();
> -    tcgv_op = tcg_constant_i32(ctx->opcode & 0x03FF000);
>     gen_addr_reg_index(ctx, tcgv_addr);
> -    gen_helper_dcbz(tcg_env, tcgv_addr, tcgv_op);
> +
> +#ifdef TARGET_PPC64
> +    if (ctx->excp_model == POWERPC_EXCP_970 && !(ctx->opcode & 0x00200000)) {
> +        gen_helper_dcbzl(tcg_env, tcgv_addr);
> +        return;
> +    }
> +#endif
> +
> +    gen_helper_dcbz(tcg_env, tcgv_addr);
> }
>
> /* dcbzep */
> static void gen_dcbzep(DisasContext *ctx)
> {
> -    TCGv tcgv_addr;
> -    TCGv_i32 tcgv_op;
> +    TCGv tcgv_addr = tcg_temp_new();
>
>     gen_set_access_type(ctx, ACCESS_CACHE);
> -    tcgv_addr = tcg_temp_new();
> -    tcgv_op = tcg_constant_i32(ctx->opcode & 0x03FF000);
>     gen_addr_reg_index(ctx, tcgv_addr);
> -    gen_helper_dcbzep(tcg_env, tcgv_addr, tcgv_op);
> +    gen_helper_dcbzep(tcg_env, tcgv_addr);
> }
>
> /* dst / dstt */
> @@ -6480,6 +6483,7 @@ static void ppc_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
>     ctx->hv = (hflags >> HFLAGS_HV) & 1;
>     ctx->insns_flags = env->insns_flags;
>     ctx->insns_flags2 = env->insns_flags2;
> +    ctx->excp_model = env->excp_model;
>     ctx->access_type = -1;
>     ctx->need_access_type = !mmu_is_64bit(env->mmu_model);
>     ctx->le_mode = (hflags >> HFLAGS_LE) & 1;
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 08/13] target/ppc: Merge helper_{dcbz,dcbzep}
  2024-07-10  3:28 ` [PATCH v2 08/13] target/ppc: Merge helper_{dcbz,dcbzep} Richard Henderson
@ 2024-07-10 12:20   ` BALATON Zoltan
  2024-07-10 14:41     ` Richard Henderson
  0 siblings, 1 reply; 29+ messages in thread
From: BALATON Zoltan @ 2024-07-10 12:20 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, max.chou

On Tue, 9 Jul 2024, Richard Henderson wrote:
> Merge the two and pass the mmu_idx directly from translation.
> Swap the argument order in dcbz_common to avoid extra swaps.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/ppc/helper.h     |  3 +--
> target/ppc/mem_helper.c | 14 ++++----------
> target/ppc/translate.c  |  4 ++--
> 3 files changed, 7 insertions(+), 14 deletions(-)
>
> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
> index afc56855ff..4fa089cbf9 100644
> --- a/target/ppc/helper.h
> +++ b/target/ppc/helper.h
> @@ -46,8 +46,7 @@ DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32)
> DEF_HELPER_4(lsw, void, env, tl, i32, i32)
> DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32)
> DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32)
> -DEF_HELPER_FLAGS_2(dcbz, TCG_CALL_NO_WG, void, env, tl)
> -DEF_HELPER_FLAGS_2(dcbzep, TCG_CALL_NO_WG, void, env, tl)
> +DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, int)
> #ifdef TARGET_PPC64
> DEF_HELPER_FLAGS_2(dcbzl, TCG_CALL_NO_WG, void, env, tl)

Why only dcbz but not dcbzl as well to make them uniform?

Regards,
BALATON Zoltan

> #endif
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index d4957efd6e..24bae3b80c 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -271,7 +271,7 @@ void helper_stsw(CPUPPCState *env, target_ulong addr, uint32_t nb,
> }
>
> static void dcbz_common(CPUPPCState *env, target_ulong addr,
> -                        int dcbz_size, int mmu_idx, uintptr_t retaddr)
> +                        int mmu_idx, int dcbz_size, uintptr_t retaddr)
> {
>     target_ulong mask = ~(target_ulong)(dcbz_size - 1);
>     void *haddr;
> @@ -296,15 +296,9 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
>     }
> }
>
> -void helper_dcbz(CPUPPCState *env, target_ulong addr)
> +void helper_dcbz(CPUPPCState *env, target_ulong addr, int mmu_idx)
> {
> -    dcbz_common(env, addr, env->dcache_line_size,
> -                ppc_env_mmu_index(env, false), GETPC());
> -}
> -
> -void helper_dcbzep(CPUPPCState *env, target_ulong addr)
> -{
> -    dcbz_common(env, addr, env->dcache_line_size, PPC_TLB_EPID_STORE, GETPC());
> +    dcbz_common(env, addr, mmu_idx, env->dcache_line_size, GETPC());
> }
>
> #ifdef TARGET_PPC64
> @@ -320,7 +314,7 @@ void helper_dcbzl(CPUPPCState *env, target_ulong addr)
>         dcbz_size = 32;
>     }
>
> -    dcbz_common(env, addr, dcbz_size, ppc_env_mmu_index(env, false), GETPC());
> +    dcbz_common(env, addr, ppc_env_mmu_index(env, false), dcbz_size, GETPC());
> }
> #endif
>
> diff --git a/target/ppc/translate.c b/target/ppc/translate.c
> index 2664c94522..285734065b 100644
> --- a/target/ppc/translate.c
> +++ b/target/ppc/translate.c
> @@ -4458,7 +4458,7 @@ static void gen_dcbz(DisasContext *ctx)
>     }
> #endif
>
> -    gen_helper_dcbz(tcg_env, tcgv_addr);
> +    gen_helper_dcbz(tcg_env, tcgv_addr, tcg_constant_i32(ctx->mem_idx));
> }
>
> /* dcbzep */
> @@ -4468,7 +4468,7 @@ static void gen_dcbzep(DisasContext *ctx)
>
>     gen_set_access_type(ctx, ACCESS_CACHE);
>     gen_addr_reg_index(ctx, tcgv_addr);
> -    gen_helper_dcbzep(tcg_env, tcgv_addr);
> +    gen_helper_dcbz(tcg_env, tcgv_addr, tcg_constant_i32(PPC_TLB_EPID_STORE));
> }
>
> /* dst / dstt */
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only
  2024-07-10  3:28 ` [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only Richard Henderson
@ 2024-07-10 12:25   ` BALATON Zoltan
  2024-07-10 14:42     ` Richard Henderson
  0 siblings, 1 reply; 29+ messages in thread
From: BALATON Zoltan @ 2024-07-10 12:25 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, max.chou

On Tue, 9 Jul 2024, Richard Henderson wrote:
> Mark the reserve_addr check unlikely.  Use tlb_vaddr_to_host
> instead of probe_write, relying on the memset itself to test
> for page writability.  Use set/clear_helper_retaddr so that
> we can properly unwind on segfault.
>
> With this, a trivial loop around guest memset will spend
> nearly 50% of runtime within helper_dcbz and host memset.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
> target/ppc/mem_helper.c | 14 ++++++++++----
> 1 file changed, 10 insertions(+), 4 deletions(-)
>
> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
> index 24bae3b80c..fa4c4f9fa9 100644
> --- a/target/ppc/mem_helper.c
> +++ b/target/ppc/mem_helper.c
> @@ -280,20 +280,26 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
>     addr &= mask;
>
>     /* Check reservation */
> -    if ((env->reserve_addr & mask) == addr)  {
> +    if (unlikely((env->reserve_addr & mask) == addr))  {
>         env->reserve_addr = (target_ulong)-1ULL;
>     }
>
>     /* Try fast path translate */
> +#ifdef CONFIG_USER_ONLY
> +    haddr = tlb_vaddr_to_host(env, addr, MMU_DATA_STORE, mmu_idx);
> +#else
>     haddr = probe_write(env, addr, dcbz_size, mmu_idx, retaddr);
> -    if (haddr) {
> -        memset(haddr, 0, dcbz_size);
> -    } else {
> +    if (unlikely(!haddr)) {
>         /* Slow path */
>         for (int i = 0; i < dcbz_size; i += 8) {
>             cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr);
>         }

Is a return needed here to only get to memset below when haddr != NULL?

Regards,
BALATON Zoltan

>     }
> +#endif
> +
> +    set_helper_retaddr(retaddr);
> +    memset(haddr, 0, dcbz_size);
> +    clear_helper_retaddr();
> }
>
> void helper_dcbz(CPUPPCState *env, target_ulong addr, int mmu_idx)
>


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 06/13] target/ppc: Hoist dcbz_size out of dcbz_common
  2024-07-10 12:11   ` BALATON Zoltan
@ 2024-07-10 14:36     ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-10 14:36 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, max.chou

On 7/10/24 05:11, BALATON Zoltan wrote:
> On Tue, 9 Jul 2024, Richard Henderson wrote:
>> The 970 logic does not apply to dcbzep, which is an e500 insn.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> 
> Reviewed-by: BALATON Zoltan <balaton@eik.bme.hu>
> 
> As all callers pass GETPC() to retaddr maybe that could be moved into dcbz_common() as i 
> had in my patch.

No, that would be incorrect usage of GETPC.
It *must* be used in the HELPER directly called from TCG.


r~


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 08/13] target/ppc: Merge helper_{dcbz,dcbzep}
  2024-07-10 12:20   ` BALATON Zoltan
@ 2024-07-10 14:41     ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-10 14:41 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, max.chou

On 7/10/24 05:20, BALATON Zoltan wrote:
> On Tue, 9 Jul 2024, Richard Henderson wrote:
>> Merge the two and pass the mmu_idx directly from translation.
>> Swap the argument order in dcbz_common to avoid extra swaps.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/ppc/helper.h     |  3 +--
>> target/ppc/mem_helper.c | 14 ++++----------
>> target/ppc/translate.c  |  4 ++--
>> 3 files changed, 7 insertions(+), 14 deletions(-)
>>
>> diff --git a/target/ppc/helper.h b/target/ppc/helper.h
>> index afc56855ff..4fa089cbf9 100644
>> --- a/target/ppc/helper.h
>> +++ b/target/ppc/helper.h
>> @@ -46,8 +46,7 @@ DEF_HELPER_FLAGS_3(stmw, TCG_CALL_NO_WG, void, env, tl, i32)
>> DEF_HELPER_4(lsw, void, env, tl, i32, i32)
>> DEF_HELPER_5(lswx, void, env, tl, i32, i32, i32)
>> DEF_HELPER_FLAGS_4(stsw, TCG_CALL_NO_WG, void, env, tl, i32, i32)
>> -DEF_HELPER_FLAGS_2(dcbz, TCG_CALL_NO_WG, void, env, tl)
>> -DEF_HELPER_FLAGS_2(dcbzep, TCG_CALL_NO_WG, void, env, tl)
>> +DEF_HELPER_FLAGS_3(dcbz, TCG_CALL_NO_WG, void, env, tl, int)
>> #ifdef TARGET_PPC64
>> DEF_HELPER_FLAGS_2(dcbzl, TCG_CALL_NO_WG, void, env, tl)
> 
> Why only dcbz but not dcbzl as well to make them uniform?

Because dcbzl has extra logic, so doesn't get merged.
The expansion of ppc_env_mmu_index() is not large, so there's no gain in passing it in.

The point was to eliminate duplication where it was easy.


r~


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only
  2024-07-10 12:25   ` BALATON Zoltan
@ 2024-07-10 14:42     ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-10 14:42 UTC (permalink / raw)
  To: BALATON Zoltan
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, max.chou

On 7/10/24 05:25, BALATON Zoltan wrote:
> On Tue, 9 Jul 2024, Richard Henderson wrote:
>> Mark the reserve_addr check unlikely.  Use tlb_vaddr_to_host
>> instead of probe_write, relying on the memset itself to test
>> for page writability.  Use set/clear_helper_retaddr so that
>> we can properly unwind on segfault.
>>
>> With this, a trivial loop around guest memset will spend
>> nearly 50% of runtime within helper_dcbz and host memset.
>>
>> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
>> ---
>> target/ppc/mem_helper.c | 14 ++++++++++----
>> 1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/target/ppc/mem_helper.c b/target/ppc/mem_helper.c
>> index 24bae3b80c..fa4c4f9fa9 100644
>> --- a/target/ppc/mem_helper.c
>> +++ b/target/ppc/mem_helper.c
>> @@ -280,20 +280,26 @@ static void dcbz_common(CPUPPCState *env, target_ulong addr,
>>     addr &= mask;
>>
>>     /* Check reservation */
>> -    if ((env->reserve_addr & mask) == addr)  {
>> +    if (unlikely((env->reserve_addr & mask) == addr))  {
>>         env->reserve_addr = (target_ulong)-1ULL;
>>     }
>>
>>     /* Try fast path translate */
>> +#ifdef CONFIG_USER_ONLY
>> +    haddr = tlb_vaddr_to_host(env, addr, MMU_DATA_STORE, mmu_idx);
>> +#else
>>     haddr = probe_write(env, addr, dcbz_size, mmu_idx, retaddr);
>> -    if (haddr) {
>> -        memset(haddr, 0, dcbz_size);
>> -    } else {
>> +    if (unlikely(!haddr)) {
>>         /* Slow path */
>>         for (int i = 0; i < dcbz_size; i += 8) {
>>             cpu_stq_mmuidx_ra(env, addr + i, 0, mmu_idx, retaddr);
>>         }
> 
> Is a return needed here to only get to memset below when haddr != NULL?

Oops, yes.


r~


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 01/13] accel/tcg: Move {set, clear}_helper_retaddr to cpu_ldst.h
  2024-07-10  3:28 ` [PATCH v2 01/13] accel/tcg: Move {set, clear}_helper_retaddr to cpu_ldst.h Richard Henderson
@ 2024-07-12 12:48   ` Peter Maydell
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-07-12 12:48 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton,
	max.chou

On Wed, 10 Jul 2024 at 04:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Use of these in helpers goes hand-in-hand with tlb_vaddr_to_host
> and other probing functions.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  accel/tcg/user-retaddr.h | 28 ----------------------------
>  include/exec/cpu_ldst.h  | 34 ++++++++++++++++++++++++++++++++++
>  accel/tcg/cpu-exec.c     |  3 ---
>  accel/tcg/user-exec.c    |  1 -
>  4 files changed, 34 insertions(+), 32 deletions(-)
>  delete mode 100644 accel/tcg/user-retaddr.h

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 02/13] target/arm: Use cpu_env in cpu_untagged_addr
  2024-07-10  3:28 ` [PATCH v2 02/13] target/arm: Use cpu_env in cpu_untagged_addr Richard Henderson
@ 2024-07-12 12:49   ` Peter Maydell
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-07-12 12:49 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton,
	max.chou

On Wed, 10 Jul 2024 at 04:30, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> In a completely artifical memset benchmark object_dynamic_cast_assert
> dominates the profile, even above guest address resolution and
> the underlying host memset.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/cpu.h | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/cpu.h b/target/arm/cpu.h
> index d8eb986a04..ccfb9349a3 100644
> --- a/target/arm/cpu.h
> +++ b/target/arm/cpu.h
> @@ -3309,8 +3309,8 @@ extern const uint64_t pred_esz_masks[5];
>   */
>  static inline target_ulong cpu_untagged_addr(CPUState *cs, target_ulong x)
>  {
> -    ARMCPU *cpu = ARM_CPU(cs);
> -    if (cpu->env.tagged_addr_enable) {
> +    CPUARMState *env = cpu_env(cs);
> +    if (env->tagged_addr_enable) {
>          /*
>           * TBI is enabled for userspace but not kernelspace addresses.
>           * Only clear the tag if bit 55 is clear.

(This one's now upstream as commit efceb7d2bd5c.)

-- PMM


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 03/13] target/arm: Use set/clear_helper_retaddr in helper-a64.c
  2024-07-10  3:28 ` [PATCH v2 03/13] target/arm: Use set/clear_helper_retaddr in helper-a64.c Richard Henderson
@ 2024-07-12 12:53   ` Peter Maydell
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-07-12 12:53 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton,
	max.chou

On Wed, 10 Jul 2024 at 04:29, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Use these in helper_dc_dva and the FEAT_MOPS routines to
> avoid a race condition with munmap in another thread.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/tcg/helper-a64.c | 14 ++++++++++++--
>  1 file changed, 12 insertions(+), 2 deletions(-)
>
> diff --git a/target/arm/tcg/helper-a64.c b/target/arm/tcg/helper-a64.c
> index 0ea8668ab4..c60d2a7ec9 100644
> --- a/target/arm/tcg/helper-a64.c
> +++ b/target/arm/tcg/helper-a64.c
> @@ -928,6 +928,8 @@ uint32_t HELPER(sqrt_f16)(uint32_t a, void *fpstp)
>
>  void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
>  {
> +    uintptr_t ra = GETPC();
> +
>      /*
>       * Implement DC ZVA, which zeroes a fixed-length block of memory.
>       * Note that we do not implement the (architecturally mandated)
> @@ -948,8 +950,6 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
>
>  #ifndef CONFIG_USER_ONLY
>      if (unlikely(!mem)) {
> -        uintptr_t ra = GETPC();
> -
>          /*
>           * Trap if accessing an invalid page.  DC_ZVA requires that we supply
>           * the original pointer for an invalid page.  But watchpoints require
> @@ -971,7 +971,9 @@ void HELPER(dc_zva)(CPUARMState *env, uint64_t vaddr_in)
>      }
>  #endif
>
> +    set_helper_retaddr(ra);
>      memset(mem, 0, blocklen);
> +    clear_helper_retaddr();
>  }
>
>  void HELPER(unaligned_access)(CPUARMState *env, uint64_t addr,
> @@ -1120,7 +1122,9 @@ static uint64_t set_step(CPUARMState *env, uint64_t toaddr,
>      }
>  #endif
>      /* Easy case: just memset the host memory */
> +    set_helper_retaddr(ra);
>      memset(mem, data, setsize);
> +    clear_helper_retaddr();
>      return setsize;
>  }
>
> @@ -1163,7 +1167,9 @@ static uint64_t set_step_tags(CPUARMState *env, uint64_t toaddr,
>      }
>  #endif
>      /* Easy case: just memset the host memory */
> +    set_helper_retaddr(ra);
>      memset(mem, data, setsize);
> +    clear_helper_retaddr();
>      mte_mops_set_tags(env, toaddr, setsize, *mtedesc);
>      return setsize;
>  }

I was briefly uncertain about this, but since we restrict
the size of the amount of memory we're setting to not cross
a guest page, if the memset() does fault it should only
be for the case of "the whole block went away", so you never
get the situation of "successfully wrote a chunk and then
faulted partway through, and we should have written the
tags for the partial write and did not".

So

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 04/13] target/arm: Use set/clear_helper_retaddr in SVE and SME helpers
  2024-07-10  3:28 ` [PATCH v2 04/13] target/arm: Use set/clear_helper_retaddr in SVE and SME helpers Richard Henderson
@ 2024-07-12 13:00   ` Peter Maydell
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-07-12 13:00 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton,
	max.chou

On Wed, 10 Jul 2024 at 04:31, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Avoid a race condition with munmap in another thread.
> Use around blocks that exclusively use "host_fn".
> Keep the blocks as small as possible, but without setting
> and clearing for every operation on one page.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/arm/tcg/sme_helper.c | 16 ++++++++++++++++
>  target/arm/tcg/sve_helper.c | 26 ++++++++++++++++++++++++++
>  2 files changed, 42 insertions(+)


> @@ -6093,6 +6101,8 @@ void sve_ldnfff1_r(CPUARMState *env, void *vg, const target_ulong addr,
>      reg_last = info.reg_off_last[0];
>      host = info.page[0].host;
>
> +    set_helper_retaddr(retaddr);
> +
>      do {
>          uint64_t pg = *(uint64_t *)(vg + (reg_off >> 3));
>          do {
> @@ -6113,6 +6123,8 @@ void sve_ldnfff1_r(CPUARMState *env, void *vg, const target_ulong addr,
>          } while (reg_off <= reg_last && (reg_off & 63));
>      } while (reg_off <= reg_last);
>
> +    clear_helper_retaddr();
> +
>      /*
>       * MemSingleNF is allowed to fail for any reason.  We have special
>       * code above to handle the first element crossing a page boundary.

There's a "goto do_fault" inside the loop that we've bracketed here with
the set/clear_helper_retaddr() calls -- don't we need to call
clear_helper_retaddr() on that failure path too?

There's a TODO comment at the top of this file:

/*
 * Load contiguous data, first-fault and no-fault.
 *
 * For user-only, one could argue that we should hold the mmap_lock during
 * the operation so that there is no race between page_check_range and the
 * load operation.  However, unmapping pages out from under a running thread
 * is extraordinarily unlikely.  This theoretical race condition also affects
 * linux-user/ in its get_user/put_user macros.
 *
 * TODO: Construct some helpers, written in assembly, that interact with
 * host_signal_handler to produce memory ops which can properly report errors
 * without racing.
 */

Should we update it to note that we make at least some attempt to
handle the pages-unmapped-from-under-a-running-thread situation now?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 10/13] target/s390x: Use user_or_likely in do_access_memset
  2024-07-10  3:28 ` [PATCH v2 10/13] target/s390x: Use user_or_likely in do_access_memset Richard Henderson
@ 2024-07-12 13:02   ` Peter Maydell
  0 siblings, 0 replies; 29+ messages in thread
From: Peter Maydell @ 2024-07-12 13:02 UTC (permalink / raw)
  To: Richard Henderson
  Cc: qemu-devel, qemu-arm, qemu-s390x, qemu-riscv, balaton, max.chou

On Wed, 10 Jul 2024 at 04:28, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> Eliminate the ifdef by using a predicate that is
> always true with CONFIG_USER_ONLY.
>
> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
> ---
>  target/s390x/tcg/mem_helper.c | 6 +-----
>  1 file changed, 1 insertion(+), 5 deletions(-)
>
> diff --git a/target/s390x/tcg/mem_helper.c b/target/s390x/tcg/mem_helper.c
> index 6cdbc34178..5311a15a09 100644
> --- a/target/s390x/tcg/mem_helper.c
> +++ b/target/s390x/tcg/mem_helper.c
> @@ -225,10 +225,7 @@ static void do_access_memset(CPUS390XState *env, vaddr vaddr, char *haddr,
>                               uint8_t byte, uint16_t size, int mmu_idx,
>                               uintptr_t ra)
>  {
> -#ifdef CONFIG_USER_ONLY
> -    memset(haddr, byte, size);
> -#else
> -    if (likely(haddr)) {
> +    if (user_or_likely(haddr)) {
>          memset(haddr, byte, size);
>      } else {
>          MemOpIdx oi = make_memop_idx(MO_UB, mmu_idx);
> @@ -236,7 +233,6 @@ static void do_access_memset(CPUS390XState *env, vaddr vaddr, char *haddr,
>              cpu_stb_mmu(env, vaddr + i, byte, oi, ra);
>          }
>      }
> -#endif

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>

thanks
-- PMM


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff
  2024-07-10  3:28 ` [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff Richard Henderson
  2024-07-10  4:09   ` Alistair Francis
@ 2024-07-15  7:06   ` Max Chou
  2024-07-15 21:42     ` Richard Henderson
  1 sibling, 1 reply; 29+ messages in thread
From: Max Chou @ 2024-07-15  7:06 UTC (permalink / raw)
  To: Richard Henderson, qemu-devel
  Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton

[-- Attachment #1: Type: text/plain, Size: 6082 bytes --]

On 2024/7/10 11:28 AM, Richard Henderson wrote:
> The current pairing of tlb_vaddr_to_host with extra is either
> inefficient (user-only, with page_check_range) or incorrect
> (system, with probe_pages).
>
> For proper non-fault behaviour, use probe_access_flags with
> its nonfault parameter set to true.
>
> Signed-off-by: Richard Henderson<richard.henderson@linaro.org>
> ---
>   target/riscv/vector_helper.c | 34 ++++++++++++++++++----------------
>   1 file changed, 18 insertions(+), 16 deletions(-)
>
> diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
> index 1b4d5a8e37..4d72eb74d3 100644
> --- a/target/riscv/vector_helper.c
> +++ b/target/riscv/vector_helper.c
> @@ -474,7 +474,6 @@ vext_ldff(void *vd, void *v0, target_ulong base,
>             vext_ldst_elem_fn *ldst_elem,
>             uint32_t log2_esz, uintptr_t ra)
>   {
> -    void *host;
>       uint32_t i, k, vl = 0;
>       uint32_t nf = vext_nf(desc);
>       uint32_t vm = vext_vm(desc);
> @@ -493,27 +492,31 @@ vext_ldff(void *vd, void *v0, target_ulong base,
>           }
>           addr = adjust_addr(env, base + i * (nf << log2_esz));
>           if (i == 0) {
> +            /* Allow fault on first element. */
>               probe_pages(env, addr, nf << log2_esz, ra, MMU_DATA_LOAD);
>           } else {
> -            /* if it triggers an exception, no need to check watchpoint */
>               remain = nf << log2_esz;
>               while (remain > 0) {
> +                void *host;
> +                int flags;
> +
>                   offset = -(addr | TARGET_PAGE_MASK);
> -                host = tlb_vaddr_to_host(env, addr, MMU_DATA_LOAD, mmu_index);
> -                if (host) {
> -#ifdef CONFIG_USER_ONLY
> -                    if (!page_check_range(addr, offset, PAGE_READ)) {
> -                        vl = i;
> -                        goto ProbeSuccess;
> -                    }
> -#else
> -                    probe_pages(env, addr, offset, ra, MMU_DATA_LOAD);
> -#endif
> -                } else {
> +
> +                /* Probe nonfault on subsequent elements. */
> +                flags = probe_access_flags(env, addr, offset, MMU_DATA_LOAD,
> +                                           mmu_index, true, &host, 0);
> +                if (flags) {
According to the section 7.7. Unit-stride Fault-Only-First Loads in the 
v spec (v1.0)

      When the fault-only-first instruction would trigger a debug 
data-watchpoint trap on an element after the first,
      implementations should not reduce vl but instead should trigger 
the debug trap as otherwise the event might be lost.

I think that we need to filter out the watchpoint bit in the flag here 
liked below.
+ if (flags & ~TLB_WATCHPOINT) {


> +                    /*
> +                     * Stop any flag bit set:
> +                     *   invalid (unmapped)
> +                     *   mmio (transaction failed)
> +                     *   watchpoint (debug)
> +                     * In all cases, handle as the first load next time.
> +                     */
>                       vl = i;
> -                    goto ProbeSuccess;
> +                    break;
This break will just break the while loop, so the outside for loop will 
continue checking the following elements that we may get unexpected vl.


>                   }
> -                if (remain <=  offset) {
> +                if (remain <= offset) {
>                       break;
>                   }
>                   remain -= offset;
> @@ -521,7 +524,6 @@ vext_ldff(void *vd, void *v0, target_ulong base,
>               }
>           }
>       }
> -ProbeSuccess:
>       /* load bytes from guest memory */
>       if (vl != 0) {
>           env->vl = vl;


Thanks for this series patch set, I’m trying to rebase the RVV 
optimization RFC patch set to this series then optimize the vext_ldff part.
And I think that there is a potential issue in the original 
implementation that maybe we can fix in this patch.

We need to assign the correct element load size to the 
probe_access_internal function called by tlb_vaddr_to_host in original 
implementation or is called directly in this patch.
The size parameter will be used by the pmp_hart_has_privs function to do 
the physical memory protection (PMP) checking.
If we set the size parameter to the remain page size, we may get 
unexpected trap caused by the PMP rules that covered the regions of 
masked-off elements.

Maybe we can replace the while loop liked below.


vext_ldff(void *vd, void *v0, target_ulong base,
           ...
{
     ...
     uint32_t size = nf << log2_esz;

     VSTART_CHECK_EARLY_EXIT(env);

     /* probe every access */
     for (i = env->vstart; i < env->vl; i++) {
         if (!vm && !vext_elem_mask(v0, i)) {
             continue;
         }
         addr = adjust_addr(env, base + i * size);
         if (i == 0) {
             probe_pages(env, addr, size, ra, MMU_DATA_LOAD);
         } else {
             /* if it triggers an exception, no need to check watchpoint */
             void *host;
             int flags;

             /* Probe nonfault on subsequent elements. */
             flags = probe_access_flags(env, addr, size, MMU_DATA_LOAD,
                     mmu_index, true, &host, 0);
             if (flags & ~TLB_WATCHPOINT) {
                 /*
                  * Stop any flag bit set:
                  *   invalid (unmapped)
                  *   mmio (transaction failed)
                  * In all cases, handle as the first load next time.
                  */
                 vl = i;
                 break;
             }
         }
     }

     /* load bytes from guest memory */
     if (vl != 0) {
         env->vl = vl;
     }
     ...
}


[-- Attachment #2: Type: text/html, Size: 9104 bytes --]

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff
  2024-07-15  7:06   ` Max Chou
@ 2024-07-15 21:42     ` Richard Henderson
  0 siblings, 0 replies; 29+ messages in thread
From: Richard Henderson @ 2024-07-15 21:42 UTC (permalink / raw)
  To: Max Chou, qemu-devel; +Cc: qemu-arm, qemu-ppc, qemu-s390x, qemu-riscv, balaton

On 7/15/24 17:06, Max Chou wrote:
>> +                /* Probe nonfault on subsequent elements. */
>> +                flags = probe_access_flags(env, addr, offset, MMU_DATA_LOAD,
>> +                                           mmu_index, true, &host, 0);
>> +                if (flags) {
> According to the section 7.7. Unit-stride Fault-Only-First Loads in the v spec (v1.0)
> 
>       When the fault-only- data-watchpoint trap on an element after the      
> implementations should not reduce vl but instead should trigger the debug trap as 
> otherwise the event might be lost.

Hmm, ok.  Interesting.


> And I think that there is a potential issue in the original implementation that maybe we 
> can fix in this patch.
> 
> We need to assign the correct element load size to the probe_access_internal function 
> called by tlb_vaddr_to_host in original implementation or is called directly in this patch.
> The size parameter will be used by the pmp_hart_has_privs function to do the physical 
> memory protection (PMP) checking.
> If we set the size parameter to the remain page size, we may get unexpected trap caused by 
> the PMP rules that covered the regions of masked-off elements.
> 
> Maybe we can replace the while loop liked below.
> 
> 
> vext_ldff(void *vd, void *v0, target_ulong base,
>            ...
> {
>      ...
>      uint32_t size = nf << log2_esz;
> 
>      VSTART_CHECK_EARLY_EXIT(env);
> 
>      /* probe every access */
>      for (i = env->vstart; i < env->vl; i++) {
>          if (!vm && !vext_elem_mask(v0, i)) {
>              continue;
>          }
>          addr = adjust_addr(env, base + i * size);
>          if (i == 0) {
>              probe_pages(env, addr, size, ra, MMU_DATA_LOAD);
>          } else {
>              /* if it triggers an exception, no need to check watchpoint */
>              void *host;
>              int flags;
> 
>              /* Probe nonfault on subsequent elements. */
>              flags = probe_access_flags(env, addr, size, MMU_DATA_LOAD,
>                      mmu_index, true, &host, 0);
>              if (flags & ~TLB_WATCHPOINT) {
>                  /*
>                   * Stop any flag bit set:
>                   *   invalid (unmapped)
>                   *   mmio (transaction failed)
>                   * In all cases, handle as the first load next time.
>                   */
>                  vl = i;
>                  break;
>              }
>          }
>      }

No, I don't think repeated probing is a good idea.
You'll lose everything you attempted to gain with the other improvements.

It seems, to handle watchpoints, you need to start by probing the entire length non-fault. 
  That will tell you if any portion of the length has any of the problem cases.  The fast 
path will not, of course.

After probing, you have flags for the 1 or two pages, and you can make a choice about the 
actual load length:

   - invalid on first page: either the first element faults,
     or you need to check PMP via some alternate mechanism.
     Do not be afraid to add something to CPUTLBEntryFull.extra.riscv
     during tlb_fill in order to accelerate this, if needed.

   - mmio on first page: just one element, as the second might fault
     during the transaction.

     It would be possible to enhance riscv_cpu_do_transaction_failed to
     suppress the fault and set a flag noting the fault.  This would allow
     multiple elements to be loaded, at the expense of another check after
     each element within the slow tlb-load path.  I don't know if this is
     desirable, really.  Using vector operations on mmio is usually a
     programming error.  :-)

   - invalid or mmio on second page, continue to the end of the first page.

Once we have the actual load length, handle watchpoints by hand.
See sve_cont_ldst_watchpoints.

Finally, the loop loading the elements, likely in ram via host pointer.


r~


^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2024-07-15 21:43 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-10  3:28 [PATCH v2 00/13] Fixes for user-only munmap races Richard Henderson
2024-07-10  3:28 ` [PATCH v2 01/13] accel/tcg: Move {set, clear}_helper_retaddr to cpu_ldst.h Richard Henderson
2024-07-12 12:48   ` Peter Maydell
2024-07-10  3:28 ` [PATCH v2 02/13] target/arm: Use cpu_env in cpu_untagged_addr Richard Henderson
2024-07-12 12:49   ` Peter Maydell
2024-07-10  3:28 ` [PATCH v2 03/13] target/arm: Use set/clear_helper_retaddr in helper-a64.c Richard Henderson
2024-07-12 12:53   ` Peter Maydell
2024-07-10  3:28 ` [PATCH v2 04/13] target/arm: Use set/clear_helper_retaddr in SVE and SME helpers Richard Henderson
2024-07-12 13:00   ` Peter Maydell
2024-07-10  3:28 ` [PATCH v2 05/13] target/ppc/mem_helper.c: Remove a conditional from dcbz_common() Richard Henderson
2024-07-10  3:28 ` [PATCH v2 06/13] target/ppc: Hoist dcbz_size out of dcbz_common Richard Henderson
2024-07-10 12:11   ` BALATON Zoltan
2024-07-10 14:36     ` Richard Henderson
2024-07-10  3:28 ` [PATCH v2 07/13] target/ppc: Split out helper_dbczl for 970 Richard Henderson
2024-07-10 12:17   ` BALATON Zoltan
2024-07-10  3:28 ` [PATCH v2 08/13] target/ppc: Merge helper_{dcbz,dcbzep} Richard Henderson
2024-07-10 12:20   ` BALATON Zoltan
2024-07-10 14:41     ` Richard Henderson
2024-07-10  3:28 ` [PATCH v2 09/13] target/ppc: Improve helper_dcbz for user-only Richard Henderson
2024-07-10 12:25   ` BALATON Zoltan
2024-07-10 14:42     ` Richard Henderson
2024-07-10  3:28 ` [PATCH v2 10/13] target/s390x: Use user_or_likely in do_access_memset Richard Henderson
2024-07-12 13:02   ` Peter Maydell
2024-07-10  3:28 ` [PATCH v2 11/13] target/s390x: Use user_or_likely in access_memmove Richard Henderson
2024-07-10  3:28 ` [PATCH v2 12/13] target/s390x: Use set/clear_helper_retaddr in mem_helper.c Richard Henderson
2024-07-10  3:28 ` [PATCH v2 13/13] target/riscv: Simplify probing in vext_ldff Richard Henderson
2024-07-10  4:09   ` Alistair Francis
2024-07-15  7:06   ` Max Chou
2024-07-15 21:42     ` Richard Henderson

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).