qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PULL for-8.1-rc1 0/7] tcg patch queue
@ 2023-07-24  8:53 Richard Henderson
  2023-07-24  8:53 ` [PULL 1/7] tcg/ppc: Fix race in goto_tb implementation Richard Henderson
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: Richard Henderson @ 2023-07-24  8:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

The following changes since commit d1181d29370a4318a9f11ea92065bea6bb159f83:

  Merge tag 'pull-nbd-2023-07-19' of https://repo.or.cz/qemu/ericb into staging (2023-07-20 09:54:07 +0100)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230724

for you to fetch changes up to 32b120394c578bc824f1db4835b3bffbeca88fae:

  accel/tcg: Fix type of 'last' for pageflags_{find,next} (2023-07-24 09:48:49 +0100)

----------------------------------------------------------------
accel/tcg: Zero-pad vaddr in tlb debug output
accel/tcg: Fix type of 'last' for pageflags_{find,next}
accel/tcg: Fix sense of read-only probes in ldst_atomicity
accel/tcg: Take mmap_lock in load_atomic*_or_exit
tcg: Add earlyclobber to op_add2 for x86 and s390x
tcg/ppc: Fix race in goto_tb implementation

----------------------------------------------------------------
Anton Johansson (1):
      accel/tcg: Zero-pad vaddr in tlb_debug output

Ilya Leoshkevich (1):
      tcg/{i386, s390x}: Add earlyclobber to the op_add2's first output

Jordan Niethe (1):
      tcg/ppc: Fix race in goto_tb implementation

Luca Bonissi (1):
      accel/tcg: Fix type of 'last' for pageflags_{find,next}

Richard Henderson (3):
      include/exec: Add WITH_MMAP_LOCK_GUARD
      accel/tcg: Fix sense of read-only probes in ldst_atomicity
      accel/tcg: Take mmap_lock in load_atomic*_or_exit

 include/exec/exec-all.h        | 10 ++++++++++
 tcg/i386/tcg-target-con-set.h  |  5 ++++-
 tcg/s390x/tcg-target-con-set.h |  8 +++++---
 accel/tcg/cputlb.c             | 20 ++++++++++----------
 accel/tcg/user-exec.c          |  4 ++--
 bsd-user/mmap.c                |  1 +
 linux-user/mmap.c              |  1 +
 tcg/tcg.c                      |  8 +++++++-
 accel/tcg/ldst_atomicity.c.inc | 32 ++++++++++++++++++--------------
 tcg/i386/tcg-target.c.inc      |  2 +-
 tcg/ppc/tcg-target.c.inc       |  9 +++++----
 tcg/s390x/tcg-target.c.inc     |  4 ++--
 12 files changed, 66 insertions(+), 38 deletions(-)


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PULL 1/7] tcg/ppc: Fix race in goto_tb implementation
  2023-07-24  8:53 [PULL for-8.1-rc1 0/7] tcg patch queue Richard Henderson
@ 2023-07-24  8:53 ` Richard Henderson
  2023-07-24  8:53 ` [PULL 2/7] include/exec: Add WITH_MMAP_LOCK_GUARD Richard Henderson
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2023-07-24  8:53 UTC (permalink / raw)
  To: qemu-devel
  Cc: peter.maydell, Jordan Niethe, Anushree Mathur, Michael Tokarev,
	Benjamin Gray

From: Jordan Niethe <jniethe5@gmail.com>

Commit 20b6643324 ("tcg/ppc: Reorg goto_tb implementation") modified
goto_tb to ensure only a single instruction was patched to prevent
incorrect behavior if a thread was in the middle of multiple
instructions when they were replaced. However this introduced a race
between loading the jmp target into TCG_REG_TB and patching and
executing the direct branch.

The relevant part of the goto_tb implementation:

    ld TCG_REG_TB, TARGET_ADDR_LOCATION(TCG_REG_TB)
  patch_location:
    mtctr TCG_REG_TB
    bctr

tb_target_set_jmp_target() will replace 'patch_location' with a direct
branch if the target is in range. The direct branch now relies on
TCG_REG_TB being set up correctly by the ld. Prior to this commit
multiple instructions were patched in for the direct branch case; these
instructions would initialize TCG_REG_TB to the same value as the branch
target.

Imagine the following sequence:

1) Thread A is executing the goto_tb sequence and loads the jmp
   target into TCG_REG_TB.

2) Thread B updates the jmp target address and calls
   tb_target_set_jmp_target(). This patches a new direct branch into the
   goto_tb sequence.

3) Thread A executes the newly patched direct branch. The value in
   TCG_REG_TB still contains the old jmp target.

TCG_REG_TB MUST contain the translation block's tc.ptr. Execution will
eventually crash after performing memory accesses generated from a
faulty value in TCG_REG_TB.

This presents as segfaults or illegal instruction exceptions.

Do not revert commit 20b6643324 as it did fix a different race
condition. Instead remove the direct branch optimization and always use
indirect branches.

The direct branch optimization can be re-added later with a race free
sequence.

Fixes: 20b6643324 ("tcg/ppc: Reorg goto_tb implementation")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1726
Reported-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Tested-by: Anushree Mathur <anushree.mathur@linux.vnet.ibm.com>
Tested-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Co-developed-by: Benjamin Gray <bgray@linux.ibm.com>
Signed-off-by: Jordan Niethe <jniethe5@gmail.com>
Signed-off-by: Benjamin Gray <bgray@linux.ibm.com>
Message-Id: <20230717093001.13167-1-jniethe5@gmail.com>
---
 tcg/ppc/tcg-target.c.inc | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/tcg/ppc/tcg-target.c.inc b/tcg/ppc/tcg-target.c.inc
index c866f2c997..511e14b180 100644
--- a/tcg/ppc/tcg-target.c.inc
+++ b/tcg/ppc/tcg-target.c.inc
@@ -2496,11 +2496,10 @@ static void tcg_out_goto_tb(TCGContext *s, int which)
         ptrdiff_t offset = tcg_tbrel_diff(s, (void *)ptr);
         tcg_out_mem_long(s, LD, LDX, TCG_REG_TB, TCG_REG_TB, offset);
     
-        /* Direct branch will be patched by tb_target_set_jmp_target. */
+        /* TODO: Use direct branches when possible. */
         set_jmp_insn_offset(s, which);
         tcg_out32(s, MTSPR | RS(TCG_REG_TB) | CTR);
 
-        /* When branch is out of range, fall through to indirect. */
         tcg_out32(s, BCCTR | BO_ALWAYS);
 
         /* For the unlinked case, need to reset TCG_REG_TB.  */
@@ -2528,10 +2527,12 @@ void tb_target_set_jmp_target(const TranslationBlock *tb, int n,
     intptr_t diff = addr - jmp_rx;
     tcg_insn_unit insn;
 
+    if (USE_REG_TB) {
+        return;
+    }
+
     if (in_range_b(diff)) {
         insn = B | (diff & 0x3fffffc);
-    } else if (USE_REG_TB) {
-        insn = MTSPR | RS(TCG_REG_TB) | CTR;
     } else {
         insn = NOP;
     }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PULL 2/7] include/exec: Add WITH_MMAP_LOCK_GUARD
  2023-07-24  8:53 [PULL for-8.1-rc1 0/7] tcg patch queue Richard Henderson
  2023-07-24  8:53 ` [PULL 1/7] tcg/ppc: Fix race in goto_tb implementation Richard Henderson
@ 2023-07-24  8:53 ` Richard Henderson
  2023-07-24  8:53 ` [PULL 3/7] accel/tcg: Fix sense of read-only probes in ldst_atomicity Richard Henderson
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2023-07-24  8:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/exec-all.h | 10 ++++++++++
 bsd-user/mmap.c         |  1 +
 linux-user/mmap.c       |  1 +
 3 files changed, 12 insertions(+)

diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 5fa0687cd2..d02517e95f 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -629,6 +629,15 @@ void TSA_NO_TSA mmap_lock(void);
 void TSA_NO_TSA mmap_unlock(void);
 bool have_mmap_lock(void);
 
+static inline void mmap_unlock_guard(void *unused)
+{
+    mmap_unlock();
+}
+
+#define WITH_MMAP_LOCK_GUARD()                                            \
+    for (int _mmap_lock_iter __attribute__((cleanup(mmap_unlock_guard)))  \
+         = (mmap_lock(), 0); _mmap_lock_iter == 0; _mmap_lock_iter = 1)
+
 /**
  * adjust_signal_pc:
  * @pc: raw pc from the host signal ucontext_t.
@@ -683,6 +692,7 @@ G_NORETURN void cpu_loop_exit_sigbus(CPUState *cpu, target_ulong addr,
 #else
 static inline void mmap_lock(void) {}
 static inline void mmap_unlock(void) {}
+#define WITH_MMAP_LOCK_GUARD()
 
 void tlb_reset_dirty(CPUState *cpu, ram_addr_t start1, ram_addr_t length);
 void tlb_set_dirty(CPUState *cpu, vaddr addr);
diff --git a/bsd-user/mmap.c b/bsd-user/mmap.c
index aca8764356..74ed00b9fe 100644
--- a/bsd-user/mmap.c
+++ b/bsd-user/mmap.c
@@ -32,6 +32,7 @@ void mmap_lock(void)
 
 void mmap_unlock(void)
 {
+    assert(mmap_lock_count > 0);
     if (--mmap_lock_count == 0) {
         pthread_mutex_unlock(&mmap_mutex);
     }
diff --git a/linux-user/mmap.c b/linux-user/mmap.c
index 44b53bd446..a5dfb56545 100644
--- a/linux-user/mmap.c
+++ b/linux-user/mmap.c
@@ -36,6 +36,7 @@ void mmap_lock(void)
 
 void mmap_unlock(void)
 {
+    assert(mmap_lock_count > 0);
     if (--mmap_lock_count == 0) {
         pthread_mutex_unlock(&mmap_mutex);
     }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PULL 3/7] accel/tcg: Fix sense of read-only probes in ldst_atomicity
  2023-07-24  8:53 [PULL for-8.1-rc1 0/7] tcg patch queue Richard Henderson
  2023-07-24  8:53 ` [PULL 1/7] tcg/ppc: Fix race in goto_tb implementation Richard Henderson
  2023-07-24  8:53 ` [PULL 2/7] include/exec: Add WITH_MMAP_LOCK_GUARD Richard Henderson
@ 2023-07-24  8:53 ` Richard Henderson
  2023-07-24  8:53 ` [PULL 4/7] accel/tcg: Take mmap_lock in load_atomic*_or_exit Richard Henderson
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2023-07-24  8:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

In the initial commit, cdfac37be0d, the sense of the test is incorrect,
as the -1/0 return was confusing.  In bef6f008b981, we mechanically
invert all callers while changing to false/true return, preserving the
incorrectness of the test.

Now that the return sense is sane, it's easy to see that if !write,
then the page is not modifiable (i.e. most likely read-only, with
PROT_NONE handled via SIGSEGV).

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/ldst_atomicity.c.inc | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index 4de0a80492..de70531a7a 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -159,7 +159,7 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
      * another process, because the fallback start_exclusive solution
      * provides no protection across processes.
      */
-    if (page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
+    if (!page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
         uint64_t *p = __builtin_assume_aligned(pv, 8);
         return *p;
     }
@@ -194,7 +194,7 @@ static Int128 load_atomic16_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
      * another process, because the fallback start_exclusive solution
      * provides no protection across processes.
      */
-    if (page_check_range(h2g(p), 16, PAGE_WRITE_ORG)) {
+    if (!page_check_range(h2g(p), 16, PAGE_WRITE_ORG)) {
         return *p;
     }
 #endif
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PULL 4/7] accel/tcg: Take mmap_lock in load_atomic*_or_exit
  2023-07-24  8:53 [PULL for-8.1-rc1 0/7] tcg patch queue Richard Henderson
                   ` (2 preceding siblings ...)
  2023-07-24  8:53 ` [PULL 3/7] accel/tcg: Fix sense of read-only probes in ldst_atomicity Richard Henderson
@ 2023-07-24  8:53 ` Richard Henderson
  2023-07-24  8:53 ` [PULL 5/7] tcg/{i386, s390x}: Add earlyclobber to the op_add2's first output Richard Henderson
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2023-07-24  8:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell

For user-only, the probe for page writability may race with another
thread's mprotect.  Take the mmap_lock around the operation.  This
is still faster than the start/end_exclusive fallback.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/ldst_atomicity.c.inc | 32 ++++++++++++++++++--------------
 1 file changed, 18 insertions(+), 14 deletions(-)

diff --git a/accel/tcg/ldst_atomicity.c.inc b/accel/tcg/ldst_atomicity.c.inc
index de70531a7a..e5c590a499 100644
--- a/accel/tcg/ldst_atomicity.c.inc
+++ b/accel/tcg/ldst_atomicity.c.inc
@@ -159,9 +159,11 @@ static uint64_t load_atomic8_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
      * another process, because the fallback start_exclusive solution
      * provides no protection across processes.
      */
-    if (!page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
-        uint64_t *p = __builtin_assume_aligned(pv, 8);
-        return *p;
+    WITH_MMAP_LOCK_GUARD() {
+        if (!page_check_range(h2g(pv), 8, PAGE_WRITE_ORG)) {
+            uint64_t *p = __builtin_assume_aligned(pv, 8);
+            return *p;
+        }
     }
 #endif
 
@@ -186,25 +188,27 @@ static Int128 load_atomic16_or_exit(CPUArchState *env, uintptr_t ra, void *pv)
         return atomic16_read_ro(p);
     }
 
-#ifdef CONFIG_USER_ONLY
     /*
      * We can only use cmpxchg to emulate a load if the page is writable.
      * If the page is not writable, then assume the value is immutable
      * and requires no locking.  This ignores the case of MAP_SHARED with
      * another process, because the fallback start_exclusive solution
      * provides no protection across processes.
+     *
+     * In system mode all guest pages are writable.  For user mode,
+     * we must take mmap_lock so that the query remains valid until
+     * the write is complete -- tests/tcg/multiarch/munmap-pthread.c
+     * is an example that can race.
      */
-    if (!page_check_range(h2g(p), 16, PAGE_WRITE_ORG)) {
-        return *p;
-    }
+    WITH_MMAP_LOCK_GUARD() {
+#ifdef CONFIG_USER_ONLY
+        if (!page_check_range(h2g(p), 16, PAGE_WRITE_ORG)) {
+            return *p;
+        }
 #endif
-
-    /*
-     * In system mode all guest pages are writable, and for user-only
-     * we have just checked writability.  Try cmpxchg.
-     */
-    if (HAVE_ATOMIC128_RW) {
-        return atomic16_read_rw(p);
+        if (HAVE_ATOMIC128_RW) {
+            return atomic16_read_rw(p);
+        }
     }
 
     /* Ultimate fallback: re-execute in serial context. */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PULL 5/7] tcg/{i386, s390x}: Add earlyclobber to the op_add2's first output
  2023-07-24  8:53 [PULL for-8.1-rc1 0/7] tcg patch queue Richard Henderson
                   ` (3 preceding siblings ...)
  2023-07-24  8:53 ` [PULL 4/7] accel/tcg: Take mmap_lock in load_atomic*_or_exit Richard Henderson
@ 2023-07-24  8:53 ` Richard Henderson
  2023-07-24  8:53 ` [PULL 6/7] accel/tcg: Zero-pad vaddr in tlb_debug output Richard Henderson
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2023-07-24  8:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Ilya Leoshkevich, qemu-stable

From: Ilya Leoshkevich <iii@linux.ibm.com>

i386 and s390x implementations of op_add2 require an earlyclobber,
which is currently missing. This breaks VCKSM in s390x guests. E.g., on
x86_64 the following op:

    add2_i32 tmp2,tmp3,tmp2,tmp3,tmp3,tmp2   dead: 0 2 3 4 5  pref=none,0xffff

is translated to:

    addl     %ebx, %r12d
    adcl     %r12d, %ebx

Introduce a new C_N1_O1_I4 constraint, and make sure that earlyclobber
of aliased outputs is honored.

Cc: qemu-stable@nongnu.org
Fixes: 82790a870992 ("tcg: Add markup for output requires new register")
Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230719221310.1968845-7-iii@linux.ibm.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/i386/tcg-target-con-set.h  | 5 ++++-
 tcg/s390x/tcg-target-con-set.h | 8 +++++---
 tcg/tcg.c                      | 8 +++++++-
 tcg/i386/tcg-target.c.inc      | 2 +-
 tcg/s390x/tcg-target.c.inc     | 4 ++--
 5 files changed, 19 insertions(+), 8 deletions(-)

diff --git a/tcg/i386/tcg-target-con-set.h b/tcg/i386/tcg-target-con-set.h
index 91ceb0e1da..5ea3a292f0 100644
--- a/tcg/i386/tcg-target-con-set.h
+++ b/tcg/i386/tcg-target-con-set.h
@@ -11,6 +11,9 @@
  *
  * C_N1_Im(...) defines a constraint set with 1 output and <m> inputs,
  * except that the output must use a new register.
+ *
+ * C_Nn_Om_Ik(...) defines a constraint set with <n + m> outputs and <k>
+ * inputs, except that the first <n> outputs must use new registers.
  */
 C_O0_I1(r)
 C_O0_I2(L, L)
@@ -53,4 +56,4 @@ C_O2_I1(r, r, L)
 C_O2_I2(a, d, a, r)
 C_O2_I2(r, r, L, L)
 C_O2_I3(a, d, 0, 1, r)
-C_O2_I4(r, r, 0, 1, re, re)
+C_N1_O1_I4(r, r, 0, 1, re, re)
diff --git a/tcg/s390x/tcg-target-con-set.h b/tcg/s390x/tcg-target-con-set.h
index cbad91b2b5..9a42037499 100644
--- a/tcg/s390x/tcg-target-con-set.h
+++ b/tcg/s390x/tcg-target-con-set.h
@@ -8,6 +8,9 @@
  * C_On_Im(...) defines a constraint set with <n> outputs and <m> inputs.
  * Each operand should be a sequence of constraint letters as defined by
  * tcg-target-con-str.h; the constraint combination is inclusive or.
+ *
+ * C_Nn_Om_Ik(...) defines a constraint set with <n + m> outputs and <k>
+ * inputs, except that the first <n> outputs must use new registers.
  */
 C_O0_I1(r)
 C_O0_I2(r, r)
@@ -41,6 +44,5 @@ C_O2_I1(o, m, r)
 C_O2_I2(o, m, 0, r)
 C_O2_I2(o, m, r, r)
 C_O2_I3(o, m, 0, 1, r)
-C_O2_I4(r, r, 0, 1, rA, r)
-C_O2_I4(r, r, 0, 1, ri, r)
-C_O2_I4(r, r, 0, 1, r, r)
+C_N1_O1_I4(r, r, 0, 1, ri, r)
+C_N1_O1_I4(r, r, 0, 1, rA, r)
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 652e8ea6b9..ddfe9a96cb 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -648,6 +648,7 @@ static void tcg_out_movext3(TCGContext *s, const TCGMovExtend *i1,
 #define C_O2_I2(O1, O2, I1, I2)         C_PFX4(c_o2_i2_, O1, O2, I1, I2),
 #define C_O2_I3(O1, O2, I1, I2, I3)     C_PFX5(c_o2_i3_, O1, O2, I1, I2, I3),
 #define C_O2_I4(O1, O2, I1, I2, I3, I4) C_PFX6(c_o2_i4_, O1, O2, I1, I2, I3, I4),
+#define C_N1_O1_I4(O1, O2, I1, I2, I3, I4) C_PFX6(c_n1_o1_i4_, O1, O2, I1, I2, I3, I4),
 
 typedef enum {
 #include "tcg-target-con-set.h"
@@ -668,6 +669,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
 #undef C_O2_I2
 #undef C_O2_I3
 #undef C_O2_I4
+#undef C_N1_O1_I4
 
 /* Put all of the constraint sets into an array, indexed by the enum. */
 
@@ -687,6 +689,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode);
 #define C_O2_I2(O1, O2, I1, I2)         { .args_ct_str = { #O1, #O2, #I1, #I2 } },
 #define C_O2_I3(O1, O2, I1, I2, I3)     { .args_ct_str = { #O1, #O2, #I1, #I2, #I3 } },
 #define C_O2_I4(O1, O2, I1, I2, I3, I4) { .args_ct_str = { #O1, #O2, #I1, #I2, #I3, #I4 } },
+#define C_N1_O1_I4(O1, O2, I1, I2, I3, I4) { .args_ct_str = { "&" #O1, #O2, #I1, #I2, #I3, #I4 } },
 
 static const TCGTargetOpDef constraint_sets[] = {
 #include "tcg-target-con-set.h"
@@ -706,6 +709,7 @@ static const TCGTargetOpDef constraint_sets[] = {
 #undef C_O2_I2
 #undef C_O2_I3
 #undef C_O2_I4
+#undef C_N1_O1_I4
 
 /* Expand the enumerator to be returned from tcg_target_op_def(). */
 
@@ -725,6 +729,7 @@ static const TCGTargetOpDef constraint_sets[] = {
 #define C_O2_I2(O1, O2, I1, I2)         C_PFX4(c_o2_i2_, O1, O2, I1, I2)
 #define C_O2_I3(O1, O2, I1, I2, I3)     C_PFX5(c_o2_i3_, O1, O2, I1, I2, I3)
 #define C_O2_I4(O1, O2, I1, I2, I3, I4) C_PFX6(c_o2_i4_, O1, O2, I1, I2, I3, I4)
+#define C_N1_O1_I4(O1, O2, I1, I2, I3, I4) C_PFX6(c_n1_o1_i4_, O1, O2, I1, I2, I3, I4)
 
 #include "tcg-target.c.inc"
 
@@ -4703,7 +4708,8 @@ static void tcg_reg_alloc_op(TCGContext *s, const TCGOp *op)
                  * dead after the instruction, we must allocate a new
                  * register and move it.
                  */
-                if (temp_readonly(ts) || !IS_DEAD_ARG(i)) {
+                if (temp_readonly(ts) || !IS_DEAD_ARG(i)
+                    || def->args_ct[arg_ct->alias_index].newreg) {
                     allocate_new_reg = true;
                 } else if (ts->val_type == TEMP_VAL_REG) {
                     /*
diff --git a/tcg/i386/tcg-target.c.inc b/tcg/i386/tcg-target.c.inc
index ab997b5fb3..77482da070 100644
--- a/tcg/i386/tcg-target.c.inc
+++ b/tcg/i386/tcg-target.c.inc
@@ -3335,7 +3335,7 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
     case INDEX_op_add2_i64:
     case INDEX_op_sub2_i32:
     case INDEX_op_sub2_i64:
-        return C_O2_I4(r, r, 0, 1, re, re);
+        return C_N1_O1_I4(r, r, 0, 1, re, re);
 
     case INDEX_op_ctz_i32:
     case INDEX_op_ctz_i64:
diff --git a/tcg/s390x/tcg-target.c.inc b/tcg/s390x/tcg-target.c.inc
index a878acd8ca..a94f7908d6 100644
--- a/tcg/s390x/tcg-target.c.inc
+++ b/tcg/s390x/tcg-target.c.inc
@@ -3229,11 +3229,11 @@ static TCGConstraintSetIndex tcg_target_op_def(TCGOpcode op)
 
     case INDEX_op_add2_i32:
     case INDEX_op_sub2_i32:
-        return C_O2_I4(r, r, 0, 1, ri, r);
+        return C_N1_O1_I4(r, r, 0, 1, ri, r);
 
     case INDEX_op_add2_i64:
     case INDEX_op_sub2_i64:
-        return C_O2_I4(r, r, 0, 1, rA, r);
+        return C_N1_O1_I4(r, r, 0, 1, rA, r);
 
     case INDEX_op_st_vec:
         return C_O0_I2(v, r);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PULL 6/7] accel/tcg: Zero-pad vaddr in tlb_debug output
  2023-07-24  8:53 [PULL for-8.1-rc1 0/7] tcg patch queue Richard Henderson
                   ` (4 preceding siblings ...)
  2023-07-24  8:53 ` [PULL 5/7] tcg/{i386, s390x}: Add earlyclobber to the op_add2's first output Richard Henderson
@ 2023-07-24  8:53 ` Richard Henderson
  2023-07-24  8:53 ` [PULL 7/7] accel/tcg: Fix type of 'last' for pageflags_{find,next} Richard Henderson
  2023-07-24 13:21 ` [PULL for-8.1-rc1 0/7] tcg patch queue Peter Maydell
  7 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2023-07-24  8:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Anton Johansson

From: Anton Johansson <anjo@rev.ng>

In replacing target_ulong with vaddr and TARGET_FMT_lx with VADDR_PRIx,
the zero-padding of TARGET_FMT_lx got lost.  Readd 16-wide zero-padding
for logging consistency.

Suggested-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Anton Johansson <anjo@rev.ng>
Message-Id: <20230713120746.26897-1-anjo@rev.ng>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/cputlb.c | 20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/accel/tcg/cputlb.c b/accel/tcg/cputlb.c
index e0079c9a9d..ba44501a7c 100644
--- a/accel/tcg/cputlb.c
+++ b/accel/tcg/cputlb.c
@@ -497,8 +497,8 @@ static void tlb_flush_page_locked(CPUArchState *env, int midx, vaddr page)
 
     /* Check if we need to flush due to large pages.  */
     if ((page & lp_mask) == lp_addr) {
-        tlb_debug("forcing full flush midx %d (%"
-                  VADDR_PRIx "/%" VADDR_PRIx ")\n",
+        tlb_debug("forcing full flush midx %d (%016"
+                  VADDR_PRIx "/%016" VADDR_PRIx ")\n",
                   midx, lp_addr, lp_mask);
         tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
     } else {
@@ -527,7 +527,7 @@ static void tlb_flush_page_by_mmuidx_async_0(CPUState *cpu,
 
     assert_cpu_is_self(cpu);
 
-    tlb_debug("page addr: %" VADDR_PRIx " mmu_map:0x%x\n", addr, idxmap);
+    tlb_debug("page addr: %016" VADDR_PRIx " mmu_map:0x%x\n", addr, idxmap);
 
     qemu_spin_lock(&env_tlb(env)->c.lock);
     for (mmu_idx = 0; mmu_idx < NB_MMU_MODES; mmu_idx++) {
@@ -591,7 +591,7 @@ static void tlb_flush_page_by_mmuidx_async_2(CPUState *cpu,
 
 void tlb_flush_page_by_mmuidx(CPUState *cpu, vaddr addr, uint16_t idxmap)
 {
-    tlb_debug("addr: %" VADDR_PRIx " mmu_idx:%" PRIx16 "\n", addr, idxmap);
+    tlb_debug("addr: %016" VADDR_PRIx " mmu_idx:%" PRIx16 "\n", addr, idxmap);
 
     /* This should already be page aligned */
     addr &= TARGET_PAGE_MASK;
@@ -625,7 +625,7 @@ void tlb_flush_page(CPUState *cpu, vaddr addr)
 void tlb_flush_page_by_mmuidx_all_cpus(CPUState *src_cpu, vaddr addr,
                                        uint16_t idxmap)
 {
-    tlb_debug("addr: %" VADDR_PRIx " mmu_idx:%"PRIx16"\n", addr, idxmap);
+    tlb_debug("addr: %016" VADDR_PRIx " mmu_idx:%"PRIx16"\n", addr, idxmap);
 
     /* This should already be page aligned */
     addr &= TARGET_PAGE_MASK;
@@ -666,7 +666,7 @@ void tlb_flush_page_by_mmuidx_all_cpus_synced(CPUState *src_cpu,
                                               vaddr addr,
                                               uint16_t idxmap)
 {
-    tlb_debug("addr: %" VADDR_PRIx " mmu_idx:%"PRIx16"\n", addr, idxmap);
+    tlb_debug("addr: %016" VADDR_PRIx " mmu_idx:%"PRIx16"\n", addr, idxmap);
 
     /* This should already be page aligned */
     addr &= TARGET_PAGE_MASK;
@@ -728,7 +728,7 @@ static void tlb_flush_range_locked(CPUArchState *env, int midx,
      */
     if (mask < f->mask || len > f->mask) {
         tlb_debug("forcing full flush midx %d ("
-                  "%" VADDR_PRIx "/%" VADDR_PRIx "+%" VADDR_PRIx ")\n",
+                  "%016" VADDR_PRIx "/%016" VADDR_PRIx "+%016" VADDR_PRIx ")\n",
                   midx, addr, mask, len);
         tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
         return;
@@ -741,7 +741,7 @@ static void tlb_flush_range_locked(CPUArchState *env, int midx,
      */
     if (((addr + len - 1) & d->large_page_mask) == d->large_page_addr) {
         tlb_debug("forcing full flush midx %d ("
-                  "%" VADDR_PRIx "/%" VADDR_PRIx ")\n",
+                  "%016" VADDR_PRIx "/%016" VADDR_PRIx ")\n",
                   midx, d->large_page_addr, d->large_page_mask);
         tlb_flush_one_mmuidx_locked(env, midx, get_clock_realtime());
         return;
@@ -773,7 +773,7 @@ static void tlb_flush_range_by_mmuidx_async_0(CPUState *cpu,
 
     assert_cpu_is_self(cpu);
 
-    tlb_debug("range: %" VADDR_PRIx "/%u+%" VADDR_PRIx " mmu_map:0x%x\n",
+    tlb_debug("range: %016" VADDR_PRIx "/%u+%016" VADDR_PRIx " mmu_map:0x%x\n",
               d.addr, d.bits, d.len, d.idxmap);
 
     qemu_spin_lock(&env_tlb(env)->c.lock);
@@ -1165,7 +1165,7 @@ void tlb_set_page_full(CPUState *cpu, int mmu_idx,
                                                 &xlat, &sz, full->attrs, &prot);
     assert(sz >= TARGET_PAGE_SIZE);
 
-    tlb_debug("vaddr=%" VADDR_PRIx " paddr=0x" HWADDR_FMT_plx
+    tlb_debug("vaddr=%016" VADDR_PRIx " paddr=0x" HWADDR_FMT_plx
               " prot=%x idx=%d\n",
               addr, full->phys_addr, prot, mmu_idx);
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PULL 7/7] accel/tcg: Fix type of 'last' for pageflags_{find,next}
  2023-07-24  8:53 [PULL for-8.1-rc1 0/7] tcg patch queue Richard Henderson
                   ` (5 preceding siblings ...)
  2023-07-24  8:53 ` [PULL 6/7] accel/tcg: Zero-pad vaddr in tlb_debug output Richard Henderson
@ 2023-07-24  8:53 ` Richard Henderson
  2023-07-24 13:21 ` [PULL for-8.1-rc1 0/7] tcg patch queue Peter Maydell
  7 siblings, 0 replies; 9+ messages in thread
From: Richard Henderson @ 2023-07-24  8:53 UTC (permalink / raw)
  To: qemu-devel; +Cc: peter.maydell, Luca Bonissi

From: Luca Bonissi <qemu@bonslack.org>

These should match 'start' as target_ulong, not target_long.

On 32bit targets, the parameter was sign-extended to uint64_t,
so only the first mmap within the upper 2GB memory can succeed.

Signed-off-by: Luca Bonissi <qemu@bonslack.org>
Message-Id: <327460e2-0ebd-9edb-426b-1df80d16c32a@bonslack.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 accel/tcg/user-exec.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/user-exec.c b/accel/tcg/user-exec.c
index ac38c2bf96..ab48cb41e4 100644
--- a/accel/tcg/user-exec.c
+++ b/accel/tcg/user-exec.c
@@ -144,7 +144,7 @@ typedef struct PageFlagsNode {
 
 static IntervalTreeRoot pageflags_root;
 
-static PageFlagsNode *pageflags_find(target_ulong start, target_long last)
+static PageFlagsNode *pageflags_find(target_ulong start, target_ulong last)
 {
     IntervalTreeNode *n;
 
@@ -153,7 +153,7 @@ static PageFlagsNode *pageflags_find(target_ulong start, target_long last)
 }
 
 static PageFlagsNode *pageflags_next(PageFlagsNode *p, target_ulong start,
-                                     target_long last)
+                                     target_ulong last)
 {
     IntervalTreeNode *n;
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PULL for-8.1-rc1 0/7] tcg patch queue
  2023-07-24  8:53 [PULL for-8.1-rc1 0/7] tcg patch queue Richard Henderson
                   ` (6 preceding siblings ...)
  2023-07-24  8:53 ` [PULL 7/7] accel/tcg: Fix type of 'last' for pageflags_{find,next} Richard Henderson
@ 2023-07-24 13:21 ` Peter Maydell
  7 siblings, 0 replies; 9+ messages in thread
From: Peter Maydell @ 2023-07-24 13:21 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, 24 Jul 2023 at 09:53, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> The following changes since commit d1181d29370a4318a9f11ea92065bea6bb159f83:
>
>   Merge tag 'pull-nbd-2023-07-19' of https://repo.or.cz/qemu/ericb into staging (2023-07-20 09:54:07 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230724
>
> for you to fetch changes up to 32b120394c578bc824f1db4835b3bffbeca88fae:
>
>   accel/tcg: Fix type of 'last' for pageflags_{find,next} (2023-07-24 09:48:49 +0100)
>
> ----------------------------------------------------------------
> accel/tcg: Zero-pad vaddr in tlb debug output
> accel/tcg: Fix type of 'last' for pageflags_{find,next}
> accel/tcg: Fix sense of read-only probes in ldst_atomicity
> accel/tcg: Take mmap_lock in load_atomic*_or_exit
> tcg: Add earlyclobber to op_add2 for x86 and s390x
> tcg/ppc: Fix race in goto_tb implementation
>

Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/8.1
for any user-visible changes.

-- PMM


^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2023-07-24 13:22 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-24  8:53 [PULL for-8.1-rc1 0/7] tcg patch queue Richard Henderson
2023-07-24  8:53 ` [PULL 1/7] tcg/ppc: Fix race in goto_tb implementation Richard Henderson
2023-07-24  8:53 ` [PULL 2/7] include/exec: Add WITH_MMAP_LOCK_GUARD Richard Henderson
2023-07-24  8:53 ` [PULL 3/7] accel/tcg: Fix sense of read-only probes in ldst_atomicity Richard Henderson
2023-07-24  8:53 ` [PULL 4/7] accel/tcg: Take mmap_lock in load_atomic*_or_exit Richard Henderson
2023-07-24  8:53 ` [PULL 5/7] tcg/{i386, s390x}: Add earlyclobber to the op_add2's first output Richard Henderson
2023-07-24  8:53 ` [PULL 6/7] accel/tcg: Zero-pad vaddr in tlb_debug output Richard Henderson
2023-07-24  8:53 ` [PULL 7/7] accel/tcg: Fix type of 'last' for pageflags_{find,next} Richard Henderson
2023-07-24 13:21 ` [PULL for-8.1-rc1 0/7] tcg patch queue Peter Maydell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).