qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [PULL 00/35] misc patch queue
@ 2024-04-08 17:48 Richard Henderson
  2024-04-08 17:48 ` [PULL 01/35] tcg/optimize: Do not attempt to constant fold neg_vec Richard Henderson
                   ` (35 more replies)
  0 siblings, 36 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:48 UTC (permalink / raw)
  To: qemu-devel

This started out to be tcg and linux-user only, but then added
a few target bug fixes, and the trolled back through my inbox
and picked up some other safe patch sets that got lost.


r~


The following changes since commit ce64e6224affb8b4e4b019f76d2950270b391af5:

  Merge tag 'qemu-sparc-20240404' of https://github.com/mcayland/qemu into staging (2024-04-04 15:28:06 +0100)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-misc-20240408

for you to fetch changes up to 50dbeda88ab71f9d426b7f4b126c79c44860e475:

  util/bufferiszero: Simplify test_buffer_is_zero_next_accel (2024-04-08 06:27:58 -1000)

----------------------------------------------------------------
util/bufferiszero: Optimizations and cleanups, esp code removal
target/m68k: Semihosting for non-coldfire cpus
target/m68k: Fix fp accrued exception reporting
target/hppa: Fix IIAOQ, IIASQ for pa2.0
target/sh4: Fixes to mac.l and mac.w saturation
target/sh4: Fixes to illegal delay slot reporting
linux-user: Cleanups for do_setsockopt
linux-user: Add FITRIM ioctl
linux-user: Fix waitid return of siginfo_t and rusage
tcg/optimize: Do not attempt to constant fold neg_vec
accel/tcg: Improve can_do_io management, mmio bug fix

----------------------------------------------------------------
Alexander Monakov (5):
      util/bufferiszero: Remove SSE4.1 variant
      util/bufferiszero: Remove AVX512 variant
      util/bufferiszero: Reorganize for early test for acceleration
      util/bufferiszero: Remove useless prefetches
      util/bufferiszero: Optimize SSE2 and AVX2 variants

Keith Packard (3):
      target/m68k: Map FPU exceptions to FPSR register
      target/m68k: Pass semihosting arg to exit
      target/m68k: Support semihosting on non-ColdFire targets

Michael Tokarev (4):
      linux-user: do_setsockopt: fix SOL_ALG.ALG_SET_KEY
      linux-user: do_setsockopt: make ip_mreq local to the place it is used and inline target_to_host_ip_mreq()
      linux-user: do_setsockopt: make ip_mreq_source local to the place where it is used
      linux-user: do_setsockopt: eliminate goto in switch for SO_SNDTIMEO

Michael Vogt (1):
      linux-user: Add FITRIM ioctl

Nguyen Dinh Phi (1):
      linux-user: replace calloc() with g_new0()

Richard Henderson (17):
      tcg/optimize: Do not attempt to constant fold neg_vec
      linux-user: Fix waitid return of siginfo_t and rusage
      target/hppa: Fix IIAOQ, IIASQ for pa2.0
      target/sh4: Merge mach and macl into a union
      target/m68k: Perform the semihosting test during translate
      tcg: Add TCGContext.emit_before_op
      accel/tcg: Add insn_start to DisasContextBase
      target/arm: Use insn_start from DisasContextBase
      target/hppa: Use insn_start from DisasContextBase
      target/i386: Preserve DisasContextBase.insn_start across rewind
      target/microblaze: Use insn_start from DisasContextBase
      target/riscv: Use insn_start from DisasContextBase
      target/s390x: Use insn_start from DisasContextBase
      accel/tcg: Improve can_do_io management
      util/bufferiszero: Improve scalar variant
      util/bufferiszero: Introduce biz_accel_fn typedef
      util/bufferiszero: Simplify test_buffer_is_zero_next_accel

Zack Buhman (4):
      target/sh4: mac.w: memory accesses are 16-bit words
      target/sh4: Fix mac.l with saturation enabled
      target/sh4: Fix mac.w with saturation enabled
      target/sh4: add missing CHECK_NOT_DELAY_SLOT

 include/exec/translator.h         |   4 +-
 include/qemu/cutils.h             |  32 +++-
 include/tcg/tcg.h                 |   6 +
 linux-user/ioctls.h               |   3 +
 linux-user/syscall_defs.h         |   1 +
 linux-user/syscall_types.h        |   5 +
 target/arm/tcg/translate.h        |  12 +-
 target/m68k/cpu.h                 |   5 +-
 target/m68k/helper.h              |   2 +
 target/sh4/cpu.h                  |  14 +-
 target/sh4/helper.h               |   4 +-
 accel/tcg/translator.c            |  47 ++---
 linux-user/main.c                 |   6 +-
 linux-user/syscall.c              |  95 +++++-----
 target/arm/tcg/translate-a64.c    |   2 +-
 target/arm/tcg/translate.c        |   2 +-
 target/hppa/int_helper.c          |  20 +-
 target/hppa/sys_helper.c          |  18 +-
 target/hppa/translate.c           |  10 +-
 target/i386/tcg/translate.c       |   3 +
 target/m68k/cpu.c                 |  12 +-
 target/m68k/fpu_helper.c          |  72 ++++++++
 target/m68k/helper.c              |   4 +-
 target/m68k/m68k-semi.c           |   4 +-
 target/m68k/op_helper.c           |  14 +-
 target/m68k/translate.c           |  54 +++++-
 target/microblaze/translate.c     |   8 +-
 target/riscv/translate.c          |  11 +-
 target/s390x/tcg/translate.c      |   4 +-
 target/sh4/op_helper.c            |  51 ++---
 target/sh4/translate.c            |   7 +-
 tcg/optimize.c                    |  17 +-
 tcg/tcg.c                         |  14 +-
 tests/tcg/aarch64/test-2150.c     |  12 ++
 tests/tcg/sh4/test-macl.c         |  67 +++++++
 tests/tcg/sh4/test-macw.c         |  61 ++++++
 util/bufferiszero.c               | 379 +++++++++++++++++---------------------
 tests/tcg/aarch64/Makefile.target |   2 +-
 tests/tcg/sh4/Makefile.target     |   8 +
 39 files changed, 696 insertions(+), 396 deletions(-)
 create mode 100644 tests/tcg/aarch64/test-2150.c
 create mode 100644 tests/tcg/sh4/test-macl.c
 create mode 100644 tests/tcg/sh4/test-macw.c


^ permalink raw reply	[flat|nested] 38+ messages in thread

* [PULL 01/35] tcg/optimize: Do not attempt to constant fold neg_vec
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
@ 2024-04-08 17:48 ` Richard Henderson
  2024-04-08 17:48 ` [PULL 02/35] linux-user: Fix waitid return of siginfo_t and rusage Richard Henderson
                   ` (34 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:48 UTC (permalink / raw)
  To: qemu-devel

Split out the tail of fold_neg to fold_neg_no_const so that we
can avoid attempting to constant fold vector negate.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2150
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 tcg/optimize.c                    | 17 ++++++++---------
 tests/tcg/aarch64/test-2150.c     | 12 ++++++++++++
 tests/tcg/aarch64/Makefile.target |  2 +-
 3 files changed, 21 insertions(+), 10 deletions(-)
 create mode 100644 tests/tcg/aarch64/test-2150.c

diff --git a/tcg/optimize.c b/tcg/optimize.c
index 275db77b42..2e9e5725a9 100644
--- a/tcg/optimize.c
+++ b/tcg/optimize.c
@@ -1990,16 +1990,10 @@ static bool fold_nand(OptContext *ctx, TCGOp *op)
     return false;
 }
 
-static bool fold_neg(OptContext *ctx, TCGOp *op)
+static bool fold_neg_no_const(OptContext *ctx, TCGOp *op)
 {
-    uint64_t z_mask;
-
-    if (fold_const1(ctx, op)) {
-        return true;
-    }
-
     /* Set to 1 all bits to the left of the rightmost.  */
-    z_mask = arg_info(op->args[1])->z_mask;
+    uint64_t z_mask = arg_info(op->args[1])->z_mask;
     ctx->z_mask = -(z_mask & -z_mask);
 
     /*
@@ -2010,6 +2004,11 @@ static bool fold_neg(OptContext *ctx, TCGOp *op)
     return true;
 }
 
+static bool fold_neg(OptContext *ctx, TCGOp *op)
+{
+    return fold_const1(ctx, op) || fold_neg_no_const(ctx, op);
+}
+
 static bool fold_nor(OptContext *ctx, TCGOp *op)
 {
     if (fold_const2_commutative(ctx, op) ||
@@ -2418,7 +2417,7 @@ static bool fold_sub_to_neg(OptContext *ctx, TCGOp *op)
     if (have_neg) {
         op->opc = neg_op;
         op->args[1] = op->args[2];
-        return fold_neg(ctx, op);
+        return fold_neg_no_const(ctx, op);
     }
     return false;
 }
diff --git a/tests/tcg/aarch64/test-2150.c b/tests/tcg/aarch64/test-2150.c
new file mode 100644
index 0000000000..fb86c11958
--- /dev/null
+++ b/tests/tcg/aarch64/test-2150.c
@@ -0,0 +1,12 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+/* See https://gitlab.com/qemu-project/qemu/-/issues/2150 */
+
+int main()
+{
+    asm volatile(
+        "movi     v6.4s, #1\n"
+        "movi     v7.4s, #0\n"
+        "sub      v6.2d, v7.2d, v6.2d\n"
+        : : : "v6", "v7");
+    return 0;
+}
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 0efd565f05..70d728ae9a 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -10,7 +10,7 @@ VPATH 		+= $(AARCH64_SRC)
 
 # Base architecture tests
 AARCH64_TESTS=fcvt pcalign-a64 lse2-fault
-AARCH64_TESTS += test-2248
+AARCH64_TESTS += test-2248 test-2150
 
 fcvt: LDFLAGS+=-lm
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 02/35] linux-user: Fix waitid return of siginfo_t and rusage
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
  2024-04-08 17:48 ` [PULL 01/35] tcg/optimize: Do not attempt to constant fold neg_vec Richard Henderson
@ 2024-04-08 17:48 ` Richard Henderson
  2024-04-08 17:48 ` [PULL 03/35] linux-user: do_setsockopt: fix SOL_ALG.ALG_SET_KEY Richard Henderson
                   ` (33 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:48 UTC (permalink / raw)
  To: qemu-devel

The copy back to siginfo_t should be conditional only on arg3,
not the specific values that might have been written.
The copy back to rusage was missing entirely.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2262
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/syscall.c | 22 ++++++++++++++++------
 1 file changed, 16 insertions(+), 6 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index e12d969c2e..3df2b94d9a 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -9272,14 +9272,24 @@ static abi_long do_syscall1(CPUArchState *cpu_env, int num, abi_long arg1,
 #ifdef TARGET_NR_waitid
     case TARGET_NR_waitid:
         {
+            struct rusage ru;
             siginfo_t info;
-            info.si_pid = 0;
-            ret = get_errno(safe_waitid(arg1, arg2, &info, arg4, NULL));
-            if (!is_error(ret) && arg3 && info.si_pid != 0) {
-                if (!(p = lock_user(VERIFY_WRITE, arg3, sizeof(target_siginfo_t), 0)))
+
+            ret = get_errno(safe_waitid(arg1, arg2, (arg3 ? &info : NULL),
+                                        arg4, (arg5 ? &ru : NULL)));
+            if (!is_error(ret)) {
+                if (arg3) {
+                    p = lock_user(VERIFY_WRITE, arg3,
+                                  sizeof(target_siginfo_t), 0);
+                    if (!p) {
+                        return -TARGET_EFAULT;
+                    }
+                    host_to_target_siginfo(p, &info);
+                    unlock_user(p, arg3, sizeof(target_siginfo_t));
+                }
+                if (arg5 && host_to_target_rusage(arg5, &ru)) {
                     return -TARGET_EFAULT;
-                host_to_target_siginfo(p, &info);
-                unlock_user(p, arg3, sizeof(target_siginfo_t));
+                }
             }
         }
         return ret;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 03/35] linux-user: do_setsockopt: fix SOL_ALG.ALG_SET_KEY
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
  2024-04-08 17:48 ` [PULL 01/35] tcg/optimize: Do not attempt to constant fold neg_vec Richard Henderson
  2024-04-08 17:48 ` [PULL 02/35] linux-user: Fix waitid return of siginfo_t and rusage Richard Henderson
@ 2024-04-08 17:48 ` Richard Henderson
  2024-04-08 17:48 ` [PULL 04/35] linux-user: do_setsockopt: make ip_mreq local to the place it is used and inline target_to_host_ip_mreq() Richard Henderson
                   ` (32 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael Tokarev

From: Michael Tokarev <mjt@tls.msk.ru>

This setsockopt accepts zero-lengh optlen (current qemu implementation
does not allow this).  Also, there's no need to make a copy of the key,
it is enough to use lock_user() (which accepts zero length already).

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2197
Fixes: f31dddd2fc "linux-user: Add support for setsockopt() option SOL_ALG"
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20240331100737.2724186-2-mjt@tls.msk.ru>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/syscall.c | 9 ++-------
 1 file changed, 2 insertions(+), 7 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 3df2b94d9a..59fb3e911f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2277,18 +2277,13 @@ static abi_long do_setsockopt(int sockfd, int level, int optname,
         switch (optname) {
         case ALG_SET_KEY:
         {
-            char *alg_key = g_malloc(optlen);
-
+            char *alg_key = lock_user(VERIFY_READ, optval_addr, optlen, 1);
             if (!alg_key) {
-                return -TARGET_ENOMEM;
-            }
-            if (copy_from_user(alg_key, optval_addr, optlen)) {
-                g_free(alg_key);
                 return -TARGET_EFAULT;
             }
             ret = get_errno(setsockopt(sockfd, level, optname,
                                        alg_key, optlen));
-            g_free(alg_key);
+            unlock_user(alg_key, optval_addr, optlen);
             break;
         }
         case ALG_SET_AEAD_AUTHSIZE:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 04/35] linux-user: do_setsockopt: make ip_mreq local to the place it is used and inline target_to_host_ip_mreq()
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (2 preceding siblings ...)
  2024-04-08 17:48 ` [PULL 03/35] linux-user: do_setsockopt: fix SOL_ALG.ALG_SET_KEY Richard Henderson
@ 2024-04-08 17:48 ` Richard Henderson
  2024-04-08 17:48 ` [PULL 05/35] linux-user: do_setsockopt: make ip_mreq_source local to the place where it is used Richard Henderson
                   ` (31 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael Tokarev

From: Michael Tokarev <mjt@tls.msk.ru>

ip_mreq is declared at the beginning of do_setsockopt(), while
it is used in only one place.  Move its declaration to that very
place and replace pointer to alloca()-allocated memory with the
structure itself.

target_to_host_ip_mreq() is used only once, inline it.

This change also properly handles TARGET_EFAULT when the address
is wrong.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20240331100737.2724186-3-mjt@tls.msk.ru>
[rth: Fix braces, adjust optlen to match host structure size]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/syscall.c | 47 ++++++++++++++++++++++----------------------
 1 file changed, 23 insertions(+), 24 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 59fb3e911f..cca9cafe4f 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -1615,24 +1615,6 @@ static abi_long do_pipe(CPUArchState *cpu_env, abi_ulong pipedes,
     return get_errno(ret);
 }
 
-static inline abi_long target_to_host_ip_mreq(struct ip_mreqn *mreqn,
-                                              abi_ulong target_addr,
-                                              socklen_t len)
-{
-    struct target_ip_mreqn *target_smreqn;
-
-    target_smreqn = lock_user(VERIFY_READ, target_addr, len, 1);
-    if (!target_smreqn)
-        return -TARGET_EFAULT;
-    mreqn->imr_multiaddr.s_addr = target_smreqn->imr_multiaddr.s_addr;
-    mreqn->imr_address.s_addr = target_smreqn->imr_address.s_addr;
-    if (len == sizeof(struct target_ip_mreqn))
-        mreqn->imr_ifindex = tswapal(target_smreqn->imr_ifindex);
-    unlock_user(target_smreqn, target_addr, 0);
-
-    return 0;
-}
-
 static inline abi_long target_to_host_sockaddr(int fd, struct sockaddr *addr,
                                                abi_ulong target_addr,
                                                socklen_t len)
@@ -2067,7 +2049,6 @@ static abi_long do_setsockopt(int sockfd, int level, int optname,
 {
     abi_long ret;
     int val;
-    struct ip_mreqn *ip_mreq;
     struct ip_mreq_source *ip_mreq_source;
 
     switch(level) {
@@ -2111,15 +2092,33 @@ static abi_long do_setsockopt(int sockfd, int level, int optname,
             break;
         case IP_ADD_MEMBERSHIP:
         case IP_DROP_MEMBERSHIP:
+        {
+            struct ip_mreqn ip_mreq;
+            struct target_ip_mreqn *target_smreqn;
+
+            QEMU_BUILD_BUG_ON(sizeof(struct ip_mreq) !=
+                              sizeof(struct target_ip_mreq));
+
             if (optlen < sizeof (struct target_ip_mreq) ||
-                optlen > sizeof (struct target_ip_mreqn))
+                optlen > sizeof (struct target_ip_mreqn)) {
                 return -TARGET_EINVAL;
+            }
 
-            ip_mreq = (struct ip_mreqn *) alloca(optlen);
-            target_to_host_ip_mreq(ip_mreq, optval_addr, optlen);
-            ret = get_errno(setsockopt(sockfd, level, optname, ip_mreq, optlen));
+            target_smreqn = lock_user(VERIFY_READ, optval_addr, optlen, 1);
+            if (!target_smreqn) {
+                return -TARGET_EFAULT;
+            }
+            ip_mreq.imr_multiaddr.s_addr = target_smreqn->imr_multiaddr.s_addr;
+            ip_mreq.imr_address.s_addr = target_smreqn->imr_address.s_addr;
+            if (optlen == sizeof(struct target_ip_mreqn)) {
+                ip_mreq.imr_ifindex = tswapal(target_smreqn->imr_ifindex);
+                optlen = sizeof(struct ip_mreqn);
+            }
+            unlock_user(target_smreqn, optval_addr, 0);
+
+            ret = get_errno(setsockopt(sockfd, level, optname, &ip_mreq, optlen));
             break;
-
+        }
         case IP_BLOCK_SOURCE:
         case IP_UNBLOCK_SOURCE:
         case IP_ADD_SOURCE_MEMBERSHIP:
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 05/35] linux-user: do_setsockopt: make ip_mreq_source local to the place where it is used
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (3 preceding siblings ...)
  2024-04-08 17:48 ` [PULL 04/35] linux-user: do_setsockopt: make ip_mreq local to the place it is used and inline target_to_host_ip_mreq() Richard Henderson
@ 2024-04-08 17:48 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 06/35] linux-user: do_setsockopt: eliminate goto in switch for SO_SNDTIMEO Richard Henderson
                   ` (30 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:48 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael Tokarev

From: Michael Tokarev <mjt@tls.msk.ru>

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20240331100737.2724186-4-mjt@tls.msk.ru>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/syscall.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index cca9cafe4f..1fedf16650 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2049,7 +2049,6 @@ static abi_long do_setsockopt(int sockfd, int level, int optname,
 {
     abi_long ret;
     int val;
-    struct ip_mreq_source *ip_mreq_source;
 
     switch(level) {
     case SOL_TCP:
@@ -2123,6 +2122,9 @@ static abi_long do_setsockopt(int sockfd, int level, int optname,
         case IP_UNBLOCK_SOURCE:
         case IP_ADD_SOURCE_MEMBERSHIP:
         case IP_DROP_SOURCE_MEMBERSHIP:
+        {
+            struct ip_mreq_source *ip_mreq_source;
+
             if (optlen != sizeof (struct target_ip_mreq_source))
                 return -TARGET_EINVAL;
 
@@ -2133,7 +2135,7 @@ static abi_long do_setsockopt(int sockfd, int level, int optname,
             ret = get_errno(setsockopt(sockfd, level, optname, ip_mreq_source, optlen));
             unlock_user (ip_mreq_source, optval_addr, 0);
             break;
-
+        }
         default:
             goto unimplemented;
         }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 06/35] linux-user: do_setsockopt: eliminate goto in switch for SO_SNDTIMEO
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (4 preceding siblings ...)
  2024-04-08 17:48 ` [PULL 05/35] linux-user: do_setsockopt: make ip_mreq_source local to the place where it is used Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 07/35] linux-user: Add FITRIM ioctl Richard Henderson
                   ` (29 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael Tokarev, Philippe Mathieu-Daudé

From: Michael Tokarev <mjt@tls.msk.ru>

There's identical code for SO_SNDTIMEO and SO_RCVTIMEO, currently
implemented using an ugly goto into another switch case.  Eliminate
that using arithmetic if, making code flow more natural.

Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
Message-Id: <20240331100737.2724186-5-mjt@tls.msk.ru>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/syscall.c | 11 ++++-------
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 1fedf16650..41659b63f5 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2301,12 +2301,10 @@ static abi_long do_setsockopt(int sockfd, int level, int optname,
     case TARGET_SOL_SOCKET:
         switch (optname) {
         case TARGET_SO_RCVTIMEO:
+        case TARGET_SO_SNDTIMEO:
         {
                 struct timeval tv;
 
-                optname = SO_RCVTIMEO;
-
-set_timeout:
                 if (optlen != sizeof(struct target_timeval)) {
                     return -TARGET_EINVAL;
                 }
@@ -2315,13 +2313,12 @@ set_timeout:
                     return -TARGET_EFAULT;
                 }
 
-                ret = get_errno(setsockopt(sockfd, SOL_SOCKET, optname,
+                ret = get_errno(setsockopt(sockfd, SOL_SOCKET,
+                                optname == TARGET_SO_RCVTIMEO ?
+                                    SO_RCVTIMEO : SO_SNDTIMEO,
                                 &tv, sizeof(tv)));
                 return ret;
         }
-        case TARGET_SO_SNDTIMEO:
-                optname = SO_SNDTIMEO;
-                goto set_timeout;
         case TARGET_SO_ATTACH_FILTER:
         {
                 struct target_sock_fprog *tfprog;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 07/35] linux-user: Add FITRIM ioctl
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (5 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 06/35] linux-user: do_setsockopt: eliminate goto in switch for SO_SNDTIMEO Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 08/35] linux-user: replace calloc() with g_new0() Richard Henderson
                   ` (28 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Michael Vogt

From: Michael Vogt <mvogt@redhat.com>

Tiny patch to add the missing FITRIM ioctl.

Signed-off-by: Michael Vogt <mvogt@redhat.com>
Message-Id: <20240403092048.16023-2-michael.vogt@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/ioctls.h        | 3 +++
 linux-user/syscall_defs.h  | 1 +
 linux-user/syscall_types.h | 5 +++++
 3 files changed, 9 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index 1aec9d5836..d508d0c04a 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -140,6 +140,9 @@
 #ifdef FITHAW
      IOCTL(FITHAW, IOC_W | IOC_R, TYPE_INT)
 #endif
+#ifdef FITRIM
+     IOCTL(FITRIM, IOC_W | IOC_R, MK_PTR(MK_STRUCT(STRUCT_fstrim_range)))
+#endif
 
      IOCTL(FIGETBSZ, IOC_R, MK_PTR(TYPE_LONG))
 #ifdef CONFIG_FIEMAP
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 744fda599e..ce0adb706e 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -945,6 +945,7 @@ struct target_rtc_pll_info {
 
 #define TARGET_FIFREEZE    TARGET_IOWR('X', 119, abi_int)
 #define TARGET_FITHAW    TARGET_IOWR('X', 120, abi_int)
+#define TARGET_FITRIM    TARGET_IOWR('X', 121, struct fstrim_range)
 
 /*
  * Note that the ioctl numbers for FS_IOC_<GET|SET><FLAGS|VERSION>
diff --git a/linux-user/syscall_types.h b/linux-user/syscall_types.h
index c3b43f8022..6dd7a80ce5 100644
--- a/linux-user/syscall_types.h
+++ b/linux-user/syscall_types.h
@@ -341,6 +341,11 @@ STRUCT(file_clone_range,
        TYPE_ULONGLONG, /* src_length */
        TYPE_ULONGLONG) /* dest_offset */
 
+STRUCT(fstrim_range,
+       TYPE_ULONGLONG, /* start */
+       TYPE_ULONGLONG, /* len */
+       TYPE_ULONGLONG) /* minlen */
+
 STRUCT(fiemap_extent,
        TYPE_ULONGLONG, /* fe_logical */
        TYPE_ULONGLONG, /* fe_physical */
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 08/35] linux-user: replace calloc() with g_new0()
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (6 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 07/35] linux-user: Add FITRIM ioctl Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 09/35] target/hppa: Fix IIAOQ, IIASQ for pa2.0 Richard Henderson
                   ` (27 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Nguyen Dinh Phi, Alex Bennée

From: Nguyen Dinh Phi <phind.uet@gmail.com>

Use glib allocation as recommended by the coding convention

Signed-off-by: Nguyen Dinh Phi <phind.uet@gmail.com>
Message-Id: <20240317171747.1642207-1-phind.uet@gmail.com>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 linux-user/main.c | 6 +-----
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 9277df2e9d..149e35432e 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -928,11 +928,7 @@ int main(int argc, char **argv, char **envp)
      * Prepare copy of argv vector for target.
      */
     target_argc = argc - optind;
-    target_argv = calloc(target_argc + 1, sizeof (char *));
-    if (target_argv == NULL) {
-        (void) fprintf(stderr, "Unable to allocate memory for target_argv\n");
-        exit(EXIT_FAILURE);
-    }
+    target_argv = g_new0(char *, target_argc + 1);
 
     /*
      * If argv0 is specified (using '-0' switch) we replace
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 09/35] target/hppa: Fix IIAOQ, IIASQ for pa2.0
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (7 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 08/35] linux-user: replace calloc() with g_new0() Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 10/35] target/sh4: mac.w: memory accesses are 16-bit words Richard Henderson
                   ` (26 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Sven Schnelle, Helge Deller

The contents of IIAOQ depend on PSW_W.
Follow the text in "Interruption Instruction Address Queues",
pages 2-13 through 2-15.

Tested-by: Sven Schnelle <svens@stackframe.org>
Tested-by: Helge Deller <deller@gmx.de>
Reported-by: Sven Schnelle <svens@stackframe.org>
Fixes: b10700d826c ("target/hppa: Update IIAOQ, IIASQ for pa2.0")
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/hppa/int_helper.c | 20 +++++++++++---------
 target/hppa/sys_helper.c | 18 +++++++++---------
 2 files changed, 20 insertions(+), 18 deletions(-)

diff --git a/target/hppa/int_helper.c b/target/hppa/int_helper.c
index 90437a92cd..a667ee380d 100644
--- a/target/hppa/int_helper.c
+++ b/target/hppa/int_helper.c
@@ -107,14 +107,10 @@ void hppa_cpu_do_interrupt(CPUState *cs)
 
     /* step 3 */
     /*
-     * For pa1.x, IIASQ is simply a copy of IASQ.
-     * For pa2.0, IIASQ is the top bits of the virtual address,
-     *            or zero if translation is disabled.
+     * IIASQ is the top bits of the virtual address, or zero if translation
+     * is disabled -- with PSW_W == 0, this will reduce to the space.
      */
-    if (!hppa_is_pa20(env)) {
-        env->cr[CR_IIASQ] = env->iasq_f >> 32;
-        env->cr_back[0] = env->iasq_b >> 32;
-    } else if (old_psw & PSW_C) {
+    if (old_psw & PSW_C) {
         env->cr[CR_IIASQ] =
             hppa_form_gva_psw(old_psw, env->iasq_f, env->iaoq_f) >> 32;
         env->cr_back[0] =
@@ -123,8 +119,14 @@ void hppa_cpu_do_interrupt(CPUState *cs)
         env->cr[CR_IIASQ] = 0;
         env->cr_back[0] = 0;
     }
-    env->cr[CR_IIAOQ] = env->iaoq_f;
-    env->cr_back[1] = env->iaoq_b;
+    /* IIAOQ is the full offset for wide mode, or 32 bits for narrow mode. */
+    if (old_psw & PSW_W) {
+        env->cr[CR_IIAOQ] = env->iaoq_f;
+        env->cr_back[1] = env->iaoq_b;
+    } else {
+        env->cr[CR_IIAOQ] = (uint32_t)env->iaoq_f;
+        env->cr_back[1] = (uint32_t)env->iaoq_b;
+    }
 
     if (old_psw & PSW_Q) {
         /* step 5 */
diff --git a/target/hppa/sys_helper.c b/target/hppa/sys_helper.c
index 208e51c086..22d6c89964 100644
--- a/target/hppa/sys_helper.c
+++ b/target/hppa/sys_helper.c
@@ -78,21 +78,21 @@ target_ulong HELPER(swap_system_mask)(CPUHPPAState *env, target_ulong nsm)
 
 void HELPER(rfi)(CPUHPPAState *env)
 {
-    env->iasq_f = (uint64_t)env->cr[CR_IIASQ] << 32;
-    env->iasq_b = (uint64_t)env->cr_back[0] << 32;
-    env->iaoq_f = env->cr[CR_IIAOQ];
-    env->iaoq_b = env->cr_back[1];
+    uint64_t mask;
+
+    cpu_hppa_put_psw(env, env->cr[CR_IPSW]);
 
     /*
      * For pa2.0, IIASQ is the top bits of the virtual address.
      * To recreate the space identifier, remove the offset bits.
+     * For pa1.x, the mask reduces to no change to space.
      */
-    if (hppa_is_pa20(env)) {
-        env->iasq_f &= ~env->iaoq_f;
-        env->iasq_b &= ~env->iaoq_b;
-    }
+    mask = gva_offset_mask(env->psw);
 
-    cpu_hppa_put_psw(env, env->cr[CR_IPSW]);
+    env->iaoq_f = env->cr[CR_IIAOQ];
+    env->iaoq_b = env->cr_back[1];
+    env->iasq_f = (env->cr[CR_IIASQ] << 32) & ~(env->iaoq_f & mask);
+    env->iasq_b = (env->cr_back[0] << 32) & ~(env->iaoq_b & mask);
 }
 
 static void getshadowregs(CPUHPPAState *env)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 10/35] target/sh4: mac.w: memory accesses are 16-bit words
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (8 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 09/35] target/hppa: Fix IIAOQ, IIASQ for pa2.0 Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 11/35] target/sh4: Merge mach and macl into a union Richard Henderson
                   ` (25 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zack Buhman, Yoshinori Sato, Philippe Mathieu-Daudé

From: Zack Buhman <zack@buhman.org>

Before this change, executing a code sequence such as:

           mova   tblm,r0
           mov    r0,r1
           mova   tbln,r0
           clrs
           clrmac
           mac.w  @r0+,@r1+
           mac.w  @r0+,@r1+

           .align 4
  tblm:    .word  0x1234
           .word  0x5678
  tbln:    .word  0x9abc
           .word  0xdefg

Does not result in correct behavior:

Expected behavior:
  first macw : macl = 0x1234 * 0x9abc + 0x0
               mach = 0x0

  second macw: macl = 0x5678 * 0xdefg + 0xb00a630
               mach = 0x0

Observed behavior (qemu-sh4eb, prior to this commit):

  first macw : macl = 0x5678 * 0xdefg + 0x0
               mach = 0x0

  second macw: (unaligned longword memory access, SIGBUS)

Various SH-4 ISA manuals also confirm that `mac.w` is a 16-bit word memory
access, not a 32-bit longword memory access.

Signed-off-by: Zack Buhman <zack@buhman.org>
Reviewed-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-Id: <20240402093756.27466-1-zack@buhman.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sh4/translate.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index a9b1bc7524..6643c14dde 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -816,10 +816,10 @@ static void _decode_opc(DisasContext * ctx)
             TCGv arg0, arg1;
             arg0 = tcg_temp_new();
             tcg_gen_qemu_ld_i32(arg0, REG(B7_4), ctx->memidx,
-                                MO_TESL | MO_ALIGN);
+                                MO_TESW | MO_ALIGN);
             arg1 = tcg_temp_new();
             tcg_gen_qemu_ld_i32(arg1, REG(B11_8), ctx->memidx,
-                                MO_TESL | MO_ALIGN);
+                                MO_TESW | MO_ALIGN);
             gen_helper_macw(tcg_env, arg0, arg1);
             tcg_gen_addi_i32(REG(B11_8), REG(B11_8), 2);
             tcg_gen_addi_i32(REG(B7_4), REG(B7_4), 2);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 11/35] target/sh4: Merge mach and macl into a union
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (9 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 10/35] target/sh4: mac.w: memory accesses are 16-bit words Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 12/35] target/sh4: Fix mac.l with saturation enabled Richard Henderson
                   ` (24 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Allow host access to the entire 64-bit accumulator.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/sh4/cpu.h | 14 ++++++++++++--
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/sh4/cpu.h b/target/sh4/cpu.h
index 9211da6bde..d928bcf006 100644
--- a/target/sh4/cpu.h
+++ b/target/sh4/cpu.h
@@ -155,12 +155,22 @@ typedef struct CPUArchState {
     uint32_t pc;                /* program counter */
     uint32_t delayed_pc;        /* target of delayed branch */
     uint32_t delayed_cond;      /* condition of delayed branch */
-    uint32_t mach;              /* multiply and accumulate high */
-    uint32_t macl;              /* multiply and accumulate low */
     uint32_t pr;                /* procedure register */
     uint32_t fpscr;             /* floating point status/control register */
     uint32_t fpul;              /* floating point communication register */
 
+    /* multiply and accumulate: high, low and combined. */
+    union {
+        uint64_t mac;
+        struct {
+#if HOST_BIG_ENDIAN
+            uint32_t mach, macl;
+#else
+            uint32_t macl, mach;
+#endif
+        };
+    };
+
     /* float point status register */
     float_status fp_status;
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 12/35] target/sh4: Fix mac.l with saturation enabled
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (10 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 11/35] target/sh4: Merge mach and macl into a union Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 13/35] target/sh4: Fix mac.w " Richard Henderson
                   ` (23 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zack Buhman, Philippe Mathieu-Daudé

From: Zack Buhman <zack@buhman.org>

The saturation arithmetic logic in helper_macl is not correct.
I tested and verified this behavior on a SH7091.

Signed-off-by: Zack Buhman <zack@buhman.org>
Message-Id: <20240404162641.27528-2-zack@buhman.org>
[rth: Reformat helper_macl, add a test case.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 target/sh4/helper.h           |  2 +-
 target/sh4/op_helper.c        | 23 ++++++------
 tests/tcg/sh4/test-macl.c     | 67 +++++++++++++++++++++++++++++++++++
 tests/tcg/sh4/Makefile.target |  5 +++
 4 files changed, 86 insertions(+), 11 deletions(-)
 create mode 100644 tests/tcg/sh4/test-macl.c

diff --git a/target/sh4/helper.h b/target/sh4/helper.h
index 8d792f6b55..64056e4a39 100644
--- a/target/sh4/helper.h
+++ b/target/sh4/helper.h
@@ -11,7 +11,7 @@ DEF_HELPER_3(movcal, void, env, i32, i32)
 DEF_HELPER_1(discard_movcal_backup, void, env)
 DEF_HELPER_2(ocbi, void, env, i32)
 
-DEF_HELPER_3(macl, void, env, i32, i32)
+DEF_HELPER_3(macl, void, env, s32, s32)
 DEF_HELPER_3(macw, void, env, i32, i32)
 
 DEF_HELPER_2(ld_fpscr, void, env, i32)
diff --git a/target/sh4/op_helper.c b/target/sh4/op_helper.c
index 4559d0d376..d0bae0cc00 100644
--- a/target/sh4/op_helper.c
+++ b/target/sh4/op_helper.c
@@ -158,20 +158,23 @@ void helper_ocbi(CPUSH4State *env, uint32_t address)
     }
 }
 
-void helper_macl(CPUSH4State *env, uint32_t arg0, uint32_t arg1)
+void helper_macl(CPUSH4State *env, int32_t arg0, int32_t arg1)
 {
+    const int64_t min = -(1ll << 47);
+    const int64_t max = (1ll << 47) - 1;
+    int64_t mul = (int64_t)arg0 * arg1;
+    int64_t mac = env->mac;
     int64_t res;
 
-    res = ((uint64_t) env->mach << 32) | env->macl;
-    res += (int64_t) (int32_t) arg0 *(int64_t) (int32_t) arg1;
-    env->mach = (res >> 32) & 0xffffffff;
-    env->macl = res & 0xffffffff;
-    if (env->sr & (1u << SR_S)) {
-        if (res < 0)
-            env->mach |= 0xffff0000;
-        else
-            env->mach &= 0x00007fff;
+    if (!(env->sr & (1u << SR_S))) {
+        res = mac + mul;
+    } else if (sadd64_overflow(mac, mul, &res)) {
+        res = mac < 0 ? min : max;
+    } else {
+        res = MIN(MAX(res, min), max);
     }
+
+    env->mac = res;
 }
 
 void helper_macw(CPUSH4State *env, uint32_t arg0, uint32_t arg1)
diff --git a/tests/tcg/sh4/test-macl.c b/tests/tcg/sh4/test-macl.c
new file mode 100644
index 0000000000..b66c854365
--- /dev/null
+++ b/tests/tcg/sh4/test-macl.c
@@ -0,0 +1,67 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+#define MACL_S_MIN  (-(1ll << 47))
+#define MACL_S_MAX  ((1ll << 47) - 1)
+
+int64_t mac_l(int64_t mac, const int32_t *a, const int32_t *b)
+{
+    register uint32_t macl __asm__("macl") = mac;
+    register uint32_t mach __asm__("mach") = mac >> 32;
+
+    asm volatile("mac.l @%0+,@%1+"
+                 : "+r"(a), "+r"(b), "+x"(macl), "+x"(mach));
+
+    return ((uint64_t)mach << 32) | macl;
+}
+
+typedef struct {
+    int64_t mac;
+    int32_t a, b;
+    int64_t res[2];
+} Test;
+
+__attribute__((noinline))
+void test(const Test *t, int sat)
+{
+    int64_t res;
+
+    if (sat) {
+        asm volatile("sets");
+    } else {
+        asm volatile("clrs");
+    }
+    res = mac_l(t->mac, &t->a, &t->b);
+
+    if (res != t->res[sat]) {
+        fprintf(stderr, "%#llx + (%#x * %#x) = %#llx -- got %#llx\n",
+                t->mac, t->a, t->b, t->res[sat], res);
+        abort();
+    }
+}
+
+int main()
+{
+    static const Test tests[] = {
+        { 0x00007fff12345678ll, INT32_MAX, INT32_MAX,
+          { 0x40007ffe12345679ll, MACL_S_MAX } },
+        { MACL_S_MIN, -1, 1,
+          { 0xffff7fffffffffffll, MACL_S_MIN } },
+        { INT64_MIN, -1, 1,
+          { INT64_MAX, MACL_S_MIN } },
+        { 0x00007fff00000000ll, INT32_MAX, INT32_MAX,
+          { 0x40007ffe00000001ll, MACL_S_MAX } },
+        { 4, 1, 2, { 6, 6 } },
+        { -4, -1, -2, { -2, -2 } },
+    };
+
+    for (int i = 0; i < sizeof(tests) / sizeof(tests[0]); ++i) {
+        for (int j = 0; j < 2; ++j) {
+            test(&tests[i], j);
+        }
+    }
+    return 0;
+}
diff --git a/tests/tcg/sh4/Makefile.target b/tests/tcg/sh4/Makefile.target
index 16eaa850a8..9a11c10924 100644
--- a/tests/tcg/sh4/Makefile.target
+++ b/tests/tcg/sh4/Makefile.target
@@ -9,3 +9,8 @@ run-signals: signals
 	$(call skip-test, $<, "BROKEN")
 run-plugin-signals-with-%:
 	$(call skip-test, $<, "BROKEN")
+
+VPATH += $(SRC_PATH)/tests/tcg/sh4
+
+test-macl: CFLAGS += -O -g
+TESTS += test-macl
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 13/35] target/sh4: Fix mac.w with saturation enabled
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (11 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 12/35] target/sh4: Fix mac.l with saturation enabled Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 14/35] target/sh4: add missing CHECK_NOT_DELAY_SLOT Richard Henderson
                   ` (22 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zack Buhman, Yoshinori Sato, Philippe Mathieu-Daudé

From: Zack Buhman <zack@buhman.org>

The saturation arithmetic logic in helper_macw is not correct.
I tested and verified this behavior on a SH7091.

Reviewd-by: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Zack Buhman <zack@buhman.org>
Message-Id: <20240405233802.29128-3-zack@buhman.org>
[rth: Reformat helper_macw, add a test case.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
---
 target/sh4/helper.h           |  2 +-
 target/sh4/op_helper.c        | 28 +++++++++-------
 tests/tcg/sh4/test-macw.c     | 61 +++++++++++++++++++++++++++++++++++
 tests/tcg/sh4/Makefile.target |  3 ++
 4 files changed, 82 insertions(+), 12 deletions(-)
 create mode 100644 tests/tcg/sh4/test-macw.c

diff --git a/target/sh4/helper.h b/target/sh4/helper.h
index 64056e4a39..29011d3dbb 100644
--- a/target/sh4/helper.h
+++ b/target/sh4/helper.h
@@ -12,7 +12,7 @@ DEF_HELPER_1(discard_movcal_backup, void, env)
 DEF_HELPER_2(ocbi, void, env, i32)
 
 DEF_HELPER_3(macl, void, env, s32, s32)
-DEF_HELPER_3(macw, void, env, i32, i32)
+DEF_HELPER_3(macw, void, env, s32, s32)
 
 DEF_HELPER_2(ld_fpscr, void, env, i32)
 
diff --git a/target/sh4/op_helper.c b/target/sh4/op_helper.c
index d0bae0cc00..99394b714c 100644
--- a/target/sh4/op_helper.c
+++ b/target/sh4/op_helper.c
@@ -177,22 +177,28 @@ void helper_macl(CPUSH4State *env, int32_t arg0, int32_t arg1)
     env->mac = res;
 }
 
-void helper_macw(CPUSH4State *env, uint32_t arg0, uint32_t arg1)
+void helper_macw(CPUSH4State *env, int32_t arg0, int32_t arg1)
 {
-    int64_t res;
+    /* Inputs are already sign-extended from 16 bits. */
+    int32_t mul = arg0 * arg1;
 
-    res = ((uint64_t) env->mach << 32) | env->macl;
-    res += (int64_t) (int16_t) arg0 *(int64_t) (int16_t) arg1;
-    env->mach = (res >> 32) & 0xffffffff;
-    env->macl = res & 0xffffffff;
     if (env->sr & (1u << SR_S)) {
-        if (res < -0x80000000) {
+        /*
+         * In saturation arithmetic mode, the accumulator is 32-bit
+         * with carry. MACH is not considered during the addition
+         * operation nor the 32-bit saturation logic.
+         */
+        int32_t res, macl = env->macl;
+
+        if (sadd32_overflow(macl, mul, &res)) {
+            res = macl < 0 ? INT32_MIN : INT32_MAX;
+            /* If overflow occurs, the MACH register is set to 1. */
             env->mach = 1;
-            env->macl = 0x80000000;
-        } else if (res > 0x000000007fffffff) {
-            env->mach = 1;
-            env->macl = 0x7fffffff;
         }
+        env->macl = res;
+    } else {
+        /* In non-saturation arithmetic mode, the accumulator is 64-bit */
+        env->mac += mul;
     }
 }
 
diff --git a/tests/tcg/sh4/test-macw.c b/tests/tcg/sh4/test-macw.c
new file mode 100644
index 0000000000..4eceec8634
--- /dev/null
+++ b/tests/tcg/sh4/test-macw.c
@@ -0,0 +1,61 @@
+/* SPDX-License-Identifier: GPL-2.0-or-later */
+
+#include <stdint.h>
+#include <stdlib.h>
+#include <stdio.h>
+
+int64_t mac_w(int64_t mac, const int16_t *a, const int16_t *b)
+{
+    register uint32_t macl __asm__("macl") = mac;
+    register uint32_t mach __asm__("mach") = mac >> 32;
+
+    asm volatile("mac.w @%0+,@%1+"
+                 : "+r"(a), "+r"(b), "+x"(macl), "+x"(mach));
+
+    return ((uint64_t)mach << 32) | macl;
+}
+
+typedef struct {
+    int64_t mac;
+    int16_t a, b;
+    int64_t res[2];
+} Test;
+
+__attribute__((noinline))
+void test(const Test *t, int sat)
+{
+    int64_t res;
+
+    if (sat) {
+        asm volatile("sets");
+    } else {
+        asm volatile("clrs");
+    }
+    res = mac_w(t->mac, &t->a, &t->b);
+
+    if (res != t->res[sat]) {
+        fprintf(stderr, "%#llx + (%#x * %#x) = %#llx -- got %#llx\n",
+                t->mac, t->a, t->b, t->res[sat], res);
+        abort();
+    }
+}
+
+int main()
+{
+    static const Test tests[] = {
+        { 0, 2, 3, { 6, 6 } },
+        { 0x123456787ffffffell, 2, -3,
+          { 0x123456787ffffff8ll, 0x123456787ffffff8ll } },
+        { 0xabcdef127ffffffall, 2, 3,
+          { 0xabcdef1280000000ll, 0x000000017fffffffll } },
+        { 0xfffffffffll, INT16_MAX, INT16_MAX,
+          { 0x103fff0000ll, 0xf3fff0000ll } },
+    };
+
+    for (int i = 0; i < sizeof(tests) / sizeof(tests[0]); ++i) {
+        for (int j = 0; j < 2; ++j) {
+            test(&tests[i], j);
+        }
+    }
+    return 0;
+}
diff --git a/tests/tcg/sh4/Makefile.target b/tests/tcg/sh4/Makefile.target
index 9a11c10924..4d09291c0c 100644
--- a/tests/tcg/sh4/Makefile.target
+++ b/tests/tcg/sh4/Makefile.target
@@ -14,3 +14,6 @@ VPATH += $(SRC_PATH)/tests/tcg/sh4
 
 test-macl: CFLAGS += -O -g
 TESTS += test-macl
+
+test-macw: CFLAGS += -O -g
+TESTS += test-macw
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 14/35] target/sh4: add missing CHECK_NOT_DELAY_SLOT
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (12 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 13/35] target/sh4: Fix mac.w " Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 15/35] target/m68k: Map FPU exceptions to FPSR register Richard Henderson
                   ` (21 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Zack Buhman, Yoshinori Sato

From: Zack Buhman <zack@buhman.org>

CHECK_NOT_DELAY_SLOT is correctly applied to the branch-related
instructions, but not to the PC-relative mov* instructions.

I verified the existence of an illegal slot exception on a SH7091 when
any of these instructions are attempted inside a delay slot.

This also matches the behavior described in the SH-4 ISA manual.

Signed-off-by: Zack Buhman <zack@buhman.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240407150705.5965-1-zack@buhman.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewd-by: Yoshinori Sato <ysato@users.sourceforge.jp>
---
 target/sh4/translate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/sh4/translate.c b/target/sh4/translate.c
index 6643c14dde..ebb6c901bf 100644
--- a/target/sh4/translate.c
+++ b/target/sh4/translate.c
@@ -523,6 +523,7 @@ static void _decode_opc(DisasContext * ctx)
         tcg_gen_movi_i32(REG(B11_8), B7_0s);
         return;
     case 0x9000: /* mov.w @(disp,PC),Rn */
+        CHECK_NOT_DELAY_SLOT
         {
             TCGv addr = tcg_constant_i32(ctx->base.pc_next + 4 + B7_0 * 2);
             tcg_gen_qemu_ld_i32(REG(B11_8), addr, ctx->memidx,
@@ -530,6 +531,7 @@ static void _decode_opc(DisasContext * ctx)
         }
         return;
     case 0xd000: /* mov.l @(disp,PC),Rn */
+        CHECK_NOT_DELAY_SLOT
         {
             TCGv addr = tcg_constant_i32((ctx->base.pc_next + 4 + B7_0 * 4) & ~3);
             tcg_gen_qemu_ld_i32(REG(B11_8), addr, ctx->memidx,
@@ -1236,6 +1238,7 @@ static void _decode_opc(DisasContext * ctx)
         }
         return;
     case 0xc700: /* mova @(disp,PC),R0 */
+        CHECK_NOT_DELAY_SLOT
         tcg_gen_movi_i32(REG(0), ((ctx->base.pc_next & 0xfffffffc) +
                                   4 + B7_0 * 4) & ~3);
         return;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 15/35] target/m68k: Map FPU exceptions to FPSR register
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (13 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 14/35] target/sh4: add missing CHECK_NOT_DELAY_SLOT Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 16/35] target/m68k: Pass semihosting arg to exit Richard Henderson
                   ` (20 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Keith Packard

From: Keith Packard <keithp@keithp.com>

Add helpers for reading/writing the 68881 FPSR register so that
changes in floating point exception state can be seen by the
application.

Call these helpers in pre_load/post_load hooks to synchronize
exception state.

Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20230803035231.429697-1-keithp@keithp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/m68k/cpu.h        |  3 +-
 target/m68k/helper.h     |  2 ++
 target/m68k/cpu.c        | 12 +++++--
 target/m68k/fpu_helper.c | 72 ++++++++++++++++++++++++++++++++++++++++
 target/m68k/helper.c     |  4 +--
 target/m68k/translate.c  |  4 +--
 6 files changed, 90 insertions(+), 7 deletions(-)

diff --git a/target/m68k/cpu.h b/target/m68k/cpu.h
index 346427e144..e184239a81 100644
--- a/target/m68k/cpu.h
+++ b/target/m68k/cpu.h
@@ -199,7 +199,8 @@ void cpu_m68k_set_ccr(CPUM68KState *env, uint32_t);
 void cpu_m68k_set_sr(CPUM68KState *env, uint32_t);
 void cpu_m68k_restore_fp_status(CPUM68KState *env);
 void cpu_m68k_set_fpcr(CPUM68KState *env, uint32_t val);
-
+uint32_t cpu_m68k_get_fpsr(CPUM68KState *env);
+void cpu_m68k_set_fpsr(CPUM68KState *env, uint32_t val);
 
 /*
  * Instead of computing the condition codes after each m68k instruction,
diff --git a/target/m68k/helper.h b/target/m68k/helper.h
index 2bbe0dc032..95aa5e53bb 100644
--- a/target/m68k/helper.h
+++ b/target/m68k/helper.h
@@ -54,6 +54,8 @@ DEF_HELPER_4(fsdiv, void, env, fp, fp, fp)
 DEF_HELPER_4(fddiv, void, env, fp, fp, fp)
 DEF_HELPER_4(fsgldiv, void, env, fp, fp, fp)
 DEF_HELPER_FLAGS_3(fcmp, TCG_CALL_NO_RWG, void, env, fp, fp)
+DEF_HELPER_2(set_fpsr, void, env, i32)
+DEF_HELPER_1(get_fpsr, i32, env)
 DEF_HELPER_FLAGS_2(set_fpcr, TCG_CALL_NO_RWG, void, env, i32)
 DEF_HELPER_FLAGS_2(ftst, TCG_CALL_NO_RWG, void, env, fp)
 DEF_HELPER_3(fconst, void, env, fp, i32)
diff --git a/target/m68k/cpu.c b/target/m68k/cpu.c
index 7c8efbb42c..df49ff1880 100644
--- a/target/m68k/cpu.c
+++ b/target/m68k/cpu.c
@@ -390,12 +390,19 @@ static const VMStateDescription vmstate_freg = {
     }
 };
 
+static int fpu_pre_save(void *opaque)
+{
+    M68kCPU *s = opaque;
+
+    s->env.fpsr = cpu_m68k_get_fpsr(&s->env);
+    return 0;
+}
+
 static int fpu_post_load(void *opaque, int version)
 {
     M68kCPU *s = opaque;
 
-    cpu_m68k_restore_fp_status(&s->env);
-
+    cpu_m68k_set_fpsr(&s->env, s->env.fpsr);
     return 0;
 }
 
@@ -404,6 +411,7 @@ const VMStateDescription vmmstate_fpu = {
     .version_id = 1,
     .minimum_version_id = 1,
     .needed = fpu_needed,
+    .pre_save = fpu_pre_save,
     .post_load = fpu_post_load,
     .fields = (const VMStateField[]) {
         VMSTATE_UINT32(env.fpcr, M68kCPU),
diff --git a/target/m68k/fpu_helper.c b/target/m68k/fpu_helper.c
index ab120b5f59..8314791f50 100644
--- a/target/m68k/fpu_helper.c
+++ b/target/m68k/fpu_helper.c
@@ -164,6 +164,78 @@ void HELPER(set_fpcr)(CPUM68KState *env, uint32_t val)
     cpu_m68k_set_fpcr(env, val);
 }
 
+/* Convert host exception flags to cpu_m68k form.  */
+static int cpu_m68k_exceptbits_from_host(int host_bits)
+{
+    int target_bits = 0;
+
+    if (host_bits & float_flag_invalid) {
+        target_bits |= 0x80;
+    }
+    if (host_bits & float_flag_overflow) {
+        target_bits |= 0x40;
+    }
+    if (host_bits & (float_flag_underflow | float_flag_output_denormal)) {
+        target_bits |= 0x20;
+    }
+    if (host_bits & float_flag_divbyzero) {
+        target_bits |= 0x10;
+    }
+    if (host_bits & float_flag_inexact) {
+        target_bits |= 0x08;
+    }
+    return target_bits;
+}
+
+/* Convert cpu_m68k exception flags to target form.  */
+static int cpu_m68k_exceptbits_to_host(int target_bits)
+{
+    int host_bits = 0;
+
+    if (target_bits & 0x80) {
+        host_bits |= float_flag_invalid;
+    }
+    if (target_bits & 0x40) {
+        host_bits |= float_flag_overflow;
+    }
+    if (target_bits & 0x20) {
+        host_bits |= float_flag_underflow;
+    }
+    if (target_bits & 0x10) {
+        host_bits |= float_flag_divbyzero;
+    }
+    if (target_bits & 0x08) {
+        host_bits |= float_flag_inexact;
+    }
+    return host_bits;
+}
+
+uint32_t cpu_m68k_get_fpsr(CPUM68KState *env)
+{
+    int host_flags = get_float_exception_flags(&env->fp_status);
+    int target_flags = cpu_m68k_exceptbits_from_host(host_flags);
+    int except = (env->fpsr & ~(0xf8)) | target_flags;
+    return except;
+}
+
+uint32_t HELPER(get_fpsr)(CPUM68KState *env)
+{
+    return cpu_m68k_get_fpsr(env);
+}
+
+void cpu_m68k_set_fpsr(CPUM68KState *env, uint32_t val)
+{
+    env->fpsr = val;
+
+    int host_flags = cpu_m68k_exceptbits_to_host((int) env->fpsr);
+    set_float_exception_flags(host_flags, &env->fp_status);
+}
+
+void HELPER(set_fpsr)(CPUM68KState *env, uint32_t val)
+{
+    cpu_m68k_set_fpsr(env, val);
+}
+
 #define PREC_BEGIN(prec)                                        \
     do {                                                        \
         FloatX80RoundPrec old =                                 \
diff --git a/target/m68k/helper.c b/target/m68k/helper.c
index 1a475f082a..7a91f33b17 100644
--- a/target/m68k/helper.c
+++ b/target/m68k/helper.c
@@ -87,7 +87,7 @@ static int m68k_fpu_gdb_get_reg(CPUState *cs, GByteArray *mem_buf, int n)
     case 8: /* fpcontrol */
         return gdb_get_reg32(mem_buf, env->fpcr);
     case 9: /* fpstatus */
-        return gdb_get_reg32(mem_buf, env->fpsr);
+        return gdb_get_reg32(mem_buf, cpu_m68k_get_fpsr(env));
     case 10: /* fpiar, not implemented */
         return gdb_get_reg32(mem_buf, 0);
     }
@@ -109,7 +109,7 @@ static int m68k_fpu_gdb_set_reg(CPUState *cs, uint8_t *mem_buf, int n)
         cpu_m68k_set_fpcr(env, ldl_p(mem_buf));
         return 4;
     case 9: /* fpstatus */
-        env->fpsr = ldl_p(mem_buf);
+        cpu_m68k_set_fpsr(env, ldl_p(mem_buf));
         return 4;
     case 10: /* fpiar, not implemented */
         return 4;
diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 6ae3df43bc..8a194f2f21 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -4686,7 +4686,7 @@ static void gen_load_fcr(DisasContext *s, TCGv res, int reg)
         tcg_gen_movi_i32(res, 0);
         break;
     case M68K_FPSR:
-        tcg_gen_ld_i32(res, tcg_env, offsetof(CPUM68KState, fpsr));
+        gen_helper_get_fpsr(res, tcg_env);
         break;
     case M68K_FPCR:
         tcg_gen_ld_i32(res, tcg_env, offsetof(CPUM68KState, fpcr));
@@ -4700,7 +4700,7 @@ static void gen_store_fcr(DisasContext *s, TCGv val, int reg)
     case M68K_FPIAR:
         break;
     case M68K_FPSR:
-        tcg_gen_st_i32(val, tcg_env, offsetof(CPUM68KState, fpsr));
+        gen_helper_set_fpsr(tcg_env, val);
         break;
     case M68K_FPCR:
         gen_helper_set_fpcr(tcg_env, val);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 16/35] target/m68k: Pass semihosting arg to exit
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (14 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 15/35] target/m68k: Map FPU exceptions to FPSR register Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 17/35] target/m68k: Perform the semihosting test during translate Richard Henderson
                   ` (19 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Keith Packard, Peter Maydell

From: Keith Packard <keithp@keithp.com>

Instead of using d0 (the semihost function number), use d1 (the
provide exit status).

Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20230802161914.395443-2-keithp@keithp.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/m68k/m68k-semi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/m68k/m68k-semi.c b/target/m68k/m68k-semi.c
index 546cff2246..6fbbd140f3 100644
--- a/target/m68k/m68k-semi.c
+++ b/target/m68k/m68k-semi.c
@@ -132,8 +132,8 @@ void do_m68k_semihosting(CPUM68KState *env, int nr)
     args = env->dregs[1];
     switch (nr) {
     case HOSTED_EXIT:
-        gdb_exit(env->dregs[0]);
-        exit(env->dregs[0]);
+        gdb_exit(env->dregs[1]);
+        exit(env->dregs[1]);
 
     case HOSTED_OPEN:
         GET_ARG(0);
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 17/35] target/m68k: Perform the semihosting test during translate
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (15 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 16/35] target/m68k: Pass semihosting arg to exit Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 18/35] target/m68k: Support semihosting on non-ColdFire targets Richard Henderson
                   ` (18 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel

Replace EXCP_HALT_INSN by EXCP_SEMIHOSTING.  Perform the pre-
and post-insn tests during translate, leaving only the actual
semihosting operation for the exception.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/m68k/cpu.h       |  2 +-
 target/m68k/op_helper.c | 14 ++-----------
 target/m68k/translate.c | 45 +++++++++++++++++++++++++++++++++++++----
 3 files changed, 44 insertions(+), 17 deletions(-)

diff --git a/target/m68k/cpu.h b/target/m68k/cpu.h
index e184239a81..b5bbeedb7a 100644
--- a/target/m68k/cpu.h
+++ b/target/m68k/cpu.h
@@ -66,7 +66,7 @@
 #define EXCP_MMU_ACCESS     58  /* MMU Access Level Violation Error */
 
 #define EXCP_RTE            0x100
-#define EXCP_HALT_INSN      0x101
+#define EXCP_SEMIHOSTING    0x101
 
 #define M68K_DTTR0   0
 #define M68K_DTTR1   1
diff --git a/target/m68k/op_helper.c b/target/m68k/op_helper.c
index 125f6c1b08..15bad5dd46 100644
--- a/target/m68k/op_helper.c
+++ b/target/m68k/op_helper.c
@@ -202,18 +202,8 @@ static void cf_interrupt_all(CPUM68KState *env, int is_hw)
             /* Return from an exception.  */
             cf_rte(env);
             return;
-        case EXCP_HALT_INSN:
-            if (semihosting_enabled((env->sr & SR_S) == 0)
-                    && (env->pc & 3) == 0
-                    && cpu_lduw_code(env, env->pc - 4) == 0x4e71
-                    && cpu_ldl_code(env, env->pc) == 0x4e7bf000) {
-                env->pc += 4;
-                do_m68k_semihosting(env, env->dregs[0]);
-                return;
-            }
-            cs->halted = 1;
-            cs->exception_index = EXCP_HLT;
-            cpu_loop_exit(cs);
+        case EXCP_SEMIHOSTING:
+            do_m68k_semihosting(env, env->dregs[0]);
             return;
         }
     }
diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 8a194f2f21..8f61ff1238 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -26,12 +26,11 @@
 #include "qemu/log.h"
 #include "qemu/qemu-print.h"
 #include "exec/translator.h"
-
 #include "exec/helper-proto.h"
 #include "exec/helper-gen.h"
-
 #include "exec/log.h"
 #include "fpu/softfloat.h"
+#include "semihosting/semihost.h"
 
 #define HELPER_H "helper.h"
 #include "exec/helper-info.c.inc"
@@ -1401,6 +1400,40 @@ static void gen_jmp_tb(DisasContext *s, int n, target_ulong dest,
     s->base.is_jmp = DISAS_NORETURN;
 }
 
+#ifndef CONFIG_USER_ONLY
+static bool semihosting_test(DisasContext *s)
+{
+    uint32_t test;
+
+    if (!semihosting_enabled(IS_USER(s))) {
+        return false;
+    }
+
+    /*
+     * "The semihosting instruction is immediately preceded by a
+     * nop aligned to a 4-byte boundary..."
+     * The preceding 2-byte (aligned) nop plus the 2-byte halt/bkpt
+     * means that we have advanced 4 bytes from the required nop.
+     */
+    if (s->pc % 4 != 0) {
+        return false;
+    }
+    test = cpu_lduw_code(s->env, s->pc - 4);
+    if (test != 0x4e71) {
+        return false;
+    }
+    /* "... and followed by an invalid sentinel instruction movec %sp,0." */
+    test = translator_ldl(s->env, &s->base, s->pc);
+    if (test != 0x4e7bf000) {
+        return false;
+    }
+
+    /* Consume the sentinel. */
+    s->pc += 4;
+    return true;
+}
+#endif /* !CONFIG_USER_ONLY */
+
 DISAS_INSN(scc)
 {
     DisasCompare c;
@@ -4465,8 +4498,12 @@ DISAS_INSN(halt)
         gen_exception(s, s->base.pc_next, EXCP_PRIVILEGE);
         return;
     }
-
-    gen_exception(s, s->pc, EXCP_HALT_INSN);
+    if (semihosting_test(s)) {
+        gen_exception(s, s->pc, EXCP_SEMIHOSTING);
+        return;
+    }
+    tcg_gen_movi_i32(cpu_halted, 1);
+    gen_exception(s, s->pc, EXCP_HLT);
 }
 
 DISAS_INSN(stop)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 18/35] target/m68k: Support semihosting on non-ColdFire targets
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (16 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 17/35] target/m68k: Perform the semihosting test during translate Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 19/35] tcg: Add TCGContext.emit_before_op Richard Henderson
                   ` (17 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Keith Packard

From: Keith Packard <keithp@keithp.com>

According to the m68k semihosting spec:

"The instruction used to trigger a semihosting request depends on the
 m68k processor variant.  On ColdFire, "halt" is used; on other processors
 (which don't implement "halt"), "bkpt #0" may be used."

Add support for non-CodeFire processors by matching BKPT #0 instructions.

Signed-off-by: Keith Packard <keithp@keithp.com>
[rth: Use semihosting_test()]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/m68k/translate.c | 5 +++++
 1 file changed, 5 insertions(+)

diff --git a/target/m68k/translate.c b/target/m68k/translate.c
index 8f61ff1238..659543020b 100644
--- a/target/m68k/translate.c
+++ b/target/m68k/translate.c
@@ -2646,6 +2646,11 @@ DISAS_INSN(bkpt)
 #if defined(CONFIG_USER_ONLY)
     gen_exception(s, s->base.pc_next, EXCP_DEBUG);
 #else
+    /* BKPT #0 is the alternate semihosting instruction. */
+    if ((insn & 7) == 0 && semihosting_test(s)) {
+        gen_exception(s, s->pc, EXCP_SEMIHOSTING);
+        return;
+    }
     gen_exception(s, s->base.pc_next, EXCP_ILLEGAL);
 #endif
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 19/35] tcg: Add TCGContext.emit_before_op
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (17 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 18/35] target/m68k: Support semihosting on non-ColdFire targets Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 20/35] accel/tcg: Add insn_start to DisasContextBase Richard Henderson
                   ` (16 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé, Pierrick Bouvier

Allow operations to be emitted via normal expanders
into the middle of the opcode stream.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/tcg/tcg.h |  6 ++++++
 tcg/tcg.c         | 14 ++++++++++++--
 2 files changed, 18 insertions(+), 2 deletions(-)

diff --git a/include/tcg/tcg.h b/include/tcg/tcg.h
index 451f3fec41..05a1912f8a 100644
--- a/include/tcg/tcg.h
+++ b/include/tcg/tcg.h
@@ -553,6 +553,12 @@ struct TCGContext {
     QTAILQ_HEAD(, TCGOp) ops, free_ops;
     QSIMPLEQ_HEAD(, TCGLabel) labels;
 
+    /*
+     * When clear, new ops are added to the tail of @ops.
+     * When set, new ops are added in front of @emit_before_op.
+     */
+    TCGOp *emit_before_op;
+
     /* Tells which temporary holds a given register.
        It does not take into account fixed registers */
     TCGTemp *reg_to_temp[TCG_TARGET_NB_REGS];
diff --git a/tcg/tcg.c b/tcg/tcg.c
index d6670237fb..0c0bb9d169 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1521,6 +1521,7 @@ void tcg_func_start(TCGContext *s)
 
     QTAILQ_INIT(&s->ops);
     QTAILQ_INIT(&s->free_ops);
+    s->emit_before_op = NULL;
     QSIMPLEQ_INIT(&s->labels);
 
     tcg_debug_assert(s->addr_type == TCG_TYPE_I32 ||
@@ -2332,7 +2333,11 @@ static void tcg_gen_callN(TCGHelperInfo *info, TCGTemp *ret, TCGTemp **args)
     op->args[pi++] = (uintptr_t)info;
     tcg_debug_assert(pi == total_args);
 
-    QTAILQ_INSERT_TAIL(&tcg_ctx->ops, op, link);
+    if (tcg_ctx->emit_before_op) {
+        QTAILQ_INSERT_BEFORE(tcg_ctx->emit_before_op, op, link);
+    } else {
+        QTAILQ_INSERT_TAIL(&tcg_ctx->ops, op, link);
+    }
 
     tcg_debug_assert(n_extend < ARRAY_SIZE(extend_free));
     for (i = 0; i < n_extend; ++i) {
@@ -3215,7 +3220,12 @@ static TCGOp *tcg_op_alloc(TCGOpcode opc, unsigned nargs)
 TCGOp *tcg_emit_op(TCGOpcode opc, unsigned nargs)
 {
     TCGOp *op = tcg_op_alloc(opc, nargs);
-    QTAILQ_INSERT_TAIL(&tcg_ctx->ops, op, link);
+
+    if (tcg_ctx->emit_before_op) {
+        QTAILQ_INSERT_BEFORE(tcg_ctx->emit_before_op, op, link);
+    } else {
+        QTAILQ_INSERT_TAIL(&tcg_ctx->ops, op, link);
+    }
     return op;
 }
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 20/35] accel/tcg: Add insn_start to DisasContextBase
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (18 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 19/35] tcg: Add TCGContext.emit_before_op Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 21/35] target/arm: Use insn_start from DisasContextBase Richard Henderson
                   ` (15 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

This is currently target-specific for many; begin making it
target independent.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/translator.h | 3 +++
 accel/tcg/translator.c    | 2 ++
 2 files changed, 5 insertions(+)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index 51624feb10..ceaeca8c91 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -74,6 +74,8 @@ typedef enum DisasJumpType {
  * @singlestep_enabled: "Hardware" single stepping enabled.
  * @saved_can_do_io: Known value of cpu->neg.can_do_io, or -1 for unknown.
  * @plugin_enabled: TCG plugin enabled in this TB.
+ * @insn_start: The last op emitted by the insn_start hook,
+ *              which is expected to be INDEX_op_insn_start.
  *
  * Architecture-agnostic disassembly context.
  */
@@ -87,6 +89,7 @@ typedef struct DisasContextBase {
     bool singlestep_enabled;
     int8_t saved_can_do_io;
     bool plugin_enabled;
+    struct TCGOp *insn_start;
     void *host_addr[2];
 } DisasContextBase;
 
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index 38c34009a5..ae61c154c2 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -140,6 +140,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     db->max_insns = *max_insns;
     db->singlestep_enabled = cflags & CF_SINGLE_STEP;
     db->saved_can_do_io = -1;
+    db->insn_start = NULL;
     db->host_addr[0] = host_pc;
     db->host_addr[1] = NULL;
 
@@ -157,6 +158,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     while (true) {
         *max_insns = ++db->num_insns;
         ops->insn_start(db, cpu);
+        db->insn_start = tcg_last_op();
         tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
         if (plugin_enabled) {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 21/35] target/arm: Use insn_start from DisasContextBase
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (19 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 20/35] accel/tcg: Add insn_start to DisasContextBase Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 22/35] target/hppa: " Richard Henderson
                   ` (14 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

To keep the multiple update check, replace insn_start
with insn_start_updated.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/arm/tcg/translate.h     | 12 ++++++------
 target/arm/tcg/translate-a64.c |  2 +-
 target/arm/tcg/translate.c     |  2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/target/arm/tcg/translate.h b/target/arm/tcg/translate.h
index 93be745cf3..dc66ff2190 100644
--- a/target/arm/tcg/translate.h
+++ b/target/arm/tcg/translate.h
@@ -165,10 +165,10 @@ typedef struct DisasContext {
     uint8_t gm_blocksize;
     /* True if this page is guarded.  */
     bool guarded_page;
+    /* True if the current insn_start has been updated. */
+    bool insn_start_updated;
     /* Bottom two bits of XScale c15_cpar coprocessor access control reg */
     int c15_cpar;
-    /* TCG op of the current insn_start.  */
-    TCGOp *insn_start;
     /* Offset from VNCR_EL2 when FEAT_NV2 redirects this reg to memory */
     uint32_t nv2_redirect_offset;
 } DisasContext;
@@ -276,10 +276,10 @@ static inline void disas_set_insn_syndrome(DisasContext *s, uint32_t syn)
     syn &= ARM_INSN_START_WORD2_MASK;
     syn >>= ARM_INSN_START_WORD2_SHIFT;
 
-    /* We check and clear insn_start_idx to catch multiple updates.  */
-    assert(s->insn_start != NULL);
-    tcg_set_insn_start_param(s->insn_start, 2, syn);
-    s->insn_start = NULL;
+    /* Check for multiple updates.  */
+    assert(!s->insn_start_updated);
+    s->insn_start_updated = true;
+    tcg_set_insn_start_param(s->base.insn_start, 2, syn);
 }
 
 static inline int curr_insn_len(DisasContext *s)
diff --git a/target/arm/tcg/translate-a64.c b/target/arm/tcg/translate-a64.c
index 340265beb0..2666d52711 100644
--- a/target/arm/tcg/translate-a64.c
+++ b/target/arm/tcg/translate-a64.c
@@ -14179,7 +14179,7 @@ static void aarch64_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
         pc_arg &= ~TARGET_PAGE_MASK;
     }
     tcg_gen_insn_start(pc_arg, 0, 0);
-    dc->insn_start = tcg_last_op();
+    dc->insn_start_updated = false;
 }
 
 static void aarch64_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
diff --git a/target/arm/tcg/translate.c b/target/arm/tcg/translate.c
index 69585e6003..dc49a8d806 100644
--- a/target/arm/tcg/translate.c
+++ b/target/arm/tcg/translate.c
@@ -9273,7 +9273,7 @@ static void arm_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
         condexec_bits = (dc->condexec_cond << 4) | (dc->condexec_mask >> 1);
     }
     tcg_gen_insn_start(pc_arg, condexec_bits, 0);
-    dc->insn_start = tcg_last_op();
+    dc->insn_start_updated = false;
 }
 
 static bool arm_check_kernelpage(DisasContext *dc)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 22/35] target/hppa: Use insn_start from DisasContextBase
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (20 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 21/35] target/arm: Use insn_start from DisasContextBase Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 23/35] target/i386: Preserve DisasContextBase.insn_start across rewind Richard Henderson
                   ` (13 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

To keep the multiple update check, replace insn_start
with insn_start_updated.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/hppa/translate.c | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/target/hppa/translate.c b/target/hppa/translate.c
index 8a1a8bc3aa..42fa480950 100644
--- a/target/hppa/translate.c
+++ b/target/hppa/translate.c
@@ -44,7 +44,6 @@ typedef struct DisasCond {
 typedef struct DisasContext {
     DisasContextBase base;
     CPUState *cs;
-    TCGOp *insn_start;
 
     uint64_t iaoq_f;
     uint64_t iaoq_b;
@@ -62,6 +61,7 @@ typedef struct DisasContext {
     int privilege;
     bool psw_n_nonzero;
     bool is_pa20;
+    bool insn_start_updated;
 
 #ifdef CONFIG_USER_ONLY
     MemOp unalign;
@@ -300,9 +300,9 @@ void hppa_translate_init(void)
 
 static void set_insn_breg(DisasContext *ctx, int breg)
 {
-    assert(ctx->insn_start != NULL);
-    tcg_set_insn_start_param(ctx->insn_start, 2, breg);
-    ctx->insn_start = NULL;
+    assert(!ctx->insn_start_updated);
+    ctx->insn_start_updated = true;
+    tcg_set_insn_start_param(ctx->base.insn_start, 2, breg);
 }
 
 static DisasCond cond_make_f(void)
@@ -4694,7 +4694,7 @@ static void hppa_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
     DisasContext *ctx = container_of(dcbase, DisasContext, base);
 
     tcg_gen_insn_start(ctx->iaoq_f, ctx->iaoq_b, 0);
-    ctx->insn_start = tcg_last_op();
+    ctx->insn_start_updated = false;
 }
 
 static void hppa_tr_translate_insn(DisasContextBase *dcbase, CPUState *cs)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 23/35] target/i386: Preserve DisasContextBase.insn_start across rewind
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (21 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 22/35] target/hppa: " Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 24/35] target/microblaze: Use insn_start from DisasContextBase Richard Henderson
                   ` (12 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini

When aborting translation of the current insn, restore the
previous value of insn_start.

Acked-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/i386/tcg/translate.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/i386/tcg/translate.c b/target/i386/tcg/translate.c
index 07f642dc9e..76a42c679c 100644
--- a/target/i386/tcg/translate.c
+++ b/target/i386/tcg/translate.c
@@ -139,6 +139,7 @@ typedef struct DisasContext {
     TCGv_i64 tmp1_i64;
 
     sigjmp_buf jmpbuf;
+    TCGOp *prev_insn_start;
     TCGOp *prev_insn_end;
 } DisasContext;
 
@@ -3123,6 +3124,7 @@ static bool disas_insn(DisasContext *s, CPUState *cpu)
         /* END TODO */
         s->base.num_insns--;
         tcg_remove_ops_after(s->prev_insn_end);
+        s->base.insn_start = s->prev_insn_start;
         s->base.is_jmp = DISAS_TOO_MANY;
         return false;
     default:
@@ -6995,6 +6997,7 @@ static void i386_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
     DisasContext *dc = container_of(dcbase, DisasContext, base);
     target_ulong pc_arg = dc->base.pc_next;
 
+    dc->prev_insn_start = dc->base.insn_start;
     dc->prev_insn_end = tcg_last_op();
     if (tb_cflags(dcbase->tb) & CF_PCREL) {
         pc_arg &= ~TARGET_PAGE_MASK;
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 24/35] target/microblaze: Use insn_start from DisasContextBase
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (22 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 23/35] target/i386: Preserve DisasContextBase.insn_start across rewind Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 25/35] target/riscv: " Richard Henderson
                   ` (11 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/microblaze/translate.c | 8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
index 4e52ef32db..fc451befae 100644
--- a/target/microblaze/translate.c
+++ b/target/microblaze/translate.c
@@ -62,9 +62,6 @@ typedef struct DisasContext {
     DisasContextBase base;
     const MicroBlazeCPUConfig *cfg;
 
-    /* TCG op of the current insn_start.  */
-    TCGOp *insn_start;
-
     TCGv_i32 r0;
     bool r0_set;
 
@@ -699,14 +696,14 @@ static TCGv compute_ldst_addr_ea(DisasContext *dc, int ra, int rb)
 static void record_unaligned_ess(DisasContext *dc, int rd,
                                  MemOp size, bool store)
 {
-    uint32_t iflags = tcg_get_insn_start_param(dc->insn_start, 1);
+    uint32_t iflags = tcg_get_insn_start_param(dc->base.insn_start, 1);
 
     iflags |= ESR_ESS_FLAG;
     iflags |= rd << 5;
     iflags |= store * ESR_S;
     iflags |= (size == MO_32) * ESR_W;
 
-    tcg_set_insn_start_param(dc->insn_start, 1, iflags);
+    tcg_set_insn_start_param(dc->base.insn_start, 1, iflags);
 }
 #endif
 
@@ -1624,7 +1621,6 @@ static void mb_tr_insn_start(DisasContextBase *dcb, CPUState *cs)
     DisasContext *dc = container_of(dcb, DisasContext, base);
 
     tcg_gen_insn_start(dc->base.pc_next, dc->tb_flags & ~MSR_TB_MASK);
-    dc->insn_start = tcg_last_op();
 }
 
 static void mb_tr_translate_insn(DisasContextBase *dcb, CPUState *cs)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 25/35] target/riscv: Use insn_start from DisasContextBase
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (23 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 24/35] target/microblaze: Use insn_start from DisasContextBase Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 26/35] target/s390x: " Richard Henderson
                   ` (10 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

To keep the multiple update check, replace insn_start
with insn_start_updated.

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/riscv/translate.c | 11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 9d57089fcc..9ff09ebdb6 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -115,8 +115,7 @@ typedef struct DisasContext {
     bool itrigger;
     /* FRM is known to contain a valid value. */
     bool frm_valid;
-    /* TCG of the current insn_start */
-    TCGOp *insn_start;
+    bool insn_start_updated;
 } DisasContext;
 
 static inline bool has_ext(DisasContext *ctx, uint32_t ext)
@@ -207,9 +206,9 @@ static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
 
 static void decode_save_opc(DisasContext *ctx)
 {
-    assert(ctx->insn_start != NULL);
-    tcg_set_insn_start_param(ctx->insn_start, 1, ctx->opcode);
-    ctx->insn_start = NULL;
+    assert(!ctx->insn_start_updated);
+    ctx->insn_start_updated = true;
+    tcg_set_insn_start_param(ctx->base.insn_start, 1, ctx->opcode);
 }
 
 static void gen_pc_plus_diff(TCGv target, DisasContext *ctx,
@@ -1224,7 +1223,7 @@ static void riscv_tr_insn_start(DisasContextBase *dcbase, CPUState *cpu)
     }
 
     tcg_gen_insn_start(pc_next, 0);
-    ctx->insn_start = tcg_last_op();
+    ctx->insn_start_updated = false;
 }
 
 static void riscv_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 26/35] target/s390x: Use insn_start from DisasContextBase
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (24 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 25/35] target/riscv: " Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 27/35] accel/tcg: Improve can_do_io management Richard Henderson
                   ` (9 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Philippe Mathieu-Daudé

Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 target/s390x/tcg/translate.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/target/s390x/tcg/translate.c b/target/s390x/tcg/translate.c
index 57b7db1ee9..90a74ee795 100644
--- a/target/s390x/tcg/translate.c
+++ b/target/s390x/tcg/translate.c
@@ -141,7 +141,6 @@ struct DisasFields {
 struct DisasContext {
     DisasContextBase base;
     const DisasInsn *insn;
-    TCGOp *insn_start;
     DisasFields fields;
     uint64_t ex_value;
     /*
@@ -6314,7 +6313,7 @@ static DisasJumpType translate_one(CPUS390XState *env, DisasContext *s)
     insn = extract_insn(env, s);
 
     /* Update insn_start now that we know the ILEN.  */
-    tcg_set_insn_start_param(s->insn_start, 2, s->ilen);
+    tcg_set_insn_start_param(s->base.insn_start, 2, s->ilen);
 
     /* Not found means unimplemented/illegal opcode.  */
     if (insn == NULL) {
@@ -6468,7 +6467,6 @@ static void s390x_tr_insn_start(DisasContextBase *dcbase, CPUState *cs)
 
     /* Delay the set of ilen until we've read the insn. */
     tcg_gen_insn_start(dc->base.pc_next, dc->cc_op, 0);
-    dc->insn_start = tcg_last_op();
 }
 
 static target_ulong get_next_pc(CPUS390XState *env, DisasContext *s,
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 27/35] accel/tcg: Improve can_do_io management
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (25 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 26/35] target/s390x: " Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 28/35] util/bufferiszero: Remove SSE4.1 variant Richard Henderson
                   ` (8 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Jørgen Hansen, Philippe Mathieu-Daudé

We already attempted to set and clear can_do_io before the first
and last insns, but only used the initial value of max_insns and
the call to translator_io_start to find those insns.

Now that we track insn_start in DisasContextBase, and now that
we have emit_before_op, we can wait until we have finished
translation to identify the true first and last insns and emit
the sets of can_do_io at that time.

This fixes the case of a translation block which crossed a page
boundary, and for which the second page turned out to be mmio.
In this case we truncate the block, and the previous logic for
can_do_io could leave a block with a single insn with can_do_io
set to false, which would fail an assertion in cpu_io_recompile.

Reported-by: Jørgen Hansen <Jorgen.Hansen@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Tested-by: Jørgen Hansen <Jorgen.Hansen@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/exec/translator.h |  1 -
 accel/tcg/translator.c    | 45 ++++++++++++++++++++-------------------
 2 files changed, 23 insertions(+), 23 deletions(-)

diff --git a/include/exec/translator.h b/include/exec/translator.h
index ceaeca8c91..2c4fb818e7 100644
--- a/include/exec/translator.h
+++ b/include/exec/translator.h
@@ -87,7 +87,6 @@ typedef struct DisasContextBase {
     int num_insns;
     int max_insns;
     bool singlestep_enabled;
-    int8_t saved_can_do_io;
     bool plugin_enabled;
     struct TCGOp *insn_start;
     void *host_addr[2];
diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index ae61c154c2..9de0bc34c8 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -18,20 +18,14 @@
 
 static void set_can_do_io(DisasContextBase *db, bool val)
 {
-    if (db->saved_can_do_io != val) {
-        db->saved_can_do_io = val;
-
-        QEMU_BUILD_BUG_ON(sizeof_field(CPUState, neg.can_do_io) != 1);
-        tcg_gen_st8_i32(tcg_constant_i32(val), tcg_env,
-                        offsetof(ArchCPU, parent_obj.neg.can_do_io) -
-                        offsetof(ArchCPU, env));
-    }
+    QEMU_BUILD_BUG_ON(sizeof_field(CPUState, neg.can_do_io) != 1);
+    tcg_gen_st8_i32(tcg_constant_i32(val), tcg_env,
+                    offsetof(ArchCPU, parent_obj.neg.can_do_io) -
+                    offsetof(ArchCPU, env));
 }
 
 bool translator_io_start(DisasContextBase *db)
 {
-    set_can_do_io(db, true);
-
     /*
      * Ensure that this instruction will be the last in the TB.
      * The target may override this to something more forceful.
@@ -84,13 +78,6 @@ static TCGOp *gen_tb_start(DisasContextBase *db, uint32_t cflags)
                          - offsetof(ArchCPU, env));
     }
 
-    /*
-     * cpu->neg.can_do_io is set automatically here at the beginning of
-     * each translation block.  The cost is minimal, plus it would be
-     * very easy to forget doing it in the translator.
-     */
-    set_can_do_io(db, db->max_insns == 1);
-
     return icount_start_insn;
 }
 
@@ -129,6 +116,7 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
 {
     uint32_t cflags = tb_cflags(tb);
     TCGOp *icount_start_insn;
+    TCGOp *first_insn_start = NULL;
     bool plugin_enabled;
 
     /* Initialize DisasContext */
@@ -139,7 +127,6 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     db->num_insns = 0;
     db->max_insns = *max_insns;
     db->singlestep_enabled = cflags & CF_SINGLE_STEP;
-    db->saved_can_do_io = -1;
     db->insn_start = NULL;
     db->host_addr[0] = host_pc;
     db->host_addr[1] = NULL;
@@ -159,6 +146,9 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
         *max_insns = ++db->num_insns;
         ops->insn_start(db, cpu);
         db->insn_start = tcg_last_op();
+        if (first_insn_start == NULL) {
+            first_insn_start = db->insn_start;
+        }
         tcg_debug_assert(db->is_jmp == DISAS_NEXT);  /* no early exit */
 
         if (plugin_enabled) {
@@ -171,10 +161,6 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
          * done next -- either exiting this loop or locate the start of
          * the next instruction.
          */
-        if (db->num_insns == db->max_insns) {
-            /* Accept I/O on the last instruction.  */
-            set_can_do_io(db, true);
-        }
         ops->translate_insn(db, cpu);
 
         /*
@@ -207,6 +193,21 @@ void translator_loop(CPUState *cpu, TranslationBlock *tb, int *max_insns,
     ops->tb_stop(db, cpu);
     gen_tb_end(tb, cflags, icount_start_insn, db->num_insns);
 
+    /*
+     * Manage can_do_io for the translation block: set to false before
+     * the first insn and set to true before the last insn.
+     */
+    if (db->num_insns == 1) {
+        tcg_debug_assert(first_insn_start == db->insn_start);
+    } else {
+        tcg_debug_assert(first_insn_start != db->insn_start);
+        tcg_ctx->emit_before_op = first_insn_start;
+        set_can_do_io(db, false);
+    }
+    tcg_ctx->emit_before_op = db->insn_start;
+    set_can_do_io(db, true);
+    tcg_ctx->emit_before_op = NULL;
+
     if (plugin_enabled) {
         plugin_gen_tb_end(cpu, db->num_insns);
     }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 28/35] util/bufferiszero: Remove SSE4.1 variant
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (26 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 27/35] accel/tcg: Improve can_do_io management Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 29/35] util/bufferiszero: Remove AVX512 variant Richard Henderson
                   ` (7 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexander Monakov, Mikhail Romanov

From: Alexander Monakov <amonakov@ispras.ru>

The SSE4.1 variant is virtually identical to the SSE2 variant, except
for using 'PTEST+JNZ' in place of 'PCMPEQB+PMOVMSKB+CMP+JNE' for testing
if an SSE register is all zeroes. The PTEST instruction decodes to two
uops, so it can be handled only by the complex decoder, and since
CMP+JNE are macro-fused, both sequences decode to three uops. The uops
comprising the PTEST instruction dispatch to p0 and p5 on Intel CPUs, so
PCMPEQB+PMOVMSKB is comparatively more flexible from dispatch
standpoint.

Hence, the use of PTEST brings no benefit from throughput standpoint.
Its latency is not important, since it feeds only a conditional jump,
which terminates the dependency chain.

I never observed PTEST variants to be faster on real hardware.

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Mikhail Romanov <mmromanov@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240206204809.9859-2-amonakov@ispras.ru>
---
 util/bufferiszero.c | 29 -----------------------------
 1 file changed, 29 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 3e6a5dfd63..f5a3634f9a 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -100,34 +100,6 @@ buffer_zero_sse2(const void *buf, size_t len)
 }
 
 #ifdef CONFIG_AVX2_OPT
-static bool __attribute__((target("sse4")))
-buffer_zero_sse4(const void *buf, size_t len)
-{
-    __m128i t = _mm_loadu_si128(buf);
-    __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16);
-    __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16);
-
-    /* Loop over 16-byte aligned blocks of 64.  */
-    while (likely(p <= e)) {
-        __builtin_prefetch(p);
-        if (unlikely(!_mm_testz_si128(t, t))) {
-            return false;
-        }
-        t = p[-4] | p[-3] | p[-2] | p[-1];
-        p += 4;
-    }
-
-    /* Finish the aligned tail.  */
-    t |= e[-3];
-    t |= e[-2];
-    t |= e[-1];
-
-    /* Finish the unaligned tail.  */
-    t |= _mm_loadu_si128(buf + len - 16);
-
-    return _mm_testz_si128(t, t);
-}
-
 static bool __attribute__((target("avx2")))
 buffer_zero_avx2(const void *buf, size_t len)
 {
@@ -221,7 +193,6 @@ select_accel_cpuinfo(unsigned info)
 #endif
 #ifdef CONFIG_AVX2_OPT
         { CPUINFO_AVX2,    128, buffer_zero_avx2 },
-        { CPUINFO_SSE4,     64, buffer_zero_sse4 },
 #endif
         { CPUINFO_SSE2,     64, buffer_zero_sse2 },
         { CPUINFO_ALWAYS,    0, buffer_zero_int },
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 29/35] util/bufferiszero: Remove AVX512 variant
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (27 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 28/35] util/bufferiszero: Remove SSE4.1 variant Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 30/35] util/bufferiszero: Reorganize for early test for acceleration Richard Henderson
                   ` (6 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexander Monakov, Mikhail Romanov

From: Alexander Monakov <amonakov@ispras.ru>

Thanks to early checks in the inline buffer_is_zero wrapper, the SIMD
routines are invoked much more rarely in normal use when most buffers
are non-zero. This makes use of AVX512 unprofitable, as it incurs extra
frequency and voltage transition periods during which the CPU operates
at reduced performance, as described in
https://travisdowns.github.io/blog/2020/01/17/avxfreq1.html

Signed-off-by: Mikhail Romanov <mmromanov@ispras.ru>
Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240206204809.9859-4-amonakov@ispras.ru>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 util/bufferiszero.c | 38 +++-----------------------------------
 1 file changed, 3 insertions(+), 35 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index f5a3634f9a..641d5f9b9e 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -64,7 +64,7 @@ buffer_zero_int(const void *buf, size_t len)
     }
 }
 
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
+#if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
 #include <immintrin.h>
 
 /* Note that each of these vectorized functions require len >= 64.  */
@@ -128,41 +128,12 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */
 
-#ifdef CONFIG_AVX512F_OPT
-static bool __attribute__((target("avx512f")))
-buffer_zero_avx512(const void *buf, size_t len)
-{
-    /* Begin with an unaligned head of 64 bytes.  */
-    __m512i t = _mm512_loadu_si512(buf);
-    __m512i *p = (__m512i *)(((uintptr_t)buf + 5 * 64) & -64);
-    __m512i *e = (__m512i *)(((uintptr_t)buf + len) & -64);
-
-    /* Loop over 64-byte aligned blocks of 256.  */
-    while (p <= e) {
-        __builtin_prefetch(p);
-        if (unlikely(_mm512_test_epi64_mask(t, t))) {
-            return false;
-        }
-        t = p[-4] | p[-3] | p[-2] | p[-1];
-        p += 4;
-    }
-
-    t |= _mm512_loadu_si512(buf + len - 4 * 64);
-    t |= _mm512_loadu_si512(buf + len - 3 * 64);
-    t |= _mm512_loadu_si512(buf + len - 2 * 64);
-    t |= _mm512_loadu_si512(buf + len - 1 * 64);
-
-    return !_mm512_test_epi64_mask(t, t);
-
-}
-#endif /* CONFIG_AVX512F_OPT */
-
 /*
  * Make sure that these variables are appropriately initialized when
  * SSE2 is enabled on the compiler command-line, but the compiler is
  * too old to support CONFIG_AVX2_OPT.
  */
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
+#if defined(CONFIG_AVX2_OPT)
 # define INIT_USED     0
 # define INIT_LENGTH   0
 # define INIT_ACCEL    buffer_zero_int
@@ -188,9 +159,6 @@ select_accel_cpuinfo(unsigned info)
         unsigned len;
         bool (*fn)(const void *, size_t);
     } all[] = {
-#ifdef CONFIG_AVX512F_OPT
-        { CPUINFO_AVX512F, 256, buffer_zero_avx512 },
-#endif
 #ifdef CONFIG_AVX2_OPT
         { CPUINFO_AVX2,    128, buffer_zero_avx2 },
 #endif
@@ -208,7 +176,7 @@ select_accel_cpuinfo(unsigned info)
     return 0;
 }
 
-#if defined(CONFIG_AVX512F_OPT) || defined(CONFIG_AVX2_OPT)
+#if defined(CONFIG_AVX2_OPT)
 static void __attribute__((constructor)) init_accel(void)
 {
     used_accel = select_accel_cpuinfo(cpuinfo_init());
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 30/35] util/bufferiszero: Reorganize for early test for acceleration
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (28 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 29/35] util/bufferiszero: Remove AVX512 variant Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 31/35] util/bufferiszero: Remove useless prefetches Richard Henderson
                   ` (5 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexander Monakov, Mikhail Romanov

From: Alexander Monakov <amonakov@ispras.ru>

Test for length >= 256 inline, where is is often a constant.
Before calling into the accelerated routine, sample three bytes
from the buffer, which handles most non-zero buffers.

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Mikhail Romanov <mmromanov@ispras.ru>
Message-Id: <20240206204809.9859-3-amonakov@ispras.ru>
[rth: Use __builtin_constant_p; move the indirect call out of line.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 include/qemu/cutils.h | 32 ++++++++++++++++-
 util/bufferiszero.c   | 84 +++++++++++++++++--------------------------
 2 files changed, 63 insertions(+), 53 deletions(-)

diff --git a/include/qemu/cutils.h b/include/qemu/cutils.h
index 92c927a6a3..741dade7cf 100644
--- a/include/qemu/cutils.h
+++ b/include/qemu/cutils.h
@@ -187,9 +187,39 @@ char *freq_to_str(uint64_t freq_hz);
 /* used to print char* safely */
 #define STR_OR_NULL(str) ((str) ? (str) : "null")
 
-bool buffer_is_zero(const void *buf, size_t len);
+/*
+ * Check if a buffer is all zeroes.
+ */
+
+bool buffer_is_zero_ool(const void *vbuf, size_t len);
+bool buffer_is_zero_ge256(const void *vbuf, size_t len);
 bool test_buffer_is_zero_next_accel(void);
 
+static inline bool buffer_is_zero_sample3(const char *buf, size_t len)
+{
+    /*
+     * For any reasonably sized buffer, these three samples come from
+     * three different cachelines.  In qemu-img usage, we find that
+     * each byte eliminates more than half of all buffer testing.
+     * It is therefore critical to performance that the byte tests
+     * short-circuit, so that we do not pull in additional cache lines.
+     * Do not "optimize" this to !(a | b | c).
+     */
+    return !buf[0] && !buf[len - 1] && !buf[len / 2];
+}
+
+#ifdef __OPTIMIZE__
+static inline bool buffer_is_zero(const void *buf, size_t len)
+{
+    return (__builtin_constant_p(len) && len >= 256
+            ? buffer_is_zero_sample3(buf, len) &&
+              buffer_is_zero_ge256(buf, len)
+            : buffer_is_zero_ool(buf, len));
+}
+#else
+#define buffer_is_zero  buffer_is_zero_ool
+#endif
+
 /*
  * Implementation of ULEB128 (http://en.wikipedia.org/wiki/LEB128)
  * Input is limited to 14-bit numbers
diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 641d5f9b9e..972f394cbd 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -26,8 +26,9 @@
 #include "qemu/bswap.h"
 #include "host/cpuinfo.h"
 
-static bool
-buffer_zero_int(const void *buf, size_t len)
+static bool (*buffer_is_zero_accel)(const void *, size_t);
+
+static bool buffer_is_zero_integer(const void *buf, size_t len)
 {
     if (unlikely(len < 8)) {
         /* For a very small buffer, simply accumulate all the bytes.  */
@@ -128,60 +129,38 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */
 
-/*
- * Make sure that these variables are appropriately initialized when
- * SSE2 is enabled on the compiler command-line, but the compiler is
- * too old to support CONFIG_AVX2_OPT.
- */
-#if defined(CONFIG_AVX2_OPT)
-# define INIT_USED     0
-# define INIT_LENGTH   0
-# define INIT_ACCEL    buffer_zero_int
-#else
-# ifndef __SSE2__
-#  error "ISA selection confusion"
-# endif
-# define INIT_USED     CPUINFO_SSE2
-# define INIT_LENGTH   64
-# define INIT_ACCEL    buffer_zero_sse2
-#endif
-
-static unsigned used_accel = INIT_USED;
-static unsigned length_to_accel = INIT_LENGTH;
-static bool (*buffer_accel)(const void *, size_t) = INIT_ACCEL;
-
 static unsigned __attribute__((noinline))
 select_accel_cpuinfo(unsigned info)
 {
     /* Array is sorted in order of algorithm preference. */
     static const struct {
         unsigned bit;
-        unsigned len;
         bool (*fn)(const void *, size_t);
     } all[] = {
 #ifdef CONFIG_AVX2_OPT
-        { CPUINFO_AVX2,    128, buffer_zero_avx2 },
+        { CPUINFO_AVX2,    buffer_zero_avx2 },
 #endif
-        { CPUINFO_SSE2,     64, buffer_zero_sse2 },
-        { CPUINFO_ALWAYS,    0, buffer_zero_int },
+        { CPUINFO_SSE2,    buffer_zero_sse2 },
+        { CPUINFO_ALWAYS,  buffer_is_zero_integer },
     };
 
     for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) {
         if (info & all[i].bit) {
-            length_to_accel = all[i].len;
-            buffer_accel = all[i].fn;
+            buffer_is_zero_accel = all[i].fn;
             return all[i].bit;
         }
     }
     return 0;
 }
 
-#if defined(CONFIG_AVX2_OPT)
+static unsigned used_accel;
+
 static void __attribute__((constructor)) init_accel(void)
 {
     used_accel = select_accel_cpuinfo(cpuinfo_init());
 }
-#endif /* CONFIG_AVX2_OPT */
+
+#define INIT_ACCEL NULL
 
 bool test_buffer_is_zero_next_accel(void)
 {
@@ -194,36 +173,37 @@ bool test_buffer_is_zero_next_accel(void)
     used_accel |= used;
     return used;
 }
-
-static bool select_accel_fn(const void *buf, size_t len)
-{
-    if (likely(len >= length_to_accel)) {
-        return buffer_accel(buf, len);
-    }
-    return buffer_zero_int(buf, len);
-}
-
 #else
-#define select_accel_fn  buffer_zero_int
 bool test_buffer_is_zero_next_accel(void)
 {
     return false;
 }
+
+#define INIT_ACCEL buffer_is_zero_integer
 #endif
 
-/*
- * Checks if a buffer is all zeroes
- */
-bool buffer_is_zero(const void *buf, size_t len)
+static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL;
+
+bool buffer_is_zero_ool(const void *buf, size_t len)
 {
     if (unlikely(len == 0)) {
         return true;
     }
+    if (!buffer_is_zero_sample3(buf, len)) {
+        return false;
+    }
+    /* All bytes are covered for any len <= 3.  */
+    if (unlikely(len <= 3)) {
+        return true;
+    }
 
-    /* Fetch the beginning of the buffer while we select the accelerator.  */
-    __builtin_prefetch(buf);
-
-    /* Use an optimized zero check if possible.  Note that this also
-       includes a check for an unrolled loop over 64-bit integers.  */
-    return select_accel_fn(buf, len);
+    if (likely(len >= 256)) {
+        return buffer_is_zero_accel(buf, len);
+    }
+    return buffer_is_zero_integer(buf, len);
+}
+
+bool buffer_is_zero_ge256(const void *buf, size_t len)
+{
+    return buffer_is_zero_accel(buf, len);
 }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 31/35] util/bufferiszero: Remove useless prefetches
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (29 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 30/35] util/bufferiszero: Reorganize for early test for acceleration Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 32/35] util/bufferiszero: Optimize SSE2 and AVX2 variants Richard Henderson
                   ` (4 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexander Monakov, Mikhail Romanov

From: Alexander Monakov <amonakov@ispras.ru>

Use of prefetching in bufferiszero.c is quite questionable:

- prefetches are issued just a few CPU cycles before the corresponding
  line would be hit by demand loads;

- they are done for simple access patterns, i.e. where hardware
  prefetchers can perform better;

- they compete for load ports in loops that should be limited by load
  port throughput rather than ALU throughput.

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Mikhail Romanov <mmromanov@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240206204809.9859-5-amonakov@ispras.ru>
---
 util/bufferiszero.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 972f394cbd..00118d649e 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -50,7 +50,6 @@ static bool buffer_is_zero_integer(const void *buf, size_t len)
         const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8);
 
         for (; p + 8 <= e; p += 8) {
-            __builtin_prefetch(p + 8);
             if (t) {
                 return false;
             }
@@ -80,7 +79,6 @@ buffer_zero_sse2(const void *buf, size_t len)
 
     /* Loop over 16-byte aligned blocks of 64.  */
     while (likely(p <= e)) {
-        __builtin_prefetch(p);
         t = _mm_cmpeq_epi8(t, zero);
         if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) {
             return false;
@@ -111,7 +109,6 @@ buffer_zero_avx2(const void *buf, size_t len)
 
     /* Loop over 32-byte aligned blocks of 128.  */
     while (p <= e) {
-        __builtin_prefetch(p);
         if (unlikely(!_mm256_testz_si256(t, t))) {
             return false;
         }
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 32/35] util/bufferiszero: Optimize SSE2 and AVX2 variants
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (30 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 31/35] util/bufferiszero: Remove useless prefetches Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 33/35] util/bufferiszero: Improve scalar variant Richard Henderson
                   ` (3 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel; +Cc: Alexander Monakov, Mikhail Romanov

From: Alexander Monakov <amonakov@ispras.ru>

Increase unroll factor in SIMD loops from 4x to 8x in order to move
their bottlenecks from ALU port contention to load issue rate (two loads
per cycle on popular x86 implementations).

Avoid using out-of-bounds pointers in loop boundary conditions.

Follow SSE2 implementation strategy in the AVX2 variant. Avoid use of
PTEST, which is not profitable there (like in the removed SSE4 variant).

Signed-off-by: Alexander Monakov <amonakov@ispras.ru>
Signed-off-by: Mikhail Romanov <mmromanov@ispras.ru>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <20240206204809.9859-6-amonakov@ispras.ru>
---
 util/bufferiszero.c | 111 +++++++++++++++++++++++++++++---------------
 1 file changed, 73 insertions(+), 38 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 00118d649e..02df82b4ff 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -67,62 +67,97 @@ static bool buffer_is_zero_integer(const void *buf, size_t len)
 #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
 #include <immintrin.h>
 
-/* Note that each of these vectorized functions require len >= 64.  */
+/* Helper for preventing the compiler from reassociating
+   chains of binary vector operations.  */
+#define SSE_REASSOC_BARRIER(vec0, vec1) asm("" : "+x"(vec0), "+x"(vec1))
+
+/* Note that these vectorized functions may assume len >= 256.  */
 
 static bool __attribute__((target("sse2")))
 buffer_zero_sse2(const void *buf, size_t len)
 {
-    __m128i t = _mm_loadu_si128(buf);
-    __m128i *p = (__m128i *)(((uintptr_t)buf + 5 * 16) & -16);
-    __m128i *e = (__m128i *)(((uintptr_t)buf + len) & -16);
-    __m128i zero = _mm_setzero_si128();
+    /* Unaligned loads at head/tail.  */
+    __m128i v = *(__m128i_u *)(buf);
+    __m128i w = *(__m128i_u *)(buf + len - 16);
+    /* Align head/tail to 16-byte boundaries.  */
+    const __m128i *p = QEMU_ALIGN_PTR_DOWN(buf + 16, 16);
+    const __m128i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 16);
+    __m128i zero = { 0 };
 
-    /* Loop over 16-byte aligned blocks of 64.  */
-    while (likely(p <= e)) {
-        t = _mm_cmpeq_epi8(t, zero);
-        if (unlikely(_mm_movemask_epi8(t) != 0xFFFF)) {
+    /* Collect a partial block at tail end.  */
+    v |= e[-1]; w |= e[-2];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-3]; w |= e[-4];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-5]; w |= e[-6];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-7]; v |= w;
+
+    /*
+     * Loop over complete 128-byte blocks.
+     * With the head and tail removed, e - p >= 14, so the loop
+     * must iterate at least once.
+     */
+    do {
+        v = _mm_cmpeq_epi8(v, zero);
+        if (unlikely(_mm_movemask_epi8(v) != 0xFFFF)) {
             return false;
         }
-        t = p[-4] | p[-3] | p[-2] | p[-1];
-        p += 4;
-    }
+        v = p[0]; w = p[1];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[2]; w |= p[3];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[4]; w |= p[5];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[6]; w |= p[7];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= w;
+        p += 8;
+    } while (p < e - 7);
 
-    /* Finish the aligned tail.  */
-    t |= e[-3];
-    t |= e[-2];
-    t |= e[-1];
-
-    /* Finish the unaligned tail.  */
-    t |= _mm_loadu_si128(buf + len - 16);
-
-    return _mm_movemask_epi8(_mm_cmpeq_epi8(t, zero)) == 0xFFFF;
+    return _mm_movemask_epi8(_mm_cmpeq_epi8(v, zero)) == 0xFFFF;
 }
 
 #ifdef CONFIG_AVX2_OPT
 static bool __attribute__((target("avx2")))
 buffer_zero_avx2(const void *buf, size_t len)
 {
-    /* Begin with an unaligned head of 32 bytes.  */
-    __m256i t = _mm256_loadu_si256(buf);
-    __m256i *p = (__m256i *)(((uintptr_t)buf + 5 * 32) & -32);
-    __m256i *e = (__m256i *)(((uintptr_t)buf + len) & -32);
+    /* Unaligned loads at head/tail.  */
+    __m256i v = *(__m256i_u *)(buf);
+    __m256i w = *(__m256i_u *)(buf + len - 32);
+    /* Align head/tail to 32-byte boundaries.  */
+    const __m256i *p = QEMU_ALIGN_PTR_DOWN(buf + 32, 32);
+    const __m256i *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 32);
+    __m256i zero = { 0 };
 
-    /* Loop over 32-byte aligned blocks of 128.  */
-    while (p <= e) {
-        if (unlikely(!_mm256_testz_si256(t, t))) {
+    /* Collect a partial block at tail end.  */
+    v |= e[-1]; w |= e[-2];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-3]; w |= e[-4];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-5]; w |= e[-6];
+    SSE_REASSOC_BARRIER(v, w);
+    v |= e[-7]; v |= w;
+
+    /* Loop over complete 256-byte blocks.  */
+    for (; p < e - 7; p += 8) {
+        /* PTEST is not profitable here.  */
+        v = _mm256_cmpeq_epi8(v, zero);
+        if (unlikely(_mm256_movemask_epi8(v) != 0xFFFFFFFF)) {
             return false;
         }
-        t = p[-4] | p[-3] | p[-2] | p[-1];
-        p += 4;
-    } ;
+        v = p[0]; w = p[1];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[2]; w |= p[3];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[4]; w |= p[5];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= p[6]; w |= p[7];
+        SSE_REASSOC_BARRIER(v, w);
+        v |= w;
+    }
 
-    /* Finish the last block of 128 unaligned.  */
-    t |= _mm256_loadu_si256(buf + len - 4 * 32);
-    t |= _mm256_loadu_si256(buf + len - 3 * 32);
-    t |= _mm256_loadu_si256(buf + len - 2 * 32);
-    t |= _mm256_loadu_si256(buf + len - 1 * 32);
-
-    return _mm256_testz_si256(t, t);
+    return _mm256_movemask_epi8(_mm256_cmpeq_epi8(v, zero)) == 0xFFFFFFFF;
 }
 #endif /* CONFIG_AVX2_OPT */
 
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 33/35] util/bufferiszero: Improve scalar variant
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (31 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 32/35] util/bufferiszero: Optimize SSE2 and AVX2 variants Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 34/35] util/bufferiszero: Introduce biz_accel_fn typedef Richard Henderson
                   ` (2 subsequent siblings)
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel

Split less-than and greater-than 256 cases.
Use unaligned accesses for head and tail.
Avoid using out-of-bounds pointers in loop boundary conditions.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 util/bufferiszero.c | 85 +++++++++++++++++++++++++++------------------
 1 file changed, 51 insertions(+), 34 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index 02df82b4ff..c9a7ded016 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -28,40 +28,57 @@
 
 static bool (*buffer_is_zero_accel)(const void *, size_t);
 
-static bool buffer_is_zero_integer(const void *buf, size_t len)
+static bool buffer_is_zero_int_lt256(const void *buf, size_t len)
 {
-    if (unlikely(len < 8)) {
-        /* For a very small buffer, simply accumulate all the bytes.  */
-        const unsigned char *p = buf;
-        const unsigned char *e = buf + len;
-        unsigned char t = 0;
+    uint64_t t;
+    const uint64_t *p, *e;
 
-        do {
-            t |= *p++;
-        } while (p < e);
-
-        return t == 0;
-    } else {
-        /* Otherwise, use the unaligned memory access functions to
-           handle the beginning and end of the buffer, with a couple
-           of loops handling the middle aligned section.  */
-        uint64_t t = ldq_he_p(buf);
-        const uint64_t *p = (uint64_t *)(((uintptr_t)buf + 8) & -8);
-        const uint64_t *e = (uint64_t *)(((uintptr_t)buf + len) & -8);
-
-        for (; p + 8 <= e; p += 8) {
-            if (t) {
-                return false;
-            }
-            t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7];
-        }
-        while (p < e) {
-            t |= *p++;
-        }
-        t |= ldq_he_p(buf + len - 8);
-
-        return t == 0;
+    /*
+     * Use unaligned memory access functions to handle
+     * the beginning and end of the buffer.
+     */
+    if (unlikely(len <= 8)) {
+        return (ldl_he_p(buf) | ldl_he_p(buf + len - 4)) == 0;
     }
+
+    t = ldq_he_p(buf) | ldq_he_p(buf + len - 8);
+    p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8);
+    e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8);
+
+    /* Read 0 to 31 aligned words from the middle. */
+    while (p < e) {
+        t |= *p++;
+    }
+    return t == 0;
+}
+
+static bool buffer_is_zero_int_ge256(const void *buf, size_t len)
+{
+    /*
+     * Use unaligned memory access functions to handle
+     * the beginning and end of the buffer.
+     */
+    uint64_t t = ldq_he_p(buf) | ldq_he_p(buf + len - 8);
+    const uint64_t *p = QEMU_ALIGN_PTR_DOWN(buf + 8, 8);
+    const uint64_t *e = QEMU_ALIGN_PTR_DOWN(buf + len - 1, 8);
+
+    /* Collect a partial block at the tail end. */
+    t |= e[-7] | e[-6] | e[-5] | e[-4] | e[-3] | e[-2] | e[-1];
+
+    /*
+     * Loop over 64 byte blocks.
+     * With the head and tail removed, e - p >= 30,
+     * so the loop must iterate at least 3 times.
+     */
+    do {
+        if (t) {
+            return false;
+        }
+        t = p[0] | p[1] | p[2] | p[3] | p[4] | p[5] | p[6] | p[7];
+        p += 8;
+    } while (p < e - 7);
+
+    return t == 0;
 }
 
 #if defined(CONFIG_AVX2_OPT) || defined(__SSE2__)
@@ -173,7 +190,7 @@ select_accel_cpuinfo(unsigned info)
         { CPUINFO_AVX2,    buffer_zero_avx2 },
 #endif
         { CPUINFO_SSE2,    buffer_zero_sse2 },
-        { CPUINFO_ALWAYS,  buffer_is_zero_integer },
+        { CPUINFO_ALWAYS,  buffer_is_zero_int_ge256 },
     };
 
     for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) {
@@ -211,7 +228,7 @@ bool test_buffer_is_zero_next_accel(void)
     return false;
 }
 
-#define INIT_ACCEL buffer_is_zero_integer
+#define INIT_ACCEL buffer_is_zero_int_ge256
 #endif
 
 static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL;
@@ -232,7 +249,7 @@ bool buffer_is_zero_ool(const void *buf, size_t len)
     if (likely(len >= 256)) {
         return buffer_is_zero_accel(buf, len);
     }
-    return buffer_is_zero_integer(buf, len);
+    return buffer_is_zero_int_lt256(buf, len);
 }
 
 bool buffer_is_zero_ge256(const void *buf, size_t len)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 34/35] util/bufferiszero: Introduce biz_accel_fn typedef
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (32 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 33/35] util/bufferiszero: Improve scalar variant Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-08 17:49 ` [PULL 35/35] util/bufferiszero: Simplify test_buffer_is_zero_next_accel Richard Henderson
  2024-04-09  8:50 ` [PULL 00/35] misc patch queue Peter Maydell
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 util/bufferiszero.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index c9a7ded016..eb8030a3f0 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -26,7 +26,8 @@
 #include "qemu/bswap.h"
 #include "host/cpuinfo.h"
 
-static bool (*buffer_is_zero_accel)(const void *, size_t);
+typedef bool (*biz_accel_fn)(const void *, size_t);
+static biz_accel_fn buffer_is_zero_accel;
 
 static bool buffer_is_zero_int_lt256(const void *buf, size_t len)
 {
@@ -178,13 +179,15 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */
 
+
+
 static unsigned __attribute__((noinline))
 select_accel_cpuinfo(unsigned info)
 {
     /* Array is sorted in order of algorithm preference. */
     static const struct {
         unsigned bit;
-        bool (*fn)(const void *, size_t);
+        biz_accel_fn fn;
     } all[] = {
 #ifdef CONFIG_AVX2_OPT
         { CPUINFO_AVX2,    buffer_zero_avx2 },
@@ -231,7 +234,7 @@ bool test_buffer_is_zero_next_accel(void)
 #define INIT_ACCEL buffer_is_zero_int_ge256
 #endif
 
-static bool (*buffer_is_zero_accel)(const void *, size_t) = INIT_ACCEL;
+static biz_accel_fn buffer_is_zero_accel = INIT_ACCEL;
 
 bool buffer_is_zero_ool(const void *buf, size_t len)
 {
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* [PULL 35/35] util/bufferiszero: Simplify test_buffer_is_zero_next_accel
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (33 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 34/35] util/bufferiszero: Introduce biz_accel_fn typedef Richard Henderson
@ 2024-04-08 17:49 ` Richard Henderson
  2024-04-09  8:50 ` [PULL 00/35] misc patch queue Peter Maydell
  35 siblings, 0 replies; 38+ messages in thread
From: Richard Henderson @ 2024-04-08 17:49 UTC (permalink / raw)
  To: qemu-devel

Because the three alternatives are monotonic, we don't
need to keep a couple of bitmasks, just identify the
strongest alternative at startup.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
---
 util/bufferiszero.c | 56 ++++++++++++++++++---------------------------
 1 file changed, 22 insertions(+), 34 deletions(-)

diff --git a/util/bufferiszero.c b/util/bufferiszero.c
index eb8030a3f0..ff003dc40e 100644
--- a/util/bufferiszero.c
+++ b/util/bufferiszero.c
@@ -179,51 +179,39 @@ buffer_zero_avx2(const void *buf, size_t len)
 }
 #endif /* CONFIG_AVX2_OPT */
 
-
-
-static unsigned __attribute__((noinline))
-select_accel_cpuinfo(unsigned info)
-{
-    /* Array is sorted in order of algorithm preference. */
-    static const struct {
-        unsigned bit;
-        biz_accel_fn fn;
-    } all[] = {
+static biz_accel_fn const accel_table[] = {
+    buffer_is_zero_int_ge256,
+    buffer_zero_sse2,
 #ifdef CONFIG_AVX2_OPT
-        { CPUINFO_AVX2,    buffer_zero_avx2 },
+    buffer_zero_avx2,
 #endif
-        { CPUINFO_SSE2,    buffer_zero_sse2 },
-        { CPUINFO_ALWAYS,  buffer_is_zero_int_ge256 },
-    };
-
-    for (unsigned i = 0; i < ARRAY_SIZE(all); ++i) {
-        if (info & all[i].bit) {
-            buffer_is_zero_accel = all[i].fn;
-            return all[i].bit;
-        }
-    }
-    return 0;
-}
-
-static unsigned used_accel;
+};
+static unsigned accel_index;
 
 static void __attribute__((constructor)) init_accel(void)
 {
-    used_accel = select_accel_cpuinfo(cpuinfo_init());
+    unsigned info = cpuinfo_init();
+    unsigned index = (info & CPUINFO_SSE2 ? 1 : 0);
+
+#ifdef CONFIG_AVX2_OPT
+    if (info & CPUINFO_AVX2) {
+        index = 2;
+    }
+#endif
+
+    accel_index = index;
+    buffer_is_zero_accel = accel_table[index];
 }
 
 #define INIT_ACCEL NULL
 
 bool test_buffer_is_zero_next_accel(void)
 {
-    /*
-     * Accumulate the accelerators that we've already tested, and
-     * remove them from the set to test this round.  We'll get back
-     * a zero from select_accel_cpuinfo when there are no more.
-     */
-    unsigned used = select_accel_cpuinfo(cpuinfo & ~used_accel);
-    used_accel |= used;
-    return used;
+    if (accel_index != 0) {
+        buffer_is_zero_accel = accel_table[--accel_index];
+        return true;
+    }
+    return false;
 }
 #else
 bool test_buffer_is_zero_next_accel(void)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 38+ messages in thread

* Re: [PULL 00/35] misc patch queue
  2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
                   ` (34 preceding siblings ...)
  2024-04-08 17:49 ` [PULL 35/35] util/bufferiszero: Simplify test_buffer_is_zero_next_accel Richard Henderson
@ 2024-04-09  8:50 ` Peter Maydell
  2024-04-09  9:53   ` Philippe Mathieu-Daudé
  35 siblings, 1 reply; 38+ messages in thread
From: Peter Maydell @ 2024-04-09  8:50 UTC (permalink / raw)
  To: Richard Henderson; +Cc: qemu-devel

On Mon, 8 Apr 2024 at 18:51, Richard Henderson
<richard.henderson@linaro.org> wrote:
>
> This started out to be tcg and linux-user only, but then added
> a few target bug fixes, and the trolled back through my inbox
> and picked up some other safe patch sets that got lost.
>
>
> r~
>
>
> The following changes since commit ce64e6224affb8b4e4b019f76d2950270b391af5:
>
>   Merge tag 'qemu-sparc-20240404' of https://github.com/mcayland/qemu into staging (2024-04-04 15:28:06 +0100)
>
> are available in the Git repository at:
>
>   https://gitlab.com/rth7680/qemu.git tags/pull-misc-20240408
>
> for you to fetch changes up to 50dbeda88ab71f9d426b7f4b126c79c44860e475:
>
>   util/bufferiszero: Simplify test_buffer_is_zero_next_accel (2024-04-08 06:27:58 -1000)
>
> ----------------------------------------------------------------
> util/bufferiszero: Optimizations and cleanups, esp code removal
> target/m68k: Semihosting for non-coldfire cpus
> target/m68k: Fix fp accrued exception reporting
> target/hppa: Fix IIAOQ, IIASQ for pa2.0
> target/sh4: Fixes to mac.l and mac.w saturation
> target/sh4: Fixes to illegal delay slot reporting
> linux-user: Cleanups for do_setsockopt
> linux-user: Add FITRIM ioctl
> linux-user: Fix waitid return of siginfo_t and rusage
> tcg/optimize: Do not attempt to constant fold neg_vec
> accel/tcg: Improve can_do_io management, mmio bug fix

This pullreq alone is more patches than have been committed
in the entirety of the rc2-to-rc3 cycle so far. Are they
really all important enough to put into rc3 on the day
we tag it?

thanks
-- PMM


^ permalink raw reply	[flat|nested] 38+ messages in thread

* Re: [PULL 00/35] misc patch queue
  2024-04-09  8:50 ` [PULL 00/35] misc patch queue Peter Maydell
@ 2024-04-09  9:53   ` Philippe Mathieu-Daudé
  0 siblings, 0 replies; 38+ messages in thread
From: Philippe Mathieu-Daudé @ 2024-04-09  9:53 UTC (permalink / raw)
  To: Peter Maydell, Richard Henderson; +Cc: qemu-devel

On 9/4/24 10:50, Peter Maydell wrote:
> On Mon, 8 Apr 2024 at 18:51, Richard Henderson
> <richard.henderson@linaro.org> wrote:
>>
>> This started out to be tcg and linux-user only, but then added
>> a few target bug fixes, and the trolled back through my inbox
>> and picked up some other safe patch sets that got lost.
>>
>>
>> r~
>>
>>
>> The following changes since commit ce64e6224affb8b4e4b019f76d2950270b391af5:
>>
>>    Merge tag 'qemu-sparc-20240404' of https://github.com/mcayland/qemu into staging (2024-04-04 15:28:06 +0100)
>>
>> are available in the Git repository at:
>>
>>    https://gitlab.com/rth7680/qemu.git tags/pull-misc-20240408
>>
>> for you to fetch changes up to 50dbeda88ab71f9d426b7f4b126c79c44860e475:
>>
>>    util/bufferiszero: Simplify test_buffer_is_zero_next_accel (2024-04-08 06:27:58 -1000)
>>
>> ----------------------------------------------------------------

> This pullreq alone is more patches than have been committed
> in the entirety of the rc2-to-rc3 cycle so far. Are they
> really all important enough to put into rc3 on the day
> we tag it?

My 2 cents, I'd keep these for 9.0:

 >> target/m68k: Fix fp accrued exception reporting
 >> target/hppa: Fix IIAOQ, IIASQ for pa2.0
 >> target/sh4: Fixes to mac.l and mac.w saturation
 >> target/sh4: Fixes to illegal delay slot reporting
 >> linux-user: Fix waitid return of siginfo_t and rusage
 >> tcg/optimize: Do not attempt to constant fold neg_vec
 >> accel/tcg: Improve can_do_io management, mmio bug fix

Postponing these for 9.1:

 >> util/bufferiszero: Optimizations and cleanups, esp code removal
 >> target/m68k: Semihosting for non-coldfire cpus
 >> linux-user: Cleanups for do_setsockopt
 >> linux-user: Add FITRIM ioctl

> thanks
> -- PMM
> 



^ permalink raw reply	[flat|nested] 38+ messages in thread

end of thread, other threads:[~2024-04-09  9:54 UTC | newest]

Thread overview: 38+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-04-08 17:48 [PULL 00/35] misc patch queue Richard Henderson
2024-04-08 17:48 ` [PULL 01/35] tcg/optimize: Do not attempt to constant fold neg_vec Richard Henderson
2024-04-08 17:48 ` [PULL 02/35] linux-user: Fix waitid return of siginfo_t and rusage Richard Henderson
2024-04-08 17:48 ` [PULL 03/35] linux-user: do_setsockopt: fix SOL_ALG.ALG_SET_KEY Richard Henderson
2024-04-08 17:48 ` [PULL 04/35] linux-user: do_setsockopt: make ip_mreq local to the place it is used and inline target_to_host_ip_mreq() Richard Henderson
2024-04-08 17:48 ` [PULL 05/35] linux-user: do_setsockopt: make ip_mreq_source local to the place where it is used Richard Henderson
2024-04-08 17:49 ` [PULL 06/35] linux-user: do_setsockopt: eliminate goto in switch for SO_SNDTIMEO Richard Henderson
2024-04-08 17:49 ` [PULL 07/35] linux-user: Add FITRIM ioctl Richard Henderson
2024-04-08 17:49 ` [PULL 08/35] linux-user: replace calloc() with g_new0() Richard Henderson
2024-04-08 17:49 ` [PULL 09/35] target/hppa: Fix IIAOQ, IIASQ for pa2.0 Richard Henderson
2024-04-08 17:49 ` [PULL 10/35] target/sh4: mac.w: memory accesses are 16-bit words Richard Henderson
2024-04-08 17:49 ` [PULL 11/35] target/sh4: Merge mach and macl into a union Richard Henderson
2024-04-08 17:49 ` [PULL 12/35] target/sh4: Fix mac.l with saturation enabled Richard Henderson
2024-04-08 17:49 ` [PULL 13/35] target/sh4: Fix mac.w " Richard Henderson
2024-04-08 17:49 ` [PULL 14/35] target/sh4: add missing CHECK_NOT_DELAY_SLOT Richard Henderson
2024-04-08 17:49 ` [PULL 15/35] target/m68k: Map FPU exceptions to FPSR register Richard Henderson
2024-04-08 17:49 ` [PULL 16/35] target/m68k: Pass semihosting arg to exit Richard Henderson
2024-04-08 17:49 ` [PULL 17/35] target/m68k: Perform the semihosting test during translate Richard Henderson
2024-04-08 17:49 ` [PULL 18/35] target/m68k: Support semihosting on non-ColdFire targets Richard Henderson
2024-04-08 17:49 ` [PULL 19/35] tcg: Add TCGContext.emit_before_op Richard Henderson
2024-04-08 17:49 ` [PULL 20/35] accel/tcg: Add insn_start to DisasContextBase Richard Henderson
2024-04-08 17:49 ` [PULL 21/35] target/arm: Use insn_start from DisasContextBase Richard Henderson
2024-04-08 17:49 ` [PULL 22/35] target/hppa: " Richard Henderson
2024-04-08 17:49 ` [PULL 23/35] target/i386: Preserve DisasContextBase.insn_start across rewind Richard Henderson
2024-04-08 17:49 ` [PULL 24/35] target/microblaze: Use insn_start from DisasContextBase Richard Henderson
2024-04-08 17:49 ` [PULL 25/35] target/riscv: " Richard Henderson
2024-04-08 17:49 ` [PULL 26/35] target/s390x: " Richard Henderson
2024-04-08 17:49 ` [PULL 27/35] accel/tcg: Improve can_do_io management Richard Henderson
2024-04-08 17:49 ` [PULL 28/35] util/bufferiszero: Remove SSE4.1 variant Richard Henderson
2024-04-08 17:49 ` [PULL 29/35] util/bufferiszero: Remove AVX512 variant Richard Henderson
2024-04-08 17:49 ` [PULL 30/35] util/bufferiszero: Reorganize for early test for acceleration Richard Henderson
2024-04-08 17:49 ` [PULL 31/35] util/bufferiszero: Remove useless prefetches Richard Henderson
2024-04-08 17:49 ` [PULL 32/35] util/bufferiszero: Optimize SSE2 and AVX2 variants Richard Henderson
2024-04-08 17:49 ` [PULL 33/35] util/bufferiszero: Improve scalar variant Richard Henderson
2024-04-08 17:49 ` [PULL 34/35] util/bufferiszero: Introduce biz_accel_fn typedef Richard Henderson
2024-04-08 17:49 ` [PULL 35/35] util/bufferiszero: Simplify test_buffer_is_zero_next_accel Richard Henderson
2024-04-09  8:50 ` [PULL 00/35] misc patch queue Peter Maydell
2024-04-09  9:53   ` Philippe Mathieu-Daudé

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).